The Dirty Work -- Why Data Must Be Reconciled

Grab some coffee and enjoy the pre-show banter before the top of the hour!

Upload: inside-analysis

Post on 09-May-2015

Category: Technology

DESCRIPTION

The Briefing Room with Eric Kavanagh and the PSI-KORS Institute
Live Webcast Nov. 12, 2013
Watch the archive: https://bloorgroup.webex.com/bloorgroup/lsr.php?AT=pb&SP=EC&rID=7727087&rKey=66b1fa7d82868199

Let's face it -- most enterprise information systems are a mess. That's often due to grunt work which was overlooked months or years ago and had nothing to do with you, except that you inherited it. Some mistakes can be swept under the rug for a while, but sooner or later, garbage in results in very expensive garbage out.

Register for this episode of the Briefing Room to hear Senior Analyst Eric Kavanagh outline a roadmap from the past into the possible futures of the information economy. He'll be briefed by Dr. Geoffrey Malafsky, Founder and Data Scientist for the PSI-KORS Institute, a new organization focused on data reconciliation. Malafsky will share his institute's methodology and explain how the process of doing the dirty work can yield tremendous benefits.

Visit InsideAnalysis.com for more information.

TRANSCRIPT

Page 1: The Dirty Work -- Why Data Must Be Reconciled

Grab some coffee and enjoy the pre-show banter before the top of the hour!

Page 2: The Dirty Work -- Why Data Must Be Reconciled

The Briefing Room

The Dirty Work – Why Data Must Be Reconciled

Page 3: The Dirty Work -- Why Data Must Be Reconciled

Twitter Tag: #briefr

The Briefing Room

Welcome

Host & Analyst: Eric Kavanagh

Guest: Geoffrey Malafsky

Page 4: The Dirty Work -- Why Data Must Be Reconciled

Mission

• Reveal the essential characteristics of enterprise software, good and bad
• Provide a forum for detailed analysis of today’s innovative technologies
• Give vendors a chance to explain their product to savvy analysts
• Allow audience members to pose serious questions... and get answers!

Page 5: The Dirty Work -- Why Data Must Be Reconciled

Data Reconciliation

[Diagram: Garbage In, Garbage Out (GIGO). Garbage data fed into a perfect model yields garbage results; perfect data fed into a garbage model also yields garbage results.]

Page 6: The Dirty Work -- Why Data Must Be Reconciled
Page 7: The Dirty Work -- Why Data Must Be Reconciled

• Current data is disjointed and of low quality
• Variable use and meaning among systems, even for “same” data elements
• Undocumented definitions and data management processes
• Errors in data systems
• Disagreement among data systems
• Lack of existing descriptions for key readiness use cases
• Legacy data systems have failed to overcome these problems despite several years of new marts/houses/brokers/IPTs/applications

Page 8: The Dirty Work -- Why Data Must Be Reconciled

“Many CIOs believe data is inexpensive because storage has become inexpensive. But data is inherently messy – it can be wrong, it can be duplicative, and it can be irrelevant – which means it requires handling, which is where the real expenses come in. ‘The cost of more data is the application and the computing power and the processes to reconcile all these things.’” [1]

“While there are a myriad of analytical tools that can be leveraged, a recent study indicated that more than 70% of CMOs feel they are underprepared to manage the explosion of data and ‘lack true insight.’” [2]

1. Wall Street Journal, “CIO’s Big Problem with Big Data,” 2012-08-02
2. Forbes, “The CEO/CMO Dilemma: So Much Data, So Little Impact,” 2012-07-18

Page 9: The Dirty Work -- Why Data Must Be Reconciled

• Suffix in source A, prefix in B, neither in C for the same field (part number, title, …)?
• Conflicts are syntactic (simplest case) and semantic (most difficult)
• Other tools and methods never solve this because they deal with the obstacles independently or not at all: data values out-of-sync with metadata, data models

Copyright Phasic Systems Inc 2013

NKY, HomeSeekers, Texas: Different Meanings (Legal and Business Activities)

1. Create table – title aligned to business = Garage
2. Create vocabulary: spaces.description, spaces.national, spaces.state, ...
3. Define ETL logic
4. Merge in warehouse and process in virtualization layer
5. Change as needed
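The syntactic case on this slide (a suffix in one source, a prefix in another, neither in a third) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the source names, the `PN-` prefix, and the `-REV<n>` suffix are hypothetical and do not come from the presentation, and the sketch does nothing for the harder semantic conflicts, which still need people to adjudicate meaning.

```python
import re

# Hypothetical decoration rules per source (assumptions for illustration only).
SOURCE_RULES = {
    "A": re.compile(r"^(?P<core>.+)-REV\d+$"),  # source A appends a suffix
    "B": re.compile(r"^PN-(?P<core>.+)$"),      # source B prepends a prefix
    "C": re.compile(r"^(?P<core>.+)$"),         # source C stores the bare value
}


def canonical_part_number(source: str, raw: str) -> str:
    """Strip the source-specific decoration and normalize case and whitespace."""
    value = raw.strip().upper()
    match = SOURCE_RULES[source].match(value)
    if match is None:
        # The value does not follow the source's expected shape; return it
        # normalized but otherwise unchanged so it can be reviewed.
        return value
    return match.group("core")


# The same part number as three sources might record it.
records = [("A", "12345-rev2"), ("B", "pn-12345"), ("C", " 12345 ")]
canonical = {canonical_part_number(src, raw) for src, raw in records}
```

With these sample records all three values collapse to one canonical key, which is the precondition for the merge step in the list above.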

Page 10: The Dirty Work -- Why Data Must Be Reconciled

• Data Rationalization is the process of building and managing a continuously adaptive data environment that fuels current and future business needs for decision making and system operations
• It ensures data (i.e. not just metadata) is as accurate, meaningful, and useful as possible while continuously adjusting to improve and add capability
• It provides collaborative management of data assets, the designs governing who, why, and how of data, and the where, when, how of data use in operational systems
• It solves the great challenge of mapping all source values to each target along the entire complex paths of enterprise data use
  • Consolidated values when possible with continuous improvement
  • Simplified and adaptive mapping with Corporate NoSQL
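One way to picture "mapping all source values to each target" is an explicit lookup table that is extended as new values turn up. The sketch below is a generic illustration, not the Corporate NoSQL approach itself: the system names, codes, and consolidated values are all hypothetical.

```python
from collections import defaultdict

# Hypothetical mapping table: (source system, source value) -> consolidated value.
VALUE_MAP = {
    ("hr_system", "ACT"): "active",
    ("hr_system", "INA"): "inactive",
    ("payroll", "1"): "active",
    ("payroll", "0"): "inactive",
}


def consolidate(rows):
    """Map each (source, record_id, value) row to its consolidated value.

    Returns the mapped rows plus the unmapped values grouped by source, so
    that gaps in the mapping are surfaced instead of silently dropped.
    """
    mapped = []
    unmapped = defaultdict(set)
    for source, record_id, value in rows:
        target = VALUE_MAP.get((source, value))
        if target is None:
            unmapped[source].add(value)
        else:
            mapped.append((record_id, target))
    return mapped, dict(unmapped)


rows = [("hr_system", 1, "ACT"), ("payroll", 2, "0"), ("payroll", 3, "9")]
mapped, unmapped = consolidate(rows)
```

Returning the unmapped values rather than discarding them is what allows the continuous improvement the slide mentions: each review cycle adds entries to the table.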

Page 11: The Dirty Work -- Why Data Must Be Reconciled

Design Rationalization Issues
• Multiple data models
• Conflicting definitions
• Similar, supposedly similar, operationally distinct values
• Unknown business logic
• Multiple ETL mappings

System Rationalization Issues
• Multiple database systems
• Conflicting formats
• Redundant storage
• Unsynchronized values
• Multiple integration points

Design Rationalization
• Consolidated, adaptive data models
• Standardized definitions
• Synchronized distinct operational values
• Managed business logic
• Coordinated ETL mappings

System Rationalization
• Consolidated, adaptive systems
• Common, interoperable formats
• Common storage
• Synchronized interfaces
• Coordinated integration

Page 12: The Dirty Work -- Why Data Must Be Reconciled

Rationalized Data = Meaningful Analysis, Decision Support, Enterprise Applications

Page 13: The Dirty Work -- Why Data Must Be Reconciled

• Example from DARPA Evidence Extraction & Link Discovery
• Today’s Situation: ~10k messages/day from multiple sources read by multiple analysts and analyzed in multiple manual non-integrated tools
• Similar to Social Network Analysis

Page 14: The Dirty Work -- Why Data Must Be Reconciled

Complicated Mixture of Commercial, Custom, Legacy, Services Applications, Data Stores

Page 15: The Dirty Work -- Why Data Must Be Reconciled

Page 16: The Dirty Work -- Why Data Must Be Reconciled

Costs
Business Alignment: Goal, Capability, Architecture
Data Assets: Systems, Owners, Use

Page 17: The Dirty Work -- Why Data Must Be Reconciled

Page 18: The Dirty Work -- Why Data Must Be Reconciled

The Ψ–KORS™ System Model

Point-select data models, codes, entities  

Page 19: The Dirty Work -- Why Data Must Be Reconciled

Corporate NoSQL™

Page 20: The Dirty Work -- Why Data Must Be Reconciled

• DOD CIO: Adaptively blend financial and program data from multiple sources with unclear, undocumented alignment and integration logic (i.e. this is an intelligence challenge) into BI tools (QlikView, Tableau, Pentaho, Excel Web Apps-SharePoint)
• Export Development Canada: Rationalize distributed, undocumented core data to feed cross-enterprise governance and develop an Enterprise Data Model with semantically adjudicated canonical entities

Page 21: The Dirty Work -- Why Data Must Be Reconciled

• Challenge: Complicated environment with conflicting data values, standards, business use cases, and lack of documentation. Data owned by 4 major organizations, in multiple warehouses and data stores, with redundant non-reconciled sets of data
• Requirement: Integrated, common, accurate data to enable a new integrated workforce planning, training, and management application (“Sailor of the Future”) for 1 million people
• Prior Activities: 10+ years of system integration, data warehouse, and data governance efforts → no improvement, poor coordination across organizations and systems

Page 22: The Dirty Work -- Why Data Must Be Reconciled

• Yet there were problems with the most basic data fields, which for the Navy include things like:
  • billet (effectively a job, but also includes other characteristics),
  • rank (similar to seniority, but with formal rules that change over time),
  • rating (similar to vocational ability, but also with changing rules),
  • and even the primary identifier of a person, the Social Security Number (SSN).
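A rough sketch of how disagreement on such basic fields might be surfaced: normalize the identifier first, then compare a field across systems. The system names, the sample ranks, and the deliberately fake SSNs below are hypothetical, and the check covers only format (nine digits), not the changing business rules behind rank or rating.

```python
import re
from collections import defaultdict


def normalize_ssn(raw):
    """Return the nine digits of an SSN, or None if the value is malformed."""
    digits = re.sub(r"\D", "", raw or "")
    return digits if len(digits) == 9 else None


def find_conflicts(records):
    """Group (system, ssn, rank) records by normalized SSN.

    Returns the SSNs whose rank differs between systems, plus the records
    whose SSN could not be normalized at all.
    """
    by_ssn = defaultdict(dict)
    invalid = []
    for system, ssn, rank in records:
        key = normalize_ssn(ssn)
        if key is None:
            invalid.append((system, ssn))
        else:
            by_ssn[key][system] = rank
    conflicts = {
        key: ranks for key, ranks in by_ssn.items()
        if len(set(ranks.values())) > 1
    }
    return conflicts, invalid


# Hypothetical records; the SSNs are intentionally not real.
records = [
    ("personnel", "000-00-0001", "E-5"),
    ("training", "000000001", "E-6"),
    ("pay", "000 00 0002", "E-4"),
    ("pay", "12345", "E-3"),
]
conflicts, invalid = find_conflicts(records)
```

Here the first two records refer to the same person in different formats and disagree on rank, so they are reported as a conflict; the malformed value is set aside for review rather than guessed at.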

Page 23: The Dirty Work -- Why Data Must Be Reconciled

• Bridge Organizations, Processes, Technologies to Data Concepts

Page 24: The Dirty Work -- Why Data Must Be Reconciled

Logical models derive directly from conceptual models and use business terms

Page 25: The Dirty Work -- Why Data Must Be Reconciled

• Promulgate key technologies to help field overcome major obstacles
• Identify cause and existence of semantic conflicts
• Determine options
• Promote enterprise decision making on solution
• Implement solution into operational data
• Visible direct line from governance to data modeling to integration to database engineering to analysis and back again
• Rapid cycle time: identify, assess, decide, execute continuously in natural organizational timeline (days/weeks)
• Community version DataStar for non-commercial use
• Collaborative community communication and design of common, semantically clear Corporate NoSQL models

Page 26: The Dirty Work -- Why Data Must Be Reconciled

Page 27: The Dirty Work -- Why Data Must Be Reconciled

Upcoming Topics

www.insideanalysis.com

November: DATA DISCOVERY & VISUALIZATION

December: INNOVATORS

2014 Editorial Calendar at www.insideanalysis.com/webcasts/the-briefing-room

Page 28: The Dirty Work -- Why Data Must Be Reconciled

Thank You for Your Attention