Rethinking Data Warehousing

   


Good decisions start with organized data

It’s a fact: better data analysis results in better decisions, and good decisions directly improve business results. According to the Atre Group, 80% of the cost associated with data analytics is spent on organizing the data before any meaningful analysis can be done.

The high cost of organizing data can seem excessive. If your organization has already captured and stored the data, how can it cost so much to organize it? The short answer is: garbage in, garbage out. Organizing and preparing data for analysis involves many essential tasks that companies often overlook. The higher the potential value of the analysis, the more sophisticated the work required to structure and organize the data. Complex analytics, such as customer profitability, require insight into both current and historic data, drawn from multiple functional areas within the company. The more global the analysis, such as a consolidated general ledger, the greater the need for access to homogenized data from a variety of different systems.

 

80% of the cost of analytics is spent on organizing data


5 tasks for organizing your data

Research shows that the effort and cost of organizing data for analysis do not always align with the potential analytical value. In other words, you can spend a lot of time and effort organizing data without achieving better results from your analysis.

What matters more are the capabilities designed into the data warehouse that constructs and organizes the data. Simply put, the more intelligent the data warehouse, the less effort and cost required to do high-value analytics. To better understand the relationship between cost and the design of a data warehouse solution, it’s important to examine the necessary tasks in more detail.

Task #1: Identifying the source data

Source data comes in many different forms and is often stored in cryptic fields and tables deep within ERP systems. For example, the JD Edwards Business Unit table is named F0006 and its Business Unit field is named MCMCU, a less than intuitive abbreviation. Also problematic are seemingly easy queries for things like “customer ship to” or “customer sold to.” These commonly used queries must deal with data fragmentation and can require 50 or so joins of various data elements.
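One way to make such cryptic names workable is a small data dictionary that maps source columns to readable names. The Python sketch below is only an illustration of that idea: F0006 and MCMCU come from the example above, while the other column names, the sample values, and the function name are assumptions made for the sake of the example.

```python
# Hypothetical data dictionary: (table, column) -> readable name.
# Only F0006/MCMCU is taken from the text; the other entries are assumed.
FRIENDLY_NAMES = {
    ("F0006", "MCMCU"): "business_unit",
    ("F0006", "MCCO"): "company",        # assumed column name
    ("F0006", "MCDL01"): "description",  # assumed column name
}


def rename_columns(table, row):
    """Return a copy of `row` with known cryptic column names replaced."""
    return {FRIENDLY_NAMES.get((table, col), col): value
            for col, value in row.items()}


# Made-up sample row, for illustration only. MCXYZ is deliberately unknown.
raw_row = {"MCMCU": "M30", "MCCO": "00200", "MCDL01": "Eastern Plant", "MCXYZ": 1}
readable_row = rename_columns("F0006", raw_row)
print(readable_row)
```

Columns that are missing from the dictionary pass through unchanged, so gaps in the mapping stay visible instead of being silently dropped.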

   


Task #2: Exporting data into a common model

Data may exist on one system or on multiple disparate systems across the company. The systems may all be the same ERP (type and version), but more likely the data exists on a myriad of different ERP or other systems. Exporting source data from an ERP system to a data warehouse is conceptually simple. But someone first needs to find the data, put it into a form that is more easily understood, and then consolidate the data from multiple sources into a common data model. This work can be left to the analysts, or it can be done by building deep knowledge and experience into the data warehouse. That knowledge will be unique for each source system.
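The idea of per-source knowledge feeding a common model can be sketched in a few lines of Python. This is a minimal illustration, not a description of any particular product: the two source systems, their field names, the `Customer` model, and the sample records are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Customer:
    """A record in the (hypothetical) common data model."""
    source: str
    customer_id: str
    name: str


# Per-source knowledge: which source field feeds which common-model field.
# Both source systems and all field names here are invented for illustration.
FIELD_MAPS = {
    "erp_a": {"customer_id": "CUSTNO", "name": "CUSTNM"},
    "erp_b": {"customer_id": "customer_number", "name": "customer_name"},
}


def to_common(source, record):
    """Translate one source record into the common Customer model."""
    fields = FIELD_MAPS[source]
    return Customer(
        source=source,
        customer_id=str(record[fields["customer_id"]]).strip(),
        name=str(record[fields["name"]]).strip(),
    )


def consolidate(batches):
    """Flatten {source: [records]} into one list of common-model records."""
    return [to_common(source, rec)
            for source, records in batches.items()
            for rec in records]


customers = consolidate({
    "erp_a": [{"CUSTNO": " 1001 ", "CUSTNM": "Acme Corp"}],
    "erp_b": [{"customer_number": 77, "customer_name": "Globex"}],
})
print(customers)
```

Keeping the mapping in one table per source is what makes the knowledge "unique for each source system": adding a new system means adding one mapping entry rather than rewriting every analysis.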

Task #3: Keeping data current and updated

Moving the data once is not sufficient, since the data is only current for an instant. Periodic updates must be scheduled, or a technique must be developed to update the data continuously.
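A common generic approach to scheduled updates is to copy only the rows that changed since the last run, tracked by a "watermark" timestamp. The Python sketch below assumes each source row carries an `id` and an `updated_at` field; those names, the in-memory `warehouse` dictionary, and the sample data are assumptions for illustration, and this is not meant to represent any specific vendor's technique.

```python
from datetime import datetime


def incremental_update(warehouse, source_rows, watermark):
    """Upsert rows modified after `watermark`; return the new watermark."""
    newest = watermark
    for row in source_rows:
        if row["updated_at"] > watermark:
            warehouse[row["id"]] = row  # insert or overwrite
            newest = max(newest, row["updated_at"])
    return newest


warehouse = {}
source = [
    {"id": 1, "amount": 100, "updated_at": datetime(2024, 1, 1)},
    {"id": 2, "amount": 250, "updated_at": datetime(2024, 1, 2)},
]

# Initial load: every row is newer than the starting watermark.
watermark = incremental_update(warehouse, source, datetime.min)

# Later, row 1 changes in the source system; only that row is copied again.
source[0] = {"id": 1, "amount": 120, "updated_at": datetime(2024, 1, 5)}
watermark = incremental_update(warehouse, source, watermark)
```

This simple version does not detect deleted rows, and it skips rows whose timestamp equals the watermark exactly; a production design would need to handle both cases.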

Task #4: Identifying how data sets are related

Data is often structured by functional area, such as sales data, inventory data, or financial data. However, high-value analysis, like customer or product profitability, requires analysis across functional data sets. Proper customer analysis requires both detailed current and historic information. An analyst can take these data sets into consideration manually, but this can create bottlenecks and introduce the potential for error. Alternatively, the capability to understand data relationships and associations can be integrated into the data warehouse design.
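To make the cross-functional point concrete, the Python sketch below combines a sales data set and a cost data set through a shared customer key to get profit per customer. The field names (`customer_id`, `revenue`, `cost`) and the figures are invented for illustration; real data sets would rarely share a clean key like this, which is exactly the relationship knowledge the text describes.

```python
from collections import defaultdict


def customer_profitability(sales_rows, cost_rows):
    """Combine two functional data sets into profit per customer."""
    profit = defaultdict(float)
    for row in sales_rows:
        profit[row["customer_id"]] += row["revenue"]
    for row in cost_rows:
        profit[row["customer_id"]] -= row["cost"]
    return dict(profit)


# Made-up sample data from two functional areas.
sales = [
    {"customer_id": "C1", "revenue": 1000.0},
    {"customer_id": "C2", "revenue": 500.0},
    {"customer_id": "C1", "revenue": 250.0},
]
costs = [
    {"customer_id": "C1", "cost": 700.0},
    {"customer_id": "C2", "cost": 600.0},
]

profit_by_customer = customer_profitability(sales, costs)
print(profit_by_customer)
```

Neither data set alone shows that customer C2 is unprofitable; the result only appears once the two are related.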


Task #5: Standardizing reporting

Finally, an individual analyst may believe the analysis they are performing is new and novel when, in fact, it has been done many times before. A well-designed system should pre-calculate a host of values and produce a standard set of reports and dashboards to make incremental analysis more efficient across the enterprise.
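Pre-calculation can be as simple as computing common aggregates once and letting every report read the stored result. The following Python sketch assumes transactions with an ISO-formatted `date`, a `subject` area, and an `amount`; these field names and the sample figures are hypothetical.

```python
from collections import defaultdict


def precompute_monthly_totals(transactions):
    """Aggregate amounts once, keyed by (subject area, "YYYY-MM")."""
    totals = defaultdict(float)
    for txn in transactions:
        month = txn["date"][:7]  # "YYYY-MM" prefix of an ISO date string
        totals[(txn["subject"], month)] += txn["amount"]
    return dict(totals)


# Made-up sample transactions.
transactions = [
    {"date": "2024-01-15", "subject": "sales", "amount": 100.0},
    {"date": "2024-01-20", "subject": "sales", "amount": 50.0},
    {"date": "2024-02-01", "subject": "sales", "amount": 75.0},
]

monthly_totals = precompute_monthly_totals(transactions)

# Any number of reports can now look the value up instead of recomputing it.
january_sales = monthly_totals[("sales", "2024-01")]
```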

 

What level of analytics do you need?

Different analytics systems take different approaches to addressing the above needs. Some BI tools, e.g. Tableau or QlikView, are designed for agility and flexibility and allow for quick analysis. These solutions are most appropriate in departmental settings where data relationships, comprehensive data sets, data currency, and data governance are not the highest priority. However, these tools quickly run out of steam when dealing with broader, company-wide analytics.

Most ERP vendors offer tools, e.g. OBIA, that focus on providing a subset of the data housed on their own system. These offerings do not address the need for higher-value analytics because they are limited in both data and data source. In addition, it is not easy to provide real-time updates or cross-dimensional capabilities when using these tools.

   


Trade-offs between BI toolsets

Capability                    BI Tool [1]   OBIA [2]   RD [3]
Multi-source data             No            No         Yes
Cross-dimensional analysis    No            No         Yes
Reports & dashboards          Yes           Yes        Yes
Continuous data update        No            No         Yes
Transform data                Yes           No         Yes
Complete data set             No            No         Yes

BI toolset examples:
[1] BI tools, e.g. QlikView
[2] Vendor-specific tools, e.g. OBIA
[3] Purpose-built data warehouse, e.g. RapidDecision EDW

Finally, there are custom data warehouses that take a holistic approach to organizing the data and to data governance. Clearly, any of the above tasks can be dealt with, since it is only a matter of software. But the world is littered with companies that took on this seemingly simple task and only later understood the experience and sophistication required to build a holistic data warehouse successfully.

 


RapidDecision: A Holistic Approach to Data Organization & Governance

RapidDecision’s innovative design addresses the need for agility and flexibility as well as the requirement to integrate data from multiple, disparate sources. RapidDecision represents a fundamental breakthrough that addresses the underlying challenges of previous generations of data warehouses.

At the core is a unified data model designed with a deep understanding of the data structures Oracle uses in its JD Edwards, PeopleSoft and EBS ERP systems. RapidDecision extracts data from obscure locations within the Oracle ERP, transforms it into more easily understood formats, and populates the proprietary data model. While RapidDecision was purpose-built for Oracle ERP systems, it can also support data from other ERP vendors and non-ERP sources.

RapidDecision keeps data continuously updated with a patent-pending technique that mitigates the impact on system performance. The result is a data warehouse that holds the most comprehensive data set, 100% of all historic and operational transaction data, and is continuously updated.

The system pre-calculates a vast number of items to reduce the analyst’s time and effort, and creates a portfolio of reports by subject area, such as sales or general ledger, or by cross-functional department. The results also include metadata.


In the past, it was believed that systems must either optimize flexibility and agility by providing limited subsets of data, or maximize data consistency with large, structured data warehouses. An innovative alternative now exists that provides the best of both worlds and offers new levels of intelligence and analytic capabilities.