Game Changed – How Hadoop is Reinventing Enterprise Thinking


Uploaded by inside-analysis on 01-Jul-2015

Category: Technology

DESCRIPTION

The Briefing Room with Dr. Robin Bloor and RedPoint Global
Live Webcast on April 8, 2014
Watch the archive: https://bloorgroup.webex.com/bloorgroup/lsr.php?RCID=cfa1bffdd62dc6677fa225bdffe4a0b9

The innovation curve often arcs slowly before picking up speed. Companies that harness a major transformation early in the game can make serious headway before challengers enter the picture. The world of Hadoop features several of these upstarts, each of which uses the open-source foundation as an engine to deliver vastly greater performance across a wide range of services, and even to create new ones. Register for this episode of The Briefing Room to hear veteran analyst Dr. Robin Bloor explain how the Hadoop engine is being used to architect a new generation of enterprise applications. He’ll be briefed by George Corugedo, RedPoint Global CTO and co-founder, who will show how enterprises can take advantage of the scalability, processing power and lower costs that Hadoop 2.0/YARN applications offer, while eliminating the long-term expense of hiring MapReduce programmers. Visit InsideAnalysis.com for more information.

TRANSCRIPT

Page 1: Game Changed – How Hadoop is Reinventing Enterprise Thinking

Grab some coffee and enjoy the pre-show banter before the top of the hour!

Page 2

The Briefing Room

Game Changed: How Hadoop is Reinventing Enterprise Thinking

Page 3

Twitter Tag: #briefr

Welcome

Host: Eric Kavanagh

@eric_kavanagh

Page 4

Mission

• Reveal the essential characteristics of enterprise software, good and bad
• Provide a forum for detailed analysis of today’s innovative technologies
• Give vendors a chance to explain their product to savvy analysts
• Allow audience members to pose serious questions... and get answers!

Page 5

Topics

This Month: BIG DATA

May: DATABASE

June: ANALYTICS & MACHINE LEARNING

2014 Editorial Calendar at www.insideanalysis.com/webcasts/the-briefing-room

Page 6

Big Data

Page 7

Analyst: Robin Bloor

Robin Bloor is Chief Analyst at The Bloor Group

@robinbloor

Page 8

RedPoint Global

• RedPoint Global is a data management and integrated marketing technology company.
• Its Convergent Marketing Platform™ offers products designed for data management, collaboration and architecture integration.
• RedPoint Data Management for Hadoop is YARN-compliant and enables analysts to access and manipulate data directly within the Hadoop cluster.

Page 9

Guest: George Corugedo

George Corugedo is Chief Technology Officer & Co-Founder at RedPoint Global Inc. A mathematician and seasoned technology executive, George has over 20 years of business and technical expertise. As co-founder and CTO of RedPoint Global, George is responsible for leading the development of the RedPoint Convergent Marketing Platform™. A former math professor, George left academia to co-found Accenture’s Customer Insight Practice, which specialized in strategic data utilization, analytics and customer strategy. Previous positions include director of client delivery at ClarityBlue, Inc., a provider of hosted customer intelligence solutions to enterprise commercial entities, and COO/CIO of Riscuity, a receivables management company specializing in the utilization of analytics to drive collections.

Page 10

RedPoint Overview for Bloor Group

Page 11

11 RedPoint Global Inc. 8 April 2014 © Confidential

Overview – What is Hadoop/Hadoop 2.0

Hadoop 1.0
• All operations based on MapReduce
• Intrinsic inconsistency of code-based solutions
• Highly skilled and expensive resources needed
• 3rd-party applications constrained by the need to generate code

Lower cost scaling

No need for structure

Ease of data capture

Hadoop 2.0
• Introduction of YARN: “a general-purpose, distributed, application management framework that supersedes the classic Apache Hadoop MapReduce framework for processing data in Hadoop clusters.”
• Mature applications can now operate directly on Hadoop
• Reduced skill requirements and increased consistency
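Under Hadoop 1.0 every operation had to be cast as MapReduce. To make that constraint concrete, here is a minimal in-memory Java sketch (Java 16+ for records) of the model: a map step emits key/value pairs, a shuffle groups them by key, and a reduce step collapses each group. The names MiniMapReduce, run and wordCount are hypothetical; this is neither Hadoop's API nor RedPoint's code, and it runs on a single machine.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.BiFunction;
import java.util.function.Function;

// In-memory illustration of map -> shuffle -> reduce.
// Hypothetical names; this is NOT the Hadoop MapReduce API.
final class MiniMapReduce {

    // One key/value pair emitted by the map phase.
    record Pair<K, V>(K key, V value) {}

    static <I, K extends Comparable<K>, V, R> Map<K, R> run(
            List<I> inputs,
            Function<I, List<Pair<K, V>>> mapper,
            BiFunction<K, List<V>, R> reducer) {
        // Map + shuffle: apply the mapper to every input and group values by key.
        // TreeMap keeps keys sorted, loosely mirroring the sort before reduce.
        Map<K, List<V>> groups = new TreeMap<>();
        for (I input : inputs) {
            for (Pair<K, V> p : mapper.apply(input)) {
                groups.computeIfAbsent(p.key(), k -> new ArrayList<>()).add(p.value());
            }
        }
        // Reduce: collapse each key's list of values into a single result.
        Map<K, R> out = new TreeMap<>();
        groups.forEach((k, vs) -> out.put(k, reducer.apply(k, vs)));
        return out;
    }

    // Word count in this model: emit (word, 1) per token, then sum per word.
    static Map<String, Integer> wordCount(List<String> lines) {
        return MiniMapReduce.<String, String, Integer, Integer>run(
            lines,
            line -> {
                List<Pair<String, Integer>> pairs = new ArrayList<>();
                for (String w : line.trim().split("\\s+")) {
                    if (!w.isEmpty()) pairs.add(new Pair<>(w, 1));
                }
                return pairs;
            },
            (word, ones) -> ones.stream().mapToInt(Integer::intValue).sum());
    }
}
```

For example, wordCount(List.of("the cat", "the dog")) returns {cat=1, dog=1, the=2}.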

Page 12

Overview – Challenges to Adoption

Skills Gap
• Severe shortage of MR-skilled resources
• Very expensive resources that are hard to retain
• Inconsistent skills lead to inconsistent results
• Underutilizes existing resources
• Prevents broad leverage of investments across the enterprise

Maturity & Governance
• A nascent technology ecosystem around Hadoop
• Emerging technologies only address narrow slivers of functionality
• New applications are not enterprise class
• Legacy applications have built short-term capabilities

Data Into Information
• Data is not useful in its raw state; it must be turned into information
• The benefit of Hadoop is that the same data can be used from many perspectives
• Analysts must now structure the data based on its intended use
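The point that analysts must now structure data according to its intended use is often called schema-on-read. A minimal Java sketch (Java 16+ records), assuming a hypothetical comma-separated raw layout of timestamp, customer ID, product and amount: the raw lines stay untouched, and two different analyses impose two different structures on the same lines at read time. Record and method names are illustrative only.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Schema-on-read sketch. Assumed (hypothetical) raw layout:
// timestamp,customerId,product,amount
final class SchemaOnRead {

    record Purchase(String customerId, double amount) {}
    record ProductView(String product, String timestamp) {}

    // Perspective 1 (e.g. finance): who spent how much.
    static Purchase asPurchase(String raw) {
        String[] f = raw.split(",");
        return new Purchase(f[1].trim(), Double.parseDouble(f[3].trim()));
    }

    // Perspective 2 (e.g. merchandising): which product was touched, and when.
    static ProductView asProductView(String raw) {
        String[] f = raw.split(",");
        return new ProductView(f[2].trim(), f[0].trim());
    }

    // Total spend per customer, derived from the raw lines.
    static Map<String, Double> spendByCustomer(List<String> rawLines) {
        return rawLines.stream()
            .map(SchemaOnRead::asPurchase)
            .collect(Collectors.groupingBy(Purchase::customerId,
                     Collectors.summingDouble(Purchase::amount)));
    }

    // Record count per product, derived from the very same raw lines.
    static Map<String, Long> viewsByProduct(List<String> rawLines) {
        return rawLines.stream()
            .map(SchemaOnRead::asProductView)
            .collect(Collectors.groupingBy(ProductView::product, Collectors.counting()));
    }
}
```

Neither structure is written back to storage; a third analysis could read the same lines a third way.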

Page 13

How RedPoint Achieves This

First YARN-compliant ETL/data quality toolset on the market – brings together both Big Data and traditional data to create Big Information!

RANKED #1 in:
• Customer or Party Data
• Processing Speed
• Match Quality
• Ease of Use

The power to make your data the biggest asset your organization has

Page 14

Key Features of RedPoint Data Management

• ETL & ELT: profiling, reads/writes, transformations; single project for all jobs
• Data Quality: cleanse data; parsing, correction; geo-spatial analysis
• Integration & Matching: grouping; fuzzy match
• Master Key Management: create keys; track changes; maintain matches over time
• Web Services Integration: consume and publish; HTTP/HTTPS protocols; XML/JSON/SOAP formats
• Process Automation & Operations: job scheduling, monitoring, notifications; central point of control

All functions can be used on both TRADITIONAL and BIG DATA

Creates clean, integrated, actionable data – quickly, reliably and at low cost
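Fuzzy matching and master-key creation, two of the features listed on this slide, can be illustrated with a small Java sketch. It assumes plain Levenshtein edit distance on normalized names and a greedy rule in which each record joins the first existing group within a distance threshold. RedPoint's actual matching logic is not described on the slide, and production matchers usually weigh several fields; the "MK-" key format and all names here are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical fuzzy-match + master-key sketch (not RedPoint's algorithm).
final class FuzzyMatcher {

    // Classic dynamic-programming edit distance (insert/delete/substitute cost 1).
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Cleansing step: lower-case, keep only ASCII letters/digits/spaces, collapse spaces.
    static String normalize(String s) {
        return s.toLowerCase(Locale.ROOT)
                .replaceAll("[^a-z0-9 ]", "")
                .replaceAll("\\s+", " ")
                .trim();
    }

    // Assigns each record a master key. A record joins the first group whose
    // representative is within maxDistance edits; otherwise it starts a new group.
    // Identical input strings share one map entry.
    static Map<String, String> assignMasterKeys(List<String> records, int maxDistance) {
        List<String> representatives = new ArrayList<>();
        Map<String, String> keyByRecord = new LinkedHashMap<>();
        for (String record : records) {
            String norm = normalize(record);
            int match = -1;
            for (int g = 0; g < representatives.size(); g++) {
                if (levenshtein(norm, representatives.get(g)) <= maxDistance) {
                    match = g;
                    break;
                }
            }
            if (match < 0) {
                representatives.add(norm);
                match = representatives.size() - 1;
            }
            keyByRecord.put(record, "MK-" + (match + 1));
        }
        return keyByRecord;
    }
}
```

With a threshold of 2, "John Smith" and "Jon Smith." share one master key, while "Mary Jones" gets another. Greedy grouping is order-dependent; real tools typically add blocking and survivorship rules.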

Page 15

Spotlight on RedPoint Data Management for Hadoop

For data management in Hadoop:
• Easy-to-use interface
• Leverages existing skills
• Executes in Hadoop 2.0 (using YARN architecture)
• Fast – no MapReduce
• Can combine Big Data with traditional data
• Data becomes actionable through RedPoint Interaction

WITH REDPOINT: the only pure YARN data management platform

Makes Hadoop data management easy, fast, low-cost. Makes Big Data clean, integrated, usable.

You get more out of your Big Data investment.

PREVIOUS OPTIONS
• Use MapReduce: complex; requires new skills; inefficient execution
• Move data out of Hadoop: extra time and effort; extra storage (expensive); defeats the purpose of Hadoop
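One way to read "Can combine Big Data with traditional data" is enrichment: joining high-volume raw records with a smaller reference table from an existing system. A minimal Java sketch (Java 16+ records), assuming a hypothetical event stream keyed by customer ID and a customer-segment lookup table, performs an in-memory left outer hash join. All names are illustrative, and the slide does not say how RedPoint performs such joins.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical enrichment sketch: raw events (the "big data" side) joined with a
// small reference table (the "traditional" side) via an in-memory hash lookup.
final class HybridJoin {

    record Event(String customerId, String action) {}
    record EnrichedEvent(String customerId, String action, String segment) {}

    // Left outer join: every event is kept; customers missing from the
    // reference table are tagged "UNKNOWN" instead of being dropped.
    static List<EnrichedEvent> enrich(List<Event> events, Map<String, String> customerSegments) {
        List<EnrichedEvent> out = new ArrayList<>();
        for (Event e : events) {
            String segment = customerSegments.getOrDefault(e.customerId(), "UNKNOWN");
            out.add(new EnrichedEvent(e.customerId(), e.action(), segment));
        }
        return out;
    }
}
```

Holding the smaller table in a hash map is a common choice when it fits in memory; larger tables generally call for a partitioned join instead.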

Page 16

Data Management on Hadoop

Diagram labels: Partitioning AM (Application Master) / Tasks; Execution AM / Tasks; Data I/O; Key / Split; Analysis; Parallel Section; Partition Data server; YARN; HDFS/MapReduce
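The diagram's "Partitioning" and "Key / Split" labels refer to dividing data so that parallel tasks can work independently. One common rule, used by Hadoop MapReduce's default HashPartitioner, sends each record to partition (hashCode & Integer.MAX_VALUE) % numPartitions. The Java sketch below applies that rule to plain strings; whether RedPoint's partitioning application master uses the same rule is not stated, and the class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Key-based partitioning sketch using the same arithmetic as Hadoop's default
// HashPartitioner, applied to plain Java strings (hypothetical class).
final class KeyPartitioner {

    // Mask off the sign bit so negative hash codes still map to a valid partition.
    static int partitionFor(String key, int numPartitions) {
        return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }

    // Split keys into numPartitions buckets; equal keys always land in the same
    // bucket, so each parallel task sees complete groups for the keys it owns.
    static List<List<String>> partition(List<String> keys, int numPartitions) {
        List<List<String>> buckets = new ArrayList<>();
        for (int i = 0; i < numPartitions; i++) buckets.add(new ArrayList<>());
        for (String key : keys) buckets.get(partitionFor(key, numPartitions)).add(key);
        return buckets;
    }
}
```

The design choice is deterministic routing: no coordination is needed between tasks, at the cost of possible skew when a few keys dominate.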

Page 17

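RedPoint DM for Hadoop: Processing Flow

Diagram: components shown are the Data Management Designer, the DM Execution Server, the YARN Resource Manager, three Node Managers hosting the DM App Master and several DM Tasks, and the Parallel Section. Steps shown (numbered 1–3 on the slide): the Resource Manager launches the DM App Master; the DM App Master launches DM Tasks; each running DM Task executes the Parallel Section.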
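The launch sequence can be approximated in a single JVM to show who starts what. In this toy model, which is an assumption and not the YARN client or ApplicationMaster API, a method standing in for the Resource Manager starts an "application master", which fans tasks out to a thread pool standing in for the Node Managers and combines their results. The per-task work (counting characters in a split) is a placeholder.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Toy single-JVM model of the slide's flow. Names echo the slide's labels, but
// nothing here uses the real YARN APIs (hypothetical simulation).
final class YarnFlowSimulation {

    // A "DM task": processes one input split (placeholder work: count characters).
    static int runTask(String split) {
        return split.length();
    }

    // The "application master": submits one task per split to worker threads that
    // stand in for Node Managers, then gathers and combines the results.
    static int runAppMaster(List<String> splits, int nodeManagers) {
        ExecutorService nodes = Executors.newFixedThreadPool(nodeManagers);
        try {
            List<Future<Integer>> results = new ArrayList<>();
            for (String split : splits) {
                results.add(nodes.submit(() -> runTask(split)));
            }
            int total = 0;
            for (Future<Integer> f : results) {
                total += f.get();
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for tasks", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a task failed", e.getCause());
        } finally {
            nodes.shutdown();
        }
    }

    // The "resource manager": accepts the job and launches the application master.
    static int submitJob(List<String> splits, int nodeManagers) {
        return runAppMaster(splits, nodeManagers);
    }
}
```

In real YARN the Resource Manager only grants containers; the application master negotiates further containers for its tasks, which this sketch collapses into a fixed thread pool.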

Page 18

The Data Management Designer

Page 19

DM Parallel Section on Hadoop

Page 20

DM Hadoop Settings

Page 21

Benchmarks – Project Gutenberg: MapReduce vs. Pig vs. RedPoint

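Sample MapReduce (small subset of the entire code, which totals nearly 150 lines):

    import java.io.IOException;
    import java.util.StringTokenizer;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    // Nested inside the job's driver class. WordOffset is a custom input key
    // type defined elsewhere in the job (not shown).
    public static class MapClass extends Mapper<WordOffset, Text, Text, IntWritable> {
        // Characters treated as word separators
        private final static String delimiters = "',./<>?;:\"[]{}-=_+()&*%^#$!@`~ \\|«»¡¢£¤¥¦©¬®¯±¶·¿";
        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        // Emit (word, 1) for every token in the input line
        @Override
        public void map(WordOffset key, Text value, Context context)
                throws IOException, InterruptedException {
            String line = value.toString();
            StringTokenizer itr = new StringTokenizer(line, delimiters);
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, one);
            }
        }
    }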

Sample Pig script without the UDF:

    SET pig.maxCombinedSplitSize 67108864
    SET pig.splitCombination true
    A = LOAD '/testdata/pg/*/*/*';
    B = FOREACH A GENERATE FLATTEN(TOKENIZE((chararray)$0)) AS word;
    C = FOREACH B GENERATE UPPER(word) AS word;
    D = GROUP C BY word;
    E = FOREACH D GENERATE COUNT(C) AS occurrences, group;
    F = ORDER E BY occurrences DESC;
    STORE F INTO '/user/cleonardi/pg/pig-count';

MapReduce: >150 lines of MR code; 6 hours of development; 3 hours runtime; extensive optimization needed

Pig: ~50 lines of script code; 3 hours of development; 15 minutes runtime; User Defined Functions required prior to running script

RedPoint: 0 lines of code; 15 min. of development; 3 minutes runtime; no tuning or optimization required
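For comparison with the MapReduce and Pig samples, the same pipeline (tokenize, upper-case, group, count, order by count descending) can be written as a single-machine Java streams sketch. The delimiter set approximates the defaults of Pig's TOKENIZE (whitespace, double quote, comma, parentheses, asterisk), and ties are broken alphabetically, which the Pig script leaves unspecified. This does not run on a cluster and is not how RedPoint's zero-code approach works; comments map each step to the Pig aliases B through F, and the class name is hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

// Single-machine equivalent of the Pig word-count script (hypothetical class).
final class GutenbergWordCount {

    static List<Map.Entry<String, Long>> countWords(List<String> lines) {
        Map<String, Long> counts = lines.stream()
            // B: tokenize each line (delimiters approximate Pig's default TOKENIZE)
            .flatMap(line -> Arrays.stream(line.split("[\\s\",()*]+")))
            .filter(w -> !w.isEmpty())
            // C: upper-case each word
            .map(w -> w.toUpperCase(Locale.ROOT))
            // D + E: group by word and count occurrences
            .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
        // F: order by occurrences, descending; ties broken alphabetically
        return counts.entrySet().stream()
            .sorted(Map.Entry.<String, Long>comparingByValue().reversed()
                    .thenComparing(Map.Entry.<String, Long>comparingByKey()))
            .collect(Collectors.toList());
    }
}
```

For the lines "the cat, the Dog" and "THE dog", the result is THE=3, DOG=2, CAT=1.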

Page 22

Who Should Care

• Companies interested in exploring the promise of Big Data analytics that need an easy way to get started
• Companies already investing heavily in Big Data analytics technologies but stuck due to the shortage of skilled resources
• Large organizations that are focused on “operational offloading” and need to achieve it cost-effectively
• Companies that recognize that much of the data landing in Hadoop is external to the organization and need Data Quality and proper data governance applied to their Hadoop data

Page 23

Why RedPoint

• Directly overcomes the Hadoop skills gap
• Reduced TCO because existing resources can be leveraged
• Increased productivity and consistency of solutions
• Only pure YARN Data Quality application on the market
• Delivers enterprise-grade data quality and governance into the Hadoop cluster

Page 24

Perceptions & Questions

Analyst: Robin Bloor

Page 25

Where Is That Elephant Going?

Robin Bloor, Ph.D.

Page 26

The Key-Value Store is Back!

• General-purpose key-value stores used to be called ISAM files
• They were available on mainframes (VSAM), DEC VAX (RMS) and other minicomputers
• But not on Unix, Windows or Linux
• Well, now they’re back, and they’re scalable

WHAT DID WE LIKE ABOUT THEM?
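ISAM stands for indexed sequential access method: records can be fetched directly by key and also read sequentially in key order. A minimal single-machine Java sketch of those two access paths, using TreeMap as a stand-in, is below; it is not scalable, does not model VSAM or RMS file formats, and the class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

// Single-machine stand-in for an ISAM-style file (hypothetical class, not VSAM/RMS):
// records are kept sorted by key, so keyed lookup and in-order scans are both cheap.
final class IsamLikeStore {
    private final NavigableMap<String, String> records = new TreeMap<>();

    // Insert or replace the record stored under this key.
    void put(String key, String value) {
        records.put(key, value);
    }

    // Indexed access: fetch one record by key (null if absent).
    String get(String key) {
        return records.get(key);
    }

    // Sequential access: values in key order from fromKey (inclusive) to toKey
    // (exclusive), like positioning on a key and then reading "next" repeatedly.
    List<String> scan(String fromKey, String toKey) {
        return new ArrayList<>(records.subMap(fromKey, true, toKey, false).values());
    }
}
```

Scalable key-value stores keep the same two access paths but spread the sorted key range across many machines.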

Page 27

The Open Source Landscape

• Hadoop + components: the data reservoir; the archive store; the analytics sandbox
• Machine Learning Algorithms: raw power
• The R Language: over 1 million users

These are COMPONENTS of a solution

Page 28

A Process Not an Activity

• Data analytics is a multi-disciplinary, end-to-end process
• Until recently it was a walled garden, but the walls were torn down by data availability, scalable technology and open-source tools
• Hadoop has a role here

Page 29

The Evolution of Hadoop

• There were many components before YARN and Tez
• But YARN and Tez have changed the picture
• MapReduce is now just one option
• Most likely, Hadoop will become the default scale-out file system and the OS for data flow
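Tez lets a job be expressed as a directed acyclic graph (DAG) of processing steps rather than a chain of separate MapReduce jobs, which is part of why MapReduce becomes optional. The core scheduling idea, running each step only after its inputs are ready, can be sketched in Java with Kahn's topological-sort algorithm. Vertex names are hypothetical, and this is not the Apache Tez API.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Data-flow DAG scheduled in dependency order via Kahn's algorithm.
// Hypothetical sketch; not the Apache Tez API.
final class DataflowDag {
    private final Map<String, List<String>> edges = new LinkedHashMap<>();
    private final Map<String, Integer> inDegree = new LinkedHashMap<>();

    void addVertex(String v) {
        edges.putIfAbsent(v, new ArrayList<>());
        inDegree.putIfAbsent(v, 0);
    }

    // Declare that vertex 'to' consumes the output of vertex 'from'.
    void addEdge(String from, String to) {
        addVertex(from);
        addVertex(to);
        edges.get(from).add(to);
        inDegree.merge(to, 1, Integer::sum);
    }

    // Returns an order in which every vertex comes after all of its inputs.
    List<String> executionOrder() {
        Map<String, Integer> remaining = new LinkedHashMap<>(inDegree);
        Deque<String> ready = new ArrayDeque<>();
        remaining.forEach((v, d) -> { if (d == 0) ready.add(v); });
        List<String> order = new ArrayList<>();
        while (!ready.isEmpty()) {
            String v = ready.poll();
            order.add(v);
            for (String next : edges.get(v)) {
                // A vertex becomes ready once its last pending input has run.
                if (remaining.merge(next, -1, Integer::sum) == 0) ready.add(next);
            }
        }
        if (order.size() != remaining.size()) {
            throw new IllegalStateException("cycle detected: not a DAG");
        }
        return order;
    }
}
```

For example, with edges load→tokenize, tokenize→count, load→filter, filter→count and count→store, executionOrder() returns [load, tokenize, filter, count, store]; tokenize and filter could run in parallel.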

Page 30

The Hadoop Ecosystem

• Even though it may not seem so, Hadoop is in its infancy
• Hadoop’s popularity guarantees its future
• Its future is also guaranteed by its commercial ecosystem
• That’s the Open Source Way

Page 31

• Do you see Hadoop as a replacement for the data warehouse?
• Which specific components of the Hadoop ecosystem do you always (or nearly always) employ?
• Which other technologies/products do you integrate with?
• How does a RedPoint engagement normally pan out?

Page 32

• What do you see as the natural business applications for Hadoop (and its ecosystem)?
• Do you think there are any natural industry-specific (i.e., vertical) applications?
• Which companies/technologies do you see as competitive with RedPoint?

Page 33

Page 34

Upcoming Topics

www.insideanalysis.com

2014 Editorial Calendar at www.insideanalysis.com/webcasts/the-briefing-room

This Month: BIG DATA

May: DATABASE

June: ANALYTICS & MACHINE LEARNING

Page 35

THANK YOU for your ATTENTION!