hug slides on nfs and odbc

Using Standard File-Based Applications and SQL-Based Tools with Hadoop

Upload: mapr-technologies

Post on 25-Jun-2015


TRANSCRIPT

Page 1: HUG slides on NFS and ODBC

1  ©MapR  Technologies  

Using Standard File-Based Applications and SQL-Based Tools with Hadoop

Page 2: HUG slides on NFS and ODBC


Who  am  I?  

§ Keys Botzum
§ [email protected]
§ Senior Principal Technologist, MapR Technologies

http://www.mapr.com/company/events/speaking/dc-hug-9-18-12

Page 3: HUG slides on NFS and ODBC


The MapR Distribution for Apache Hadoop

§ The open, enterprise-grade distribution for Apache Hadoop
– Open source components
• Hive, Pig, Cascading, HBase, ZooKeeper, Oozie, Flume, Sqoop, Whirr, …
– Enhancements to make Hadoop more open and enterprise-grade

§ Growing fast and a recognized leader

Page 4: HUG slides on NFS and ODBC


MapR  in  the  Cloud  

§ Available as a service with Amazon Elastic MapReduce (EMR)
– http://aws.amazon.com/elasticmapreduce/mapr

§ Available as a service with Google Compute Engine

Page 5: HUG slides on NFS and ODBC


MapR  

Make Hadoop more open

Make Hadoop enterprise-grade

This presentation

• High Availability
• Scalability
• Management tools – Web, CLI, REST
• Data Protection – snapshots & mirroring
• Performance

Page 6: HUG slides on NFS and ODBC


Not All Applications Use the Hadoop APIs

Applications and libraries that use files and/or SQL
• These are not legacy applications, they are valuable applications

Applications and libraries that use the Hadoop APIs

30 years
100,000s applications
10,000s libraries
10s programming languages

 

Page 7: HUG slides on NFS and ODBC


Hadoop Needs Industry-Standard Interfaces

Hadoop API
• MapReduce and HBase applications
• Mostly custom-built

NFS
• File-based applications
• Supported by most operating systems

ODBC
• SQL-based tools
• Supported by most BI applications and query builders

Page 8: HUG slides on NFS and ODBC


NFS  

Page 9: HUG slides on NFS and ODBC


Your  Data  is  Important  

§ HDFS-based Hadoop distributions do not (cannot) properly support NFS

§ Your data is important, it drives your business
– Make sure you can access it
– Why store your data in a system which cannot be accessed by 95% of the world's applications and libraries?

§ Access to HDFS source code != access to your data

Page 10: HUG slides on NFS and ODBC


The  NFS  Protocol  

§ RFC 1813

§ Very simple protocol

§ Random reads/writes
– Read: count bytes from the file, starting at offset
– Write: a buffer of data to the file, starting at offset

§ HDFS does not support random writes, so it cannot support NFS

 

WRITE3res NFSPROC3_WRITE(WRITE3args) = 7;

struct WRITE3args {
    nfs_fh3     file;
    offset3     offset;
    count3      count;
    stable_how  stable;
    opaque      data<>;
};

READ3res NFSPROC3_READ(READ3args) = 6;

struct READ3args {
    nfs_fh3  file;
    offset3  offset;
    count3   count;
};
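To make the random read/write semantics concrete, here is a minimal Python sketch of what a file-based application does when it reads or overwrites bytes at an arbitrary offset. It is not taken from the slides: the helper names read_at, write_at, and demo are hypothetical, the demo runs against a local temporary file rather than an NFS mount, and os.pread / os.pwrite are available on Unix-like systems only. On an NFS-mounted path, calls like these are what an NFS client would turn into READ and WRITE requests carrying a file handle, an offset, and a count.

```python
import os
import tempfile


def read_at(path, offset, count):
    """Read up to `count` bytes starting at byte `offset` of `path`."""
    fd = os.open(path, os.O_RDONLY)
    try:
        return os.pread(fd, count, offset)
    finally:
        os.close(fd)


def write_at(path, offset, data):
    """Overwrite bytes in place starting at `offset`; return bytes written."""
    fd = os.open(path, os.O_WRONLY)
    try:
        return os.pwrite(fd, data, offset)
    finally:
        os.close(fd)


def demo():
    """Random write followed by random read on a scratch file (stand-in for an NFS path)."""
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"0123456789")
        path = tmp.name
    try:
        write_at(path, 3, b"abc")    # file is now b"012abc6789"
        return read_at(path, 2, 5)   # five bytes starting at offset 2
    finally:
        os.remove(path)
```

The in-place overwrite in the middle of the file is the operation the slide says an append-only store cannot express.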

Page 11: HUG slides on NFS and ODBC


Hadoop Was Designed to Support Multiple Storage Layers

[Diagram] MapReduce sits on the o.a.h.fs.FileSystem interface, which is implemented by:
• HDFS: o.a.h.hdfs.DistributedFileSystem
• S3: o.a.h.fs.s3native.NativeS3FileSystem
• Local File System: o.a.h.fs.LocalFileSystem
• FTP: o.a.h.fs.ftp.FTPFileSystem
• MapR storage layer: com.mapr.fs.MapRFileSystem

Interface labels in the diagram: Hadoop FileSystem API, NFS interface
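The pluggable-storage idea can be illustrated with a small lookup table. This is only a sketch, not Hadoop code: the class names come from the slide (with o.a.h expanded to org.apache.hadoop), while the URI schemes and the names FILESYSTEM_IMPLS and implementation_for are assumptions added for illustration.

```python
from urllib.parse import urlparse

# Class names are from the slide; the URI schemes used as keys are an
# assumption made for this illustration.
FILESYSTEM_IMPLS = {
    "hdfs": "org.apache.hadoop.hdfs.DistributedFileSystem",
    "s3n": "org.apache.hadoop.fs.s3native.NativeS3FileSystem",
    "file": "org.apache.hadoop.fs.LocalFileSystem",
    "ftp": "org.apache.hadoop.fs.ftp.FTPFileSystem",
    "maprfs": "com.mapr.fs.MapRFileSystem",
}


def implementation_for(uri):
    """Return the storage-layer class name that would serve `uri`."""
    scheme = urlparse(uri).scheme
    try:
        return FILESYSTEM_IMPLS[scheme]
    except KeyError:
        raise ValueError(
            f"no FileSystem implementation registered for scheme {scheme!r}"
        )
```

Code written against the common interface stays the same whichever entry is selected, which is the point the diagram makes about MapReduce.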

Page 12: HUG slides on NFS and ODBC


One  NFS  Gateway  

What  about  scalability  and  high  availability?  

Page 13: HUG slides on NFS and ODBC


Multiple NFS Gateways

Page 14: HUG slides on NFS and ODBC


Multiple NFS Gateways with Load Balancing

Page 15: HUG slides on NFS and ODBC


Multiple NFS Gateways with NFS HA (VIPs)

Page 16: HUG slides on NFS and ODBC


Customer  Examples:  Import/Export  Data  

§ Network security vendor
– Network packet captures from switches are streamed into the cluster
– New pattern definitions are loaded into online IPS via NFS

§ Online measurement company
– Clickstreams from application servers are streamed into the cluster

§ SaaS company
– Exporting a database to Hadoop over NFS

§ Ad exchange
– Bids and transactions are streamed into the cluster

Page 17: HUG slides on NFS and ODBC


Customer Examples: Productivity and Operations

§ Retailer
– Operational scripts are easier with NFS than HDFS + MapReduce
• chmod/chown, file system searches/greps, perl, awk, tab-complete
– Consolidate object store with analytics

§ Credit card company
– User and project home directories on Linux gateways
• Local files, scripts, source code, …
• Administrators manage quotas, snapshots/backups, …

§ Large Internet company recommendation system
– Web servers serve MapReduce results (item relationships) directly from the cluster

§ Email marketing company
– Object store with HBase and NFS
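As an illustration of the "file system searches/greps" item, the sketch below walks a directory tree with ordinary file APIs and collects matching lines. Nothing in it is specific to Hadoop or MapR. The function names grep_tree and demo are hypothetical, and the demo uses a temporary directory as a stand-in for wherever the cluster would be NFS-mounted, since the slides do not give a mount path.

```python
import os
import tempfile


def grep_tree(root, needle):
    """Return sorted (relative_path, line_number, line) for lines containing `needle`."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            with open(full, "r", encoding="utf-8", errors="replace") as fh:
                for lineno, line in enumerate(fh, start=1):
                    if needle in line:
                        rel = os.path.relpath(full, root)
                        hits.append((rel, lineno, line.rstrip("\n")))
    return sorted(hits)


def demo():
    """Build a tiny tree in a scratch directory and search it."""
    with tempfile.TemporaryDirectory() as root:
        os.makedirs(os.path.join(root, "logs"))
        with open(os.path.join(root, "logs", "a.log"), "w", encoding="utf-8") as fh:
            fh.write("ok\nERROR disk full\n")
        with open(os.path.join(root, "b.txt"), "w", encoding="utf-8") as fh:
            fh.write("nothing here\n")
        return grep_tree(root, "ERROR")
```

Pointing root at a mounted cluster directory would run the same search there, assuming such a mount exists.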

Page 18: HUG slides on NFS and ODBC


ODBC  

Page 19: HUG slides on NFS and ODBC


ODBC  

§ ODBC – Open DataBase Connectivity
– Open standard API for accessing a SQL-based backend
– Developed by Microsoft and Simba Technologies in 1992

§ Flagship API for SQL-based BI and reporting
– Excel, Tableau, MicroStrategy, Crystal Reports, …

§ Advanced ODBC drivers use the latest 3.52 specification

Page 20: HUG slides on NFS and ODBC


MapR  ODBC  Driver  

§ MapR provides a Hive ODBC 3.52 driver
– Developed in partnership with ODBC inventor Simba Technologies
– Compliant with latest ODBC 3.52 specification
• 32- and 64-bit platform support
• Windows and Linux

§ Enables direct SQL access to MapR-stored data by translating SQL to HiveQL

§ SQLizer enables seamless connectivity
– Provides ANSI SQL-92 front-end
– Targeted for existing apps that generate standard SQL queries
– Transforms SQL query into HiveQL query
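From the application side, SQL access through a driver looks like any other database connection. The sketch below is written against Python's generic DB-API so it can run as-is: sqlite3 stands in for the Hive ODBC driver, and the table clicks, the function count_by, and the DSN name in the comment are all hypothetical. With an ODBC driver and a configured data source, a third-party module such as pyodbc would supply the connection instead.

```python
import sqlite3


def count_by(conn, table, column):
    """Run a standard SQL GROUP BY over any DB-API connection."""
    # Identifiers cannot be bound as parameters, so validate them first.
    for name in (table, column):
        if not name.isidentifier():
            raise ValueError(f"unsafe identifier: {name!r}")
    cur = conn.cursor()
    cur.execute(
        f"SELECT {column}, COUNT(*) FROM {table} "
        f"GROUP BY {column} ORDER BY {column}"
    )
    rows = [tuple(row) for row in cur.fetchall()]
    cur.close()
    return rows


def demo():
    """sqlite3 is only a stand-in here; it is not part of the MapR stack."""
    # With an ODBC driver installed, the connection could instead come from
    # pyodbc.connect("DSN=MyHiveDSN"), where MyHiveDSN is a hypothetical DSN.
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE clicks (page TEXT, user_id INTEGER)")
        conn.executemany(
            "INSERT INTO clicks VALUES (?, ?)",
            [("home", 1), ("home", 2), ("cart", 1)],
        )
        return count_by(conn, "clicks", "page")
    finally:
        conn.close()
```

The query text is plain SQL; per the slide, translating such a query into HiveQL is the driver's job rather than the application's.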

Page 21: HUG slides on NFS and ODBC


Example:  Tableau  

Page 22: HUG slides on NFS and ODBC


Example:  Open  source  query  builder  (Kaimon)  

Page 23: HUG slides on NFS and ODBC


Example: Microsoft Excel

Page 24: HUG slides on NFS and ODBC


In  Summary  

§ Open standards are important
§ Supporting existing applications and tools that support those standards is valuable
– Preserves investment in tools
– Preserves investment in custom applications that preceded Hadoop
– Leverages skills you already have

Page 25: HUG slides on NFS and ODBC


Join  MapR  

§ Join the fastest growing Hadoop company

§ Open positions in every discipline
– Engineers
– Solution Architects
– Product Management

§  Email  [email protected]  

Page 26: HUG slides on NFS and ODBC


Time for Questions

§ Download slides or send me an email
– http://www.mapr.com/company/events/speaking/dc-hug-9-18-12

§ Download MapR to learn more
– www.mapr.com/download