HUG slides on NFS and ODBC

Posted on 25-Jun-2015

TRANSCRIPT

1  ©MapR  Technologies  

Using Standard File-Based Applications and SQL-Based Tools with Hadoop

Who am I?

§ Keys Botzum
§ kbotzum@maprtech.com
§ Senior Principal Technologist, MapR Technologies

http://www.mapr.com/company/events/speaking/dc-hug-9-18-12

The MapR Distribution for Apache Hadoop

§ The open, enterprise-grade distribution for Apache Hadoop
  – Open source components
    • Hive, Pig, Cascading, HBase, ZooKeeper, Oozie, Flume, Sqoop, Whirr, …
  – Enhancements to make Hadoop more open and enterprise-grade

§ Growing fast and a recognized leader

MapR in the Cloud

§ Available as a service with Amazon Elastic MapReduce (EMR)
  – http://aws.amazon.com/elasticmapreduce/mapr

§ Available as a service with Google Compute Engine

MapR

Make Hadoop more open (the focus of this presentation)

Make Hadoop enterprise-grade
  • High Availability
  • Scalability
  • Management tools – Web, CLI, REST
  • Data Protection – snapshots & mirroring
  • Performance

Not All Applications Use the Hadoop APIs

Applications and libraries that use files and/or SQL
  • These are not legacy applications; they are valuable applications
  • 30 years, 100,000s of applications, 10,000s of libraries, 10s of programming languages

Applications and libraries that use the Hadoop APIs

Hadoop Needs Industry-Standard Interfaces

Hadoop API
  • MapReduce and HBase applications
  • Mostly custom-built

NFS
  • File-based applications
  • Supported by most operating systems

ODBC
  • SQL-based tools
  • Supported by most BI applications and query builders

NFS  

Your Data is Important

§ HDFS-based Hadoop distributions do not (cannot) properly support NFS

§ Your data is important; it drives your business
  – Make sure you can access it
  – Why store your data in a system that cannot be accessed by 95% of the world's applications and libraries?

§ Access to HDFS source code != access to your data

The NFS Protocol

§ RFC 1813

§ Very simple protocol

§ Random reads/writes
  – Read "count" bytes starting at offset "offset" of file "file"
  – Write buffer "data" starting at offset "offset" of file "file"

§ HDFS does not support random writes, so it cannot support NFS

    WRITE3res NFSPROC3_WRITE(WRITE3args) = 7;

    struct WRITE3args {
        nfs_fh3     file;
        offset3     offset;
        count3      count;
        stable_how  stable;
        opaque      data<>;
    };

    READ3res NFSPROC3_READ(READ3args) = 6;

    struct READ3args {
        nfs_fh3     file;
        offset3     offset;
        count3      count;
    };
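The READ and WRITE arguments above boil down to "bytes at an offset" operations. The following is a minimal sketch, assuming a POSIX system where Python's os.pread and os.pwrite are available. The dataclasses are hypothetical, simplified mirrors of READ3args and WRITE3args: a local file descriptor stands in for the nfs_fh3 handle, count is implied by the length of data, and the stable field is dropped. The second write overwrites bytes in the middle of the file, which is the random write an append-only store cannot serve.

```python
import os
import tempfile
from dataclasses import dataclass


@dataclass
class Write3Args:
    # Hypothetical mirror of WRITE3args: the NFS file handle is replaced by a
    # local file descriptor, count is len(data), and "stable" is omitted.
    offset: int
    data: bytes


@dataclass
class Read3Args:
    # Hypothetical mirror of READ3args, minus the NFS file handle.
    offset: int
    count: int


def nfs_write(fd: int, args: Write3Args) -> int:
    """Random write: overwrite bytes in place at an arbitrary offset."""
    return os.pwrite(fd, args.data, args.offset)


def nfs_read(fd: int, args: Read3Args) -> bytes:
    """Random read: return up to args.count bytes starting at args.offset."""
    return os.pread(fd, args.count, args.offset)


if __name__ == "__main__":
    with tempfile.TemporaryFile() as f:
        fd = f.fileno()
        nfs_write(fd, Write3Args(offset=0, data=b"hello world"))
        # Overwrite the middle of the file; an append-only store cannot do this.
        nfs_write(fd, Write3Args(offset=6, data=b"WORLD"))
        print(nfs_read(fd, Read3Args(offset=0, count=11)))  # b'hello WORLD'
```

A read past the end of the file simply returns fewer bytes, which matches how an NFS READ reply can carry less data than requested.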

Hadoop Was Designed to Support Multiple Storage Layers

MapReduce uses the o.a.h.fs.FileSystem interface (Hadoop FileSystem API), with one implementation per storage layer (o.a.h = org.apache.hadoop):
  • HDFS: o.a.h.hdfs.DistributedFileSystem
  • S3: o.a.h.fs.s3native.NativeS3FileSystem
  • Local file system: o.a.h.fs.LocalFileSystem
  • FTP: o.a.h.fs.ftp.FTPFileSystem
  • MapR storage layer: com.mapr.fs.MapRFileSystem, which also provides an NFS interface
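The storage-layer diagram describes a plug-in design: applications program against one FileSystem interface, and the URI scheme selects the implementation. The sketch below is a toy Python analogue of that pattern, not Hadoop's Java API. The class names, the mem:// scheme and the copy helper are hypothetical; only the idea of mapping a scheme to an implementation class (Hadoop's fs.<scheme>.impl setting) comes from Hadoop itself.

```python
import os
import tempfile
from abc import ABC, abstractmethod
from urllib.parse import urlparse


class FileSystem(ABC):
    """Toy analogue of o.a.h.fs.FileSystem: one interface, many storage layers."""

    _impls: dict = {}

    @classmethod
    def register(cls, scheme: str, impl: type) -> None:
        cls._impls[scheme] = impl

    @classmethod
    def get(cls, uri: str) -> "FileSystem":
        # Choose the implementation from the URI scheme, much as Hadoop maps a
        # scheme to a class via fs.<scheme>.impl. Bare paths mean local files.
        scheme = urlparse(uri).scheme or "file"
        if scheme not in cls._impls:
            raise ValueError(f"no FileSystem registered for scheme {scheme!r}")
        return cls._impls[scheme]()

    @abstractmethod
    def read(self, uri: str) -> bytes: ...

    @abstractmethod
    def write(self, uri: str, data: bytes) -> None: ...


class LocalFileSystem(FileSystem):
    def read(self, uri: str) -> bytes:
        with open(urlparse(uri).path, "rb") as f:
            return f.read()

    def write(self, uri: str, data: bytes) -> None:
        with open(urlparse(uri).path, "wb") as f:
            f.write(data)


class MemoryFileSystem(FileSystem):
    """Hypothetical stand-in for any other storage layer (S3, FTP, MapR, ...)."""

    _blobs: dict = {}  # class-level, so every instance sees the same store

    def read(self, uri: str) -> bytes:
        return self._blobs[uri]

    def write(self, uri: str, data: bytes) -> None:
        self._blobs[uri] = data


FileSystem.register("file", LocalFileSystem)
FileSystem.register("mem", MemoryFileSystem)


def copy(src: str, dst: str) -> None:
    """Application code written once against the interface, like a MapReduce job."""
    FileSystem.get(dst).write(dst, FileSystem.get(src).read(src))


if __name__ == "__main__":
    local_path = os.path.join(tempfile.mkdtemp(), "out.txt")
    FileSystem.get("mem://bucket/in.txt").write("mem://bucket/in.txt", b"payload")
    copy("mem://bucket/in.txt", local_path)
    with open(local_path, "rb") as f:
        print(f.read())  # b'payload'
```

The copy function never names a concrete storage layer, which is the property the slide attributes to the FileSystem API.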

One NFS Gateway

What about scalability and high availability?

Multiple NFS Gateways

Multiple NFS Gateways with Load Balancing

Multiple NFS Gateways with NFS HA (VIPs)
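The gateway slides are diagrams, but the idea behind NFS HA with virtual IPs (VIPs) can be sketched: clients mount a VIP rather than a physical gateway, and when a gateway fails only its VIPs move to surviving gateways. This is a minimal, hypothetical model of that reassignment (round-robin initial placement, least-loaded failover), not MapR's actual algorithm; the gateway names and addresses are made up.

```python
from collections import Counter
from typing import Dict, List


def assign_vips(vips: List[str], gateways: List[str]) -> Dict[str, str]:
    """Initial placement: spread virtual IPs round-robin over the gateways."""
    if not gateways:
        raise RuntimeError("no NFS gateway available")
    gws = sorted(gateways)
    return {vip: gws[i % len(gws)] for i, vip in enumerate(sorted(vips))}


def fail_over(assignment: Dict[str, str], healthy: List[str]) -> Dict[str, str]:
    """Move only the VIPs whose gateway is down to the least-loaded live gateway.

    NFS clients mount a VIP rather than a physical node, so after this
    reassignment their mounts point at a surviving gateway.
    """
    if not healthy:
        raise RuntimeError("no healthy NFS gateway left")
    live = sorted(set(healthy))
    # VIPs on healthy gateways stay where they are.
    result = {vip: gw for vip, gw in assignment.items() if gw in live}
    load = Counter(result.values())
    for vip in sorted(v for v in assignment if v not in result):
        target = min(live, key=lambda gw: load[gw])  # ties go alphabetically
        result[vip] = target
        load[target] += 1
    return result


if __name__ == "__main__":
    vips = ["10.0.0.101", "10.0.0.102", "10.0.0.103", "10.0.0.104"]
    placement = assign_vips(vips, ["gw1", "gw2", "gw3"])
    print(placement)
    print(fail_over(placement, ["gw2", "gw3"]))  # gw1 has failed
```

Keeping healthy gateways' VIPs in place is the design choice that matters: clients of surviving gateways see no change at all.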

Customer Examples: Import/Export Data

§ Network security vendor
  – Network packet captures from switches are streamed into the cluster
  – New pattern definitions are loaded into the online IPS (intrusion prevention system) via NFS

§ Online measurement company
  – Clickstreams from application servers are streamed into the cluster

§ SaaS company
  – Exporting a database to Hadoop over NFS

§ Ad exchange
  – Bids and transactions are streamed into the cluster
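Once the cluster is NFS-mounted, each of these import paths needs nothing more than ordinary file writes. The following is a minimal sketch, assuming a mount point such as /mapr/<cluster-name>/; that path, the stream directory and the JSON-lines layout are all illustrative assumptions. The demo writes to a temporary directory so it runs without a cluster.

```python
import json
import os
import tempfile
from pathlib import Path
from typing import Iterable, Mapping


def append_events(mount_dir: str, stream: str, events: Iterable[Mapping]) -> Path:
    """Append events as JSON lines to <mount_dir>/<stream>/events.jsonl.

    Only ordinary file calls are used, so the same code works on a local disk
    or on an NFS-mounted cluster directory (assumed, not shown here).
    """
    target = Path(mount_dir) / stream / "events.jsonl"
    target.parent.mkdir(parents=True, exist_ok=True)
    with open(target, "a", encoding="utf-8") as f:
        for event in events:
            f.write(json.dumps(event, sort_keys=True) + "\n")
        f.flush()
        os.fsync(f.fileno())  # push buffered data through to the file server
    return target


if __name__ == "__main__":
    # A temporary directory stands in for a hypothetical NFS mount point.
    mount = tempfile.mkdtemp()
    path = append_events(mount, "clicks", [{"user": "u1", "page": "/home"}])
    append_events(mount, "clicks", [{"user": "u2", "page": "/cart"}])
    print(path.read_text(encoding="utf-8"))
```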

Customer Examples: Productivity and Operations

§ Retailer
  – Operational scripts are easier with NFS than with HDFS + MapReduce
    • chmod/chown, file system searches/greps, perl, awk, tab-complete
  – Consolidate object store with analytics

§ Credit card company
  – User and project home directories on Linux gateways
    • Local files, scripts, source code, …
    • Administrators manage quotas, snapshots/backups, …

§ Large Internet company recommendation system
  – Web servers serve MapReduce results (item relationships) directly from the cluster

§ Email marketing company
  – Object store with HBase and NFS
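The retailer's point is that standard tools work unchanged on an NFS mount. As a sketch of the kind of operational script meant here, the following Python uses only standard-library file calls to do a recursive grep and a "chmod -R g+r" style permission change. The function names are hypothetical, and pointing root at a mounted cluster directory is assumed rather than shown; the demo uses a temporary directory on a POSIX system.

```python
import os
import re
import stat
import tempfile
from pathlib import Path
from typing import Iterator, Tuple


def grep_tree(root: str, pattern: str, glob: str = "*") -> Iterator[Tuple[str, int, str]]:
    """Yield (path, line_number, line) for each match, roughly like grep -rn."""
    regex = re.compile(pattern)
    for path in sorted(Path(root).rglob(glob)):
        if not path.is_file():
            continue
        with open(path, encoding="utf-8", errors="replace") as f:
            for number, line in enumerate(f, start=1):
                if regex.search(line):
                    yield str(path), number, line.rstrip("\n")


def add_group_read(root: str) -> None:
    """Roughly like chmod -R g+r: add group read permission to root and below."""
    base = Path(root)
    for path in [base, *base.rglob("*")]:
        mode = stat.S_IMODE(path.stat().st_mode)
        os.chmod(path, mode | stat.S_IRGRP)


if __name__ == "__main__":
    root = tempfile.mkdtemp()  # stand-in for a mounted cluster directory
    Path(root, "app.log").write_text("ok\nERROR disk full\n", encoding="utf-8")
    for path, number, line in grep_tree(root, r"^ERROR"):
        print(f"{path}:{number}:{line}")
    add_group_read(root)
```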

ODBC  

ODBC

§ ODBC – Open Database Connectivity
  – Open standard API for accessing a SQL-based backend
  – Developed by Microsoft and Simba Technologies in 1992

§ Flagship API for SQL-based BI and reporting
  – Excel, Tableau, MicroStrategy, Crystal Reports, …

§ Advanced ODBC drivers use the latest 3.52 specification
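What ODBC buys a tool is that it is written once against a standard API while the driver handles the backend. Python's DB-API (PEP 249) applies the same idea inside one language, and the third-party pyodbc package exposes ODBC data sources through it. The sketch below uses the standard library's sqlite3 as a stand-in backend; the clicks table and the top_pages function are hypothetical, and no Hive or MapR connection details are assumed.

```python
import sqlite3
from typing import Any, List, Tuple


def top_pages(conn: Any, limit: int) -> List[Tuple[str, int]]:
    """A report query written once against the Python DB-API (PEP 249).

    The function never imports a driver: it only uses cursor(), execute() and
    fetchall(), so a different PEP 249 connection changes the backend, not this
    code. The "?" placeholder assumes the qmark paramstyle, which sqlite3 uses.
    """
    cur = conn.cursor()
    cur.execute(
        "SELECT page, COUNT(*) AS hits FROM clicks "
        "GROUP BY page ORDER BY hits DESC, page LIMIT ?",
        (limit,),
    )
    # Normalize driver-specific row objects to plain tuples.
    return [tuple(row) for row in cur.fetchall()]


if __name__ == "__main__":
    # sqlite3 stands in for an ODBC data source so the sketch runs anywhere.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE clicks (page TEXT)")
    conn.executemany(
        "INSERT INTO clicks VALUES (?)",
        [("/home",), ("/cart",), ("/home",), ("/about",), ("/home",), ("/cart",)],
    )
    print(top_pages(conn, 2))  # [('/home', 3), ('/cart', 2)]
```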

MapR ODBC Driver

§ MapR provides a Hive ODBC 3.52 driver
  – Developed in partnership with ODBC inventor Simba Technologies
  – Compliant with the latest ODBC 3.52 specification
    • 32- and 64-bit platform support
    • Windows and Linux

§ Enables direct SQL access to MapR-stored data by translating SQL to HiveQL

§ SQLizer enables seamless connectivity
  – Provides an ANSI SQL-92 front-end
  – Targeted at existing apps that generate standard SQL queries
  – Transforms each SQL query into a HiveQL query
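One concrete difference an SQL-92 front-end has to bridge is identifier quoting: SQL-92 quotes identifiers with double quotes, while HiveQL uses backticks and treats double-quoted text as a string literal. The function below is a hypothetical, single-purpose illustration of that kind of rewrite. It is not how SQLizer works, and it deliberately copies string literals untouched rather than translating their escaping rules.

```python
def quote_identifiers_for_hive(sql: str) -> str:
    """Rewrite SQL-92 "double-quoted" identifiers as HiveQL `backtick` ones.

    Single-quoted string literals (including '' escapes) are copied verbatim.
    """
    out = []
    i, n = 0, len(sql)
    while i < n:
        ch = sql[i]
        if ch == "'":
            # Scan to the closing quote of a string literal; '' is an escaped quote.
            j = i + 1
            while j < n:
                if sql[j] == "'":
                    if j + 1 < n and sql[j + 1] == "'":
                        j += 2
                        continue
                    break
                j += 1
            out.append(sql[i:j + 1])
            i = j + 1
        elif ch == '"':
            # Collect a quoted identifier; "" inside it stands for one double quote.
            j = i + 1
            name = []
            while j < n:
                if sql[j] == '"':
                    if j + 1 < n and sql[j + 1] == '"':
                        name.append('"')
                        j += 2
                        continue
                    break
                name.append(sql[j])
                j += 1
            # Hive writes a literal backtick inside a quoted identifier as ``.
            out.append("`" + "".join(name).replace("`", "``") + "`")
            i = j + 1
        else:
            out.append(ch)
            i += 1
    return "".join(out)


if __name__ == "__main__":
    query = 'SELECT "order id", note FROM "web logs" WHERE note = \'say "hi"\''
    print(quote_identifiers_for_hive(query))
    # SELECT `order id`, note FROM `web logs` WHERE note = 'say "hi"'
```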

Example: Tableau

Example: Open source query builder (Kaimon)

Example: Microsoft Excel

In Summary

§ Open standards are important
§ Supporting existing applications and tools that support those standards is valuable
  – Preserves investment in tools
  – Preserves investment in custom applications that preceded Hadoop
  – Leverages skills you already have

Join MapR

§ Join the fastest growing Hadoop company

§ Open positions in every discipline
  – Engineers
  – Solution Architects
  – Product Management

§ Email jobs@mapr.com

Time for Questions

§ Download slides or send me an email
  – http://www.mapr.com/company/events/speaking/dc-hug-9-18-12

§ Download MapR to learn more
  – www.mapr.com/download
