Integrated Data Warehouse with Hadoop and Oracle Database
DESCRIPTION
Use cases and advice on integrating Hadoop with the enterprise data warehouse
TRANSCRIPT
![Page 1: Integrated Data Warehouse with Hadoop and Oracle Database](https://reader034.vdocument.in/reader034/viewer/2022042714/54c658814a795989488b4572/html5/thumbnails/1.jpg)
With Oracle Database and Hadoop
Building the Integrated Data Warehouse
Gwen Shapira, Senior Consultant
![Page 2: Integrated Data Warehouse with Hadoop and Oracle Database](https://reader034.vdocument.in/reader034/viewer/2022042714/54c658814a795989488b4572/html5/thumbnails/2.jpg)
© 2012 – Pythian
Why Pythian
• Recognized Leader:
  • Global industry leader in data infrastructure managed services and consulting with expertise in Oracle, Oracle Applications, Microsoft SQL Server, MySQL, big data and systems administration
  • Work with over 200 multinational companies such as Forbes.com, Fox Sports, Nordion and Western Union to help manage their complex IT deployments
• Expertise:
  • One of the world’s largest concentrations of dedicated, full-time DBA expertise. Employ 8 Oracle ACEs/ACE Directors
  • Hold 7 Specializations under Oracle Platinum Partner program, including Oracle Exadata, Oracle GoldenGate & Oracle RAC
• Global Reach & Scalability:
  • 24/7/365 global remote support for DBA and consulting, systems administration, special projects or emergency response
![Page 3: Integrated Data Warehouse with Hadoop and Oracle Database](https://reader034.vdocument.in/reader034/viewer/2022042714/54c658814a795989488b4572/html5/thumbnails/3.jpg)
About Gwen Shapira
• Oracle ACE Director
• 13 Years with pager
• 7 as Oracle DBA
• Senior Consultant:
• Has MacBook, will travel.
• @gwenshap
• http://www.pythian.com/news/author/shapira/
![Page 4: Integrated Data Warehouse with Hadoop and Oracle Database](https://reader034.vdocument.in/reader034/viewer/2022042714/54c658814a795989488b4572/html5/thumbnails/4.jpg)
Agenda
• What is Big Data?
• Why do we care about Big Data?
• Why does your DWH need Hadoop?
• Examples of Hadoop in the DWH
• How to integrate Hadoop into your DWH
• Avoiding major pitfalls
![Page 5: Integrated Data Warehouse with Hadoop and Oracle Database](https://reader034.vdocument.in/reader034/viewer/2022042714/54c658814a795989488b4572/html5/thumbnails/5.jpg)
What is Big Data?
![Page 6: Integrated Data Warehouse with Hadoop and Oracle Database](https://reader034.vdocument.in/reader034/viewer/2022042714/54c658814a795989488b4572/html5/thumbnails/6.jpg)
MORE DATA THAN YOU CAN HANDLE
![Page 7: Integrated Data Warehouse with Hadoop and Oracle Database](https://reader034.vdocument.in/reader034/viewer/2022042714/54c658814a795989488b4572/html5/thumbnails/7.jpg)
MORE DATA THAN RELATIONAL DATABASES CAN HANDLE
![Page 8: Integrated Data Warehouse with Hadoop and Oracle Database](https://reader034.vdocument.in/reader034/viewer/2022042714/54c658814a795989488b4572/html5/thumbnails/8.jpg)
MORE DATA THAN RELATIONAL DATABASES CAN HANDLE CHEAPLY
![Page 9: Integrated Data Warehouse with Hadoop and Oracle Database](https://reader034.vdocument.in/reader034/viewer/2022042714/54c658814a795989488b4572/html5/thumbnails/9.jpg)
• Data arriving at fast rates
• Typically unstructured
• Stored without aggregation
• Analyzed in real time
• For reasonable cost
![Page 10: Integrated Data Warehouse with Hadoop and Oracle Database](https://reader034.vdocument.in/reader034/viewer/2022042714/54c658814a795989488b4572/html5/thumbnails/10.jpg)
Where does Big Data come from?
• Social media
• Enterprise transactional data
• Consumer behaviour
• Multimedia
• Sensors and embedded devices
• Network devices
![Page 11: Integrated Data Warehouse with Hadoop and Oracle Database](https://reader034.vdocument.in/reader034/viewer/2022042714/54c658814a795989488b4572/html5/thumbnails/11.jpg)
Why all the Excitement?
![Page 12: Integrated Data Warehouse with Hadoop and Oracle Database](https://reader034.vdocument.in/reader034/viewer/2022042714/54c658814a795989488b4572/html5/thumbnails/12.jpg)
Complex Data Architecture
![Page 13: Integrated Data Warehouse with Hadoop and Oracle Database](https://reader034.vdocument.in/reader034/viewer/2022042714/54c658814a795989488b4572/html5/thumbnails/13.jpg)
Your DWH needs Hadoop
![Page 14: Integrated Data Warehouse with Hadoop and Oracle Database](https://reader034.vdocument.in/reader034/viewer/2022042714/54c658814a795989488b4572/html5/thumbnails/14.jpg)
Big Problems with Big Data
• It is:
• Unstructured
• Unprocessed
• Un-aggregated
• Un-filtered
• Repetitive
• And generally messy.
Oh, and there is a lot of it.
![Page 15: Integrated Data Warehouse with Hadoop and Oracle Database](https://reader034.vdocument.in/reader034/viewer/2022042714/54c658814a795989488b4572/html5/thumbnails/15.jpg)
Technical Challenges
• Storage capacity
• Storage throughput
• Pipeline throughput
• Processing power
• Parallel processing
• System Integration
• Data Analysis
Scalable storage
Massive Parallel Processing
Ready to use tools
![Page 16: Integrated Data Warehouse with Hadoop and Oracle Database](https://reader034.vdocument.in/reader034/viewer/2022042714/54c658814a795989488b4572/html5/thumbnails/16.jpg)
Hadoop Principles
• Bring Code to Data
• Share Nothing
![Page 17: Integrated Data Warehouse with Hadoop and Oracle Database](https://reader034.vdocument.in/reader034/viewer/2022042714/54c658814a795989488b4572/html5/thumbnails/17.jpg)
Hadoop in a Nutshell
Map-Reduce: Framework for writing massively parallel jobs
HDFS: Replicated Distributed Big-Data File System
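To make the Map-Reduce half of the slide concrete, here is a minimal in-memory sketch of the map, shuffle and reduce phases, written in plain Python. It is only an illustration of the programming model, not Hadoop's API: the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical, and a real job would read its input from HDFS and run the phases on many nodes in parallel.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Mapper: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group emitted values by key, as the framework does between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reducer: collapse all values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run the three phases sequentially over an in-memory list of lines."""
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    return dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())


if __name__ == "__main__":
    print(word_count(["big data", "Big problems with big data"]))
```

Each mapper call looks at one line only and each reducer call at one key only, which is what lets the framework spread the work across machines without shared state.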
![Page 18: Integrated Data Warehouse with Hadoop and Oracle Database](https://reader034.vdocument.in/reader034/viewer/2022042714/54c658814a795989488b4572/html5/thumbnails/18.jpg)
Hadoop Benefits
• Reliable solution based on unreliable hardware
• Designed for large files
• Load data first, structure later
• Designed to maximize throughput of large scans
• Designed to maximize parallelism
• Designed to scale
• Flexible development platform
• Solution Ecosystem
![Page 19: Integrated Data Warehouse with Hadoop and Oracle Database](https://reader034.vdocument.in/reader034/viewer/2022042714/54c658814a795989488b4572/html5/thumbnails/19.jpg)
Hadoop Limitations
• Hadoop is scalable but not fast
• Batteries not included
• Instrumentation not included either
• Well-known reliability limitations
![Page 20: Integrated Data Warehouse with Hadoop and Oracle Database](https://reader034.vdocument.in/reader034/viewer/2022042714/54c658814a795989488b4572/html5/thumbnails/20.jpg)
Use Cases and Customer Stories
Hadoop in the Data Warehouse
![Page 21: Integrated Data Warehouse with Hadoop and Oracle Database](https://reader034.vdocument.in/reader034/viewer/2022042714/54c658814a795989488b4572/html5/thumbnails/21.jpg)
ETL for Unstructured Data
Logs (web servers, app servers, clickstreams) → Flume → Hadoop (cleanup, aggregation, long-term storage) → DWH (BI, batch reports)
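The cleanup and aggregation stage of this pipeline could look roughly like the Python sketch below. Everything specific in it is an assumption for illustration: the Apache-style log pattern, the `clean` and `aggregate` helper names, and the choice to count hits per path and status. A real job would run this logic inside Hadoop over files delivered by Flume.

```python
import re
from collections import Counter

# Hypothetical Apache-style access log pattern; real log formats vary.
LOG_PATTERN = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] '
    r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" (?P<status>\d{3})'
)


def clean(lines):
    """Cleanup step: keep only lines that parse, silently drop the rest."""
    for line in lines:
        match = LOG_PATTERN.match(line)
        if match:
            yield match.groupdict()


def aggregate(records):
    """Aggregation step: count hits per (path, status) pair."""
    return Counter((r["path"], r["status"]) for r in records)


if __name__ == "__main__":
    sample = [
        '10.0.0.1 - - [27/Apr/2012:10:00:00 +0000] "GET /index.html HTTP/1.1" 200 512',
        'not a log line at all',
        '10.0.0.2 - - [27/Apr/2012:10:00:01 +0000] "GET /missing HTTP/1.1" 404 0',
    ]
    for key, hits in aggregate(clean(sample)).items():
        print(key, hits)
```

Only the small aggregated result needs to travel on to the DWH; the raw lines can stay in Hadoop for long-term storage.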
![Page 22: Integrated Data Warehouse with Hadoop and Oracle Database](https://reader034.vdocument.in/reader034/viewer/2022042714/54c658814a795989488b4572/html5/thumbnails/22.jpg)
ETL for Structured Data
OLTP (Oracle, MySQL, Informix…) → Sqoop, Perl → Hadoop (transformation, aggregation, long-term storage) → DWH (BI, batch reports)
![Page 23: Integrated Data Warehouse with Hadoop and Oracle Database](https://reader034.vdocument.in/reader034/viewer/2022042714/54c658814a795989488b4572/html5/thumbnails/23.jpg)
Bring the World into Your Datacenter
![Page 24: Integrated Data Warehouse with Hadoop and Oracle Database](https://reader034.vdocument.in/reader034/viewer/2022042714/54c658814a795989488b4572/html5/thumbnails/24.jpg)
Rare Historical Report
![Page 25: Integrated Data Warehouse with Hadoop and Oracle Database](https://reader034.vdocument.in/reader034/viewer/2022042714/54c658814a795989488b4572/html5/thumbnails/25.jpg)
Find Needle in Haystack
![Page 26: Integrated Data Warehouse with Hadoop and Oracle Database](https://reader034.vdocument.in/reader034/viewer/2022042714/54c658814a795989488b4572/html5/thumbnails/26.jpg)
We are not doing SQL anymore
![Page 27: Integrated Data Warehouse with Hadoop and Oracle Database](https://reader034.vdocument.in/reader034/viewer/2022042714/54c658814a795989488b4572/html5/thumbnails/27.jpg)
Connecting the (big) Dots
![Page 28: Integrated Data Warehouse with Hadoop and Oracle Database](https://reader034.vdocument.in/reader034/viewer/2022042714/54c658814a795989488b4572/html5/thumbnails/28.jpg)
Sqoop Queries
![Page 29: Integrated Data Warehouse with Hadoop and Oracle Database](https://reader034.vdocument.in/reader034/viewer/2022042714/54c658814a795989488b4572/html5/thumbnails/29.jpg)
Sqoop is Flexible (for import)
• Select <columns> from <table> where <condition>
• Or <write your own query>
• Split column
• Parallel
• Incremental
• File formats
![Page 30: Integrated Data Warehouse with Hadoop and Oracle Database](https://reader034.vdocument.in/reader034/viewer/2022042714/54c658814a795989488b4572/html5/thumbnails/30.jpg)
Sqoop Import Examples
• sqoop import --connect jdbc:oracle:thin:@//dbserver:1521/masterdb --username hr --table emp --where "start_date > '01-01-2012'"
• sqoop import --connect jdbc:oracle:thin:@//dbserver:1521/masterdb --username myuser --table shops --split-by shop_id --num-mappers 16
The split column (shop_id) must be indexed or partitioned to avoid 16 full table scans
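The reason the split column matters can be sketched as follows. A parallel import divides the split column's value range among the mappers, and each mapper then issues its own range query. The Python below is a simplified illustration of that idea, not Sqoop's actual splitter: `split_ranges` and `where_clauses` are hypothetical names, and it assumes an integer column whose minimum and maximum are already known.

```python
def split_ranges(min_id, max_id, num_mappers):
    """Divide the closed interval [min_id, max_id] into contiguous ranges, one per mapper."""
    total = max_id - min_id + 1
    size, extra = divmod(total, num_mappers)
    ranges = []
    lo = min_id
    for i in range(num_mappers):
        # The first `extra` mappers take one additional value each.
        width = size + (1 if i < extra else 0)
        if width == 0:
            break  # fewer values than mappers; remaining mappers get nothing
        hi = lo + width - 1
        ranges.append((lo, hi))
        lo = hi + 1
    return ranges


def where_clauses(column, ranges):
    """Build the range predicate each mapper would add to its query."""
    return [f"{column} >= {lo} AND {column} <= {hi}" for lo, hi in ranges]


if __name__ == "__main__":
    for clause in where_clauses("shop_id", split_ranges(1, 1600, 16)):
        print(clause)
```

With 16 mappers the database receives 16 separate range queries. Without an index or partitioning on the split column, each of them has to scan the whole table to find its slice.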
![Page 31: Integrated Data Warehouse with Hadoop and Oracle Database](https://reader034.vdocument.in/reader034/viewer/2022042714/54c658814a795989488b4572/html5/thumbnails/31.jpg)
Less Flexible Export
• 100 row batch inserts
• Commit every 100 batches
• Parallel export
• Update mode
Example:
sqoop export --connect jdbc:oracle:thin:@//dbserver:1521/masterdb --table bar --export-dir /results/bar_data
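The batching behaviour listed on this slide (100-row inserts, a commit every 100 batches) can be illustrated with a short Python sketch. It uses the standard-library `sqlite3` module as a stand-in for Oracle, and the two-column layout of table `bar` as well as the `export_rows` helper are assumptions made for the example; it is not how Sqoop itself is implemented.

```python
import sqlite3

BATCH_ROWS = 100          # rows per INSERT batch, as on the slide
BATCHES_PER_COMMIT = 100  # commit after this many batches, as on the slide


def export_rows(conn, rows, batch_rows=BATCH_ROWS, batches_per_commit=BATCHES_PER_COMMIT):
    """Insert rows into the hypothetical table `bar` in batches; return the number of commits."""
    commits = 0
    batches_since_commit = 0
    cur = conn.cursor()
    for start in range(0, len(rows), batch_rows):
        batch = rows[start:start + batch_rows]
        cur.executemany("INSERT INTO bar (id, val) VALUES (?, ?)", batch)
        batches_since_commit += 1
        if batches_since_commit == batches_per_commit:
            conn.commit()
            commits += 1
            batches_since_commit = 0
    if batches_since_commit:
        # Commit the final, partially filled group of batches.
        conn.commit()
        commits += 1
    return commits


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE bar (id INTEGER, val TEXT)")
    data = [(i, str(i)) for i in range(25000)]
    print(export_rows(conn, data))
```

One consequence of committing periodically rather than once at the end is that a run which fails midway leaves the already committed rows in the target table, so a staging table or a cleanup step is worth planning for.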
![Page 32: Integrated Data Warehouse with Hadoop and Oracle Database](https://reader034.vdocument.in/reader034/viewer/2022042714/54c658814a795989488b4572/html5/thumbnails/32.jpg)
Fuse-DFS
• Mount HDFS on Oracle server:
  • sudo yum install hadoop-0.20-fuse
  • hadoop-fuse-dfs dfs://<name_node_hostname>:<namenode_port> <mount_point>
• Use external tables to load data into Oracle
• File formats may vary
• All ETL best practices apply
![Page 33: Integrated Data Warehouse with Hadoop and Oracle Database](https://reader034.vdocument.in/reader034/viewer/2022042714/54c658814a795989488b4572/html5/thumbnails/33.jpg)
Oracle Loader for Hadoop
• Load data from Hadoop into Oracle
• Map-Reduce job inside Hadoop
• Converts data types
• Partitions and sorts
• Direct path loads
• Reduces CPU utilization on database
![Page 34: Integrated Data Warehouse with Hadoop and Oracle Database](https://reader034.vdocument.in/reader034/viewer/2022042714/54c658814a795989488b4572/html5/thumbnails/34.jpg)
Oracle Direct Connector to HDFS
• Create external tables of files in HDFS
• PREPROCESSOR HDFS_BIN_PATH:hdfs_stream
• All the features of External Tables
• Tested (by Oracle) as 5 times faster (GB/s) than FUSE-DFS
![Page 35: Integrated Data Warehouse with Hadoop and Oracle Database](https://reader034.vdocument.in/reader034/viewer/2022042714/54c658814a795989488b4572/html5/thumbnails/35.jpg)
Big Data Appliance and Exadata
![Page 36: Integrated Data Warehouse with Hadoop and Oracle Database](https://reader034.vdocument.in/reader034/viewer/2022042714/54c658814a795989488b4572/html5/thumbnails/36.jpg)
How not to Fail
![Page 37: Integrated Data Warehouse with Hadoop and Oracle Database](https://reader034.vdocument.in/reader034/viewer/2022042714/54c658814a795989488b4572/html5/thumbnails/37.jpg)
Data that belongs in RDBMS
![Page 38: Integrated Data Warehouse with Hadoop and Oracle Database](https://reader034.vdocument.in/reader034/viewer/2022042714/54c658814a795989488b4572/html5/thumbnails/38.jpg)
Prepare for Migration
![Page 39: Integrated Data Warehouse with Hadoop and Oracle Database](https://reader034.vdocument.in/reader034/viewer/2022042714/54c658814a795989488b4572/html5/thumbnails/39.jpg)
Use Hadoop Efficiently
• Understand your bottlenecks:
• CPU, storage or network?
• Reduce use of temporary data:
• All data travels over the network
• And is written to disk in triplicate
• Eliminate unbalanced workloads
• Offload work to RDBMS
• Fine-tune optimization with Map-Reduce
![Page 40: Integrated Data Warehouse with Hadoop and Oracle Database](https://reader034.vdocument.in/reader034/viewer/2022042714/54c658814a795989488b4572/html5/thumbnails/40.jpg)
Your Data is NOT as BIG as you think
![Page 41: Integrated Data Warehouse with Hadoop and Oracle Database](https://reader034.vdocument.in/reader034/viewer/2022042714/54c658814a795989488b4572/html5/thumbnails/41.jpg)
Getting Started
• Pick a Business Problem
• Acquire Data
• Use right tool for the job
• Hadoop can start on the cheap
• Integrate the systems
• Analyze data
• Get operational
![Page 42: Integrated Data Warehouse with Hadoop and Oracle Database](https://reader034.vdocument.in/reader034/viewer/2022042714/54c658814a795989488b4572/html5/thumbnails/42.jpg)
Thank you and Q&A
To contact us…
http://www.pythian.com/news/
1-877-PYTHIAN
To follow us…
http://www.facebook.com/pages/The-Pythian-Group/163902527671
@pythian
@pythianjobs
http://www.linkedin.com/company/pythian