biwa2015-end-to-endhadoopdevelopmentusingodiobieeandoraclebigdata-150129142834-conversion-gate01.pdf
TRANSCRIPT
-
T : +44 (0) 1273 911 268 (UK) or (888) 631-1410 (USA) or +61 3 9596 7186 (Australia & New Zealand) or +91 997 256 7970 (India)
E : [email protected] W : www.rittmanmead.com
End-to-End Hadoop Development using OBIEE, ODI and Oracle Big Data Mark Rittman, CTO, Rittman Mead BIWA Summit, Jan 2015, San Francisco
-
T : +44 (0) 1273 911 268 (UK) or (888) 631-1410 (USA) or +61 3 9596 7186 (Australia & New Zealand) or +91 997 256 7970 (India)
E : [email protected] W : www.rittmanmead.com
T : +44 (0) 1273 911 268 (UK) or (888) 631-1410 (USA) or +61 3 9596 7186 (Australia & New Zealand) or +91 997 256 7970 (India)
E : [email protected] W : www.rittmanmead.com
About the Speaker
Mark Rittman, Co-Founder of Rittman Mead Oracle ACE Director, specialising in Oracle BI&DW 14 Years Experience with Oracle Technology Regular columnist for Oracle Magazine Author of two Oracle Press Oracle BI books Oracle Business Intelligence Developers Guide Oracle Exalytics Revealed Writer for Rittman Mead Blog :http://www.rittmanmead.com/blog
Email : [email protected] Twitter : @markrittman
-
T : +44 (0) 1273 911 268 (UK) or (888) 631-1410 (USA) or +61 3 9596 7186 (Australia & New Zealand) or +91 997 256 7970 (India)
E : [email protected] W : www.rittmanmead.com
T : +44 (0) 1273 911 268 (UK) or (888) 631-1410 (USA) or +61 3 9596 7186 (Australia & New Zealand) or +91 997 256 7970 (India)
E : [email protected] W : www.rittmanmead.com
About Rittman Mead
Oracle BI and DW Gold partner Winner of five UKOUG Partner of the Year awards in 2013 - including BI World leading specialist partner for technical excellence, solutions delivery and innovation in Oracle BI
Approximately 80 consultants worldwide All expert in Oracle BI and DW Offices in US (Atlanta), Europe, Australia and India Skills in broad range of supporting Oracle tools: OBIEE, OBIA ODIEE Essbase, Oracle OLAP GoldenGate Endeca
-
T : +44 (0) 1273 911 268 (UK) or (888) 631-1410 (USA) or +61 3 9596 7186 (Australia & New Zealand) or +91 997 256 7970 (India)
E : [email protected] W : www.rittmanmead.com
The Oracle Reference DW Architecture c. 2013
Centred on the Oracle RDBMS and structured data All incoming data is modelled at load point, schemas assigned, stored in RDBMS layers BI metadata layer and ability to federate at BI query stage Data storage capacity limited by RDBMS
-
T : +44 (0) 1273 911 268 (UK) or (888) 631-1410 (USA) or +61 3 9596 7186 (Australia & New Zealand) or +91 997 256 7970 (India)
E : [email protected] W : www.rittmanmead.com
and now this happened
-
T : +44 (0) 1273 911 268 (UK) or (888) 631-1410 (USA) or +61 3 9596 7186 (Australia & New Zealand) or +91 997 256 7970 (India)
E : [email protected] W : www.rittmanmead.com
-
T : +44 (0) 1273 911 268 (UK) or (888) 631-1410 (USA) or +61 3 9596 7186 (Australia & New Zealand) or +91 997 256 7970 (India)
E : [email protected] W : www.rittmanmead.com
Todays Oracle Information Management ArchitectureActionable
Events
Event Engine Data Reservoir
Data Factory Enterprise Information Store
Reporting
Discovery Lab
Actionable Information
ActionableInsights
Input Events
Execution
Innovation
Discovery Output
Events & Data
Structured Enterprise Data
Other Data
-
T : +44 (0) 1273 911 268 (UK) or (888) 631-1410 (USA) or +61 3 9596 7186 (Australia & New Zealand) or +91 997 256 7970 (India)
E : [email protected] W : www.rittmanmead.com
Todays Layered Data Warehouse Architecture
Virtu
aliz
atio
n &
Q
uery
Fed
erat
ion
Enterprise Performance Management
Pre-built & Ad-hoc BI Assets
Information Services
Data Ingestion
Information Interpretation
Access & Performance Layer
Foundation Data Layer
Raw Data Reservoir
Data Science
Data Engines & Poly-structured sources
Content
Docs Web & Social Media
SMS
Structured Data Sources
Operational Data COTS Data Master & Ref. Data Streaming & BAM
Immutable raw data reservoir Raw data at rest is not interpreted
Immutable modelled data. Business Process Neutral form. Abstracted from business process changes
Past, current and future interpretation of enterprise data. Structured to support agile access & navigation
Discovery Lab Sandboxes Rapid Development Sandboxes
Project based data stores to support specific discovery objectives
Project based data stored to facilitate rapid content / presentation delivery
Data Sources
-
T : +44 (0) 1273 911 268 (UK) or (888) 631-1410 (USA) or +61 3 9596 7186 (Australia & New Zealand) or +91 997 256 7970 (India)
E : [email protected] W : www.rittmanmead.com
The Oracle Data Warehousing Platform - 2014
-
T : +44 (0) 1273 911 268 (UK) or (888) 631-1410 (USA) or +61 3 9596 7186 (Australia & New Zealand) or +91 997 256 7970 (India)
E : [email protected] W : www.rittmanmead.com
Combining Oracle RDBMS with Hadoop + NoSQL
High-value, high-density data goes into Oracle RDBMS Better support for fast queries, summaries, referential integrity etc
Lower-value, lower-density data goes into Hadoop + NoSQL Also provides flexible schema, more agile development
Successful next-generation BI+DW projects combine both - neither on their own is sufficient
-
T : +44 (0) 1273 911 268 (UK) or (888) 631-1410 (USA) or +61 3 9596 7186 (Australia & New Zealand) or +91 997 256 7970 (India)
E : [email protected] W : www.rittmanmead.com
Oracles Big Data Products
Oracle Big Data Appliance Optimized hardware for Hadoop processing Cloudera Distribution incl. Hadoop Oracle Big Data Connectors, ODI etc
Oracle Big Data Connectors Oracle Big Data SQL Oracle NoSQL Database Oracle Data Integrator Oracle R Distribution OBIEE, BI Publisher and Endeca Info Discovery
-
T : +44 (0) 1273 911 268 (UK) or (888) 631-1410 (USA) or +61 3 9596 7186 (Australia & New Zealand) or +91 997 256 7970 (India)
E : [email protected] W : www.rittmanmead.com
Oracle Big Data SQL
Part of Oracle Big Data 4.0 (BDA-only) Also requires Oracle Database 12c, Oracle Exadata Database Machine
Extends Oracle Data Dictionary to cover Hive Extends Oracle SQL and SmartScan to Hadoop Extends Oracle Security Model over Hadoop Fine-grained access control Data redaction, data masking
Uses fast c-based readers where possible(vs. Hive MapReduce generation) Map Hadoop parallelism to Oracle PQ Big Data SQL engine works on top of YARN Like Spark, Tez, MR2
Exadata Storage Servers
HadoopCluster
Exadata DatabaseServer
Oracle Big Data SQL
SQL Queries
SmartScan SmartScan
-
T : +44 (0) 1273 911 268 (UK) or (888) 631-1410 (USA) or +61 3 9596 7186 (Australia & New Zealand) or +91 997 256 7970 (India)
E : [email protected] W : www.rittmanmead.com
Productising the Next-Generation IM Architecture
-
T : +44 (0) 1273 911 268 (UK) or (888) 631-1410 (USA) or +61 3 9596 7186 (Australia & New Zealand) or +91 997 256 7970 (India)
E : [email protected] W : www.rittmanmead.com
Still a Key Role for Data Integration, and BI ToolsFast, scaleable low-cost / flexible-schema data capture using Hadoop + NoSQL (BDA) Long-term storage of the most important downstream data - Oracle RBDMS (Exadata) Fast analysis + business-friendly interface : OBIEE, Endeca (Exalytics), RTD etc
-
T : +44 (0) 1273 911 268 (UK) or (888) 631-1410 (USA) or +61 3 9596 7186 (Australia & New Zealand) or +91 997 256 7970 (India)
E : [email protected] W : www.rittmanmead.com
OBIEE for Enterprise Analysis Across all Data Sources
Dashboards, analyses, OLAP analytics, scorecards, published reporting, mobile
Presented as an integrated business semantic model Optional mid-tier query acceleration using Oracle Exalytics In-Memory Machine
Access data from RBDMS, applications, Hadoop, OLAP, ADF BCs etc
Enterprise SemanticBusiness Model
Business PresentationLayer (Reports, Dashboards)
In-Memory Caching Layer
ApplicationSources
Hadoop /NoSQL Sources
DW / OLAP Sources
-
T : +44 (0) 1273 911 268 (UK) or (888) 631-1410 (USA) or +61 3 9596 7186 (Australia & New Zealand) or +91 997 256 7970 (India)
E : [email protected] W : www.rittmanmead.com
Bringing it All Together : Oracle Data Integrator 12c
ODI provides an excellent framework for running Hadoop ETL jobs ELT approach pushes transformations down to Hadoop - leveraging power of cluster
Hive, HBase, Sqoop and OLH/ODCH KMs provide native Hadoop loading / transformation Whilst still preserving RDBMS push-down Extensible to cover Pig, Spark etc
Process orchestration Data quality / error handling Metadata and model-driven
-
T : +44 (0) 1273 911 268 (UK) or (888) 631-1410 (USA) or +61 3 9596 7186 (Australia & New Zealand) or +91 997 256 7970 (India)
E : [email protected] W : www.rittmanmead.com
Typical RM Project BDA Topology
Starter BDA rack, or full rack Kerberos-secured usingincluded KDC server
Integration with corporate LDAPfor Cloudera Manager, Hue etc
Developer access through Hue,Beeline, R Studio
End-user access throughOBIEE, Endeca and other tools With final datasets usuallyexported to Exadata or Exalytics
-
T : +44 (0) 1273 911 268 (UK) or (888) 631-1410 (USA) or +61 3 9596 7186 (Australia & New Zealand) or +91 997 256 7970 (India)
E : [email protected] W : www.rittmanmead.com
Working with Oracle Big Data Appliance
Dont underestimate the value of pre-integrated - massive time-saver for client No need to integrate Big Data Connectors, ODI Agent etc with HDFS, Hive etc etc
Single support route - raise SR with Oracle, they will route to Cloudera if needed Single patch process for whole cluster - OS, CDH etc etc Full access to Cloudera Enterprise features Otherwise just another CDH cluster in terms of SSH access etc We like it ;-)
-
T : +44 (0) 1273 911 268 (UK) or (888) 631-1410 (USA) or +61 3 9596 7186 (Australia & New Zealand) or +91 997 256 7970 (India)
E : [email protected] W : www.rittmanmead.com
Oracle Big Data Connectors
Oracle-licensed utilities to connect Hadoop to Oracle RBDMS Bulk-extract data from Hadoop to Oracle, or expose HDFS / Hive data as external tables Run R analysis and processing on Hadoop
Leverage Hadoop compute resources to offload ETL and other work from Oracle RBDMS Enable Oracle SQL to access and load Hadoop data
-
T : +44 (0) 1273 911 268 (UK) or (888) 631-1410 (USA) or +61 3 9596 7186 (Australia & New Zealand) or +91 997 256 7970 (India)
E : [email protected] W : www.rittmanmead.com
Working with the Oracle Big Data Connectors
Oracle Loader for Hadoop, Oracle SQL Connector for HDFS - rarely used Sqoop works both way (Oracle>Hadoop, Hadoop>Oracle) and is good enough OSCH replaced by Oracle Big Data SQL for direct Oracle>Hive access
Oracle R Advanced Analytics for Hadoop has been very useful though Run MapReduce jobs from R Run R functions across Hive tables
-
T : +44 (0) 1273 911 268 (UK) or (888) 631-1410 (USA) or +61 3 9596 7186 (Australia & New Zealand) or +91 997 256 7970 (India)
E : [email protected] W : www.rittmanmead.com
Oracle R Advanced Analytics for Hadoop Key Features
Run R functions on Hive Dataframes Write MapReduce functions in R
-
T : +44 (0) 1273 911 268 (UK) or (888) 631-1410 (USA) or +61 3 9596 7186 (Australia & New Zealand) or +91 997 256 7970 (India)
E : [email protected] W : www.rittmanmead.com
Data Loading into Hadoop
Default load type is real-time, streaming loads, using Apache Flume, Apache Kafka etc Batch / bulk loads only typically used to seed system
Variety of sources including web log activity, event streams Target is typically HDFS (Hive) or HBase Data typically lands in raw state Lots of files and events, need to be filtered/aggregated Typically semi-structured (JSON, logs etc) High volume, high velocity
-Which is why we use Hadoop rather thanRBDMS (speed vs. ACID trade-off)
Economics of Hadoop means its often possible toarchive all incoming data at detail level
Loading Stage
Real-Time Logs / Events
File / UnstructuredImports
-
T : +44 (0) 1273 911 268 (UK) or (888) 631-1410 (USA) or +61 3 9596 7186 (Australia & New Zealand) or +91 997 256 7970 (India)
E : [email protected] W : www.rittmanmead.com
GoldenGate for Continuous Streaming to Hadoop
Oracle GoldenGate is also an option, for streaming RDBMS transactions to Hadoop Leverages GoldenGate & HDFS / Hive Java APIs Sample Implementations on MOS Doc.ID 1586210.1 (HDFS) and 1586188.1 (Hive) Likely to be formal part of GoldenGate in future release - but usable now Can also integrate with Flume for delivery to HDFS - see MOS Doc.ID 1926867.1
-
T : +44 (0) 1273 911 268 (UK) or (888) 631-1410 (USA) or +61 3 9596 7186 (Australia & New Zealand) or +91 997 256 7970 (India)
E : [email protected] W : www.rittmanmead.com
ODI on Hadoop - Big Data Projects Discover ETL Tools
ODI provides an excellent framework for running Hadoop ETL jobs ELT approach pushes transformations down to Hadoop - leveraging power of cluster
Hive, HBase, Sqoop and OLH/ODCH KMs provide native Hadoop loading / transformation Whilst still preserving RDBMS push-down Extensible to cover Pig, Spark etc
Process orchestration Data quality / error handling Metadata and model-driven
-
T : +44 (0) 1273 911 268 (UK) or (888) 631-1410 (USA) or +61 3 9596 7186 (Australia & New Zealand) or +91 997 256 7970 (India)
E : [email protected] W : www.rittmanmead.com
ODI on Hadoop - How Well Does It Work?
Very good for set-based processing of Hadoop data (HiveQL) Can run python, R etc scripts as procedures
Brings metadata and team-based ETL development to Hadoop Process orchestration, error-handling etc Rapid innovation from the ODI Product Dev team - Spark KMs etc coming soon But requires Hadoop devs to learn ODI, or add ODI developer to the project
-
T : +44 (0) 1273 911 268 (UK) or (888) 631-1410 (USA) or +61 3 9596 7186 (Australia & New Zealand) or +91 997 256 7970 (India)
E : [email protected] W : www.rittmanmead.com
Oracle Big Data SQL and ODI12c
Hive, and MapReduce, are well suited to batch-type ETL jobs, but Not all join types are available in Hive - joins must be equality joins Any data from external Oracle RDBMS sources has to be staged in Hadoop before joining Limited set of HiveQL functions vs. Oracle SQL Oracle-based mappings have to importHive data into DB before accessing it
-
T : +44 (0) 1273 911 268 (UK) or (888) 631-1410 (USA) or +61 3 9596 7186 (Australia & New Zealand) or +91 997 256 7970 (India)
E : [email protected] W : www.rittmanmead.com
Adding Big Data SQL to ODI Hadoop Mappings
Makes it easy to map Hadoop (Hive) data into Oracle-based mappings Gives Hive mappings the ability to use Oracle SQL including range of joins
-
T : +44 (0) 1273 911 268 (UK) or (888) 631-1410 (USA) or +61 3 9596 7186 (Australia & New Zealand) or +91 997 256 7970 (India)
E : [email protected] W : www.rittmanmead.com
Oracle Big Data SQL and Data Integration
Gives us the ability to easily bring in Hadoop (Hive) data into Oracle-based mappings Allows us to create Hive-based mappings that use Oracle SQL for transforms, joins Faster access to Hive data for real-time ETL scenarios Through Hive, bring NoSQL and semi-structured data access to Oracle ETL projects For our scenario - join weblog + customer data in Oracle RDBMS, no need to stage in Hive
-
T : +44 (0) 1273 911 268 (UK) or (888) 631-1410 (USA) or +61 3 9596 7186 (Australia & New Zealand) or +91 997 256 7970 (India)
E : [email protected] W : www.rittmanmead.com
Custom KM : LKM Hive to Oracle (Big Data SQL)
ODI12c Big Data SQL example on BigDataLite VM uses a custom KM for Big Data SQL LKM Hive to Oracle (Big Data SQL) - KM code downloadable from java.net Allows Hive+Oracle joins by auto-creating ORACLE_HIVE extttab definition to enable Big Data SQL Hive table access
-
T : +44 (0) 1273 911 268 (UK) or (888) 631-1410 (USA) or +61 3 9596 7186 (Australia & New Zealand) or +91 997 256 7970 (India)
E : [email protected] W : www.rittmanmead.com
Options for Sharing Hadoop Output with Wider Audience
During the discovery phase of a Hadoop project, audience are likely technical Most comfortable with data analyst tools, command-line, low-level access to the data
During the exploitation phase, audience will be less technical Emphasis on graphical tools, and integration with wider reporting toolset + metadata
Three main options for visualising and sharing Hadoop data 1.Coming Soon - Oracle Big Data Discovery (Endeca on Hadoop) 2.OBIEE reporting against Hadoop
direct using Hive/Impala, or Oracle Big Data SQL 3.OBIEE reporting against an export of the
Hadoop data, on Exalytics / RDBMS
-
T : +44 (0) 1273 911 268 (UK) or (888) 631-1410 (USA) or +61 3 9596 7186 (Australia & New Zealand) or +91 997 256 7970 (India)
E : [email protected] W : www.rittmanmead.com
Coming Soon - Oracle Big Data Discovery
-
T : +44 (0) 1273 911 268 (UK) or (888) 631-1410 (USA) or +61 3 9596 7186 (Australia & New Zealand) or +91 997 256 7970 (India)
E : [email protected] W : www.rittmanmead.com
Coming Soon : Oracle Data Enrichment Cloud Service
Cloud-based service for loading, enriching, cleansing and supplementing Hadoop data Part of the Oracle Data Integration product family Used up-stream from Big Data Discovery Aims to solve the data quality problem for Hadoop
-
T : +44 (0) 1273 911 268 (UK) or (888) 631-1410 (USA) or +61 3 9596 7186 (Australia & New Zealand) or +91 997 256 7970 (India)
E : [email protected] W : www.rittmanmead.com
New in OBIEE 11.1.1.7 : Hadoop Connectivity through Hive
MapReduce jobs are typically written in Java, but Hive can make this simpler Hive is a query environment over Hadoop/MapReduce to support SQL-like queries Hive server accepts HiveQL queries via HiveODBC or HiveJDBC, automaticallycreates MapReduce jobs against data previously loaded into the Hive HDFS tables
Approach used by ODI and OBIEE to gain access to Hadoop data Allows Hadoop data to be accessed just like any other data source
-
T : +44 (0) 1273 911 268 (UK) or (888) 631-1410 (USA) or +61 3 9596 7186 (Australia & New Zealand) or +91 997 256 7970 (India)
E : [email protected] W : www.rittmanmead.com
OBIEE 11.1.1.7 / HiveServer2 ODBC Driver Issue
Most customers using BDAs are using CDH4 or CDH5 - which uses HiveServer2 OBIEE 11.1.1.7 only ships/supports HiveServer1 ODBC drivers But OBIEE 11.1.1.7 on Windows can use the Cloudera HiveServer2 ODBC drivers which isnt supported by Oracle but works!
-
T : +44 (0) 1273 911 268 (UK) or (888) 631-1410 (USA) or +61 3 9596 7186 (Australia & New Zealand) or +91 997 256 7970 (India)
E : [email protected] W : www.rittmanmead.com
Using Big Data SQL with OBIEE11g
Preferred Hadoop access path for customers with Oracle Big Data Appliance is Big Data SQL Oracle SQL Access to both relational, and Hive/NoSQL data sources Exadata-type SmartScan against Hadoop datasets
Response-time equivalent to Impala or Hive on Tez No issues around HiveQL limitations Insulates end-users around differencesbetween Oracle and Hive datasets
-
T : +44 (0) 1273 911 268 (UK) or (888) 631-1410 (USA) or +61 3 9596 7186 (Australia & New Zealand) or +91 997 256 7970 (India)
E : [email protected] W : www.rittmanmead.com
Finally What Keeps the CIO Awake at Night
Security and Privacy Regulations Are we analysing and sharing data in compliance with privacy regulations?
-And if we are - would customers think our use of it is ethical? Do I know if the data in my Hadoop cluster is *really* secure?
-
T : +44 (0) 1273 911 268 (UK) or (888) 631-1410 (USA) or +61 3 9596 7186 (Australia & New Zealand) or +91 997 256 7970 (India)
E : [email protected] W : www.rittmanmead.com
Hadoop Security By Default
Connections between Hadoop services, and by users to services, arent authenticated Security is fragmented : HDFS, Hive, OS user accounts, Hue, CM all separate models No single place to define security policies, groups, access rights No single tool to audit access and permissions By default, everything is open and trusted - reflects roots in academia, R&D, marketing depts
-
T : +44 (0) 1273 911 268 (UK) or (888) 631-1410 (USA) or +61 3 9596 7186 (Australia & New Zealand) or +91 997 256 7970 (India)
E : [email protected] W : www.rittmanmead.com
Oracle Big Data SQL : Single RBDMS/Hadoop Security Model
Potential to extend Oracle security model over Hadoop (Hive) data Masking / Redaction VPD FGAC
-
T : +44 (0) 1273 911 268 (UK) or (888) 631-1410 (USA) or +61 3 9596 7186 (Australia & New Zealand) or +91 997 256 7970 (India)
E : [email protected] W : www.rittmanmead.com
Summary
Hadoop and Oracle Big Data Appliance are increasingly appearing in BI+DW Projects Gives DW projects the ability to store more data, cheaper and more flexibly than before Enables non-relational (SQL) query tools and analysis techniques (R, Spark etc) Extends BIs capability to report and analyze across wider data sources Maturity varies widely in terms of tool maturity, and Oracle integration with Hadoop Trend is for Oracle to productize big data, creating tools + products around Oracle BDA We are probably at early stages - but very interesting times to be an Oracle BI+DW dev!
-
T : +44 (0) 1273 911 268 (UK) or (888) 631-1410 (USA) or +61 3 9596 7186 (Australia & New Zealand) or +91 997 256 7970 (India)
E : [email protected] W : www.rittmanmead.com
T : +44 (0) 1273 911 268 (UK) or (888) 631-1410 (USA) or +61 3 9596 7186 (Australia & New Zealand) or +91 997 256 7970 (India)
E : [email protected] W : www.rittmanmead.com
Thank You for Attending!
Thank you for attending this presentation, and more information can be found at http://www.rittmanmead.com
Contact us at [email protected] or [email protected] Look out for our book, Oracle Business Intelligence Developers Guide out now! Follow-us on Twitter (@rittmanmead) or Facebook (facebook.com/rittmanmead)
-
T : +44 (0) 1273 911 268 (UK) or (888) 631-1410 (USA) or +61 3 9596 7186 (Australia & New Zealand) or +91 997 256 7970 (India)
E : [email protected] W : www.rittmanmead.com
End-to-End Hadoop Development using OBIEE, ODI and Oracle Big Data Mark Rittman, CTO, Rittman Mead BIWA Summit, Jan 2015, San Francisco