US Office: 1355 Market Street, #488, San Francisco, CA 94103
German Office: Katharinenstr. 15, 04109 Leipzig, Germany
Beyond the Data Lake: Simplifying data integration for the modern age
Matthias Korn | Technical [email protected]
The Challenge
Gartner 2014: “VARIETY is the biggest challenge.”
“When asked about the dimensions of data organizations struggle with most, 49% answered variety, while 35% answered volume and 16% velocity.”
Integration using the Data Warehouse
Data is integrated by copying it into a central repository
Approach: ETL process
Structure is applied in the repository
BI users query Data Marts
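The classic ETL flow can be sketched in a few lines of Python, using the standard-library sqlite3 module as a stand-in for the central repository. This is a minimal illustration under stated assumptions: the source systems, table names and the mart_revenue_by_country view are hypothetical, and a real data warehouse would use dedicated ETL tooling. It shows the order of operations: structure (types, cleaned names) is applied before the data is copied in, and BI users then query a pre-built Data Mart.

```python
import sqlite3

# Hypothetical source systems, modelled as plain Python lists of raw records.
crm_customers = [("C1", "alice", "DE"), ("C2", "bob", "US")]
shop_orders = [("O1", "C1", "20.00"), ("O2", "C1", "5.50"), ("O3", "C2", "42.00")]

def extract():
    """Extract: read raw records from each source system."""
    return crm_customers, shop_orders

def transform(customers, orders):
    """Transform: apply structure (types, normalised names) BEFORE loading."""
    clean_customers = [(cid, name.title(), country) for cid, name, country in customers]
    clean_orders = [(oid, cid, float(amount)) for oid, cid, amount in orders]
    return clean_customers, clean_orders

def load(conn, customers, orders):
    """Load: copy the transformed rows into the central repository."""
    conn.execute("CREATE TABLE customers (id TEXT PRIMARY KEY, name TEXT, country TEXT)")
    conn.execute("CREATE TABLE orders (id TEXT PRIMARY KEY, customer_id TEXT, amount REAL)")
    conn.executemany("INSERT INTO customers VALUES (?, ?, ?)", customers)
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", orders)
    # A (hypothetical) Data Mart: a pre-aggregated view that BI users query.
    conn.execute(
        """CREATE VIEW mart_revenue_by_country AS
           SELECT c.country, SUM(o.amount) AS revenue
           FROM orders o JOIN customers c ON c.id = o.customer_id
           GROUP BY c.country"""
    )

conn = sqlite3.connect(":memory:")
load(conn, *transform(*extract()))
rows = conn.execute(
    "SELECT country, revenue FROM mart_revenue_by_country ORDER BY country"
).fetchall()
print(rows)  # [('DE', 25.5), ('US', 42.0)]
```

Any new question that the mart does not already answer means changing the transform step and reloading, which is where the inflexibility discussed next comes from.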
Why do so many DWH projects fail?
Inflexible; costly modifications
Labour-intensive setup and maintenance
77% failure rate*
Slow data-to-actionable-insights (6 to 9+ months)
Data Lake – getting data in is pretty easy…
Databases
Web API
Sensor Data
Server logs
Clickstream Data
Unique identifier provided
Metadata tags provided
Original data structure
…but making sense of it is the challenge
Business User: ?
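The ingestion side of a data lake described above (each object gets a unique identifier and metadata tags, while the original data structure is left untouched) can be sketched as follows. The DataLake and LakeObject classes, the tag names and the use of uuid4 identifiers are assumptions made for illustration; they do not describe any particular product. The last lines show the slide's point: finding objects by tag is easy, but interpreting the payloads still requires knowing each source's format.

```python
import json
import uuid
from dataclasses import dataclass, field

@dataclass
class LakeObject:
    """One raw object in a hypothetical data lake."""
    object_id: str  # unique identifier provided at ingestion
    tags: dict      # metadata tags provided at ingestion
    payload: bytes  # original data, structure left untouched

@dataclass
class DataLake:
    objects: dict = field(default_factory=dict)

    def ingest(self, payload: bytes, **tags) -> str:
        """Getting data in: store the bytes as-is under a fresh UUID."""
        object_id = str(uuid.uuid4())
        self.objects[object_id] = LakeObject(object_id, dict(tags), payload)
        return object_id

    def find(self, **tags):
        """Look up objects by tag; the payload itself is still uninterpreted."""
        return [o for o in self.objects.values()
                if all(o.tags.get(k) == v for k, v in tags.items())]

lake = DataLake()
lake.ingest(b'{"sensor": 7, "temp": 21.5}', source="sensor", fmt="json")
lake.ingest(b"GET /index.html 200", source="server_log", fmt="text")
lake.ingest(b'{"page": "/pricing", "user": "u42"}', source="clickstream", fmt="json")

# Making sense of it: the consumer must know how to parse each format.
parsed = [json.loads(o.payload) for o in lake.find(fmt="json")]
print(len(lake.objects), len(parsed))  # 3 2
```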
Approaches to data fishing
Situation improved with YARN
Apache Mahout, HBase, Hive, Pig and MapReduce
Data Marts are created
BI users’ report tools query Data Marts
Wait, didn’t they do this before already?
“Transform” just changed its position: ETL -> ELT
Data Marts have to be created by Data Scientists
BI users can’t run new kinds of analysis on their own
No permission concept
A lot of the stored data is never used, eating up the savings from low storage costs
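The ETL -> ELT shift can be shown with the same toy data as a minimal sqlite3 sketch. Table names are hypothetical and sqlite3 merely stands in for a Hadoop-style store. Raw records are loaded untyped, exactly as received, and the transformation (here a CAST plus aggregation) runs later inside the store, when someone builds a Data Mart. The transform step still exists; only its position has moved.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Extract + Load: raw records are copied into the store untyped, as received.
raw_orders = [("O1", "C1", "20.00"), ("O2", "C1", "5.50"), ("O3", "C2", "42.00")]
conn.execute("CREATE TABLE raw_orders (id TEXT, customer_id TEXT, amount TEXT)")
conn.executemany("INSERT INTO raw_orders VALUES (?, ?, ?)", raw_orders)

# Transform: happens afterwards, inside the store, when a Data Mart is built.
# Someone (per the slide, usually a Data Scientist) still has to write this step.
conn.execute(
    """CREATE TABLE mart_revenue_by_customer AS
       SELECT customer_id, SUM(CAST(amount AS REAL)) AS revenue
       FROM raw_orders
       GROUP BY customer_id"""
)
rows = conn.execute(
    "SELECT customer_id, revenue FROM mart_revenue_by_customer ORDER BY customer_id"
).fetchall()
print(rows)  # [('C1', 25.5), ('C2', 42.0)]
```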
The Logical Data Warehouse
Introduced by Gartner in 2012
New data management architecture for analytics
Uses repositories, just like the EDW
Adds distributed processes
Adds virtualization of data sources
Logical Data Warehouse (LDW)
What does the Logical Data Warehouse do?
The LDW knows where the data is stored instead of copying it
Repositories are used for data sources that are too slow
Presents all data in a single virtual database
Quickly reacts to changes in the data models of source systems
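SQLite's ATTACH DATABASE offers a small, runnable analogy for "knowing where the data is stored instead of copying it": one connection addresses tables that live in separate database files and joins them on demand. This is only an illustration of the idea, not how an LDW product is implemented; the file names, tables and the revenue_by_country view are hypothetical. Because nothing is copied, a change in a source system is visible on the very next query.

```python
import os
import sqlite3
import tempfile

def create_source(path, ddl, insert_sql, rows):
    """Create a hypothetical source system as its own SQLite file."""
    conn = sqlite3.connect(path)
    with conn:  # commits on success, rolls back on error
        conn.execute(ddl)
        conn.executemany(insert_sql, rows)
    conn.close()

tmp_dir = tempfile.mkdtemp()
crm_path = os.path.join(tmp_dir, "crm.db")
shop_path = os.path.join(tmp_dir, "shop.db")
create_source(crm_path, "CREATE TABLE customers (id TEXT, country TEXT)",
              "INSERT INTO customers VALUES (?, ?)", [("C1", "DE"), ("C2", "US")])
create_source(shop_path, "CREATE TABLE orders (id TEXT, customer_id TEXT, amount REAL)",
              "INSERT INTO orders VALUES (?, ?, ?)",
              [("O1", "C1", 20.0), ("O2", "C1", 5.5), ("O3", "C2", 42.0)])

# The "virtual database": one connection that knows where each table lives.
ldw = sqlite3.connect(":memory:")
ldw.execute("ATTACH DATABASE ? AS crm", (crm_path,))
ldw.execute("ATTACH DATABASE ? AS shop", (shop_path,))
# TEMP views are allowed to reference tables in attached databases.
ldw.execute(
    """CREATE TEMP VIEW revenue_by_country AS
       SELECT c.country, SUM(o.amount) AS revenue
       FROM shop.orders AS o JOIN crm.customers AS c ON c.id = o.customer_id
       GROUP BY c.country"""
)

def revenue():
    return ldw.execute(
        "SELECT country, revenue FROM revenue_by_country ORDER BY country"
    ).fetchall()

before = revenue()

# A source system changes; no reload is needed for the virtual view to see it.
shop = sqlite3.connect(shop_path)
with shop:
    shop.execute("INSERT INTO orders VALUES ('O4', 'C2', 8.0)")
shop.close()
after = revenue()
print(before)  # [('DE', 25.5), ('US', 42.0)]
print(after)   # [('DE', 25.5), ('US', 50.0)]
```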
Advantages of the Logical Data Warehouse
Real-time data available and ready for analysis
Immediately productive
Logical Data Model
Permission concept
Web services
Write to connected systems
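Two of the listed advantages, a Logical Data Model and a permission concept, fit together naturally: business-level views are defined over the physical tables, and access is granted per view rather than per raw table. The sketch below assumes a simple role-to-view allow-list; the role names, the PERMISSIONS mapping and the query_view helper are hypothetical and far simpler than a real permission system.

```python
import sqlite3

# Hypothetical permission concept: which role may read which logical views.
PERMISSIONS = {
    "analyst": {"revenue_by_country"},
    "admin": {"revenue_by_country", "customers_pii"},
}

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customers (id TEXT, name TEXT, email TEXT, country TEXT);
    CREATE TABLE orders (id TEXT, customer_id TEXT, amount REAL);
    INSERT INTO customers VALUES ('C1', 'Alice', 'alice@example.com', 'DE'),
                                 ('C2', 'Bob', 'bob@example.com', 'US');
    INSERT INTO orders VALUES ('O1', 'C1', 20.0), ('O2', 'C1', 5.5), ('O3', 'C2', 42.0);
    -- Logical data model: business-level views over the physical tables.
    CREATE VIEW revenue_by_country AS
        SELECT c.country, SUM(o.amount) AS revenue
        FROM orders o JOIN customers c ON c.id = o.customer_id
        GROUP BY c.country;
    CREATE VIEW customers_pii AS SELECT id, name, email FROM customers;
    """
)

def query_view(role: str, view: str):
    """Return all rows of a view, but only if the role may read it."""
    if view not in PERMISSIONS.get(role, set()):
        raise PermissionError(f"role {role!r} may not read {view!r}")
    # Safe to interpolate: the view name was checked against the allow-list above.
    return conn.execute(f"SELECT * FROM {view} ORDER BY 1").fetchall()

print(query_view("analyst", "revenue_by_country"))  # [('DE', 25.5), ('US', 42.0)]
try:
    query_view("analyst", "customers_pii")
except PermissionError as exc:
    print(exc)  # role 'analyst' may not read 'customers_pii'
```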
Example data flow in an LDW
Distributed query
BI frontend is aware of all data sources and creates the SQL statement
Performance optimization engine replicates data only if needed
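The "replicate only if needed" idea can be sketched as a toy planner. A distributed query fetches each source separately and joins the results; a source whose response time exceeds a threshold is copied locally once and served from that replica afterwards. The latency threshold rule, the Source and Planner classes and all numbers are assumptions chosen for illustration, not the actual optimization engine, and the join is done in Python rather than generated SQL to keep the example short.

```python
from dataclasses import dataclass, field

@dataclass
class Source:
    """A hypothetical connected source system."""
    name: str
    avg_latency_ms: float  # assumed measured response time of the live source
    rows: list             # stand-in for what a live query would return

@dataclass
class Planner:
    """Toy optimizer: query sources live, replicate only those that are too slow."""
    max_live_latency_ms: float = 500.0  # assumed threshold, not from the slides
    replicas: dict = field(default_factory=dict)

    def fetch(self, source: Source):
        if source.avg_latency_ms <= self.max_live_latency_ms:
            return "live", source.rows
        if source.name not in self.replicas:
            # First access to a slow source: copy its data locally, once.
            self.replicas[source.name] = list(source.rows)
        return "replica", self.replicas[source.name]

def revenue_by_country(planner, customers_src, orders_src):
    """Distributed query: fetch each source separately, then join and aggregate."""
    mode_c, customers = planner.fetch(customers_src)
    mode_o, orders = planner.fetch(orders_src)
    country_of = dict(customers)  # customer_id -> country
    totals = {}
    for _order_id, customer_id, amount in orders:
        country = country_of[customer_id]
        totals[country] = totals.get(country, 0.0) + amount
    return {"customers": mode_c, "orders": mode_o}, dict(sorted(totals.items()))

crm = Source("crm", avg_latency_ms=40.0, rows=[("C1", "DE"), ("C2", "US")])
archive = Source("order_archive", avg_latency_ms=2000.0,
                 rows=[("O1", "C1", 20.0), ("O2", "C1", 5.5), ("O3", "C2", 42.0)])

planner = Planner()
modes, totals = revenue_by_country(planner, crm, archive)
print(modes)   # {'customers': 'live', 'orders': 'replica'}
print(totals)  # {'DE': 25.5, 'US': 42.0}
```

Only the slow archive ends up replicated; the fast CRM source is always read live, so its data stays current.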
Conclusion
The Logical Data Warehouse holds enormous promise
Flexibility and real-time access give an advantage
Use Hadoop for batch jobs rather than for integration
We dataconomy!
DataVirtuality
Thanks for your attention!
Visit our stand in the exhibition [email protected]