"Beyond the Data Lake", Matthias Korn, Technical Consultant at datavirtuality


Upload: dataconomy-media

Post on 16-Apr-2017


Page 1: "Beyond the Data Lake", Matthias Korn, Technical Consultant at datavirtuality

US Office: 1355 Market Street, #488, San Francisco, CA 94103

German Office: Katharinenstr. 15, 04109 Leipzig, Germany

Beyond the Data Lake

Simplifying data integration for the modern age

Matthias Korn | Technical Consultant
[email protected]

Page 2

The Challenge

Gartner 2014: “VARIETY is the biggest challenge.”

“When asked about the dimensions of data organizations struggle with most, 49% answered variety, while 35% answered volume and 16% velocity.”

Page 3

Integration using the Data Warehouse

Data is integrated by copying it into a central repository

Approach: ETL process

Structure is applied in the repository

BI users query Data Marts
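The ETL flow on this slide can be sketched roughly as follows. This is only a minimal illustration, assuming an in-memory SQLite database as the central repository; the table name `sales`, the view `mart_revenue` and the sample records are hypothetical and not taken from the talk.

```python
import sqlite3

def extract(source_rows):
    """Extract: read raw records from a source system (here a plain list)."""
    return list(source_rows)

def transform(rows):
    """Transform: clean the records and shape them to the warehouse schema."""
    return [
        (r["id"], r["customer"].strip().title(), round(float(r["amount"]), 2))
        for r in rows
    ]

def load(conn, rows):
    """Load: copy the structured rows into the central repository."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales "
        "(id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    conn.commit()

# Hypothetical source data with inconsistent formatting.
source = [
    {"id": 1, "customer": " alice ", "amount": "19.990"},
    {"id": 2, "customer": "BOB", "amount": "5"},
]

warehouse = sqlite3.connect(":memory:")
load(warehouse, transform(extract(source)))

# A Data Mart is modelled here as a view that BI users query.
warehouse.execute(
    "CREATE VIEW mart_revenue AS "
    "SELECT customer, SUM(amount) AS revenue FROM sales GROUP BY customer"
)
mart = warehouse.execute(
    "SELECT customer, revenue FROM mart_revenue ORDER BY customer"
).fetchall()
print(mart)
```

The point of the sketch is the ordering: structure is imposed before the data reaches the repository, so any change to the schema means changing `transform` and reloading.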

Page 4

Why do so many DWH projects fail?

Inflexible; costly modifications

Labour-intensive setup and maintenance

77% failure rate*

Slow data-to-actionable-insights (6 to 9+ months)

Page 5

Data Lake – getting data in is pretty easy…

Databases

Web API

Sensor Data

Server logs

Clickstream Data

Unique identifier provided

Metadata tags provided

Original data structure
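The ingest side of a data lake, as described on this slide, might look roughly like the sketch below: each object is stored in its original structure and receives a unique identifier plus metadata tags. The `lake` dictionary is a hypothetical stand-in for an object store, and the function name `ingest` and the tag values are assumptions made for illustration.

```python
import json
import uuid
from datetime import datetime, timezone

# Hypothetical stand-in for an object store such as a distributed file system.
lake = {}

def ingest(raw_payload, source, tags):
    """Store a payload unchanged, keyed by a fresh unique identifier."""
    object_id = str(uuid.uuid4())
    lake[object_id] = {
        "payload": raw_payload,  # original data structure, untouched
        "metadata": {
            "source": source,
            "tags": list(tags),
            "ingested_at": datetime.now(timezone.utc).isoformat(),
        },
    }
    return object_id

# Very different sources go in the same way, without any upfront schema.
log_id = ingest('127.0.0.1 - - "GET /index.html" 200', "server_logs", ["web", "raw"])
api_id = ingest(json.dumps({"user": 42, "event": "click"}), "web_api", ["clickstream"])

print(lake[log_id]["metadata"]["tags"])
print(json.loads(lake[api_id]["payload"])["event"])
```

Nothing in `ingest` interprets the payload, which is why getting data in is easy and why the interpretation work is deferred to whoever reads it later.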

Page 6

…but making sense of it is the challenge

Business User

?

Page 7

Approaches to data fishing

Situation improved with YARN

Apache Mahout, HBase, Hive, Pig and MapReduce

Data Marts are created

BI users' reporting tools query Data Marts

Wait, didn't they do this before already?

Page 8

"Transform" just changed its position: ETL -> ELT

Data Marts have to be created by Data Scientists

BI users can't do new things

No permission concept

A lot of the stored data is never used, eating up the savings from low storage costs
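The shift from ETL to ELT can be illustrated with a small sketch: the raw data is loaded first, untyped and unvalidated, and the transform step runs later inside the store to build a Data Mart. SQLite stands in for the lake here purely for illustration; the table names `raw_clicks` and `mart_page_time` and the sample rows are hypothetical.

```python
import sqlite3

lake = sqlite3.connect(":memory:")

# Load: raw records are stored as text, exactly as they arrived.
lake.execute("CREATE TABLE raw_clicks (user_id TEXT, page TEXT, ms TEXT)")
lake.executemany(
    "INSERT INTO raw_clicks VALUES (?, ?, ?)",
    [
        ("1", "/home", "120"),
        ("1", "/cart", "300"),
        ("2", "/home", "80"),
        ("2", "/home", "not-a-number"),  # bad record, still stored
    ],
)

# Transform: done afterwards, by whoever builds the Data Mart.
# Rows whose duration is not purely numeric are filtered out at this stage.
lake.execute(
    """
    CREATE TABLE mart_page_time AS
    SELECT page, COUNT(*) AS visits, AVG(CAST(ms AS INTEGER)) AS avg_ms
    FROM raw_clicks
    WHERE ms <> '' AND ms NOT GLOB '*[^0-9]*'
    GROUP BY page
    """
)

mart = lake.execute(
    "SELECT page, visits, avg_ms FROM mart_page_time ORDER BY page"
).fetchall()
raw_count = lake.execute("SELECT COUNT(*) FROM raw_clicks").fetchone()[0]
print(mart, raw_count)
```

The Data Mart still has to be built by someone who knows the raw layout, and the unusable row stays in storage, which matches the slide's points about Data Scientists and unused data.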

Page 9

The Logical Data Warehouse

Introduced by Gartner in 2012

New data management architecture for analytics

Uses repositories just like the EDW

Adds distributed processes

Adds virtualization of data sources

Page 10

Logical Data Warehouse (LDW)

Page 11

What does the Logical Data Warehouse do?

LDW knows where the data is stored instead of copying it

Repositories are used for data sources that are too slow

Presents all data in a single virtual database

Quickly reacts to changes in data models of source systems
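As a rough illustration of these points, the sketch below models a logical warehouse as a catalog of sources that are queried in place, with a repository copy kept only for sources flagged as slow. The class `LogicalWarehouse`, its methods and the sample sources are hypothetical and are not DataVirtuality's API; a real product would expose SQL rather than Python calls.

```python
class LogicalWarehouse:
    """Toy virtual database: knows where data lives instead of copying it."""

    def __init__(self):
        self.sources = {}     # virtual table name -> function returning rows
        self.slow = set()     # sources considered too slow to query live
        self.repository = {}  # materialized copies of slow sources only

    def register(self, name, fetch, slow=False):
        self.sources[name] = fetch
        if slow:
            self.slow.add(name)

    def table(self, name):
        """Return rows for a virtual table, live unless the source is slow."""
        if name in self.slow:
            if name not in self.repository:
                self.repository[name] = list(self.sources[name]())
            return self.repository[name]
        return list(self.sources[name]())

    def join(self, left, right, key):
        """Join two virtual tables on a shared key, wherever they are stored."""
        index = {}
        for row in self.table(right):
            index.setdefault(row[key], []).append(row)
        return [
            {**l, **r}
            for l in self.table(left)
            for r in index.get(l[key], [])
        ]

# Hypothetical source systems.
customers = [{"customer_id": 1, "name": "Alice"}, {"customer_id": 2, "name": "Bob"}]
calls = {"erp": 0}

def fetch_orders():
    calls["erp"] += 1  # count how often the slow system is actually hit
    return [
        {"customer_id": 1, "total": 50},
        {"customer_id": 1, "total": 20},
        {"customer_id": 2, "total": 5},
    ]

ldw = LogicalWarehouse()
ldw.register("customers", lambda: customers)
ldw.register("orders", fetch_orders, slow=True)

first = ldw.join("customers", "orders", "customer_id")
customers.append({"customer_id": 3, "name": "Carol"})  # change in the source
live = ldw.table("customers")                          # visible immediately
ldw.join("customers", "orders", "customer_id")         # slow source not re-read
print(len(first), len(live), calls["erp"])
```

The trade-off shown here is the usual one: live sources always reflect the latest state, while repository copies are only as fresh as their last load.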

Page 12

Advantages of the Logical Data Warehouse

Real-time data available and ready for analysis

Immediately productive

Logical Data Model

Permission concept

Web services

Write to connected systems

Page 13

Example data flow in an LDW

Distributed query

BI frontend aware of all data sources - creates SQL statement

Performance optimization engine replicates data only if needed
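The slide does not say how the performance optimization engine decides when replication is needed. One plausible heuristic, shown here purely as an assumption, is to replicate a source only after it has been queried several times and its average response time exceeds a threshold. The names `SourceStats` and `should_replicate` and both threshold values are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class SourceStats:
    """Observed response times for one data source, in milliseconds."""
    latencies_ms: list = field(default_factory=list)

    def record(self, ms):
        self.latencies_ms.append(ms)

    @property
    def average_ms(self):
        if not self.latencies_ms:
            return 0.0
        return sum(self.latencies_ms) / len(self.latencies_ms)

def should_replicate(stats, min_queries=3, max_avg_ms=500.0):
    """Replicate only when there is enough evidence that the source is slow."""
    return len(stats.latencies_ms) >= min_queries and stats.average_ms > max_avg_ms

fast_db = SourceStats()
for ms in (20, 30, 25):
    fast_db.record(ms)

slow_api = SourceStats()
for ms in (900, 1200, 800):
    slow_api.record(ms)

new_source = SourceStats()
for ms in (2000, 1500):
    new_source.record(ms)

decisions = {
    "fast_db": should_replicate(fast_db),
    "slow_api": should_replicate(slow_api),
    "new_source": should_replicate(new_source),  # too few samples so far
}
print(decisions)
```

Requiring a minimum number of observations keeps a single slow query from triggering a copy, which fits the slide's "only if needed" wording.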

Page 14

Conclusion

Logical Warehouse holds enormous promise

Flexibility and real-time access give an advantage

Use Hadoop for batch jobs rather than integration

We dataconomy!

Page 15

US Office: 1355 Market Street, #488, San Francisco, CA 94103

German Office: Katharinenstr. 15, 04109 Leipzig, Germany

DataVirtuality

Thanks for your attention!

Visit our stand in the exhibition [email protected]