lecture @dhbw: data warehouse part li: frontendbuckenhofer/20182dwh/bucken... · 2018-12-04 · •...
A company of Daimler AG
LECTURE @DHBW: DATA WAREHOUSE
PART LI: FRONTEND
ANDREAS BUCKENHOFER, DAIMLER TSS
ABOUT ME
https://de.linkedin.com/in/buckenhofer
https://twitter.com/ABuckenhofer
https://www.doag.org/de/themen/datenbank/in-memory/
http://wwwlehre.dhbw-stuttgart.de/~buckenhofer/
https://www.xing.com/profile/Andreas_Buckenhofer2
Andreas Buckenhofer
Senior DB [email protected]
Since 2009 at Daimler TSS
Department: Big Data
Business Unit: Analytics
ANDREAS BUCKENHOFER, DAIMLER TSS GMBH
Data Warehouse / DHBW · Daimler TSS
“Forming good abstractions and avoiding complexity is an essential part of a successful data architecture”
Data has always been my main focus during my long-time occupation in the area of data integration. I work for Daimler TSS as Database Professional and Data Architect with over 20 years of experience in Data Warehouse projects. I have been working with Hadoop and NoSQL since 2013. I keep my knowledge up to date, and I learn new things, experiment, and program every day.
I share my knowledge in internal presentations and as a speaker at international conferences. I regularly give a full lecture on Data Warehousing and a seminar on modern data architectures at Baden-Wuerttemberg Cooperative State University DHBW. I also gained international experience through a two-year project in Greater London and several business trips to Asia.
I’m responsible for In-Memory DB Computing at the independent German Oracle User Group (DOAG) and was honored by Oracle as ACE Associate. I hold current certifications such as “Certified Data Vault 2.0 Practitioner (CDVP2)”, “Big Data Architect”, “Oracle Database 12c Administrator Certified Professional”, “IBM InfoSphere Change Data Capture Technical Professional”, etc.
DHBW / DOAG
Contact/Connect
NOT JUST AVERAGE: OUTSTANDING.
As a 100% Daimler subsidiary, we give 100 percent, always and never less. We love IT and pull out all the stops to aid Daimler's development with our expertise on its journey into the future.
Our objective: We make Daimler the most innovative and digital mobility company.
INTERNAL IT PARTNER FOR DAIMLER
+ Holistic solutions according to the Daimler guidelines
+ IT strategy
+ Security
+ Architecture
+ Developing and securing know-how
+ TSS is a partner who can be trusted with sensitive data
As subsidiary: maximum added value for Daimler
+ Market closeness
+ Independence
+ Flexibility (short decision-making process, ability to react quickly)
Daimler TSS
LOCATIONS
Daimler TSS China
Hub Beijing
10 employees
Daimler TSS Malaysia
Hub Kuala Lumpur
42 employees
Daimler TSS India
Hub Bangalore
22 employees
Daimler TSS Germany
7 locations
1000 employees*
Ulm (Headquarters)
Stuttgart
Berlin
Karlsruhe
* as of August 2017
WHAT YOU WILL LEARN TODAY
After this lecture you will be able to
• Understand the function of frontend tools
• Understand the necessity of information design
LOGICAL STANDARD DATA WAREHOUSE ARCHITECTURE
[Architecture diagram. Internal and external data sources (e.g. OLTP systems) feed the Data Warehouse, which is divided into Backend and Frontend. Layers shown: Staging Layer (Input Layer), Integration Layer (Cleansing Layer), Core Warehouse Layer (Storage Layer), Aggregation Layer, Mart Layer (Output Layer / Reporting Layer). Cross-cutting components: Metadata Management, Security, DWH Manager incl. Monitor.]
VISUALIZATION IN THE USUAL CASE OF LIFE
RUSSIAN CAMPAIGN OF NAPOLEON
Source: https://de.wikipedia.org/wiki/Charles_Joseph_Minard
MAPPING THE 1854 LONDON CHOLERA OUTBREAK
Source: https://www1.udel.edu/johnmack/frec682/cholera/
EXERCISE: VISUALIZE AS MUCH AS POSSIBLE
Revenue in €
              2014       2015       2016
Canada        16,000     14,000     17,000
England        8,000      9,000      8,000
France         7,000      4,000      5,000
USA           60,000     85,000     90,000
Germany        4,000     10,000     15,000
Australia     10,000      8,000     15,000
Total        105,000    130,000    150,000
POSSIBLE SOLUTION 1
POSSIBLE SOLUTION 2
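Before drawing anything for the exercise above, it can help to derive the figures a chart should emphasize. The following is a minimal sketch, not one of the slide solutions: it recomputes the yearly totals from the exercise table and derives each country's share and growth. The function names are hypothetical; only the revenue numbers come from the table.

```python
# Revenue in EUR per country and year, copied from the exercise table.
REVENUE = {
    "Canada":    {2014: 16_000, 2015: 14_000, 2016: 17_000},
    "England":   {2014: 8_000,  2015: 9_000,  2016: 8_000},
    "France":    {2014: 7_000,  2015: 4_000,  2016: 5_000},
    "USA":       {2014: 60_000, 2015: 85_000, 2016: 90_000},
    "Germany":   {2014: 4_000,  2015: 10_000, 2016: 15_000},
    "Australia": {2014: 10_000, 2015: 8_000,  2016: 15_000},
}
YEARS = (2014, 2015, 2016)


def totals_per_year(revenue):
    """Sum revenue over all countries for each year."""
    return {year: sum(by_year[year] for by_year in revenue.values())
            for year in YEARS}


def share_of_total(revenue, year):
    """Each country's share of the yearly total, in percent."""
    total = totals_per_year(revenue)[year]
    return {country: round(100 * by_year[year] / total, 1)
            for country, by_year in revenue.items()}


def growth(revenue, start, end):
    """Relative change from the start year to the end year, in percent."""
    return {country: round(100 * (v[end] - v[start]) / v[start], 1)
            for country, v in revenue.items()}


if __name__ == "__main__":
    print(totals_per_year(REVENUE))
    print(share_of_total(REVENUE, 2016))
    print(growth(REVENUE, 2014, 2016))
```

The derived values suggest two different stories in the same table: the USA dominates in absolute revenue, while Germany shows the strongest relative growth. A visualization would typically be chosen to make one of these messages obvious.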
INTERFACE TO THE END USER
• Reporting (Standard, ad-hoc)
• OLAP
• Dashboards, Scorecards
• Advanced Analytics / Data Mining / Text Mining
• Search & Discovery
REPORTING (STANDARD, AD-HOC)
Standard Reports
• Prepared static reports that end users can execute on request
• Alternatively executed at the end of an ETL process and, e.g., sent to end users by email
• Normally based on fact tables and their dimensions
• Reports are often lists similar to Excel sheets but can also contain graphics (e.g. line charts)
Ad-hoc Reports
• End users create their own reports (“self-service”)
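To illustrate what "based on fact tables and their dimensions" can look like, here is a minimal sketch of a prepared, parameterized standard report using Python's built-in sqlite3 module. The table names (fact_sales, dim_country), the columns and the sample rows are assumptions made for this example and do not come from the lecture.

```python
import sqlite3

# In-memory database with one dimension table and one fact table
# (hypothetical mini star schema with made-up sample rows).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_country (country_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE fact_sales  (country_id INTEGER REFERENCES dim_country,
                              year INTEGER, revenue INTEGER);
    INSERT INTO dim_country VALUES (1, 'USA'), (2, 'Germany');
    INSERT INTO fact_sales VALUES
        (1, 2015, 85000), (1, 2016, 90000),
        (2, 2015, 10000), (2, 2016, 15000);
""")


def standard_report(connection, year):
    """Prepared report: total revenue per country for one year."""
    query = """
        SELECT d.name, SUM(f.revenue) AS total_revenue
        FROM fact_sales AS f
        JOIN dim_country AS d ON d.country_id = f.country_id
        WHERE f.year = ?
        GROUP BY d.name
        ORDER BY total_revenue DESC
    """
    return connection.execute(query, (year,)).fetchall()


# The report definition is fixed; the end user only supplies the parameter.
for name, total in standard_report(conn, 2016):
    print(f"{name:<10} {total:>8,}")
```

The query text is fixed in advance and only the year is a parameter, which is the main difference from an ad-hoc report, where the end user would define the query or layout.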
OLAP
ROLAP / MOLAP Client Frontend
• Prepared cubes (multidimensional or relational fact tables)
• User can perform interactive analysis of data
• Rollup / drill-down
• Pivot
• Slicing
• Dicing
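The interactive operations listed above can be sketched on a tiny in-memory "cube". This is an illustration in plain Python, not how an OLAP engine is implemented; the dimensions, the measure and all values are made up for the example.

```python
from collections import defaultdict

# Tiny fact "cube": three dimensions (country, product, year), one measure.
FACTS = [
    {"country": "USA",     "product": "A", "year": 2015, "revenue": 50},
    {"country": "USA",     "product": "B", "year": 2015, "revenue": 35},
    {"country": "USA",     "product": "A", "year": 2016, "revenue": 60},
    {"country": "Germany", "product": "A", "year": 2015, "revenue": 10},
    {"country": "Germany", "product": "B", "year": 2016, "revenue": 15},
]


def aggregate(facts, dims):
    """Sum revenue grouped by the given dimensions.

    Grouping by fewer dimensions is a roll-up, by more a drill-down.
    """
    totals = defaultdict(int)
    for row in facts:
        totals[tuple(row[d] for d in dims)] += row["revenue"]
    return dict(totals)


def slice_(facts, dim, value):
    """Slice: fix a single dimension to one value."""
    return [row for row in facts if row[dim] == value]


def dice(facts, **allowed):
    """Dice: restrict several dimensions to sets of values (a sub-cube)."""
    return [row for row in facts
            if all(row[dim] in values for dim, values in allowed.items())]


def pivot(facts, row_dim, col_dim):
    """Pivot: cross-tabulate two dimensions as {row: {column: revenue}}."""
    table = defaultdict(dict)
    for (r, c), total in aggregate(facts, (row_dim, col_dim)).items():
        table[r][c] = total
    return dict(table)


print(aggregate(FACTS, ("country",)))                        # roll-up
print(aggregate(FACTS, ("country", "year")))                 # drill-down
print(aggregate(slice_(FACTS, "year", 2015), ("country",)))  # slice
print(dice(FACTS, country={"USA"}, year={2015, 2016}))       # dice
print(pivot(FACTS, "country", "year"))                       # pivot
```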
DASHBOARDS, SCORECARDS
“Progress reports”
Provide an overall view of KPIs (Key Performance Indicators)
Combination of several elements from Reporting and/or OLAP (e.g. line charts) into an overall view (like a “cockpit”)
A dashboard is more focused on operational goals
• High-level overview of what is happening
A scorecard is more focused on strategic goals
• Plan a strategy and identify why something happens
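As a rough illustration of a KPI overview, the sketch below compares actual values with targets and prints one status line per KPI. The Kpi class, the 90% warning threshold, the KPI names and all target values are assumptions for this example, not part of the lecture.

```python
from dataclasses import dataclass


@dataclass
class Kpi:
    name: str
    actual: float
    target: float

    def attainment(self) -> float:
        """Actual value as a fraction of the target."""
        return self.actual / self.target

    def status(self, warn_below: float = 0.9) -> str:
        """Hypothetical rule: on target, close to target, or off target."""
        ratio = self.attainment()
        if ratio >= 1.0:
            return "on target"
        return "warning" if ratio >= warn_below else "off target"


def render_dashboard(kpis):
    """One line per KPI: a plain-text stand-in for the 'cockpit' view."""
    return [f"{k.name:<16} {k.actual:>9,.0f} / {k.target:>9,.0f}  {k.status()}"
            for k in kpis]


# Made-up KPIs; targets are invented for the example.
kpis = [Kpi("Revenue 2016", 150_000, 140_000),
        Kpi("New customers", 95, 100),
        Kpi("Orders shipped", 700, 1_000)]
print("\n".join(render_dashboard(kpis)))
```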
ADVANCED ANALYTICS / DATA MINING / TEXT MINING
THE FUTURE OF SOFTWARE DEVELOPMENT?
Software is eating the world
Machine learning will eat software
• For many tasks, it’s easier to collect the data than to explicitly write the program, e.g. face recognition or chess/Go
• On the other hand, data collection isn’t always easy, e.g. billing software
Source: https://www.oreilly.com/ideas/what-machine-learning-means-for-software-development
MACHINE LEARNING WILL CHANGE SW DEVELOPMENT
Source: https://www.oreilly.com/ideas/what-machine-learning-means-for-software-development
Source: https://twitter.com/DynamicWebPaige/status/915326707107844097
• Google’s Jeff Dean has reported that 500 lines of TensorFlow code have replaced 500,000 lines of code in Google Translate
• We should not understate the difficulty of training a neural network of any complexity, but neither should we underestimate the problem of managing and debugging a gigantic codebase
• The developer has to become a teacher, a curator of training data, and an analyst of results
SEARCH & DISCOVERY
Not just numerical data
Analysis of new data types is becoming increasingly important
• Text
• GPS coordinates
• Pictures
• Videos
Data can be available in an RDBMS (e.g. text modules/indexes are available), Hadoop or NoSQL DBs
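Text search over such data usually relies on some form of text index. The following is a minimal sketch of an inverted index in plain Python to show the basic idea; it is not the implementation of any particular RDBMS text module, and the sample documents are made up.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lower-case the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents):
    """Map every word to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in tokenize(text):
            index[word].add(doc_id)
    return index


def search(index, query):
    """Return ids of documents that contain all words of the query."""
    words = tokenize(query)
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


# Made-up free-text records, e.g. comments stored next to numeric data.
DOCS = {
    1: "Customer reports brake noise after software update",
    2: "Software update installed without issues",
    3: "Brake pads replaced during regular service",
}
INDEX = build_index(DOCS)
print(search(INDEX, "software update"))
print(search(INDEX, "brake"))
```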
MANY GRAPHICAL ELEMENTS TO USE IN REPORTS
Source: https://github.com/d3/d3/wiki/Gallery
MANY GRAPHICAL ELEMENTS … CHAMBER OF HORROR
Source: Hichert / Faisst, http://www.backup-page.hichert.com/
DO YOU USUALLY USE 3D?
MANY GRAPHICAL ELEMENTS … CHAMBER OF HORROR
Some remarks about the previous slide
• 3D elements introduce clutter and add no information
• The pie chart most often does not make sense
• The line chart is barely readable
• Labels are placed outside of the graphic
• The tachometer takes up a lot of space
• Too much color in general
• Color without meaning, e.g. red should be used for alarms / errors
DID YOU KNOW? PIZZA IS A REAL-TIME CHART OF HOW MUCH PIZZA IS LEFT
STORY TELLING WITH APPROPRIATE VISUALIZATION
Famous example by Hans Rosling (watch 3:08 onwards)
https://www.ted.com/talks/hans_rosling_shows_the_best_stats_you_ve_ever_seen?language=de
INFORMATION DESIGN
Information design is the practice of presenting information in a way that fosters efficient and effective understanding of it. (Source: Wikipedia, https://en.wikipedia.org/wiki/Information_design)
Some authors are well known for their criticism of many graphical representations; they provide rules for good information design
• Edward Tufte
• Stephen Few
• Rolf Hichert
WHICH PRODUCT GROUP HAS THE HIGHEST PROFIT IN JUNE?
WHICH PRODUCT GROUP HAS THE HIGHEST PROFIT IN JUNE? (EYE TRACKING)
WHICH PRODUCT GROUP HAS THE HIGHEST PROFIT IN JUNE? (IMPROVED VERSION)
WHICH PRODUCT GROUP HAS THE HIGHEST PROFIT IN JUNE? (EYE TRACKING)
INFORMATION DESIGN: REDUCE TO THE ESSENTIALS
Define standards, e.g.
• Always use the same colors, and use them with care, e.g.
• red = negative
• green = positive
• Pie charts are rarely useful and should be avoided
• Use a bar chart or line chart instead
• No 3D elements, as they don’t enhance information but introduce clutter
• Standardize abbreviations, e.g. PY = previous year
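Standards like the ones above are easier to follow when they are written down once and checked automatically. The sketch below encodes them as a small, hypothetical house standard. The constant and function names, the neutral color and the exact check messages are assumptions; only red/green, the preference for bar and line charts, the 3D rule and the abbreviation PY come from the slide.

```python
# Hypothetical house standard following the rules listed above.
COLOR_NEGATIVE = "red"
COLOR_POSITIVE = "green"
COLOR_NEUTRAL = "grey"                    # assumption: no signal color for "no change"
ABBREVIATIONS = {"PY": "previous year"}   # PY is taken from the slide
ALLOWED_CHARTS = {"bar", "line"}          # pie charts are left out on purpose


def delta_color(current, previous):
    """Pick the standard color for a change against the previous year (PY)."""
    delta = current - previous
    if delta > 0:
        return COLOR_POSITIVE
    if delta < 0:
        return COLOR_NEGATIVE
    return COLOR_NEUTRAL


def check_chart(chart_type, use_3d=False):
    """Return the list of rule violations for a requested chart (empty = ok)."""
    problems = []
    if chart_type not in ALLOWED_CHARTS:
        problems.append(f"chart type '{chart_type}' is not in the standard; "
                        "use a bar or line chart")
    if use_3d:
        problems.append("3D elements add clutter but no information")
    return problems


print(delta_color(150_000, 130_000))
print(check_chart("pie", use_3d=True))
```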
EYE-TRACKING - BEFORE AND AFTER
TABLE WITH INTEGRATED BAR CHARTS
Source: Hichert, http://www.hichert.com/de/resource/table-template-02/
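The idea of a table with integrated bar charts can be approximated even in plain text. The following sketch is not Hichert's template; it only shows the principle that every row carries its number and a bar on a common scale. The function name, column labels and bar width are assumptions; the 2016 revenue values are reused from the exercise table earlier in this lecture.

```python
def table_with_bars(rows, width=30, label="Country", value_label="Revenue 2016"):
    """Render (name, value) rows as a text table with one bar per row.

    All bars share one scale (the largest value), so lengths are comparable.
    """
    largest = max(value for _, value in rows)
    name_width = max(len(label), *(len(name) for name, _ in rows))
    lines = [f"{label:<{name_width}}  {value_label}"]
    for name, value in sorted(rows, key=lambda r: r[1], reverse=True):
        bar = "#" * round(value / largest * width)
        lines.append(f"{name:<{name_width}}  {value:>7,}  {bar}")
    return lines


# 2016 revenue in EUR per country, from the exercise table.
ROWS_2016 = [("Canada", 17_000), ("England", 8_000), ("France", 5_000),
             ("USA", 90_000), ("Germany", 15_000), ("Australia", 15_000)]
LINES = table_with_bars(ROWS_2016)
print("\n".join(LINES))
```

Sorting by value and using a single scale are design choices in this sketch: they let the reader compare rows at a glance without reading every number.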
BI END USER ROLES
Consumers / BI Users
• Use reports, OLAP and dashboards to obtain information
Power Users
• Use reports, OLAP and dashboards to obtain information
• Create new reports and dashboards
Data Scientists
• Statistical / mathematical geeks
• Analyze / explore data
• Need to analyze raw (non-cleansed, non-transformed) data
SUMMARY: INFORMATION IS BEAUTIFUL
Source: https://www.youtube.com/watch?v=hOex1iU57iw
THANK YOU
Daimler TSS GmbH
Wilhelm-Runge-Straße 11, 89081 Ulm / Phone +49 731 505-06 / Fax +49 731 505-65 99
[email protected] / Internet: www.daimler-tss.com / Intranet-Portal-Code: @TSS
Domicile and Court of Registry: Ulm / HRB-Nr.: 3844 / Management: Christoph Röger (CEO), Steffen Bäuerle
LEARN HOW TO REPLACE CODE
• Machine learning will no doubt change software development in significant ways
• Software developers will put much more effort into data collection and preparation
• Developers will have to do more than just collect data; they’ll have to build data pipelines and the infrastructure to manage those pipelines. We’ve called this “data engineering”
• Data engineers will be responsible for maintaining the data pipeline: ingesting data, cleaning data, feature engineering, and model discovery
Source: https://www.oreilly.com/ideas/what-machine-learning-means-for-software-development
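The pipeline stages named above (ingesting, cleaning, feature engineering) can be sketched in a few lines. This is a toy example under stated assumptions: the CSV content, the column names and the derived feature are made up, and model discovery is left out entirely.

```python
import csv
import io

# Made-up raw input with typical defects: a missing value and a bad number.
RAW_CSV = """customer,visits,revenue
alice,3,120.50
bob,,80.00
carol,5,not available
dave,2,40.00
"""


def ingest(text):
    """Ingest: parse raw CSV text into a list of dicts (all values are strings)."""
    return list(csv.DictReader(io.StringIO(text)))


def clean(rows):
    """Clean: convert types and drop rows with missing or unparsable values."""
    cleaned = []
    for row in rows:
        try:
            cleaned.append({"customer": row["customer"],
                            "visits": int(row["visits"]),
                            "revenue": float(row["revenue"])})
        except ValueError:
            continue  # a real pipeline would log or quarantine rejected rows
    return cleaned


def add_features(rows):
    """Feature engineering: derive revenue per visit for each customer."""
    return [{**row, "revenue_per_visit": row["revenue"] / row["visits"]}
            for row in rows]


def run_pipeline(text):
    """Chain the three stages in order."""
    return add_features(clean(ingest(text)))


RESULT = run_pipeline(RAW_CSV)
for row in RESULT:
    print(row)
```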