lecture @dhbw: data warehouse part li: frontendbuckenhofer/20182dwh/bucken... · 2018-12-04 · •...

A company of Daimler AG LECTURE @DHBW: DATA WAREHOUSE PART LI: FRONTEND ANDREAS BUCKENHOFER, DAIMLER TSS

Post on 03-Jun-2020

A company of Daimler AG

LECTURE @DHBW: DATA WAREHOUSE

PART LI: FRONTEND

ANDREAS BUCKENHOFER, DAIMLER TSS

ABOUT ME

https://de.linkedin.com/in/buckenhofer

https://twitter.com/ABuckenhofer

https://www.doag.org/de/themen/datenbank/in-memory/

http://wwwlehre.dhbw-stuttgart.de/~buckenhofer/

https://www.xing.com/profile/Andreas_Buckenhofer2

Andreas Buckenhofer

Senior DB [email protected]

Since 2009 at Daimler TSS

Department: Big Data

Business Unit: Analytics

ANDREAS BUCKENHOFER, DAIMLER TSS GMBH

Data Warehouse / DHBWDaimler TSS 3

“Forming good abstractions and avoiding complexity is an essential part of a successful data architecture”

Data has always been my main focus throughout my long career in data integration. I work for Daimler TSS as a Database Professional and Data Architect with over 20 years of experience in Data Warehouse projects. I have been working with Hadoop and NoSQL since 2013. I keep my knowledge up to date, and I learn new things, experiment, and program every day.

I share my knowledge in internal presentations and as a speaker at international conferences. I regularly give a full lecture on Data Warehousing and a seminar on modern data architectures at Baden-Wuerttemberg Cooperative State University (DHBW). I also gained international experience through a two-year project in Greater London and several business trips to Asia.

I'm responsible for In-Memory DB Computing at the independent German Oracle User Group (DOAG) and was honored by Oracle as an ACE Associate. I hold current certifications such as "Certified Data Vault 2.0 Practitioner (CDVP2)", "Big Data Architect", "Oracle Database 12c Administrator Certified Professional", and "IBM InfoSphere Change Data Capture Technical Professional".

Contact/Connect: DHBW, DOAG, xing

NOT JUST AVERAGE: OUTSTANDING.

As a 100% Daimler subsidiary, we give 100 percent, always and never less. We love IT and pull out all the stops to aid Daimler's development with our expertise on its journey into the future.

Our objective: We make Daimler the most innovative and digital mobility company.

INTERNAL IT PARTNER FOR DAIMLER

+ Holistic solutions according to the Daimler guidelines

+ IT strategy

+ Security

+ Architecture

+ Developing and securing know-how

+ TSS is a partner who can be trusted with sensitive data

As a subsidiary: maximum added value for Daimler

+ Market closeness

+ Independence

+ Flexibility (short decision-making process, ability to react quickly)

LOCATIONS

Daimler TSS Germany: 7 locations, 1000 employees*

Ulm (Headquarters), Stuttgart, Berlin, Karlsruhe

Daimler TSS China: Hub Beijing, 10 employees

Daimler TSS Malaysia: Hub Kuala Lumpur, 42 employees

Daimler TSS India: Hub Bangalore, 22 employees

* as of August 2017

WHAT YOU WILL LEARN TODAY

After the end of this lecture you will be able to

• Understand the function of frontend tools

• Understand the necessity for information design

LOGICAL STANDARD DATA WAREHOUSE ARCHITECTURE

[Figure: architecture diagram. Labels: internal data sources and external data sources (OLTP); Data Warehouse, divided into Backend and Frontend; Staging Layer (Input Layer); Integration Layer (Cleansing Layer); Core Warehouse Layer (Storage Layer); Aggregation Layer; Mart Layer (Output Layer, Reporting Layer); Metadata Management; Security; DWH Manager incl. Monitor]

VISUALIZATION IN THE USUAL CASE OF LIFE

RUSSIAN CAMPAIGN OF NAPOLEON

Source: https://de.wikipedia.org/wiki/Charles_Joseph_Minard

MAPPING THE 1854 LONDON CHOLERA OUTBREAK

Source: https://www1.udel.edu/johnmack/frec682/cholera/

MAPPING THE 1854 LONDON CHOLERA OUTBREAK

EXERCISE: VISUALIZE AS MUCH AS POSSIBLE

Revenue in €

                  2014     2015     2016
Canada          16,000   14,000   17,000
England          8,000    9,000    8,000
France           7,000    4,000    5,000
USA             60,000   85,000   90,000
Germany          4,000   10,000   15,000
Australia       10,000    8,000   15,000
Total revenue  105,000  130,000  150,000
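
Before choosing a chart for the exercise, it can help to derive the numbers worth showing. The following is a minimal Python sketch, not part of the original slides: it re-enters the revenue table by hand and computes the yearly totals, each country's share of a year, and the change from 2014 to 2016. The function names are illustrative assumptions.

```python
# Revenue in EUR per country and year, copied from the exercise table.
YEARS = (2014, 2015, 2016)
REVENUE = {
    "Canada": (16_000, 14_000, 17_000),
    "England": (8_000, 9_000, 8_000),
    "France": (7_000, 4_000, 5_000),
    "USA": (60_000, 85_000, 90_000),
    "Germany": (4_000, 10_000, 15_000),
    "Australia": (10_000, 8_000, 15_000),
}


def totals(revenue):
    """Sum the revenue of all countries for each year (column totals)."""
    return tuple(sum(column) for column in zip(*revenue.values()))


def share_of_total(revenue, year_index):
    """Return each country's share of that year's total revenue, in percent."""
    total = totals(revenue)[year_index]
    return {c: 100 * v[year_index] / total for c, v in revenue.items()}


def growth(revenue, first_index=0, last_index=-1):
    """Return the relative revenue change between two years, in percent."""
    return {
        c: 100 * (v[last_index] - v[first_index]) / v[first_index]
        for c, v in revenue.items()
    }


if __name__ == "__main__":
    print("Totals:", dict(zip(YEARS, totals(REVENUE))))
    # Sort by growth so the strongest change is listed first.
    for country, pct in sorted(growth(REVENUE).items(), key=lambda kv: -kv[1]):
        print(f"{country:<10} {pct:+7.1f} % (2014 to 2016)")
```

Derived figures like these suggest what a visualization could emphasize, for example that the USA dominates the total while Germany shows the largest relative change.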

POSSIBLE SOLUTION 1

POSSIBLE SOLUTION 2

INTERFACE TO THE END USER

• Reporting (Standard, ad-hoc)

• OLAP

• Dashboards, Scorecards

• Advanced Analytics / Data Mining / Text Mining

• Search & Discovery

REPORTING (STANDARD, AD-HOC)

Standard Reports

• Prepared static reports that end users can execute on request

• Are executed at the end of an ETL process and, e.g., sent by email to end users

• Normally based on fact tables and their dimensions

• Reports are often lists similar to Excel sheets but can also contain graphics (e.g. line charts)

Ad-hoc Reports

• End users create their own reports ("self-service")
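
A standard report of this kind can be thought of as a prepared, parameterized query on a fact table and its dimensions. The sketch below uses Python's built-in sqlite3 module and an in-memory database. The table and column names (fact_revenue, dim_country, dim_year) and the helper functions are hypothetical; the lecture does not prescribe a schema. The sample values are taken from the exercise table.

```python
import sqlite3


def build_demo_mart():
    """Create a tiny in-memory star schema (hypothetical names and layout)."""
    con = sqlite3.connect(":memory:")
    con.executescript(
        """
        CREATE TABLE dim_country (country_id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE dim_year (year_id INTEGER PRIMARY KEY, year INTEGER);
        CREATE TABLE fact_revenue (
            country_id INTEGER REFERENCES dim_country(country_id),
            year_id INTEGER REFERENCES dim_year(year_id),
            revenue_eur INTEGER
        );
        INSERT INTO dim_country VALUES (1, 'USA'), (2, 'Germany');
        INSERT INTO dim_year VALUES (1, 2015), (2, 2016);
        INSERT INTO fact_revenue VALUES
            (1, 1, 85000), (1, 2, 90000), (2, 1, 10000), (2, 2, 15000);
        """
    )
    return con


# The prepared report: fixed layout, only the year is a parameter.
STANDARD_REPORT_SQL = """
    SELECT c.name, SUM(f.revenue_eur) AS total_revenue_eur
    FROM fact_revenue AS f
    JOIN dim_country AS c ON c.country_id = f.country_id
    JOIN dim_year AS y ON y.year_id = f.year_id
    WHERE y.year = ?
    GROUP BY c.name
    ORDER BY total_revenue_eur DESC
"""


def run_standard_report(con, year):
    """Execute the prepared report for one year and return its rows."""
    return con.execute(STANDARD_REPORT_SQL, (year,)).fetchall()


if __name__ == "__main__":
    con = build_demo_mart()
    for name, revenue in run_standard_report(con, 2016):
        print(f"{name:<10} {revenue:>8,}")
```

The same function could be called by an end user on request or by a scheduler after the ETL run; an ad-hoc report differs in that the end user writes or assembles the query.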

OLAP

ROLAP / MOLAP Client Frontend

• Prepared cubes (multidimensional or relational fact tables)

• User can perform interactive analysis of data

• Rollup / drill-down

• Pivot

• Slicing

• Dicing
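
The OLAP operations named above can be illustrated on a small in-memory fact table. This is a plain-Python sketch under simplifying assumptions, not how a ROLAP or MOLAP engine is implemented: the fact rows, the dimension names, and all function names are made up for illustration.

```python
from collections import defaultdict

# Each fact row: (country, year, quarter, revenue). Values are illustrative.
DIMENSIONS = ("country", "year", "quarter")
FACTS = [
    ("USA", 2016, "Q1", 40), ("USA", 2016, "Q2", 50),
    ("USA", 2015, "Q1", 45), ("USA", 2015, "Q2", 40),
    ("Germany", 2016, "Q1", 7), ("Germany", 2016, "Q2", 8),
    ("Germany", 2015, "Q1", 4), ("Germany", 2015, "Q2", 6),
]


def aggregate(facts, group_by):
    """Sum the measure per combination of the given dimensions.

    Fewer dimensions in group_by is a rollup, more is a drill-down.
    """
    idx = [DIMENSIONS.index(d) for d in group_by]
    result = defaultdict(int)
    for row in facts:
        result[tuple(row[i] for i in idx)] += row[-1]
    return dict(result)


def slice_(facts, dimension, value):
    """Slicing: fix one dimension to a single value."""
    i = DIMENSIONS.index(dimension)
    return [row for row in facts if row[i] == value]


def dice(facts, **allowed):
    """Dicing: restrict several dimensions to sets of allowed values."""
    return [
        row for row in facts
        if all(row[DIMENSIONS.index(d)] in vals for d, vals in allowed.items())
    ]


def pivot(facts, rows, cols):
    """Pivot: cross-tabulate two dimensions, e.g. countries against years."""
    table = defaultdict(dict)
    for (r, c), total in aggregate(facts, (rows, cols)).items():
        table[r][c] = total
    return dict(table)


if __name__ == "__main__":
    print(aggregate(FACTS, ("country",)))          # rollup to country level
    print(aggregate(FACTS, ("country", "year")))   # drill-down to year level
    print(pivot(slice_(FACTS, "quarter", "Q1"), "country", "year"))
```

In an OLAP frontend the user triggers these operations interactively; the cube or the relational fact table behind it performs the aggregation.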

DASHBOARDS, SCORECARDS

"Progress reports"

Provide an overall view of KPIs (Key Performance Indicators)

Combination of several elements from Reporting and/or OLAP (e.g. line charts) into an overall view (like a "cockpit")

Dashboard is more focused on operational goals

• High-level overview of what is happening

Scorecard is more focused on strategic goals

• Plan a strategy and identify why something happens

ADVANCED ANALYTICS / DATA MINING / TEXT MINING

THE FUTURE OF SOFTWARE DEVELOPMENT?

Software is eating the world

Machine learning will eat software

• For many tasks, it’s easier to collect the data than to explicitly write the program, e.g. face recognition or chess/Go

• On the other hand, data collection isn’t always easy, e.g. billing software

Source: https://www.oreilly.com/ideas/what-machine-learning-means-for-software-development

MACHINE LEARNING WILL CHANGE SW DEVELOPMENT

• Google’s Jeff Dean has reported that 500 lines of TensorFlow code have replaced 500,000 lines of code in Google Translate

• We shouldn’t understate the difficulty of training a neural network of any complexity, but neither should we underestimate the problem of managing and debugging a gigantic codebase

• The developer has to become a teacher, a curator of training data, and an analyst of results

Source: https://www.oreilly.com/ideas/what-machine-learning-means-for-software-development

Source: https://twitter.com/DynamicWebPaige/status/915326707107844097

SEARCH & DISCOVERY

Not just numerical data

Analysis of new data types is becoming more and more important

• Text

• GPS coordinates

• Pictures

• Videos

Data can be available in RDBMS (e.g. text modules/indexes available), Hadoop or SQL DBs

MANY GRAPHICAL ELEMENTS TO USE IN REPORTS

Source: https://github.com/d3/d3/wiki/Gallery

MANY GRAPHICAL ELEMENTS … CHAMBER OF HORROR

Source: Hichert / Faisst, http://www.backup-page.hichert.com/

DO YOU USUALLY USE 3D?

MANY GRAPHICAL ELEMENTS … CHAMBER OF HORROR

Some remarks about the previous slide

• 3D elements introduce clutter and add no information

• A pie chart most often does not make sense

• Line chart is barely readable

• Labels are placed outside of the graphic

• Tachometer costs a lot of space and show

• Too much color in general

• Color without meaning, e.g. red should be used for alarms / errors

DID YOU KNOW? PIZZA IS A REAL-TIME CHART OF HOW MUCH PIZZA IS LEFT

STORY TELLING WITH APPROPRIATE VISUALIZATION

Famous example by Hans Rosling (watch 3:08 onwards)

https://www.ted.com/talks/hans_rosling_shows_the_best_stats_you_ve_ever_seen?language=de

INFORMATION DESIGN

Information design is the practice of presenting information in a way that fosters efficient and effective understanding of it. (Source: Wikipedia, https://en.wikipedia.org/wiki/Information_design)

Some authors are well known for their criticism of many graphical representations; they provide rules for good information design

• Edward Tufte

• Stephen Few

• Rolf Hichert

WHICH PRODUCT GROUP HAS THE HIGHEST PROFIT IN JUNE?

WHICH PRODUCT GROUP HAS THE HIGHEST PROFIT IN JUNE?
EYE TRACKING

WHICH PRODUCT GROUP HAS THE HIGHEST PROFIT IN JUNE?
IMPROVED VERSION

WHICH PRODUCT GROUP HAS THE HIGHEST PROFIT IN JUNE?
EYE TRACKING

INFORMATION DESIGN

REDUCE TO THE ESSENTIALS

Define standards, e.g.

• Always use the same colors, and use them with care, e.g.

• red = negative

• green = positive

• Pie charts are rarely useful and should be avoided

• Better to use a bar chart or line chart

• No 3D elements, as these elements don’t enhance information but introduce clutter

• Standardize abbreviations, e.g. PY = previous year
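
Such standards are easiest to keep when they are defined once and reused by every report. The Python sketch below shows one possible way to do that: a single color rule for deviations and a single abbreviation list. Only "PY = previous year", "red = negative" and "green = positive" come from the slide; the abbreviation "AC", the neutral color, and the function names are assumptions. The rule also assumes a measure where higher is better (e.g. revenue); for costs the meaning would be reversed.

```python
# Assumed house standard: one fixed meaning per color.
COLOR_POSITIVE = "green"
COLOR_NEGATIVE = "red"
COLOR_NEUTRAL = "black"  # assumption: no color when nothing changed

# One shared abbreviation list. "PY" is from the slide, "AC" is hypothetical.
ABBREVIATIONS = {"PY": "previous year", "AC": "actual"}


def deviation_color(actual, previous_year):
    """Map the deviation from the previous year to the standard color."""
    delta = actual - previous_year
    if delta > 0:
        return COLOR_POSITIVE
    if delta < 0:
        return COLOR_NEGATIVE
    return COLOR_NEUTRAL


def format_row(label, actual, previous_year):
    """Format one report row with standardized abbreviations and color tag."""
    delta = actual - previous_year
    color = deviation_color(actual, previous_year)
    return (
        f"{label:<10} AC {actual:>7,} PY {previous_year:>7,} "
        f"{delta:>+8,} [{color}]"
    )


if __name__ == "__main__":
    # 2016 vs. 2015 revenue from the exercise table.
    print(format_row("USA", 90_000, 85_000))
    print(format_row("England", 8_000, 9_000))
```

Because every report calls the same helper, red can never mean "alarm" in one chart and "competitor" in another.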

EYE-TRACKING - BEFORE AND AFTER

TABLE WITH INTEGRATED BAR CHARTS

Source: Hichert, http://www.hichert.com/de/resource/table-template-02/
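
The idea of a table with integrated bar charts can be tried out even in plain text. The following Python sketch is not Hichert's template; it is a minimal illustration that prints each value next to a bar whose length is proportional to it. The function names and the bar character are assumptions; the sample data is the 2016 revenue column from the exercise.

```python
def bar(value, max_value, width=30):
    """Return a text bar whose length is proportional to value / max_value."""
    if max_value <= 0:
        return ""
    return "#" * round(width * value / max_value)


def table_with_bars(rows, width=30):
    """Render (label, value) pairs as a table with one bar per row.

    Rows are sorted by value so that the ranking is visible at a glance.
    """
    max_value = max(value for _, value in rows)
    label_width = max(len(label) for label, _ in rows)
    lines = []
    for label, value in sorted(rows, key=lambda r: r[1], reverse=True):
        lines.append(
            f"{label:<{label_width}}  {value:>7,}  {bar(value, max_value, width)}"
        )
    return "\n".join(lines)


if __name__ == "__main__":
    revenue_2016 = [
        ("Canada", 17_000), ("England", 8_000), ("France", 5_000),
        ("USA", 90_000), ("Germany", 15_000), ("Australia", 15_000),
    ]
    print(table_with_bars(revenue_2016))
```

The exact number stays readable in the table column while the bar supports quick comparison between rows, without 3D effects or additional colors.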

BI END USER ROLES

Consumers / BI Users

• Use reports, OLAP and dashboards to obtain information

Power Users

• Use reports, OLAP and dashboards to obtain information

• Create new reports and dashboards

Data Scientists

• Statistical / mathematical geeks

• Analyze / explore data

• Need to analyze raw (non-cleansed, non-transformed) data

SUMMARY

INFORMATION IS BEAUTIFUL

Source: https://www.youtube.com/watch?v=hOex1iU57iw

THANK YOU

Daimler TSS GmbH

Wilhelm-Runge-Straße 11, 89081 Ulm / Phone +49 731 505-06 / Fax +49 731 505-65 99

[email protected] / Internet: www.daimler-tss.com / Intranet-Portal-Code: @TSS

Domicile and Court of Registry: Ulm / HRB-Nr.: 3844 / Management: Christoph Röger (CEO), Steffen Bäuerle

LEARN HOW TO REPLACE CODE

• Machine learning will no doubt change software development in significant ways

• Software developers will put much more effort into data collection and preparation

• Developers will have to do more than just collect data; they’ll have to build data pipelines and the infrastructure to manage those pipelines. We’ve called this “data engineering”

• Data engineers will be responsible for maintaining the data pipeline: ingesting data, cleaning data, feature engineering, and model discovery

Source: https://www.oreilly.com/ideas/what-machine-learning-means-for-software-development