
Page 1: Arnd SCHIRRMANN Airbus Group Innovations · Airbus Group Innovations. Airbus Group in Figures • European multi-national aerospace and defense corporation • 64 B€revenue (2015)

Big data analytics

Arnd SCHIRRMANN

Airbus Group Innovations

Page 2

Airbus Group in Figures

• European multi-national aerospace and defense corporation

• 64 B€ revenue (2015)

• 3.3 B€ R&D (2015)

• 136k employees

• Airbus (commercial aircraft): >8000 aircraft in service, >1000 aircraft ordered per year, 629 aircraft delivered in 2014

Page 3

Big Data – Important Element in the Digitalization Strategy

Technology Solutions / Business Opportunities

[Figure labels: SHM / IVHM, Maintenance, Cyber Security, Intelligence, Fleet management, Ground operations, Flight Test Analysis, Production / DSS, Quality management, Collaboration]

J.Botti, Airbus Group CTO (2015)

Page 4

J.Botti, Airbus Group CTO (2015)

Big Data Example – Aircraft Operations

Page 5

J.Botti, Airbus Group CTO (2015)

Big Data in the Airbus Group Strategy

Page 6

Big Data Analytics

Optional Information:

Speaker Name: Norman Paton

Speaker Affiliation: University of Manchester

Page 7

Data Wrangling: The Elephant in the Room for Big Data

Norman Paton

University of Manchester

Page 8

Data Wrangling

• Definitions:
  • a process of iterative data exploration and transformation that enables analysis [1].
  • the process of manually converting or mapping data from one "raw" form into another format that allows for more convenient consumption of the data with the help of semi-automated tools [2].

[1] S. Kandel, et al., Research Directions in Data Wrangling: Visualizations and Transformations for usable and credible data, Information Visualization, 10(4), 271-288, 2011.
[2] http://en.wikipedia.org/wiki/Data_wrangling, 12th May 2015.

Page 9

Extract, Transform and Load - 1

• Of course, this is not completely new: Extract, Transform and Load (ETL) tools have been around for a long time.

• ETL tools support source wrapping, warehouse population, workflow languages, etc.

• ETL vendors also have “big data” offerings.

www.talend.com

Page 10

Extract, Transform and Load - 2

• ETL tools are clearly useful, with products from database vendors and data integration companies.

• ETL tools emerged to support data warehousing, and thus typically have roots in enterprise settings.

• ETL tools typically involve significant manual effort.

• ETL costs no doubt vary widely from project to project, but are quoted as representing up to 80% of the development time in warehousing projects [1].

• In addition to problem-specific data warehouses, ETL techniques can be applied to create more generic data lakes.

[1] S. Kandel, et al., Research Directions in Data Wrangling: Visualizations and Transformations for usable and credible data, Information Visualization, 10(4), 271-288, 2011.
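The basic shape of Extract, Transform and Load can be sketched in Python with only the standard library (`csv`, `sqlite3`). This is a minimal illustration under stated assumptions: the CSV content, the column names and the `products` table are hypothetical, and the sketch leaves out the source wrapping and workflow features that commercial ETL tools provide.

```python
import csv
import io
import sqlite3

# Hypothetical raw source: a small CSV export with inconsistent formatting.
RAW_CSV = """sku,name,price
A1, Widget ,12.50
B2,Gadget,
C3,  Gizmo,7
"""

def extract(text: str) -> list[dict]:
    """Extract: read the raw CSV text into one dict per row."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows: list[dict]) -> list[tuple]:
    """Transform: trim whitespace, parse prices, drop rows with no price."""
    clean = []
    for row in rows:
        price = row["price"].strip()
        if not price:
            continue  # a manual decision: skip incomplete records instead of guessing
        clean.append((row["sku"].strip(), row["name"].strip(), float(price)))
    return clean

def load(rows: list[tuple], conn: sqlite3.Connection) -> None:
    """Load: write the cleaned rows into a warehouse-style table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS products (sku TEXT PRIMARY KEY, name TEXT, price REAL)"
    )
    conn.executemany("INSERT INTO products VALUES (?, ?, ?)", rows)
    conn.commit()

conn = sqlite3.connect(":memory:")
load(transform(extract(RAW_CSV)), conn)
print(conn.execute("SELECT * FROM products ORDER BY sku").fetchall())
```

Even in this toy, the transform step hard-codes choices (trimming names, dropping rows without a price); hand-written logic of this kind is where the manual effort noted in the bullets above tends to go.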

Page 11

Big Data – does it make a difference?

• Big data is sometimes characterised by the 4 V’s:
  • Volume – the scale of the data,
  • Velocity – speed of change,
  • Variety – different forms of data, and
  • Veracity – uncertainty of data.

• So size matters, but it isn't everything.

• Data wrangling for big data must address all four V’s at the same time.

• Classical, substantially manual ETL may struggle with numerous and rapidly changing sources.

Page 12

Case Study: e-Commerce

• If you run an e-Commerce site, then you need to be able to understand pricing trends among your competitors.

• Example: “Product prices are constantly monitored by active market research…, allowing us to bring the best deals to you, the customer”.

• This may involve getting to grips with:
  • Volume: thousands of sites;
  • Velocity: sites, site descriptions and contents changing;
  • Variety: in format, content, user community, …; and
  • Veracity: unavailability, inconsistent descriptions, …

• Manual attempts at data wrangling are likely to be expensive, partial, unreliable, poorly targeted, …
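The Variety and Veracity problems in this case study can be made concrete with a small Python sketch that normalises price strings as they might be scraped from different competitor sites. The input formats, the currency table and the decimal-comma heuristic are hypothetical, chosen for illustration rather than taken from the talk; values that cannot be interpreted, such as "out of stock", come back as `None` rather than as a guessed price.

```python
import re
from decimal import Decimal
from typing import Optional

# Hypothetical lookup tables; a real scraper would need many more entries.
CURRENCY_SYMBOLS = {"£": "GBP", "€": "EUR", "$": "USD"}
CURRENCY_CODES = ("GBP", "EUR", "USD")

def normalise_price(raw: str) -> Optional[tuple[Decimal, str]]:
    """Parse a scraped price string into (amount, currency code).

    Returns None when no currency or amount can be recognised, so callers
    can treat the value as missing (a Veracity problem) instead of wrong.
    """
    text = raw.strip()
    currency = None
    for symbol, code in CURRENCY_SYMBOLS.items():
        if symbol in text:
            currency = code
            text = text.replace(symbol, "")
    for code in CURRENCY_CODES:
        if code in text:
            currency = code
            text = text.replace(code, "")
    text = text.strip()
    # Assumed heuristic: "1.299,00" style uses '.' for thousands and ',' for decimals.
    if re.fullmatch(r"\d{1,3}(\.\d{3})*,\d{2}", text):
        text = text.replace(".", "").replace(",", ".")
    else:
        text = text.replace(",", "")  # otherwise treat commas as thousands separators
    if currency is None or not re.fullmatch(r"\d+(\.\d+)?", text):
        return None
    return Decimal(text), currency

for raw in ["£1,299.00", "EUR 1.299,00", "1299 USD", "€12,50", "out of stock"]:
    print(repr(raw), "->", normalise_price(raw))
```

Every rule here was written by hand for formats known in advance; multiplied across thousands of sites whose formats keep changing, this is why manual wrangling tends to be expensive and partial.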

Page 13

Data Wrangling Research and Development

• So data wrangling, in its wider sense, is a research challenge that currently lacks a community or established priorities.

• The need seems to be great:
  • data scientists spend from 50 percent to 80 percent of their time collecting and preparing unruly digital data [1].

• The user community may consist principally of small organisations:
  • the overwhelming majority of information economy businesses – 95% of the 120,000 enterprises in the sector – employ fewer than 10 people [2].

[1] S. Lohr, For big-data scientists, ‘janitor work’ is key hurdle to insights, http://nyti.ms/1Aqif2X, 2014.
[2] UK Department for Business, Innovation & Skills, Information economy strategy, http://bit.ly/1W4TPGU, 2013.

Page 14

Example Data Wrangling Steps

[Figure: Source Selection, Data Extraction, Matching, Mapping Generation, Duplicate Detection, Data Fusion]

There is no fixed data wrangling lifecycle.

All of these steps have tended to require expert input.
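The matching, mapping generation, duplicate detection and data fusion steps can be illustrated with a deliberately small Python sketch using only the standard library. It is not the method from the talk: the similarity measure (difflib's `SequenceMatcher`), the thresholds, the greedy clustering, the "keep the lowest price" fusion rule and all helper names are assumptions chosen for illustration. Source selection and data extraction are assumed to have already produced the two hypothetical record lists, and all prices are assumed to share one currency.

```python
import re
from difflib import SequenceMatcher

def similarity(a: str, b: str) -> float:
    """Case-insensitive string similarity in [0, 1] (difflib ratio)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def generate_mapping(source_cols, target_cols, threshold=0.45):
    """Matching + mapping generation: pair each source column with its most similar target column."""
    mapping = {}
    for col in source_cols:
        best = max(target_cols, key=lambda t: similarity(col, t))
        if similarity(col, best) >= threshold:  # hypothetical cut-off
            mapping[col] = best  # columns below the cut-off stay unmapped
    return mapping

def apply_mapping(rows, mapping):
    """Rename the matched columns of each row into the target schema."""
    return [{mapping[k]: v for k, v in row.items() if k in mapping} for row in rows]

def normalise(name: str) -> str:
    """Lower-case and turn punctuation into spaces, so 'Widget-2000' ~ 'widget 2000'."""
    return " ".join(re.sub(r"[^a-z0-9]+", " ", name.lower()).split())

def detect_duplicates(rows, threshold=0.85):
    """Duplicate detection: greedily cluster rows whose normalised names are similar."""
    clusters = []
    for row in rows:
        key = normalise(row["name"])
        for cluster in clusters:
            if similarity(key, normalise(cluster[0]["name"])) >= threshold:
                cluster.append(row)
                break
        else:  # no similar cluster found: start a new one
            clusters.append([row])
    return clusters

def fuse(cluster):
    """Data fusion: keep the first name and, as an assumed rule, the lowest price."""
    return {"name": cluster[0]["name"], "price": min(r["price"] for r in cluster)}

TARGET = ["name", "price"]
site_a = [{"product_name": "Acme Widget 2000", "price_gbp": 12.50},
          {"product_name": "Gizmo Pro", "price_gbp": 30.00}]
site_b = [{"Name": "ACME widget-2000", "Price": 11.99},
          {"Name": "Doohickey", "Price": 5.00}]

rows = (apply_mapping(site_a, generate_mapping(site_a[0].keys(), TARGET))
        + apply_mapping(site_b, generate_mapping(site_b[0].keys(), TARGET)))
fused = [fuse(c) for c in detect_duplicates(rows)]
for record in fused:
    print(record)
```

Each hand-picked threshold and rule above is the kind of expert input the slide refers to; change the sources and they may all need revisiting.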

Page 15

Challenges for Data Wrangling

• What challenges need to be addressed to bring about cost-effective data wrangling that takes account of the 4 V’s? Here are some:
  • The need to automate as many steps within a data wrangling process as possible (source selection, data transformation, data deduplication, …).
  • The need to make well-informed compromises … because perfect solutions will typically be out of reach.
  • The need to adopt an incremental, pay-as-you-go approach … so that automatically generated solutions can be refined in the light of feedback.
  • The need to extend the boundaries … to include sources in the wild, relating an organisation’s data to that of others.
  • The need to use all the available data … so that the rich data environment is an asset during automation.
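The incremental, pay-as-you-go idea can be illustrated with a small Python sketch: match decisions for candidate pairs come from a similarity threshold, and each piece of user feedback both overrides the decision for its own pair and re-tunes the threshold, which can change decisions for pairs nobody has reviewed. This is one simple way to realise the idea, assumed for illustration and not the approach described in the talk; the pair names, scores, the default cut-off of 0.8 and the cut-off search are all hypothetical.

```python
# A candidate pair of records, identified here by hypothetical string ids.
Pair = tuple[str, str]

def tune_threshold(scores: dict[Pair, float], feedback: dict[Pair, bool],
                   default: float = 0.8) -> float:
    """Pick the cut-off (match iff score >= cut-off) that best agrees with feedback."""
    labelled = [(scores[p], same) for p, same in feedback.items()]
    if not labelled:
        return default  # no feedback yet: keep the automatic default
    values = sorted({s for s, _ in labelled})
    # Candidate cut-offs: just below, between, and just above the labelled scores.
    candidates = ([values[0] - 0.01]
                  + [(a + b) / 2 for a, b in zip(values, values[1:])]
                  + [values[-1] + 0.01])

    def agreement(t: float) -> int:
        return sum((s >= t) == same for s, same in labelled)

    best = max(agreement(t) for t in candidates)
    # Among equally good cut-offs, move as little as possible from the default.
    return min((t for t in candidates if agreement(t) == best),
               key=lambda t: abs(t - default))

def decide(scores: dict[Pair, float], feedback: dict[Pair, bool],
           default: float = 0.8) -> dict[Pair, bool]:
    """Match decisions from the tuned threshold; explicit feedback always wins."""
    threshold = tune_threshold(scores, feedback, default)
    return {p: feedback.get(p, s >= threshold) for p, s in scores.items()}

scores = {("A", "A'"): 0.95, ("B", "B'"): 0.75, ("F", "F'"): 0.77, ("D", "E"): 0.40}
print(decide(scores, {}))                   # purely automatic decisions
print(decide(scores, {("B", "B'"): True}))  # after one piece of user feedback
```

With no feedback, the default cut-off of 0.8 rejects both B and F; confirming B as a match lowers the cut-off to just below 0.75, so the unreviewed pair F is accepted as well. A real system would need more care, for example not letting a single label move the threshold this far.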

Page 16

Conclusions

• Data wrangling is a problem and an opportunity:
  • A problem because the 4 V’s of big data may all be present together much of the time, undermining manual approaches.
  • An opportunity because if we can make data wrangling much more cost-effective, all sorts of hitherto impractical tasks come into reach.

Additional information: Furche, T., Gottlob, G., Libkin, L., Orsi, G., Paton, N.W., Data Wrangling for Big Data: Challenges and Opportunities. Proc. 19th International Conference on Extending Database Technology (EDBT), 473-478, 2016.

vada.org.uk

Page 17

Acknowledgements

This work is funded by the UK Engineering and Physical Sciences Research Council, through the VADA Programme Grant on Value Added Data Systems: Principles and Architecture, to:

Georg Gottlob, Thomas Lukasiewicz, Dan Olteanu, Giorgio Orsi, Tim Furche

Norman Paton, Alvaro Fernandes, John Keane

Leonid Libkin, Wenfei Fan, Peter Buneman, Sebastian Maneth

In cooperation with: