Big data analytics
Arnd SCHIRRMANN
Airbus Group Innovations
Airbus Group in Figures
• European multi-national aerospace and defense corporation
• 64 B€ revenue (2015)
• 3.3 B€ R&D (2015)
• 136k employees
• Airbus (commercial aircraft): >8000 aircraft in service, >1000 aircraft ordered / year, 629 aircraft delivered in 2014
Big Data – Important Element in the Digitalization Strategy
Technology Solutions
Business Opportunities
J.Botti, Airbus Group CTO (2015)
SHM / IVHM
Maintenance
Cyber Security
Intelligence
Fleet management
Ground operations
Flight Test Analysis
Production / DSS
Quality management
Collaboration
Big Data Example – Aircraft Operations
Big Data in the Airbus Group Strategy
Big Data Analytics
Data Wrangling: The Elephant in the Room for Big Data
Norman Paton
University of Manchester
Data Wrangling
• Definitions:
  • a process of iterative data exploration and transformation that enables analysis [1].
  • the process of manually converting or mapping data from one "raw" form into another format that allows for more convenient consumption of the data with the help of semi-automated tools [2].
[1] S. Kandel, et al., Research Directions in Data Wrangling: Visualizations and Transformations for Usable and Credible Data, Information Visualization, 10(4), 271-288, 2011.
[2] http://en.wikipedia.org/wiki/Data_wrangling, 12th May 2015.
Extract, Transform and Load - 1
• Of course, this is not completely new, and Extract, Transform and Load (ETL) tools have been around for a significant time.
• ETL tools support source wrapping, warehouse population, workflow languages, etc.
• ETL vendors also have “big data” offerings.
www.talend.com
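A minimal sketch of the ETL pattern described above, using only the Python standard library. The sample data, table name and function names are illustrative assumptions, not taken from any ETL product; a real tool would add connectors, scheduling and error reporting.

```python
import csv
import io
import sqlite3

# Hypothetical raw source: a CSV export with inconsistent formatting.
RAW_CSV = """product,price,currency
Widget , 9.99,eur
gadget,19.50,EUR
Widget,n/a,EUR
"""

def extract(text):
    """Source wrapping: expose the raw CSV as a list of dict records."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Normalise names and currencies; drop rows with unparseable prices."""
    cleaned = []
    for row in rows:
        try:
            price = float(row["price"])
        except ValueError:
            continue  # e.g. "n/a": skip the row rather than guess a value
        cleaned.append(
            (row["product"].strip().lower(), price, row["currency"].strip().upper())
        )
    return cleaned

def load(rows, conn):
    """Warehouse population: write the cleaned rows into a table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS prices (product TEXT, price REAL, currency TEXT)"
    )
    conn.executemany("INSERT INTO prices VALUES (?, ?, ?)", rows)
    conn.commit()

def run_etl(text, conn):
    """A fixed three-step workflow: extract, then transform, then load."""
    load(transform(extract(text)), conn)

conn = sqlite3.connect(":memory:")
run_etl(RAW_CSV, conn)
result = conn.execute(
    "SELECT product, price, currency FROM prices ORDER BY product"
).fetchall()
print(result)
```

Every cleaning rule here is hand-written, which illustrates why ETL of this kind tends to involve significant manual effort.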
Extract, Transform and Load - 2
• ETL tools are clearly useful, with products from database vendors and data integration companies.
• ETL tools emerged to support data warehousing, and thus typically have roots in enterprise settings.
• ETL tools typically involve significant manual effort.
• ETL costs no doubt vary widely from project to project, but are quoted as representing up to 80% of the development time in warehousing projects [1].
• In addition to problem-specific data warehouses, ETL techniques can be applied to create more generic data lakes.
[1] S. Kandel, et al., Research Directions in Data Wrangling: Visualizations and Transformations for Usable and Credible Data, Information Visualization, 10(4), 271-288, 2011.
Big Data – does it make a difference?
• Big data is sometimes characterised by the 4 V’s:
  • Volume – the scale of the data,
  • Velocity – speed of change,
  • Variety – different forms of data, and
  • Veracity – uncertainty of data.
• So size matters, but it isn't everything.
• Data wrangling for big data must address all four V’s at the same time.
• Classical, substantially manual ETL may struggle with numerous and rapidly changing sources.
Case Study: e-Commerce
• If you run an e-Commerce site, then you need to be able to understand pricing trends among your competitors.
• Example: “Product prices are constantly monitored by active market research…, allowing us to bring the best deals to you, the customer”.
• This may involve getting to grips with:
  • Volume: thousands of sites;
  • Velocity: sites, site descriptions and contents changing;
  • Variety: in format, content, user community, …; and
  • Veracity: unavailability, inconsistent descriptions, …
• Manual attempts at data wrangling are likely to be expensive, partial, unreliable, poorly targeted, …
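As an illustration of the veracity issues in this case study, the sketch below flags unavailable sites and implausible prices for a single product. The site names, the prices and the 50% tolerance are all made-up assumptions; it is one simple heuristic, not a description of how any retailer actually monitors competitors.

```python
from statistics import median

# Hypothetical observations for one product: (site, price).
# None marks a site that could not be reached.
OBSERVATIONS = [
    ("shop-a.example", 299.0),
    ("shop-b.example", 305.0),
    ("shop-c.example", 29.9),   # possibly a misplaced decimal point
    ("shop-d.example", None),   # site unavailable
]

def flag_observations(observations, tolerance=0.5):
    """Label each site 'ok', 'unavailable' or 'inconsistent'.

    A price is 'inconsistent' if it deviates from the median of the
    available prices by more than `tolerance` (a fraction of the median).
    Assumes at least one site returned a price.
    """
    prices = [price for _, price in observations if price is not None]
    mid = median(prices)
    flags = {}
    for site, price in observations:
        if price is None:
            flags[site] = "unavailable"
        elif abs(price - mid) > tolerance * mid:
            flags[site] = "inconsistent"
        else:
            flags[site] = "ok"
    return flags

flags = flag_observations(OBSERVATIONS)
for site, flag in flags.items():
    print(site, flag)
```

With thousands of changing sites, a check like this has to run automatically; doing it by hand is what makes manual wrangling expensive and partial.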
Data Wrangling Research and Development
• So data wrangling in its wider sense is a research challenge, currently without a community or established priorities.
• The need seems to be great:
  • data scientists spend from 50 percent to 80 percent of their time collecting and preparing unruly digital data. [1]
• The user community may consist principally of small organisations:
  • the overwhelming majority of information economy businesses – 95% of the 120,000 enterprises in the sector – employ fewer than 10 people. [2]
[1] S. Lohr. For big-data scientists, ‘janitor work’ is key hurdle to insights. http://nyti.ms/1Aqif2X, 2014.
[2] UK Department for Business, Innovation & Skills. Information economy strategy. http://bit.ly/1W4TPGU, 2013.
Example Data Wrangling Steps
• Source Selection
• Data Extraction
• Matching
• Mapping Generation
• Duplicate Detection
• Data Fusion
There is no fixed data wrangling lifecycle.
All of these steps have tended to require expert input.
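To make four of the steps on this slide concrete (matching, mapping generation, duplicate detection and data fusion), here is a toy pipeline over two made-up sources. The records, the synonym table, the 0.9 similarity threshold and the median-based fusion rule are all illustrative assumptions; each stands in for a component that in practice has tended to need expert input.

```python
from difflib import SequenceMatcher
from statistics import median

# Hypothetical records extracted from two sources with different schemas.
SOURCE_A = [{"name": "Acme Phone X", "price": 299.0}]
SOURCE_B = [
    {"title": "ACME phone x", "cost": 305.0},
    {"title": "Bolt Laptop 15", "cost": 899.0},
]

# Matching: a hand-written synonym table stands in for a real schema matcher.
SYNONYMS = {"name": {"name", "title"}, "price": {"price", "cost"}}

def match_schema(record):
    """Return {source_attribute: target_attribute} for one source."""
    return {
        attr: target
        for attr in record
        for target, names in SYNONYMS.items()
        if attr in names
    }

def apply_mapping(records):
    """Mapping generation: rewrite a source's records into the target schema."""
    if not records:
        return []
    matches = match_schema(records[0])
    return [{matches[k]: v for k, v in r.items() if k in matches} for r in records]

def similarity(a, b):
    """Case-insensitive string similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def detect_duplicates(records, threshold=0.9):
    """Duplicate detection: group records whose names are near-identical."""
    clusters = []
    for rec in records:
        for cluster in clusters:
            if similarity(rec["name"], cluster[0]["name"]) >= threshold:
                cluster.append(rec)
                break
        else:
            clusters.append([rec])
    return clusters

def fuse(cluster):
    """Data fusion: one record per cluster, taking the median price."""
    return {
        "name": cluster[0]["name"],
        "price": median(r["price"] for r in cluster),
    }

unified = apply_mapping(SOURCE_A) + apply_mapping(SOURCE_B)
fused = [fuse(c) for c in detect_duplicates(unified)]
print(fused)
```

The steps are composed in one fixed order here only for brevity; as the slide notes, there is no fixed data wrangling lifecycle.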
Challenges for Data Wrangling
• What challenges need to be addressed to bring about cost-effective data wrangling that takes account of the 4 V’s? Here are some:
  • The need to automate as many steps within a data wrangling process as possible (source selection, data transformation, data deduplication, …).
  • The need to make well informed compromises … because perfect solutions will typically be out of reach.
  • The need to adopt an incremental, pay-as-you-go approach … so that automatically generated solutions can be refined in the light of feedback.
  • The need to extend the boundaries … to include sources in the wild, relating an organisation’s data to that of others.
  • The need to use all the available data … so that the rich data environment is an asset during automation.
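One way to picture the pay-as-you-go idea is a duplicate detector that starts from an automatic guess and is refined as users confirm or reject its decisions. The class below is a hypothetical sketch under that assumption: the initial threshold, the 0.05 threshold grid and the "agree with the most feedback" rule are choices made for this example, not a method from the talk.

```python
from difflib import SequenceMatcher

def similarity(a, b):
    """Case-insensitive string similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

class PayAsYouGoMatcher:
    """Duplicate detector whose threshold is refined as feedback arrives."""

    # Candidate thresholds: 0.00, 0.05, ..., 1.00.
    GRID = [i / 20 for i in range(21)]

    def __init__(self, threshold=0.5):
        self.threshold = threshold  # initial, automatically chosen guess
        self.feedback = []          # (similarity score, is_duplicate) pairs

    def is_duplicate(self, a, b):
        return similarity(a, b) >= self.threshold

    def add_feedback(self, a, b, is_duplicate):
        """Record one user judgement and re-fit the threshold."""
        self.feedback.append((similarity(a, b), is_duplicate))
        # Keep the lowest grid value that agrees with the most judgements.
        self.threshold = max(
            self.GRID,
            key=lambda t: sum((s >= t) == label for s, label in self.feedback),
        )

matcher = PayAsYouGoMatcher()
before = matcher.is_duplicate("Acme Phone X", "Acme Phone X case")
matcher.add_feedback("Acme Phone X", "Acme Phone X case", False)
after = matcher.is_duplicate("Acme Phone X", "Acme Phone X case")
print(before, after, matcher.threshold)
```

The detector is usable before any feedback exists, and each judgement only costs the user one yes/no answer, which is the sense in which the effort is paid incrementally.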
Conclusions
• Data wrangling is a problem and an opportunity:
  • A problem because the 4 V’s of big data may all be present together a lot of the time, undermining manual approaches.
  • An opportunity because if we can make data wrangling much more cost effective, all sorts of hitherto impractical tasks come into reach.
Additional information:
Furche, T., Gottlob, G., Libkin, L., Orsi, G., Paton, N.W., Data Wrangling for Big Data: Challenges and Opportunities. Proc. 19th International Conference on Extending Database Technology (EDBT), 473-478, 2016.
vada.org.uk
Acknowledgements
This work is funded by the UK Engineering and Physical Sciences Research Council, through the VADA Programme Grant on Value Added Data Systems: Principles and Architecture, to:
Georg Gottlob, Thomas Lukasiewicz, Dan Olteanu, Giorgio Orsi, Tim Furche
Norman Paton, Alvaro Fernandes, John Keane
Leonid Libkin, Wenfei Fan, Peter Buneman, Sebastian Maneth
In cooperation with: