The Value of Digital Technologies: Big Data
Sofia, 23 March 2018
Severino Meregalli, Scientific Coordinator – DEVO Lab
SDA Bocconi
THE BUSINESS CONTEXT: WHY DATA EXPLOITATION IS SO IMPORTANT
• Dynamism and complexity as structural elements
• Fuzzy business scenarios
• Complexity management and profit linkage
• The fall of management as a science and of prescriptive management
• The fall of the “legendary” long-term strategic planning as an antidote to complexity
• The “evergreen” gap between Business Requirements and Information Systems
• Desperate search for sources of insight and knowledge
THE (BIG) DATA LANDSCAPE
• Generating value from data and analytics is one of the pillars of competitive advantage
• Decision-making in complex and dynamic organizations calls for a full exploitation of data resources
• Progressive digitalization of businesses vs the skills needed to take advantage of large and complex datasets
• Big Data, Data Discovery and Analytics have suffered all the negative effects of hype and of the rise of improvised players
• Wide range of high performing technologies and players
• Cost/benefit leverage calls for a deep understanding of the real opportunities and hurdles in Data exploitation
DATA EXPLOSION VS ABILITY TO EXECUTE
• There will be a shortage of talent necessary for organizations to take advantage of big data. By 2018, the United States alone could face a shortage of 140,000 to 190,000 people with deep analytical skills as well as 1.5 million managers and analysts with the know-how to use the analysis of big data to make effective decisions.
• Organizations need not only to put the right talent and technology in place but also structure workflows and incentives to optimize the use of big data.
McKinsey Global Institute 2011
THE MEDIA HYPE
A CROWDED MARKET
HYPE… AS USUAL
Gartner Hype Cycle for Emerging Technologies, 2014
THE CALL FOR A MANAGERIAL APPROACH (DEVO LAB SDA BOCCONI)
Value
Shortcut
THE ISSUE
After the first wave of technology adoption for managing and analyzing large datasets, both the academic and the practitioner communities acknowledged the risks of (another) «hype-driven» approach
THREE KEY TOPICS
• Physical vs Social
• Data quality
• Context
PHYSICAL VS. SOCIAL PHENOMENA
• It is relatively easier to get significant results when the analysis focuses on deterministic phenomena (Natural Sciences) than on Social Sciences
• In Natural Sciences it is possible to explain or understand a phenomenon by observing a singularity (e.g. a star with an odd orbit); the same does not apply to Social Sciences (e.g. trendsetter vs crazy behavior)
• Predictive analysis, as well as the mere understanding of phenomena affected by social variables, is still characterized by issues that are difficult to address, even when companies have large amounts of data and computing power
• The paradox is that in the digital world it is sometimes easier to influence behaviors than to understand them
• The short-term economic value is proportional to the difficulty of the task: higher in Social Sciences, lower in Natural Sciences
DATA QUALITY MANAGEMENT
• The stratification of large amounts of data, with different formats and different scopes, underlines the old but evergreen concept of “garbage in, garbage out”
• Big Data tools and technologies have not yet solved this problem and, in some cases, it has been amplified by the presence of data from sources that are out of control (e.g. social networks)
• A “data quality” attitude is a precondition for starting a virtuous cycle of data value exploitation
• Technology is here to help, but we still have issues:
– uniqueness (single source of truth)
– accountability for data quality (not IT)
– consistency of goals between those who produce and those who analyse the data
– availability of consistent and shared information about the data (metadata)
– legal issues
UNDERSTANDING DOMAIN, CONTEXT AND DECODING RESULTS
• The breadth and variety of datasets allow analysts to find numerous correlations between variables that cannot be found in small datasets
• The issue is to understand which correlations are meaningful, since the more variables there are, the more correlations can show significance
• Context is hard to interpret at scale and even harder to maintain when data are reduced to fit into a model. Obtaining and managing context data will be a challenge.
“The more variables, the more correlations that can show significance. Falsity also grows faster than information; it is nonlinear (convex) with respect to data.”
N. Taleb, Professor of Risk Engineering at New York University’s Polytechnic Institute
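The effect described in the quote can be illustrated with a small simulation. The sketch below is not part of the presentation: it generates purely random, independent variables and counts how many pairs exceed a correlation threshold by chance alone. The function names, the 50 observations and the 0.28 threshold (roughly the 5% two-sided critical value of Pearson's r for 50 observations) are assumptions chosen for illustration.

```python
import random
import statistics


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def count_spurious(n_vars, n_obs=50, threshold=0.28, seed=0):
    """Count variable pairs whose |r| exceeds the threshold.

    All variables are independent Gaussian noise, so every hit
    is a correlation that "shows significance" purely by chance.
    Returns (hits, number_of_pairs_tested).
    """
    rng = random.Random(seed)
    cols = [[rng.gauss(0, 1) for _ in range(n_obs)] for _ in range(n_vars)]
    hits = 0
    for i in range(n_vars):
        for j in range(i + 1, n_vars):
            if abs(pearson(cols[i], cols[j])) > threshold:
                hits += 1
    pairs = n_vars * (n_vars - 1) // 2
    return hits, pairs


if __name__ == "__main__":
    for n in (5, 10, 20, 40):
        hits, pairs = count_spurious(n)
        print(f"{n:>3} variables: {pairs:>4} pairs tested, {hits} above threshold")
```

The number of pairs grows as n(n-1)/2, so the count of chance correlations grows roughly quadratically with the number of variables, even though no variable is related to any other.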
THE MORE VARIABLES, THE MORE CORRELATIONS THAT CAN SHOW SIGNIFICANCE…
• Contextual data are scarce and very often not available or not consistent with the needs
• Each application domain requires the involvement of experts who know it from the inside. A statistical “brute force” approach does not work well in Social Sciences
• The issue is to find the sweet spot between “obvious” and “false” findings
UNDERSTANDING DOMAIN, CONTEXT AND DECODING RESULTS
• Differentiate between physical and social phenomena
• Measure the "quality" of available data
– Accuracy
– Reliability
– Completeness
– Consistency
– Timeliness
• Consider the availability of domain experts / knowledge when dealing with social phenomena
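One way to start measuring the quality dimensions listed above is to compute simple ratios over the available records. The following is a minimal sketch covering only completeness, consistency and timeliness; the record layout, field names, rule and thresholds are hypothetical, and accuracy and reliability are left out because they usually require an external reference to compare against.

```python
from datetime import date


def completeness(records, fields):
    """Share of (record, field) cells that are neither None nor empty."""
    total = len(records) * len(fields)
    if total == 0:
        return 0.0
    filled = sum(
        1 for r in records for f in fields if r.get(f) not in (None, "")
    )
    return filled / total


def consistency(records, rule):
    """Share of records that satisfy a business rule (a predicate)."""
    if not records:
        return 0.0
    return sum(1 for r in records if rule(r)) / len(records)


def timeliness(records, date_field, today, max_age_days):
    """Share of records updated within the last max_age_days."""
    if not records:
        return 0.0
    fresh = sum(
        1
        for r in records
        if r.get(date_field) is not None
        and (today - r[date_field]).days <= max_age_days
    )
    return fresh / len(records)


# Hypothetical sample data, for illustration only
records = [
    {"customer_id": 1, "segment": "residential", "kwh": 310.0, "updated": date(2018, 3, 1)},
    {"customer_id": 2, "segment": "", "kwh": 120.0, "updated": date(2018, 1, 15)},
    {"customer_id": 3, "segment": "commercial", "kwh": -5.0, "updated": None},
]
fields = ["customer_id", "segment", "kwh", "updated"]

if __name__ == "__main__":
    print("completeness:", completeness(records, fields))
    print("consistency :", consistency(records, lambda r: r["kwh"] >= 0))
    print("timeliness  :", timeliness(records, "updated", date(2018, 3, 23), 30))
```

Keeping each dimension as a separate ratio makes it possible to track them independently over time instead of collapsing them into a single score.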
THREE PILLARS FOR DATA VALUE
THE ANALYSIS MODEL
[Figure: chart with axes Data Quality (Low to High) and Level of Determinism (Low to High), showing Value for Social phenomena and Physical phenomena]
POSSIBLE PATHS
[Figure: possible paths on the chart with axes Data Quality (Low to High) and Level of Determinism (Low to High), showing Value]
VOLVO CAR CORPORATION CASE HISTORY
The Company
• Global leader in the automotive industry
• Acquired by Geely Auto Group in 2010
• Focus on quality and safety: Our vision is to design cars that should not crash. In the shorter perspective the aim is that by 2020 no-one should be killed or injured in a Volvo car.
Scope of Work
• Improving the quality of data collected from dealers, engineering, production and diagnostic systems (DRO)
• Building a unified repository of integrated data
Achievements
• Problem identification and prioritization of maintenance activities
• Solving quality problems during the production processes
• More accurate management of warranty programs
• Predictive analysis of potential failures
The Needs
• Analyze the mechanical performance of the vehicles in real driving conditions in order to improve design, production and after-sales service (warranty) processes
VOLVO CAR CORPORATION CASE HISTORY
[Figure: Volvo case positioned on the chart with axes Data Quality (Low to High) and Level of Determinism (Low to High); Value legend: Full, Partial, Null]
SCE SMART CONNECT CASE HISTORY
The Company
• Southern California Edison is the largest subsidiary of Edison International
• For over a century, the company has provided electricity to about 14 million customers in Southern California (Central, Coastal & Southern California)
Scope of Work
• Acquisition of data from smart meters (720 readings per month per customer, about 5.6 billion readings per month in total)
• Integration of smart meter data with expense and demographic information
Achievements
• Improvement in production and distribution flow management
• Peak usage prediction
The Needs
• Provide customers with weekly reports on energy consumption, so that they can keep their expenses under control
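The smart meter volumes quoted in the Scope of Work can be sanity-checked with simple arithmetic. The sketch below assumes one reading per hour over a 30-day month, which is consistent with 720 readings per month but is not stated on the slide. Dividing the two quoted figures gives about 7.8 million metering points, an inference rather than a number from the presentation; the slide does not say how this relates to the 14 million customers figure.

```python
# Figures quoted on the slide
READINGS_PER_CUSTOMER_PER_MONTH = 720
TOTAL_READINGS_PER_MONTH = 5.6e9  # "about 5.6 billion"

# Assumption (not stated on the slide): hourly readings, 30-day month
READINGS_PER_DAY = 24
DAYS_PER_MONTH = 30


def readings_per_month(per_day=READINGS_PER_DAY, days=DAYS_PER_MONTH):
    """Readings produced by one meter in a month at the assumed cadence."""
    return per_day * days


def implied_metering_points(total, per_customer):
    """Number of meters implied by the total and per-customer volumes."""
    return total / per_customer


if __name__ == "__main__":
    print("readings per meter per month:", readings_per_month())
    meters = implied_metering_points(
        TOTAL_READINGS_PER_MONTH, READINGS_PER_CUSTOMER_PER_MONTH
    )
    print(f"implied metering points: {meters:,.0f}")
```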
SOUTHERN CALIFORNIA EDISON CASE HISTORY
[Figure: Southern California Edison case positioned on the chart with axes Data Quality and Level of Determinism (Low to High); Value: Full]
GDF SUEZ CASE HISTORY
The Company
• French group, one of the main utilities worldwide (turnover of about 70 billion €)
• Founded in 2008 after the merger of Suez and GDF
• Core business: production and distribution of electricity, natural gas and renewable sources
Scope of Work
• Customer size wasn’t addressed consistently (administrative vs. commercial data)
• Improvement in: Data Quality, CRM & Billing integration, Marketing Campaigns
• Incremental understanding of customer-related phenomena
Achievements
• Value-based customer segmentation
• Prevention of churn due to customer relocation
• Acquisition of «gas-only» customers (electricity)
The Needs
• After the liberalization of the energy market in France, the B2C (CH&P) Business Unit wanted to pursue the opportunity to grow in the electricity market by leveraging its gas market share
• Understand customer segmentation, where to focus sales and marketing initiatives, and how
GDF SUEZ CASE HISTORY
[Figure: GDF Suez case positioned on the chart with axes Data Quality (Low to High) and Level of Determinism (Low to High), showing Value]
LESSONS LEARNED FROM CASE HISTORIES
• The analysis of the case studies highlights how a mature approach to Data Value relies on two main dimensions:
– data quality
– the ability to interpret and understand phenomena
• The analysis of the case histories made it possible to identify a first set of Data Value components
DATA VALUE LAYERS: INTRINSIC VALUE
[Figure: layers shown for Physical and Social phenomena: Data Model; Data Volume – Cross Section; Data Volume – Stock; Data Quality; Quantitative tools; Cognitive tools; plus Domain Expertise]
DATA VALUE LAYERS: POTENTIAL VALUE
[Figure: layers grouped under Data, Tools and Expertise, shown for Physical and Social phenomena: Data Model; Data Volume – Cross Section; Data Volume – Stock; Data Quality; Quantitative tools; Cognitive tools; Context - Data; Context - Models; Domain Expertise]
NEW FRONTIERS, NEW OPPORTUNITIES…NEW ISSUES
• Edge Computing vs Edge Organizations
• Data ownership and side effects control
• Data storage technologies evolution is much slower than data growth
• Big Data, Machine Learning and Quantum Computing: the perfect storm?
• Research, consulting, teaching, software industry, working again without hackathons (or with real rewards and better ethics)
SUMMARY AND RECOMMENDATIONS
• There is no “big data”. There are only data that are manageable or unmanageable with state-of-the-art technologies
• The real challenge is getting «Big Info» and making better decisions
• Natural and Social domains are different
• Data quality is still the precondition for any project
• Context understanding and contextual data are (in social applications) very often the real bottleneck
• Use a checklist to assess data value components before starting a project
• Only consider vendors that are able to provide fully integrated solutions to data issues (no room for improvised players)
• Failing to capitalize on datasets in Natural Sciences domains is a big mistake; transforming datasets into value in Social Sciences is (still) a big challenge
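The checklist recommended above could be encoded as a simple pre-project gate. The sketch below is only one possible reading of the recommendations: the class name, field names, issue messages and the 0.8 quality threshold are hypothetical and do not come from the presentation.

```python
from dataclasses import dataclass


@dataclass
class DataValueChecklist:
    phenomenon: str                 # "physical" or "social"
    data_quality: float             # 0.0 (poor) to 1.0 (excellent), however measured
    domain_experts_available: bool
    context_data_available: bool


def open_issues(checklist, min_quality=0.8):
    """Return the list of issues to resolve before starting a project."""
    issues = []
    # Data quality is treated as a precondition for any project.
    if checklist.data_quality < min_quality:
        issues.append("data quality below threshold")
    # Domain expertise and context matter most for social phenomena.
    if checklist.phenomenon == "social":
        if not checklist.domain_experts_available:
            issues.append("no domain experts for a social phenomenon")
        if not checklist.context_data_available:
            issues.append("contextual data missing")
    return issues


if __name__ == "__main__":
    project = DataValueChecklist("social", 0.6, False, True)
    for issue in open_issues(project):
        print("-", issue)
```

An empty list means none of the encoded checks failed; it does not replace the judgment of the people who know the domain.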
REFERENCES
• Davenport T.H., Big Data at Work: Dispelling the Myths, Uncovering the Opportunities, Harvard Business Review Press, 2013
• Davenport T.H., Data Scientist: The Sexiest Job of the 21st Century, Harvard Business Review, October 2012, http://hbr.org/2012/10/data-scientist-the-sexiest-job-of-the-21st-century/
• Gartner, Hype Cycle for Emerging Technologies, 2014
• McKinsey Global Institute, Big data: The next frontier for innovation, competition, and productivity, 2011
• Redman T., Data’s Credibility Problem, Harvard Business Review, December 2013, http://hbr.org/2013/12/datas-credibility-problem/ar/1
• Ross J.W., Beath C.M., Quaadgras A., You may not need Big Data after all, Harvard Business Review, December 2013, http://hbr.org/2013/12/you-may-not-need-big-data-after-all/ar/1
• Taleb N. N., Beware the Big Errors of ‘Big Data’, Wired, 2013, www.wired.com/2013/02/big-data-means-big-errors-people/