Capturing Value from Big Data through
Data-Driven Business Models
Patterns from the Start-up world
Philipp Hartman,
Dr Mohamed Zaki and Prof Duncan McFarlane
Cambridge Service Alliance
University of Cambridge
“Data is the new oil”1
1 various authors, e.g. Clive Humby
0
5000
10000
15000
20000
25000
30000
35000
40000
45000
2005 2010 2015 2020
Data volume per year (Exabytes)2
2 IDC's Digital Universe Study, December 2012
Two general areas can be identified where
big data creates value
3
How to get value from Big Data?
Optimization of existing business1
New Business Models1
Chesbrough, Rosenbloom (2002): Business model to capture value from an innovation Crisculo (2012): New technologies often first commercialized through start-up companies
1 cf. McKinsey 2011, IBM 2012, Davenport 2006, AT Kearney 2013, EMC 2012
Based on this motivation the research
question was developed
4
What types of business models that rely on data as a key resource (i.e. data-driven business models) can be found in start up companies?
How to analyse data-driven business
models?
Sub
q
ues
tio
ns
Data-driven business model framework
How to identify patterns?
Res
earc
h
Qu
est
ion
Clustering
The research was done in five steps
5
Case studies Finding Patterns
Data collection & coding
Build the framework
Literature Review
How to analyse data-driven business
models?
How to identify patterns?
The first step was a literature review with
three different topics
6
Lite
ratu
re R
evie
w
Big Data
Definition
Value Creation
Business Model
Definition
Business Model Frameworks
Related Work
Data driven business Models
Cloud business models
Case studies Finding Patterns
Data collection & coding
Build the framework
Literature Review
The first step was a literature review with
three different topics
7
Lite
ratu
re R
evie
w
Big Data
Definition
Value Creation
Business Model
Definition
Business Model Frameworks
Related Work
Data driven business Models
Cloud business models
Case studies Finding Patterns
Data collection & coding
Build the framework
Literature Review
Literature review: Business Model
8
Lite
ratu
re R
evie
w
Big Data
Definition
Value Creation
Business Model
Definition
Business Model Frameworks
Related Work
Data driven business Models
Cloud business models
Case studies Finding Patterns
Data collection & coding
Build the framework
Literature Review
Business model key components were
synthesized from existing frameworks
Existing Business Model Frameworks
- Chesbrough & Rosenbloom 2002 - Hedman & Kaling 2003 - Osterwalder 2004 - Morris 2005 - Johnson, Christensen et. al. 2008 - Al-Debei 2010 - Burkhart 2012
Val
ue
Cap
turi
ng
Val
ue
Cre
atio
n
Key Resources
Key Activities
Cost structure
Revenue Model
Customer Segment
Value Proposition
Business Model Definition Business Model Key Components
- No universally accepted definition of the concept (Weill, Malone et al. 2011)
- Most definitions refer to value creation & value capturing
Only a few papers are available in this field
10
Lite
ratu
re R
evie
w
Big Data
Definition
Value Creation
Business Model
Definition
Business Model Frameworks
Related Work
Data driven business Models
Cloud business models
Case studies Finding Patterns
Data collection & coding
Build the framework
Literature Review
The review was extend to cloud business
models
11
Lite
ratu
re R
evie
w
Big Data
Definition
Value Creation
Business Model
Definition
Business Model Frameworks
Related Work
Data driven business Models
Cloud business models
Case studies Finding Patterns
Data collection & coding
Build the framework
Literature Review
The literature review identified several gaps
12
• Little academic research on big data and value creation – mostly whitepapers
• Gap in literature: data-driven business models
• Otto, Aier (2013) interesting paper but limited to specific industry > no generalization possible
• Similar research for cloud business models (cf. Labes, Erek et. Al. 2013)
Case studies Finding Patterns
Data collection & coding
Build the framework
Literature Review
The framework was build from literature
starting from the key components
Data-Driven-Business Model
Data Sources
Internal existing data
Self-generated
Data
External
Acquired Data
Customer provided
Free available
Open Data
Social Media data Web
Crawled Data
Key Activity
Data Generation
Crowdsourcing
Tracking & Other Data
Acquisition
Processing
Aggregation
Analytics
descriptive
predictive
prescriptive Visualization
Distribution
Offering
Data
Information/Knowledge Non-Data
Product/Service
Target Customer
B2B
B2C
Revenue Model
Asset Sale
Lending/Renting/Leasing
Licensing
Usage fee
Subscription fee
Advertising
Specific cost advantage
Data-Driven Business Model
Data Sources
Key Activity
Offering
Target Customer
Revenue Model
Specific cost advantage
Data collection & coding
Case studies Finding Patterns
Literature Review Build the
framework
Features for each dimension
Data-Driven Business Model Framework
Business Model Key Components (Dimensions)
Data Sources Features for data sources
Synthesizing the different sources leads to
the taxonomy
14
Data Sources
Internal
existing data
Self-generated Data
External
Acquired Data
Customer provided
Free available
Open Data
Social Media data
Web Crawled Data
Dimension: Activities
15
Key Activity
Data Generation
Crowdsourcing
Tracking & Other
Data Acquisition
Processing
Aggregation
Analytics
descriptive
predictive
prescriptive Visualization
Distribution
Dimension: Offering
16
Offering
Data
Information/Knowledge
Non-Data Product/Service
Dimension: Revenue Model
17
Revenue Model
Asset Sale
Lending/Renting/Leasing
Licensing
Usage fee
Subscription fee
Advertising
Dimension: Target Customer
18
Target Customer
B2B
B2C
Data collection & coding
The final framework
19
Case studies Finding Patterns
Literature Review Build the
framework
Data-Driven-Business Model
Data Sources
Internal
existing data
Self-generated Data
External
Acquired Data
Customer provided
Free available
Open Data
Social Media data
Web Crawled Data
Key Activity
Data Generation
Crowdsourcing
Tracking & Other
Data Acquisition
Processing
Aggregation
Analytics
descriptive
predictive
prescriptive Visualization
Distribution
Offering
Data
Information/Knowledge
Non-Data Product/Service
Target Customer
B2B
B2C
Revenue Model
Asset Sale
Lending/Renting/Leasing
Licensing
Usage fee
Subscription fee
Advertising
Specific cost advantage
Data collection and coding
20
Case studies Finding Patterns
Build the framework
Literature Review Data collection
& coding
Data collection Data analysis Sampling
The data was generated using public
available sources
21
Tag: “big data” “big data analytics”
1329 companies
Data collection
Company information • Company websites • Press releases
Public sources
• Coding of sources using data driven business model framework
• Nvivo
Data analysis
299 Sources ~3 sources/comp
Sampling
100 Companies
cleaning
Random sample
100 binary feature vectors
Overall Analysis: Data Source
22
0% 10% 20% 30% 40% 50% 60%
Acquired Data
Customer&Partner-provided Data
Free available
Crowd Sourced
Tracked & Other
Note: Sum > 100% as companies might use multiple data sources
• >50% of companies rely on free available data
• >50% of companies use data provided by customers/partners
Overall Analysis: Key Activities
23
0% 10% 20% 30% 40% 50% 60% 70% 80%
Aggregation
Analytics
Descriptive Analytics
Predictive Analytics
Prescriptive Analytics
Data acquistion
Data generation
Data processing
Distribution
Visualization
• >70% of companies use analytics - mostly descriptive
Note: Sum > 100% as some companies rely on multiple revenue models
Overall Analysis: Revenue Model
24
0% 10% 20% 30% 40% 50%
Advertising
Asset Sales
Brokerage Fees
Lending Renting Leasing
Licensing
Subscription fee
Usage Fee
No information
• Majority of revenue models based on subscription and/or usage fee
• No information about the revenue model as many companies are in an early stage
Note: Sum > 100% as some companies rely on multiple revenue models
Overall Analysis: Target Customer
25
70%
17%
13%
B2B B2C both
• There seems to be a noteworthy predominance of B2B business models
• But no reference data found
BM patterns were identified using a
clustering approach
26
Ketchen, David J.; Shook, Christopher L. (1996): The Application of Cluster Analysis in Strategic Managment Reserach: An Analysis and Critique. In: Strat. Mgmt. J. 17 (6). Han, Jiawei; Kamber, Micheline (2011): Data mining. Concepts and techniques. Mooi, Erik; Sarstedt, Marko (2011): Cluster Analysis. In: A Concise Guide to Market Research. S. 237-284. Miligan, Glenn W. (1996): Clustering Validation: Results and Implications for Applied Analyses. In Phipps Arabie, Lawrence J. Hubert, Geert de Soete (Eds.): Clustering and classification. pp. 341–376.
Case studies Data collection
& coding Build the
framework Literature Review
Finding Patterns
2. Clustering method
1. Clustering Variables
3. Number of Clusters
4. Validate & Interpret C.
7 Business Model Cluster were identified
27
Cluster 1 2 3 4 5 6 7
Dat
a So
urc
e
Acquired Data 0 0 1 0 0 0 0
Customer-provided Data 0 1 1 0 0 1 1
Free available 1 0 1 0 1 0 1
CrowdSourced 0 0 0 0 0 0 0
Tracked, Generated & other 0 0 0 1 0 0 0
Key
Act
ivit
y Aggregation 1 0 0 0 0 1 1
Analytics 0 1 1 1 1 0 1
Data acquistion 0 0 1 0 0 0 0
Data generation 0 0 0 1 0 0 1
Number of companies 17 28 5 16 14 6 14
Type A B - C D E F
6 significant Business Model types were
identified
28
Type B: “Analytics-as-a-Service”
Type C: “Data generation & Analytics”
Type D: “Free Data Knowledge Discovery”
Type A: “Free Data Collector & Aggregator”
Type E: “Data Aggregation-as-a-Service”
Type F: “Multi-Source data mashup and analysis”
The 6 BM types are characterised by the key
activities and key data sources
29
Type F
Type A Type D
Type E Type B
Type C
Aggregation Analytics Data generation
Free
a
vaila
ble
C
ust
om
er
pro
vid
ed
Trac
ked
&
gen
erat
ed
Key activity
Key
Dat
a So
urc
e
Type D: “Free Data Knowledge Discovery”
1. DealAngel 2. Gild 3. Insightpool 4. Juristat 5. Market Prophit 6. MixRank 7. Numberfire 8. Olery 9. PeerIndex 10. PolyGraph 11. Review Signal 12. Tellagence 13. traackr 14. Trendspottr
- Free available
- Social Media - Open Data - Web Crawled
B2B B2C
Key Activities
Revenue Model
Key Data Source
- Analytics
Target Customer
0 5 10 15
Descriptive
Predictive
Prescriptive
0 2 4 6 8
Subscription
Usage Fee
Advertising
Brokearge Fees
No Information
Companies
Type D: Examples
31
“Using patent-pending technology, Gild evaluates the work of millions of developers so companies using Gild’s talent acquisition tools know who’s good and can target the right candidates.” • Key Data: Free available websites
(GitHub, Google Codes)
• Key Activities: Analytics
• Revenue Model: Monthly subscription
• Target Customer: B2B
“ Our goal is to provide the most accurate and honest reviews possible by using the data consumers create. We listen to the conversations, analyze them and visualize them for consumers.” • Key Data: Twitter
• Key Activities: Analytics
• Revenue Model: Advertising
• Target Customer: B2B (B2C)
Finding Patterns
The cases studies will be validated the
framework and the clustering
32
Data collection & coding
Build the framework
Literature Review Case studies
4 case studies with companies from the sample such as
Purpose:
1. Validate framework & clusters
2. Illustrate business model types through examples
3. Identify specific challenges
Summary
33
- Gap in literature identified
- RQ: What types of business models that rely on data as a key resource (i.e. data-driven business models) can be found in start up companies?
- 5 (7) different business model patterns identified
- Data-driven business model framework created to enable analysis
- Pattern identification through clustering
- Validation through Case-Studies
Limitations & Outlook
34
Limitations
• Only 100 samples
• Only start up companies • Bias of data source (AngelList)
• Statistical significance of
clustering result
• Only public available sources used
• No statement about success of a particular business model
Outlook/Next Steps
1. Improve validity of findings 1. Increase sample size to test
clusters 2. More Case-studies to
illustrate/validate clusters
2. Include established organizations
3. Develop methodology to judge (financial) performance of different business models
Appendix
35
Literature
36
Al-Debei, Mutaz M.; Avison, David (2010): Developing a unified framework of the business model concept. In Eur J Inf Syst 19 (3), pp. 359–376. Burkhart, Thomas; Wolter, Stephan; Schief, Markus; Krumeich, Julian; Di Valentin, Christina; Werth, Dirk et al. (2012): A comprehensive approach towards the structural description of business models. In : Proceedings of the International Conference on Management of Emergent Digital EcoSystems. New York, NY, USA: ACM (MEDES ’12), pp. 88–102. Chesbrough, H.; Rosenbloom, R. (2002): The role of the business model in capturing value from innovation: evidence from Xerox Corporation's technology spin-off companies. In Industrial and Corporate Change 11 (3), pp. 529–555. Criscuolo, Paola; Nicolaou, Nicos; Salter, Ammon (2012): The elixir (or burden) of youth? Exploring differences in innovation between start-ups and established firms. In Research Policy 41 (2), pp. 319–333.
Literature
37
Davenport, Thomas H.; Harris, Jeanne G. (2007): Competing on analytics. The new science of winning. Boston, Mass: Harvard Business School Press. Everitt, Brian; Landau, Sabine; Leese, Morven (2011): Cluster analysis. 5th ed. London, New York: Arnold; Oxford University Press. Hagen, Christian; Khan, Khalid; Ciobo, Marco; Miller, Jason; Wall, Dan; Evans, Hugo; Yadava, Ajay (2013): Big Data and the Creative Destruction of Today's Business Models. ATKearney. Han, Jiawei; Kamber, Micheline; Pei, Jian (2011): Data Mining. Concepts and Techniques. 3rd ed. Burlington: Elsevier Science. Hedman, Jonas; Kalling, Thomas (2003): The business model concept: theoretical underpinnings and empirical illustrations. In European Journal of Information Systems 12 (1), pp. 49–59.
Literature
38
Johnson, Mark W.; Christensen, Clayton M.; Kagermann, Henning (2008): Reinventing your business model. In Harvard Business Review 86 (12), pp. 57–68. Labes, Stine; Erek, Koray; Zarnekow, Ruediger (2013): Common Patterns of Cloud Business Models. In : AMCIS 2013 Proceedings. Manyika, James; Chui, Michael; Brown, Brad; Bughin, Jacques; Dobbs, Richard; Roxburgh, Charles; Hung Byres, Angela (2011): Big data: The next frontier for innovation, competition, and productivity. McKinsey Global Institute. Morris, Michael; Schindehutte, Minet; Allen, Jeffrey (2005): The entrepreneur's business model: toward a unified perspective. In Special Section: The Nonprofit Marketing Landscape 58 (6), pp. 726–735. Negash, Solomon (2004): Business Intelligence. In Communications of the Association for Information Systems 13, pp. 177–195.
Literature
39
Osterwalder, Alexander (2004): The Business Model Ontology. A Proposition in Design Science Research. These. Ecole des Hautes Etudes Commerciales de l’Université de Lausanne, Lausanne. Otto, Boris; Aier, Stephan (2013): Business Models in the Data Economy: A Case Study from the Business Partner Data Domain. With assistance of Rainer Alt, Bogdan Franczyk. In : Proceedings of the 11th International Conference on Wirtschaftsinformatik (WI 2013) : The 11th International Conference on Wirtschaftsinformatik (WI 2013), vol. 1. Leipzig, pp. 475–489. Pham, D. T.; Dimov, S. S.; Nguyen, C. D. (2005): Selection of K in K-means clustering. In Proceedings of the Institution of Mechanical Engineers, Part C: Journal of Mechanical Engineering Science 219 (1), pp. 103–119. Pham, D. T.; Afify, A. A. (2007): Clustering techniques and their applications in engineering. In Proceedings of the Institution of Mechanical Engineers, Part C: Journal of Mechanical Engineering Science 221 (11), pp. 1445–1459.
Literature
40
Rousseeuw, Peter J. (1987): Silhouettes: A graphical aid to the interpretation and validation of cluster analysis. In Journal of Computational and Applied Mathematics 20 (0), pp. 53–65. Schroeck, Michael; Shockley, Rebecca; Smart, Janet; Romero-Morales, Dolores; Tufano, Peter (2012): Analytics: The real-world use of big data. How innovative enterprises extract value from uncertain data. IBM Institute for Business Value; Saïd Business School at the University of Oxford. Singh, Ranjit; Singh, Kawaljeet (2010): A Descriptive Classification of Causes of Data Quality Problems in Data Warehousing. In IJCSI International Journal of Computer Science I 7 (3), pp. 41–50. Weill, Peter; Malone, Thomas W.; Apel, Thomas G. (2011): The business models investors prefer. In MIT Sloan Management Review 52 (4), p. 17.
The Clustering Process
41
Variables relevant to determine clustering
(Miligan 1996)
#Variables has to match #samples
(Mooi 2011)
~ 2m samples for m variables:
6-7 variables
Avoid high correlation between variables (<0.9) (Mooi 2011)
2 Dimensions: “Data source” &
“Key Activity”
9 variables
max. correlation: 0,5
2. Clustering method
3. Number of Clusters
4. Validate & Interpret C.
1. Clustering Variables
The Clustering Process
42
Partitioning
Hierarchical
Density-based
Grid-based
Clustering Method
(Han 2011)
Proximity Measure
4. Validate & Interpret C.
1. Clustering Variables
3. Number of Clusters
2. Clustering method
K-Medoids
Include neg. match
Exclude neg. match
Euclidean Distance
There is no “one right solution” for the
number of clusters
43
large to reflect specific differences
k << n
1. Use a-priori knowledge to determine number of clusters
2. Visual approaches
3. Rule of thumb (Han 2011):
4. “Elbow” method
5. Statistical methods
𝑘 ~𝑛
2 → 𝑘 ~ 7
k?
2. Clustering method
4. Validate & Interpret C.
1. Clustering Variables
3. Number of Clusters
Several different approaches (Pham 2005, Mooi 2011, Han 2011, Everitt et. al. 2011):
“Elbow” method
44
“Elbow Method” (cf. Ketchen 1993, Mooi 2011): 1. Hierarchical clustering first 2. Plot agglomeration coefficient against number of clusters 3. Search for “elbows”
2. Clustering method
4. Validate & Interpret C.
1. Clustering Variables
3. Number of Clusters
“Elbow” method
45
0.000
0.500
1.000
1.500
2.000
2.500
2 4 6 8 10 12 14 16 18 20 22 24 26 28 30 32 34 36 38 40 42 44 46 48 50 52 54 56 58 60 62 64 66 68 70 72 74 76 78 80 82 84 86 88 90 92 94 96 98
Clustering Coefficient (distance)
<29 7 16
2. Clustering method
4. Validate & Interpret C.
1. Clustering Variables
3. Number of Clusters
Number of cluster k
Statistical Measure: Silhouette
46
0
0.05
0.1
0.15
0.2
0.25
0.3
0.35
0.4
0.45
2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
Silhouette Coefficient
2. Clustering method
4. Validate & Interpret C.
1. Clustering Variables
3. Number of Clusters
For datum i: Compares distance within its cluster to distance to nearest neigbouring cluster
−1 ≤ 𝑠(𝑖) ≤ 1
Silhouette Coefficient s(i)
Number of cluster k
Rousseeuw, Peter J. (1987): Silhouettes: A graphical aid to the interpretation and validation of cluster analysis. In Journal of Computational and Applied Mathematics 20 (0).
The Clustering Process
47
0.335
-1 -0.5 0 0.5 1
Silhouette Value
(0.40) (0.20) - 0.20 0.40 0.60 0.80 1.00
16
111621263136414651566166717681869196
Silhouette
2. Clustering method
1. Clustering Variables
3. Number of Clusters
4. Validate & Interpret C.
good no cluster