data science unit introduction
Data Science Startup
Discussion Document
Intent
This overview is not intended to be a business case for data science. It is expected that you are already familiar with the value proposition. However, references to several case studies have been included at the end of this document as a reminder of how broadly applicable the subject at hand is.
The intent of this document is to set in motion a discussion about creating a startup in South Africa that is focused on data science. To be clear, the objectives of this startup are to:
Capture the best talent that exists in South Africa in data science
Be the leader in data science in Southern Africa and the go-to for organisations seeking services, products and training
Be a leader in the global data science marketplace by having the best people in the business and a competitive advantage over international firms based on lower people costs
Do not be deluded into thinking that this undertaking is in any way easy. However, the challenges inherent in it are also an opportunity, as they serve as barriers to entry for those seeking to compete.
copyright Gregg Barrett August 2016
Examples of some current uses of data science
- Detection of unauthorized trading activity
- Accelerating biomedical research
- Identification of data abuse to protect sensitive information and intellectual property
- Discovery of patterns of behaviour and links between key actors
- Preparing for major political and economic transformations
- Anticipating emerging threats such as the planning of terrorist attacks
- Accurate rating for insurance underwriting
- Improving patient outcomes
- Predicting disease outbreaks
- Predicting the path of wildfires
- Detection of information security threats
- Detection and elimination of sophisticated criminal activity
- Identification of poachers from real time drone footage and audio networks
- Autonomous driving vehicles
- Managing datacentre infrastructure
- Product recommendations
- Predicting part failure
- Improving transportation efficiency
- Customer/contact centre support
- Understanding consumer sentiment
- Market making in securities
- Language translation
- Credit scoring
- New craft beer recipes!
Describing data science is like trying to describe a sunset.
It should be easy, but somehow capturing the words is impossible.
(Booz Allen Hamilton, 2015)
We shall use the following definition:
Data science is the utilisation of a vast set of tools for modelling and understanding complex datasets.
To simplify matters, we shall consider
analytics
machine learning
artificial intelligence
and big data
as being part of our data science framework.
Data science is NOT:
fancy-looking reports (the product of SQL queries)
spiffy dashboards (sexy bar graphs and pie charts)
a wonderfully expensive Business Intelligence offering
The future of data science
What happened? A company that wasn’t even in your industry launched a new product and has completely flattened you. Sound familiar? It does for anyone who’s familiar with Uber. Uber first launched as a transportation service, using data and analytics to provide customers with easy, accessible and fast transportation directly from their phones. Uber has since expanded beyond transportation, offering additional services from consumers’ phones, such as meals and delivery. (IBM, 2016)
Some of the hottest, most critical domains in which data science will be applied in the coming years include:
Cybersecurity including advanced detection, modelling, prediction, and prescriptive analytics
Healthcare including genomics, precision medicine, population health, healthcare delivery, health data sharing and integration, health record mining, and wearable device analytics
IoT (Internet of Things) including sensor analytics, smart data, and emergent discovery alerting and response
Customer Engagement and Experience including 360-degree view, gamification, and just-in-time personalization
Smart X, where X = cities, highways, cars, delivery systems, supply chain, and more
Precision Y, where Y = medicine, farming, harvesting, manufacturing, pricing, and more
Personalized Z, where Z = marketing, advertising, healthcare, learning, and more
Human capital (talent) and organizational analytics
Societal good (Booz Allen Hamilton, 2015)
Examples of organisations with data science at their core
Two of the world’s most successful hedge funds:
Renaissance Technologies LLC
Bridgewater Associates
A British startup founded in 2010 and acquired by Google in 2014 for around 600 million USD:
DeepMind
One of the first data science consulting firms, founded in 1995:
Elder Research
A startup focused on autonomous driving:
comma.ai
A startup focused on cybersecurity:
SparkCognition
Fighting blind without data science
Float like a butterfly, sting like a bee,
for most firms in South Africa they can’t hit what they can’t see.
Why South Africa
The value proposition for data science in South Africa is no different from that in other countries.
Globally, skills are in short supply, and in South Africa the problem is even more acute.
For the handful of people in South Africa with the necessary competence, opportunities abroad are compelling, as compensation is around three times what they would receive in South Africa.
Data science in South Africa is, for the most part, in a nascent state. Leading solution providers, for example, have no presence anywhere on the African continent:
MapR
Cloudera
Hortonworks
Datameer
Trifacta
Paxata
Palantir
Elder Research
Alpine Data Labs
RapidMiner
SparkCognition
Pivotal Software
For international organisations, weakness in the South African economy and the South African rand makes the value proposition of a South African-based provider compelling.
It’s more about people than about machines
At the very core of this undertaking are people: they are the key to success. Only the truly brilliant will do. They are the outliers and are not easily sourced or recruited. Fortunately, these people tend to be averse to labels such as “Fortune 500”, “multinational” and “blue chip organisation”, which invoke thoughts of stifling bureaucracy and politics. What appeals to them is a startup: a place where they have their say, are individuals within a team, have a stake in something that can make a difference and can be themselves.
They are a rather scarce commodity in South Africa. However, this presents an opportunity, as the scarcity of talent serves as an impediment to firms seeking to compete and build competence in this space.
Capturing the best and the brightest in the data science market in South Africa is a primary objective.
What it takes to manage such an operation.
Winner-takes-all
In this field, one brilliant person can deliver the work of 10 average people. It is critical that every individual who is part of this startup has skin in the game through an equity stake. The equity position serves to attract and retain the people we seek.
People cost is the single largest cost, but also a source of competitive advantage. As a guide for data science positions in the United States:
Entry level position: 100 000 USD base salary
Mid-level position: 150 000 – 250 000 USD base salary
Senior level position: 300 000 – 500 000 USD base salary
South Africa cannot compete with such levels of compensation, which is one reason why much of the talent leaves the country. However, we do not need such compensation levels in order to be successful. It is estimated that we can comfortably operate at around 65% to 75% of the cost of a comparable firm in the US. A cost saving of at least 25% will be a major competitive differentiator and particularly attractive to international firms.
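As a quick check of the arithmetic, operating at 65% to 75% of a comparable US firm’s cost corresponds to a client saving of 25% to 35%, so 25% is the conservative end of the range. The sketch below works this through for a hypothetical US engagement priced at 1 000 000 USD; the engagement value and the function names are illustrative assumptions, not figures from this document.

```python
# Cost of the proposed South African firm relative to a comparable US firm,
# in whole percentages (the 65% to 75% range estimated in the text).
COST_RATIO_RANGE_PCT = (65, 75)


def saving_pct(cost_ratio_pct: int) -> int:
    """Percentage a client saves when paying cost_ratio_pct% of the US cost."""
    return 100 - cost_ratio_pct


def local_price(us_cost: int, cost_ratio_pct: int) -> int:
    """Price of the same engagement from the South African firm.

    Integer arithmetic avoids floating-point rounding noise.
    """
    return us_cost * cost_ratio_pct // 100


# Hypothetical engagement value, for illustration only.
US_ENGAGEMENT_USD = 1_000_000

for ratio in COST_RATIO_RANGE_PCT:
    print(
        f"{ratio}% of US cost: price {local_price(US_ENGAGEMENT_USD, ratio):,} USD, "
        f"client saves {saving_pct(ratio)}%"
    )
```

At the upper end of the cost estimate (75%), the saving is exactly 25%; at the lower end (65%) it rises to 35%.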
As a guide, in South Africa we would aim for:
Entry level position: 600 000 ZAR base salary
Mid-level position: 800 000 ZAR base salary
Senior level position: 1 000 000 ZAR base salary
We believe the following strategy will be attractive:
compensation levels higher than what is currently offered by local organisations
an equity position
being part of a startup composed only of the best
an opportunity to make a major impact
About people
I said that the best in this business are a rather scarce commodity, but what do they look like? Here are a couple of examples:
Gabor Melis
George Hotz
What are some of the skills that these persons possess? The document “The Quest for Unicorns” by Elder Research serves as a good starting point:
The Quest for Unicorns by Elder Research
The following article from The Economist gives some insight into just how intense the arms race for talent has become:
As Silicon Valley fights for talent, universities struggle to hold on to their stars
Options for South African organisations pushing forward on data science
1. Build the capability internally: Such an approach will be challenging, with most firms not even knowing where to start. The shortage of talent simply compounds the problem.
2. Retain the services of an outside firm: There are several such firms outside South Africa, but this approach will be costly due to dollar exposure. The likely approach will therefore be to restrict the search to the local market, which supports our proposition. Conversely, the same cost logic makes our proposition appealing to international firms.
3. Incubate/finance a separate entity: In so doing, gain the necessary business capability as well as the added benefit of an equity position, which could generate financial gain.
What is needed
The following options are being considered:
Startup via funding: funding the startup as a wholly separate entity for a three-year period in exchange for an equity position
Startup via incubation: incubating the startup within an existing organisation, where the startup generates value for the organisation and the organisation has an equity position in the startup, with a view to a spin-off once it has reached sufficient scale
Startup via initial clients: securing sufficient initial clients under contract to cover start-up costs
Budget
We are looking to put together a team of 5 to 10 people. This would require a budget of 5 to 10 million ZAR a year for three years.
The budget calculation is rather straightforward:
5 million ZAR a year for a 5 person team
10 million ZAR a year for a 10 person team
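Both lines of the calculation imply a fully loaded cost of roughly 1 000 000 ZAR per person per year, so the three-year commitment comes to 15 to 30 million ZAR. The sketch below makes that explicit and, as a sanity check, compares one hypothetical five-person mix (one senior, two mid-level, two entry-level) against the proposed South African base salaries. The team mix and the function names are assumptions for illustration only.

```python
# Per-person annual cost implied by "5 million ZAR a year for a 5 person team".
COST_PER_PERSON_ZAR = 1_000_000
YEARS = 3

# Proposed South African base salaries (ZAR per year).
BASE_SALARY_ZAR = {"entry": 600_000, "mid": 800_000, "senior": 1_000_000}


def annual_budget(team_size: int) -> int:
    """Annual budget for a team of the given size."""
    return team_size * COST_PER_PERSON_ZAR


def total_budget(team_size: int, years: int = YEARS) -> int:
    """Budget over the whole funding period."""
    return annual_budget(team_size) * years


def salary_bill(mix: dict[str, int]) -> int:
    """Annual base-salary cost of a team described as {level: headcount}."""
    return sum(BASE_SALARY_ZAR[level] * count for level, count in mix.items())


for size in (5, 10):
    print(f"{size} people: {annual_budget(size):,} ZAR/year, "
          f"{total_budget(size):,} ZAR over {YEARS} years")

# Hypothetical five-person mix, for illustration only.
mix = {"senior": 1, "mid": 2, "entry": 2}
salaries = salary_bill(mix)
print(f"Salaries {salaries:,} ZAR; remaining {annual_budget(5) - salaries:,} ZAR")
```

For this mix the base salaries come to 3 800 000 ZAR, leaving 1 200 000 ZAR a year for all other costs.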
The nature of the business means that it does not require investment in physical assets. Electricity and an internet connection for access to cloud infrastructure are the primary requirements. The startup is thus minimally exposed to risks in the South African operating environment. Further, cloud infrastructure is scaled as and when needed on a pay-as-you-go basis.
Risk
Possible increases in income tax and corporate tax rates in South Africa are viewed as a risk that could place upward pressure on operating costs. However, there are options to mitigate this risk.
Revenue sources
Consulting
Strategy
Execution
Product
Products will be created as and when the need arises. However, consulting will be the initial focus, with products a longer-term focus.
Training
Approach
The approach is to be as agnostic as possible when it comes to platform/technology/products.
We would also seek to develop academic collaboration with the likes of UCT and WITS.
Example of the Bloomberg Labs Data Science program.
Areas for consulting
The Cross-Industry Standard Process for Data Mining (CRISP-DM) is a data mining process model that provides a reference methodology for conducting data mining projects. The tasks and outputs listed in the model give examples of areas where consulting work can be provided when executing a data science project.
Figure 1: Generic tasks (bold) and outputs (italic) of the CRISP-DM reference model
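To make the reference model concrete, the sketch below lists the six CRISP-DM phases and their generic tasks as given in CRISP-DM 1.0 (outputs omitted), and turns a chosen set of phases into a list of candidate consulting work packages. The `work_packages` helper and the one-to-one mapping of tasks to work packages are illustrative assumptions, not part of the CRISP-DM specification.

```python
# CRISP-DM 1.0 phases and their generic tasks (outputs omitted for brevity).
CRISP_DM_TASKS: dict[str, list[str]] = {
    "Business Understanding": [
        "Determine Business Objectives",
        "Assess Situation",
        "Determine Data Mining Goals",
        "Produce Project Plan",
    ],
    "Data Understanding": [
        "Collect Initial Data",
        "Describe Data",
        "Explore Data",
        "Verify Data Quality",
    ],
    "Data Preparation": [
        "Select Data",
        "Clean Data",
        "Construct Data",
        "Integrate Data",
        "Format Data",
    ],
    "Modeling": [
        "Select Modeling Technique",
        "Generate Test Design",
        "Build Model",
        "Assess Model",
    ],
    "Evaluation": [
        "Evaluate Results",
        "Review Process",
        "Determine Next Steps",
    ],
    "Deployment": [
        "Plan Deployment",
        "Plan Monitoring and Maintenance",
        "Produce Final Report",
        "Review Project",
    ],
}


def work_packages(phases: list[str]) -> list[str]:
    """Hypothetical helper: one consulting work package per generic task,
    labelled with the phase it belongs to."""
    return [f"{phase}: {task}" for phase in phases for task in CRISP_DM_TASKS[phase]]


# Example: a client that needs help preparing data and building models.
print("\n".join(work_packages(["Data Preparation", "Modeling"])))
```

The phases are listed in their nominal order; in practice CRISP-DM is iterative, and a project typically moves back and forth between phases (for example, between data preparation and modeling).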
Optionality through data driven business models
There is a growing trend of data-driven technology companies using their own solutions to compete with incumbents in the marketplace, as opposed to licensing their offerings to established incumbents. For example, let’s say that Google finds a new way to price and deliver insurance. An approach that now seems to be considered more frequently is that, rather than licensing it to existing participants in the insurance market, the company sets up its own insurance entity: with negative interest rates in many parts of the world, capital is abundant, and operating licences are not impossible to obtain.
Mondo is an example of such thinking:
Digital challenger bank Mondo just got its banking licence
Uber is another:
Uber’s First Self-Driving Fleet Arrives in Pittsburgh This Month
Data Charlatans
I spoke earlier of the need to recruit the best and the brightest. Why, you ask? Get things wrong and at best you look silly; at worst you blow things up:
Example of getting it wrong and looking silly:
John Gray: Steven Pinker is wrong about violence and war
Example of blowing up:
Recipe for Disaster: The Formula That Killed Wall Street
Big Data brings its own set of challenges:
Beware the Big Errors of ‘Big Data’
Beyond Big Data: Identifying Important Information for Real World Challenges
A note for insurance
Traditional actuarial approaches are no match for what current data and computing resources make possible, with the likes of Gradient Boosting Machines, Neural Networks and ensembles of these providing far superior levels of accuracy.
“As more insurers use predictive analytics, those not doing so will be increasingly exposed to adverse selection because their market will be limited to a subsection for the general population that has worse-than-average loss ratios.” (Nyce, 2007)
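The mechanism in the quotation can be shown with a toy example, which is a simplification rather than an actuarial model. In the sketch below, five hypothetical customers have known expected annual losses. Insurer A charges everyone the same flat premium (the average expected loss), while insurer B is assumed to use a predictive model that prices each customer exactly at their expected loss. Each customer buys from whichever insurer is cheaper, with ties going to B. All figures are invented for illustration.

```python
from statistics import mean

# Hypothetical expected annual losses (ZAR) for five customers.
EXPECTED_LOSSES = [100, 200, 300, 400, 500]


def loss_ratio(losses: list[float], premiums: list[float]) -> float:
    """Claims paid divided by premiums earned."""
    return sum(losses) / sum(premiums)


def split_market(expected_losses: list[float]) -> tuple[float, list[float], list[float]]:
    """Assign each customer to the cheaper insurer.

    Insurer A charges a flat premium equal to the average expected loss.
    Insurer B charges each customer their own expected loss (an idealised
    risk-based price). Ties go to B.
    """
    flat_premium = mean(expected_losses)
    book_a, book_b = [], []
    for loss in expected_losses:
        if flat_premium < loss:
            book_a.append(loss)  # the flat price is a bargain for high risks
        else:
            book_b.append(loss)  # low risks find the risk-based price cheaper (or equal)
    return flat_premium, book_a, book_b


flat_premium, book_a, book_b = split_market(EXPECTED_LOSSES)
print("Insurer A keeps:", book_a,
      "loss ratio", loss_ratio(book_a, [flat_premium] * len(book_a)))
print("Insurer B wins:", book_b, "loss ratio", loss_ratio(book_b, book_b))
```

Had insurer A kept the whole market at its flat premium, its loss ratio would have been 1.0 (1 500 / 1 500). Once a competitor prices by risk, A is left with only the above-average risks and a loss ratio of 1.5.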
Analytics has the potential to make a positive impact on virtually every aspect of the insurance life cycle.
Product development
Marketing and distribution
Pricing and underwriting
Risk control
Claims management
Performance management (Accenture, 2013, pg. 5)
For a more comprehensive overview of data science in insurance:
Value proposition of analytics in P&C insurance
Further reading of potential interest
Bridgewater Associates building an artificial intelligence competence: Bridgewater Is Said to Start Artificial-Intelligence Team
Bloomberg LP building a machine learning competence: Bloomberg and “the magic” of machine learning
Example of Google using its DeepMind unit to save on energy consumption: Google Cuts Its Giant Electricity Bill With DeepMind-Powered AI
Example of the arms race for data: Tiny Satellites: The Latest Innovation Hedge Funds Are Using to Get a Leg Up
Case studies on data science abound on the internet, for example:
Healthcare: When Health Care Gets a Healthy Dose of Data – Intermountain Healthcare
Industrial: The Industrial Internet – GE Digital
Automotive: The Connected Vehicle Data Platform – Ford Motor Company
Insurance: Geospatial Analytics – Progressive Insurance
Case Studies from MIT Sloan Management Review: MIT Sloan Management Review Case Studies
Case Studies from Elder Research:
Defense and intelligence: Automating Textual Data Discovery And Analysis
Nonprofit Service Organization: Determining Influential Factors for Conference Satisfaction
Pharmaceutical: Discovering the Efficacy of a New Drug
Retail, Consumer Electronics: Enhancing Customer Loyalty
Government, Healthcare: Improving Claims Approval Speed and Accuracy
Retail Banking, Financial Services: Improving Credit Card Risk Scoring
Telecommunications: Improving Customer Retention and Profitability
Healthcare Insurance: Improving Provider Performance and Patient Outcomes
Retail Banking, Financial Services: Predicting Financial Account Churn
Oil and Gas: Predicting Natural Gas Well Freezing
Government: Prioritizing Building Lease Renewals
Healthcare Insurance: Prioritizing Long-Term Care Claims
Government: Reducing Fraud, Waste, and Abuse
Retail, Computer and Electronic, Product Manufacturing: Reducing Service Provider and Warranty Fraud
IT Management: Staffing Optimization
Insurance: Understanding Customer Sentiment
Retail, Commercial Software: Using Log Analytics to Improve User Experience
There is no shortage of conferences either, for example: Bloomberg Data for Good Exchange
It is organized around the following topic areas:
- Justice and fairness, including criminal justice, discrimination, algorithmic bias, workers’ rights, voting rights, etc.
- Economic development, including housing, job security, immigration, wages, challenges coming from the “gig” economy, remittance services, etc.
- Security and safety, including emergency services, cyber-attacks, dark web and illegal content, gun control, resilience, etc.
- Public service delivery, including transportation, sustainability, biodiversity and health monitoring, public health, etc.
Compiled by:
Gregg Barrett
References
Accenture. (2013). The digital insurer: achieving payback in insurance analytics. [pdf]. Retrieved from http://www.accenture.com/us-en/Pages/insight-payback-insurance-analytics.aspx
Booz Allen Hamilton. (2015). The field guide to data science. [pdf]. Retrieved from https://www.boozallen.com/content/dam/boozallen/documents/2015/12/2015-FIeld-Guide-To-Data-Science.pdf
CRISP-DM. (2000). Generic tasks (bold) and outputs (italic) of the CRISP-DM reference model. [Figure]. In CRISP-DM 1.0. [pdf]. Retrieved from https://the-modeling-agency.com/crisp-dm.pdf
IBM. (2016). Why data science should be your priority. [pdf]. Retrieved from http://www.ibmbigdatahub.com/blog/why-data-science-should-be-your-top-priority
Nyce, C. (2007). Predictive analytics white paper. [pdf]. Retrieved from http://www.theinstitutes.org/doc/predictivemodelingwhitepaper.pdf