The Path to Be a Data Scientist
Poo Kuan Hoong, Ph.D., Senior Manager, Data Science, Nielsen Malaysia
Disclaimer: The views and opinions expressed in these slides are those of the author and do not necessarily reflect the official policy or position of Nielsen Malaysia. Examples of analysis performed within these slides are only examples. They should not be utilized in real-world analytic products as they are based only on very limited and dated open source information. Assumptions made within the analysis are not reflective of the position of Nielsen Malaysia.
Agenda
• What is a data scientist?
• What kinds of companies employ data scientists?
• What are the key functions of a data scientist?
• What type of work does a data scientist do?
• General aptitude to be a data scientist
• What skill sets are needed to be a data scientist?
• What is data science?
• Where do I begin?
• MDEC National Big App Challenge 3.0 Knowledge Sharing
Self Introduction
Poo Kuan Hoong, http://www.linkedin.com/in/kuanhoong
• Senior Manager Data Science
• Senior Lecturer
• Chairperson Data Science Institute
• Coursera Facilitator
• Consultant
• Funding mentor
• Founder
• Speaker/Trainer
https://www.meetup.com/MY-RUserGroup/
https://www.facebook.com/rusergroupmalaysia/
What is a Data Scientist?
Data Scientist
The term "data scientist" has been around for years, and the various advanced analytics specialties that fall under it are even older.
However, with the recent explosion of data, the term has come to describe a convergence of disciplines, which has led to its soaring popularity.
What are the job titles?
• Data Scientist
• Data Engineer
• Big Data Engineer
• Machine Learning Scientist
• Business Analytics Specialist
• Data Visualization Developer
• BI Solutions Architect/ BI Specialist
• Operations Research Analyst
• Analytics Manager
• Machine Learning Engineer
• Statistician
• Business Intelligence (BI) Engineer
Why the Global Need?
Abundance of Data
Availability of affordable compute resources
Internet of Things (IoT) sensors data
950 Data Analyst (India)
8,411 Data Scientist (US)
808 Data Analyst (UK)
1,188 Data Manager (US)
81 Data Analyst (Australia)
Malaysia: 80 data scientists in April 2015, 1,500 needed by 2020
The Star, Friday, 24 April 2015 “Malaysia needs 1,500 data scientists by 2020”
What kinds of companies employ data scientists?
MNCs
Government
Banks
What are the key functions of a data scientist?
Key functions of a data scientist
• Devising business strategies from the insights
• Descriptive and predictive analytics
• Data mining and analysis
• Design
• Understanding the business problem
Scenario 1: Customer Churn Analytics
Churn analytics: predicting who will switch mobile operators
Customer churn: why do customers change operators?
• The top 3 reasons why subscribers change providers:
• They want a new handset
• They believe they pay too much for calls/data
• Providers do not offer additional loyalty benefits
Machine Learning Framework (figure): Data Collection → Data Preprocessing → Attribute Selection (Attribute 1, Attribute 2, Attribute 3) → Algorithm → Training Model → Score Model → Apply Data / Test Data → Predicting Output, grouped into an Initialization Step, a Learn Step and an Apply Step
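The framework above can be sketched end to end in plain Python. This is a minimal illustration under stated assumptions, not the pipeline used in the slides: the two attributes (monthly bill in RM and months since the last handset upgrade, echoing the churn reasons listed earlier), the toy records, min-max scaling as the preprocessing step and logistic regression trained by batch gradient descent as the algorithm are all hypothetical choices.

```python
import math


def fit_scaler(rows):
    """Learn per-column min and max from the training data (preprocessing step)."""
    cols = list(zip(*rows))
    return [min(c) for c in cols], [max(c) for c in cols]


def apply_scaler(rows, lo, hi):
    """Min-max scale each value to [0, 1] using bounds learned on the training data."""
    return [
        [(v - l) / (h - l) if h > l else 0.0 for v, l, h in zip(row, lo, hi)]
        for row in rows
    ]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(X, y, lr=0.5, epochs=2000):
    """Learn step: fit logistic-regression weights by batch gradient descent."""
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def score(X, w, b):
    """Apply step: churn probability for each (already scaled) record."""
    return [sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) for xi in X]


# Hypothetical toy records: [monthly bill (RM), months since last handset upgrade]
train_X = [[40, 3], [45, 6], [50, 4], [55, 8], [120, 24], [110, 30], [130, 20], [100, 28]]
train_y = [0, 0, 0, 0, 1, 1, 1, 1]  # 1 = churned
test_X = [[42, 5], [125, 26]]

lo, hi = fit_scaler(train_X)  # scaler bounds come from training data only
w, b = train_logistic(apply_scaler(train_X, lo, hi), train_y)
probs = score(apply_scaler(test_X, lo, hi), w, b)
print([round(p, 2) for p in probs])  # low for the first record, high for the second
```

Fitting the scaler on the training data only, then reusing its bounds on the test data, mirrors the split between the Learn and Apply steps.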
Correlation Matrix
Feature selection
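One common way to use a correlation matrix for feature selection is to drop attributes that are nearly redundant with one already kept. The sketch below assumes Pearson correlation and a greedy |r| threshold of 0.9; the attribute names and values are hypothetical, and the slides do not say which selection rule was actually used.

```python
import math


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length numeric columns."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (sa * sb)  # assumes neither column is constant


def correlation_matrix(columns):
    """All pairwise correlations, keyed by (feature, feature)."""
    return {(p, q): pearson(columns[p], columns[q]) for p in columns for q in columns}


def drop_correlated(columns, threshold=0.9):
    """Greedy filter: keep a feature only if |r| < threshold against every feature kept so far."""
    kept = []
    for name in columns:
        if all(abs(pearson(columns[name], columns[k])) < threshold for k in kept):
            kept.append(name)
    return kept


# Hypothetical attributes for eight subscribers
columns = {
    "monthly_bill": [40, 45, 50, 55, 120, 110, 130, 100],
    "data_usage_gb": [4, 4.5, 5, 5.5, 12, 11, 13, 10],  # exactly bill / 10, so r = 1
    "dropped_calls": [2, 5, 1, 4, 3, 2, 5, 1],
}
matrix = correlation_matrix(columns)
print(drop_correlated(columns))  # ['monthly_bill', 'dropped_calls']
```

The greedy rule depends on column order: whichever of two redundant attributes appears first is the one kept.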
Models comparison
• Receiver operating characteristic curve (ROC curve) illustrates the performance of a binary classifier system as its discrimination threshold is varied.
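To make the ROC idea concrete, the sketch below lowers the threshold through every distinct score, records the (false positive rate, true positive rate) pairs, and computes the area under the curve (AUC) with the trapezoidal rule. The labels and scores are hypothetical; in a models comparison, the model with the higher AUC ranks positives above negatives more often.

```python
def roc_points(labels, scores):
    """(FPR, TPR) pairs obtained by lowering the threshold through every distinct score."""
    pos = sum(labels)
    neg = len(labels) - pos
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    points, tp, fp, i = [(0.0, 0.0)], 0, 0, 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Examples tied at the same score cross the threshold together.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x2 - x1) * (y1 + y2) / 2 for (x1, y1), (x2, y2) in zip(points, points[1:]))


labels = [1, 1, 0, 1, 0, 0]              # hypothetical true classes (1 = churned)
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]  # hypothetical model scores
pts = roc_points(labels, scores)
print(round(auc(pts), 3))  # 0.889
```

Here 8 of the 9 positive/negative pairs are ranked correctly, which is why the AUC comes out to 8/9.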
Scenario 2: Market Basket Analysis
Market Basket Analysis
Where should detergents be placed in the store to maximize sales?
Are bleach products purchased when detergents and orange juice are bought together?
Is cola typically purchased with bananas? Does the brand of cola make a difference?
How are the demographics of the neighbourhood affecting what customers are buying?
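Questions like these are usually answered with association rules scored by support, confidence and lift (the measures behind algorithms such as Apriori). The slides do not name a method, so treat this as an assumed approach; the baskets below are hypothetical, not real store data.

```python
def support(baskets, itemset):
    """Fraction of baskets that contain every item in itemset."""
    itemset = set(itemset)
    return sum(1 for basket in baskets if itemset <= basket) / len(baskets)


def confidence(baskets, antecedent, consequent):
    """Estimated P(consequent | antecedent); assumes the antecedent occurs at least once."""
    return support(baskets, set(antecedent) | set(consequent)) / support(baskets, antecedent)


def lift(baskets, antecedent, consequent):
    """Confidence relative to the consequent's base rate; > 1 suggests a positive association."""
    return confidence(baskets, antecedent, consequent) / support(baskets, consequent)


# Hypothetical transactions, not real store data
baskets = [
    {"detergent", "orange juice", "bleach"},
    {"detergent", "orange juice", "bleach", "bananas"},
    {"detergent", "orange juice"},
    {"cola", "bananas"},
    {"cola", "chips"},
    {"detergent", "bleach"},
    {"bananas", "orange juice"},
    {"cola", "bananas", "chips"},
]
rule = ({"detergent", "orange juice"}, {"bleach"})
print(round(confidence(baskets, *rule), 2), round(lift(baskets, *rule), 2))  # 0.67 1.78
```

In this toy data, bleach appears in 2 of the 3 baskets containing both detergent and orange juice, about 1.78 times its overall rate of 3 in 8.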
What type of work does a data scientist do?
http://www.forbes.com/sites/gilpress/2016/03/23/data-preparation-most-time-consuming-least-enjoyable-data-science-task-survey-says/#f37c7f758459
General aptitude to be a data scientist
Data Scientist
• Common sense
• Curious mind
• Clear and simplified thinking
• Love of solving puzzles
• Good listening, writing and communication skills
• Maths & stats
• Business sense
I have 4 red, 18 black and 8 brown socks in my sock drawer. If it is completely dark and I cannot see the colour of the socks that I am picking, how many socks do I need to take from the drawer to be sure that I have at least one pair of socks that are the same colour?
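The socks question is a pigeonhole-principle puzzle: with 3 colours, the worst case is drawing one sock of each colour first, so the 4th sock must match one of them; the counts 4, 18 and 8 are distractors. The brute-force check below confirms this by searching for a draw with no pair; the function names are illustrative.

```python
from itertools import product


def draw_without_pair_exists(counts, k):
    """True if some draw of k socks has at most one sock of every colour."""
    ranges = [range(min(c, k) + 1) for c in counts.values()]
    return any(sum(combo) == k and max(combo) <= 1 for combo in product(*ranges))


def socks_needed(counts):
    """Smallest k for which every possible draw of k socks contains a pair."""
    total = sum(counts.values())
    for k in range(1, total + 1):
        if not draw_without_pair_exists(counts, k):
            return k
    return None  # a pair is never guaranteed, even after emptying the drawer


drawer = {"red": 4, "black": 18, "brown": 8}
print(socks_needed(drawer))  # 4
```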
What is the hidden number under the car?
What skill sets are needed to be a data scientist?
Data scientist skill sets
• Data Mining
• Machine Learning
• R/Python
• Data Analysis
• Statistics
• SQL
• Java
• Algorithms
Image Source: http://imgur.com/hoyFT4t
What is the average salary?
Average salary: Data Scientist
What is data science?
Data Science
• Data science is an evolutionary step in interdisciplinary fields like business analysis that incorporate computer science, modeling, statistics, analytics, and mathematics.
• At its core, data science involves using automated methods to analyze massive amounts of data and to extract knowledge from them.
• Drawing insight from a piece of data involves understanding how it fits into the larger picture of an organization.
Where do I begin?
Massive Open Online Course (MOOC)
• MSC Malaysia MyProCert (SRI) – Data Science Massive Open Online Courses (MOOC)
• The Center of Applied Data Science (MDEC & HRDF)
• Johns Hopkins University – Data Science Specialization
• University of Washington - Data Science at Scale Specialization
• Data Analyst Nanodegree - Udacity
• CSCI E-109 Data Science (Harvard Extension School)
• Machine Learning - Stanford University
BDA Undergraduate & Postgraduate Programmes
Undergraduate
• Multimedia University – Bachelor of Computer Science (Data Science Specialization)
• Sunway University - BSc (Hons) Information Systems (Business Analytics)
• Universiti Teknologi Malaysia (UTM), International Islamic University Malaysia, Monash University, Universiti Teknologi MARA (UiTM) & Universiti Teknologi PETRONAS (UTP).
Postgraduate
• Big Data Analytics Post Graduate Programme
Kaggle
• Real problems and data sets, provided in raw, unprocessed form.
• Going through past competitions is recommended.
• Read through the forums of individual competitions to find useful discussions and tips/hints that will help in solving future problems.
• https://www.kaggle.com/
UC Irvine Machine Learning Repository
• Hosts 360 data sets as a service to the machine learning community: http://archive.ics.uci.edu/ml/
Open data
• Open data from various countries
• Malaysia - http://www.data.gov.my/
• Singapore - https://data.gov.sg/
MDEC National Big App Challenge 3.0
• June 4th – June 5th 2016, Berjaya Times Square
• The themes for AHKL2016 were as follows:
1. Big Data Analytics --- Powered by MDEC. Access to 65 million rows of real datasets sponsored by iProperty.com Malaysia
2. O2O Commerce --- Powered by MOLWallet MOLPay
3. Smart Living --- Powered by TIME Internet
National MDEC Big App Challenge 3.0
PropertySenze • B2B business model
• Provide machine learning and AI services to customers
• Visual Search
• Personalized customer experience
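The slides do not describe how PropertySenze's visual search works. A common approach is nearest-neighbour retrieval over image embeddings, sketched below under that assumption: each listing photo is presumed to have been turned into a feature vector by some image model (not shown), and listings are ranked by cosine similarity to the query photo's vector. The listing IDs and vectors are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def most_similar(query, catalogue, k=2):
    """IDs of the k catalogue entries whose vectors are closest to the query."""
    ranked = sorted(catalogue.items(), key=lambda item: cosine(query, item[1]), reverse=True)
    return [listing_id for listing_id, _ in ranked[:k]]


# Hypothetical embeddings, assumed to come from an image model applied to listing photos
catalogue = {
    "listing-A": [0.9, 0.1, 0.0],
    "listing-B": [0.8, 0.2, 0.1],
    "listing-C": [0.0, 0.1, 0.9],
}
query = [0.88, 0.1, 0.02]  # embedding of a user-uploaded photo
print(most_similar(query, catalogue))  # ['listing-A', 'listing-B']
```

A linear scan like this is fine for a demo; a large catalogue would normally need an approximate nearest-neighbour index instead.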
BUSINESS MODEL
Big Data becomes Smart Data
1. PropertySenze contracts with property sites and property developers to generate analytics and visual search
2. PropertySenze's machine learning algorithm lets users search for and buy properties similar to those they see on the sites, using user-generated photos and user-uploaded images
3. Enhanced search experience and personalized results for users
4. Improved platform that recognizes properties for retrieval purposes or instant purchases
5. Analytics at the fingertips for both buyers and sellers
6. Improved user experience that leads to more engagement and sales transactions
7. PropertySenze verifies all transactions and charges commission fees every month
PropertySenze
Hackathon: Tips
• Have a well-balanced team: no more than one server-side developer with relevant experience, one good designer and one amazing storyteller
• Understand the expected outcomes of the hackathon
• Develop something whose benefits everyone can see
• Have an impressive aim or objective
• Start promoting your product during the hackathon
• Make sure the demo works 100%. The pitch is where the product should shine
Thanks!
Questions?
@kuanhoong
https://www.linkedin.com/in/kuanhoong