Sharing about my data science journey and what I do at Lazada

Hi, I’m Eugene. I’m here to share about my data science journey and what I do at Lazada. 4th April 2016, SMU Masters of IT in Business

Upload: eugene-yan-ziyou

Post on 21-Apr-2017


TRANSCRIPT

Page 1: Sharing about my data science journey and what I do at Lazada

Hi, I’m Eugene. I’m here to share about my data science journey and what I do at Lazada

4th April 2016, SMU Masters of IT in Business

Page 2: Sharing about my data science journey and what I do at Lazada

Before I begin, any questions you would like addressed?

I’ll answer throughout my sharing.

Page 3: Sharing about my data science journey and what I do at Lazada

An introduction about myself

Page 4: Sharing about my data science journey and what I do at Lazada

Studied Psychology and Business at Singapore Management University (SMU); wanted to use data to create positive impact

Page 5: Sharing about my data science journey and what I do at Lazada

Did economic and political analysis at Ministry of Trade & Industry (MTI)

Page 6: Sharing about my data science journey and what I do at Lazada

Joined IBM to pursue my passion for working with data

Page 7: Sharing about my data science journey and what I do at Lazada

First step into data science as a data analyst, where I…

Page 8: Sharing about my data science journey and what I do at Lazada

Developed dashboards and analytics for end-to-end supply chain optimization

Page 9: Sharing about my data science journey and what I do at Lazada

Worked on an anti-money laundering and entity resolution system for a global bank

Page 10: Sharing about my data science journey and what I do at Lazada

Collected and analyzed tweets to provide insight on tweet share and sentiment for an electronics conglomerate

Page 11: Sharing about my data science journey and what I do at Lazada

Then, I was transferred to the workforce analytics team, working on data from IBM’s 450k employees to build…

Page 12: Sharing about my data science journey and what I do at Lazada

Forecast models for global job demand to optimize recruitment and workforce allocation

Page 13: Sharing about my data science journey and what I do at Lazada

Job recommendation engine to increase internal transfers, skill renewal, satisfaction, and reduce attrition

Page 14: Sharing about my data science journey and what I do at Lazada

Currently at Lazada’s Data Science team; more later

Page 15: Sharing about my data science journey and what I do at Lazada

My data science journey

Page 16: Sharing about my data science journey and what I do at Lazada

Skill sets needed to be a data analyst and how I acquired them

Page 17: Sharing about my data science journey and what I do at Lazada

Probability, statistics and experimental design from education in Psychology

Page 18: Sharing about my data science journey and what I do at Lazada

Technical skills in SPSS Statistics and R from undergraduate education in Psychology

Page 19: Sharing about my data science journey and what I do at Lazada

Written and verbal communication from essays and presentations (SMU), and briefs and stakeholder engagement with industry leaders (MTI)

Page 20: Sharing about my data science journey and what I do at Lazada

Teamwork from projects in SMU and MTI

Page 21: Sharing about my data science journey and what I do at Lazada

Skill sets needed to be a data scientist and how I acquired them

- Statistics
- Experimental Design
- SPSS & R
- Communication
- Teamwork

Page 22: Sharing about my data science journey and what I do at Lazada

More R via MOOCs:
- Data Analysis and Statistical Inference (Duke)
- Computing for Data Analysis (Johns Hopkins)

Page 23: Sharing about my data science journey and what I do at Lazada

Python via MOOCs:
- Computer Science and Programming in Python (MIT)
- Interactive Programming in Python (Rice)

Page 24: Sharing about my data science journey and what I do at Lazada

SQL via any site with in-browser query engine

Page 25: Sharing about my data science journey and what I do at Lazada

Machine Learning via MOOCs:
- Machine Learning (Stanford)
- Statistical Learning (Stanford)
- Social and Economic Networks (Stanford)
- Text Mining and Analytics (Urbana-Champaign)

Page 26: Sharing about my data science journey and what I do at Lazada

Distributed storage and processing via MOOCs:
- Mining Massive Datasets (Stanford)
- Big Data with Apache Spark (UC Berkeley)
- Scalable Machine Learning with Apache Spark (UC Berkeley)

Page 27: Sharing about my data science journey and what I do at Lazada

Learning alone is insufficient; I also had to practice (a lot)

Page 28: Sharing about my data science journey and what I do at Lazada

Volunteer for things people don’t want to do:
- Volunteered for a project on Twitter tracking with $0 budget

Page 29: Sharing about my data science journey and what I do at Lazada

Twitter project: Connect to API, download tweets 24/7 over 2 weeks, analyze tweets; learnt how to:
- Work with APIs
- Recover from failure automatically
- Work with data that can’t fit in memory
- Do text analytics and sentiment analysis
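Two of the lessons on this slide, recovering from failure automatically and working with data that can’t fit in memory, can be sketched in a few lines. This is only an illustration and not the code used in the project: `with_retries`, `count_words`, and the `fetch` callable are hypothetical names, and the retry policy (exponential backoff, five attempts) is an assumption.

```python
import time
from collections import Counter


def with_retries(fetch, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call fetch() until it succeeds, waiting longer after each failure.

    `fetch` is a hypothetical zero-argument callable, e.g. one API request.
    """
    for attempt in range(max_attempts):
        try:
            return fetch()
        except Exception:  # broad on purpose for the sketch; narrow it in real code
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** attempt)  # wait 1s, 2s, 4s, ...


def count_words(lines):
    """Consume lines one at a time so the full data set never sits in memory."""
    counts = Counter()
    for line in lines:
        counts.update(line.lower().split())
    return counts
```

Passing an open file object (or any other iterator of lines) to `count_words` keeps memory use proportional to the vocabulary rather than to the number of tweets.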

Page 30: Sharing about my data science journey and what I do at Lazada

Volunteer with DataKind SG, helping NGOs tackle problems through data science

Page 31: Sharing about my data science journey and what I do at Lazada

Volunteer to facilitate Johns Hopkins Data Science Specialization (Statistical Inference)

Page 32: Sharing about my data science journey and what I do at Lazada

Kaggle meaningfully on competitions with real-world applications; competitions I’ve tried include…

Page 33: Sharing about my data science journey and what I do at Lazada

Otto Product Classification: Classify products into 9 main product categories

Page 34: Sharing about my data science journey and what I do at Lazada

Springleaf Marketing Response: Predict if customers will respond to direct mail

Page 35: Sharing about my data science journey and what I do at Lazada

Telstra Network Disruptions: Predict severity of service disruption

Page 36: Sharing about my data science journey and what I do at Lazada

Skill sets to be a better data scientist (what I’m focusing on now)

- Statistics
- Experimental Design
- SPSS & R
- Communication
- Teamwork

- Python
- SQL
- Machine Learning
- Distributed Storage & Processing

Page 37: Sharing about my data science journey and what I do at Lazada

Finding problems and opportunities people overlook

Page 38: Sharing about my data science journey and what I do at Lazada

Proper software engineering

Page 39: Sharing about my data science journey and what I do at Lazada

Designing and building data products end-to-end

Page 40: Sharing about my data science journey and what I do at Lazada

Building data products using Spark (Scala)

Page 41: Sharing about my data science journey and what I do at Lazada

My journey so far…

- Statistics
- Experimental Design
- SPSS & R
- Communication
- Teamwork

- Python
- SQL
- Machine Learning
- Distributed Storage & Processing

- Finding use cases
- Software Engineering
- Designing data products
- Spark & Scala

Page 42: Sharing about my data science journey and what I do at Lazada

So what can you do?
- Get very good at basic SQL
- Get very good at either R or Python
- Understand basic machine learning techniques
- Understand distributed systems and processing
- Improve communication by writing and sharing
- Get experience by doing projects on machine learning and distributed processing (e.g., open data, volunteering, Kaggle, etc.)

Page 43: Sharing about my data science journey and what I do at Lazada

What I do at Lazada

Page 44: Sharing about my data science journey and what I do at Lazada

Lazada Data Science: Data Engineers, Scientists, Tool Developers

Page 45: Sharing about my data science journey and what I do at Lazada

A rough guide to each role

- Engineers: Collect, store, maintain
- Scientists: Explore, prepare, model
- Tool Developers: Expose, integrate, platform-ize

Lines may blur between roles

Page 46: Sharing about my data science journey and what I do at Lazada

Problems we work on…

Page 47: Sharing about my data science journey and what I do at Lazada

Product-related:
- Product Categorization
- Attribute Extraction
- Spam Detection
- Image Quality Checking

Page 48: Sharing about my data science journey and what I do at Lazada

Consumer-related:
- Recommendations
- Product Ranking
- Consumer Segmentation
- Customer Lifetime Value

Page 49: Sharing about my data science journey and what I do at Lazada

Seller-related:
- Price Elasticity
- Detecting Counterfeits

Page 50: Sharing about my data science journey and what I do at Lazada

Operation-related:
- Delivery time forecasting

Page 51: Sharing about my data science journey and what I do at Lazada

What I’m working on

Page 52: Sharing about my data science journey and what I do at Lazada

Product categorization (flow diagram; labels as extracted):
- Input: Product title & description
- Machine Learning Categorization
- Rules-based Categorization
- Crowd Categorization
- Quality Checking and Validation
- Branches: Sufficient confidence / If insufficient confidence
- Output: Product Category
- Production: API for self-service, Scheduled batch jobs
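The confidence branch in this diagram can be sketched as a small routing function. The exact wiring between the boxes is not recoverable from the transcript, so the order here (rules first, then the model, then the crowd) is an assumption, as are the names `route_product`, `rules`, `model`, and the `0.8` threshold.

```python
def route_product(title, rules, model, threshold=0.8):
    """Return (category, source) for a product title.

    rules: dict mapping a keyword to a category (hypothetical rule format).
    model: callable returning (category, confidence) for a title (hypothetical).
    threshold: assumed cut-off for "sufficient confidence".
    """
    lowered = title.lower()
    # Rules-based categorization: first matching keyword wins.
    for keyword, category in rules.items():
        if keyword in lowered:
            return category, "rules"
    # Machine learning categorization, accepted only if confident enough.
    category, confidence = model(title)
    if confidence >= threshold:
        return category, "model"
    # Insufficient confidence: hand the product to crowd categorization.
    return None, "crowd"
```

Returning the source alongside the category makes it easy to measure how much traffic each path handles and to send crowd results through quality checking before they are used.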

Page 53: Sharing about my data science journey and what I do at Lazada

Product Ranking for onsite display (pipeline diagram; labels as extracted):
- Data sources: Product Data, Purchase Data, Behavioral Data (e.g., clickstream), Other Data (e.g., ratings, etc.)
- Merging datasets
- Feature Engineering
- Model product rankings
- Data Cleaning
- Rule-based modifiers
- Measurement & A/B Testing
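For the measurement and A/B testing step, one common way to compare two variants is a two-proportion z-test on conversion rates. The slides do not say which test is used at Lazada, so this is a generic sketch; the function name and the idea of comparing conversion counts are assumptions.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for the difference in conversion rates.

    conv_*: number of conversions; n_*: number of visitors in each variant.
    Assumes at least one conversion and one non-conversion overall.
    """
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal, via the complementary error function.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value
```

A small p-value suggests the difference between variants is unlikely to be noise, though sample size and test duration should be fixed before looking at results.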

Page 54: Sharing about my data science journey and what I do at Lazada

Recommendations for newsletter subscribers (pipeline diagram; labels as extracted):
- Data sources: Product Data, Purchase Data, Behavioral Data (e.g., clickstream), Other Data (e.g., ratings, etc.)
- Merging datasets
- Feature Engineering
- Data Cleaning
- Customer Segmentation
- Forecasted Top Sellers
- Recommendations Newsletter Creation
- Measurement & A/B Testing
- Rule-based modifiers

Page 55: Sharing about my data science journey and what I do at Lazada

How is my time spent

Page 56: Sharing about my data science journey and what I do at Lazada

Majority of time spent coding (thankfully)

Time spent:
- Coding, 55%
- Engagement, 30%
- Others, 15%

Coding Breakdown:
- Data Preparation, 50%
- Modeling, 20%
- Productionizing, 30%
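Combining the two charts gives each coding task’s share of total time. This assumes the coding breakdown applies to the 55% coding slice, which the slide layout suggests but does not state outright.

```python
coding_share = 55  # percent of total time spent coding (from the slide)
coding_breakdown = {"Data Preparation": 50, "Modeling": 20, "Productionizing": 30}

# Share of total time = coding share x share within coding.
overall = {task: coding_share * pct / 100 for task, pct in coding_breakdown.items()}

for task, pct in overall.items():
    print(f"{task}: {pct}% of total time")
```

Under that assumption, data preparation alone takes roughly as much total time as stakeholder engagement.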

Page 57: Sharing about my data science journey and what I do at Lazada

Data Preparation
- Merging data
- Imputing nulls
- Removing duplicates
- Handling outliers
- Fixing formats
- Etc, etc, etc
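Three of these steps (removing duplicates, imputing nulls, handling outliers) can be sketched with the standard library alone. The row format, the `id` and `price` field names, the median fill, and the clip bounds are all illustrative assumptions, not details from the talk.

```python
from statistics import median


def prepare(rows, key="id", field="price", lower=0.0, upper=1000.0):
    """Deduplicate rows, impute missing values, and clip outliers.

    rows: list of dicts; `key`, `field`, and the clip bounds are illustrative.
    Requires at least one non-missing value in `field`.
    """
    # Remove duplicates: keep the first row seen for each key.
    seen, unique = set(), []
    for row in rows:
        if row[key] not in seen:
            seen.add(row[key])
            unique.append(dict(row))  # copy so the input is left untouched
    # Impute nulls: fill missing values with the median of the known values.
    known = [r[field] for r in unique if r[field] is not None]
    fill = median(known)
    for r in unique:
        value = fill if r[field] is None else r[field]
        # Handle outliers: clip values into [lower, upper].
        r[field] = min(max(value, lower), upper)
    return unique
```

The median is used for the fill because it is less sensitive to the very outliers that the clipping step then handles.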

Page 58: Sharing about my data science journey and what I do at Lazada

Building the model
- Feature engineering
- Machine learning
- Validation
- Iterate, iterate, iterate
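The validation step is often done with k-fold cross-validation, where each fold takes a turn as the held-out set. The slides do not name a validation scheme, so this is a generic sketch; `k_fold_indices` is a hypothetical helper and the folds are contiguous (shuffle the data first if order matters).

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, valid_idx) pairs for k roughly equal, contiguous folds."""
    # Spread the remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        valid = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, valid
        start += size
```

Each iteration trains on `train` and scores on `valid`; averaging the k scores gives a steadier estimate than a single split, which helps when iterating on features.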

Page 59: Sharing about my data science journey and what I do at Lazada

Deploying to production
- Proof-of-concept
- Developing API
- Scheduling jobs
- Continuous integration
- Fixing bugs

Page 60: Sharing about my data science journey and what I do at Lazada

Engagement (with stakeholders)
- Roadmap planning (quarterly)
- Aligning solution with problem
- Explaining and getting buy-in

Page 61: Sharing about my data science journey and what I do at Lazada

Other tasks
- Providing assistance
- Research and brainstorming
- Team sharing

Page 62: Sharing about my data science journey and what I do at Lazada

Any further questions?

[email protected]@lazada.com