uc berkeley data science webinar

Berkeley DS Webinar June 1, 2016

Upload: alpine-data

Post on 09-Feb-2017



TRANSCRIPT

Page 1: UC Berkeley Data Science Webinar

Berkeley DS Webinar

June 1, 2016

Page 2: UC Berkeley Data Science Webinar

COMPANY CONFIDENTIAL

How business gets involved in the modeling process (and the challenges involved)
• CPG (consumer packaged goods)
• One of the first things I learned in the DS biz is that the business is never far from the data science: it wants to be involved at all stages
– They want to pose the problem
– Give perspective on solutions
– Review what DS is finding
– Refine the process and make suggestions
– Understand and critique the results
– Porous layer between biz and DS teams
• Can be a very positive thing: ideas on what should be included, validation that the results are meaningful, the biz context needed to build good models
• Downside: biz will often lead you down paths that are not productive or defensible + anecdotes!
• Having biz involved forces you to build models that are explanatory and not just predictive, which means they are meaningful
• If you focus only on prediction, this will lead to overfitting
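The overfitting point can be made concrete with a toy comparison. The sketch below is not from the talk: the data is synthetic, and both "models" (a nearest-neighbour memorizer and a straight-line fit) are stand-ins chosen for illustration. The memorizer reproduces the training data perfectly but does worse on held-out points, while the line has a slope a business stakeholder can read and critique.

```python
# Synthetic illustration only: the data and both "models" are assumptions.
train_x = [1, 2, 3, 4, 5, 6]
train_y = [3, 3, 7, 7, 11, 11]        # 2*x plus alternating noise of +1 / -1
test_x = [1.4, 2.4, 3.4, 4.4, 5.4]
test_y = [2 * x for x in test_x]      # noise-free held-out targets


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def memorizer(x):
    """Purely predictive model: return the target of the nearest training point."""
    nearest = min(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    return train_y[nearest]


def mse(predict, xs, ys):
    """Mean squared error of a prediction function on (xs, ys)."""
    return sum((predict(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


slope, intercept = fit_line(train_x, train_y)


def line(x):
    """Explanatory model: one slope and one intercept."""
    return slope * x + intercept


memo_train, memo_test = mse(memorizer, train_x, train_y), mse(memorizer, test_x, test_y)
line_train, line_test = mse(line, train_x, train_y), mse(line, test_x, test_y)

print(f"memorizer  train={memo_train:.3f}  test={memo_test:.3f}")
print(f"line       train={line_train:.3f}  test={line_test:.3f}  slope={slope:.2f}")
```

The memorizer wins on training error and loses on held-out error, which is the pattern to watch for when a model is tuned for prediction alone.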

Page 3: UC Berkeley Data Science Webinar


It’s all about the data!
• Morgan Stanley: we sell AA, but many people do basic things with data
• Means that you don’t spend that much time on algorithms; it is mostly about feature generation and data prep
• In SV, with internet companies, the data science is "throw all the data at an algorithm"
• If you can be more intelligent with feature generation, you will get better performance
• Nevertheless, the more data you can get, the better
• So acquisition of data is very important and part of the process (overlooked)
• Traditional world: deciding what data to use and which transforms to apply, VERSUS throwing data into an algorithm and hoping for the best
– This is overlooked
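As a rough illustration of what "feature generation and data prep" can look like in practice, the sketch below turns a raw transaction log into per-customer features. Everything in it is hypothetical: the customer IDs, the amounts, the `AS_OF` date and the four feature names were invented for the example and do not come from the talk.

```python
from collections import defaultdict
from datetime import date

# Hypothetical raw data: (customer_id, purchase date, amount).
transactions = [
    ("c1", "2016-05-01", 20.0),
    ("c1", "2016-05-20", 35.0),
    ("c2", "2016-03-15", 120.0),
    ("c1", "2016-05-28", 5.0),
]
AS_OF = date(2016, 6, 1)  # assumed scoring date


def build_features(rows, as_of):
    """Aggregate raw purchases into one feature dict per customer."""
    grouped = defaultdict(list)
    for customer_id, day, amount in rows:
        grouped[customer_id].append((date.fromisoformat(day), amount))

    features = {}
    for customer_id, purchases in grouped.items():
        amounts = [amount for _, amount in purchases]
        last_day = max(day for day, _ in purchases)
        features[customer_id] = {
            "n_purchases": len(amounts),
            "total_spend": sum(amounts),
            "avg_spend": sum(amounts) / len(amounts),
            "days_since_last": (as_of - last_day).days,  # recency
        }
    return features


features = build_features(transactions, AS_OF)
for customer_id, row in sorted(features.items()):
    print(customer_id, row)
```

Features like recency, frequency and spend encode choices about which data to use and which transforms to apply, the "traditional world" side of the contrast above.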

Page 4: UC Berkeley Data Science Webinar


It’s not about the algorithm!
• Evicore example
• In a very short period of time, just using the straightforward approach, we found a way to save tens of millions of dollars
• By contrast, a company like VMware is obsessed with applying advanced algorithms to small amounts of data, not rich data, and is not making an impact on the biz
• What is more important than the algorithm is finding an important biz problem and getting to a solution in a meaningful time period
• Also more important: operationalizing the analytics result
• You can have a perfect model, but if it is not in production it is just an insight that can die on the vine
• A simple model can give you lift in customer acquisition and an impact on fraud that’s immediate
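One common way to quantify "lift" is to compare the conversion rate among the customers a model ranks highest with the overall conversion rate. The sketch below assumes that definition; the scores and outcomes are made up, and `lift_at` is a hypothetical helper, not something shown in the talk.

```python
# Hypothetical scored customers: (model score, 1 if the customer converted else 0).
scored = [
    (0.90, 1), (0.80, 1), (0.70, 0), (0.60, 1), (0.50, 0),
    (0.40, 0), (0.30, 0), (0.20, 0), (0.10, 0), (0.05, 0),
]


def lift_at(rows, k):
    """Conversion rate in the top-k scored rows divided by the overall rate."""
    ranked = sorted(rows, key=lambda row: row[0], reverse=True)
    top = ranked[:k]
    top_rate = sum(converted for _, converted in top) / len(top)
    base_rate = sum(converted for _, converted in rows) / len(rows)
    return top_rate / base_rate


lift_top2 = lift_at(scored, 2)
lift_top5 = lift_at(scored, 5)
lift_all = lift_at(scored, len(scored))

print(f"lift in top 2: {lift_top2:.2f}")
print(f"lift in top 5: {lift_top5:.2f}")
print(f"lift over everyone: {lift_all:.2f}")
```

A lift above 1 in the top slice means that targeting by score beats contacting customers at random. Even a simple scoring model can be judged this way as soon as it is in production.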

Page 5: UC Berkeley Data Science Webinar


How to become a data scientist!
• Personal experience and what you see during hiring
• Recruiting stuff
• Plug for Alpine!
• Internships are the most important! More than courses and the like
• All about connections
• Meetups