your data scientist hates you

Your Data Scientist Hates You Bradford Stephens bradford@roboticprofit.com ft. help from Nick Kypreos nick@roboticprofit.com

Upload: bradford-stephens

Post on 16-Apr-2017

TRANSCRIPT

Page 1: Your Data Scientist Hates You

Your Data Scientist Hates You

Bradford Stephens

bradford@roboticprofit.com

ft. help from Nick Kypreos nick@roboticprofit.com

Page 2: Your Data Scientist Hates You

About Us

Data Infrastructure and Data Science at scale.

Page 3: Your Data Scientist Hates You

fact

71%¹ of Data Science projects fail

Page 4: Your Data Scientist Hates You

fact

63%² of Data Scientists quit in < 2 years

Page 5: Your Data Scientist Hates You

Why

• Data Scientists aren’t involved from the beginning

• No strategy

• Bad Data: more common than you think and untestable

• Pointlessness

Page 6: Your Data Scientist Hates You

Involvement

• Everything stems from this

• Goals need to be attainable

• Data needs to be accessible and formatted correctly

• You can’t conceive of what’s possible (or impossible)

Page 7: Your Data Scientist Hates You

Your Data Strategy

Page 8: Your Data Scientist Hates You

Your Data Strategy: Diagnostic

• Diagnostic: How did we get here?

• Understanding history and how your org drives decisions is key

• What will your org’s immune system allow?

• Infrastructure: what is currently in place and how did it happen?

• Goals: How do we drive revenue or KPIs?

Page 9: Your Data Scientist Hates You

Your Data Strategy: Roadmapping

• Roadmapping: What are we going to build?

• Data Architecture?

• Platform feasible?

• Who builds what when, for how much?

• How do we ensure a low-latency feedback loop? Data Science is highly iterative

Page 10: Your Data Scientist Hates You

Your Data Strategy: Development

• Platform: What’s our stack?

• Storage: Where does data come from, where does it go, and what are the latency/throughput requirements?

• Processing: Where do we transform data? Batch? Real-time? Bounds?

• Collaboration: How do we share results, data, and APIs across the org? (always forgotten)

Page 11: Your Data Scientist Hates You

Bad Data

Page 12: Your Data Scientist Hates You

Data Science is Untestable

Page 13: Your Data Scientist Hates You

Data Science = Math + QA + CS + PM + Psionics

Page 14: Your Data Scientist Hates You

Untestable

• Data Scientists spend vast amounts of time fixing data

• …and you need to be OK with that

• Unit Testing doesn’t make sense in science

• Distribution fitting, etc.

• Can only test via simulation: a whole ‘nother process

• “Simple” things take weeks to verify
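
The point about testing via simulation can be sketched as follows. This is only an illustration under stated assumptions, not something from the talk: it assumes an exponential model, and the names `fit_exponential_rate` and `simulate_recovery` are hypothetical. The idea is to generate data from a known rate, fit it repeatedly, and check that the fit recovers that rate within a tolerance, which a unit test on fixed inputs cannot establish.

```python
import random
import statistics


def fit_exponential_rate(samples):
    """Maximum-likelihood estimate of an exponential rate: 1 / sample mean."""
    return 1.0 / statistics.fmean(samples)


def simulate_recovery(true_rate, n_samples, n_trials, seed=0):
    """Simulate data from a known rate, fit each trial, return the estimates."""
    rng = random.Random(seed)  # seeded so the check is reproducible
    estimates = []
    for _ in range(n_trials):
        samples = [rng.expovariate(true_rate) for _ in range(n_samples)]
        estimates.append(fit_exponential_rate(samples))
    return estimates


if __name__ == "__main__":
    true_rate = 2.0  # assumed ground truth, chosen only for the simulation
    estimates = simulate_recovery(true_rate, n_samples=5000, n_trials=50)
    mean_estimate = statistics.fmean(estimates)
    rel_error = abs(mean_estimate - true_rate) / true_rate
    print(f"mean estimated rate: {mean_estimate:.3f}")
    print(f"relative error: {rel_error:.3%}")
```

Even this toy check needs a model, a tolerance, and many simulated trials, which hints at why “simple” things take so long to verify.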

Page 15: Your Data Scientist Hates You

Instrumentation

• Can you even verify your instrumentation?

• Are you collecting everything?

• Collecting the right thing?

• What if you’re only collecting it 85% of the time?

• Do you systematically drop events at high enough traffic?

• What if someone comes into the site through a different channel from an acquisition 2 years ago?
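
One way to start answering these questions is to compare tracked events against an independent count. The sketch below assumes you have hourly totals from two sources, such as server logs and the analytics pipeline. The `HourlyCount` type, the function names, and the 95% threshold are all hypothetical choices for illustration.

```python
from dataclasses import dataclass


@dataclass
class HourlyCount:
    hour: str     # label for the hour, e.g. "2017-04-16T13"
    served: int   # requests seen by an independent source (assumed: server logs)
    tracked: int  # events that reached the analytics pipeline


def capture_ratio(row):
    """Fraction of served requests that were tracked (1.0 if nothing was served)."""
    return row.tracked / row.served if row.served else 1.0


def low_capture_hours(rows, threshold=0.95):
    """Return the hours whose capture ratio falls below the threshold."""
    return [r.hour for r in rows if capture_ratio(r) < threshold]


def drops_under_load(rows, threshold=0.95):
    """True if every low-capture hour is busier than every healthy hour,
    which suggests events are being dropped at high traffic."""
    low = [r.served for r in rows if capture_ratio(r) < threshold]
    healthy = [r.served for r in rows if capture_ratio(r) >= threshold]
    return bool(low) and bool(healthy) and min(low) > max(healthy)


if __name__ == "__main__":
    # Made-up numbers, for illustration only.
    rows = [
        HourlyCount("h00", served=1000, tracked=990),
        HourlyCount("h12", served=20000, tracked=17000),
        HourlyCount("h13", served=25000, tracked=20000),
    ]
    print("low-capture hours:", low_capture_hours(rows))
    print("drops concentrated at high traffic:", drops_under_load(rows))
```

This only works if the second count is truly independent of the instrumentation being checked. A channel that never reaches either source stays invisible.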

Page 16: Your Data Scientist Hates You

Software is Garbage

• Remember Hadoop?

• Spark?

• MLlib bugs for years

• Wrong math won’t fail unit tests

• GIGO

• JSON, weekly microversioning, schema entropy…

• This is why DS efforts are so slow to start w/o initial involvement

• Don’t build the One True Data Platform

• One of our customers had 30 DBs, including a critical out-of-license DB2 box
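
Schema entropy in JSON feeds can be made visible with a simple census of field names and types. The sketch below assumes newline-delimited JSON where each line is a flat object. The function names and the sample records are hypothetical, made up for illustration.

```python
import json
from collections import Counter


def schema_signature(record):
    """Sorted tuple of (field name, Python type name) for one flat record."""
    return tuple(sorted((key, type(value).__name__) for key, value in record.items()))


def schema_census(json_lines):
    """Count how many records share each schema signature."""
    return Counter(schema_signature(json.loads(line)) for line in json_lines)


if __name__ == "__main__":
    # Made-up records showing drift in both field names and field types.
    lines = [
        '{"user_id": 1, "ts": "2017-04-16T00:00:00Z"}',
        '{"user_id": "1", "ts": "2017-04-16T00:01:00Z"}',
        '{"userId": 1, "ts": 1492300800}',
        '{"user_id": 2, "ts": "2017-04-16T00:02:00Z"}',
    ]
    for signature, count in schema_census(lines).most_common():
        print(count, signature)
```

More than one signature for what is supposed to be a single event type is a cheap early warning that downstream math may be running on mixed inputs.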

Page 17: Your Data Scientist Hates You

Pointlessness

Page 18: Your Data Scientist Hates You

Dashboards are not a strategy!

Page 19: Your Data Scientist Hates You

“Here’s some data, just tell us what’s interesting…”

Page 20: Your Data Scientist Hates You

“We didn’t think that was interesting, you’re bad at your job.”

Page 21: Your Data Scientist Hates You

Data Must be Treated like a Product

• Build a Data Products Team

• Engineers, PMs, Design, Data Science. Not just analysts.

• KPIs, Goals, Measurability, Backlogs

• Budget

• Freedom to Innovate

• Staff of diverse backgrounds

A Data Platform will touch every part of your org