thinking like a data scientist - goto conference · what does a data scientist do predictions,...

Thinking Like A Data Scientist Em Grasmeder ThoughtWorks Data Witch @emgrasmeder


Post on 24-Jun-2020


Page 1: Thinking Like A Data Scientist - GOTO Conference · What Does a Data Scientist Do Predictions, categorization, ... So how do data scientists actually think? Let’s answer that question

Thinking Like A Data Scientist
Em Grasmeder
ThoughtWorks Data Witch
@emgrasmeder


About Me
● Pronoun is they/them
● ThoughtWorks consultant
● Graduate research in economics
  ○ I flunked out
● Generalist within data space


<Data Science Identity Crisis>


What Does a Data Scientist Do
● Predictions, categorization, clustering
● Write software
● Visualizations
● ...magically fix the business model??


How is a Data Scientist Useful

[Diagram: Models, Code, Visualization ... ?? ... Business Value]


Controversial opinions about data scientists
● They should be good software and API developers
● They should be competent at continuous delivery, making and managing pipelines, and writing infrastructure as code
● They should speak the language of the business and be involved in conversations about KPIs

If not...
● They might not be very useful


So how do data scientists actually think?


Let’s answer that question with a story about

Cholera!


Cholera Facts (yay!)

Deadly bacteria that can kill within hours

The water in your body just comes out from everywhere

Pretty much curable (90% of cases) with salty, sugary water that costs $0.10

Used to be a problem, for example, in London; is still a problem in some places


The cause of cholera and how it spreads was unknown in 1854.

They thought it was “miasma”: literally, bad air


Deadly, exploding cesspits

Waste from houses, slaughterhouses and factories dumped in the Thames

“The Great Stink”


“I can certify that the offensive smells, even in that short whiff, have been of a most head-and-stomach-distending nature”

- Charles Dickens

“The Great Stink”


f(a, b, c, ...) + ε = y

f(proximity to bad air, sinful, too much blood, other old fashioned belief) + ε = y


The Broad Street Cholera Outbreak of 1854

John Snow

The data


💡


Formally write your hypothesis

● H0 is called the Null Hypothesis
● In 1850s England, the Null Hypothesis is “bad airs”


Formally write your hypothesis

● H0: Thing is normally distributed
OR
● H0: Thing is uniformly distributed
● H1: Thing is distributed differently because reason

* A model is another way of writing a hypothesis
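
One way to make the H0/H1 framing concrete is a goodness-of-fit check against a uniform null. The sketch below is not from the talk: the district names and case counts are invented for illustration, and 7.815 is the standard chi-square critical value for 3 degrees of freedom at the 5% level.

```python
# Hypothetical case counts per district (invented for illustration).
observed = {"North": 12, "East": 9, "South": 11, "West": 48}


def chi_square_uniform(counts):
    """Chi-square statistic for H0: cases are spread uniformly across groups."""
    total = sum(counts.values())
    expected = total / len(counts)  # under H0 every group gets the same share
    return sum((obs - expected) ** 2 / expected for obs in counts.values())


# Critical value for df = 4 - 1 = 3 at the 5% significance level.
CRITICAL_VALUE_DF3 = 7.815

statistic = chi_square_uniform(observed)
reject_h0 = statistic > CRITICAL_VALUE_DF3
print(f"chi-square = {statistic:.2f}, reject H0: {reject_h0}")
```

If the statistic exceeds the critical value, the uniform H0 looks implausible for these counts, which is the cue to go looking for an H1 with a reason attached.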


Central Limit Theorem
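
A quick simulation can illustrate the theorem named on this slide. This is a minimal sketch, not something shown in the talk: the sample size, the number of trials, and the choice of a uniform(0, 1) source distribution are all assumptions.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable


def sample_mean(n):
    """Mean of n draws from a uniform(0, 1) distribution."""
    return statistics.fmean(random.random() for _ in range(n))


n = 30         # size of each sample (assumed)
trials = 5000  # number of samples drawn (assumed)
means = [sample_mean(n) for _ in range(trials)]

# The theorem predicts the sample means cluster around the population mean (0.5)
# with standard deviation sigma / sqrt(n), where sigma^2 = 1/12 for uniform(0, 1).
predicted_sd = (1 / 12) ** 0.5 / n ** 0.5
print(f"mean of means = {statistics.fmean(means):.3f}")
print(f"sd of means   = {statistics.stdev(means):.3f} (predicted {predicted_sd:.3f})")
```

Even though each individual draw is uniformly distributed, the means of many samples pile up in a bell shape around 0.5, which is why "normally distributed" is such a common default H0.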


Formally write your hypothesis
● H0: People living in equally odorous parts of town will have a uniform likelihood of contracting cholera


Preposterous!

What if we actually talked to the poor people?


Collecting more data

Workers at brewery were unaffected while their families died
Children of some families died while their families lived
There was this one woman, a complete outlier, the only person in her neighborhood to die


Formally write your hypothesis
● H0: People living in equally odorous parts of town will have a uniform likelihood of contracting cholera
● HA: People who drink contaminated poo-water have a uniform likelihood of contracting cholera


Formally write your hypothesis
● H0: People living in equally odorous parts of town will have a uniform likelihood of contracting cholera
● HA: People contract cholera because they take water from certain wells
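
A hypothesis like HA suggests a simple first check: compare how often people fell ill depending on where their water came from. The sketch below uses invented survey counts, not Snow's real data, and the group labels are hypothetical.

```python
# Hypothetical survey counts (invented for illustration, not Snow's real data).
# Each entry: water source -> (people who fell ill, people surveyed)
survey = {
    "suspect well": (60, 100),
    "other wells": (5, 100),
}


def attack_rate(ill, surveyed):
    """Share of surveyed people who fell ill."""
    return ill / surveyed


rates = {source: attack_rate(*counts) for source, counts in survey.items()}

# Relative risk: how many times more likely illness is in one group than the other.
relative_risk = rates["suspect well"] / rates["other wells"]

for source, rate in rates.items():
    print(f"{source}: {rate:.0%} fell ill")
print(f"relative risk = {relative_risk:.1f}")
```

A relative risk far from 1 would not prove the well is the cause, but it is the kind of gap that "bad air" alone has trouble explaining.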


Collecting more data

Workers at brewery drank beer, which required boiling the water
The children who died went to school near the infected well, far from their homes and family
That one woman, she just loved the flavor of the water from that poo-contaminated cholera well


“The greatest value of a picture is when it forces us to notice what we never expected to see.”

- Tukey, “Exploratory Data Analysis”, 1977


Fun fact! When the word “statistics” first came about in the 18th century, it meant the “science dealing with data about the condition of a state or community”


Less fun fact! Decades after the cause and prevention of cholera were known, states knowingly sacrificed thousands of people’s lives for the sake of protecting businesses (for instance, Hamburg in 1892)


So what are the lessons?

Data is good. More data is better
Visualize your data!
Do we really need machine learning for this?
(Maybe states don’t have the people’s best interests at heart)


Data Exploration: Refining your mental model

John Snow

The data


Data Exploration


Let’s talk about models

f(a, b, c, ...) + ε = y


Data is good. More data is better

Try to move as much as possible from the ε into the function

f(a, b, c, ...) + ε = y

Maybe b comes from an external API
Maybe c is too complicated and needs to be split into d and e
Maybe g is derived from a function/calculation based on other records or parameters a and b
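
The idea of moving signal out of ε and into f can be sketched with a toy regression. Everything here is an assumption made for illustration: the synthetic data, the true coefficients, and the small hand-written least-squares helpers (in practice a library would do the fitting).

```python
import random

random.seed(1)

# Synthetic data (an assumption for illustration): y truly depends on a and b.
rows = []
for _ in range(200):
    a, b = random.uniform(0, 10), random.uniform(0, 10)
    y = 2.0 * a + 3.0 * b + random.gauss(0, 1)
    rows.append((a, b, y))


def solve(matrix, vector):
    """Solve a small linear system by Gauss-Jordan elimination with pivoting."""
    n = len(vector)
    aug = [list(row) + [v] for row, v in zip(matrix, vector)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [x - factor * p for x, p in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def residual_variance(features, targets):
    """Fit least squares with an intercept; return the variance left in ε."""
    design = [[1.0] + list(f) for f in features]
    k = len(design[0])
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(design, targets)) for i in range(k)]
    coefs = solve(xtx, xty)
    residuals = [t - sum(c * x for c, x in zip(coefs, r)) for r, t in zip(design, targets)]
    return sum(e * e for e in residuals) / len(residuals)


targets = [y for _, _, y in rows]
var_a_only = residual_variance([(a,) for a, _, _ in rows], targets)
var_a_and_b = residual_variance([(a, b) for a, b, _ in rows], targets)
print(f"ε variance with f(a):    {var_a_only:.2f}")
print(f"ε variance with f(a, b): {var_a_and_b:.2f}")
```

When b is left out, its whole effect hides inside ε. Once b is added as an input, what remains in ε is close to the random noise that was put there on purpose.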


Data is good. More data is better. Unless it’s not

f(a, b, c, ...) + ε = y

Sometimes b and c are just confusing the algorithm

Methods of dimensionality reduction, such as principal component analysis, help extract a signal from noise, and help prevent overfitting
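
As a rough illustration of dimensionality reduction, the sketch below runs a hand-rolled principal component analysis on just two columns. The data is synthetic and the setup is an assumption (b and c are two noisy readings of one hidden signal); real work would normally use a library implementation.

```python
import math
import random

random.seed(2)

# Synthetic data (an assumption): b and c are two noisy copies of one hidden signal.
signal = [random.gauss(0, 3) for _ in range(500)]
b = [s + random.gauss(0, 0.5) for s in signal]
c = [s + random.gauss(0, 0.5) for s in signal]


def covariance(xs, ys):
    """Sample covariance of two equally long lists."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)


# Entries of the 2x2 covariance matrix [[sbb, sbc], [sbc, scc]].
sbb, scc, sbc = covariance(b, b), covariance(c, c), covariance(b, c)

# Eigenvalues of a symmetric 2x2 matrix in closed form.
half_trace = (sbb + scc) / 2
spread = math.sqrt(((sbb - scc) / 2) ** 2 + sbc ** 2)
largest, smallest = half_trace + spread, half_trace - spread

# Direction of the first principal component (eigenvector for the largest eigenvalue).
vx, vy = sbc, largest - sbb
norm = math.hypot(vx, vy)
component = (vx / norm, vy / norm)

# Project each (b, c) pair onto that single direction: two columns become one.
mb, mc = sum(b) / len(b), sum(c) / len(c)
reduced = [(x - mb) * component[0] + (y - mc) * component[1] for x, y in zip(b, c)]

explained = largest / (largest + smallest)
print(f"first component keeps {explained:.1%} of the variance")
```

Here two correlated inputs collapse into one column that carries most of the variance, so a downstream model has fewer redundant inputs to be confused by.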


The hard part of making a model useful is not choosing a model, or even hyperparameter tuning

Like in Hamburg, it’s overcoming bureaucracy and politics to use the model, even if it’s imperfect, to help your users

f(a, b, c, ...) + ε = y


The data

Your model ->

It’s, you… you’re the scientist in this metaphor now

Thinking about the business


“Remember that all models are wrong; the practical question is how wrong do they have to be to not be useful.”

- George Box (“one of the great statistical minds of the 20th century”), “Empirical Model Building and Response Surfaces”, 1987


Thinking like a data scientist means making pragmatic choices


https://collectiveactions.tech/


Thank you!

I’m Em Grasmeder
the ThoughtWorks Data Witch
@emgrasmeder on Twitter