The Sidekick Pattern: Strata Talk by Abe Gong

Post on 28-Oct-2014

DESCRIPTION

Slides from my Strata talk: http://strataconf.com/strata2014/public/schedule/speaker/163953

Abstract: Creating value from big, messy data sets can be a daunting task. The session introduces the Sidekick Pattern: using small, curated data to increase the value of Big Data. Drawing on lessons from data science for Jawbone’s UP fitness tracker, we will see how smart selection of data sidekicks can accelerate analysis, solve cold start problems, and simplify complicated data pipelines.

TRANSCRIPT

THE SIDEKICK PATTERN: USING SMALL DATA TO MULTIPLY THE VALUE OF BIG DATA

@AbeGong

Data Scientist, Jawbone

Strata - February 2014

Wednesday, February 12, 14

DATA SIDEKICKS

EX: HIEROGLYPH TRANSLATION

EX: CAMPAIGN TARGETING

EX: SLEEP CONTEXT

SUB-TITLE [DATA ART EXAMPLE]

EXAMPLES, PLEASE: WHICH DATA STREAMS GET BIG?

(...AND BESIDES SIZE, WHAT ELSE DO THEY HAVE IN COMMON?)

BIG, RICH, MESSY

CAREFULLY CURATED

BIG, RICH, MESSY

TRANSMUTATION!

EX: HUFFPO MODERATION

WHEN SHOULD I USE THE SIDEKICK PATTERN?

• To separate munging and cleaning from scaling.

• To bootstrap new data products.

• To leverage variety against volume.
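
The slides do not include an implementation, so the following is only a minimal sketch of one way to read the first bullet: keep the cleaning rules in a small, hand-curated table (the sidekick) and apply it to a large stream one record at a time. The `SIDEKICK` table, the `city` field, and the function names are all hypothetical and chosen purely for illustration.

```python
from typing import Iterable, Iterator, Optional

# Hypothetical sidekick: a small, hand-curated table that maps messy raw
# labels to clean categories. The entries are illustrative, not from the talk.
SIDEKICK = {
    "nyc": "New York",
    "new york city": "New York",
    "sf": "San Francisco",
    "san fran": "San Francisco",
}


def normalize(raw: str) -> str:
    """Lowercase and collapse whitespace so lookups tolerate sloppy input."""
    return " ".join(raw.lower().split())


def annotate(events: Iterable[dict], sidekick: dict) -> Iterator[dict]:
    """Lazily attach a clean label to each event; unknown labels become None."""
    for event in events:
        key = normalize(event.get("city", ""))
        clean: Optional[str] = sidekick.get(key)
        yield {**event, "city_clean": clean}


def coverage(events: Iterable[dict], sidekick: dict) -> float:
    """Fraction of events the sidekick can label; a cue for what to curate next."""
    total = 0
    matched = 0
    for event in annotate(events, sidekick):
        total += 1
        matched += event["city_clean"] is not None
    return matched / total if total else 0.0


# Tiny stand-in for a big, messy stream (assumed record shape).
events = [
    {"user": 1, "city": " NYC "},
    {"user": 2, "city": "San  Fran"},
    {"user": 3, "city": "Springfield"},
]
annotated = list(annotate(events, SIDEKICK))
```

Because `annotate` is a generator, the curated table stays small and editable by hand while the same function can be mapped over an arbitrarily large stream. The `coverage` figure gives a rough signal of how much of the big data the sidekick currently explains.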

EX: SLEEP RECOVERY

LEVELS OF ABSTRACTION

QUESTIONS? COMMENTS?

@AbeGong

Data Scientist, Jawbone

Strata - February 2014

Small                 Big
Focused               Rich
Curated               Messy

Abstract              Sensory
Business logic        User experience
Internal-facing       External-facing

“Quantitative”        “Qualitative”
Science-making        Story-making

TRANSMUTATION EXAMPLES

Example                                 Property

Rosetta stone                           Synonyms/Comparability

Campaign targeting                      Demographic categories

Sleep context                           Context

Instrumental variables                  Causality

HuffPo moderation                       Credibility

Sleep recovery                          Clean examples

Economic mobility                       Continuity

CrowdFlower gold                        Credibility

Bridge cases in IRT scaling models      Relative ranking

Sentiment analysis                      Categories

Pretty much all supervised learning     Categories/Scales

...

RECOMMENDED READING

• Pete Skomoroch: http://www.slideshare.net/pskomoroch/strata-endorsements-16939466

• Paco Nathan: http://www.slideshare.net/pacoid/using-cascalog-to-build-an-app-based-on-city-of-palo-alto-open-data

• Jay Kreps: http://engineering.linkedin.com/distributed-systems/log-what-every-software-engineer-should-know-about-real-time-datas-unifying

• Joseph Turian: http://files.meetup.com/1542972/20120202-more-data-same-models-STUDY-SLIDES.pdf

• Me: http://blog.abegong.com/2014/02/wanted-good-examples-of-data-sidekicks.html
