Keynote at Big Data Tech Con SF 2014
Building Data Products: The Right Order of Things
Gloria Lau, VP of Data, Timeful
Keynote @ Big Data Tech Con
http://www.linkedin.com/in/gloriatlau/ @gloriatlau
What do they have in common?
Right order of things
def __init__(self):
    data infrastructure
for x in range(3):
    offline modeling
    online data product
    user feedback
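The slide's pseudocode can be read as: set up data infrastructure once, then loop through modeling, product, and feedback. Below is a minimal runnable sketch of that ordering. Every function name and the toy "model" are hypothetical placeholders chosen for illustration; they are not from the talk.

```python
def build_data_infrastructure():
    # Hypothetical one-time setup (tracking, storage); the __init__ step.
    return {"events": []}


def offline_modeling(infra, iteration):
    # Placeholder "model": only records how much data it was trained on.
    return {"iteration": iteration, "trained_on": len(infra["events"])}


def ship_online_data_product(model):
    # Placeholder for putting the model in front of users.
    return f"product-v{model['iteration']}"


def collect_user_feedback(infra, product):
    # Feedback flows back into the infrastructure as new data.
    infra["events"].append(f"feedback on {product}")


def run(iterations=3):
    infra = build_data_infrastructure()  # done once, before the loop
    history = []
    for x in range(iterations):
        model = offline_modeling(infra, x)
        product = ship_online_data_product(model)
        collect_user_feedback(infra, product)
        history.append((product, model["trained_on"]))
    return history


history = run()
```

Each pass trains on the feedback gathered by the previous passes, which is the point of running the loop more than once.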
Model Product
The challenge
Exception: tracking code missing/overloaded!
Debug: Power user computation takes forever!
def __init__(self):
    data infrastructure
for x in range(3):
    offline modeling
    online data product
    user feedback
The challenge
Data viz --> ID'ed new data potential --> Yet another data product
Sparse data --> Crappy model --> Need to nudge users for *more* data
Non-standardized data --> Crappy model --> Need to standardize
def __init__(self):
    data infrastructure
for x in range(3):
    offline modeling
    online data product
    user feedback
• Four diseases have broken out in the world and it is up to a team of specialists in various fields to find cures for these diseases before mankind is wiped out ... the diseases are breaking out fast and time is running out: the team must try to stem the tide of infection in diseased areas while also working towards cures. A truly cooperative game where you all win or you all lose.
• How do you win?
• Optimally deploy minimal resources in the right order
• What is optimal?
• Do you fix that tracking issue first?
• Do you optimize your power user computation?
• Do you double down on standardization?
• Relevant classifications
• P0 vs P1
• big company vs small company
2 Questions to ask
1 Quote answers them all
“Premature optimization is the root of all evil.”
–Donald Knuth
What is the one metric that your data product will move?
• Retention. Growth. Engagement. Money. Etc.
• Find it, and focus
If each user gives your product one minute a day, how would you spend that minute?
• Data scientists love data. The more the merrier.
• More data solves your data scientist's problem. It does not solve your user's problem.
Do you fix that tracking issue first?
• Q1: Is it in the critical path of measuring that metric?
• Q2: Are you throwing away users' time?
Do you optimize your power user computation?
• Q1: Are power users your key user metric to lift?
• Q2: What fraction of total user time is affected by this?
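Q2 is a simple ratio, and a rough sketch of the arithmetic may help. The segment names and minute counts below are made-up illustrative numbers, not data from the talk, and `fraction_of_time_affected` is a hypothetical helper.

```python
def fraction_of_time_affected(minutes_by_segment, affected_segments):
    """Share of total user time spent by the segments an issue touches."""
    total = sum(minutes_by_segment.values())
    if total == 0:
        return 0.0
    affected = sum(minutes_by_segment.get(s, 0) for s in affected_segments)
    return affected / total


# Made-up daily minutes per user segment, for illustration only.
usage = {"power": 300, "regular": 1500, "casual": 1200}

# Slow power-user computation only touches the "power" segment.
share = fraction_of_time_affected(usage, ["power"])
```

With these assumed numbers the share is 300 / 3000, or 10% of total user time, which is the figure to weigh against the cost of the optimization.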
Do you double down on standardization?
• Q1: Peel the onion. How will an x% increase in standardization rate affect your current and projected metric?
• Q2: Does it add friction to the funnel?
“Premature optimization is the root of all evil.”
–Donald Knuth
• Right order:
• talent first
• assimilation
• the 3%; fail fast
“Programmers waste enormous amounts of time thinking about, or worrying about, the speed of noncritical parts of their programs, and these attempts at efficiency actually have a strong negative impact when debugging and maintenance are considered. We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil. Yet we should not pass up our opportunities in that critical 3%. A good programmer will not be lulled into complacency by such reasoning, he will be wise to look carefully at the critical code; but only after that code has been identified. It is often a mistake to make a priori judgments about what parts of a program are really critical, since the universal experience of programmers who have been using measurement tools has been that their intuitive guesses fail.”
–Donald Knuth
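Knuth's point about measurement tools can be tried directly. This is a minimal sketch using Python's standard `cProfile` and `pstats` modules to find the critical code before optimizing anything. The `slow_step`, `fast_step`, and `pipeline` functions are hypothetical stand-ins for a real workload.

```python
import cProfile
import io
import pstats


def slow_step(n):
    # Deliberately heavy nested loop: stands in for the "critical 3%".
    total = 0
    for i in range(n):
        for j in range(100):
            total += i * j
    return total


def fast_step(n):
    # Cheap by comparison; optimizing this first would be premature.
    return sum(range(n))


def pipeline():
    return slow_step(2000) + fast_step(2000)


def profile_report(func, top=5):
    """Run func under cProfile and return the top entries by cumulative time."""
    profiler = cProfile.Profile()
    profiler.enable()
    func()
    profiler.disable()
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats(top)
    return buffer.getvalue()


report = profile_report(pipeline)
print(report)
```

The report lists functions by cumulative time, so the decision about what to optimize comes from measurement and not from a guess.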
It's an art.