
Natural Intelligence: the Human Factor in A.I.
Big Data Expo 2017, Utrecht, Netherlands

About Me
• Former member of the Search team at @WalmartLabs
  • Former Head of the Metrics & Measurements team; I also led the Human Evaluation team
• About the Metrics and Measurements team
  • A team of engineers, analysts, and scientists in charge of providing accurate and exhaustive measurements
  • We also had an auditing role towards adjacent teams
• What do we measure?
  • Engineering metrics related to model and data quality
  • Business metrics (revenue, etc.)
  • More exotic customer-centric metrics (customer value, customer satisfaction, model impact, etc.)
• Currently Head of Data Science at Atlassian, in charge of the Search & Smarts team


Outline
❑ Humans & Big Data
  • The role of human beings in the era of Big Data
  • Why do we need to tag data?
  • How do we get tagged data?
❑ The Era of Crowdsourcing
  • What is Crowdsourcing?
  • Use cases and details about Crowdsourcing
  • Traditional crowds vs. curated crowds
❑ The Human-in-the-Loop Paradigm
  • Definition and details about Human-in-the-Loop ML
  • Introduction to Active Learning


Humans & Big Data:
The Role of Human Beings in the Era of Machine Learning

The Era of Very Big Data
❑ VOLUME
  • More data was created from 2013 to 2015 than in the entire previous history of the human race
  • By 2020, accumulated data will reach 44 trillion gigabytes
❑ VELOCITY
  • By 2020, ~1.7 MB of new data / second / human being
  • 1.2 trillion search queries on Google per year
❑ VARIETY
  • 31 million messages and 2.8 million videos per minute on Facebook
  • Up to 300 hours of video / minute are uploaded to YouTube
  • In 2015, 1 trillion photos were taken; billions were shared online

[Image: a data center at Google]


Supervised vs. Unsupervised Machine Learning

Supervised ML — requires tagged data:
• Classification: a problem where the output variable is a category
  (examples: SVM, random forest, Bayesian classifiers)
• Regression: a problem where the output variable is a real value
  (examples: linear regression, random forest)

Unsupervised ML — doesn't require tagged data:
• Clustering: discovery of inherent groupings in the data
  (examples: k-means, k-nearest neighbors)
• Association rules: discovery of rules describing the data
  (example: the Apriori algorithm)

Example applications (see the code contrast below):
• Supervised: image recognition, speech recognition
• Unsupervised: feature learning, autoencoders
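To make the contrast concrete, here is a minimal sketch (not from the talk) using scikit-learn on a toy dataset: the supervised classifier needs the tags y at training time, while the unsupervised clustering algorithm discovers groupings from X alone.

```python
from sklearn.datasets import make_blobs
from sklearn.ensemble import RandomForestClassifier
from sklearn.cluster import KMeans

X, y = make_blobs(n_samples=300, centers=3, random_state=42)  # toy data

# Supervised: classification requires the tags y at training time
clf = RandomForestClassifier(random_state=42).fit(X, y)
print("predicted category:", clf.predict(X[:1]))

# Unsupervised: clustering discovers inherent groupings without y
km = KMeans(n_clusters=3, n_init=10, random_state=42).fit(X)
print("discovered groupings:", km.labels_[:10])
```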

The Case of Deep Learning
• Deep Learning has both supervised and unsupervised applications
• NB: Deep Learning algorithms are data-greedy…

Tagged Data
• Gathering quality tagged training data is a common bottleneck in ML:
  • expensive
  • quality control is hard and requires a second human pass
  • hardly scalable → heavy use of sampling strategies
• How do companies doing Machine Learning get tagged data?
  • Implicit tagging: customer engagement
  • Explicit tagging: manual labor
• A few strategies to get tagged data for cheap/free:
  • games (Google Quick Draw)
  • incentivization (extra lives or bonuses in games)


https://quickdraw.withgoogle.com/

The Wisdom from the Crowd
Why human input matters: the use case of image colorization

[Figure: a colorization model turns a grayscale photo into a plausibly colored one]

• Image recognition relies on a tagged training data set (watermelon, grapes, bananas, pineapple, orange, …)
• Colorization relies on 'general' knowledge ("Bananas are generally ___"):
  • obvious for human beings
  • fastidious for machines
→ Colorization is straightforward for humans because they can 'tap' into their general knowledge

Crowdsourcing: Human Wisdom at Scale

What is Crowdsourcing?

Crowdsourcing: the process of getting labor or funding, usually online, from a crowd of people
➢ Crowdsourcing = 'crowd' + 'outsourcing'
➢ The act of taking a function once performed by employees and outsourcing it to an undefined (generally large) network of people in the form of an open call

History of Crowdsourcing
• The term was first used in 2005 by the editors at Wired
• The official definition was published in the Wired article "The Rise of Crowdsourcing", June 2006
• It describes how businesses were using the Internet to "outsource work to the crowd"

What Crowdsourcing helps with:
• Scale → peer-production (for jobs to be performed collaboratively)
• Reach → connecting with a large network of potential laborers (if tasks are undertaken by sole individuals)

The Nature of Crowdsourcing

Microtasks
• Data generation: user-generated content such as reviews, pictures, translations, etc.
• Data validation: validation of translations, etc.
• Data tagging: image tagging, product categorization, etc.
• Data curation: curation of news feeds, etc.

Macrotasks
• Solution development: algorithm improvement, etc.
• Crowd contests: design competitions, algorithmic competitions, etc.

Funding

Some Cool Crowdsourcing Applications
• Mapping: Photo Sphere; Google Maps crowdsources info for wheelchair-accessible places
• Traffic: Google Traffic; Waze, a traffic-reporting app
• Translation: Google Translate
• Epidemiology: flu-tracking applications

Companies Based on Crowdsourcing
• Quora is a question-and-answer site where questions are asked, answered, edited, and organized by its community of users.
• Waze is a community-based traffic and navigation app where drivers share real-time traffic and road info.
• Kaggle is a platform for predictive-modelling competitions in which companies post data and data miners compete to produce the best models.
• Stack Overflow is a platform for users to ask and answer questions, vote questions and answers up or down, and edit them.
• Flickr is an image and video hosting website that is widely used by bloggers to host images that they embed in social media.

The Challenges of Crowdsourcing
• Reliability
  • Retail: absence of emotional involvement (judges are not actually spending money on items)
  • Waze: locals were sending fake information to limit traffic in their area
• Relevance of knowledge
  • Retail: judges might not have appropriate knowledge of the items they are evaluating
• Subjectivity
  • Search: relevance scores vary depending on profile and personal preferences
• Speed & cost
  • Human evaluations take time and can only be performed sporadically and on samples
  • Not practical for measurement purposes

Crowdsourcing vs. Curated Crowds

Traditional Crowdsourcing Model
+ Speed: many hands generate light work
+ Lower cost: typically a few pennies per task
− No quality control
− Lack of control: little to no incentive to deliver on time
− High maintenance: clear instructions needed; automated understanding checks
− Lower reliability: high overlap required (see the sketch below)
− Lack of confidentiality: anyone can see your tasks

Curated Crowd
+ Quality control: judges are held to quality metrics and removed if they don't deliver the required quality
+ Better quality: very little overlap needed
+ Expertise: judges become experts at the required task
+ Constraints on the crowd: judges are less likely to drop out
− More expensive: typically the primary source of income for judges
− Consistency required: need frequent tasks to keep skills sharp
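As an illustration of why the traditional model needs "high overlap", here is a toy aggregation sketch (the judgments and the `min_overlap` value are made up, not from the talk): the same task is sent to several judges and their answers are combined by majority vote to compensate for unreliable individual workers.

```python
from collections import Counter

def aggregate(judgments, min_overlap=3):
    """Majority-vote aggregation over redundant judgments of one task."""
    if len(judgments) < min_overlap:
        return None                       # not enough redundancy yet
    label, votes = Counter(judgments).most_common(1)[0]
    confidence = votes / len(judgments)   # crude agreement score
    return label, confidence

print(aggregate(["relevant", "relevant", "not relevant",
                 "relevant", "relevant"]))   # -> ('relevant', 0.8)
```

A curated crowd needs far less of this redundancy ("very little overlap needed"), which is the trade-off sketched above.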

Crowdsourcing Applications in e-Commerce: the example of Product Tagging
• Catalog Curation
  • Product description curation
  • Product tagging & categorization
  • Product deduplication
  • Taxonomy testing
• Search Relevance Evaluation
  • Relevance scores (query-item pair scores)
  • Engine comparison (ranking-to-ranking)
• Review Moderation
  • Removal/flagging of obscene reviews
• Mystery Shopping
  • Analysis and discovery of new trends
  • Evaluation of new products
  • Competitive analysis

Use Case: Evaluation of Search Engine Relevance

Side-by-Side Engine Comparison
[Figure: Ranking A and Ranking B presented side by side]
• Judge 1: prefers ranking A
• Judge 2: prefers ranking A
• Judge 3: prefers ranking B
→ Human evaluation makes it possible to measure the intangible with little risk

Use Case: Evaluation of Search Engine Relevance

Query-Item Relevance Scoring for Measurement of Ranking Quality
[Figure: judges assign graded relevance scores (5/5, 4/5, 3/5, 2/5, …) to each query-item pair in a ranking]

Discounted cumulative gain:

$$DCG_p = \sum_{i=1}^{p} \frac{rel_i}{\log_2(i+1)}$$

$$nDCG_p = \frac{DCG_p}{IDCG_p}$$

$$IDCG_p = \sum_{i=1}^{|REL_p|} \frac{2^{rel_i}-1}{\log_2(i+1)}$$

where $rel_i$ is the graded relevance of the item at position $i$.
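As a quick illustration, here is a small Python sketch of the formulas above. The judge scores are hypothetical (the 5/5, 4/5, … values in the figure are placeholders), and a single gain function is used for both DCG and IDCG so that a perfect ordering scores exactly 1.0; the `exponential` flag reproduces the $2^{rel_i}-1$ gain the slide uses for IDCG.

```python
import math

def dcg(rels, exponential=False):
    gain = (lambda r: 2 ** r - 1) if exponential else (lambda r: r)
    return sum(gain(rel) / math.log2(i + 1)
               for i, rel in enumerate(rels, start=1))

def ndcg(rels, exponential=False):
    ideal = sorted(rels, reverse=True)        # best-first ordering gives IDCG
    return dcg(rels, exponential) / dcg(ideal, exponential)

scores = [5, 4, 5, 3, 2]                      # hypothetical judgments by position
print(round(ndcg(scores), 3))                 # -> 0.989 (close to the ideal order)
```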

Human-in-the-Loop:
When Human Beings still Outperform the Machine

Fact: the brain has ~38 petaflops (a petaflop is a thousand trillion operations per second) of processing power…

The Dream of Automation

Automation: the use of various control systems for operating equipment such as machinery and processes with minimal or reduced human intervention

The 4 Industrial Revolutions
• FIRST REVOLUTION – 1784: mechanical production, railroads, steam power
• SECOND REVOLUTION – 1870: mass production, electrical power, assembly lines
• THIRD REVOLUTION – 1969: automated production, electronics, computers
• FOURTH REVOLUTION – ongoing: artificial intelligence, big data
→ Automation is not a new idea

Why automate?
• Automate boring/repetitive tasks
• Perform tasks at scale
• Perform tasks with enhanced precision
• Deliver consistent products
• Use machines where they outperform humans

When Full Automation can't be Achieved… Human-in-the-Loop

Human-in-the-loop (HITL): a model or a system that requires human interaction

• The idea of using human beings to enhance the machine is not new; we have been doing Human-in-the-Loop all along
  • Example: autopilot technology for planes
• Human intervention/presence is useful:
  • to handle corner cases (outlier management)
  • to "keep an eye" on the system (sanity check)
  • to correct unwanted behavior (refinement)
  • to validate appropriate behavior (validation)

Human-in-the-Loop Paradigm

Pareto Principle: aka the 80/20 rule, the law of the vital few, or the principle of factor sparsity; states that, for many events, roughly 80% of the effects come from 20% of the causes

ML version of the Pareto Principle:
• Evidence suggests that some of the most accurate ML systems to date need:
  • 80% computer/AI-driven input
  • 19% human input
  • 1% unknown randomness to balance things out
• The combination of machine and human intervention achieves maximum machine accuracy

How can human knowledge be incorporated into ML models?
A. Helping label the original dataset that will be fed into an ML model
B. Helping correct inaccurate predictions that arise as the system goes live

Human-in-the-Loop Use Case #1
An example of the HITL approach: face recognition

[Figure: photos auto-tagged with names (Mary, Roberto, Victoria, Laura, Sebastian, Cecelia)]

Accuracy
• Facebook's DeepFace software reaches 97.25% accuracy

HITL as a feedback loop
• When the confidence is below a certain threshold, the system:
  • suggests a label
  • asks the uploader to validate/approve or correct the suggestion
• The new data is used to improve the accuracy of the algorithm (see the sketch below)
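A hedged sketch of that feedback loop, assuming a generic classifier: the names `model`, `ask_uploader`, and the threshold value are illustrative, not part of any real face-recognition API.

```python
THRESHOLD = 0.90  # assumed confidence cutoff, tuned per application

def tag_photo(model, photo, training_buffer, ask_uploader):
    label, confidence = model.predict(photo)    # e.g. ("Mary", 0.72)
    if confidence >= THRESHOLD:
        return label                            # confident: auto-tag, no human
    # Below threshold: suggest the label, ask the uploader to confirm or fix
    corrected = ask_uploader(photo, suggestion=label)
    training_buffer.append((photo, corrected))  # new tagged data for retraining
    return corrected
```

Each low-confidence photo thus produces exactly the kind of tagged example the earlier "Tagged Data" slide says is hard to get.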

Human-in-the-Loop Use Case #2
An example of the HITL approach: autonomous vehicles

Teaching the machine
• Driving systems were trained using a human to oversee the process

Accuracy considerations
• The autopilot system is now over 99% accurate
• However, 99% accuracy means that people can die 1% of the time (!!)
• Though we have seen huge advances in the accuracy of pure machine-driven systems, they tend to fall short of acceptable accuracy rates

Corner cases
• Fun fact: Volvo's self-driving cars fail in Australia because of kangaroos ("Volvo's driverless cars 'confused' by kangaroos")
• Reaching 100% is hard because of corner cases
• A HITL approach helps get the accuracy to ~100%

The Success of Human-in-the-Loop: the Example of Chess

[Photo: Garry Kasparov]

The Human vs. the Machine
• In 1997, chess grandmaster Garry Kasparov was beaten by the IBM supercomputer Deep Blue

Freestyle or "Advanced" Chess
• Advanced: a human chess master works with a computer to find the best possible move
• Freestyle: a team can be made of any combination of human beings + computers
• In 2005, Steven Cramton, Zackary Stephen, and their 3 computers won a Freestyle Chess tournament

Why it works
• Computers are great at reading tough tactical situations
• But humans are better at understanding long-term strategy
• Humans use computers to limit "blunders" while using their intuition to force the opponent into board states that confuse the computer(s)

Active Learning:
The Best of Both Worlds

Active Learning

Active Learning: a special case of semi-supervised ML in which a learning algorithm can interactively query the user (oracle) to obtain the desired outputs at new data points, maximizing validity and relevance

General Strategy
If D is the entire data set, at each iteration i, D is broken up into three subsets (sketched in code below):
1. D_{K,i}: data points where the label is known
2. D_{U,i}: data points where the label is unknown
3. D_{Q,i}: data points for which the label is queried (sometimes, even when the label is known)

Benefits
• Labels are queried only when necessary (lower cost)

Next-Generation Algorithms
• Proactive learning:
  • relaxes the assumption that the oracle is always right
  • casts the problem as an optimization problem with a budget constraint
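Here is a minimal pool-based sketch of that strategy (illustrative only, assuming uncertainty sampling as the query rule and a simulated oracle standing in for the human judge):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y_true = make_classification(n_samples=500, random_state=0)
oracle_label = lambda i: y_true[i]         # simulated human oracle

known = list(range(10))                    # D_K: seed set with known labels
pool = list(range(10, 500))                # D_U: unlabeled pool
y_known = [oracle_label(i) for i in known]

model = LogisticRegression(max_iter=1000)
for _ in range(20):                        # query one label per iteration
    model.fit(X[known], y_known)
    proba = model.predict_proba(X[pool])[:, 1]
    # D_Q: the point the model is least sure about (probability closest to 0.5)
    q = pool.pop(int(np.argmin(np.abs(proba - 0.5))))
    known.append(q)                        # the oracle's answer moves q into D_K
    y_known.append(oracle_label(q))
print("labels queried:", len(known) - 10)  # -> labels queried: 20
```

Only 20 of the 490 pool points receive a (costly) label, which is the "query labels only when necessary" benefit above.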

Active Learning: How does it Work?

Machine Learning needs:
• Logic (algorithm)
• Data
• Optimization
• Feedback ← Human-in-the-Loop

Active Learning = a Machine Learning algorithm using an "oracle" to reduce mistakes/uncertainty

Query Strategy — labels are queried for:
• data points for which model uncertainty is high (uncertainty sampling)
• data points on which the different models of an ensemble method disagree the most (query by committee; see the sketch below)
• data points causing the most changes to the model (expected model change)
• data points causing overall variance to be high (variance reduction)

[Figure: the training loop — the active learning algorithm selects a single example from the unlabeled data, the human oracle provides the correct label, the labeled example is added to the labeled data, and the classifier is updated]

[Figure: the inference loop — if the machine learning classifier's confidence level is high, emit the output; if not, the example goes to annotation by a human oracle (Human-in-the-Loop) and the label feeds back into the model (Active Learning)]

By adding a human feedback loop, we allow the system to:
• actively learn
• correct itself where it got it wrong
• improve the algorithm over iterations
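A small sketch of the query-by-committee strategy named above, with an illustrative three-model committee: the point to query is the one on which the committee members disagree the most.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB

X, y = make_classification(n_samples=200, random_state=1)
labeled, pool = slice(0, 30), slice(30, 200)

committee = [LogisticRegression(max_iter=1000),
             RandomForestClassifier(random_state=1),
             GaussianNB()]
votes = np.array([m.fit(X[labeled], y[labeled]).predict(X[pool])
                  for m in committee])             # shape: (3, pool size)

# Disagreement = fraction of committee members not voting with the majority
majority = (votes.sum(axis=0) > len(committee) / 2).astype(int)
disagreement = (votes != majority).mean(axis=0)
query_idx = 30 + int(np.argmax(disagreement))      # offset back into X
print("ask the oracle to label data point", query_idx)
```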

Active Learning at Walmart e-Commerce
3 use cases using Active Learning in the context of Search/Retail

❑ Machine Learning Lifecycle Management (Programming by Feedback)
  • Automatic monitoring of input and output values for an ML algorithm
  • An algorithm detects failings and outliers in real time and suggests an action
  • A human validates the action, creating tagged data for full automation

❑ Diagnosis of Catalog Data Issues (Reinforcement Learning)
  • An algorithm uncovers demoted items and suggests the most likely reason for the demotion
  • An engineer manually confirms/corrects the suggestion, generating training data for full automation

❑ Refinement of the Query Tagging Algorithm (Optimization)
  • The human evaluation team manually measures the accuracy of the query tagging model
  • Mistagged queries are used to discover patterns specific to problematic queries, which are reported to engineers
  • The sample is enriched with problematic queries (the evaluation team can diagnose problems with algorithms)
  • Example: "red t-shirt Size M" → red = color, t-shirt = product type, Size M = size (see the sketch below)
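A toy sketch of that query tagging example; a production tagger would be a trained sequence model, and the tiny lexicons here are hypothetical stand-ins.

```python
import re

COLORS = {"red", "blue", "black"}                # hypothetical lexicons
PRODUCT_TYPES = {"t-shirt", "dress", "sneakers"}

def tag_query(query):
    """Assign color / product_type / size tags to query tokens."""
    tags = {}
    size = re.search(r"\bsize\s+(\w+)\b", query, flags=re.IGNORECASE)
    if size:
        tags["size"] = size.group(1)
    for token in query.lower().split():
        if token in COLORS:
            tags["color"] = token
        elif token in PRODUCT_TYPES:
            tags["product_type"] = token
    return tags

print(tag_query("red t-shirt Size M"))
# -> {'size': 'M', 'color': 'red', 'product_type': 't-shirt'}
```

In the active-learning setup above, queries this tagger gets wrong are exactly the ones worth routing to the human evaluation team.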

Conclusion and Takeaways
• Why do humans and machines complement each other?
  • Human beings are memory-constrained
  • Computers are knowledge-constrained
• Tagged data is more important than ever
  • But getting quality data is challenging given the volume of data
  • Crowdsourcing offers more flexibility to tag data at scale
• The Human-in-the-Loop paradigm
  • Improves the accuracy of machine learning algorithms (classifiers)
  • Many examples of successful endeavors using "Augmented Intelligence"
  • Active Learning is a booming area of ML/AI

Thank You!