harambeenet: data by the people, for the people


Upload: michael-bernstein

Post on 17-Dec-2014

TRANSCRIPT

Page 1: HarambeeNet: Data by the people, for the people

Data by the people, for the people
Powering Interactions via the Social Web

Michael Bernstein
MIT CSAIL | USER INTERFACE DESIGN GROUP | HAYSTACK GROUP

MIT HUMAN-COMPUTER INTERACTION

Page 2: HarambeeNet: Data by the people, for the people

Computer Science

[Zachary ‘77, via Easley and Kleinberg ‘10]

“In the most basic sense, a network is any collection of objects in which some pairs of these objects are connected by links.”

- Easley and Kleinberg, page 2

Page 3: HarambeeNet: Data by the people, for the people

With the abstraction, we can:
- Reason at high levels
- Make predictions
- Interact online
- Model data

http://www.flickr.com/marc_smith

Page 4: HarambeeNet: Data by the people, for the people

Social Science

[Zachary ‘77, via Easley and Kleinberg ‘10]

“The analysis of patterns of social relationship in the group is then conducted on the graph, which is merely a shorthand representation of the ethnographic data.”

- Zachary ‘77

Page 5: HarambeeNet: Data by the people, for the people

Many of you are sitting on terabytes of data about human interactions.  The opportunities to scrape data – or more politely, leverage APIs – are also unprecedented.  And folks are buzzing around wondering what they can do with all of the data they've got their hands on.  But in our obsession with Big Data, we've forgotten to ask some of the hard critical questions about what all this data means and how we should be engaging with it.

- danah boyd, WWW ‘10

Methodological mismatch

Page 7: HarambeeNet: Data by the people, for the people

building privacy-sensitive systems

building successful systems

Page 8: HarambeeNet: Data by the people, for the people

Netflix: Getting it right

Collaborative filtering

http://www.eecs.berkeley.edu/~zhanghao

Page 9: HarambeeNet: Data by the people, for the people

Netflix: Getting it right

Temporal dynamics

[Koren ’09]

Page 10: HarambeeNet: Data by the people, for the people

the challenge

bridging

Page 11: HarambeeNet: Data by the people, for the people

Soylent (UIST ‘10)

FeedMe (CHI ‘10)

Eddi (UIST ‘10)

Collabio (UIST ‘09)

Page 12: HarambeeNet: Data by the people, for the people

Soylent: A Word Processor with a Crowd Inside

[Bernstein et al. UIST ‘10]

human computation
markets
voting

Page 13: HarambeeNet: Data by the people, for the people

Wizard of Oz
Interface

Page 14: HarambeeNet: Data by the people, for the people

Highly-educated workers, mostly from the U.S. and India

Appropriate for generic cognition tasks with little intrinsic motivation

Page 15: HarambeeNet: Data by the people, for the people

Wizard of Oz
Interface

Wizard of Turk
Wire paid human computation directly into an interface

Page 16: HarambeeNet: Data by the people, for the people

Editing for length is excruciating

Even experts make writing mistakes

High-level decisions result in lots of small tasks

Page 17: HarambeeNet: Data by the people, for the people

Shortn: Text Shortening

Page 18: HarambeeNet: Data by the people, for the people

Blog – 83%
Print publishers are in a tizzy over Apple’s new iPad because they hope to finally be able to charge for their digital editions. But in order to get people to pay for their magazine and newspaper apps, they are going to have to offer something different that readers cannot get at the newsstand or on the open Web.

Classic UIST – 87%
The metaDESK effort is part of the larger Tangible Bits project. The Tangible Bits vision paper, which introduced the metaDESK along withand two companion platforms, the transBOARD and ambientROOM.

Draft UIST – 90%
In this paper we argue that it is possible and desirable to combine the easy input affordances of text with the powerful retrieval and visualization capabilities of graphical applications. We present WenSo, a tool thatwhich uses lightweight text input to capture richly structured information for later retrieval and navigation in a graphical environment..

Rambling E-mail – 78%
A previous board member, Steve Burleigh, created our web site last year and gave me alot of ideas. For this year, I found a web site called eTeamZ that hosts web sites for sports groups. Check out our new page: […]

Technical Computer Science – 82%
Figure 3 shows the pseudocode that implements this design for Lookup. FAWN-DS extracts two fields from the 160-bit key: the i low order bits of the key (the index bits) and the next 15 low order bits (the key fragment).

Page 19: HarambeeNet: Data by the people, for the people

Crowdproof: Human Proofreading

Finds errors that AIs miss, explains the reason behind the problem in plain English, and suggests fixes

Page 20: HarambeeNet: Data by the people, for the people

The Human Macro
Macro scripting without programming

‘‘Please change text in document from past tense to present tense.’’

I gave one final glance around before descending from the barrow. As I did so, my eye caught something […]

I give one final glance around before descending from the barrow. As I do so, my eye catches something […]

Page 21: HarambeeNet: Data by the people, for the people

The Human Macro
Macro scripting without programming

‘‘Pick out keywords from the paragraph like Yosemite, rock, half dome, park. Go to a site which has CC licensed images […]’’

When I first visited Yosemite State Park in California, I was a boy. I was amazed by how big everything was […]

http://commons.wikimedia.org/wiki/File:03_yosemite_half_dome.jpg

Page 22: HarambeeNet: Data by the people, for the people

The Human Macro
Macro scripting without programming

‘‘Hi, please find the bibtex references for the 3 papers in brackets. You can located these by Google Scholar searches and clicking on bibtex.’’

Duncan and Watts [Duncan and watts HCOMP 09 anchoring] found that Turkers will do more work when you pay more, but that the quality is no higher.

@conference { title={{Financial incentives […]}}, author={Mason, W. and Watts, D.J.}, booktitle={HCOMP ‘09}, […]}

Page 23: HarambeeNet: Data by the people, for the people

Programming Crowd Workers

Rule of Thumb: 30% of worker effort on open-ended tasks will have an error in it

Two useful personas: The Lazy Turker and The Eager Beaver

Page 24: HarambeeNet: Data by the people, for the people

The Lazy Turker

Does as little work as necessary to be paid

The theme of loneliness features throughout many scenes in Of Mice and Men and is often the dominant theme of sections during this story. This theme occurs during many circumstances but is not present from start to finish. In my mind for a theme to be pervasive is must be present during every element of the story. There are many themes that are present most of the way through such as sacrifice, friendship and comradship. But in my opinion there is only one theme that is present from beginning to end, this theme is pursuit of dreams.

Page 25: HarambeeNet: Data by the people, for the people

The Lazy Turker

Does as little work as necessary to be paid

The theme of loneliness features throughout many scenes in Of Mice and Men and is often the dominant theme of sections during this story. This theme occurs during many circumstances but is not present from start to finish. In my mind for a theme to be pervasive is must be present during every element of the story. There are many themes that are present most of the way through such as sacrifice, friendship and comradeship. But in my opinion there is only one theme that is present from beginning to end, this theme is pursuit of dreams.

Page 27: HarambeeNet: Data by the people, for the people

The Eager Beaver

Goes beyond task requirements to be helpful, but introduces errors in the process

The theme of loneliness features throughout many scenes in Of Mice and Men and is often the dominant theme of sections during this story. This theme occurs during many circumstances but is not present from start to finish. In my mind for a theme to be pervasive is must be present during every element of the story. There are many themes that are present most of the way through such as sacrifice, friendship and comradship. But in my opinion there is only one theme that is present from beginning to end, this theme is pursuit of dreams.

Page 28: HarambeeNet: Data by the people, for the people

The Eager Beaver

Goes beyond task requirements to be helpful, but introduces errors in the process

The theme of loneliness features throughout many scenes in Of Mice and Men and is often the dominant theme of sections during this story. \nThis theme occurs during many circumstances but is not present from start to finish. \nIn my mind for a theme to be pervasive is must be present during every element of the story. \nThere are many themes that are present most of the way through such as sacrifice, friendship and comradeship.\n But in my opinion there is only one theme that is present from beginning to end, this theme is pursuit of dreams.

Page 29: HarambeeNet: Data by the people, for the people

Find-Fix-Verify

A design pattern that controls the efforts of the Lazy Turker and the Eager Beaver

Separates open-ended tasks into three stages where each worker makes a clear contribution

Page 30: HarambeeNet: Data by the people, for the people

Find
“Identify at least one area that can be shortened without changing the meaning of the paragraph.”
Independent voting to identify patches

Fix
“Edit the highlighted section to shorten its length without changing the meaning of the paragraph.”

Soylent, a prototype...

Verify
“Choose at least one rewrite that has significant style errors in it. Choose at least one rewrite that significantly changes the meaning of the sentence.”
Randomize order of suggestions

Page 31: HarambeeNet: Data by the people, for the people

Why Find-Fix-Verify?

Why split Find and Fix?
- Forces Lazy Turkers to work on a problem of our choice
- Allows us to merge work completed in parallel

Why add Verify?
- Quality rises when we put Turkers at odds with each other
- Trades off lag time for quality
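
The three stages can be sketched in a few lines of Python. This is a minimal simulation under stated assumptions, not the Soylent implementation: workers are stand-in functions rather than calls to a crowd marketplace, and the names `find_fix_verify` and `min_agreement`, the 50% agreement threshold, and the majority-rejection rule in Verify are all hypothetical choices made for illustration.

```python
import random
from collections import Counter


def find_fix_verify(sentences, finders, fixers, verifiers,
                    min_agreement=0.5, seed=0):
    """Simulate Find-Fix-Verify over a list of sentences.

    finders:   functions taking the sentences, returning flagged indices
    fixers:    functions taking one sentence, returning a rewrite
    verifiers: functions taking (original, candidates), returning the
               candidates they vote to reject
    min_agreement is a hypothetical threshold, not a value from the talk.
    """
    rng = random.Random(seed)

    # Find: workers flag independently; keep indices with enough agreement.
    votes = Counter(i for finder in finders for i in set(finder(sentences)))
    patches = [i for i in range(len(sentences))
               if votes[i] >= min_agreement * len(finders)]

    results = {}
    for i in patches:
        # Fix: a separate group of workers each proposes a rewrite.
        candidates = sorted({fixer(sentences[i]) for fixer in fixers})

        # Verify: a third group votes against bad rewrites; each voter
        # sees the candidates in a freshly shuffled order.
        rejections = Counter()
        for verifier in verifiers:
            shown = candidates[:]
            rng.shuffle(shown)
            rejections.update(set(verifier(sentences[i], shown)))

        # Keep rewrites rejected by fewer than half of the verifiers.
        results[i] = [c for c in candidates
                      if rejections[c] * 2 < len(verifiers)]
    return results


# Toy run: the third finder is a Lazy Turker who flags nothing, and the
# second fixer is an Eager Beaver whose rewrite introduces an error.
sentences = ["It is really very long indeed.", "Short one."]
finders = [lambda s: [0], lambda s: [0], lambda s: []]
fixers = [lambda t: "It is long.", lambda t: "It long is."]
verifiers = [lambda orig, cands: [c for c in cands if c == "It long is."]] * 3

result = find_fix_verify(sentences, finders, fixers, verifiers)
print(result)  # {0: ['It is long.']}
```

Because Find only aggregates votes, a single lazy worker cannot steer the work toward the easiest problem, and because Verify is a separate vote, a single over-eager rewrite is filtered out before it reaches the document.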

Page 32: HarambeeNet: Data by the people, for the people

Data is made of people,
Data is made by people,
Data is made for people.

Page 33: HarambeeNet: Data by the people, for the people

Collaborators

Rob Miller, David Karger, Greg Little, Katrina Panovich, David Crowell

Mark Ackerman

Björn Hartmann

…and about 9000 Turkers.

I am generously kept off the streets by an NSF GRFP and NSF award IIS-0712793.

Page 34: HarambeeNet: Data by the people, for the people

Blog
Print publishers are in a tizzy over Apple’s new iPad because they hope to finally be able to charge for their digital editions. But in order to get people to pay for their magazine and newspaper apps, they are going to have to offer something different that readers cannot get at the newsstand or on the open Web.

Classic UIST
The metaDESK effort is part of the larger Tangible Bits project. The Tangible Bits vision paper introduced the metaDESK along with two companion platforms, the transBOARD and ambientROOM.

Draft UIST
In this paper we argue that it is possible and desirable to combine the easy input affordances of text with the powerful retrieval and visualization capabilities of graphical applications. We present WenSo, a tool that uses lightweight text input to capture richly structured information for later retrieval and navigation in a graphical environment..

Rambling E-mail
A previous board member, Steve Burleigh, created our web site last year and gave me alot of ideas. For this year, I found a web site called eTeamZ that hosts web sites for sports groups. Check out our new page: […]

Highly Technical Writing
Figure 3 shows the pseudocode that implements this design for Lookup. FAWN-DS extracts two fields from the 160-bit key: the i low order bits of the key (the index bits) and the next 15 low order bits (the key fragment).

Page 35: HarambeeNet: Data by the people, for the people

Blog – 83%
Print publishers are in a tizzy over Apple’s new iPad because they hope to finally be able to charge for their digital editions. But in order to get people to pay for their magazine and newspaper apps, they are going to have to offer something different that readers cannot get at the newsstand or on the open Web.

Classic UIST – 87%
The metaDESK effort is part of the larger Tangible Bits project. The Tangible Bits vision paper, which introduced the metaDESK along withand two companion platforms, the transBOARD and ambientROOM.

Draft UIST – 90%
In this paper we argue that it is possible and desirable to combine the easy input affordances of text with the powerful retrieval and visualization capabilities of graphical applications. We present WenSo, a tool thatwhich uses lightweight text input to capture richly structured information for later retrieval and navigation in a graphical environment..

Rambling E-mail – 78%
A previous board member, Steve Burleigh, created our web site last year and gave me alot of ideas. For this year, I found a web site called eTeamZ that hosts web sites for sports groups. Check out our new page: […]

Technical Computer Science – 82%
Figure 3 shows the pseudocode that implements this design for Lookup. FAWN-DS extracts two fields from the 160-bit key: the i low order bits of the key (the index bits) and the next 15 low order bits (the key fragment).

Page 36: HarambeeNet: Data by the people, for the people

Average Performance

Cost: $1.41 per paragraph
- $0.55 to Find an average of two patches
- $0.48 to Fix each patch
- $0.38 to Verify the results

Time:
- Wait: median 18.5 minutes (Q1 = 8.3 min, Q3 = 41.6 min)
- Work: median 2.0 minutes (Q1 = 60 sec, Q3 = 3.6 min)
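
Reading each of the three cost figures above as a per-paragraph amount (an assumption, since the slide does not say how the per-patch Fix cost is aggregated), the stage costs add up to the stated total. A quick check in integer cents, with each stage's rounded share of the total:

```python
# Stage costs in cents, copied from the slide.
stage_cents = {"find": 55, "fix": 48, "verify": 38}

total_cents = sum(stage_cents.values())

# Share of the total spent in each stage, rounded to whole percent.
shares = {stage: round(100 * cents / total_cents)
          for stage, cents in stage_cents.items()}

print(total_cents)  # 141
print(shares)       # {'find': 39, 'fix': 34, 'verify': 27}
```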

Page 37: HarambeeNet: Data by the people, for the people

Qualitative Observations

Works best with unnecessary text
[…] they are going to have to offer something different […]

Lack of domain knowledge
[…] In this paper we argue that tangible interfaces […]

Parallel edits can be inconsistent
FAWN-DS extracts two fields from the 160-bit key: the i low order bits of the key (the index bits) and the next 15 low order bits (the key fragment).