Human Computation Tasks with Global Constraints: A Case Study

Haoqi Zhang1, Edith Law2, Robert C. Miller3, Krzysztof Z. Gajos1, David C. Parkes1, Eric Horvitz4

1Harvard SEAS, Cambridge, MA   2Carnegie Mellon University, Pittsburgh, PA   3MIT CSAIL, Cambridge, MA   4Microsoft Research, Redmond, WA

{hq, kgajos, parkes}@eecs.harvard.edu, [email protected], [email protected], [email protected]

ABSTRACT
An important class of tasks that is underexplored in current human computation systems is complex tasks whose solutions must satisfy a set of global constraints. One example of such a task is itinerary planning, where solutions consist of a sequence of activities that meet the requirements specified by the requester. In this paper, we focus on the joint formulation of such plans as a case study of a constraint-based human computation task and introduce a collaborative planning system called Mobi. Mobi presents a single interface that enables participants to view the current solution context and make appropriate contributions based on current needs as the collaboration progresses. We conduct experiments that demonstrate the end-to-end usability of Mobi and that explain how the system enables a crowd to effectively and collaboratively resolve global constraints.

Author Keywords
Human Computation, Crowdware, Iterative Task, Groupware, Collaborative Planning

ACM Classification Keywords
H5.2 [Information interfaces and presentation]: User Interfaces - Graphical user interfaces.

General Terms
Design, Human Factors

INTRODUCTION
Human Computation [16, 9] is an evolving paradigm with numerous studies and applications over the last decade. Most human computation tasks explored to date are simple and easy to parallelize. However, several recent studies (e.g., Soylent [1], CrowdForge [7], PlateMate [13]) have tackled more complex tasks by using workflows that link together independent modules that seek different types of effort. These workflows serve as algorithms that coordinate among the inputs and outputs of different task modules, allowing complex tasks to be decomposed into more manageable subtasks that the crowd can contribute to independently, without having to reason about other modules or the task at large.

To appear in Proceedings of the 2012 ACM Conference on Human Factors in Computing Systems (CHI 2012).

Figure 1. Planning mission (left) and itinerary (right)

Within studies of human computation, an important class of underexplored tasks are those that demand the solution to satisfy a (potentially large) set of global requirements. For example, in leveraging the crowd to write an essay, a requester may want to specify requirements on the desired tone, tense, length, structure of arguments, and style of exposition that must hold consistently throughout a piece of writing. Some requirements, e.g., presenting a balanced perspective on a situation, touch upon different components of the essay that need to be considered as a whole and in the context of the composition of the essay.

As another example, consider the problem of crowdsourcing itinerary planning. Planning an event such as a vacation, outing, or date often involves an itinerary (Figure 1), which contains an ordered list of activities that are meant to be executed in sequence over the course of the event. People going on a trip have preferences and constraints over the types of activities of interest (e.g., “I want a coffee break right after lunch”), how long to spend on different activities (e.g., “I want to spend at least 2 hours in parks”), the composition of activities (e.g., “I want to focus on art galleries and museums for the day”), the budget, and the time available, which define a set of global requirements that an itinerary should satisfy.

Decisions on any particular activity in the itinerary may naturally affect other decisions. For example, spending time on any particular activity leaves less time for another, and moving to one location introduces distances to other locations.


Tasks consisting of such requirements are difficult to crowdsource because the global constraints are hard to explicitly capture and maintain, and decomposing the task and crowdsourcing partial contributions independently provides no guarantees as to how well the composed solution will satisfy the set of global constraints. Instead of splitting such tasks into separate subtasks, we demonstrate that having a single workspace environment, in which the current solution and problem-solving context are exposed and shared among participants, can enable effective contributions towards building a solution that meets global constraints. We take inspiration from groupware systems [3], which suggest principles and ideas (communication, collaboration within a shared context, and keeping the goal and the progression towards it in mind) that help a group accomplish a joint task. We consider how to apply such principles and ideas to crowd workers who, unlike a cohesive group, may be anonymous, have limited incentives to spend time and effort to grasp the current solution context, may make noisier contributions, and may not return after making a single contribution.

We demonstrate how to coordinate a crowd to accomplish tasks involving global constraints by considering as a case study the problem of crowdsourcing itinerary planning. By recruiting a crowd to custom plan itineraries, we seek both to draw on the crowd's ability to provide valuable suggestions for activities and to engage the crowd in resolving global constraints to build good, feasible itineraries. We introduce a collaborative itinerary planning system called Mobi, which takes as input a planning mission containing a set of qualitative and quantitative constraints as articulated by the user, and produces as output an itinerary that satisfies the mission. The crowd contributes via a single interface, displaying the current itinerary and the stream of ideas generated thus far, that allows individuals to contribute opportunistically given the current context and to see their contributions incorporated into the solution in real time. Mobi focuses the crowd's attention on what needs work by prominently displaying a list of automatically generated todo items, which point out violated constraints, provide suggestions on how to address them, and promote activities directed at the refinement of the itinerary.

We hypothesize that elements of Mobi's design, namely the use of todo items and the availability of a single interface enabling the crowd to jointly view and contribute to the current solution and existing ideas, promote an effective collaborative planning environment in which the crowd can quickly resolve specified constraints and produce custom-tailored itineraries that satisfy the stated missions. To validate this, we conducted two studies. First, we ran an experiment to test the effect of displaying the automatically generated todo items on the rate at which constraints are resolved by the crowd, and measured the contribution patterns of the crowd workers. We find that the use of todo items leads constraints to be satisfied at a significantly faster rate than when they are not displayed, and that the crowd's editing patterns show evidence of both collaboration and opportunistic planning. Second, we conducted a user study to understand whether crowd workers are able to generate itineraries that are good from the perspective of target end users. Users found that the generated itineraries contain a good mix of activities, mostly or fully satisfy their mission requirements, and are useful for their actual trips. After presenting these results, we revisit the design principles behind Mobi and discuss how they can help a crowd tackle problems involving global constraints.

RELATED WORK
Planning can be viewed as an iterative task in which workers make successive edits to improve the solution. There has been some attention to iterative tasks in human computation [11], and an interesting recent example is work by Kittur et al. [6] that recruits workers to collaborate in Etherpad to translate a poem. Workers were able to see their edits reflected in real time (as in Mobi), and could communicate to explain their edits via chat. One difference in Mobi is that the system uses its sense of the progress made so far (e.g., how full the itinerary is, which constraints are violated) to prompt users on what needs work and to drive the problem-solving process. Other recent human computation systems recruit a crowd to plan (e.g., Turkomatic [8] and CrowdPlan [10]); our work differs in that we are interested in planning tasks that have a set of global requirements that need to be met.

Wikipedia can be viewed as an example of a system in which a (mostly expert) crowd writes and edits articles to resolve a set of global constraints as defined by Wikipedia's standards. Much like the way todo items are used in Mobi to drive progress, template messages and cleanup tags are used in Wikipedia [17] to alert editors of changes that need to be made to improve an article. Such messages are typically managed by human contributors, whereas in Mobi todo items are managed automatically whenever possible.

There have been several models of how people plan. The successive refinement model advocates a top-down approach, where a high-level goal is decomposed into subgoals iteratively, down to a sequence of elementary actions [14]. In contrast, the planning of many everyday activities (e.g., errands) is opportunistic: planning decisions happen whenever opportunities arise [4, 5], so that a decision or observation in one part of the plan may suggest new ideas or illuminate problems in a different part of the plan, causing the planner to re-focus his attention. It has also been found that planning can be multi-directional, involving both top-down and bottom-up processing. For example, in an errand planning experiment [4], subjects would start making detailed plans (e.g., sequencing individual errands), then switch to planning on a more abstract level (e.g., by discovering clusters of errands), and move back and forth between the two as they refined the plan. Mobi is built with the opportunistic planning model in mind: individuals in the crowd are allowed to contribute freely as they see fit, based on their observations of what needs work given the current solution context.

Real-life planning is a difficult problem for computers; despite advances in automated planning [12], a major challenge is in making sense of people's goals, preferences, and other 'soft' considerations [2]. Mobi allows users to specify their wants and needs in natural language, thereby enabling complex constraints and preferences to be expressed and used in the planning process. Currently, the AI planner in Mobi supports workers by automatically checking constraints and computing trip times and routes. In the future, AI may play a more active role in the planning process by learning about different requirements, suggesting activities and their composition in the itinerary, or even detecting and adding important constraints that may have been missed by the requester.

Figure 2. The Mobi planning interface consists of the information panel (top), the brainstream (left), and the map and itinerary viewer (right).

There are a few existing commercial systems that either allow groups to plan trips for themselves or to ask friends and other members for suggestions. Examples include triporama, Kukunu, FriendTripper, and Google Maps. Mobi differs from such systems in that it seeks to produce not only suggestions for activities but an itinerary satisfying a set of global requirements, and can focus the crowd on making contributions where they are most needed.

MOBI: A SYSTEM FOR ITINERARY PLANNING
The Mobi prototype takes as input a planning mission consisting of preferences and constraints, and generates an itinerary by having a crowd asynchronously plan using a shared interface. Workers invited to contribute can view the current plan and all ideas proposed thus far, and make contributions as they see fit. Edits can be made at any time and without restrictions, and the itinerary automatically saves after each change. In this section, we describe the modules of Mobi, used for specifying the planning mission and assembling the itinerary, and discuss how these two key modules support the process of generating itineraries and resolving constraints. The Mobi interface is shown in Figure 2.

Specifying the Planning Mission
Our target users, also referred to as requesters, are people who are interested in planning a trip. To start planning, the requester enters a planning mission using a simple web interface, specifying the title and description of the trip, start/end locations and times, and whether they will use public transit or drive between locations in addition to walking.

Requesters can express two kinds of constraints: qualitative and quantitative. Figure 1 shows an example of a planning mission that includes both types of constraints. Qualitative constraints are described in natural language and specify the nature of the trip, what the user hopes to accomplish, who they are traveling with, etc. Quantitative constraints can be specified by creating natural language categories (e.g., “museum visits,” “by the ocean”) and assigning preferences and limitations over those categories. One can specify constraints on the number of activities in each category (e.g., “I want to attend no more than two museum visits”), as well as on the amount of time to spend on activities in each category (e.g., “I want to spend at least two hours shopping in downtown”). Such constraints can also be used to express the preferred combination of activities in the plan (e.g., “I want to spend half of my time on activities by the ocean, and the other half on activities in the city”). More generally, the allowable format for each constraint is “I want {at most, at least, exactly} [number] {activities, hours} of {cat_1, cat_2, ..., cat_n},” where cat_i refers to the i-th category. Finally, there are two system-generated time constraints, which specify that the cumulative duration of the activities in the itinerary should not be significantly less than, or greater than, the duration of the trip as specified by the user.
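To make this constraint format concrete, here is a minimal sketch (our own illustration, not Mobi's implementation) of one way such a quantitative constraint could be represented in code; the class and field names are hypothetical.

from dataclasses import dataclass
from typing import List

@dataclass
class QuantitativeConstraint:
    """One instance of the template:
    'I want {at most, at least, exactly} [number] {activities, hours} of {cat_1, ..., cat_n}'."""
    bound: str             # "at most", "at least", or "exactly"
    number: float          # the [number] in the template
    unit: str              # "activities" or "hours"
    categories: List[str]  # requester-defined natural-language categories

# Examples drawn from the constraints quoted above:
no_more_than_two_museums = QuantitativeConstraint("at most", 2, "activities", ["museum visits"])
two_hours_of_shopping = QuantitativeConstraint("at least", 2, "hours", ["shopping in downtown"])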

Assembling the Itinerary
Once a planning mission is specified, workers can access it by clicking on the reveal mission details button (see Figure 2, top) to bring up the information panel. The planning interface consists of two components: the brainstream and the itinerary viewer.

Brainstream

The brainstream (see Figure 2, left) is a collection of everyone's ideas. An idea can be an activity (“something to do or see”) or a note (“a thought about the plan”).

To view ideas in the brainstream, one can either scroll down the list, click on a hashtag to display ideas belonging to a particular category, or use the auto-completing search box. Clicking on an idea reveals additional details via a dialog box, and presents the option to edit the idea, or in the case of an activity, an option to add it to or remove it from the current itinerary. A blue badge next to an activity indicates that it is already in the current itinerary.

Figure 3. The brainstream displays system-generated todo items, alerting workers to what needs work.

Figure 4. Adding a new activity to the brainstream via a dialog box.

To add a new idea (an activity or a note), one can type a title into the search box and click ‘add’. If similar ideas already exist, a drop-down list will appear, which helps to prevent duplicates. For notes, workers can fill in a description if desired. In the activity editor (see Figure 4), workers are asked to provide the location name, what to do or see, the activity's duration, and the (requester-defined) categories that the activity belongs in. In the same editor, workers will also find a map, which allows them to locate points of interest matching the location name. Finally, workers can decide to add the activity to both the itinerary and the brainstream, or only to the brainstream for the time being.

The motivation behind the brainstream is to allow people to brainstorm together and build upon each other's ideas; in addition, our goal is to keep around all alternative activities and allow workers to quickly access them through the hashtags. Its design draws inspiration from social technologies such as Twitter and Piazza, which aggregate information into a feed or stream that one can then easily process.

If the current itinerary does not satisfy a stated quantitative constraint or is over time or under time, the violated constraints are automatically turned into todo items that are displayed at the top of the brainstream in a different color, alerting workers as to what needs work (e.g., see Figure 3). The todo items suggest specific actions, e.g., “Add more lunch activities,” or, “The itinerary is over time. Try reordering itinerary items. You can also edit or remove items.” They also provide natural language explanations of how the current itinerary violates particular constraints, e.g., “You need exactly one lunch activity but there is currently none in the itinerary,” or “The itinerary is over time because the trip must end by 9pm.” By adding notes, workers can also identify areas that need work or raise questions about the plan's feasibility, which other workers can then address or comment on further.

We note that the system is able to check the quantitative constraints without understanding the meaning of the natural language categories; this is because workers tag activities with the categories they belong in. As we will show in the next section, the todo items are an important design element that helps to accelerate the rate at which constraints are resolved.
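To illustrate how constraints can be checked purely from worker-supplied category tags, the following sketch (again our own simplification, not Mobi's code) evaluates one quantitative constraint against the tagged activities currently in the itinerary and produces a todo-style message when the constraint is violated; the data layout and function name are assumptions for illustration.

def todo_for_constraint(constraint, itinerary):
    """Return a todo message if the itinerary violates the constraint, else None.

    constraint: dict with keys 'bound' ("at most" / "at least" / "exactly"),
                'number', 'unit' ("activities" or "hours"), and 'categories'.
    itinerary:  list of activities; each is a dict with 'title',
                'duration_hours', and worker-supplied 'categories' tags.
    """
    matching = [a for a in itinerary
                if set(a["categories"]) & set(constraint["categories"])]
    actual = (len(matching) if constraint["unit"] == "activities"
              else sum(a["duration_hours"] for a in matching))
    target, bound = constraint["number"], constraint["bound"]
    cats = "/".join(constraint["categories"])

    if bound == "at least" and actual < target:
        return f"Add more {cats} {constraint['unit']} (currently {actual}, need at least {target})."
    if bound == "at most" and actual > target:
        return f"Remove some {cats} {constraint['unit']} (currently {actual}, need at most {target})."
    if bound == "exactly" and actual != target:
        return f"You need exactly {target} {cats} {constraint['unit']} but the itinerary has {actual}."
    return None  # constraint satisfied: no todo item

# Example: exactly one lunch activity is required, but the itinerary has none.
lunch_constraint = {"bound": "exactly", "number": 1, "unit": "activities", "categories": ["lunch"]}
itinerary = [{"title": "MoMA", "duration_hours": 2, "categories": ["museum visits"]}]
print(todo_for_constraint(lunch_constraint, itinerary))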

Itinerary Viewer

The itinerary viewer (see Figure 2, right) consists of an itinerary and a map. The itinerary displays the activities in order, alongside the times during which they are scheduled to take place, with travel times between locations automatically computed and accounted for. It is accompanied by a map showing the activities' locations and the routes between locations. The map and itinerary allow one to see at a glance whether the plan is coherent, that is, whether any activities are out of place (e.g., lunch happening too early, activities whose order can be swapped to avoid unnecessary back-and-forth travel), or whether too much or too little time is spent on an activity. The itinerary doubles as an editor: workers can drag and drop activities to re-arrange their order, and click an activity to see its details, edit it, or remove it from the itinerary. On any itinerary change (i.e., via adding, removing, editing, or reordering activities), the itinerary, activity times, map display, trip time, and todo items automatically update, which provides direct feedback to workers as they attempt to refine the itinerary.
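As a rough illustration of the bookkeeping the itinerary viewer performs, the sketch below lays out start and end times for an ordered list of activities given their durations and the travel times between consecutive locations. This is a simplification we introduce for exposition; Mobi presumably obtains travel times from a routing service, and the function and field names here are hypothetical.

from datetime import datetime, timedelta

def schedule(activities, travel_minutes, start):
    """Assign start/end times to ordered activities, inserting travel time between them.

    activities:     ordered list of dicts with 'title' and 'duration_minutes'.
    travel_minutes: travel times (in minutes) between consecutive activities;
                    len(travel_minutes) == len(activities) - 1.
    start:          datetime at which the trip begins.
    Returns a list of (title, start_time, end_time) tuples.
    """
    timeline, clock = [], start
    for i, activity in enumerate(activities):
        if i > 0:
            clock += timedelta(minutes=travel_minutes[i - 1])  # travel from the previous stop
        end = clock + timedelta(minutes=activity["duration_minutes"])
        timeline.append((activity["title"], clock, end))
        clock = end
    return timeline

# Example: lunch followed by a museum visit, with 20 minutes of travel in between.
plan = schedule(
    [{"title": "Lunch", "duration_minutes": 60},
     {"title": "Museum visit", "duration_minutes": 120}],
    travel_minutes=[20],
    start=datetime(2012, 5, 5, 12, 0),
)
for title, s, e in plan:
    print(title, s.strftime("%H:%M"), "-", e.strftime("%H:%M"))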

Mobi promotes collaboration by making the plan always visible, editable by everyone, and readily updated. This follows the WYSIWIS (‘What You See Is What I See’) principle [15], which ensures that every participant has equal access to shared information. Mobi also supports opportunistic planning by providing support for both top-down and bottom-up planning, and a fluid way to move back and forth between the two. For example, as workers plan at a detailed level (e.g., suggesting activities in the brainstream), they may become aware of shortcomings of the current itinerary, which in turn prompts them to start considering the itinerary as a whole; likewise, when workers refine the itinerary, they may think of new activities to add to the brainstream, or ways to elaborate on the details of a particular activity in the current itinerary.

(a) Mother/Daughter NYC (b) Chicago with young children
(c) Vegas with buddies (d) Family in DC

Figure 5. Experiment: planning missions and the corresponding itineraries generated by the crowd in the todo items condition.

EXPERIMENT: TODO OR NOT TODO
We hypothesize that elements of Mobi's design, namely todo items and a shared interface in which the crowd can work off the current solution context and existing ideas, enable the crowd to effectively and collaboratively resolve constraints and produce itineraries that satisfy the stated missions. Focusing first on quantitative constraints, we conducted an experiment using two versions of Mobi, one that displays todo items and one that does not, to evaluate the effect of todo items on how quickly the crowd can reach good solutions by resolving the stated constraints.

Method
We created custom day-trip planning missions for each of eight major U.S. cities: New York, Chicago, DC, Vegas, Los Angeles, San Francisco, Seattle, and San Diego. We recruited Mechanical Turk workers in the US who have a 95% or higher approval rating to contribute to the planning missions by working on HITs (human intelligence tasks) in which the Mobi interface was fully embedded. The interface is nearly identical to that shown in Figure 2, with the addition of a submit button on the bottom, a ‘HIT instructions’ button replacing the ‘what you can do to help’ button in the information panel, and the addition of a ‘continue to improve the itinerary’ todo item that displays whenever the itinerary satisfies all the quantitative constraints in the planning mission. Turkers were asked to make ‘micro-contributions’ as they plan the trip with other Turkers, and were told that they could submit a HIT as soon as they had made any contribution. Turkers were paid 15 cents per HIT, and no verification was used other than requiring Turkers to have made some edit (however small) to the brainstream or itinerary before submitting the task. For half of the cities, the version with todo items was posted prior to the version with no todo items, and the order of posting was reversed for the other cities. Missions were posted for up to four days. Other than the display of todo items, the interface, job description, and instructions were identical in the two conditions.


                              NYC   Chicago   Las Vegas     DC
# unique workers               17        15          16     21
≥1                              6         5           7      8
# activities in brainstream    35        16          18     28
# activities in itinerary      11        10          11     13
# edits in brainstream         50        54          43     57
# edits in itinerary          193       140          75    154
# notes in brainstream          1         0           9      1
# of HITs                      64        31          47     50
Total cost ($)               9.60      4.65        7.05   7.50

Table 1. Summary statistics about the final itineraries, contributions by and payments to Turkers. ≥1 = number of workers with at least one suggested activity in the final itinerary.

Results I: The Generated Itineraries
Figure 5 shows a sample of the planning missions and the corresponding itineraries generated by Turkers in the todo condition. All eight itineraries satisfy the stated quantitative constraints, and furthermore, times for meals and activities (e.g., Broadway shows) are reasonable despite the lack of support for specifying explicit times during which those activities should occur. From the final itineraries, it appears that Turkers not only pay attention to the quantitative constraints, but also to the mission description, e.g., by filling the itinerary with government/history related activities in DC, and by offering low-cost options in Vegas.

Table 1 summarizes, for each of the examples shown in Figure 5, statistics about the final itineraries, the different types of edits Turkers made, and the amount of money paid to workers. We see that the final itineraries contain original ideas from multiple workers. Specifically, Turkers generated just over twice as many ideas for activities as are in the final itineraries, and generally used notes sparingly. When notes were added, they provided commentary on alternative suggestions (“They are a better place then Pasty's by far, and have better service, plus that perfect dessert.”), noted errors in activities (“Barbary Coast isn't called ‘Barbary Coast’ anymore”), presented general advice (“You can buy a MealTicket which will allow you to eat free at many places.”), or pointed out problems with the plan (“why are we eating so much dinner?”).

Results II: Effects of Todo Items
Results show that when prompting workers with todo items, constraints are satisfied much more quickly than when todo items are not displayed. (In this experiment, we observed one particular worker who completed a large number of HITs in the no todo condition but in doing so split single contributions across two to three HITs: creating an activity without a description in one HIT, filling in the description in another, and sometimes adding the activity to the itinerary in a third, when all of this can and should have been accomplished in one HIT. To avoid biasing the results in favor of the todo condition, in reporting results we remove data points from this worker in which his HITs did not alter the itinerary and thus had no direct effect on satisfying the constraints.)

We make three observations. First, there is a significant difference (t(7) = 3.65, p = 0.0082) in the number of HITs it took to satisfy (for the first time) all of the stated quantitative constraints between the todo condition (µ = 16.5, σ = 9.65) and the no todo condition (µ = 39.5, σ = 14.8); see Figure 6 for a city-by-city breakdown. (In some cases in the no todo condition, no itinerary satisfied all the stated requirements in the course of the experiment; in such cases the number of HITs thus far was used as a lower bound for comparison.) Second, there is also a significant difference (t(7) = 4.247, p = 0.0038) in the number of HITs it took to satisfy all constraints for the first time (this includes the system-generated time constraints) between the todo condition (µ = 22.5, σ = 8.5) and the no todo condition (µ = 45.38, σ = 13.9). Finally, since constraints can be repeatedly violated and satisfied throughout the process of planning, in order to understand how quickly constraints get satisfied on average, we introduce the notion of the violation duration of a constraint: the number of HITs it takes for a constraint to be satisfied by the itinerary since it was last violated (which could be when it was first introduced). We observe a significant difference in the average violation duration of quantitative constraints between the todo condition (µ = 5.64, σ = 6.34) and the no todo condition (µ = 10.5, σ = 10.97); the difference is statistically significant (t(134) = 3.206, p = 0.0017).

Figure 6. The number of HITs it took to satisfy all quantitative and system-generated time constraints for each city in the todo and no todo conditions. For cities marked by an asterisk, the itineraries in the no todo condition still have violated constraints; in such cases we report the number of HITs thus far.

Figure 7. Cumulative distribution of the violation duration of constraints in the todo versus no todo conditions, showing the fraction of constraints that are satisfied after at most k HITs since the time they were last violated.

Figure 7 shows the cumulative distribution of the violation durations of constraints in the todo versus no todo conditions. We observe that for any duration, a larger fraction of the constraints are satisfied within that duration in the todo condition than in the no todo condition. We also see that more than half of the constraints are satisfied after 3 or fewer HITs in the todo condition.
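To make the violation-duration metric concrete, the following sketch (our own, not the authors' analysis code) computes violation durations from a per-HIT log of which constraints the itinerary satisfies; the log format is an assumption introduced for illustration.

def violation_durations(constraint_ids, satisfied_per_hit):
    """Compute violation durations for each constraint.

    constraint_ids:    ids of all constraints in the planning mission.
    satisfied_per_hit: list where element k is the set of constraint ids the
                       itinerary satisfies after the (k+1)-th HIT.
    Returns a dict mapping each constraint id to the list of its violation
    durations: the number of HITs from when it was last violated (or first
    introduced) until it became satisfied. Constraints still violated at the
    end contribute no duration (the paper treats these cases as lower bounds).
    """
    durations = {cid: [] for cid in constraint_ids}
    violated_since = {cid: 0 for cid in constraint_ids}  # introduction counts as a violation

    for hit, satisfied in enumerate(satisfied_per_hit, start=1):
        for cid in constraint_ids:
            if cid in satisfied and cid in violated_since:
                # The constraint just became satisfied: record how long it was violated.
                durations[cid].append(hit - violated_since.pop(cid))
            elif cid not in satisfied and cid not in violated_since:
                # The constraint just became violated again.
                violated_since[cid] = hit
    return durations

# Example: constraint "c" is satisfied after HIT 2, violated after HIT 4, satisfied again after HIT 5.
print(violation_durations(["c"], [{"x"}, {"c"}, {"c"}, set(), {"c"}]))  # {'c': [2, 1]}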

Figure 8 shows, for the todo versus no todo conditions, the rate at which each constraint gets satisfied as workers contribute to the planning effort for the Seattle and Chicago planning missions. We observe that constraints are satisfied much more quickly in the todo condition. The Chicago case is particularly interesting: whereas in the todo condition a worker violated a previously satisfied constraint while editing and proceeded to make successive edits that led to satisfying all of the constraints, in the no todo condition a satisfied constraint was violated and then left unaddressed for quite some time. This example illustrates the power of immediate feedback: when an edit to the itinerary violates some constraints, the automatically generated todo items not only alert workers as to what needs fixing, but also make them aware that their edits have direct effects on the constraints associated with the planning mission.

(a) Seattle with todo items (b) Seattle with no todo items
(c) Chicago with todo items (d) Chicago with no todo items

Figure 8. The rate at which constraints are satisfied during the planning process for the Seattle and Chicago missions. The height of each bar indicates the number of constraints unsatisfied after k HITs. Each colored segment represents a particular quantitative constraint, and its height indicates the extent to which it is violated. The black segment represents the percent by which the itinerary is over time or under time (when it is greater or less than 5%).

Results III: Editing Patterns
Having shown that todo items play an important role in focusing the crowd's effort towards satisfying the quantitative constraints, we turn to investigate the crowd's work process while using Mobi in the todo condition. In particular, we look for evidence of collaborative behavior from the crowd, and for evidence of how the crowd plans using the current context of the plan.

We focus first on the process of generating ideas for activities. We observe that roughly half of the edits to the brainstream contain suggestions (52%) while the other half (48%) are edits of existing ideas in the itinerary. Of these edits, 72% are on ideas that originated from someone else (i.e., other than the person editing), which suggests that workers are working off others' contributions as they refine ideas and the itinerary. When editing an activity, we see that edits are predominantly on an activity's duration (80%), but there are also edits to change titles/descriptions (7%) and to correct an item's location coordinates (12%). While duration edits are often made in the context of resolving some constraint, edits to the title, description, and location are particularly encouraging to see, as they suggest that the interface provides (via the brainstream feed, map, and itinerary) means for users to discover and improve on existing ideas.

Turning to the patterns of itinerary edits, we observe that most of the contributions come from adding (31%) and reordering activities (32%), but that workers also edit existing ideas (22%) and remove activities (14%). This is encouraging to see because workers are using the different actions available to them to plan as they see fit while improving the itinerary. When tasks are left to run after the quantitative constraints are all satisfied, we observe that itineraries continue to evolve; workers replace activities in the itinerary with other activities, reorder the itinerary, edit existing items, and so on. While constraints may be violated during such edits, workers were reminded of such violations by the todo items, and violated constraints were quickly resatisfied (e.g., see Figure 8(c)). This suggests that new ideas can continue to be generated and incorporated into the itinerary; in fact, workers are encouraged to do both because they are paid for such contributions and because we display a todo item that asks users to continue to improve the itinerary whenever all quantitative constraints are met.

Throughout the experiment, we saw very few Turkers who blatantly tried to game the system, and the kinds of gaming behavior we observed generally fell into two categories. In one, a Turker under-specifies an activity, either by creating an activity without filling in its description and location, or by adding a note containing a suggestion for an activity instead of just adding the suggested activity. In the other, a Turker would fully specify an activity, but use two or even three HITs to do so, by spending a HIT on creating the activity, another to edit its details, and another to add it to the itinerary, when all this can be accomplished with a single ‘add activity’ action. While it is certainly useful to consider tweaks that would curb such behaviors (e.g., requiring activities to contain descriptions, or not allowing workers to submit HITs in which they have only edited their own ideas), it is interesting to note that such gaming behaviors from a small set of Turkers did not seem to negatively impact either the planning process or the resulting solutions. In particular, we saw that poorly-formed ideas were simply ignored, removed from the itinerary, or edited by another worker who discovered them via the auto-complete search box in the brainstream, which occurred as a natural part of the iterative process through which workers improved the itinerary.

END-TO-END USER STUDY
Having seen that workers can effectively resolve quantitative constraints using Mobi, we conducted a user study to evaluate the generated itineraries and how well they satisfy not only quantitative constraints, but also qualitative constraints, from the perspective of requesters.

Method
We recruited 11 subjects to participate in the study via several university mailing lists. We required that participants actually be planning a forthcoming trip to a major U.S. city. Recruited subjects are a mix of undergrads, graduate students, and research scientists. We report on the responses of 10 of the users, as one of the users specified a trip destination that is not a major U.S. city. Users were instructed to describe their planning mission, which includes qualitative and quantitative preferences and constraints. Users were given unlimited access to Mobi for over a week, during which they were free to modify their planning mission or to participate in the planning process. Missions were crowdsourced to Mechanical Turk workers as was done in the todo versus no todo experiment. At the end of the study, users were asked to fill out a questionnaire, which asked them to evaluate the final itinerary and describe their experiences using Mobi. Users were each paid $30 for their participation.

(a) Subject 1: Las Vegas (b) Subject 2: Orlando

Figure 9. User study: planning missions and the corresponding itineraries generated by the crowd.

The trip destinations users specified include Boston, New York City, San Francisco, Las Vegas, Orlando, and Washington DC. The planning missions vary in length and specificity. Figure 9 provides two examples of user missions and the generated itineraries.

Results
We consider three measures of the quality of an itinerary, namely the extent to which it (1) contains activities that the requester likes, (2) satisfies the qualitative and quantitative constraints specified in the planning mission, and (3) serves its purpose as a plan that is feasible, useful, and executable in real life.

1. Do itineraries contain activities that the requesters like?

Users were shown each of the itinerary activities (title, description, start time, end time, duration) and asked to rate how much they thought they would enjoy each activity on a 5-point scale (1 = “hate it”, 5 = “love it”).

Figure 10 shows a histogram of the activity ratings across all participants. Across all ten users, the mean rating was 4.03 (σ = 0.44). Users also mentioned that the activities are diverse, interesting, and often unknown to them prior to using Mobi.

Figure 10. Histogram of activity ratings.

2. Do itineraries satisfy the qualitative and quantitative constraints specified in the planning mission?

All of the users answered that their itinerary fulfilled most or all of the requirements they had specified. Some users noted specific activities that they do not like (e.g., “I just happen to be afraid of bungee jumping because it seems so unsafe, but a similar activity would be fun,” “I am under age so the wine thing would not be great for me but everything else sounds great.”) or complained about too many activities (e.g., “There was far far far too much packed into a single day, but the ideas were all totally interesting.”) or visiting too many parts of a city in one day (e.g., “For the most part, it was a good mix of things to do. I was not expecting to travel so much uptown/downtown in one day though.”). These problems can be partially explained by the fact that many constraints (e.g., that an itinerary shouldn't be too packed) are assumed and therefore not explicitly stated by the users. A potential solution is to have the requesters evaluate the itineraries as they are being created and add the missing constraints to the planning mission. In fact, as a preliminary test, we took feedback from two of the users and entered it as todo items (i.e., “Let's just stay midtown and remove downtown activities,” “The harbor island suggestions are great but one island would be enough. Please adjust time durations accordingly so the day is not so packed.”). After resuming the task, we observed that after just a few HITs workers had already addressed the issues by removing offending activities, reordering activities (so that meals occur at reasonable hours), and adding additional activities (replacing a concert downtown with a Broadway show in midtown).

3. Are the itineraries feasible, useful, and executable in real-life settings?

We asked users if they would or did use the itinerary in real life. All of the users expressed that they would use the itinerary as is, some version of the itinerary, or selected ideas in the itinerary. When asked “If Mobi were made available for general use, how likely would you want to use such a tool again for recruiting the crowd to help you plan a trip?”, 7 out of 10 of our users answered likely or very likely, 2 answered neutral, and only 1 answered unlikely, showing the usefulness of Mobi in practice.

Most encouragingly, three users actually followed the itinerary or used the ideas in the itinerary in their real-life trips. One subject reported that “having other people involved in the idea-creation process was extremely helpful. It sparked all sorts of ideas that I kept in the back of my head throughout the weekend.” Another subject remarked that his “trip was mostly in the plan, but all restaurant plans changed due to necessity during the trip.”

We found a dichotomy of users: those who are interested in obtaining a fully-specified itinerary, and those who are interested in a loose itinerary that contains an unordered set of suggested activities and leaves room for exploration. An interesting solution is to allow requesters to choose between a fully specified or a loose itinerary, which in turn translates into constraints that specify the maximum number of activities in the itinerary, the amount of buffer time between activities, and whether activities need to be ordered.

One of the most frequently mentioned benefits of Mobi is that both the idea generation and the planning are fully automated from the requester's perspective, thereby “integrating all the factors one would consider in planning an itinerary,” yet making “the time spent creating the plan minimal.” Most users (7 out of 10) said that they are comfortable with an anonymous crowd planning their trip. Furthermore, results show that although there was some involvement, requesters mostly left the planning up to the crowd. In particular, 4 out of 10 users said that they never or rarely checked on the progress of the itinerary, 5 did so occasionally, and only 2 did so frequently. Likewise, 7 out of 10 users said that they never went back to modify the mission details or add notes. As one user put it succinctly: “the process seemed to work smoothly without my intervention.”

DISCUSSION
Having demonstrated the effectiveness of Mobi in the itinerary planning setting, we now revisit the elements of Mobi's design and discuss how these elements may in general inform the design of systems that help a crowd tackle problems involving global constraints.

Keeping the crowd, the solution, and the context together

Compared to the design of most other crowdsourcing systems for tackling complex tasks, what is most striking about Mobi is the use of a single structured interface through which the crowd is exposed to the current solution and the global problem-solving context. This allows the crowd to coordinate and communicate with one another as they contribute to a solution, in ways that individuals working off separate contexts in different subtasks cannot.

Interactions are less controlled, but still structured

Inherent in Mobi’s design is the ability for the worker to choosehow she wants to contribute to the task. In our user study,we saw that workers generate a diverse set of ideas, and makevarious types of contributions during the problem solving pro-cess. This is particularly important for resolving global con-straints because we don’t know a priori the specific contribu-tions that are needed, but rather they are based on the currentsolution context. While interactions are less controlled thisway, note that the interaction is still highly structured; thereis a well-specified set of actions that the crowd can take, todoitems that guide the crowd towards useful actions, and real-time feedback on the effects of actions (e.g., via map, triptimes, and todo items).

A language for machine and human to communicate

In the background, the machine in Mobi automatically computes routes and times, checks for violated constraints, and generates todo items. Mobi knows, for example, when all the quantitative constraints are satisfied; this allows Mobi to prompt the crowd for further revisions, ask the crowd or requester to check for potential problems, and so on. Mobi can do all this without knowing what the constraints mean, because the inputs it seeks from the crowd include category tags on activities, which give it enough information to check whether constraints are violated and therefore to assist in the planning process.

A fluid way to refine goals

With most complex problems, requirements change over time as ideas and partial solutions stream in. In Mobi, a requester can add or revise requirements, write notes, or even directly alter the plan throughout the planning process. The crowd can react to such changes just as they react to the current solution at any other point in the planning process. The iterative nature of the task and the ease of accessing alternative suggestions allow the crowd to handle such refinements.

These design elements serve not only as ideas, but as conditions that, when met, can help a crowd tackle open-ended problems involving global constraints. One particular challenge that remains is in quantifying and rewarding contributions. While in Mobi all actions are “planning actions” and thus can be similarly classified as micro-contributions that are rewarded equally, other tasks may consist of “execution actions” for which the costs and effort required may differ significantly from one action to another. In cases where the system can quantify each worker's contributions, payments can simply be assigned accordingly, but this may be difficult for more open-ended, creative tasks like writing, where the value of edits is difficult to judge automatically. Mechanisms for classifying contributions directly or via reputation systems serve as a complementary area of future research.

CONCLUSION
To date, many human computation systems have relied on the assumption that problems can be solved in an algorithmic manner, using an explicit procedure that outlines the operations that need to be done and how they are ordered. We argue that for solving more complex problems, it is beneficial for workers to perform tasks within a less controlled yet still structured environment, where they can build upon each other's ideas and choose how they want to contribute, yet still be steered in the right direction by system-generated advice and alerts. Specifically, using the task of itinerary planning as a case study, we introduce a prototype system named Mobi, which uses both groupware ideas and explicit processes (e.g., the automatic generation of todo items) to efficiently generate itineraries that satisfy complex, interdependent constraints. Our results show that not only are constraints satisfied more quickly using our design, but the generated itineraries are also of high quality (i.e., they contain good suggestions and satisfy the qualitative constraints) from the perspective of our target end users.

An interesting direction for follow-up work is to investigate ways to encapsulate qualitative constraints in todo items, such that the system can drive workers towards resolving those constraints more quickly, just as it does for quantitative constraints. There are also opportunities to study the range of user preferences for different types of itineraries, ideal means for communicating such preferences, and ways to generate the most appropriate solutions.

An unexplored aspect of Mobi is the question of whether this collaborative environment would still be effective (or how it might need to be re-designed) if (i) the workers are people with some shared context (e.g., people living in the same city, or friends and acquaintances), and (ii) the requesters and the workers are the same people (e.g., a group of people planning a trip for themselves). For these new contexts, there is a new set of design questions related to ownership (e.g., would friends be hesitant to edit each other's ideas?) and consensus (e.g., do we need a voting mechanism for determining whether activities should be added to or removed from the itinerary?).

Finally, we envision rich opportunities to integrate different types of automation into Mobi: to detect failures, handle uncertainties, incorporate richer forms of user preferences, and combine automated and human planners in a synergistic way.

ACKNOWLEDGMENTS
Haoqi Zhang and Edith Law are generously funded by an NSF Graduate Research Fellowship and a Microsoft Graduate Research Fellowship, respectively. This project was initially formulated and prototyped at Microsoft Research during summer internships there by Haoqi Zhang and Edith Law, who thank Anne Loomis Thompson and Paul Koch for technical assistance on that earlier prototype.

REFERENCES
1. Bernstein, M. S., Little, G., Miller, R. C., Hartmann, B., Ackerman, M. S., Karger, D. R., Crowell, D., and Panovich, K. Soylent: a word processor with a crowd inside. In UIST (2010), 313–322.
2. Chen, L., and Pu, P. Survey of preference elicitation methods. Tech. rep., Swiss Federal Institute of Technology in Lausanne (EPFL), 2004.
3. Ellis, C. A., Gibbs, S. J., and Rein, G. Groupware: some issues and experiences. Communications of the ACM 34, 1 (1991), 38–58.
4. Hayes-Roth, B., and Hayes-Roth, F. A cognitive model of planning. Cognitive Science 3 (1979), 275–310.
5. Kamar, E., Horvitz, E., and Meek, C. Mobile opportunistic commerce: Mechanisms, architecture, and application. In AAMAS (2008).
6. Kittur, A. Crowdsourcing, collaboration and creativity. XRDS 17 (December 2010), 22–26.
7. Kittur, A., Smus, B., Kraut, R., and Khamkar, S. CrowdForge: Crowdsourcing complex work. In UIST (2011).
8. Kulkarni, A. P., Can, M., and Hartmann, B. Turkomatic: automatic recursive task and workflow design for Mechanical Turk. In CSCW (2011).
9. Law, E., and von Ahn, L. Human Computation. Synthesis Lectures on Artificial Intelligence and Machine Learning. Morgan and Claypool, June 2011.
10. Law, E., and Zhang, H. Towards large-scale collaborative planning: Answering high-level search queries using human computation. In AAAI (2011).
11. Little, G., Chilton, L. B., Goldman, M., and Miller, R. C. TurKit: human computation algorithms on Mechanical Turk. In UIST (2010), 57–66.
12. Nau, D., Ghallab, M., and Traverso, P. Automated Planning: Theory & Practice. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 2004.
13. Noronha, J., Hysen, E., Zhang, H., and Gajos, K. Z. PlateMate: Crowdsourcing nutrition analysis from food photographs. In UIST (2011). To appear.
14. Sacerdoti, E. D. A Structure for Plans and Behavior. Elsevier North-Holland, 1977.
15. Stefik, M., Bobrow, D., Foster, G., Lanning, S., and Tatar, D. WYSIWIS revised: Early experiences with multiuser interfaces. ACM Transactions on Office Information Systems 5, 2 (1987), 147–167.
16. von Ahn, L. Human Computation. PhD thesis, Carnegie Mellon University, 2005.
17. Wikipedia: Template messages/Cleanup, 2011. Available at http://en.wikipedia.org/wiki/Wikipedia:Template_messages/Cleanup. Accessed September 22, 2011.