The Future of Human-Computer Interaction
JOHN CANNY, UNIVERSITY OF CALIFORNIA, BERKELEY

Personal computing launched with the IBM PC. But popular computing—computing for the masses—launched with the modern WIMP (windows, icons, mouse, pointer) interface, which made computers usable by ordinary people. As popular computing has grown, the role of HCI (human-computer interaction) has increased. Most software today is interactive, and code related to the interface is more than half of all code.

HCI also has a key role in application design. In a consumer market, a product's success depends on each user's experience with it. Unfortunately, great engineering on the back end will be undone by a poor interface, and a good UI can carry a product in spite of weaknesses inside. More importantly, however, it's not a good idea to separate "the interface" from the rest of the product, since the customer sees the product as one system. Designing "from the interface in" is the state of the art today.

So HCI has expanded to encompass "user-centered design," which includes everything from needs analysis, concept development, prototyping, and design evolution to support and field evaluation after the product ships. That's not to say that HCI swallows up all of software engineering. But the methods of user-centered design—contextual inquiry, ethnography, qualitative and quantitative evaluation of user behavior—are quite different from those for the rest of computer engineering. So it's important to have someone with those skills involved in all phases of a product's development.

In spite of their unfamiliar content and methods, HCI courses are strongly in demand in university programs and should be part of the core curriculum. At a recent industry advisory board meeting for U.C. Berkeley's computer science division, HCI was unanimously cited as the most important priority for future research and teaching by our industry experts. Ease of use remains a barrier to growth and success in IT even in today's business




a human and a machine sharing a speech interface. This is why speech interfaces are also a rich research area. Much of the shared information is the context we have already been talking about, and all of the aforementioned projects are coupled with our work on context-awareness (for more information, see my home page, http://www.cs.berkeley.edu/~jfc).

A WORD (OR TWO) ABOUT PRIVACY
Perceptual interfaces imply cameras, microphones, and other sensors capturing the user's behavior. Context-awareness implies high-level interpretation of that data, often in locations remote (in space and time) from where the data was captured. These are all hot buttons for privacy advocates. My group has been working on context-aware systems for eight years, and privacy has always been an issue. In fact, privacy in ubiquitous computing environments has become a major focus of our group, leading to six papers on the topic. There are a variety of approaches to the problem: better advice and consent interfaces for users, anonymization, and various forms of obfuscation (e.g., reducing the accuracy of location information). I have co-organized workshops on privacy at the Ubiquitous Computing conference for the past four years (UBICOMP 2002-2005), and these have provided a good overview of work in the area (all are available from my home page).
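To make the obfuscation idea concrete, here is a minimal sketch of one common form: snapping coordinates to a coarse grid, with optional jitter. The cell size and coordinates are invented for illustration and are not drawn from the article's systems.

```python
import random

def obfuscate_location(lat: float, lon: float,
                       cell_deg: float = 0.01,
                       jitter_deg: float = 0.0) -> tuple[float, float]:
    """Reduce location accuracy by snapping coordinates to a coarse grid.

    A cell of ~0.01 degrees is roughly 1 km at mid-latitudes. Optional
    uniform jitter keeps repeated reports from exposing the grid itself.
    """
    snapped_lat = round(lat / cell_deg) * cell_deg
    snapped_lon = round(lon / cell_deg) * cell_deg
    if jitter_deg:
        snapped_lat += random.uniform(-jitter_deg, jitter_deg)
        snapped_lon += random.uniform(-jitter_deg, jitter_deg)
    return snapped_lat, snapped_lon

# Example: report the Berkeley campus at ~1 km rather than ~10 m accuracy.
print(obfuscate_location(37.8719, -122.2585))  # -> approx. (37.87, -122.26)
```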

The approach we have taken, and which we are now building into a context-aware prototype, is private computation. In a private computation, user data is cryptographically protected during the computation, and only the final result is revealed. For example, we are interested in the overlap between activities of knowledge workers. It's possible to infer this overlap by discovering similar keywords in users' e-mails to each other. Normally, doing pattern matching on full e-mail text would be extremely invasive, but the result of the pattern matching is often benign by itself (e.g., if users A and B share a common activity, we typically need only the most salient words or documents related to that activity). Private computation allows us to determine the end result—say, the set of documents related to the activity—without exposing any information at all about the data used to do the pattern matching.
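As a rough illustration of computing a shared result without exposing inputs, here is a toy Diffie-Hellman-style private set intersection over keyword sets. This is a standard textbook construction, not the zero-knowledge-proof machinery of the Berkeley toolkit; the modulus, key generation, and keyword sets are all invented for the example.

```python
import hashlib
import secrets

# Toy Diffie-Hellman-style private set intersection (PSI) over keywords.
# Because the blinded values live in unordered sets, only the size of the
# overlap is learned here; a full protocol tracks per-item mappings and
# uses proper hashing-to-group. Parameters are illustrative only.
P = 2**127 - 1  # a Mersenne prime serving as a small demo modulus

def h(item: str) -> int:
    """Hash a keyword to an element of the group mod P."""
    return int.from_bytes(hashlib.sha256(item.encode()).digest(), "big") % P

def blind(items: set[str], key: int) -> set[int]:
    """Raise each hashed keyword to a secret exponent."""
    return {pow(h(x), key, P) for x in items}

def reblind(values: set[int], key: int) -> set[int]:
    """Apply a second secret exponent; exponentiation commutes."""
    return {pow(v, key, P) for v in values}

# Keywords each party extracted from its own e-mail (invented data).
alice_words = {"deadline", "ubicomp", "budget", "privacy"}
bob_words = {"ubicomp", "privacy", "soccer"}

a_key = secrets.randbelow(P - 2) + 1  # Alice's secret exponent
b_key = secrets.randbelow(P - 2) + 1  # Bob's secret exponent

# Shared keywords collide after both exponents are applied, since
# (h(x)^a)^b == (h(x)^b)^a mod P; non-shared items never match.
alice_double = reblind(blind(alice_words, a_key), b_key)
bob_double = reblind(blind(bob_words, b_key), a_key)

print(len(alice_double & bob_double), "keywords in common")  # -> 2
```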

Private computation is challenging to use for a variety of reasons, one of which has been high computational cost. Our most recent result, however, has reduced this by many orders of magnitude and allows privacy to be added to many context algorithms with essentially no computational overhead (accessible as Berkeley Technical Report UCB/EECS-2006-12 from http://www.eecs.berkeley.edu/Pubs/TechRpts/2006/). This allows us to compute high-level context information, such as who is involved in an activity and how much (say, as a participation number between 0 and 1), without disclosing when and where the users were actually involved. Private computation provides much stronger privacy protection than anonymization—for example, e-mail with sender/receiver removed (anonymization) is hardly protected at all. Private computation requires some rather exotic techniques (zero-knowledge proofs), but we have built a Java toolkit that is available to others who would like to experiment with it.

CONTEXT-AWARENESS AND PERCEPTION
Context-awareness and perception are really two sides of the same coin. Context-awareness involves interpreting other cues (besides user input) to figure out what a user wants. Many of these cues will require machine perception (is a user talking about food, is there traffic noise, is the sky overcast?). Conversely, machine perception is a difficult task and it "scales" poorly—as you increase the size of the speech vocabulary or the number of potential images to match for vision, accuracy goes down. The task becomes much easier when you add context data to the recognizer. In our research on face recognition, we were able to use available phone context data (time, place, event history) to improve recognition of faces from camera-phone images. In fact, face "recognition" using context data alone (i.e., predicting who's in the image without looking at it) was more accurate than a state-of-the-art face recognizer using computer vision. Putting computer vision and context together, though, does much better than either one alone.
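A minimal sketch of one standard way to do such a combination is to treat the image and the context as independent evidence and fuse per-identity scores with Bayes' rule. The names and probabilities below are invented, and the article does not specify the fusion rule its system actually used.

```python
# Naive-Bayes fusion of a vision score with a context prior for face
# recognition on camera-phone images. All numbers are invented; the
# point is the combination rule, not the individual models.

def fuse(vision_likelihood: dict[str, float],
         context_prior: dict[str, float]) -> dict[str, float]:
    """Combine p(image | identity) with p(identity | time, place, history).

    Assuming the image and the context are conditionally independent
    given the identity, the posterior is proportional to their product.
    """
    everyone = set(vision_likelihood) | set(context_prior)
    joint = {who: vision_likelihood.get(who, 1e-6) *
                  context_prior.get(who, 1e-6)
             for who in everyone}
    total = sum(joint.values())
    return {who: p / total for who, p in joint.items()}

# Vision alone is ambiguous between two similar-looking colleagues...
vision = {"alice": 0.40, "bob": 0.45, "carol": 0.15}
# ...but the phone knows the photo was taken at a meeting Alice attends.
context = {"alice": 0.70, "bob": 0.10, "carol": 0.20}

posterior = fuse(vision, context)
print(max(posterior, key=posterior.get))  # -> "alice"
```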



Our work on voice interfaces is attempting to achieve similar gains by adding context data to speech recognition. We think the potential gains are even larger there. But there must be closer coupling between the recognizer, the context data, and the application or service built on top of it. That brings us to what is realistically the biggest challenge to contextual and perceptual interfaces: bridging the barriers between the disciplines working on these technologies—specifically, HCI, speech recognition, and computer vision. It's a familiar story when there is a paradigm shift in a technology or market. While there are small communities working on the boundaries, most of the time recognizers are "black boxes" to interface developers. Conversely, folks working on recognition rarely pay attention to context or the applications that come later. We'll make some progress that way, but if we want a revolution, which the market is ready for, then we need to forget tribal allegiances and work together.
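One simple form the coupling described above could take is rescoring a recognizer's n-best hypotheses with a context-derived word prior. The sketch assumes a hypothetical recognizer interface that exposes (text, log-score) pairs; neither the interface nor the numbers come from VoiceSignal's or the author's systems.

```python
import math

# Rescore a speech recognizer's n-best hypotheses with context.
# Hypothetical interface: the recognizer hands us (text, log_score) pairs,
# and a context model supplies words likely in the current situation.

def rescore(nbest: list[tuple[str, float]],
            context_words: set[str],
            bonus: float = 1.5) -> list[tuple[str, float]]:
    """Add a log-score bonus for each context word a hypothesis contains."""
    rescored = []
    for text, log_score in nbest:
        hits = sum(1 for w in text.lower().split() if w in context_words)
        rescored.append((text, log_score + bonus * hits))
    return sorted(rescored, key=lambda pair: pair[1], reverse=True)

# The acoustic model slightly prefers the wrong homophone...
nbest = [("wreck a nice beach", math.log(0.40)),
         ("recognize speech", math.log(0.35))]
# ...but the calendar says the user is in a speech-technology meeting.
context = {"recognize", "speech", "vocabulary"}

best_text, _ = rescore(nbest, context)[0]
print(best_text)  # -> "recognize speech"
```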

OVERVIEW OF THIS ISSUE
The contributions in this issue cover the state of the art in perceptual and context-aware interfaces. In speech interfaces, one of the most exciting pieces of the market is the cellphone. Many cellphones now support speech input for speed dialing or selecting a name from the phone book. Large-vocabulary interfaces for dictation appeared last year. Full continuous large-vocabulary recognition is on the way. The latter especially opens up whole new application possibilities for smart phones and may do much to break the usability barrier for these devices. Most of this technology was developed by VoiceSignal. We open this issue with an interview with Jordan Cohen, who recently moved to SRI but was formerly the CTO of VoiceSignal. Wendy Kellogg, of IBM's Thomas J. Watson Research Lab, and I discuss with Cohen the growth of cellphone speech interfaces, their potential, and the challenges still remaining.

Our second article looks at computer vision-based interfaces. James Crowley, who directs the GRAVIR (Graphics, Vision, and Robotics) laboratory at INRIA (French National Institute for Research in Computer Science and Control) Rhône-Alpes in France, is a leader in this area. A major challenge in high-level interpretation of human actions is context, as we already noted. Crowley and his colleagues have tackled this problem head-on by developing a rich model of context considering "situations" and "scenarios." This article describes their approach top-down, starting with a representational model and drilling down to their software architecture.

In the third article, we look at context-awareness in a biology lab. Gaetano Borriello, computer science professor at the University of Washington, leads us through some field tests of the Labscape system, which is intended as an efficient but unobtrusive assistant (a Radar O'Reilly) for cell biologists. In this setting, the users' high-level activity is well understood (it's a science experiment). The system has to use available clues from sensing (like most context-aware systems, there is plenty of perception in this one) to figure out where the user is and what resources are needed. Borriello's article is rich with practical advice for making this kind of system succeed.

In our final article, Jim Christensen and colleagues from IBM's Thomas J. Watson Research Lab (including Wendy Kellogg) take a different approach to using context information. Whereas successful automatic context-aware systems are rare at this time, Christensen et al. argue for human interpretation of context information. They describe two systems that exemplify this approach: Grapevine, a system that mediates human-to-human communication to minimize inappropriate interruptions; and Rendezvous, a VoIP conference-calling solution that uses contextual information from corporate resources to enhance the user experience of audio conferencing. They also discuss some cogent issues related to user privacy in context-aware systems. Q

LOVE IT, HATE IT? LET US KNOW
[email protected] or www.acmqueue.com/forums

JOHN CANNY is the Paul and Stacy Jacobs Distinguished Professor of Engineering at the University of California, Berkeley. His research is in human-computer interaction, with an emphasis on behavior modeling and privacy. He received his Ph.D. in 1987 at the MIT AI Lab. His dissertation on Robot Motion Planning received the ACM dissertation award. He received a Packard Foundation Faculty Fellowship and a Presidential Young Investigator Award. His peer-reviewed publications span robotics, computational geometry, physical simulation, computational algebra, theory and algorithms, information retrieval, HCI and CSCW, and cryptography. He has best-paper prizes in several of these areas.

© 2006 ACM 1542-7730/06/0700 $5.00
