24447spr07 csnewsletter rev.qxd:24447spr07 csnewsletter rev · breathing dragon is enormously...

Columbia University in the City of New YorkDepartment of Computer Science1214 Amsterdam AvenueMailcode: 0401New York, NY 10027-7003

ADDRESS SERVICE REQUESTED

CS@CU NEWSLETTER OF THE DEPARTMENT OF COMPUTER SCIENCE AT COLUMBIA UNIVERSITY WWW.CS.COLUMBIA.EDU

NON-PROFIT ORG.

U.S. POSTAGE

PAID

NEW YORK, NY

PERMIT 3593

CUCS Resources

You can subscribe to the CUCS news mailing list at http://lists.cs.columbia.edu/mailman/listinfo/cucs-news/

CUCS colloquium information is available at http://www.cs.columbia.edu/lectures

Visit the CUCS alumni portal at http://alum.cs.columbia.edu

Born in New York in 1973, Ms. Novik studied EnglishLiterature at Brown Universityand Computer Science atColumbia University. After herstudies at Columbia, she movedto Canada to help design anddevelop the computer game“Neverwinter Nights: Shadowsof Undrentide”. While workingon the game, she found shepreferred writing and returnedto New York to write her firstnovel, “His Majesty’s Dragon”.This novel proved to be the firstinstallment in her Temeraireseries, a reimagining of theepic events of the NapoleonicWars with an air force of dragons, manned by crews ofaviators. The series has beenwildly successful and wasrecently optioned by directorPeter Jackson of “Lord of theRings” fame.

CUCS Newsletter rovingreporter Sean White recentlyhad a chance to speak withMs. Novik about her path towriting, her stories, and thesimilarities between designingsoftware and novels.

Sean: You started in Englishand then moved to computerscience and then into games.Can you tell me a bit aboutwhat led you to start writing?

Naomi: I started writing just forfun, just writing fan fiction incollege – in undergrad, I mean –back when I was still an Englishmajor, pretty much when I firstdiscovered the Internet. Andthen while I was still workingon my English degree at Brown,at that point I started actuallygetting interested in computers

(continued on next page)

CS@CU

CS@CU SPRING 2007 1

Imagine the year is 1806. You watch from ahilltop as Napoleon’s army surges againstPrussian forces that expect an easy defeat ofthe French Emperor. Suddenly, rising fromthe forest comes a flight of dragons, AerialCorps in support of the French troops. ThePrussian dragons and their crews of aviatorsare taken by surprise by the large numbersand unexpected tactics. The tide turns and the Prussians are forced to flee under heavyattack as the adventure continues.This is the world imagined and realized by

Naomi Novik in her Temeraire series of novels.

NEWSLETTER OF THE DEPARTMENT OF COMPUTER SCIENCE AT COLUMBIA UNIVERSITY VOL.3 NO.2 SPRING 2007

FromComputerScience to Fantasy: An Interview with

Naomi Novik

2 CS@CU SPRING 2007

Cover Story (continued)

and working in the studentsupport IT department thatsupported the student labs andreally just enjoyed that work.

When I actually got a job aftergraduating I went to work for a company called D.E. Shaw & Co. I wanted to do a lot oftechnical stuff, so they let medo some work with the sys-tems department despite thefact that I didn’t have formalcomputer science training. Andthen after a while I decidedthat I wanted to go back andlearn programming, and that I actually eventually wanted to work on computer games.

Computer games are one ofthose things that are a very nicecombination of creative work,programming work, design, andcreativity together, so I reallyliked the idea. Columbia has agreat program called the SecondMajor program which basicallylets you complete all the under-graduate requirements for adegree for a major that youdidn’t actually study in withouthaving to take all the otherelectives and so forth. Goingthrough the Second Major pro-gram allowed me then to apply

for the master’s program incomputer science at Columbia.Then during the master’s program I was funded as anMSTA, so I actually starteddoing some teaching. I taughtone class about computergames and once I taught oneon Windows programming, sothat was a lot of fun.

And, basically, I then continuedon and decided to go to thedoctoral program after I finishedthe master’s, mostly because Ireally enjoyed taking the classesand didn’t quite want to stop.And it was a terrific experience.I’ve never particularly lovedresearch so much; I’ve alwayspreferred to actually build thingsin programming.

I got an opportunity to leave andwork on the game NeverwinterNights: Shadows of Undrentideafter I had just finished mycourse work and was sort offlailing around, trying to figureout what I wanted to write mydissertation on. And thenNeverwinter Nights came alongand I said, “Okay. Never mindthat dissertation,” and took offto work on that. I really enjoyedworking on Neverwinter Nights,

but in fact the parts that Imost enjoyed were the creativeparts, the design, being able tocraft the storylines and writedialogue for the characters. I’dbeen writing as a hobby foryears and years by that point,but I pretty much wrote at thevignette level, not a larger story.And partly that’s because onceyou get into writing a largerstory it actually, I think, parallelsvery closely the process ofwhat happens between, say,writing a very short script todo a small specific task, andhaving to create the architecturefor a large software project.

Sean: Can you tell me moreabout that? That’s interesting.

Naomi: Having worked in bothfields, I absolutely believe thatthere’s a parallel; once you’rewriting at the novel length youneed a lot more structure, you need architecture that willsupport the writing. It’s notenough to just be able to writedecent prose, you have to alsobe able to put the events ofthe narrative together in a waythat’s going to be satisfying tothe reader, and that will keepthe pacing moving, will keep

the events exciting to the readeras they go.

Working on Shadows ofUndrentide, we were creating a30 hour game and this plot hadto engage the reader for thatamount of time. So that washow we were approaching it;we wanted to create a generalarchitecture for the story tohang off of. Doing that was verymuch a learning process for mein how you build the structure of the story of a narrative, andthat’s really what enabled me to make the jump in my ownwriting from writing at the shortvignette level to writing a longer,more structured narrative.

I pretty much started thinkinghow I could still enjoy the cre-ativity, the creative aspects,and a friend actually said to me“Why don’t you sit down andtry and write a novel?” And Isaid, “You know what? That’snot a bad idea,” and not toolong after that started writingTemeraire. And Temeraire reallyjust clicked for me right awayand was just a joy to work on.

Sean: You describe the processas much like structuring softwarearchitecture. When you say that, I

“once you get intowriting a largerstory it actually parallels very closely the processof what happensbetween writing a very short scriptto do a small specific task, andhaving to create the architecture fora large softwareproject”

CS@CU SPRING 2007 3

immediately think of black boxesto handle specific functions anda sort of hierarchical structure, orperhaps some object structure.Is that what you mean?

Naomi: The nice thing aboutwriting a novel is you don’thave to be as precise. But I do believe in some degree of compartmentalizing. Forinstance, my books, the way I write them is I don’t see the book entirely as a singlethrough line. I do view eachchapter as doing something,each scene is doing some-thing that then proceeds on tothe next piece in very muchthe same way as classes orobjects or functions within aprogram, expect of course its linear structure in general.But even though this is a series,the continuing adventures of Laurence and Temeraire, I try to keep each novel fairlyself-contained as to plot, forinstance, because I feel that’smore satisfying to the readerthan ending with a cliff hangerwhere the main plot of thestory is unresolved and you’reforced to wait for the nextbook to get resolution on the main plot thread. And, of course, there’s the overallarching narrative of theNapoleonic Wars overlaidupon the entire series, whichis the engine that kind ofdrives the whole process.

Sean: I was curious about that.Why the Napoleonic Wars?

Naomi: I love the Napoleonic era,the Regency era in England,it’s just a lot of fun and has anice combination of swash-buckling but at the same timea certain degree of modernity.The society is more etiquettedriven, so it’s later than piratesbut before the Victorians. Andthere’s a combination of thedanger of the war, the anxietiesproduced by the war, and the Jane Austen comedy ofmanners that you have, thesesmall domestic dramas that goon within society. So it’s just a wonderful period to play in.

From a science fiction point ofview it’s a great era in which

to put dragons because it’s anera in which dragons are stillenormously powerful. Onceyou get into an era like WorldWar I or World War II whereyou have fighter planes avail-able, then you have to makedragons a lot more powerful incertain ways, or you have do alot more exposition and set uphow dragons will work with thetechnology in order to still havedragons be powerful. Whereasin the Napoleonic era you canclearly see these people aresailing around in wooden sailingships. It’s very intuitive toimmediately see why a firebreathing dragon is enormouslyterrifying and enormously pow-erful in this era, even while thefire breathing dragon can onlyfly 30 miles an hour and is stilla believable physical creature,in a way; and also an intuitivecreature to people who areused to dragons, who hadsome impressions of dragonsfrom mythology and lore.

Sean: I take it you’re also a bigfan of historical fiction?

Naomi: I am. I love historicalfiction. Patrick O’Brian’s Aubrey/Maturin series is one of myabsolute favorites and absolutelypart of my inspiration, but JaneAusten is probably my singlefavorite author ever. The thingis, I find historical fiction andspeculative fiction have a lot of the same pleasure, which is world building. Good specu-lative fiction for me creates a world that is unfamiliar andmakes it believable to thereader and draws the reader inand creates characters to fitinto that world and illuminatethat world.

And historical fiction does the same thing because eventhough it’s something that did exist at one time, it’s stillunfamiliar to us. It’s still kind of an alien world to us. Mostof us would feel as disorientedif we were dropped down inthe middle of Jane Austen’sregency England as we wouldif we were dropped onto Mars100 years from now, and insome ways we would probably

find it harder to adjust to goingbackwards and losing a lot of the creature comforts thatwe’re used to, rather thanadjusting to moving forward, I suspect.

Sean: Does that mean that youbuild a world first and then youthink of your plot, or is it some-thing that happens in parallel?

Naomi: What happens in parallelfor me is world and characters.In the case of the Temerairebooks it was very much theworld, the historical world, thatcame first; I first thought of theidea of these characters andthe Napoleonic wars, and thenimagined “what if I had a drag-on?” That’s sort of the seed ofit. And then basically from therewhat develops is what kind of acharacter would it be interestingto tell a story about in a worldwhere the Napoleonic wars aregoing on, it’s a fairly familiarregency England setting, butthere are dragons.

That’s how the character ofLaurence developed. AndLaurence starts out as a seacaptain, a very successful manof his day, a gentleman and anofficer who’s doing very welland is very happy with his life.And then what happens is, ofcourse, the advent of a dragoninto his life throws his wholeworld out of kilter, which ismuch more interesting andexciting, I think, to read abouta character who’s a fish out ofwater. And that allowed me totake this character and introducethe elements that make theTemeraire universe different byhaving him be somebody whodid not grow up around dragons,who’s not used to dragons, he’slearning about dragons even as the reader is. So those twoaspects, the universe and thecharacters of Laurence andTemeraire were developed first.And then my plot invariablygrows out of my characters.

Lois McMaster Bujold, who’s aterrific writer and someone Iadmire a great deal, has said ininterviews that what she likes todo is think of the worst possiblething that can happen to her

characters and that’s her plot,makes it happen and then forcesher characters to deal with it.

And I think that’s generally agood summary of the way that Iapproach plot. Because I am ina historical setting I have someconstraint, and I like havingsome constraint in terms ofwhat actually happened in theNapoleonic wars. I am changinghistory to some extent as I go,but I tend to only let myselfget away with changing it for areason, for a particular devel-opment reason. Then what I’ve done is tried to constrainmyself a little bit because I findthat it actually makes the workricher when I do that, when Idon’t let myself get away witheasy solutions.

So within the constraint of thehistorical record, my plot thenbasically comes out of whatwould be interesting to havehappen to my characters, whatwould be difficult for them, whatwould be challenging for them,what would make them hurt orunhappy, and what lets you hitthose emotional high notes thatI think the reader really enjoys;at the same time making themsatisfying and believable.

Sean: Let me ask you one last question. What drives you to write?

Naomi: Wow. You know what? I don’t actually think I can giveyou an answer to that question.I grew up reading voraciouslyand telling stories for me ishow I respond to other texts,to ideas. I enjoy writing fictionmuch more than I enjoy writingessays or papers. There’s some-thing much more fun for me incommunicating ideas throughfiction and through inventingcharacters than in trying towrite essays, for instance. It’sjust more satisfying to me.

But I don’t think there really isan answer than any writer cangive you other than – well, youcan hear me sort of stutteringand saying, “What do youmean? How could I not write?”

4 CS@CU SPRING 2007

Message from the ChairHenning

Schulzrinne,Professor & Chair

Thank you for reading ourspring 2007 edition of theColumbia Computer Sciencenewsletter!

Before introducing some ofthe features in this edition, I’dlike to speak to a topic thathas generated a fair amount ofdiscussion in our department,as I’m sure it has in others.

As appears to be the caseelsewhere, CS undergraduateenrollments at ColumbiaUniversity are stabilizing, albeitat a much lower level than duringthe boom years and probablysettling at a lower level thanthe previous “bottom” in themid-1980s. The drop-off inenrollments seems to be nearlyuniversal across North America,Western Europe and Australia,even where outsourcing is farless a media buzzword. Oneparticularly bad early warningindicator is that the fraction of incoming freshmen that listCS as their probable major hasdropped from a high of nearly5% in 1983 and a second peakof 3.7% in 2000 and 2001 tojust above 1%, a value lastseen in 1977. Reflecting thisdecreasing student interest,the number of newly declaredCS majors has decreased fromits Fall 2000 high of 16,000 to8,000 in Fall 2005.

The enrollment decrease seemsto be fairly uniform across four-year institutions, although pro-grams emphasizing computingskills, including those for non-traditional students, appear to

have fared a bit better. From all Ican tell, Columbia CS majors, atall degree levels, seem to havelittle problem finding decentjobs, even in fields, such assoftware development, thatsupposedly have migrated toBangalore years ago. Speakingfrom local experience, we havesignificant difficulties findingcapable system administrators,junior or senior, for our depart-mental computer operations.

At Columbia, we have also celebrated our 25th anniversarynot too long ago; this offers a good opportunity to reflect on how we intend to reach,educate and inspire the nextgeneration of computer scien-tists and computer engineers.

Organizations such as the CRAhave been thinking about howto use media campaigns toreach teenagers, to inspire andconvince them to choose STEMcareers, in particular computerscience. This can be very valu-able, but I believe we need to gobeyond improving the image ofcomputer science in the eyesof parents, guidance counselorsand teenagers. If we want to do more than focus on ahandful of inspiring anecdotes,it would be helpful to knowmore about our graduates. Thisknowledge is also necessaryfor discussions about what wewant to teach our students andwhat we can afford to omit,given that we have more thanfilled the available credit hourswith material covering multiple

CS@CU SPRING 2007 5

sub-disciplines that are now as large as computer sciencewas as a whole 20 years ago.

We seem to know surprisinglylittle about our students andabout the paths they take oncethey leave us. For incomingstudents, what led them tochoose computer science? Didthey have specific professionalgoals in mind? Is CS a stepalong the way, as a pre-med,pre-business or pre-law degree?Were they guided by a computerscientist? Did they have thegood fortune of having an excel-lent high-school CS teacher or participating in a RoboCup?Or did they believe that all CS majors program computergames? Did the lack of out-of-the-box programmability of mod-ern PCs and the sophistication ofcommercial software discouragetinkering and exploration?

Given that the number offreshmen interested in CS hasdropped even further than thenumber of majors, it would beuseful to know why. To reflectsome of the common stereo-types: Are they perceiving thefield to be less exciting thanten years ago? Are high-school seniors scared off by tales ofworking conditions only suitablefor single men without a sociallife? Do they see the choice ofCS as risky, where a job may beavailable upon graduation, butmight disappear a few yearslater, before the college loanhas been paid off? Do highly-motivated students see a higher

financial upside in professionalcareers in law, business ormedicine or more societal ben-efit and relevance in biomedicalengineering?

I suspect that one factor thatmay be discouraging high-schoolseniors might actually be oneof the strengths of a computerscience and computer engineer-ing education – the extremelywide range of industries andjob descriptions that a graduatemight find themselves in. Westill tend to think of CS majorspursuing a career in shrink-wrapped software, and CEmajors developing the next generation of micro-processors,although those career pathsprobably have become a rareexception. I suspect studentshave a much better notion ofwhat, say, a graduate in financialengineering (a very popular newprogram at Columbia) will bedoing, even if their anticipation ofglamour and financial opportunitymay be misplaced.

While we might have someknowledge of the companiesthat fresh BS and BA graduatesstart at, we seem to knowalmost nothing about theircareer paths five and ten yearslater. What fraction of our under-graduates eventually pursue a post-graduate degree, eitherin CS or another discipline?How long do graduates remaininvolved in deeply technicalwork, as opposed to, say, proj-ect management? How doesthe technical career duration

compare to older science andengineering disciplines? Is it truethat somebody laid off at age 40 will have to sell real estate?How happy are CS and CEmajors at the mid-point of theircareers compared to other engi-neering graduates? What are themost exciting and intellectuallyrewarding aspects of theirjobs? What contributions dothey seem themselves makingto their employers and largersociety? What technical subjectsand knowledge have turned outto be the most useful, both asa ticket to the first job and as asolid foundation for a technicalcareer? Would our students rec-ommend computer science as acareer to their sons, daughters,nieces and nephews?

As a relatively young discipline,this lack of knowledge wasunderstandable twenty yearsago, but now our first graduatesare reaching AARP eligibility, andColumbia is hardly the oldestCS department.

The CRA Taulbee survey is agood example of how CS hasturned from anecdotes to datain evaluating and improvingPhD programs. I believe weneed to do the same for ourundergraduate program, butfocusing on outcomes, careersand motivations. Only byknowing what motivates ourstudents, and those who have gone elsewhere, can wemake effective changes in ourprograms and try to counteractstereotypes and myths.

Columbia Computer Science isin the process of exploring newmajors and concentrations,such as those emphasizinginformation science and digitalmedia, that better reflect thediversity of career paths andthe breadth of the field. I wouldvery much appreciate hearingfrom others that have exploredthese issues and may evenhave answers to some of thequestions above.

The current edition of thenewsletter reflects some of thediversity of computer science at Columbia. Beyond the use ofCS as a pre-professional path-way, the interview with NaomiNovik illustrates how a degreein CS can indirectly lead to acareer as a successful novelist.Prof. Steve Unger’s article on e-voting emphasizes that knowingthe limitations of our own toolsis an important function of com-puter scientists (as opposed tothe sales and marketing side ofthe industry...). Columbia hashad a strong networking anddistributed systems presenceacross both the ComputerScience and ElectricalEngineering departments, withthe DNA lab featured in thisissue as one important compo-nent of that research thrust.

As always, I look forward tohearing from you as our readers.

Our work involves analyzing anddesigning distributed systems,be it classical networking sys-tems like wireless/sensor net-works, peer-to-peer networks,or job processing systems likeserver farms. One of the mainthrusts of our research has beenon resilience. Our group’smain focus is the use of a wide variety of analytical techniques(signal processing, graph theory,algorithms, control theory,queueing theory, optimization,and stochastic modeling) thatenable us to explore networksscaled to ultra-large sizesunder irregular conditions thatare difficult to induce withinsimulation and experimentation.We also perform simulation andexperimentation to provide addi-tional support to our theoreticalfindings. A sampling of someof the projects being run in ourlab is described next, to give aflavor of our style of research.

A Tunable Scheduling Policy The performance of a systemthat simultaneously processesseveral tasks depends in largepart on the scheduling policyused to partition processingresources among currentlyactive jobs. There is an abun-dance of literature that proposesand evaluates so-called blind

scheduling policies, which useonly the arrival time and serviceobtained to-date of an active jobto determine its processing pri-ority with respect to other activejobs. Example polices includeFirst Come First Serve (FIFO),Processor Sharing (PS), and LastAttained Service (LAS), and eachof those policies are “optimal”for different classes of jobs.Since each policy has a uniqueimplementation, system design-ers must decide in advance

which policy best suits theirneeds. As an alternative, wehave designed a configurable

blind scheduler that contains acontinuous, tunable parameter.By appropriately setting thisparameter, the scheduling policyemployed by the scheduler can exactly emulate or closelyapproximate the three abovepolicies, as well as implementpolicies whose behavior is ahybrid of the above policies. Wehave implemented our schedulerinto the Linux operating sys-tem, and demonstrated that (i) we can emulate the behaviorof the existing, more complexscheduler with a single (hybrid)setting of the parameter, and(ii) the ability to easily vary the behavior of the scheduler is beneficial in environmentswhere the job size distributionmay change over time.

Resilience of Distributed AlgorithmsIn any setting that involves anetwork, many important com-putations are carried out in adistributed fashion. Examplesare routing-algorithms, parame-ter-estimation algorithms anddistributed computations.These distributed algorithmsassume cooperating, error-freebehavior among the nodes; i.e.each node involved in the com-putation carries out its tasksfaithfully. This assumption is valid when the nodes arehomogeneous, are present in abenign environment, and areunder central control. Howeverif the environment is hostileallowing nodes to be compro-mised, or if nodes are hetero-geneous allowing for differentinterpretation and implementa-tion of the algorithm, or if thereis no central control that canverify and ensure the validity

6 CS@CU SPRING 2007

Research Focus

The Distributed Network Analysis (DNA) Groupis a group run jointly between the CS and EEdepartments at Columbia.

The group includes Professors Vishal Misra (CS and EE)and Dan Rubenstein (EE and CS) and nine PhD students,Eli Brosh (CS), Hoon Chang (CS), Hanhua Feng (CS),Kyung-Wook Hwang (EE), Abhinav Kamra (CS), Patrick Lee(CS), Richard Tianbai Ma (EE), Raj Kumar Rajendran (EE)and Joshua Reich (CS). The strong CS-EE collaborationshould come as no surprise: Dan has a PhD in CS and is a faculty member in EE, while Vishal has a PhD in EEand is a faculty member in CS!

The Distributed Network Analysis (DNA) Group

CS@CU SPRING 2007 7

of each node’s execution ofthe distributed algorithm, thecomputation may fail. Whethera distributed computation willfail, the conditions under whichit will fail, and the extent towhich it will fail are importantquestions, answers to whichwill help us determine therobustness of the distributedcomputation. Until nowresearch and analysis of distrib-uted algorithms has focusedon the convergence propertiesof distributed algorithms, whileassuming that nodes in thecomputation function as theyshould. It is important and use-ful to analyze the properties of distributed algorithms underadverse conditions, wherenodes may either malfunctionor be malicious and do notfaithfully follow the protocol ofcarrying out the computation.

In our work we are studyingsuch resilience analysis. Wefocus on a particular aspect ofresilience we call strong detec-tion that grapples with the issueof whether and when nodescan sense that there is an errorin the distributed computation.Toward such detection, wedemonstrate results which showthat, for a distributed protocol,certain classes of disruptivebehaviors are detectable whileother classes of disruptivebehavior are not.

Growth Codes In various communication net-works such as sensor networks,distributed sensing sourceseach generate data and use oneanother as forwarding pointsto transmit information to sinknodes. This transmission has

to take place in the presenceof possible failures of nodes,which disrupts routing and leadsto a loss of information. Wehave developed novel codingtechniques, which we callGrowth Codes, to expedite and increase the informationthat can reach the sink in thepresence of such failures.

Growth Codes is a linear codingtechnique that encodes infor-mation via a completely distrib-uted process. The techniquenot only ensures that the datasink is able to recover all thedistributed data, but also thatdata is efficiently recoveredwhen only a small number ofcodewords is received.

Growth Codes are novel in thatthe codewords used to encodea fixed set of data change overtime. Codewords in the network

start with degree one and growover time as they travel throughthe network en route to thesink. This results in the the sinkreceiving codewords whosedegree grows with time. Usingsuch a design, if the networkis damaged at any time and thesink is not able to receive anyfurther information, it can stillrecover a substantial numberof original symbols from thereceived codewords.

The application of GrowthCodes extends beyond SensorNetworks, and in fact they arethe first distributed channelcodes developed anywhere.We are currently developing an application of GrowthCodes to a bittorrent like filedistribution application.

Members of the Distributed Network Analysis (DNA) group. Top row, from left to right: Abhinav Kamra, Hoon Chang, Dan Rubenstein, Josh Reich, Patrick Lee, Vishal Misra. Bottom row, from left to right: Hanhua Feng, Richard Ma, Kyung-Wook Hwang, Raj Kumar Rajendran. Not pictured: Eli Brosh.

8 CS@CU SPRING 2007

Featured Lectures

2006-7DistinguishedLecture Series The Computer Science Department enjoyedanother successful Distinguished LectureSeries during the 2006-7 academic year.Organized by Professors Steve Feiner andRavi Ramamoorthi, the yearly series bringscritically acclaimed scientists and pioneers in academia and industry from across thecountry to Columbia’s campus.

These distinguished speakers give talks about theirresearch, field questions from the audience, and meetwith faculty and students. The lectures are open to thepublic and announced and documented at our website(http://www.cs.columbia.edu/lectures), and ensure thatmembers of the department and the community at largesee first-hand the latest and greatest computer scienceresearch emerging outside the Columbia campus. Thisyear, six lecturers visited us and discussed a broad andexciting range of topics, which we summarize here.

These speakers helped make this year an exciting andinspirational one for computer science at Columbia. We are looking forward to next year, when we bringmore of the greatest researchers to our department and hear about their work.

Biology as Computation

Leslie Valiant fromHarvard visited onSeptember 25th.

Professor Valiant is the T.Jefferson Coolidge Professor ofComputer Science and AppliedMathematics and the winner of the Knuth Award in 1997. Hediscussed how computationalmodels play an essential role in uncovering the principlesbehind a variety of biologicalphenomena. His talk consid-ered recent results relating to three questions: How canbrains, given their knownresource constraints such asthe sparsity of connections andslow elements, do any signifi-cant information processing atall? How can evolution, in onlya few billion years, evolve suchcomplex mechanisms as it has?How can cognitive systemsmanipulate large amounts ofsuch uncertain knowledge andget usefully reliable results?He showed that each of theseproblems can be formulatedas a quantitative question for a computational model, andargued that solutions to theseformulations provide someunderstanding of these biolog-ical phenomena.

Computer Science and

the Age of Services

Our first lecture was on September 11th by Stuart Feldman

from IBM.

Dr. Feldman is Vice President,Computer Science, at IBMResearch, President of theACM, and the author of theoriginal UNIX “make” utility.He addressed the economicimportance of services, whichhas been rising significantly, in large part because of theexciting progress in recentyears in our ability to model,implement, and use servicescomputationally. Dr. Feldmancovered the technologicalvision and practical challengesunderlying services-orientedapplications and infrastructures.He surveyed the research issuesthat need to be addressed,ranging from basic questionsof algorithms, models, anddistributed systems to verypractical issues of deploymentand industrial practice. Servicesmodels provide a higher levelof abstraction, one requiringnot just implementation andcontrol but also purpose andmanagement.

CS@CU SPRING 2007 9

Human Computing: The

Rewards of Computer

Science in Everyday Life

Elizabeth Mynatt ofGeorgia Tech visited us on March 7th.

Professor Mynatt is associateprofessor in the College ofComputing and director of the Graphics, Visualization and Usability Center (GVU)at Georgia Tech. Her talkaddressed how visionsdescribing the pervasive useof computing technologies bythe general population oftenfail to explain how these tech-nologies made the transitionfrom the research lab to thehome. Prof. Mynatt provided a fascinating glimpse into the complex cycle of humanassimilation, learning, andadaptation that is often requiredbefore a novel technologybecomes an integral part of oureveryday lives. She presentedthree case studies in whichthese processes are visibletoday: consumer medical tech-nologies for chronic healthcare,gaming technologies and theirpotential path into the work-place, and home networking.

Reflections on the VLSI

Design Revolution

Lynn Conway, one of thepioneers of VSLI design,was here on March 27 to give a talk that wasalso the Department ofElectrical EngineeringArmstrong MemorialLecture.

Professor Conway is ProfessorEmerita at University ofMichigan, the inventor ofdynamic instruction scheduling,and a recipient of the PenderAward of the Moore Schooland the Wetherill Medal of theFranklin Institute. She provideda fascinating retrospective onthe VLSI chip design revolutionthat swept through SiliconValley in the early 80s. Sheexplained the key factors thatled to that revolution, includingthe complexities of existing chipdesign practices, the opportu-nities for simplification, theopportunities for performanceimprovement via scaling of chipdimensions, the opportunityfor applying then-new personalcomputers as tools for chipdesign, and the opportunitiesfor exploiting the then-newInternet for design collabora-tions and quick turnaround chipprototyping. Professor Conwaythen talked about where weare now and speculated aboutwhat the future may bring.

Further details of the lecture series are available online at:http://www.cs.columbia.edu/lectures

Conceiving an Internet

for Tomorrow:

Design for Tussle

David Clark from MIT gave a lecture on October 18th.

Dr. Clark is Senior ResearchScientist at the MIT ComputerScience and ArtificialIntelligence Laboratory, pastchair of the Computer Scienceand Telecommunications Boardof the National Academies, andthe chief protocol architect in the development of theInternet from 1981-1989. Histalk emphasized that the futureof the Internet is not definedby new technology. Rather, hepointed out that the Internet oftoday is firmly embedded in alarger space of social, politicaland economic considerations,and it is the understanding andmanipulation of these consider-ations that will allow progresstoward a better network fortomorrow. For example, ourcurrent security problems arenot just a technical failure, buta consequence of decisionswe made about the balance of identity and anonymity forInternet users. He offeredsome specific requirementsthat an Internet ten years fromnow should meet, and somedesign principles for tomorrow:how to design a mechanismthat tries to shape the largercontext in which the Internetmust function.

Virtual Cinematography:

Postproduction Control

of Viewpoint

and Illumination

Paul Debevec, researchassociate professor at theUniversity of SouthernCalifornia, executive producer of graphicsresearch at the USCCenters for CreativeTechnologies, and thefirst recipient of ACMSIGGRAPH’s SignificantNew Researcher Award,visited on November 20th.

Professor Debevec’s talkexplained some of the technol-ogy behind modern filmmaking,in which actors filmed in thestudio are placed into distantor imagined locations, andcomputer-generated creaturesand characters are added inalongside the main cast. Thecentral challenge in combiningthe real and the rendered ismaking it look like everythingwas shot at the same time withthe same camera. ProfessorDebevec explained his work ondeveloping novel ways to filmactors’ performances so thattheir viewpoint and their illumi-nation can be crafted virtuallyas part of the postproductionprocess. He illustrated his talkwith images from films such asSpider-Man 2, in which digitalactors were created withdevices his group developed.

10 CS@CU SPRING 2007

Other News

Professor Peter Belhumeur

offered COMS 6998-6,Computational Photography,in the Spring 2007 term. Inrecent years, the fields ofcomputer graphics, computervision and photography haveconverged to give rise to anew and very active area of research–ComputationalPhotography. The goal in thiswork is to redefine the cameraby using computational tech-niques to produce a new levelof images and visual represen-tations. The course is a seminar,offered to all students withknowledge in any of the threecore areas: computer vision,computer graphics, or photog-raphy. Topics covered includeHDR Imaging; FeatureMatching; Image Mosaics,Image Stitching, andDynamosaics; Image-BasedRendering, EnvironmentMatting and Compositing;Image Refocusing; MotionMagnification; RemovingCamera Shake; Camera Lens Arrays; Bluescreening;Programmable Lighting;Computational FlashPhotography; Light Fields;Photo Pop-Up; SchematicStoryboarding; Face Detection;Face Modeling; TextureSynthesis; Video Textures; View Synthesis; MotionEstimation and Warping; Single and Multi-ViewGeometry; and Photo Tourism.

Professor Eitan Grinspun

taught COMS 4995-1, Discrete

Differential Geometry: An

Applied Introduction in the Spring 2007 semester.Differential geometry is thestudy of curves and surfaces. Itgives us the language and struc-ture to describe the behavior ofphysical systems (with applica-tions to physical simulation) andto formalize the functionality ofgeometry-processing tools (withapplications to remeshing, para-meterization, smoothing, andcompression). Researchers in avariety of areas have discoveredthat theories, which are discretefrom the start, and have keygeometric properties built intotheir discrete description, canoften more readily yield robustnumerical simulations and mod-eling tools which are true to theunderlying continuous systems.By learning about such discretetheories we can create morerealistic and robust physical simulations, and more intuitiveand robust geometric modelingtools. This course introduces thestudent to the ideas of discretedifferential geometry, geometricmodeling, and physical simula-tion. The student learns aboutcontinuous and discrete differ-ential geometric operators, andapplies this knowledge to buildtheir own mesh smoother (usingdiscrete mean and Gaussiancurvature), cloth- and thin-shellsimulator (using a discreteShape Operator), remesher(using a discrete Laplacian), and fluid flow visualizer (usingDiscrete Exterior Calculus andHodge Decomposition).

New Computer Science Courses

Life inside a computational camera.

Discrete Differential Geometry is an emergingfield with exciting applications in geometryprocessing and physical simulation.

CS@CU SPRING 2007 11

Computer Science 6998-4,Learning and Empirical

Inference, was offered in theSpring 2007 term. The coursewas taught by Irina Rish andGerry Tesauro with lectures by Tony Jebara and Vladimir

Vapnik. The course coversadvanced topics for studentsinterested in research inmachine learning. These includeEmpirical Inference Science;Transduction; Universum; SVM+;Semidefinite Programming;Improved VC Bounds; EllipsoidMachines; Fisher Discriminant;Metric Learning; NearestComponents Analysis; XingClustering with Side Information;Supervised DimensionalityReduction; Metric Learning & Regression; Probabilisticconnections to distances andBregman divergences; Bayespoint machines; E-Family PCA;Sparse optimization; SparsePCA; Feature Selection; MatrixFactorization; l1-norm SVMs;Structured prediction; MaxMargin Markov Nets; ConditionalRandom Fields; ReinforcementLearning; Active learning;Model Selection, Occam’sRazor; BIC; AIC; InformationBottleneck Online Learning;Boosting; Bounds; Sampling;Invariance; and Groups.

Professor Itsik Pe’er taughtCOMS 4995-2, Seminar in

Computational Genomics, inthe Spring 2007 semester. Thiscourse is intended to introducestudents of both computationaland bio-medical skill sets tocurrent quantitative understand-ing of mammalian genomicsand prepare them to computa-tional research in the field. Thecourse is interdisciplinary innature, aiming at a broad scopeof topics, including sequencingof the human genome, verte-brate genomes, highlightingparts of the genome, sequenceconservation, sequence varia-tion, structural mutations, primate evolution. The compu-tational toolbox discussedincludes parameter inference,likelihood analysis, hiddenMarkov and other graphicalmodels, approximate stringmatching and other algorithms.

Professor

Dragomir

Radev (CUCSPhD ‘99, visit-ing from theUniversity of Michigan)offeredCOMS 6998,Search Engine

Technology, in the Spring 2007term. A significant portion of theinformation that surrounds us isin textual format. A number oftechniques for accessing suchinformation exist, ranging fromdatabases to natural languageprocessing. Some of the mostprestigious companies thesedays spend large amounts ofmoney to build intelligent searchengines that allow casual usersto find what they want anytime,from anywhere, and in any language. This course coversthe theory and practice behindthe implementation of searchengines, focusing on a widerange of topics including meth-ods for text storage and retrieval,the structure of the Web as agraph, evaluation of systems,and user interfaces. A significantfeature of the course is anopen-ended research project inone of two categories:

• Research paper–using theSIGIR format. Students are incharge of problem formulation,literature survey, hypothesisformulation, experimentaldesign, implementation, andpossibly submission to a con-ference like SIGIR or WWW.

• Software system – develop a working, useful system(including an API). Studentsare responsible for identifyinga niche problem, implement-ing it and deploying it, eitheron the Web or as an open-source downloadable tool.

Professor Kenneth Ross

taught COMS 6998-2, High-

performance Software for

Modern Processors, in theSpring 2007 semester. Modernprocessors have substantiallydifferent performance charac-teristics from those availableeven a decade ago. While traditional performance issuessuch as parallelism remainimportant, performance nowdepends much more on subtlearchitectural features such ascache behavior and branchmisprediction effects. Thiscourse is about how to effec-tively utilize modern hardwarewhen writing software. Theemphasis is on getting highperformance (usually meas-ured as speed) for compute-intensive or data-intensiveworkloads. At the end of thiscourse, students have learnedabout (and experienced) howto code algorithms so thatthey run efficiently on modernhardware platforms.

Visiting ProfessorDragomir Radev

Three visualization algorithms showing a network of face images in a 2D visualization. Each imagecontains hundreds of pixels and dimensions but only two dimensions can be preserved. On the top,the plot just lays out the images according to two highest variability dimensions. In the middle, thenetwork is stretched out before finding the top two dimensions. On the bottom (the most faithfulvisualization) the plot stretches the network of images in two dimensions while also squashing awaythe remaining unseen dimensions.


Professor Stephen Unger

The following piece by ProfessorStephen Unger was printed in theJournal News on October 26, 2006. A longer version is available athttp://www1.cs.columbia.edu/~unger/articles/e-voting1-11-07.html

With the 2008 presidential election on the horizon, manycomputer engineers and scien-tists fear that electronic votingmachines might facilitate votingfraud on an unprecedented scale.

Election fraud has alwaysexisted, regardless of the tech-nology used. Both U.S. politicalparties have used such simpletactics as ballot box stuffing,ballot destruction, buyingvotes, and fraudulent counting,which, for conventional votingsystems, are labor intensiveand hard to conceal.

E-voting eliminates none ofthese “traditional” modes ofcheating, while making possibleentirely new corruption methodsthat are very hard to detectbecause detailed digital systemoperations are invisible tohuman observers.

E-voting systems have alreadycrashed on election days.When there is Internet access,there are real dangers of hackerbreak-ins.

Still more serious is the possi-bility of systems with concealedfeatures deliberately designed tofalsify results by switching votesfrom one candidate to another.Engineers are accustomed tolooking for program bugs andinadvertent design errors, and to testing systems for faulty circuit elements. Finding hidden features in a complex program,when even the comments maybe deliberately misleading, is a very different, dauntingproblem, akin to detecting the tricks of stage magicians.Programs can be written thatrespond properly to all normallyexpected inputs during testing,but which change their behaviorcompletely after some specialinput signal is entered.

Malware might be insertedduring the election processby wireless transmissions, orunder the guise of spelling corrections, or fixing “minor”program bugs. This code mightbecome active only at certaintimes during the electionprocess and then delete itself,leaving no trace behind.

Given the complexity of modernintegrated circuit technology,with tens of millions of transis-tors on a chip, ensuring thatthe hardware side is cleanwould be at least as difficult.

Another corruption technique isa “denial of service attack.” Inprecincts where the cheater’sopponent is expected to receivea large majority of the votes, a substantial subset of themachines can be programmed to

break down during the election,causing many voters to leavewithout casting ballots.

A proposed antidote to elec-tronic manipulation is paperballots. Voters, after verifyingthat the on-screen votes matchtheir intentions, press a keycausing the screen image to beprinted on a paper ballot thatthey can see (but not touch). If the printed ballot is correct,they press another key, causingthe ballot to be dropped into aslotted ballot box in full view ofthe voters and all observers.The system reports the talliesas computed from data corre-sponding to voter approvedscreen images or from opticalscans of the paper ballots.Disputes over the validity of thefinal numbers can, in theory,be resolved by manual recountof the paper ballots, which arecarefully preserved.

However, experiments indicatethat most voters won’t noticediscrepancies between screenand paper ballots. Suppose a machine is programmed tochange 10 percent of the votescast for X to votes for Y bothelectronically and on the printed

ballot, and that this is noticedby voters (and corrected) in 50 percent of the cases. Then5 percent of X’s votes wouldbe recorded both electronicallyand on the paper ballots asvotes for Y.

Post-election manual recountingof paper ballots and paralleltesting (election day tests on randomly selected votingmachines) have been proposedas safeguards. But electionlaws and the way they areadministered in the variousstates offer little hope that thesecould generally be implementedreliably enough to thwart corruptE-voting systems.

Every voting system is vulnera-ble to fraud if it is not open topublic inspection or if represen-tatives of concerned parties arenot sufficiently vigilant at everystage. Ensuring that E-votingsystems are not cheating maybe impossible in the real worldeven with back-up paper ballots.

Having spent a lifetime workingon the kind of technology under-lying E-voting systems, I findmyself in the peculiar positionof advocating use of the mostprimitive type voting system:manually marked, manuallycounted, paper ballots. Thissimple, inexpensive techniquehas withstood the test of time,functioning reliably and scalingwell for electorates of all sizes. It can be operated and moni-tored effectively by ordinarycitizens, and is widely used(e.g., in New Hampshire, partsof several other states, and inmost other countries, includingCanada, France, Germany, andSweden). E-voting systemsconfer no benefits justifyingtheir great risks.

E-voting still vulnerable to fraud Featured Article


Adam Aviv, an undergraduatestudent in the ComputerScience Department, wasselected for Honorable Mentionin the Computing ResearchAssociation’s (CRA) OutstandingUndergraduate Award for 2007.

Professor Peter Allen is amember of a multi-institutionteam that has received a $5.3M NIH BiomedicalResearch Partnership grant forCortical Control of a DextrousProsthetic Hand. The teamincludes researchers from Pitt,Minnesota, CMU, Arizona Stateand Columbia. The goal of thisproject is to build and demon-strate an anthropomorphicprosthetic arm and hand that is controlled by cortical output.

Professor Steven Bellovin

received the 2007 NationalComputer Security Award of the National Institute ofStandards and Technology andthe National Security Agency.This prestigious honor, firstawarded in 1988, recognizesindividuals for scientific ortechnological breakthroughs,outstanding leadership, highlydistinguished authorship or sig-nificant long-term contributionsin the computer security field.The award was presented in a ceremony during the 22ndAnnual Computer SecurityApplications Conference(ACSAC) in Miami Beach,Florida, in December 2006.

Professor Luca Carloni wonan NSF Faculty Early CareerDevelopment (CAREER) award todevelop a new communication-based design methodology fordistributed embedded systems.The grant is titled “IntegratingControl, Computation, andCommunication– A DesignAutomation Flow for DistributedEmbedded Systems”.

Seung Geol Choi, a PhD student in the Department of Computer Science, won a best student-paper award at the International Workshopon Security (IWSEC’06). Thepaper was co-authored withKunsoo Park (Seoul National

University) and Moti Yung

(RSA and Columbia University).The paper is titled “ShortTraceable Signatures Based on Bilinear Pairings”.

Professors Steven Feiner andKathleen McKeown and theirstudents are working in collabo-ration with Professor DesmondJordan (Anesthesiology andBiomedical Informatics) and Dr.

Michelle Zhou (IBM, PhD ‘99)on a charter project under thenew IBM Open CollaborativeResearch program announced inDecember. They are developingtools to support physicians increating, editing, and reviewingnotes about patients. Notesare the primary source of infor-mation about patient status forcaregivers, but are time-con-suming to write and difficult toreview quickly. To address this,the research emphasizes intel-ligent, ubiquitous informationgathering; automated genera-tion of notes; and on-demandsynthesis of customized multi-media presentations of patientstatus. The team is using theNew York Presbyterian HospitalCardiothoracic Intensive CareUnit as a living laboratory inwhich to develop, demonstrate,and evaluate new technologies.The project leverages theresearchers’ collective expertisein human-computer interaction,natural language processing,computer graphics and visuali-zation, cardiac anesthesiology,critical care medicine, andmachine learning. Team members have been workingtogether for over ten yearsbuilding and testing experimentalmultimedia systems for care-givers in the CardiothoracicIntensive Care Unit.

Professor Steven Feiner gavethe keynote address at AUIC2007 (the Australasian UserInterface Conference) on directions for user interfaceresearch. He is also serving on the DARPA IXO ImmersiveOperations panel, which isexploring how effective systemscould be created to communi-cate real-time data to personnel

at all levels of command toenable quick and accurate decision-making. The goal of the panel is to suggest newprogram ideas for DARPA IXOto pursue. In addition, ProfessorFeiner was associate paperschair for ACM CHI 2007 andarea chair for the 2006 IEEE andACM International Symposiumon Mixed and AugmentedReality, and was featured in a TechNewsWorld story onAugmented Reality (AR).

Agustin Gravano, Christian

Daniel Murphy, Joshua Reich,Cristian Petrus Soviani,Stanley Tzeng, and Ilho Ye

were named as “DistinguishedTAs” for the Spring 2006semester. Congratulations toour outstanding TAs!

Associate Research ScientistNizar Habash of CCLS (theCenter for ComputationalLearning Systems) was men-tioned in a Wired piece onmachine translation.

Professor Tony Jebara

received an ONR award titled“Learning to Match Data fromHeterogeneous Databases”. The research proposal exploresmatching, b-matching and permutation within statisticallearning, for applications includ-ing constraining social networksusing graphs and b-matchings,visualizing large social net-works, minimum volumeembedding and merging socialnetworks across heterogeneousdatabases. The applications willbe explored via several novelalgorithms and will scientificallyadvance the areas of b-matching,permutation, metric learning,structured prediction, invarianceand graph embedding.

Professors Angelos Keromytis

and Sal Stolfo won a GoogleResearch Award to study “SafeBrowsing Through Web-basedApplication Communities”.Application Communities is anew paradigm for protectingsoftware systems. Communitymembers running independentinstances of the same applicationwill continuously exchange infor-

Department News & Awards


mation that allows them to col-lectively identify new faults andattacks (collaborative monitoring),and to automatically develop,test and apply fixes (heal).

PhD student Christian D.

Murphy has garnered thehighest TA honor in the School,the designation of “super TA.’’This designation is bestowedon TAs who have received theExtraordinary TA Award threetimes in the last four semesters(Fall ‘04–Spring ‘06). Therewere only three such super TAsdesignated in the whole School.Congratulations and thanks toChris for his great teaching!

Professor Steven Nowick

was awarded an ISE Grant(Initiatives in Science andEngineering) from the Office of

the Executive Vice Presidentfor Research. The funding is forinnovative proposals in earlystages of development, with a special interest in cross-disciplinary work. The proposalis “Designing a Flexible High-Throughput AsynchronousInterconnect Fabric for FutureSingle-Chip Parallel Processors.”The goal is to design a high-throughput, flexible and low-power digital fabric for futuredesktop parallel processors,e.g. those with 64+ proces-sors per chip. This work is incollaboration with the parallelprocessing and CAD groups atthe University of Maryland.

Professor Rocco Servedio wasawarded a grant to participate inthe DARPA Computer ScienceStudy Panel. The objective of

the Computer Science StudyPanel is to rapidly identifyideas in the field of computerscience that will provide revo-lutionary advances, rather thanincremental benefit, to theDepartment of Defense. Areasof special interest include pattern recognition, computervision, probabilistic reasoning,biologically inspired exploitation,abnormal behavior analysis,cognitive psychology, machinelearning, and other advanceddisciplines in computer science.

Professor Joseph Traub

organized the 20th AnniversarySymposium of the ComputerScience and TelecommunicationsBoard. The symposium wasthe subject of substantialmedia coverage, including theNew York Times.

Department News & Awards (continued)

Ely Labovitz has been promotedto chief technology officer forAccuro Healthcare Solutions, Inc.(Accuro), an industry leader inrevenue management solutions.Ely holds a bachelor’s degree incomputer science from YeshivaUniversity in New York, and hascompleted master’s courses in computer science fromColumbia with concentrationsin algorithm analysis, objectoriented development, andcomputer architecture.

The shareholders of CHAMCOAUTO (China AmericaCooperative Automotive, Inc.)elected William Pollack asChairman. William holds a BAin Mathematics and ComputerScience and an MS in ComputerScience from Columbia.

Naomi Novik, former CSdepartment instructor and MSstudent of Professor Jason Nieh,was the subject of a New YorkTimes Arts Section profile on theoccasion of selling film rightsof her three fantasy novels to Peter Jackson, the directorof the “Lord of the Rings”movies. (See cover article.)

Daniel Yellin (PhD ‘87) hasbecome Director of the IBMSoftware Group Lab in Israel.Daniel is an IBM DistinguishedEngineer and has been workingat IBM since graduating fromColumbia.

Erez Zadok (PhD ‘01) was promoted to the position ofAssociate Professor withtenure at SUNY Stony Brook.

Alumni News


Recent & Upcoming PhD Defenses Hrvoje Benko Advisor: Steven Feiner User Interactionin Hybrid Multi-DisplayEnvironments

Abstract: In recent years, thetypical computer workspace hasbeen experiencing a very signifi-cant transformation, from a singledesktop computer with a singleattached display, to a multi-display environment (MDE) withmultiple connected devices. Thisdissertation presents the design,implementation, and evaluation ofnovel pointer-based and gesture-based interaction techniques for MDEs. These techniques tran-scend the constraints of a singledisplay, allowing users to com-bine multiple displays and inter-action devices in order to benefitfrom the advantages of each. First, we introduce a set of Multi-Monitor Mouse techniques, whichimprove existing mouse pointerinteraction in an MDE, by allowingusers to instantaneously relocate(warp) the cursor to an adjacentdisplay, instead of traversing thebezel. Formal evaluations showsignificant improvements in userperformance when compared tostandard mouse behavior. Next, we focus on a particular kindof MDE, in which 3D head-wornaugmented reality displays arecombined with handheld and sta-tionary displays to form a hybridMDE. A key benefit of hybrid MDEsis that they integrate and utilize the3D space in which the 2D displaysare embedded, creating a seam-less visualization environment.We describe the development ofa complex hybrid MDE system,called Virtual Interaction Tool forArchaeology (VITA), which allowsfor collaborative off-site analysisof archaeological excavation databy distributing the presentation ofdata among several head-worn,handheld, projected tabletop, andlarge high-resolution displays. Finally, inspired by the lack of aninteraction vocabulary for hybridMDEs, we designed three sets ofgestural techniques that address

the important issues of data tran-sitions across 2D and 3D displaysand precise interactions withinthose displays. First, we presentCross-Dimensional Gestures,which facilitate transitioning thedata between devices, displays,and dimensionalities, by synchro-nizing the recognition of gesturesbetween a 2D multi-touch-sensi-tive projected display and atracked 3D finger-bend sensorglove. Second, we describe a setof Dual Finger Selection tech-niques that allow for precise andaccurate selection of small targetswithin 2D displays, by exploitingthe multi-touch capabilities of atabletop surface. Third, we presentthe Balloon Selection technique,which was found to be three timesmore accurate than standardwand-based selection whenselecting small 3D objects above atabletop surface. Balloon Selectiondecouples the 3DOF selectiontask into a 2DOF task and a 1DOFtask, while grounding the user’shands on a tabletop surface.

Sasha Blair-GoldensohnAdvisor: Kathy McKeown Long-AnswerQuestionAnswering and

Rhetorical-Semantic Relations Abstract: Over the past decade,Question Answering (QA) hasgenerated considerable interestand participation in the fields ofNatural Language Processingand Information Retrieval.Conferences such as TREC, CLEFand DUC have examined variousaspects of the QA task in theacademic community. In thecommercial world, major searchengines from Google, Microsoftand Yahoo have integrated basicQA capabilities into their coreweb search. These efforts have focused largelyon so-called “factoid” questionsseeking a single fact, such as thebirthdate of an individual or thecapital city of a country. Yet in thepast few years, there has beengrowing recognition of a broad

class of “long-answer” questionswhich cannot be satisfactorilyanswered in this framework, suchas those seeking a definition,explanation, or other descriptiveinformation in response. In thisthesis, we consider the problemof answering such questions, withparticular focus on the contribu-tion to be made by integratingrhetorical and semantic models. We present DefScriber, a systemfor answering definitional (“Whatis X?”), biographical (“Who is X?”)and other long-answer questionsusing a hybrid of goal- and data-driven methods. Our goal-driven,or top-down, approach is moti-vated by a set of “definitionalpredicates” which capture infor-mation types commonly useful in definitions; our data-driven, or bottom-up, approach usesdynamic analysis of input data toguide answer content. In severalevaluations, we demonstrate thatDefScriber outperforms competi-tive summarization techniques,and ranks among the top long-answer QA systems being devel-oped by others. Motivated by our experiencewith definitional predicates inDefScriber, we pursue a set ofexperiments which automaticallyacquire broad-coverage lexicalmodels of “rhetorical-semanticrelations” (RSRs) such as Cause and Contrast. Building on the framework of Marcu andEchihabi (2002), we implementtechniques to improve the qualityof these models using syntacticfiltering and topic segmentation,and present evaluation resultsshowing that these methods canimprove the accuracy of relationclassification. Lastly, we implement twoapproaches for applying theknowledge in our RSR models toenhance the performance andscope of DefScriber. First, weintegrate RSR models into theanswer-building process inDefScriber, finding incrementalimprovements with respect to thecontent and ordering of respons-es. Second, we use our RSRmodels to help identify relevantanswer material for an exploratory

class of “relation-focused” questions which seek explanatoryor comparative responses. Wedemonstrate that in the case of explanation questions, usingRSRs can lead to significantlymore relevant responses.

Hoon ChangAdvisors: Vishal Misra andDan Rubinstein AnalyticalModel andFairness

Scheduling of CSMA/CA inPhysical Layer CapturingAbstract: While physical layercapture has been observed in realimplementations of 802.11 devices,there is a lack of accurate modelsthat describe the behavior of thephenomenon and deep investiga-tion for fair allocation. We firstpresent a general analyticalmodel and an iterative methodthat predicts error probabilitiesand throughputs of packet trans-missions with multiple sender-receiver pairs. In our model,MAC protocol is considered afeedback controlling system andthe least square method is usedto model the rate adjustment of 802.11 DCF protocol. For theinteraction and interferenceamong nodes, we present aniterative method to offer a moreaccurate prediction than previouswork by taking into account the cumulative strength of inter-ference signals and using theBER model to convert a signal to interference and noise ratiovalue to a bit error probability. Thispermits the analysis of packetreception at any transmissionrate with interference fromneighbors at any set of locations.We prove that our iterativemethod converges and verify theaccuracy of our model throughsimulations in Qualnet. Second, we present log-utilityfair scheduling in physical layercapturing. Having been knownwidely as a less egalitarianapproach to max-min fairness,log-utility fair schedulingachieves a reasonable total


Recent & Upcoming PhD Defenses (continued)throughput as well as acceptablefairness. We investigate theeffect of log-utility fairness inphysical capturing and presenttheorems regarding convex feasible allocation space in two-sender cases. Based on our analytical model, we also pro-pose a decentralized log-utilityfair allocation algorithm. Theextension of our iteration methodand analytical model yields analgorithm that returns accessrates of all senders within a fea-sible running time, linear in thenumber of senders. For inputs tothe algorithm, we present a pro-tocol to gather neighbors’ errorstatistics. We show log-utility fairallocations through simulations. We also perform real experimentsto verify our analytical model.With the laboratory trial networktestbed, Orbit, we have developedmeasurement software for errorprobabilities and throughputs. Theaccuracy of our analytical modelis verified with measured resultsin the field. Our fairness schedul-ing algorithm is implementedusing PC and laptop nodes. Toextend our algorithm, we considerquality-of-service constraintssuch as requirements of maximumacceptable error probability andminimum throughput. Assumingonly the information of one-hopneighbors is available, we developan extended distributed algorithmto meet given constraints.Another research topic is channelallocation. Considering channelallocation and access rateassignment together, we maximizethe log utilities of nodes.

Michel GalleyAdvisor: Kathy McKeown Pragmatic and SyntacticModels forProbabilistic

Speech Summarizers Automatic speech summarizationcan reduce the overhead of navigating through long streamsof speech data by generatingconcise and readable textualsummaries that capture the

“aboutness” of speech recordings,such as meetings, talks, and lec-tures. In this thesis, we addressthe problem of summarizingmeeting recordings, a task thatfaces many challenges not foundwith written texts and preparedspeech. Informal style, speecherrors, presence of many speak-ers, and apparent lack of coherentorganization mean that the typicalapproaches used for text sum-marization must be extended foruse with conversational speech.We illustrate how techniquesthat exploit automatically derivedpragmatic and syntactic structurescan help create summaries thatare more on-target, coherent,and readable than those pro-duced by current state-of-the-artapproaches. We present the discriminativelearning of graphical models forsummary sentence selection,evaluate different model struc-tures, and discuss their impact onthe coherence of generated sum-maries. Significant findings in thiswork show that dependenciesbetween pragmatically relatedsentences, such as between a Question and its Answer, or an Offer and its Acceptance, are instrumental in determiningwhich sentences should beincluded in the summary. Thesedependencies also help generat-ing more coherent summaries,i.e., they penalize the inclusion ofa Question if none of its Answersare included as well. We discussand address some computationalintricacies and inference prob-lems specific to speech withmany participants, which oftenincorporate pragmatic dependen-cies between sentences that arefar apart in the speech stream. We also investigate the problemof rendering errorful and disfluentmeeting utterances into concisesentences that are maximallyreadable. We present a trainablesyntax-directed sentence com-pression system that automaticallyremoves information-poor phrases(such as “you know” or “forsome reason”) from the syntaxtree of each input sentence. Anovel aspect of this sentence

compression approach lies in itsMarkovization of syntactic com-pression rules, a process thatfactorizes these rules into sub-structures more amenable forprobabilistic estimation. Ourdecompositions not only facilitatesupervised learning under severesparseness conditions, but alsoenable the incorporation of manyadditional features, such as lexicaldependencies and tree internalannotation, which are critical foraccurately determining whethera given syntactic phrase is gram-matically optional or mandatory.This sentence compression tech-nique was effectively applied toboth speech and written texts.

Edward Ishak Advisor: Steven Feiner Content-AwareInteraction inUser Interfaces Whether inter-

acting with a mobile phone or awall-sized display, users oftenencounter screen space limita-tions that prevent the simultane-ous display of all visual objectsrequired to complete a task.Depending on the individual userand their current task, screenspace limitations result from sev-eral factors, including insufficientphysical space and inadequatepixel resolution. A number oftechniques have been created toaddress these limitations, includ-ing overlapping semi-transparentwindows, scrolling, and layoutmanagement. However, thesetechniques are typically designedto be generic and do not considerthe properties of the content towhich they are applied. In thisthesis, we present an approach tomaking user interfaces content-aware by augmenting existinginteraction techniques. By con-tent-aware, we mean that theytake into account various physicaland semantic attributes of thecontent, such as size, locationand type. We designed, implemented, andevaluated content-aware versionsof three existing user interface

techniques: content-aware trans-parency, content-aware scrolling,and content-aware layout.Content-aware transparencyapplied to overlapping windowsmakes it possible for users tointeract with otherwise hiddencontent by rendering importantregions of windows opaque, andunimportant regions transparent,thus keeping overlaid contentslegible and distinguishable at all times. Furthermore, based onproperties of the overlappingmaterial, appropriate image-processing filters are applied toobstructed content to help disam-biguate the overlapping content.Content-aware scrolling allows auser to scroll along a system- oruser-defined path within a docu-ment using conventional scrollinginteractions, varying scrollingspeed depending on content loca-tion. Content-aware layout laysout windows containing contentrelevant to that of the currentlyfocused screen area, positionedrelative to that area. We presentquantitative user performancedata gathered from formal experi-ments, as well as qualitative ques-tionnaire feedback to show thatinteraction with content-awaretechniques can provide an effec-tive advantage over techniquesthat are not content-aware.

Risi Kondor Advisor: Tony Jebara Group theoreticalmethods in machinelearning

Abstract: If mathematics is aboutcapturing structure in the worldaround us, then arguably there isno greater success story in theentire subject than the theory ofgroups and their representations.Having started out as a devicefor manipulating the roots ofpolynomial equations, groupshave proved to be so fundamentalto understanding the quantumworld that to modern physicsthey are as indispensable as thereal numbers.


Learning problems do not usuallyinvolve quantum particles ofcourse, but they do confront non-trivial symmetries and harmonicanalysis on various discrete andcontinuous spaces. In this regardit is surprising that up to now the connections to group theory,and algebra in general, have sofar remained largely unexploited.Beyond painting a unifying theoretical picture, algebra alsohas algorithms to offer: recentyears in particular have seen thedevelopment of a number of newefficient algorithms. This thesis explores the connec-tions between group theory andlearning from different angles,directed at different applications.One overarching theme however,is the extension of regularizationtheory to groups, allowing us todeploy SVMs and other familiarstatistical learning algorithms inthis algebraic domain. A numberof canonical theorems regardingpositive definite functions ongroups and their interpretationsare provided. The theoretical results find appli-cation in multi-object tracking,where a time varying distributionover permutations has to bemaintained. Representation theo-ry not only tells us how to evolvethis distribution, but also tells ushow to represent it compactly,and the modern theory of non-commutative FFTs provides algorithms for efficient updates. Another broad area for applica-tions is capturing symmetries,such as the rotation and transla-tion invariance of natural images.I describe a fundamentally newapproach to generating rotationand translation invariant featuresbased on combining ideas fromrepresentation theory with some classical tools from signalprocessing. Finally, I investigate how thestructure of the symmetric groupis reflected in ranking problemsand their many variations, andhow this can be exploited to buildmore principled and more power-ful ranking systems.

Janak Parekh Advisor: Gail Kaiser Privacy-PreservingDistributedEvent Correlation

Abstract: Event correlation is awidely-used data processingmethodology for a broad varietyof applications, and is especiallyuseful in the context of distributedmonitoring for software faults andvulnerabilities. However, mostexisting solutions have typicallybeen focused on “intra-organiza-tional”correlation; organizationstypically employ privacy policiesthat prohibit the exchange ofinformation outside of the organi-zation. At the same time, thepromise of “inter-organizational”correlation is significant giventhe broad availability of Internet-scale communications, and itspotential role in both softwarefault maintenance and softwarevulnerability detection. In this thesis, I present a frame-work for reconciling these oppos-ing forces via the use of privacypreservation integrated into theevent processing framework. Iintroduce the notion of event cor-roboration, a reduced yet flexibleform of correlation that enablescollaborative verification, withoutrevealing sensitive information. Byaccommodating privacy policies,we enable the corroboration ofdata across different organiza-tions without actually releasingsensitive information. The frame-work supports both sourceanonymity and data privacy, yetallows for temporal corroborationof a broad variety of data. Theframework is designed as a light-weight collection of componentsto enable integration with existingCOTS platforms and distributedsystems. I also present an imple-mentation of this framework:Worminator, a collaborativeIntrusion Detection System, basedon an earlier platform, XUES(XML Universal Event Service),an event processor used as partof a software monitoring platformcalled KX (Kinesthetics eXtreme).

KX comprised a series of compo-nents, connected together with apublish-subscribe content-basedrouting event subsystem, for theautonomic software monitoring,reconfiguration, and repair ofcomplex distributed systems.Sensors were installed in legacysystems; XUES’ two modules thenperformed event processing onsensor data: information was col-lected and processed by the EventPackager, and correlated usingthe Event Distiller. While XUESitself was not privacy-preserving,it laid the groundwork for thisthesis by supporting event typing,the use of publish-subscribe andextensibility support via pluggableevent transformation modules. I also describe techniques bywhich corroboration and privacypreservation could optionally be“retrofitted” onto XUES withoutbreaking the correlation applica-tions and scenarios described. Worminator is a ground-up rewriteof the XUES platform to fully sup-port privacy-preserving eventtypes and algorithms in the con-text of a Collaborative IntrusionDetection System (CIDS), wherebysensor alerts can be exchangedand corroborated without reveal-ing sensitive information about acontributor’s network, services,or even external sources, asrequired by privacy policies.Worminator also fully anonymizessource information, allowing con-tributors to decide their preferredlevel of information disclosure.Worminator is implemented as amonitoring framework on top of a collection of non-collaborativeCOTS and in-house IDS sensors,and demonstrably enables thedetection of not only worms butalso “broad and stealthy” scans;traditional single-network sensors either bury such scans in large volumes or miss thementirely. Worminator supportscorroboration for packet andflow headers (metadata), packetcontent, and even aggregatemodels of network traffic using avariety of techniques. The contributions of this thesisinclude the development of a

cross-application-domain eventprocessing framework withnative privacy-preserving types,the use and validation of privacy-preserving corroboration, and the establishment of a practicaldeployed collaborative securitysystem. The thesis also quantifiesWorminator’s effectiveness atattack detection, the overhead of privacy preservation and theeffectiveness of our approachagainst adversaries, be they“honest-but-curious” or activelymalicious.

AngelosStavrou Advisor: AngelosKeromytis An OverlayArchitecture for End-to-End

Service Availability Abstract: Perhaps one of themost compelling problems of theInternet today is the lack of anoverarching approach to dealingwith on-line service security andavailability: there exist a lot ofmechanisms but no “securityarchitecture”– no set of rules forhow these mechanisms should becombined to achieve overall goodsecurity. This thesis is aimed atintroducing and analyzing mech-anisms that boost the security,resilience and performance ofnetwork systems. These systemsare composed of large numbers ofuntrusted and unreliable compo-nents that communicate using theexisting network infrastructure. Towards this goal, we proposeand evaluate practical mecha-nisms that can protect a widerange of services while maintain-ing or even improving their per-formance characteristics. Our endgoal is to provide a practical end-to-end framework that significantlyimproves service availability andconnectivity without incurring a prohibitive deployment or performance cost. Ideally, ourapproach should be able to scaleto millions of users and accom-modate any applications’ require-ments including network latency


and throughput. We show ourprogression towards our end goalby presenting the advantages andlimits of the protection system wedeveloped: PROOFS, WebSOS,MOVE, and Spread-Spectrumwith Multi-path overlays.

Xiaotao Wu Advisor: HenningSchulzrinne UbiquitousProgrammableInternetTelephony

End System ServicesAbstract: In Internet telephony,endpoints usually have CPU andmemory, so they are programma-ble and can perform services suchas call forwarding, transfer, andscreening. In contrast, the tradi-tional Public-Switched TelephonyNetwork (PSTN) assumes dumbendpoints. In addition, peer-to-peer (P2P) technologies introducetelecommunication networks thatdo not need proxy or applicationservers to make calls. In such P2Pnetworks, many telecommunica-tion services have to be performedon end systems, such as intelligentphones. The enhanced capabilitiesof end systems and the servicerequirements in P2P networksmotivate the investigation of endsystem services in this thesis. Performing services in end sys-tems may result in many newcommunication services, maketelecommunication servicesmore distributed, and maketelecommunication networksmore robust and efficient overall.At the same time, telecommuni-cation services may becomemore difficult to manage, thusrequiring new techniques forcreating and composing servic-es. This dissertation presents myresearch on defining end systemservices, developing efficient anduser-friendly tools for creatingend system services, managingend system feature interactions,and integrating end system serv-ices with other Internet services,such as web, email, location-based services, and networkedappliance control.

I first analyzed the differencebetween end system services andnetwork services. The analysisshowed that they differ in callmodels, targeted service creators,and call control actions. Becauseof these differences, I defined a new scripting language calledthe Language for End SystemServices (LESS) specifically forend system service creation. LESSis designed to allow comparativelyinexperienced end users to createservices. End users may createconflicting services over time, so Ideveloped a method that is basedon the action conflict tables Idefined in LESS and a tree merg-ing algorithm to handle featureinteractions among LESS scripts.In addition, because end users areoften not aware of available serv-ices and do not know how to cre-ate services, automatically gen-erating their desired services canbe a great help to them. Therefore,I built a service learning systemthat uses the Incremental TreeInduction (ITI) algorithm to auto-matically create LESS scriptsbased on users’ communicationhistories. Furthermore, to evaluatethe usefulness of LESS for endusers, I conducted a survey onservice creation by end users. Thesurvey shows that relatively inex-perienced users are willing andable to create their desired serv-ices, and our LESS-based servicecreation tool, Columbia UniversityTelecommunication serviceEditor (CUTE), fits their needs. Beyond end system service cre-ation research, in this dissertationI also investigated location-based services in Internettelephony, specifically onInternet telephony end systems. I first analyze different location-based services in Internettelephony. Following the analysis,I introduce the implementation of the location-based services inour lab environment, as well asthe prototype implementation ofemergency call handling in SIP-based Internet telephony systems.Moreover, I also discuss our SIP-based global-scale ubiquitouscomputing architecture. Thearchitecture uses the ServiceLocation Protocol (SLP) to find

available resources based on location information, thenemploys the SIP third-party callcontrol architecture (3PCC) tocontrol the available resources. To test the hypotheses of myresearch, I have implemented aSIP user agent–SIPC. SIPC con-tains basic SIP functions as wellas an end system service execu-tion environment. It also supportsmany other Internet functions,such as service discovery, eventnotification, networked appliancecontrol, instant messaging, andmulticast media streaming basedon the Session AnnouncementProtocol (SAP). Multiple functionsintegrated in SIPC can interactwith each other to provide newservices. This dissertation discuss-es the new services introducedby multi-function integration andinteraction in detail.

Recent & Upcoming PhD Defenses (continued)


The reader who provides the wittiest accompanying caption for this photograph of Professors Steve Bellovinand Angelos Keromytis (bothexperts in computer security)will win– yes, it’s true – ayear’s free subscription to theCUCS newsletter.

Send your entry to:[email protected] look forward to reading all responses and publishingthe funniest ones!

Caption Contest

In what is sure to become an instant classic,the CUCS newsletter is delighted to kick off thevery first installment of our Caption Contest.

24447spr07 csnewsletter rev.qxd:24447spr07 csnewsletter rev · breathing dragon is enormously...

Documents