Transcript
Page 1: Assieme: Finding and Leveraging Implicit References in a Web Search Interface for Programmers

Assieme: Finding and Leveraging Implicit References in a Web Search Interface for Programmers

Raphael Hoffmann, James Fogarty, Daniel S. Weld

University of Washington, Seattle

UIST 2007

Page 2: Assieme: Finding and Leveraging Implicit References in a Web Search Interface for Programmers

Programmers Use Search

• To identify an API
• To seek information about an API
• To find examples on how to use an API

Example Task: “Programmatically output an Acrobat PDF file in Java.”

Page 3: Assieme: Finding and Leveraging Implicit References in a Web Search Interface for Programmers

Example: General Web Search Interface

Page 4: Assieme: Finding and Leveraging Implicit References in a Web Search Interface for Programmers

Example: Code-Specific Web Search Interface

Page 5: Assieme: Finding and Leveraging Implicit References in a Web Search Interface for Programmers

Problems

• Information is dispersed: tutorials, API itself, documentation, pages with samples

• Difficult and time-consuming to …
  – locate required pieces,
  – get an overview of alternatives,
  – judge relevance and quality of results,
  – understand dependencies.

• Many page visits required

Page 6: Assieme: Finding and Leveraging Implicit References in a Web Search Interface for Programmers

With Assieme we …

• Designed a new Web search interface
• Developed the needed inference

Page 7: Assieme: Finding and Leveraging Implicit References in a Web Search Interface for Programmers

Outline

• Motivation
• What Programmers Search For
• The Assieme Search Engine
  – Inferring Implicit References
  – Using Implicit References for Scoring
• Evaluation of Inference & User Study
• Discussion & Conclusion

Page 8: Assieme: Finding and Leveraging Implicit References in a Web Search Interface for Programmers

Six Learning Barriers Faced by Programmers (Ko et al. 04)

• Design barriers: What to do?
• Selection barriers: What to use?
• Coordination barriers: How to combine?
• Use barriers: How to use?
• Understanding barriers: What is wrong?
• Information barriers: How to check?

Page 9: Assieme: Finding and Leveraging Implicit References in a Web Search Interface for Programmers

Examining Programmer Web Queries

Objective
• See what programmers search for

Dataset
• 15 million queries and click-through data
• Random sample of MSN queries in 05/06

Procedure
• Extract query sessions containing ‘java’ (2,529)
• Manually inspect queries and define regex filters
• Informal taxonomy of query sessions

Page 10: Assieme: Finding and Leveraging Implicit References in a Web Search Interface for Programmers

Examining Programmer Web Queries

Page 11: Assieme: Finding and Leveraging Implicit References in a Web Search Interface for Programmers

Examining Programmer Web Queries

• Descriptive: 64.1 %, e.g. “java JSP current date” (Selection barrier)
• Contain package, type or member name: 35.9 %, e.g. “java SimpleDateFormat” (Use barrier)
• Contain terms like “example”, “using”, “sample code”: 17.9 %, e.g. “using currentdate in jsp” (Coordination barrier)

Page 12: Assieme: Finding and Leveraging Implicit References in a Web Search Interface for Programmers

Assieme

[Screenshot of the Assieme interface, with callouts:]

• example code
• documentation
• required libraries
• relevance indicated by # uses
• summaries show referenced types
• links to related info

Page 13: Assieme: Finding and Leveraging Implicit References in a Web Search Interface for Programmers

Challenges

How to put the right information on the interface?

• Get all programming-related data
• Interpret data and infer relationships

Page 14: Assieme: Finding and Leveraging Implicit References in a Web Search Interface for Programmers

Outline

• Motivation
• What Programmers Search For
• The Assieme Search Engine
  – Inferring Implicit References
  – Using Implicit References for Scoring
• Evaluation of Inference & User Study
• Discussion & Conclusion

Page 15: Assieme: Finding and Leveraging Implicit References in a Web Search Interface for Programmers

Assieme’s Data

… is crawled using existing search engines

• Pages with code examples (~2,360,000): queried Google on “java ±import ±class …”
• JAR files (~79,000): downloaded library files for all projects on Sun.com, Apache.org, Java.net, SourceForge.net
• JavaDoc pages (~480,000): queried Google on “overview-tree.html …”

Page 16: Assieme: Finding and Leveraging Implicit References in a Web Search Interface for Programmers

The Assieme Search Engine

… infers 2 kinds of implicit references

[Diagram: JAR files, JavaDoc pages, and pages with code examples, linked by two kinds of inferred references]

• Uses of packages, types and members
• Matches of packages, types and members

Page 17: Assieme: Finding and Leveraging Implicit References in a Web Search Interface for Programmers

Extracting Code Samples

Challenges:
• unclear segmentation
• code in a different language (C++)
• distracting terms ‘…’ in code
• line numbers

Page 18: Assieme: Finding and Leveraging Implicit References in a Web Search Interface for Programmers

Extracting Code Samples

• Remove HTML commands, but preserve line breaks
• Remove some distracters by heuristics
• Launch (error-tolerant) Java parser at every line break (separately parse for types, methods, and sequences of statements)

Original page:

<html><head><title></title></head><body>A simple example:<br><br> 1: import java.util.*; <br>2: class c {<br>3: HashMap m = new HashMap();<br>4: void f() { m.clear(); }<br>5: }<br><br><a href="index.html">back</a></body></html>

After removing HTML commands:

A simple example:

1: import java.util.*;
2: class c {
3: HashMap m = new HashMap();
4: void f() { m.clear(); }
5: }

back

After removing distracters:

A simple example:

import java.util.*;
class c {
HashMap m = new HashMap();
void f() { m.clear(); }
}

back
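
The first two extraction steps can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not Assieme's implementation: the class name `CodeSampleCleaner`, its method names, and the regular expressions are hypothetical, and the error-tolerant parsing step is not shown.

```java
import java.util.ArrayList;
import java.util.List;

public class CodeSampleCleaner {
    // Turn <br> and </p> into newlines so line structure survives,
    // then drop every remaining tag. (Assumed regexes, not from the talk.)
    static String stripHtml(String html) {
        String withBreaks = html.replaceAll("(?i)<br\\s*/?>|</p>", "\n");
        return withBreaks.replaceAll("<[^>]*>", "");
    }

    // Heuristic: remove a leading line number such as "12:" or "12 ".
    // It can misfire on text lines that start with a number.
    static String stripLineNumber(String line) {
        return line.replaceFirst("^\\s*\\d+[:.]?\\s+", "");
    }

    // Returns the non-empty lines of the page with tags and line numbers removed.
    static List<String> clean(String html) {
        List<String> lines = new ArrayList<>();
        for (String raw : stripHtml(html).split("\n")) {
            String line = stripLineNumber(raw).trim();
            if (!line.isEmpty()) {
                lines.add(line);
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        String html = "<html><body>A simple example:<br><br> 1: import java.util.*; <br>"
                + "2: class c {<br>3: HashMap m = new HashMap();<br>"
                + "4: void f() { m.clear(); }<br>5: }<br><br>"
                + "<a href=\"index.html\">back</a></body></html>";
        for (String line : clean(html)) {
            System.out.println(line);
        }
    }
}
```

A parser would then be started at each of the returned lines; lines such as "A simple example:" and "back" simply fail to parse and are discarded.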

Page 19: Assieme: Finding and Leveraging Implicit References in a Web Search Interface for Programmers

Resolving External Code References

Naïve approach of finding term matches does not work:

1 import java.util.*;
2 class c {
3 HashMap m = new HashMap();
4 void f() { m.clear(); }
5 }

Reference java.util.HashMap.clear() on line 4 only detectable by considering several lines

Use compiler to identify unresolved names
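
One way to let a compiler point at unresolved names is the standard `javax.tools` API, sketched below. This is an assumption about how such a step could look, not the system's actual code: `UnresolvedNameFinder`, `StringSource`, and the type `Widget` are hypothetical, and the sketch only reports the lines on which the compiler raised an error rather than extracting the names themselves.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import javax.tools.Diagnostic;
import javax.tools.DiagnosticCollector;
import javax.tools.JavaCompiler;
import javax.tools.JavaFileObject;
import javax.tools.SimpleJavaFileObject;
import javax.tools.ToolProvider;

public class UnresolvedNameFinder {
    // In-memory source file, so the code sample never has to be written to disk.
    static class StringSource extends SimpleJavaFileObject {
        private final String code;

        StringSource(String name, String code) {
            super(URI.create("string:///" + name + ".java"), Kind.SOURCE);
            this.code = code;
        }

        @Override
        public CharSequence getCharContent(boolean ignoreEncodingErrors) {
            return code;
        }
    }

    // Compiles the sample and returns the line numbers of all compiler errors.
    static List<Long> errorLines(String code) {
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        if (compiler == null) {
            throw new IllegalStateException("No system Java compiler (a JDK is required)");
        }
        DiagnosticCollector<JavaFileObject> diagnostics = new DiagnosticCollector<>();
        List<JavaFileObject> units = new ArrayList<>();
        units.add(new StringSource("Sample", code));
        compiler.getTask(null, null, diagnostics, Arrays.asList("-proc:none"), null, units).call();

        List<Long> lines = new ArrayList<>();
        for (Diagnostic<? extends JavaFileObject> d : diagnostics.getDiagnostics()) {
            if (d.getKind() == Diagnostic.Kind.ERROR) {
                lines.add(d.getLineNumber());
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        // Widget is a made-up type that no JAR on the classpath provides.
        String sample = "import java.util.*;\n"
                + "class c {\n"
                + "  Widget w;\n"
                + "  HashMap m = new HashMap();\n"
                + "}\n";
        System.out.println(errorLines(sample));
    }
}
```

Because the compiler resolves `HashMap` through the wildcard import, only the genuinely unknown type is reported, which plain term matching cannot distinguish.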

Page 20: Assieme: Finding and Leveraging Implicit References in a Web Search Interface for Programmers

Resolving External Code References

• Index packages/types/members in JAR files
  – e.g. java.util.HashMap.clear(), java.util.HashMap, …

• Compile & lookup
  – compile the code sample to obtain unresolved names
  – look up the unresolved names in the index
  – greedily pick best JARs (utility function: # covered references and JAR popularity)
  – put the picked JARs on the classpath
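
The greedy pick can be illustrated with a small set-cover style sketch. It makes several assumptions that are not from the talk: the utility here is only the number of still-unresolved names a JAR covers (JAR popularity is left out), ties go to the alphabetically first JAR, and the class name `GreedyJarPicker`, the JAR names, and the `org.example` types are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

public class GreedyJarPicker {
    // jarIndex maps a JAR name to the fully qualified names it defines.
    // Repeatedly picks the JAR covering the most still-unresolved names.
    static List<String> pick(Set<String> unresolved, Map<String, Set<String>> jarIndex) {
        Set<String> remaining = new HashSet<>(unresolved);
        List<String> picked = new ArrayList<>();
        while (!remaining.isEmpty()) {
            String best = null;
            int bestCovered = 0;
            for (Map.Entry<String, Set<String>> entry : jarIndex.entrySet()) {
                Set<String> covered = new HashSet<>(entry.getValue());
                covered.retainAll(remaining);
                if (covered.size() > bestCovered) {
                    best = entry.getKey();
                    bestCovered = covered.size();
                }
            }
            if (best == null) {
                break; // no JAR covers any of the remaining names
            }
            picked.add(best);
            remaining.removeAll(jarIndex.get(best));
        }
        return picked;
    }

    public static void main(String[] args) {
        // TreeMap gives a deterministic iteration order for tie-breaking.
        Map<String, Set<String>> index = new TreeMap<>();
        index.put("pdf-core.jar", new HashSet<>(Arrays.asList(
                "org.example.pdf.Document", "org.example.pdf.Writer")));
        index.put("pdf-all.jar", new HashSet<>(Arrays.asList(
                "org.example.pdf.Document", "org.example.pdf.Writer", "org.example.pdf.Font")));
        index.put("text.jar", new HashSet<>(Arrays.asList("org.example.text.Layout")));

        Set<String> unresolved = new HashSet<>(Arrays.asList(
                "org.example.pdf.Document", "org.example.pdf.Writer",
                "org.example.pdf.Font", "org.example.text.Layout"));

        System.out.println(pick(unresolved, index)); // prints [pdf-all.jar, text.jar]
    }
}
```

A popularity term could be added to the comparison inside the loop, for example to prefer a widely used JAR when two candidates cover the same names.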

Page 21: Assieme: Finding and Leveraging Implicit References in a Web Search Interface for Programmers

Scoring

• Existing techniques …
  – Docs modeled as weighted term frequencies
  – Hypertext link analysis (PageRank)

• … do not work well for code, because:
  – JAR files (binary code) provide no context
  – Source code contains few relevant keywords
  – Structure in code important for relevance

Page 22: Assieme: Finding and Leveraging Implicit References in a Web Search Interface for Programmers

Using Implicit References to Improve Scoring

• Assieme exploits structure on Web pages (HTML hyperlinks) and structure in code (code references)

Page 23: Assieme: Finding and Leveraging Implicit References in a Web Search Interface for Programmers

Scoring

APIs (packages/types/members)

Web pages

Page 24: Assieme: Finding and Leveraging Implicit References in a Web Search Interface for Programmers

Scoring

APIs
• Use text on doc pages and on pages with code samples that reference API (~ anchor text)
• Weight APIs by # incoming refs (~ PageRank)

Web Pages
• Use fully qualified references (java.util.HashMap) and adjust term weights
• Filter pages by references
• Favor pages with accompanying text
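
As a rough illustration of weighting APIs by incoming references, the sketch below counts how many pages reference each fully qualified name. It is only the counting step under assumed data structures: `IncomingRefCounter`, the page identifiers, and the map layout are hypothetical, and how the counts feed into the final score is not specified here.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

public class IncomingRefCounter {
    // refsByPage maps a page identifier to the fully qualified API names
    // referenced by the code samples on that page.
    static Map<String, Integer> count(Map<String, Set<String>> refsByPage) {
        Map<String, Integer> incoming = new TreeMap<>();
        for (Set<String> refs : refsByPage.values()) {
            for (String api : refs) {
                incoming.merge(api, 1, Integer::sum); // one vote per referencing page
            }
        }
        return incoming;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> refsByPage = new HashMap<>();
        refsByPage.put("page-1", new HashSet<>(Arrays.asList(
                "java.util.HashMap", "java.util.HashMap.clear()")));
        refsByPage.put("page-2", new HashSet<>(Arrays.asList("java.util.HashMap")));
        System.out.println(count(refsByPage));
    }
}
```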

Page 25: Assieme: Finding and Leveraging Implicit References in a Web Search Interface for Programmers

Outline

• Motivation
• What Programmers Search For
• The Assieme Search Engine
  – Inferring Implicit References
  – Using Implicit References for Scoring
• Evaluation of Inference & User Study
• Discussion & Conclusion

Page 26: Assieme: Finding and Leveraging Implicit References in a Web Search Interface for Programmers

Evaluating Code Extraction and Reference Resolution

… on 350 hand-labeled pages from Assieme’s data

Reference Resolution
• Recall 89.6%, Precision 86.5%
• False positives: FishEye and diff pages
• False negatives: incomplete code samples

Code Extraction
• Recall 96.9%, Precision 50.1%
• False positives: C, C#, JavaScript, PHP, FishEye/diff
• After filtering pages without refs: precision 76.7%

Page 27: Assieme: Finding and Leveraging Implicit References in a Web Search Interface for Programmers

User Study

Assieme vs. Google vs. Google Code Search

Design
• 40 search tasks based on queries in logs, e.g. query “socket java” became “Write a basic server that communicates using Sockets”
• Find code samples (and required libraries)
• 4 blocks of 10 tasks: 1 for training + 1 per interface

Participants
• 9 (under-)graduate students in Computer Science

Page 28: Assieme: Finding and Leveraging Implicit References in a Web Search Interface for Programmers

User Study – Task Time

[Bar chart: task time in seconds (SEM) for Assieme, Google, and GCS]

F(1,258)=5.74, p ≈ .017
F(1,258)=1.91, p ≈ .17
* significant

Page 29: Assieme: Finding and Leveraging Implicit References in a Web Search Interface for Programmers

User Study – Solution Quality

Rating scale: 0 = seriously flawed; .5 = generally good but fell short in critical regard; 1 = fairly complete

[Bar chart: solution quality (SEM) for Assieme, Google, and GCS]

F(1,258)=55.5, p < .0001 *
F(1,258)=6.29, p ≈ .013 *

Page 30: Assieme: Finding and Leveraging Implicit References in a Web Search Interface for Programmers

User Study – # Queries Issued

[Bar chart: number of queries issued (SEM) for Assieme, Google, and GCS]

F(1,259)=9.77, p ≈ .002 *
F(1,259)=6.85, p ≈ .001 *

Page 31: Assieme: Finding and Leveraging Implicit References in a Web Search Interface for Programmers

Outline

• Motivation
• What Programmers Search For
• The Assieme Search Engine
  – Inferring Implicit References
  – Using Implicit References for Scoring
• Evaluation of Inference & User Study
• Discussion & Conclusion

Page 32: Assieme: Finding and Leveraging Implicit References in a Web Search Interface for Programmers

Discussion & Conclusion

• Assieme: a novel web search interface
• Programmers obtain better solutions, using fewer queries, in the same amount of time
• Using Google, subjects visited 3.3 pages/task; using Assieme, only 0.27 pages, but 4.3 previews
• Ability to quickly view code samples changed participants’ strategies

Page 33: Assieme: Finding and Leveraging Implicit References in a Web Search Interface for Programmers

Thank You

Raphael Hoffmann
Computer Science & Engineering
University of Washington

James Fogarty
Computer Science & Engineering
University of Washington

Daniel S. Weld
Computer Science & Engineering
University of Washington

This material is based upon work supported by the National Science Foundation under grant IIS-0307906, by the Office of Naval Research under grant N00014-06-1-0147, SRI International under CALO grant 03-000225 and the Washington Research Foundation / TJ Cable Professorship.

