Individualized Knowledge Access
David Karger, Lynn Andrea Stein
Web Search Tools
Indices search by keyword
Taxonomies classify by subject
Cool site of the day
A lot like libraries...
Library catalogues
Dewey Digital
New book shelf, suggested reading
Is a universal library enough?
Library/Web Limitations
Huge: too many answers, mostly irrelevant
Only published material: misses info known to few, leading-edge content
Rigid: all get same search results, even if they come back and try again
The library is the last place we look
Bookshelves First
My data: information gathered personally; high quality, easy for me to understand; not limited to publicly available content; annotations
My organization: choose own subject arrangement; optimize for my kind of searching
Adapts to my needs
Then a Friend
Leverage: they organize information for their access, so quickly find things for me
Personal expertise: they know things not in any library
Trust: their recommendations are good
Shared vocabulary: they know me and what I want
Last the Library
Answer usually there but hard to find; would be nice to rearrange to my needs
For hardest problems, need librarian: they have broad knowledge of library, but not as deep as an expert on question
Lessons
Individualized access: The best tools adapt to individual ways of organizing and seeking data.
Individualized knowledge: People know much more than they publish. That knowledge is useful.
Haystack: a Tool for Oxygen
Independent but interacting repositories that adapt to their individual users
Individualize access: my data collection, organization; my search tools, with answers for me
Leverage individual knowledge: collaborative retrieval with others; motivate people to organize their data for their own benefit and thus for others’
Example
Have probabilistic models been used in data mining?
My haystack doesn’t know, but “probability” is in lots of mail I got from Tommi Jaakkola
Tommi told his haystack that “Bayesian” refers to “probability models”
Tommi has read several papers on Bayesian methods in data mining
His haystack suggests them to mine
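The example above can be sketched in code. This is only an illustration under stated assumptions, not Haystack's actual design: the `Haystack` class, its fields, `collaborative_search`, and the document title are all hypothetical names invented here. The idea shown is that when my own repository has no answer, it asks the repositories of correspondents whose mail mentions the query term, and those repositories apply their owners' own term associations.

```python
from dataclasses import dataclass, field


@dataclass
class Haystack:
    """Hypothetical stand-in for one user's repository."""
    owner: str
    documents: dict = field(default_factory=dict)      # title -> text
    related_terms: dict = field(default_factory=dict)  # term -> terms the owner linked to it
    mail_from: dict = field(default_factory=dict)      # sender -> list of message texts

    def search(self, term):
        # Match the term itself plus anything the owner said it refers to.
        term = term.lower()
        terms = {term} | {t.lower() for t in self.related_terms.get(term, set())}
        return sorted(title for title, text in self.documents.items()
                      if any(t in text.lower() for t in terms))

    def correspondents_mentioning(self, term):
        # People whose mail to the owner contains the term.
        term = term.lower()
        return sorted(sender for sender, messages in self.mail_from.items()
                      if any(term in m.lower() for m in messages))


def collaborative_search(mine, peers, term):
    """Try my own haystack first, then the haystacks of likely experts."""
    hits = mine.search(term)
    if hits:
        return hits
    suggestions = []
    for sender in mine.correspondents_mentioning(term):
        if sender in peers:
            suggestions.extend(peers[sender].search(term))
    return suggestions


my_haystack = Haystack(owner="me",
                       mail_from={"Tommi": ["A note on probability and inference"]})
tommi = Haystack(
    owner="Tommi",
    documents={"Paper A (hypothetical)": "Bayesian methods in data mining"},
    related_terms={"probability": {"Bayesian"}},
)
suggested = collaborative_search(my_haystack, {"Tommi": tommi}, "probability")
print(suggested)
```

The substring matching is deliberately naive; a real system would presumably use a proper text index and respect permissions before consulting another user's repository.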
Research Threads
Heterogeneous data and metadata: archive whatever user wants
Human-Computer Interaction: let user express/use own organizational rules; observe user to detect unexpressed knowledge
Machine learning: use gathered data to improve performance
Collaborative filtering: use others’ decisions to help me
My data
Haystack archives anything: web pages browsed, email sent and received, documents written, scanned images, home directory, people known, projects worked on
And any properties, relationships: text of object (if know how), author, title, color, citations, quotations, annotations, quality, last usage
Users freely add types, relationships
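One way to picture "any object, any property, any relationship" is a store that imposes no fixed schema. The sketch below is an assumption made for illustration, not the data model Haystack uses; `Archive`, its methods, and the sample identifiers are hypothetical.

```python
from collections import defaultdict


class Archive:
    """Hypothetical schema-free store: objects carry arbitrary properties and links."""

    def __init__(self):
        self.properties = defaultdict(dict)  # object id -> {property name: value}
        self.links = set()                   # (source id, relationship, target id)

    def set_property(self, obj, name, value):
        self.properties[obj][name] = value

    def relate(self, source, relationship, target):
        # The relationship name is free text, so users can invent new types.
        self.links.add((source, relationship, target))

    def related(self, source, relationship):
        return sorted(t for s, r, t in self.links
                      if s == source and r == relationship)


archive = Archive()
archive.set_property("paper-1", "author", "A. Author")
archive.set_property("paper-1", "quality", "high")
archive.relate("paper-1", "cites", "paper-2")
archive.relate("paper-1", "belongs-to-project", "thesis")  # user-invented relationship
print(archive.related("paper-1", "cites"))
```

Because nothing is declared up front, adding a new type or relationship costs the user nothing beyond naming it.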
Gathering My Data
Active user input: interfaces let user add data, note relationships
Mining data from haystack: plug-in services opportunistically extract data, e.g., find author/title/text in MSWord document, or detect that one document quotes another
Observing user: plug-ins to other interfaces report user actions (web pages browsed, mail sent, queries made)
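The plug-in idea for mining data can be sketched as a registry of small extractors, each contributing whatever metadata it can find. Everything here is hypothetical (the `extractor` decorator, the two sample extractors, and the plain-text document format); it is not Haystack's plug-in interface.

```python
import re

EXTRACTORS = []  # hypothetical registry of plug-in services


def extractor(func):
    """Register a function that maps a document to a dict of metadata."""
    EXTRACTORS.append(func)
    return func


@extractor
def title_from_first_line(doc):
    lines = doc["text"].strip().splitlines()
    return {"title": lines[0].strip()} if lines else {}


@extractor
def author_from_byline(doc):
    # Looks for a line of the form "by <name>"; an assumed convention.
    match = re.search(r"^by\s+(.+)$", doc["text"],
                      flags=re.IGNORECASE | re.MULTILINE)
    return {"author": match.group(1).strip()} if match else {}


def mine_metadata(doc):
    """Run every registered extractor; each adds what it can and skips the rest."""
    metadata = {}
    for extract in EXTRACTORS:
        metadata.update(extract(doc))
    return metadata


doc = {"text": "Notes on Retrieval\nby A. Author\nBody text here."}
print(mine_metadata(doc))
```

New extractors (for another file format, or for detecting quotations between documents) could be added without touching `mine_metadata`.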
Adaptation
Remember user’s attempts to tune a query: instead of first query attempt, use last one; record items user picked as good matches; future similar queries do better right away
Stored content shows what user knows/likes: modify queries to big search engines; filter results coming back; personalized “cool site of the day”
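A minimal sketch of the first adaptation idea, assuming exact-match lookup of earlier queries (the slides speak of "similar" queries, which would need a similarity measure not shown here). `QueryMemory` and `fake_search` are hypothetical names.

```python
class QueryMemory:
    """Hypothetical memory of how the user refined queries and what they picked."""

    def __init__(self):
        self.refined = {}  # first attempt -> last refinement of that session
        self.picked = {}   # query -> items the user marked as good matches

    def record_session(self, attempts, picked_items):
        first, last = attempts[0], attempts[-1]
        self.refined[first] = last
        self.picked.setdefault(last, []).extend(picked_items)

    def run(self, query, search):
        # Substitute the remembered refinement, then float earlier picks to the top.
        effective = self.refined.get(query, query)
        results = search(effective)
        favourites = self.picked.get(effective, [])
        ranked = sorted(results, key=lambda item: item not in favourites)
        return effective, ranked


def fake_search(query):
    # Stand-in for a real search engine; always returns the same three items.
    return ["doc-a", "doc-b", "doc-c"]


memory = QueryMemory()
memory.record_session(["jaguar", "jaguar cat", "jaguar cat habitat"], ["doc-b"])
effective, ranked = memory.run("jaguar", fake_search)
print(effective, ranked)
```

The sort is stable, so items the user never picked keep the order the underlying engine gave them.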
Collaborative Access
Leverage others’ work organizing data: no need to “publish”; expertise exposed automatically; self interest helps others
Privacy/permission concerns: allowing exposure easier than publishing; much public info: mailing lists, papers read
Whose opinions matter? People I mail, with shared data, referrals; collaborative filtering techniques
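One possible reading of "whose opinions matter" is to weight each peer by how much we correspond and how much our collections overlap, then rank unseen items by the total weight of the peers who hold them. The weighting formula below is an arbitrary assumption chosen for illustration, as are the function names and sample data; the slides do not specify a scheme.

```python
def trust_weights(my_items, peer_items, mail_counts):
    """Weight = (1 + mails exchanged) * Jaccard overlap of collections (assumed formula)."""
    weights = {}
    for peer, items in peer_items.items():
        union = my_items | items
        overlap = len(my_items & items) / len(union) if union else 0.0
        weights[peer] = (1 + mail_counts.get(peer, 0)) * overlap
    return weights


def recommend(my_items, peer_items, mail_counts):
    """Rank items I lack by the summed weight of the peers who have them."""
    weights = trust_weights(my_items, peer_items, mail_counts)
    scores = {}
    for peer, items in peer_items.items():
        for item in items - my_items:
            scores[item] = scores.get(item, 0.0) + weights[peer]
    # Highest score first; ties broken alphabetically for a deterministic order.
    return sorted(scores, key=lambda item: (-scores[item], item))


my_items = {"a", "b", "c"}
peer_items = {"ann": {"a", "b", "d"}, "bob": {"c", "e", "f", "g"}}
mail_counts = {"ann": 3}
print(recommend(my_items, peer_items, mail_counts))
```

Here "ann" shares more data with me and exchanges mail with me, so her item outranks those held only by "bob".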
Conclusion
Libraries are not enough
Haystack teases out individual knowledge
Individualizes information access for user
Exposes individual knowledge to benefit community
Current status: individual-user prototype. Some data extraction, observation, adapting. Collaborative version in future.