Individualized Knowledge Access

David Karger, Lynn Andrea Stein

Web Search Tools

Indices: search by keyword

Taxonomies: classify by subject

Cool site of the day

A lot like libraries...

Library catalogues

Dewey Digital

New book shelf, suggested reading

Is a universal library enough?

Library/Web Limitations

Huge: too many answers, mostly irrelevant

Only published material: misses information known to few, and leading-edge content

Rigid: everyone gets the same search results, even if they come back and try again

The library is the last place we look

Bookshelves First

My data: information gathered personally; high quality, easy for me to understand; not limited to publicly available content; annotations

My organization: choose own subject arrangement; optimize for my kind of searching

Adapts to my needs

Then a Friend

Leverage: they organize information for their own access, so they can quickly find things for me

Personal expertise: they know things not in any library

Trust: their recommendations are good

Shared vocabulary: they know me and what I want

Last the Library

Answer usually there but hard to find; would be nice to rearrange to my needs

For hardest problems, need a librarian: they have broad knowledge of the library, but not as deep as an expert on the question

Lessons

Individualized access: The best tools adapt to individual ways of organizing and seeking data.

Individualized knowledge: People know much more than they publish. That knowledge is useful.

Haystack: a Tool for Oxygen

Independent but interacting repositories that adapt to their individual users

Individualize access: my data collection and organization; my search tools, with answers for me

Leverage individual knowledge: collaborative retrieval with others; motivate people to organize their data for their own benefit and thus for others’

Example

Have probabilistic models been used in data mining?

My haystack doesn’t know, but “probability” is in lots of mail I got from Tommi Jaakkola

Tommi told his haystack that “Bayesian” refers to “probability models”

Tommi has read several papers on Bayesian methods in data mining

His haystack suggests them to mine
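The flow in this example can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the Haystack implementation: the `Haystack` class, its `search`, `expand`, and `ask` methods, the term sets, and the paper titles ("Paper A" and so on) are all hypothetical. A local miss is forwarded to the correspondent whose mail overlaps the query most, and that correspondent's haystack rewrites the query with the vocabulary its owner taught it.

```python
from collections import Counter


class Haystack:
    """Toy personal repository (hypothetical, for illustration only)."""

    def __init__(self, owner):
        self.owner = owner
        self.documents = []  # list of (title, set of terms)
        self.synonyms = {}   # owner's term -> set of terms it refers to
        self.mail = []       # list of (sender's Haystack, set of terms)

    def search(self, terms):
        """Titles of documents that contain every query term."""
        terms = set(terms)
        return [title for title, doc_terms in self.documents if terms <= doc_terms]

    def expand(self, terms):
        """The query itself, plus variants using the owner's own vocabulary."""
        variants = [set(terms)]
        for term in terms:
            for alias, related in self.synonyms.items():
                if term in related:
                    variants.append((set(terms) - {term}) | {alias})
        return variants

    def ask(self, terms):
        """Answer locally; otherwise consult the most relevant correspondent."""
        hits = self.search(terms)
        if hits:
            return hits
        overlap = Counter()
        for sender, mail_terms in self.mail:
            overlap[sender] += len(set(terms) & mail_terms)
        if not overlap:
            return []
        friend, score = overlap.most_common(1)[0]
        if score == 0:
            return []
        results = []
        for variant in friend.expand(terms):
            for title in friend.search(variant):
                if title not in results:
                    results.append(title)
        return results


tommi = Haystack("Tommi")
tommi.synonyms["bayesian"] = {"probability", "probability models"}
tommi.documents = [
    ("Paper A", {"bayesian", "data mining"}),
    ("Paper B", {"bayesian", "data mining", "clustering"}),
    ("Paper C", {"kernel", "regression"}),
]

mine = Haystack("me")
mine.mail = [(tommi, {"probability", "inference"})] * 3

suggestions = mine.ask(["probability", "data mining"])
print(suggestions)
```

Exact term matching keeps the sketch short; a real system would presumably use ranked retrieval and respect what each user has chosen to expose.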

Research Threads

Heterogeneous data and metadata: archive whatever the user wants

Human-Computer Interaction: let user express/use own organizational rules; observe user to detect unexpressed knowledge

Machine learning: use gathered data to improve performance

Collaborative filtering: use others’ decisions to help me

My data

Haystack archives anything: web pages browsed, email sent and received, documents written, scanned images, home directory, people known, projects worked on

And any properties, relationships: text of object (if we know how to extract it); author, title, color, citations, quotations, annotations, quality, last usage

Users freely add types, relationships
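One way to picture an archive that accepts any object, any property, and any user-defined relationship type is a pair of open-ended maps. The sketch below is an assumption made for illustration; the `Store` class, its method names, and the sample identifiers are hypothetical and are not taken from Haystack.

```python
from collections import defaultdict


class Store:
    """Schema-free archive: no fixed list of object types, properties, or relations."""

    def __init__(self):
        self.properties = defaultdict(dict)  # object id -> {property name: value}
        self.relations = defaultdict(set)    # (relation type, source id) -> target ids

    def set_property(self, obj, name, value):
        """Attach any named property to any object."""
        self.properties[obj][name] = value

    def relate(self, kind, source, target):
        """Record a relationship; new relation types need no declaration."""
        self.relations[(kind, source)].add(target)

    def related(self, kind, source):
        """Targets of one relation type from one object, in sorted order."""
        return sorted(self.relations.get((kind, source), ()))

    def find(self, **wanted):
        """Ids of objects whose properties match every keyword given."""
        return sorted(
            obj for obj, props in self.properties.items()
            if all(props.get(key) == value for key, value in wanted.items())
        )


store = Store()
store.set_property("paper-1", "type", "document")
store.set_property("paper-1", "author", "A. Author")
store.set_property("scan-7", "type", "image")
store.set_property("scan-7", "color", "grayscale")
store.relate("cites", "paper-1", "paper-2")
store.relate("annotated-by", "paper-1", "note-3")  # a relation type the user made up

print(store.find(type="document"))
print(store.related("cites", "paper-1"))
```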

Gathering My Data

Active user input: interfaces let user add data, note relationships

Mining data from haystack: plug-in services opportunistically extract data, e.g., find author/title/text in an MSWord document, or detect that one document quotes another

Observing user: plug-ins to other interfaces report user actions (web pages browsed, mail sent, queries made)
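As a hedged illustration of the "one document quotes another" service, the sketch below flags two texts that share a run of consecutive words (word n-gram "shingles"). This is a swapped-in, generic technique, not a description of how Haystack's plug-ins work; the function names and the default run length of five words are assumptions.

```python
import re


def shingles(text, n=5):
    """Set of all runs of n consecutive lowercase words in the text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def quotes(doc_a, doc_b, n=5):
    """True if the two texts share at least one run of n consecutive words."""
    return bool(shingles(doc_a, n) & shingles(doc_b, n))


original = "The best tools adapt to individual ways of organizing and seeking data."
citing = "As the talk puts it, the best tools adapt to individual ways of working."
unrelated = "Libraries are not enough for individual users."

print(quotes(original, citing))
print(quotes(original, unrelated))
```

A plug-in built this way could run whenever a new document is archived and record a "quotes" relationship on a match; shorter runs catch more quotations at the cost of more false positives.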

Adaptation

Remember user’s attempts to tune a query: instead of first query attempt, use last one; record items user picked as good matches; future similar queries do better right away

Stored content shows what user knows/likes: modify queries to big search engines; filter results coming back; personalized “cool site of the day”
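The query-tuning idea can be sketched as a small memory that maps a first attempt to the user's final refinement and promotes previously picked items. The `QueryMemory` class and its methods are hypothetical, and matching "similar" queries by exact string equality is a simplifying assumption; a real system would need a notion of query similarity.

```python
class QueryMemory:
    """Remembers how the user refined queries and which results they chose."""

    def __init__(self):
        self.final_query = {}  # first attempt -> last refinement
        self.picked = {}       # final query -> items the user marked as good

    def record_session(self, attempts, picked_items):
        """Store one search session: the sequence of queries and the chosen items."""
        first, last = attempts[0], attempts[-1]
        self.final_query[first] = last
        chosen = self.picked.setdefault(last, [])
        for item in picked_items:
            if item not in chosen:
                chosen.append(item)

    def rewrite(self, query):
        """Replace a known first attempt with the refinement that worked last time."""
        return self.final_query.get(query, query)

    def rerank(self, query, results):
        """Move previously picked items to the front, keeping the rest in order."""
        good = self.picked.get(self.rewrite(query), [])
        return sorted(results, key=lambda item: item not in good)


memory = QueryMemory()
memory.record_session(
    ["jaguar", "jaguar cat", "jaguar cat habitat"],
    picked_items=["doc3"],
)

print(memory.rewrite("jaguar"))
print(memory.rerank("jaguar", ["doc1", "doc2", "doc3"]))
```

Because `sorted` is stable, items the user never picked keep the order the underlying search engine gave them.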

Collaborative Access

Leverage others’ work organizing data: no need to “publish”; expertise exposed automatically; self interest helps others

Privacy/permission concerns: allowing exposure easier than publishing; much public info: mailing lists, papers read

Whose opinions matter? People I mail, with shared data, referrals; collaborative filtering techniques
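One simple reading of "whose opinions matter" is to weight each person by how much of their stored data overlaps with mine, then recommend what the high-weight people hold that I lack. The sketch below uses Jaccard overlap as that weight; it is a generic collaborative-filtering illustration under that assumption, with hypothetical function names and sample data, not the method used in Haystack.

```python
def jaccard(a, b):
    """Overlap between two collections: |intersection| / |union|."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend(my_items, others, top=3):
    """Items held by others, scored by how similar each holder's data is to mine."""
    mine = set(my_items)
    scores = {}
    for name, items in others.items():
        weight = jaccard(mine, items)
        for item in set(items) - mine:
            scores[item] = scores.get(item, 0.0) + weight
    ranked = sorted(scores.items(), key=lambda pair: (-pair[1], pair[0]))
    return [item for item, score in ranked[:top] if score > 0]


my_items = {"a", "b", "c"}
others = {
    "friend": {"a", "b", "d"},  # shares two items with me
    "stranger": {"x", "y"},     # shares nothing with me
}

print(recommend(my_items, others))
```

Mail frequency or explicit referrals could be folded into the weight in the same way.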

Conclusion

Libraries are not enough

Haystack teases out individual knowledge: individualizes information access for user; exposes individual knowledge to benefit community

Current status: individual-user prototype. Some data extraction, observation, adapting. Collaborative version in future.
