Remembrance of Data Past Using Context in Personal Information Search Amélie Marian, Rutgers University Thu D. Nguyen, Rutgers University Daniela Vianna, Rutgers University Luan Nguyen, Rutgers University

Upload: amelie-marian

Post on 14-Jul-2015


Page 1: Remembrance of data past

Remembrance of Data Past
Using Context in Personal Information Search

Amélie Marian, Rutgers University

Thu D. Nguyen, Rutgers University

Daniela Vianna, Rutgers University

Luan Nguyen, Rutgers University

Page 2: Remembrance of data past

What was the name of that restaurant?

• I went there with Julia

• We had dinner

• It was pouring rain

Some Sources of helpful data

“With Julia”: Calendar, email, text

“Restaurant”: Check-ins, cell phone GPS logs

“Restaurant”: Credit Card statements

“Pouring rain”: Historical Weather reports

Amélie Marian - Rutgers University

Page 3: Remembrance of data past

The Web

hypertext

universal library of text and multimedia

personal/private data

social data


Page 4: Remembrance of data past

Personal data is fragmented


Page 5: Remembrance of data past

We remember our data based on context clues

• “Serge sent me this file while we were on a conference call with Alkis”

Skype, Google hangout, email, calendar, filesystem

• “I found this shopping web site while talking to Tova on Skype. She was wearing a blue dress.”

Skype (+ snapshot), calendar, browser history

• “Are my insurance reimbursements up to date?”

Calendar, insurance account, bank account


Page 6: Remembrance of data past

We also remember data from our social network

• “Mohan posted this interesting article on CS education on Facebook, or maybe on Twitter, or maybe it was Moshe Vardi who posted it”

Facebook, Twitter, browser history

• “What are the books my friends recommended?”

Facebook (and comments), Twitter, emails

• “What are the places in Maui that my friends enjoyed?”

Facebook, Twitter, emails, Foursquare


Page 7: Remembrance of data past

Data dimensions

• Follow natural interrogative words:

• what? (content)

• who? (with whom, from whom, to whom,...)

• where? (physical or logical, in the real-world and in the system)

• when? (time and date, but also what was happening concurrently, before and after)

• why? (sequence of data/events that are connected)

• how? (application, author, environment)
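As a rough illustration of how the six dimensions above could describe one piece of personal data, here is a minimal Python sketch. The class name `PersonalDataItem`, its field names, the helper `matches`, and the sample values are all hypothetical; the slides only name the dimensions and do not prescribe a schema.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional


@dataclass
class PersonalDataItem:
    """One piece of personal data described along the six dimensions.

    Hypothetical schema: the slides name the dimensions, not the fields.
    """
    what: str                                     # content
    who: List[str] = field(default_factory=list)  # people involved
    where: Optional[str] = None                   # physical or logical location
    when: Optional[datetime] = None               # time and date
    why: List[str] = field(default_factory=list)  # connected data/events
    how: Optional[str] = None                     # application, author, environment


def matches(item: PersonalDataItem, **clues) -> bool:
    """Return True if every clue equals (or is contained in) its dimension."""
    for dimension, wanted in clues.items():
        value = getattr(item, dimension)
        if isinstance(value, list):
            if wanted not in value:
                return False
        elif value != wanted:
            return False
    return True


# Made-up example item, loosely following the restaurant scenario.
dinner = PersonalDataItem(
    what="credit card charge at a restaurant",
    who=["Julia"],
    where="restaurant",
    when=datetime(2015, 1, 1, 20, 0),  # invented date, for illustration only
    how="bank statement",
)

print(matches(dinner, who="Julia", where="restaurant"))
```

Exact matching on each dimension is the simplest possible choice here; the approximate matching discussed later in the slides would replace the equality tests with scores.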

Page 8: Remembrance of data past

What is an answer?

• Content

• Email

• File

• Link

• List of objects (insurance reimbursements)

• But also part of the context

• Location

• Meeting participants

• Time


Page 9: Remembrance of data past

Personal Data Context

• Explicit
  • Metadata information stored by the file system or application, e.g., timestamp, GPS location, tags, directory structure
• Implicit
  • Identified through application-based semantic information, e.g., email recipients, calendar meeting participants, check-in location
• Inferred
  • Knowledge about the environment of the data collection
  • System environment (which applications/documents were opened concurrently with a given document)
  • Social environment (which Facebook members had access to an event)
  • Real-world environment (who was physically in the room, e.g., from RFID tags or Skype; weather)


Page 10: Remembrance of data past

Challenges

• Indexing content and context
  • Semantic analysis for extracted context
  • Data integration
• Identify inferred context
  • Store and index as it is produced (system environment)
  • Use API calls on-demand or copy information (social and real-world environment)
• Unified data model
  • Content and structure
  • Data in context
  • Navigation


Page 11: Remembrance of data past

Challenges (2)

• Powerful data tools
  • Access and query (possibly remote) sources
• Search based on content and contextual clues
  • Approximate matching
  • Explore data to get relevant information
• Discover new relevant information
  • “It’s been six months, you need to make a dentist appointment!”
  • “You forgot to pay the home insurance bill!”
  • “Last time you bought toothpaste was a month ago, you are probably running out.”


Page 12: Remembrance of data past

Previous results: Unified Structure, Content, and Metadata Search

• Data and query models that unify content and structure along one dimension
  • System metadata seen as a separate dimension
• A unified multi-dimensional scoring mechanism
  • IDF-based scores for each dimension
  • Individual dimension scores easily combined
  • TF scores to break ties
• Query processing algorithms and index structures to score and rank answers efficiently

EDBT’08, ICDE’08 (demo), EDBT’11, TKDE’12, with Wei Wang, Chris Peery, and Thu D. Nguyen
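The scoring bullets above (one IDF-based score per dimension, combined into a single score, with TF used only to break ties) can be sketched as follows. The slide does not say how the dimension scores are combined, so the plain sum used here is an assumption, and `Candidate`, `combined_score`, `rank`, and the sample numbers are hypothetical.

```python
from typing import Dict, List, NamedTuple


class Candidate(NamedTuple):
    name: str
    idf_scores: Dict[str, float]  # one IDF-based score per dimension
    tf: float                     # term-frequency score, used only for ties


def combined_score(candidate: Candidate) -> float:
    # Assumption: dimensions are combined by a plain sum.
    # The slides only state that the scores are "easily combined".
    return sum(candidate.idf_scores.values())


def rank(candidates: List[Candidate]) -> List[Candidate]:
    """Sort by combined IDF score, then by TF score to break ties."""
    return sorted(
        candidates,
        key=lambda c: (combined_score(c), c.tf),
        reverse=True,
    )


# Made-up scores: a and b tie on the combined score, so TF decides.
a = Candidate("a.txt", {"content_structure": 0.5, "metadata": 0.5}, tf=0.25)
b = Candidate("b.txt", {"content_structure": 0.75, "metadata": 0.25}, tf=0.5)
c = Candidate("c.txt", {"content_structure": 0.25, "metadata": 0.25}, tf=0.75)

for candidate in rank([a, b, c]):
    print(candidate.name, combined_score(candidate), candidate.tf)
```

Sorting on the tuple `(combined score, tf)` keeps TF strictly secondary: it only matters when two files have exactly the same combined score.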


Page 13: Remembrance of data past

Unified Structure and Content

Target file: Halloween party pictures taken at home where someone wears a witch costume

[Figure: unified data tree with nodes root, Home, the file boundary, and the content terms “Halloween” and “witch”]

//Home[.//“Halloween” and .//“witch”]


Page 14: Remembrance of data past

Unified IDF Score

For a unified data tree T, a path query PQ, and a file F, we define:

• IDF Score

$$\mathit{score}_{idf}(PQ) = \frac{\log\left(\dfrac{N}{|\mathit{matches}(T, PQ)|}\right)}{\log N}$$

where N is the total number of files, and matches(T, PQ) is the set of files that match PQ in T.
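To make the IDF score concrete, the sketch below computes log(N / |matches(T, PQ)|) / log N, which is how the slide's formula reads, from two counts. The function name `idf_score` and its parameters are hypothetical, and the input checks are an added assumption rather than something the slides specify.

```python
import math


def idf_score(num_matches: int, total_files: int) -> float:
    """Normalized IDF score: log(N / |matches(T, PQ)|) / log(N).

    num_matches is |matches(T, PQ)| and total_files is N.
    """
    if total_files <= 1:
        # log(1) == 0 would make the denominator zero.
        raise ValueError("total_files must be greater than 1")
    if not 0 < num_matches <= total_files:
        raise ValueError("num_matches must be between 1 and total_files")
    return math.log(total_files / num_matches) / math.log(total_files)


# A query matched by a single file is maximally selective (score 1),
# while a query matched by every file carries no information (score 0).
print(idf_score(1, 1000))
print(idf_score(10, 1000))
print(idf_score(1000, 1000))
```

Dividing by log N keeps the score between 0 and 1, so scores from different dimensions are on a comparable scale before they are combined.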


Page 15: Remembrance of data past

Case Study

Target file: Electronic version of the novel Sea Wolf by Jack London
Date: 26 Feb 07
File Extension: .txt
Directory: Personal/Ebook/Novel/JackLondon

Content and filtering query
Keywords: sea, wolf, jack, london
Directory: /JackLondon/Ebooks
Result: target file does not appear in result

Approximate query
Keywords: sea, wolf, jack, london
Directory: /JackLondon/Ebooks
Result: target file at rank 3

Content and filtering query
Keywords: sea, wolf, jack, london
Date: 19 Feb 07; type: pdf
Directory: /JackLondon/Ebooks
Result: target file does not appear in result

Approximate query
Keywords: sea, wolf, jack, london
Date: 19 Feb 07; type: pdf
Directory: /JackLondon/Ebooks
Result: target file at rank 2
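The case study above contrasts strict filtering, which drops the target file as soon as one condition is misremembered, with approximate matching, which still gives it a score. The sketch below illustrates that contrast only. It does not reproduce the IDF-based relaxation scoring from the earlier slides: giving partial credit per condition is a simplifying assumption, and `filter_match`, `directory_overlap`, and `approximate_score` are hypothetical names.

```python
from typing import Dict

# Metadata of the target file and the misremembered query, from the case study.
TARGET = {
    "date": "26 Feb 07",
    "type": "txt",
    "directory": "Personal/Ebook/Novel/JackLondon",
}
QUERY = {
    "date": "19 Feb 07",
    "type": "pdf",
    "directory": "/JackLondon/Ebooks",
}


def filter_match(file: Dict[str, str], query: Dict[str, str]) -> bool:
    """Strict filtering: every query condition must hold exactly."""
    return all(file.get(key) == wanted for key, wanted in query.items())


def directory_overlap(file_dir: str, query_dir: str) -> float:
    """Fraction of query path components found anywhere in the file's path."""
    wanted = [part for part in query_dir.split("/") if part]
    have = {part for part in file_dir.split("/") if part}
    if not wanted:
        return 1.0
    return sum(1 for part in wanted if part in have) / len(wanted)


def approximate_score(file: Dict[str, str], query: Dict[str, str]) -> float:
    """Average credit per condition: partial for directories, 0 or 1 otherwise."""
    credits = []
    for key, wanted in query.items():
        if key == "directory":
            credits.append(directory_overlap(file.get(key, ""), wanted))
        else:
            credits.append(1.0 if file.get(key) == wanted else 0.0)
    return sum(credits) / len(credits) if credits else 1.0


print(filter_match(TARGET, QUERY))       # the filter rejects the target file
print(approximate_score(TARGET, QUERY))  # the relaxed score is still non-zero
```

Because the relaxed score is non-zero, the target file remains a candidate and can be ranked against other files instead of disappearing from the result.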


Page 16: Remembrance of data past

Conclusions

• First step towards an automated Personal Data Assistant
  • Looks at data and its context
• Gathers personal data from remote sources
  • Cloud applications, social networks, emails, phone logs, financial accounts, friends’ public data,…
• Integrates data in a unified data model
  • Based on natural questions
• Provides search and discovery capabilities
  • Beyond keyword search
  • Context-aware

Funded by a Google Research Award

Page 17: Remembrance of data past

Ushi Wakamaru! (that’s the restaurant)
