Speaker: Nattiya Kanhabua
L3S Research Center / University of Hannover
Concise Preservation by Combining Managed Forgetting and Contextualized Remembering
Research Talk, May 9, 2014
University of Twente, Enschede
An interdisciplinary team of experts in:
• Preservation, information management, information extraction
• Multimedia analysis, storage computing, cognitive psychology
ForgetIT Project Consortium
Overview of the ForgetIT project
• Motivation
• Example use cases
Work Package 3: Managed forgetting
• Objective
• Achievements in Year 1
Outline
However, we are facing:
• Dramatic increase in content creation (e.g. digital photos)
• Increasing use of mobile devices with restricted capacity
• Information overload and changing professional + private lives
• Inadvertent forgetting due to lack of systematic preservation
Forgetting plays a crucial role for human remembering and life
(focus, stress on important information, forgetting of details)
A Computer that forgets ?
Intentionally ??
And in context of preservation???
Shouldn't there be something like
forgetting in digital memories as well?
Forget IT
Motivation
Opportunities:
• major progress in preservation technology
• maturing information extraction technology
• storage as a service (e.g. clouds)
Needs:
• increasing amount of digital content handled over decades
• more or less systematic backup strategies used
• non-paper practices for long-term perspective required
Major Obstacles:
• large gap for adoption
• high up-front cost
• no established practices
• lack of understanding of benefit
• reluctance to invest
Vision: Building a Bridge
ForgetIT:
• Enabling smooth transition to preservation
• Creating immediate benefit + reducing effort
• Opening alternatives to “keep it all” and “forgetting by accident”
• Easing interpretation in the long run
• Taking inspiration from and complementing human memory
Building the Bridge
Managed Forgetting
• inspired by human forgetting
• as opposed to the current “forgetting by accident”
Synergetic Preservation
• couples information management and preservation management
Contextualized Remembering
• bringing back information into active use in a meaningful way
Simple Example: Holidays
Timeline (time after trip):
Trip
• Trip to Paris with friends
• Thousands of pictures
+1 month
• High awareness of trip details
• Showing of pictures
• Sorting out redundant pictures
• Sub-grouping and sorting
+1 year
• Life goes on
• Pictures go out of focus
• Creation of a small, diverse subset for showing occasionally
• Creation of summary page (e.g. “February 2015, Paris. Team: Me, Mary, Christine, Tom”)
• Addition of context info
• Further reduction of redundancy
• Rest of pictures into archive
+5-10 years
• Changes in life (e.g. marriage)
• Addition/update of context information (e.g. “girlfriend” becomes “wife”)
• Dealing with preservation issues
+20 years
• Revisiting of trip photos
• Re-integration into overall photo collection (link into context)
Managed Forgetting
Inspired by the central role of human forgetting:
• help in identifying and focusing on relevant information
• support preservation content selection
• replace inadvertent forgetting
Based on:
• Careful information value assessment
• Forgetting strategies via policies
• Forgetting options to integrate final manual checking before deletion
• Combination with multi-tier storage solution possible
Managed forgetting ≠ automatic deletion
Instead: range of forgetting options as memory buoyancy decreases, e.g.
• resource condensation
• change of indexing & ranking
• reduction of redundancy
• use of tiers
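The idea of policy-driven forgetting options, as opposed to automatic deletion, can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration: the threshold values, the action names and the function `forgetting_option` are hypothetical and do not come from the ForgetIT project.

```python
# Hypothetical policy: (minimum memory buoyancy, forgetting option).
# The thresholds are invented for illustration only.
POLICY = [
    (0.75, "keep in active use"),
    (0.50, "lower ranking in search results"),
    (0.25, "condense and reduce redundancy"),
    (0.00, "move to archive tier, ask user before deletion"),
]


def forgetting_option(memory_buoyancy: float) -> str:
    """Map a memory buoyancy value in [0, 1] to a forgetting option."""
    if not 0.0 <= memory_buoyancy <= 1.0:
        raise ValueError("memory buoyancy must be in [0, 1]")
    for threshold, action in POLICY:
        if memory_buoyancy >= threshold:
            return action
    return POLICY[-1][1]


if __name__ == "__main__":
    for mb in (0.9, 0.6, 0.3, 0.05):
        print(mb, "->", forgetting_option(mb))
```

Note that even the lowest level only proposes archiving and a manual check, which mirrors the point that managed forgetting is not the same as automatic deletion.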
Contextualized Remembering
Aim:
Bring back information into active use in a meaningful way even if a lot of
time has passed
Aim for semantic level of preservation
Based on:
Take into account relevant parts of context when moving to archive
Increase contextualization of preserved content
Consider context evolution over time (evolution-aware contextualization)
A. Ceroni, N. K. Tran, N. Kanhabua and C. Niederée, Bridging Temporal Context Gaps using
Time-Aware Re-Contextualization, (To appear) SIGIR’2014
Evolution-aware Contextualization & Re-contextualization
[Figure: diagram over a time axis t relating the information system and the archival information system. Labels: context of interpretation (C, C′, C′′, C′′′); document D; preserved content Pres(D′), Pres(C′), Pres(C′′); contextualization, evolution-aware contextualization, re-contextualization; context-aware preservation; semantic evolution detection. Drivers of change shown: human forgetting, change in focus, structural changes; semantic, structural and terminology evolution.]
Work Package 3: Managed Forgetting
V. Mayer-Schönberger. Delete: The Virtue of Forgetting
in the Digital Age. Princeton University Press, 2009.
WP3 Objectives
• Conceptual model for managed forgetting: foundations of human-brain inspired managed forgetting
• Development of managed forgetting methods: information value assessment; set of methods for Preserve-or-Forget; policy-driven approach to managed forgetting (Y2)
Focus of Year 1
• Conceptual model for managed forgetting
• Design and implement the core managed forgetting process
• Exploratory research of information value assessment
Objectives of WP3 and Year 1 Focus
Role in Preserve-or-Forget Architecture
Research questions and first ideas for complementing human memory (co-worked with WP2, D3.1)
• Episodic memory: reconstruct lifetime memories and support reminiscence
• Working memory: better focus in current information use
Information value assessment (co-worked with WP9, D3.2)
• Data model and a computation method based on Semantic Web technologies
• Integration into the PIMO semantic desktop and the Preserve-or-Forget middleware
Exploratory studies (D3.2)
• Analyzing collective memory of public events in Wikipedia
• Analyzing high-impact features for content retention in the Social Web
• Feature selection for efficiency and scalability
Achievements in Year 1
Goal: understand how to complement human memory processes
Focus on two types of memories:
• Episodic memory: support reminiscence of long-term autobiographical events
• Working memory: better focus in current information use, e.g. de-cluttering
personal information spaces
Two information values: memory buoyancy, and preservation value
Complementing Human Memory: Our First Ideas
Memory buoyancy
• Information objects sinking down with decreasing importance, usage, etc.
Preservation value
• Used to decide which information object will be preserved or archived
Information Value Assessment
Memory Buoyancy
• Short-/mid-term current interests
• E.g. meeting or travel documents
Preservation Value
• Long-term need for future use
• E.g. important life events
Subjective metrics
+ usage logs (views, edits, modifications)
+ time, e.g., aging or recency
+ social context, external influences
Objective metrics
+ diversity, coverage, quality
Humans rapidly forget details -> “less redundancy”
Memories are reconstructed from similar events and context
Reconstruction relies on common patterns -> “false memory”
Our first ideas:
• Store the details that differ among similar event types, which human memory forgets
• Event-centric organization of digital items can play an important role
Forgetting in Episodic Memory
Memory bumps or peaks in the forgetting curve
The original memory is recalled or triggered by:
• A physical object (e.g. a printed photo)
• A digital memory system
• Different subsequent events
Our ideas:
• Propagate increased interest in an event to related events
• Consider common things, e.g., same entities, or similar event types
• Increase relevance level or use of memory buoyancy
Triggering of Memories
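The idea of propagating increased interest in one event to related events can be sketched as follows. This is only an illustration under stated assumptions: relatedness is taken as an equally weighted mix of entity overlap (Jaccard) and same event type, and the names `relatedness`, `propagate_interest`, the `boost` parameter and the example entities are all hypothetical.

```python
def relatedness(a, b):
    """Assumed similarity: half entity overlap (Jaccard), half same event type."""
    union = a["entities"] | b["entities"]
    overlap = len(a["entities"] & b["entities"]) / len(union) if union else 0.0
    same_type = 1.0 if a["type"] == b["type"] else 0.0
    return 0.5 * overlap + 0.5 * same_type


def propagate_interest(events, triggered, boost):
    """Raise the buoyancy of the triggered event and, proportionally, of related events."""
    source = events[triggered]
    updated = {}
    for name, event in events.items():
        weight = 1.0 if name == triggered else relatedness(source, event)
        updated[name] = min(1.0, event["buoyancy"] + boost * weight)
    return updated


if __name__ == "__main__":
    # Illustrative data; the entity sets and buoyancy values are invented.
    events = {
        "2011 Christchurch earthquake": {
            "entities": {"Christchurch", "Canterbury"},
            "type": "earthquake",
            "buoyancy": 0.2,
        },
        "2010 Canterbury earthquake": {
            "entities": {"Canterbury"},
            "type": "earthquake",
            "buoyancy": 0.1,
        },
        "Hurricane Sandy": {
            "entities": {"New York"},
            "type": "hurricane",
            "buoyancy": 0.1,
        },
    }
    print(propagate_interest(events, "2011 Christchurch earthquake", boost=0.4))
```

In this toy setting, the event that shares an entity and the event type with the trigger receives most of the boost, while the unrelated event is left unchanged.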
Analyzing Collective Memory in Wikipedia
Identify catalysts for reviving memories
Analyze re-visiting behaviors
• Page views of a large set of events
• Time series analysis
11 Wikipedia categories
• Number of triggering events
• Number of events possibly triggered
Temporal and spatial distributions
• Strong focus on more recent events
• Better coverage with increasing popularity
• Most frequent locations depending on event types
Temporal and Spatial Distributions
Our Approach and Results
Remembering score as a function of revisiting behavior (e.g., detecting co-peaks in views)
Correlate remembering scores vs. time and location similarities
[Figure panels: Hurricane Sandy; 2011 Christchurch earthquake]
Findings:
• Hurricane Sandy triggers the 1991 Perfect Storm, which initially formed around the Canada area; both are high-impact events (most destructive and costly)
• The 2011 Christchurch earthquake triggers recent events in the same region, i.e., the 2010 Canterbury earthquake
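A remembering score based on co-peaks in page views could look roughly like the sketch below. The slides do not specify the actual peak detection or scoring, so the threshold rule (mean plus `z` standard deviations), the `window` tolerance and the function names are assumptions, and the view counts are invented.

```python
from statistics import mean, pstdev


def peaks(views, z=1.5):
    """Indices whose view count exceeds mean + z * std (an assumed, simple peak rule)."""
    mu, sigma = mean(views), pstdev(views)
    if sigma == 0:
        return set()
    return {i for i, v in enumerate(views) if v > mu + z * sigma}


def remembering_score(trigger_views, past_views, window=1):
    """Fraction of the triggering event's peaks that coincide, within
    `window` time steps, with a peak of the past event."""
    trigger_peaks = peaks(trigger_views)
    past_peaks = peaks(past_views)
    if not trigger_peaks:
        return 0.0
    hits = sum(
        any(abs(i - j) <= window for j in past_peaks) for i in trigger_peaks
    )
    return hits / len(trigger_peaks)


if __name__ == "__main__":
    # Invented daily page views: a new event and an older, similar event.
    new_event = [10, 12, 11, 500, 480, 15, 12, 11, 10, 12]
    old_event = [5, 5, 6, 5, 60, 7, 5, 6, 5, 5]
    print(remembering_score(new_event, old_event))
```

Such scores could then be correlated with time and location similarity between event pairs, as described on the slide.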
Memory Buoyancy: Simplified Computation
[Figure: memory buoyancy MB(D, t) plotted over time, computed from access logs over time; time points t1 and t2 are marked]
Memory Buoyancy Assessment
Proposed MB assessment framework:
• Initialize MB values of resources using a time-decay forgetting function:
  $mb^{(t)}(r) = \mathrm{DecayRate}^{|t - t'|}$
• Incrementally update MB using Random Walk on resource graph:
  [Update equation not recoverable from the extracted text; it computes mb(r) from the MB values of in-linked resources, e.g. mb(e4) and mb(e6). Annotations: averaged value over two in-linked resources; less propagation to account for two out-links.]
[Figure: example resource graph with nodes e1 to e6. Node labels include Edfringe photo (2011), Photos @ iPhone, Folder @ computer, Shortcut folder @ desktop, Photo @ ForgetIT Meeting (2013), Whiskey photo (2012) and Whiskey Tour (2009); edges are labeled contains, hasSamePlace and hasEntity.]
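The two steps of the framework (time-decay initialization, then propagation over the resource graph) can be sketched in Python. The exact update rule is not legible in the slides, so this sketch substitutes a damped, PageRank-style update that respects the two annotations (average over in-linked resources, each source's value divided by its number of out-links). The `damping` parameter, the interpretation of the decay exponent as time since last access, and the function names are assumptions.

```python
def decayed_mb(decay_rate, now, last_access):
    """Time-decay initialization: decay_rate ** |now - last_access|.
    Treating the second time as the last access is an assumption."""
    return decay_rate ** abs(now - last_access)


def propagate(initial, links, damping=0.5, iterations=1):
    """Assumed update, not the project's formula.

    initial: resource -> initial memory buoyancy
    links:   source resource -> list of target resources
    Each source shares its value equally among its out-links; a resource
    receives the average of these shares over its in-linked resources.
    """
    inlinks = {r: [] for r in initial}
    for src, targets in links.items():
        for dst in targets:
            inlinks[dst].append(src)

    current = dict(initial)
    for _ in range(iterations):
        nxt = {}
        for r in initial:
            shares = [current[s] / len(links[s]) for s in inlinks[r]]
            if shares:
                spread = sum(shares) / len(shares)
                nxt[r] = (1 - damping) * initial[r] + damping * spread
            else:
                # No in-links: the resource keeps its decayed value.
                nxt[r] = initial[r]
        current = nxt
    return current


if __name__ == "__main__":
    # Invented values; node names follow the e4, e5, e6 labels of the example graph.
    initial = {
        "e4": decayed_mb(0.9, now=10, last_access=8),
        "e6": decayed_mb(0.9, now=10, last_access=3),
        "e5": 0.0,
    }
    links = {"e4": ["e5"], "e6": ["e5", "e4"]}
    print(propagate(initial, links))
```

With this rule a resource that was never accessed directly (here `e5`) can still gain buoyancy from recently used neighbours, which is the intuition behind propagating over the resource graph.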
Social Web apps gain popularity
Personal Web archives
Study: Identifying memorable content
• 20 participants, 15 male and 5 female
• Rated 3,330 posts by relevance for the future
Content Retention in Social Web Applications
Year in Review: photo from the Internet
Machine learning techniques
• Support vector machine, Bayesian network, and decision tree (J48)
80 features from categories:
• Content types + meta data
• Social interactions
• Temporal
• Privacy
• Graph
Correlation-based feature selection (CFS)
• Temporal: highest impact features
• Graph: low impact for memorable posts
Learning to Classify Memorable Content
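For readers unfamiliar with correlation-based feature selection, the sketch below computes the standard CFS merit heuristic for a candidate feature subset: subsets score well when their features correlate with the class but not with each other. The slides do not say which CFS implementation or correlation measure was used, so the function name `cfs_merit` and the example correlation values are illustrative assumptions.

```python
from math import sqrt


def cfs_merit(feature_class_corr, feature_feature_corr):
    """CFS merit of a feature subset.

    feature_class_corr:   correlation of each of the k features with the class
    feature_feature_corr: correlations for each pair of features in the subset
    merit = k * r_cf / sqrt(k + k * (k - 1) * r_ff),
    where r_cf and r_ff are the mean absolute correlations.
    """
    k = len(feature_class_corr)
    if k == 0:
        return 0.0
    r_cf = sum(abs(c) for c in feature_class_corr) / k
    if feature_feature_corr:
        r_ff = sum(abs(c) for c in feature_feature_corr) / len(feature_feature_corr)
    else:
        r_ff = 0.0
    return k * r_cf / sqrt(k + k * (k - 1) * r_ff)


if __name__ == "__main__":
    # Invented correlations: two equally predictive features.
    print(cfs_merit([0.6, 0.6], [0.0]))  # uncorrelated with each other
    print(cfs_merit([0.6, 0.6], [1.0]))  # fully redundant
```

Two uncorrelated features score higher than two redundant ones, which is why a small subset of complementary features can outperform a larger baseline set.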
Classification results:
• Baseline features (CS): number of likes, comments, and shares
• Baseline: 69% (F-measure)
• Top 9 features: 79% (F-measure)
Classification Results
1. M. Georgescu, D. D. Pham, N. Kanhabua, S. Zerr, S. Siersdorfer and W. Nejdl, Temporal Summarization of
Event-Related Updates in Wikipedia (demo), Proceedings of the 22nd International World Wide Web Conference
(WWW'13), May, 2013.
2. M. Georgescu, N. Kanhabua, D. Krause, W. Nejdl and S. Siersdorfer, Extracting Event-Related Information from
Article Updates in Wikipedia, Proceedings of the 35th European conference on Advances in Information Retrieval
(ECIR'13), March, 2013.
3. N. Kanhabua and C. Niederée, Preservation and Forgetting: Friends or Foes?, In the First International
Workshop on Archiving Community Memories (in conjunction with iPRES'2013), September, 2013.
4. N. Kanhabua, C. Niederée and W. Siberski, Towards Concise Preservation by Managed Forgetting: Research
Issues and Case Study, Proceedings of the 10th International Conference on Preservation of Digital Objects
(iPRES'2013), September, 2013.
5. K. D. Naini and I.S. Altingovde, Exploiting Result Diversification Methods for Feature Selection in Learning to
Rank, Proceedings of the 36th European conference on Advances in Information Retrieval (ECIR'2014), April, 2014.
6. A. Ceroni and M. Fisichella, Towards an Entity-based Automatic Event Validation, Proceedings of the 36th
European conference on Advances in Information Retrieval (ECIR'2014), April, 2014.
7. T. N. Nguyen and N. Kanhabua, Leveraging Dynamic Query Subtopics for Time-aware Search Result
Diversification, Proceedings of the 36th European conference on Advances in Information Retrieval (ECIR'2014),
April, 2014.
8. K. D. Naini, R. Kawase, N. Kanhabua and C. Niederée, Characterizing High-impact Features for Content
Retention in Social Web Applications (poster), Proceedings of the 23rd International World Wide Web
Conference (WWW'2014), Seoul, Korea, April, 2014.
9. T. A. Tran, M. Georgescu, X. Zhu and N. Kanhabua, Ars longa, vita brevis: Analysing the Duration of Trending
Topics in Twitter Using Wikipedia (poster), (To appear) Proceedings of the ACM Web Science 2014 Conference
(WebSci'2014), Bloomington, USA, June, 2014.
Publications
Thank you for your attention!