mining event histories: some new insights on personal swiss...

145
Mining Event Histories Mining Event Histories: Some New Insights on Personal Swiss Life Courses Gilbert Ritschard Dept of Econometrics and Laboratory of Demography, University of Geneva http://mephisto.unige.ch PaVie Seminar, Lausanne, October 22, 2008 19/11/2008gr 1/95

Upload: others

Post on 16-Oct-2020

0 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Mining Event Histories: Some New Insights on Personal Swiss …mephisto.unige.ch/pub/publications/gr/slides/bm_rits... · 2011. 9. 23. · MiningEventHistories Mytalkisaboutlifecourses,

Mining Event Histories

Mining Event Histories:Some New Insights on Personal Swiss Life Courses

Gilbert Ritschard

Dept of Econometrics and Laboratory of Demography, University of Genevahttp://mephisto.unige.ch

PaVie Seminar, Lausanne, October 22, 2008

19/11/2008gr 1/95

Page 2: Mining Event Histories: Some New Insights on Personal Swiss …mephisto.unige.ch/pub/publications/gr/slides/bm_rits... · 2011. 9. 23. · MiningEventHistories Mytalkisaboutlifecourses,

Mining Event Histories

My talk is about life courses,So, let me start with an example of scientific life course

date event1970-1979 Studies in econometrics1980-1992 Mathematical Economics1985-... Work with Social scientists (Family studies)

Interest in Statistics for social sciences1990-1995 Interest in Neural Networks2000-... KDD and data mining (Clustering, supervised learning)2003-... Work with historians, demographers, psychologists

(longitudinal data)2005-... KDD and Data mining approaches

for analysing life course data2007-... Start a SNF project on “Mining Event Histories”

19/11/2008gr 2/95

Page 3: Mining Event Histories: Some New Insights on Personal Swiss …mephisto.unige.ch/pub/publications/gr/slides/bm_rits... · 2011. 9. 23. · MiningEventHistories Mytalkisaboutlifecourses,

Mining Event Histories

My talk is about life courses,So, let me start with an example of scientific life course

date event1970-1979 Studies in econometrics1980-1992 Mathematical Economics1985-... Work with Social scientists (Family studies)

Interest in Statistics for social sciences1990-1995 Interest in Neural Networks2000-... KDD and data mining (Clustering, supervised learning)2003-... Work with historians, demographers, psychologists

(longitudinal data)2005-... KDD and Data mining approaches

for analysing life course data2007-... Start a SNF project on “Mining Event Histories”

19/11/2008gr 2/95

Page 4: Mining Event Histories: Some New Insights on Personal Swiss …mephisto.unige.ch/pub/publications/gr/slides/bm_rits... · 2011. 9. 23. · MiningEventHistories Mytalkisaboutlifecourses,

Mining Event Histories

My talk is about life courses,So, let me start with an example of scientific life course

date event1970-1979 Studies in econometrics1980-1992 Mathematical Economics1985-... Work with Social scientists (Family studies)

Interest in Statistics for social sciences1990-1995 Interest in Neural Networks2000-... KDD and data mining (Clustering, supervised learning)2003-... Work with historians, demographers, psychologists

(longitudinal data)2005-... KDD and Data mining approaches

for analysing life course data2007-... Start a SNF project on “Mining Event Histories”

19/11/2008gr 2/95

Page 5: Mining Event Histories: Some New Insights on Personal Swiss …mephisto.unige.ch/pub/publications/gr/slides/bm_rits... · 2011. 9. 23. · MiningEventHistories Mytalkisaboutlifecourses,

Mining Event Histories

Outline

1 Sequence Analysis in Social Sciences

2 Survival Trees

3 Characterizing, rendering and clustering sequence data

4 Mining Frequent Episodes

19/11/2008gr 3/95

Page 6: Mining Event Histories: Some New Insights on Personal Swiss …mephisto.unige.ch/pub/publications/gr/slides/bm_rits... · 2011. 9. 23. · MiningEventHistories Mytalkisaboutlifecourses,

Mining Event HistoriesSequence Analysis in Social Sciences

Table of content

1 Sequence Analysis in Social Sciences

2 Survival Trees

3 Characterizing, rendering and clustering sequence data

4 Mining Frequent Episodes

19/11/2008gr 4/95

Page 7: Mining Event Histories: Some New Insights on Personal Swiss …mephisto.unige.ch/pub/publications/gr/slides/bm_rits... · 2011. 9. 23. · MiningEventHistories Mytalkisaboutlifecourses,

Mining Event HistoriesSequence Analysis in Social SciencesMotivation

Section content

1 Sequence Analysis in Social SciencesMotivationWhat kind of data?Issues with life course dataMethods for Longitudinal Data

19/11/2008gr 5/95

Page 8: Mining Event Histories: Some New Insights on Personal Swiss …mephisto.unige.ch/pub/publications/gr/slides/bm_rits... · 2011. 9. 23. · MiningEventHistories Mytalkisaboutlifecourses,

Mining Event HistoriesSequence Analysis in Social SciencesMotivation

Motivation

Individual life course paradigm.Following macro quantities (e.g. #divorces, fertility rate, meaneducation level, ...) over timeinsufficient for understanding social behavior.Need to follow individual life courses.

Data availabilityLarge panel surveys in many countries(SHP, CHER, SILC, GGP, ...)Biographical retrospective surveys (FFS, ...).Statistical matching of censuses, population registers and otheradministrative data.

19/11/2008gr 6/95

Page 9: Mining Event Histories: Some New Insights on Personal Swiss …mephisto.unige.ch/pub/publications/gr/slides/bm_rits... · 2011. 9. 23. · MiningEventHistories Mytalkisaboutlifecourses,

Mining Event HistoriesSequence Analysis in Social SciencesMotivation

Motivation

Individual life course paradigm.Following macro quantities (e.g. #divorces, fertility rate, meaneducation level, ...) over timeinsufficient for understanding social behavior.Need to follow individual life courses.

Data availabilityLarge panel surveys in many countries(SHP, CHER, SILC, GGP, ...)Biographical retrospective surveys (FFS, ...).Statistical matching of censuses, population registers and otheradministrative data.

19/11/2008gr 6/95

Page 10: Mining Event Histories: Some New Insights on Personal Swiss …mephisto.unige.ch/pub/publications/gr/slides/bm_rits... · 2011. 9. 23. · MiningEventHistories Mytalkisaboutlifecourses,

Mining Event HistoriesSequence Analysis in Social SciencesMotivation

Motivation

Need for suited methods for discovering interesting knowledgefrom these individual longitudinal data.Social scientists use

Essentially Survival analysis (Event History Analysis)More rarely sequential data analysis (Optimal Matching,Markov Chain Models)

Could social scientists benefit from data-mining approaches?Which methods?Are there specific issues with those methods for socialscientists?

19/11/2008gr 7/95

Page 11: Mining Event Histories: Some New Insights on Personal Swiss …mephisto.unige.ch/pub/publications/gr/slides/bm_rits... · 2011. 9. 23. · MiningEventHistories Mytalkisaboutlifecourses,

Mining Event HistoriesSequence Analysis in Social SciencesMotivation

Motivation

Need for suited methods for discovering interesting knowledgefrom these individual longitudinal data.Social scientists use

Essentially Survival analysis (Event History Analysis)More rarely sequential data analysis (Optimal Matching,Markov Chain Models)

Could social scientists benefit from data-mining approaches?Which methods?Are there specific issues with those methods for socialscientists?

19/11/2008gr 7/95

Page 12: Mining Event Histories: Some New Insights on Personal Swiss …mephisto.unige.ch/pub/publications/gr/slides/bm_rits... · 2011. 9. 23. · MiningEventHistories Mytalkisaboutlifecourses,

Mining Event HistoriesSequence Analysis in Social SciencesMotivation

Motivation

Need for suited methods for discovering interesting knowledgefrom these individual longitudinal data.Social scientists use

Essentially Survival analysis (Event History Analysis)More rarely sequential data analysis (Optimal Matching,Markov Chain Models)

Could social scientists benefit from data-mining approaches?Which methods?Are there specific issues with those methods for socialscientists?

19/11/2008gr 7/95

Page 13: Mining Event Histories: Some New Insights on Personal Swiss …mephisto.unige.ch/pub/publications/gr/slides/bm_rits... · 2011. 9. 23. · MiningEventHistories Mytalkisaboutlifecourses,

Mining Event HistoriesSequence Analysis in Social SciencesMotivation

Motivation: KD in Social sciences

In KDD (Knowledge discovery in databases) and data mining,focus on prediction and classification.Improve prediction and classification errors.

In Social science, aim is understanding/explaining (social)behaviors.Hence focus is on process rather than output.

19/11/2008gr 8/95

Page 14: Mining Event Histories: Some New Insights on Personal Swiss …mephisto.unige.ch/pub/publications/gr/slides/bm_rits... · 2011. 9. 23. · MiningEventHistories Mytalkisaboutlifecourses,

Mining Event HistoriesSequence Analysis in Social SciencesMotivation

Motivation: KD in Social sciences

In KDD (Knowledge discovery in databases) and data mining,focus on prediction and classification.Improve prediction and classification errors.

In Social science, aim is understanding/explaining (social)behaviors.Hence focus is on process rather than output.

19/11/2008gr 8/95

Page 15: Mining Event Histories: Some New Insights on Personal Swiss …mephisto.unige.ch/pub/publications/gr/slides/bm_rits... · 2011. 9. 23. · MiningEventHistories Mytalkisaboutlifecourses,

Mining Event HistoriesSequence Analysis in Social SciencesWhat kind of data?

Section content

1 Sequence Analysis in Social SciencesMotivationWhat kind of data?Issues with life course dataMethods for Longitudinal Data

19/11/2008gr 9/95

Page 16: Mining Event Histories: Some New Insights on Personal Swiss …mephisto.unige.ch/pub/publications/gr/slides/bm_rits... · 2011. 9. 23. · MiningEventHistories Mytalkisaboutlifecourses,

Mining Event HistoriesSequence Analysis in Social SciencesWhat kind of data?

What kind of data?

What kind of data are we dealing with?Mainly categorical longitudinal data describing life courses

Data can be in different forms ...

19/11/2008gr 10/95

Page 17: Mining Event Histories: Some New Insights on Personal Swiss …mephisto.unige.ch/pub/publications/gr/slides/bm_rits... · 2011. 9. 23. · MiningEventHistories Mytalkisaboutlifecourses,

Mining Event HistoriesSequence Analysis in Social SciencesWhat kind of data?

What kind of data?

What kind of data are we dealing with?Mainly categorical longitudinal data describing life courses

Data can be in different forms ...

19/11/2008gr 10/95

Page 18: Mining Event Histories: Some New Insights on Personal Swiss …mephisto.unige.ch/pub/publications/gr/slides/bm_rits... · 2011. 9. 23. · MiningEventHistories Mytalkisaboutlifecourses,

Mining Event HistoriesSequence Analysis in Social SciencesWhat kind of data?

What kind of data?

What kind of data are we dealing with?Mainly categorical longitudinal data describing life courses

Data can be in different forms ...

19/11/2008gr 10/95

Page 19: Mining Event Histories: Some New Insights on Personal Swiss …mephisto.unige.ch/pub/publications/gr/slides/bm_rits... · 2011. 9. 23. · MiningEventHistories Mytalkisaboutlifecourses,

Mining Event HistoriesSequence Analysis in Social SciencesWhat kind of data?

ontology of longitudinal data (Aristotelean tree)

Longitudinal data

States

one state per time unit t

not

several states at each t

not

not

Events

time stamped events

notevent sequence

not

notspell duration

not

19/11/2008gr 11/95

Page 20: Mining Event Histories: Some New Insights on Personal Swiss …mephisto.unige.ch/pub/publications/gr/slides/bm_rits... · 2011. 9. 23. · MiningEventHistories Mytalkisaboutlifecourses,

Mining Event HistoriesSequence Analysis in Social SciencesWhat kind of data?

Alternative views of Individual Longitudinal Data

Table: Time stamped events, record for Sandra

ending secondary school in 1970 first job in 1971 marriage in 1973

Table: State sequence view, Sandra

year 1969 1970 1971 1972 1973civil status single single single single marriededucation level primary secondary secondary secondary secondaryjob no no first first first

19/11/2008gr 12/95

Page 21: Mining Event Histories: Some New Insights on Personal Swiss …mephisto.unige.ch/pub/publications/gr/slides/bm_rits... · 2011. 9. 23. · MiningEventHistories Mytalkisaboutlifecourses,

Mining Event HistoriesSequence Analysis in Social SciencesWhat kind of data?

Alternative views of Individual Longitudinal Data

Table: Time stamped events, record for Sandra

ending secondary school in 1970 first job in 1971 marriage in 1973

Table: State sequence view, Sandra

year 1969 1970 1971 1972 1973civil status single single single single marriededucation level primary secondary secondary secondary secondaryjob no no first first first

19/11/2008gr 12/95

Page 22: Mining Event Histories: Some New Insights on Personal Swiss …mephisto.unige.ch/pub/publications/gr/slides/bm_rits... · 2011. 9. 23. · MiningEventHistories Mytalkisaboutlifecourses,

Mining Event HistoriesSequence Analysis in Social SciencesWhat kind of data?

Transforming time stamped events into state sequencesExample: the “BioFam” data

Data from the retrospective survey conducted in 2002 by theSwiss Household Panel (SHP)(with support of Federal Statistical Office, Swiss NationalFund for Scientific Research, University of Neuchatel.)Retrospective survey: 5560 individualsRetained familial life events: Leaving Home, First childbirth,First marriage and First divorce.Age 15 to 45 → 2601 remaining individuals, born between1909 et 1957.

19/11/2008gr 13/95

Page 23: Mining Event Histories: Some New Insights on Personal Swiss …mephisto.unige.ch/pub/publications/gr/slides/bm_rits... · 2011. 9. 23. · MiningEventHistories Mytalkisaboutlifecourses,

Mining Event HistoriesSequence Analysis in Social SciencesWhat kind of data?

Transforming time stamped events into state sequencesExample: the “BioFam” data

Data from the retrospective survey conducted in 2002 by theSwiss Household Panel (SHP)(with support of Federal Statistical Office, Swiss NationalFund for Scientific Research, University of Neuchatel.)Retrospective survey: 5560 individualsRetained familial life events: Leaving Home, First childbirth,First marriage and First divorce.Age 15 to 45 → 2601 remaining individuals, born between1909 et 1957.

19/11/2008gr 13/95

Page 24: Mining Event Histories: Some New Insights on Personal Swiss …mephisto.unige.ch/pub/publications/gr/slides/bm_rits... · 2011. 9. 23. · MiningEventHistories Mytalkisaboutlifecourses,

Mining Event HistoriesSequence Analysis in Social SciencesWhat kind of data?

Transforming time stamped events into state sequencesExample: the “BioFam” data

Data from the retrospective survey conducted in 2002 by theSwiss Household Panel (SHP)(with support of Federal Statistical Office, Swiss NationalFund for Scientific Research, University of Neuchatel.)Retrospective survey: 5560 individualsRetained familial life events: Leaving Home, First childbirth,First marriage and First divorce.Age 15 to 45 → 2601 remaining individuals, born between1909 et 1957.

19/11/2008gr 13/95

Page 25: Mining Event Histories: Some New Insights on Personal Swiss …mephisto.unige.ch/pub/publications/gr/slides/bm_rits... · 2011. 9. 23. · MiningEventHistories Mytalkisaboutlifecourses,

Mining Event HistoriesSequence Analysis in Social SciencesWhat kind of data?

Transforming time stamped events into state sequencesExample: the “BioFam” data

Data from the retrospective survey conducted in 2002 by theSwiss Household Panel (SHP)(with support of Federal Statistical Office, Swiss NationalFund for Scientific Research, University of Neuchatel.)Retrospective survey: 5560 individualsRetained familial life events: Leaving Home, First childbirth,First marriage and First divorce.Age 15 to 45 → 2601 remaining individuals, born between1909 et 1957.

19/11/2008gr 13/95

Page 26: Mining Event Histories: Some New Insights on Personal Swiss …mephisto.unige.ch/pub/publications/gr/slides/bm_rits... · 2011. 9. 23. · MiningEventHistories Mytalkisaboutlifecourses,

Mining Event HistoriesSequence Analysis in Social SciencesWhat kind of data?

Creating state sequences

Example of time stamped data:individual LHome marriage childbirth divorce

1 1989 1990 1992 NA

19/11/2008gr 14/95

Page 27: Mining Event Histories: Some New Insights on Personal Swiss …mephisto.unige.ch/pub/publications/gr/slides/bm_rits... · 2011. 9. 23. · MiningEventHistories Mytalkisaboutlifecourses,

Mining Event HistoriesSequence Analysis in Social SciencesWhat kind of data?

Deriving the states

Need one state for each combination of events:

LHome marriage childbirth divorce0 no no no no1 yes no no no2 no yes yes/no no3 yes yes no no4 no no yes no5 yes no yes no6 yes yes yes no7 yes/no yes yes/no yes

19/11/2008gr 15/95

Page 28: Mining Event Histories: Some New Insights on Personal Swiss …mephisto.unige.ch/pub/publications/gr/slides/bm_rits... · 2011. 9. 23. · MiningEventHistories Mytalkisaboutlifecourses,

Mining Event HistoriesSequence Analysis in Social SciencesWhat kind of data?

From events to states

Example of transformation :events:

individual LHome marriage childbirth divorce1 1989 1990 1992 NA

states:

individual ... 1988 1989 1990 1991 1992 1993 ...1 ... 0 0 1 3 3 6 ...

19/11/2008gr 16/95

Page 29: Mining Event Histories: Some New Insights on Personal Swiss …mephisto.unige.ch/pub/publications/gr/slides/bm_rits... · 2011. 9. 23. · MiningEventHistories Mytalkisaboutlifecourses,

Mining Event HistoriesSequence Analysis in Social SciencesIssues with life course data

Section content

1 Sequence Analysis in Social SciencesMotivationWhat kind of data?Issues with life course dataMethods for Longitudinal Data

19/11/2008gr 17/95

Page 30: Mining Event Histories: Some New Insights on Personal Swiss …mephisto.unige.ch/pub/publications/gr/slides/bm_rits... · 2011. 9. 23. · MiningEventHistories Mytalkisaboutlifecourses,

Mining Event HistoriesSequence Analysis in Social SciencesIssues with life course data

Issues with life course data

Incomplete sequencesCensored and truncated data:Cases falling out of observation before experiencing an event ofinterest.Sequences of varying length.

Time varying predictors.Example: When analysing time to divorce, presence of childrenis a time varying predictor.

Data collected by clustersExample: Household panel surveys.Multi-level analysis to account for unobserved sharedcharacteristics of members of a same cluster.

19/11/2008gr 18/95

Page 31: Mining Event Histories: Some New Insights on Personal Swiss …mephisto.unige.ch/pub/publications/gr/slides/bm_rits... · 2011. 9. 23. · MiningEventHistories Mytalkisaboutlifecourses,

Mining Event HistoriesSequence Analysis in Social SciencesIssues with life course data

Issues with life course data

Incomplete sequencesCensored and truncated data:Cases falling out of observation before experiencing an event ofinterest.Sequences of varying length.

Time varying predictors.Example: When analysing time to divorce, presence of childrenis a time varying predictor.

Data collected by clustersExample: Household panel surveys.Multi-level analysis to account for unobserved sharedcharacteristics of members of a same cluster.

19/11/2008gr 18/95

Page 32: Mining Event Histories: Some New Insights on Personal Swiss …mephisto.unige.ch/pub/publications/gr/slides/bm_rits... · 2011. 9. 23. · MiningEventHistories Mytalkisaboutlifecourses,

Mining Event HistoriesSequence Analysis in Social SciencesIssues with life course data

Issues with life course data

Incomplete sequencesCensored and truncated data:Cases falling out of observation before experiencing an event ofinterest.Sequences of varying length.

Time varying predictors.Example: When analysing time to divorce, presence of childrenis a time varying predictor.

Data collected by clustersExample: Household panel surveys.Multi-level analysis to account for unobserved sharedcharacteristics of members of a same cluster.

19/11/2008gr 18/95

Page 33: Mining Event Histories: Some New Insights on Personal Swiss …mephisto.unige.ch/pub/publications/gr/slides/bm_rits... · 2011. 9. 23. · MiningEventHistories Mytalkisaboutlifecourses,

Mining Event HistoriesSequence Analysis in Social SciencesIssues with life course data

Multi-level: Simple linear regression example

y = 3.2 + 0.2 x

y = 6.2 - 0.8 x

y = 15.6 - 0.8 x

y = 12.5 - 0.8 x

0

1

2

3

4

5

6

7

8

9

1 3 5 7 9 11 13 15

Education

Chi

ldre

n

19/11/2008gr 19/95

Page 34: Mining Event Histories: Some New Insights on Personal Swiss …mephisto.unige.ch/pub/publications/gr/slides/bm_rits... · 2011. 9. 23. · MiningEventHistories Mytalkisaboutlifecourses,

Mining Event HistoriesSequence Analysis in Social SciencesMethods for Longitudinal Data

Section content

1 Sequence Analysis in Social SciencesMotivationWhat kind of data?Issues with life course dataMethods for Longitudinal Data

19/11/2008gr 20/95

Page 35: Mining Event Histories: Some New Insights on Personal Swiss …mephisto.unige.ch/pub/publications/gr/slides/bm_rits... · 2011. 9. 23. · MiningEventHistories Mytalkisaboutlifecourses,

Mining Event HistoriesSequence Analysis in Social SciencesMethods for Longitudinal Data

Classical statistical approachesSurvival Approaches

Survival or Event history analysis (Blossfeld and Rohwer, 2002)Focuses on one event.Concerned with duration until event occursor with hazard of experiencing event.

Survival curves: Distribution of duration until event occurs

S(t) = p(T ≥ t) .

Hazard models: Regression like models for S(t, x) or hazardh(t) = p(T = t | T ≥ t)

h(t, x) = g(t, β0 + β1x1 + β2x2(t) + · · ·

).

19/11/2008gr 21/95

Page 36: Mining Event Histories: Some New Insights on Personal Swiss …mephisto.unige.ch/pub/publications/gr/slides/bm_rits... · 2011. 9. 23. · MiningEventHistories Mytalkisaboutlifecourses,

Mining Event HistoriesSequence Analysis in Social SciencesMethods for Longitudinal Data

Classical statistical approachesSurvival Approaches

Survival or Event history analysis (Blossfeld and Rohwer, 2002)Focuses on one event.Concerned with duration until event occursor with hazard of experiencing event.

Survival curves: Distribution of duration until event occurs

S(t) = p(T ≥ t) .

Hazard models: Regression like models for S(t, x) or hazardh(t) = p(T = t | T ≥ t)

h(t, x) = g(t, β0 + β1x1 + β2x2(t) + · · ·

).

19/11/2008gr 21/95

Page 37: Mining Event Histories: Some New Insights on Personal Swiss …mephisto.unige.ch/pub/publications/gr/slides/bm_rits... · 2011. 9. 23. · MiningEventHistories Mytalkisaboutlifecourses,

Mining Event HistoriesSequence Analysis in Social SciencesMethods for Longitudinal Data

Survival curves (Switzerland, SHP 2002 biographical survey)

Women

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

0 10 20 30 40 50 60 70 80

AGE (years)

Surv

ival

pro

babi

lity

Leaving home Marriage 1st Chilbirth Parents' deathLast child left Divorce Widowing19/11/2008gr 22/95

Page 38: Mining Event Histories: Some New Insights on Personal Swiss …mephisto.unige.ch/pub/publications/gr/slides/bm_rits... · 2011. 9. 23. · MiningEventHistories Mytalkisaboutlifecourses,

Mining Event HistoriesSequence Analysis in Social SciencesMethods for Longitudinal Data

Analysis of sequences

Frequencies of given subsequencesEssentially event sequences, e.g. (First job → Marriage).Subsequences considered as categories ⇒ Methods forcategorical data apply (Frequencies, cross tables, log-linearmodels, logistic regression, ...).

Markov chain modelsState sequences.Focuses on transition rates between states.Does the rate also depend on previous states?How many previous states are significant?

Optimal Matching (Abbott and Forrest, 1986) .State sequences.Edit distance (Levenshtein, 1966; Needleman and Wunsch,1970) between pairs of sequences.Clustering of sequences.

19/11/2008gr 23/95

Page 39: Mining Event Histories: Some New Insights on Personal Swiss …mephisto.unige.ch/pub/publications/gr/slides/bm_rits... · 2011. 9. 23. · MiningEventHistories Mytalkisaboutlifecourses,

Mining Event HistoriesSequence Analysis in Social SciencesMethods for Longitudinal Data

Typology of methods for life course data

IssuesQuestions duration/hazard state/event sequencingdescriptive • Survival curves: • Frequencies of given

Parametric patterns(Weibull, Gompertz, ...) • Optimal matching

and non parametric clustering, MDS(Kaplan-Meier, Nelson- • Rendering sequencesAalen) estimators. • Discovering typical

episodescausality • Hazard regression models • Markov models

(Cox, ...) • Mobility trees• Survival trees • Discriminating episodes

• Sequence HeterogeneityAnalysis (Anova)

19/11/2008gr 24/95

Page 40: Mining Event Histories: Some New Insights on Personal Swiss …mephisto.unige.ch/pub/publications/gr/slides/bm_rits... · 2011. 9. 23. · MiningEventHistories Mytalkisaboutlifecourses,

Mining Event HistoriesSurvival Trees

Table of content

1 Sequence Analysis in Social Sciences

2 Survival Trees

3 Characterizing, rendering and clustering sequence data

4 Mining Frequent Episodes

19/11/2008gr 25/95

Page 41: Mining Event Histories: Some New Insights on Personal Swiss …mephisto.unige.ch/pub/publications/gr/slides/bm_rits... · 2011. 9. 23. · MiningEventHistories Mytalkisaboutlifecourses,

Mining Event HistoriesSurvival TreesMarriage survival, SHP biographical data

Section content

2 Survival TreesMarriage survival, SHP biographical dataSurvival Tree PrincipleExampleSocial Science Issues

19/11/2008gr 26/95

Page 42: Mining Event Histories: Some New Insights on Personal Swiss …mephisto.unige.ch/pub/publications/gr/slides/bm_rits... · 2011. 9. 23. · MiningEventHistories Mytalkisaboutlifecourses,

Mining Event HistoriesSurvival TreesMarriage survival, SHP biographical data

SHP biographical retrospective surveyhttp://www.swisspanel.ch

SHP retrospective survey: 2001 (860) and 2002 (4700 cases).We consider only data collected in 2002.Data completed with variables from 2002 wave (language).

Characteristics of retained data for divorce(individuals who get married at least once)

men women TotalTotal 1414 1656 30701st marriage dissolution 231 308 539

16.3% 18.6% 17.6%

19/11/2008gr 27/95

Page 43: Mining Event Histories: Some New Insights on Personal Swiss …mephisto.unige.ch/pub/publications/gr/slides/bm_rits... · 2011. 9. 23. · MiningEventHistories Mytalkisaboutlifecourses,

Mining Event HistoriesSurvival TreesMarriage survival, SHP biographical data

SHP biographical retrospective surveyhttp://www.swisspanel.ch

SHP retrospective survey: 2001 (860) and 2002 (4700 cases).We consider only data collected in 2002.Data completed with variables from 2002 wave (language).

Characteristics of retained data for divorce(individuals who get married at least once)

men women TotalTotal 1414 1656 30701st marriage dissolution 231 308 539

16.3% 18.6% 17.6%

19/11/2008gr 27/95

Page 44: Mining Event Histories: Some New Insights on Personal Swiss …mephisto.unige.ch/pub/publications/gr/slides/bm_rits... · 2011. 9. 23. · MiningEventHistories Mytalkisaboutlifecourses,

Mining Event HistoriesSurvival TreesMarriage survival, SHP biographical data

Distribution by birth cohortBirth year

year

Fre

quen

cy

1910 1920 1930 1940 1950 1960

010

020

030

040

050

0

19/11/2008gr 28/95

Page 45: Mining Event Histories: Some New Insights on Personal Swiss …mephisto.unige.ch/pub/publications/gr/slides/bm_rits... · 2011. 9. 23. · MiningEventHistories Mytalkisaboutlifecourses,

Mining Event HistoriesSurvival TreesMarriage survival, SHP biographical data

Marriage duration until divorceSurvival curves

0 8

0.85

0.9

0.95

1

vie

0.5

0.55

0.6

0.65

0.7

0.75

0.8

0 10 20 30 40

prob

. de

surv

Durée du mariage, Femmes

1942 et avant

1943-1952

1953 et après

0 8

0.85

0.9

0.95

1

vie

0.5

0.55

0.6

0.65

0.7

0.75

0.8

0 10 20 30 40pr

ob. d

e su

rvDurée du mariage, Hommes

1942 et avant

1943-1952

1953 et après

0 8

0.85

0.9

0.95

1

vie

0.5

0.55

0.6

0.65

0.7

0.75

0.8

0 10 20 30 40

prob

. de

surv

Durée du mariage, Femmes

1942 et avant

1943-1952

1953 et après

19/11/2008gr 29/95

Page 46: Mining Event Histories: Some New Insights on Personal Swiss …mephisto.unige.ch/pub/publications/gr/slides/bm_rits... · 2011. 9. 23. · MiningEventHistories Mytalkisaboutlifecourses,

Mining Event HistoriesSurvival TreesMarriage survival, SHP biographical data

Marriage duration until divorceHazard model

Discrete time model (logistic regression on person-year data)exp(B) gives the Odds Ratio, i.e. change in the odd h/(1− h)when covariate increased by 1 unit.

exp(B) Sig.birthyr 1.0088 0.002university 1.22 0.043child 0.73 0.000language unknwn 1.47 0.000

French 1.26 0.007German 1 refItalian 0.89 0.537

Constant 0.0000000004 0.000

19/11/2008gr 30/95

Page 47: Mining Event Histories: Some New Insights on Personal Swiss …mephisto.unige.ch/pub/publications/gr/slides/bm_rits... · 2011. 9. 23. · MiningEventHistories Mytalkisaboutlifecourses,

Mining Event HistoriesSurvival TreesSurvival Tree Principle

Section content

2 Survival TreesMarriage survival, SHP biographical dataSurvival Tree PrincipleExampleSocial Science Issues

19/11/2008gr 31/95

Page 48: Mining Event Histories: Some New Insights on Personal Swiss …mephisto.unige.ch/pub/publications/gr/slides/bm_rits... · 2011. 9. 23. · MiningEventHistories Mytalkisaboutlifecourses,

Mining Event HistoriesSurvival TreesSurvival Tree Principle

Survival trees: Principle

Target is survival curve or some other survival characteristic.Aim: Partition data set into groups thatdiffer as much as possible (max between class variability)

Example: Segal (1988) maximizes difference in KM survivalcurves by selecting split with smallest p-value of Tarone-WareChi-square statistics

TW =∑

i

wi

(di1 − E(Di )

)(w2

i var(Di ))1/2

are as homogeneous as possible (min within class variability)Example: Leblanc and Crowley (1992) maximize gain indeviance (-log-likelihood) of relative risk estimates.

19/11/2008gr 32/95

Page 49: Mining Event Histories: Some New Insights on Personal Swiss …mephisto.unige.ch/pub/publications/gr/slides/bm_rits... · 2011. 9. 23. · MiningEventHistories Mytalkisaboutlifecourses,

Mining Event HistoriesSurvival TreesSurvival Tree Principle

Survival trees: Principle

Target is survival curve or some other survival characteristic.Aim: Partition data set into groups thatdiffer as much as possible (max between class variability)

Example: Segal (1988) maximizes difference in KM survivalcurves by selecting split with smallest p-value of Tarone-WareChi-square statistics

TW =∑

i

wi

(di1 − E(Di )

)(w2

i var(Di ))1/2

are as homogeneous as possible (min within class variability)Example: Leblanc and Crowley (1992) maximize gain indeviance (-log-likelihood) of relative risk estimates.

19/11/2008gr 32/95

Page 50: Mining Event Histories: Some New Insights on Personal Swiss …mephisto.unige.ch/pub/publications/gr/slides/bm_rits... · 2011. 9. 23. · MiningEventHistories Mytalkisaboutlifecourses,

Mining Event HistoriesSurvival TreesSurvival Tree Principle

Survival trees: Principle

Target is survival curve or some other survival characteristic.Aim: Partition data set into groups thatdiffer as much as possible (max between class variability)

Example: Segal (1988) maximizes difference in KM survivalcurves by selecting split with smallest p-value of Tarone-WareChi-square statistics

TW =∑

i

wi

(di1 − E(Di )

)(w2

i var(Di ))1/2

are as homogeneous as possible (min within class variability)Example: Leblanc and Crowley (1992) maximize gain indeviance (-log-likelihood) of relative risk estimates.

19/11/2008gr 32/95

Page 51: Mining Event Histories: Some New Insights on Personal Swiss …mephisto.unige.ch/pub/publications/gr/slides/bm_rits... · 2011. 9. 23. · MiningEventHistories Mytalkisaboutlifecourses,

Mining Event HistoriesSurvival TreesExample

Section content

2 Survival TreesMarriage survival, SHP biographical dataSurvival Tree PrincipleExampleSocial Science Issues

19/11/2008gr 33/95

Page 52: Mining Event Histories: Some New Insights on Personal Swiss …mephisto.unige.ch/pub/publications/gr/slides/bm_rits... · 2011. 9. 23. · MiningEventHistories Mytalkisaboutlifecourses,

Mining Event HistoriesSurvival TreesExample

Divorce, Switzerland, Differences in KM Survival Curves I

Zoom

� � � � � � � � � � �

� � � � � � � � � � � � � �

� � � � � � � � � �

� � � � � � � � � � � �

� � � � � � � � � � � � � � �

� � � � � � � � �

� � � � � � � � � � � �

� � � � � � � �� � � � � � � � �

� � � � � � � � � �

� � � � � � � � � � � �

� � � � � � � �� � � � � � � � �

� � � � � � � � � �

� � � � � � � � � � � �

� � � � � � � �� � � � � � � � �

� � � � � � � � �

� � � � � � � � � � � �

� � � � � � � �� � � � � � �

� � � � � � � � � �

� � �

� � � � � � � � � � � �

� � � � � �� � � � � � � � � �

� � � � � � � � � � �

� � � � ! " " #$ � "

� � � � � � � � � � � �

� � � � � � �� � � � � � � � �

� � � � � � � � � �

% & � � � � � � � # � � � ' � � � # � � � �

� � � � � � � �

% & � � � � � � � � # � � ' � � � # � � � � % & � � � � � � � # � � � ' � � � # � � � �

� � � � � � � � � � � �

� � � � � � �� � � � � � � �

� � � � � � � � � �

� � � � � � � � � � �

� � � � � � � � �� � � � � � � � � �

� � � � � � � � � �

� � � � � � � � ( � ) � *� � � � � � � � � �

� � � � � � � �

% & � � � � � � � # � � � � ' � � � # � � � �

� � � � � � � � � � �

� � � � � � � � �� � � � � � � �

� � � � � � � � �

� � � � � � � � � � � �

� � � � � � �� � � � � � �

� � � � � � � � �

$ � "� �

� � � � � � � �

% & � � � � � � � # � � � ' � � � # � � �

� � � � � � � � � � � �

� � � � � � �� � � � � � � �

� � � � � � � � � �

� � � � � � � � � � � �

� � � � � � � �� � � � � � � � �

� � � � � � � � � �

$ � "� �

� � � � � � � �

% & � � � � � � � # � � � � ' � � � # � � �

+ � + �

+

+ � + + � + �19/11/2008gr 34/95

Page 53: Mining Event Histories: Some New Insights on Personal Swiss …mephisto.unige.ch/pub/publications/gr/slides/bm_rits... · 2011. 9. 23. · MiningEventHistories Mytalkisaboutlifecourses,

Mining Event HistoriesSurvival TreesExample

Divorce, Switzerland, Differences in KM Survival Curves II

0 10 20 30 40

0.5

0.6

0.7

0.8

0.9

1.0

Cohort <=1940 & Non French Speaking & University

Cohort <=1940 & Non French Speaking & < University

Cohort <=1940 & French Speaking

Cohort > 1940 & No Child & University

Cohort > 1940 & No Child & < University

Cohort > 1940 & Child & German or Italian Speaking

Cohort > 1940 & Child & French or Unknown Speaking

19/11/2008gr 35/95

Page 54: Mining Event Histories: Some New Insights on Personal Swiss …mephisto.unige.ch/pub/publications/gr/slides/bm_rits... · 2011. 9. 23. · MiningEventHistories Mytalkisaboutlifecourses,

Mining Event HistoriesSurvival TreesExample

Divorce, Switzerland, Relative risk

� � � � � � �

� � � � � � � � � � � � � � �

� � � � � �

� � � � � � � � � � � � � �

� � � �

� � � � � � � � � �

� � � � �� � � � � � � � �

� � � � � � � � � � �

� � � � � � � � �� �

� � � � � � � � � �

� � � � � � � �

� � � � � � � � � � � � � � � � � � � � �

� � � � �

� � � � � � � � � � � � � �

� � � � � � � �

� � � � � � � � � � � � � �

� � � � � � �

� � � � � � � � � � � � � �

� � � � � � �

� � � � � � � � � � � � �

� � � � � � �

� � � � � � � � � � � � � �

19/11/2008gr 36/95

Page 55: Mining Event Histories: Some New Insights on Personal Swiss …mephisto.unige.ch/pub/publications/gr/slides/bm_rits... · 2011. 9. 23. · MiningEventHistories Mytalkisaboutlifecourses,

Mining Event HistoriesSurvival TreesExample

Hazard model with interaction

Adding interaction effects detected with the tree approachimproves significantly the fit (sig ∆χ2 = 0.004)

exp(B) Sig.born after 1940 1.78 0.000university 1.22 0.049child 0.94 0.619language unknwn 1.50 0.000

French 1.12 0.282German 1 refItalian 0.92 0.677

b_before_40*French 1.46 0.028b_after_40*child 0.68 0.010

Constant 0.008 0.00019/11/2008gr 37/95

Page 56: Mining Event Histories: Some New Insights on Personal Swiss …mephisto.unige.ch/pub/publications/gr/slides/bm_rits... · 2011. 9. 23. · MiningEventHistories Mytalkisaboutlifecourses,

Mining Event HistoriesSurvival TreesSocial Science Issues

Section content

2 Survival TreesMarriage survival, SHP biographical dataSurvival Tree PrincipleExampleSocial Science Issues

19/11/2008gr 38/95

Page 57: Mining Event Histories: Some New Insights on Personal Swiss …mephisto.unige.ch/pub/publications/gr/slides/bm_rits... · 2011. 9. 23. · MiningEventHistories Mytalkisaboutlifecourses,

Mining Event HistoriesSurvival TreesSocial Science Issues

Issues with survival trees in social sciences

1 Dealing with time varying predictorsSegal (1992) discusses few possibilities, none being reallysatisfactory.Huang et al. (1998) propose a piecewise constant approachsuitable for discrete variables and limited number of changes.Room for development ...

2 Multi-level analysisHow can we account for multi-level effects in survival trees,and more generally in trees?Conjecture: Should be possible to include unobserved sharedeffect in deviance-based splitting criteria.

19/11/2008gr 39/95

Page 58: Mining Event Histories: Some New Insights on Personal Swiss …mephisto.unige.ch/pub/publications/gr/slides/bm_rits... · 2011. 9. 23. · MiningEventHistories Mytalkisaboutlifecourses,

Mining Event HistoriesSurvival TreesSocial Science Issues

Issues with survival trees in social sciences

1 Dealing with time varying predictorsSegal (1992) discusses few possibilities, none being reallysatisfactory.Huang et al. (1998) propose a piecewise constant approachsuitable for discrete variables and limited number of changes.Room for development ...

2 Multi-level analysisHow can we account for multi-level effects in survival trees,and more generally in trees?Conjecture: Should be possible to include unobserved sharedeffect in deviance-based splitting criteria.

19/11/2008gr 39/95

Page 59: Mining Event Histories: Some New Insights on Personal Swiss …mephisto.unige.ch/pub/publications/gr/slides/bm_rits... · 2011. 9. 23. · MiningEventHistories Mytalkisaboutlifecourses,

Mining Event HistoriesCharacterizing, rendering and clustering sequence data

Table of content

1 Sequence Analysis in Social Sciences

2 Survival Trees

3 Characterizing, rendering and clustering sequence data

4 Mining Frequent Episodes

19/11/2008gr 40/95

Page 60: Mining Event Histories: Some New Insights on Personal Swiss …mephisto.unige.ch/pub/publications/gr/slides/bm_rits... · 2011. 9. 23. · MiningEventHistories Mytalkisaboutlifecourses,

Mining Event HistoriesCharacterizing, rendering and clustering sequence dataLife trajectories

Section content

3 Characterizing, rendering and clustering sequence dataLife trajectoriesCharacterizing sets of sequences

EntropyLongitudinal within sequence entropies

Distances between sequences: ClusteringHeterogeneity analysis and sequence discriminationMultidimensional Scaling representation of sequences

19/11/2008gr 41/95

Page 61: Mining Event Histories: Some New Insights on Personal Swiss …mephisto.unige.ch/pub/publications/gr/slides/bm_rits... · 2011. 9. 23. · MiningEventHistories Mytalkisaboutlifecourses,

Mining Event HistoriesCharacterizing, rendering and clustering sequence dataLife trajectories

Sequence analysis

Survival approaches not useful in a unitary (holistic)perspective of the whole life course.Sequence analysis of whole collection of life events bettersuited for such holistic approach (Billari, 2005).

Rendering sequencesColorize your life courses

Results from the analysis of the retrospective Swiss HouseholdPanel (SHP) survey.Focus on visualization of life course data.

19/11/2008gr 42/95

Page 62: Mining Event Histories: Some New Insights on Personal Swiss …mephisto.unige.ch/pub/publications/gr/slides/bm_rits... · 2011. 9. 23. · MiningEventHistories Mytalkisaboutlifecourses,

Mining Event HistoriesCharacterizing, rendering and clustering sequence dataLife trajectories

Sequence analysis

Survival approaches not useful in a unitary (holistic)perspective of the whole life course.Sequence analysis of whole collection of life events bettersuited for such holistic approach (Billari, 2005).

Rendering sequencesColorize your life courses

Results from the analysis of the retrospective Swiss HouseholdPanel (SHP) survey.Focus on visualization of life course data.

19/11/2008gr 42/95

Page 63: Mining Event Histories: Some New Insights on Personal Swiss …mephisto.unige.ch/pub/publications/gr/slides/bm_rits... · 2011. 9. 23. · MiningEventHistories Mytalkisaboutlifecourses,

Mining Event HistoriesCharacterizing, rendering and clustering sequence dataLife trajectories

Sequence analysis

Survival approaches not useful in a unitary (holistic)perspective of the whole life course.Sequence analysis of whole collection of life events bettersuited for such holistic approach (Billari, 2005).

Rendering sequencesColorize your life courses

Results from the analysis of the retrospective Swiss HouseholdPanel (SHP) survey.Focus on visualization of life course data.

19/11/2008gr 42/95

Page 64: Mining Event Histories: Some New Insights on Personal Swiss …mephisto.unige.ch/pub/publications/gr/slides/bm_rits... · 2011. 9. 23. · MiningEventHistories Mytalkisaboutlifecourses,

Mining Event HistoriesCharacterizing, rendering and clustering sequence dataLife trajectories

Sequence analysis

Survival approaches not useful in a unitary (holistic)perspective of the whole life course.Sequence analysis of whole collection of life events bettersuited for such holistic approach (Billari, 2005).

Rendering sequencesColorize your life courses

Results from the analysis of the retrospective Swiss HouseholdPanel (SHP) survey.Focus on visualization of life course data.

19/11/2008gr 42/95

Page 65: Mining Event Histories: Some New Insights on Personal Swiss …mephisto.unige.ch/pub/publications/gr/slides/bm_rits... · 2011. 9. 23. · MiningEventHistories Mytalkisaboutlifecourses,

Mining Event HistoriesCharacterizing, rendering and clustering sequence dataLife trajectories

Evolution tendencies in familial life course trajectories

Sequence analysis techniques permit to test hypotheses aboutevolution in these familial life trajectories. (Elzinga and Liefbroer,2007):

De-standardization: Some states and events of familial life areshared by decreasing proportions of the population, occur atmore dispersed ages and their duration is also more scattered.De-institutionalization: Social and temporal organization oflife courses becomes less driven by normative, legal orinstitutional rules.Differentiation: Number of distinct steps lived by individualincreases.

19/11/2008gr 43/95

Page 66: Mining Event Histories: Some New Insights on Personal Swiss …mephisto.unige.ch/pub/publications/gr/slides/bm_rits... · 2011. 9. 23. · MiningEventHistories Mytalkisaboutlifecourses,

Mining Event HistoriesCharacterizing, rendering and clustering sequence dataLife trajectories

Evolution tendencies in familial life course trajectories

Sequence analysis techniques permit to test hypotheses aboutevolution in these familial life trajectories. (Elzinga and Liefbroer,2007):

De-standardization: Some states and events of familial life areshared by decreasing proportions of the population, occur atmore dispersed ages and their duration is also more scattered.De-institutionalization: Social and temporal organization oflife courses becomes less driven by normative, legal orinstitutional rules.Differentiation: Number of distinct steps lived by individualincreases.

19/11/2008gr 43/95

Page 67: Mining Event Histories: Some New Insights on Personal Swiss …mephisto.unige.ch/pub/publications/gr/slides/bm_rits... · 2011. 9. 23. · MiningEventHistories Mytalkisaboutlifecourses,

Mining Event HistoriesCharacterizing, rendering and clustering sequence dataLife trajectories

Evolution tendencies in familial life course trajectories

Sequence analysis techniques permit to test hypotheses aboutevolution in these familial life trajectories. (Elzinga and Liefbroer,2007):

De-standardization: Some states and events of familial life areshared by decreasing proportions of the population, occur atmore dispersed ages and their duration is also more scattered.De-institutionalization: Social and temporal organization oflife courses becomes less driven by normative, legal orinstitutional rules.Differentiation: Number of distinct steps lived by individualincreases.

19/11/2008gr 43/95

Page 68: Mining Event Histories: Some New Insights on Personal Swiss …mephisto.unige.ch/pub/publications/gr/slides/bm_rits... · 2011. 9. 23. · MiningEventHistories Mytalkisaboutlifecourses,

Mining Event HistoriesCharacterizing, rendering and clustering sequence dataCharacterizing sets of sequences

Section content

3 Characterizing, rendering and clustering sequence dataLife trajectoriesCharacterizing sets of sequences

EntropyLongitudinal within sequence entropies

Distances between sequences: ClusteringHeterogeneity analysis and sequence discriminationMultidimensional Scaling representation of sequences

19/11/2008gr 44/95

Page 69: Mining Event Histories: Some New Insights on Personal Swiss …mephisto.unige.ch/pub/publications/gr/slides/bm_rits... · 2011. 9. 23. · MiningEventHistories Mytalkisaboutlifecourses,

Mining Event HistoriesCharacterizing, rendering and clustering sequence dataCharacterizing sets of sequences

Characterizing sets of sequences

Sequence of transversal measures (between entropy, ...)id t1 t2 t3 · · ·1 B B D · · ·2 A B C · · ·3 B B A · · ·

Summary of longitudinal measures (sequence entropy, ...)id t1 t2 t3 · · ·1 B B D · · ·2 A B C · · ·3 B B A · · ·

Other global characteristics: Central sequence, Sequencediversity, ...

19/11/2008gr 45/95

Page 70: Mining Event Histories: Some New Insights on Personal Swiss …mephisto.unige.ch/pub/publications/gr/slides/bm_rits... · 2011. 9. 23. · MiningEventHistories Mytalkisaboutlifecourses,

Mining Event HistoriesCharacterizing, rendering and clustering sequence dataCharacterizing sets of sequences

Characterizing sets of sequences

Sequence of transversal measures (between entropy, ...)id t1 t2 t3 · · ·1 B B D · · ·2 A B C · · ·3 B B A · · ·

Summary of longitudinal measures (sequence entropy, ...)id t1 t2 t3 · · ·1 B B D · · ·2 A B C · · ·3 B B A · · ·

Other global characteristics: Central sequence, Sequencediversity, ...

19/11/2008gr 45/95

Page 71: Mining Event Histories: Some New Insights on Personal Swiss …mephisto.unige.ch/pub/publications/gr/slides/bm_rits... · 2011. 9. 23. · MiningEventHistories Mytalkisaboutlifecourses,

Mining Event HistoriesCharacterizing, rendering and clustering sequence dataCharacterizing sets of sequences

Characterizing sets of sequences

Sequence of transversal measures (between entropy, ...)id t1 t2 t3 · · ·1 B B D · · ·2 A B C · · ·3 B B A · · ·

Summary of longitudinal measures (sequence entropy, ...)id t1 t2 t3 · · ·1 B B D · · ·2 A B C · · ·3 B B A · · ·

Other global characteristics: Central sequence, Sequencediversity, ...

19/11/2008gr 45/95

Page 72: Mining Event Histories: Some New Insights on Personal Swiss …mephisto.unige.ch/pub/publications/gr/slides/bm_rits... · 2011. 9. 23. · MiningEventHistories Mytalkisaboutlifecourses,

Mining Event HistoriesCharacterizing, rendering and clustering sequence dataCharacterizing sets of sequences

Entropy

Entropy: Measure of uncertainty regarding state predictability.

pi , proportion of cases (or time points) in state i .Shannon h(p) =

∑i −pi log2(pi )

Other types of entropies: Quadratic (Gini), Daroczy, ...Two ways of using entropies.

(Transversal) entropy of the state at each time (age) point:Entropy increases with diversity of states observed at eachtime point (age).(Longitudinal) entropy of each individual sequences: Entropyincreases with diversity of states during the observed lifecourse and varies with the time spend in each state.

19/11/2008gr 46/95

Page 73: Mining Event Histories: Some New Insights on Personal Swiss …mephisto.unige.ch/pub/publications/gr/slides/bm_rits... · 2011. 9. 23. · MiningEventHistories Mytalkisaboutlifecourses,

Mining Event HistoriesCharacterizing, rendering and clustering sequence dataCharacterizing sets of sequences

Entropy

Entropy: Measure of uncertainty regarding state predictability.

pi , proportion of cases (or time points) in state i .Shannon h(p) =

∑i −pi log2(pi )

Other types of entropies: Quadratic (Gini), Daroczy, ...Two ways of using entropies.

(Transversal) entropy of the state at each time (age) point:Entropy increases with diversity of states observed at eachtime point (age).(Longitudinal) entropy of each individual sequences: Entropyincreases with diversity of states during the observed lifecourse and varies with the time spend in each state.

19/11/2008gr 46/95

Page 74: Mining Event Histories: Some New Insights on Personal Swiss …mephisto.unige.ch/pub/publications/gr/slides/bm_rits... · 2011. 9. 23. · MiningEventHistories Mytalkisaboutlifecourses,

Mining Event HistoriesCharacterizing, rendering and clustering sequence dataCharacterizing sets of sequences

Entropy

Entropy: Measure of uncertainty regarding state predictability.

pi , proportion of cases (or time points) in state i .Shannon h(p) =

∑i −pi log2(pi )

Other types of entropies: Quadratic (Gini), Daroczy, ...Two ways of using entropies.

(Transversal) entropy of the state at each time (age) point:Entropy increases with diversity of states observed at eachtime point (age).(Longitudinal) entropy of each individual sequences: Entropyincreases with diversity of states during the observed lifecourse and varies with the time spend in each state.

19/11/2008gr 46/95

Page 75: Mining Event Histories: Some New Insights on Personal Swiss …mephisto.unige.ch/pub/publications/gr/slides/bm_rits... · 2011. 9. 23. · MiningEventHistories Mytalkisaboutlifecourses,

Mining Event HistoriesCharacterizing, rendering and clustering sequence dataCharacterizing sets of sequences

Illustrative data

Data from the 2002 SHP biographical surveyInterested in relationship between

Cohabitational trajectories (10 states)Professional trajectories (8 states)

We use the coding retained by Gauthier (2007)Focus on ages 20 to 45 (sequence length = 26 years)1503 cases (751 women, 752 men)

19/11/2008gr 47/95

Page 76: Mining Event Histories: Some New Insights on Personal Swiss …mephisto.unige.ch/pub/publications/gr/slides/bm_rits... · 2011. 9. 23. · MiningEventHistories Mytalkisaboutlifecourses,

Mining Event HistoriesCharacterizing, rendering and clustering sequence dataCharacterizing sets of sequences

Transversal entropy at each time (age) point

Living Arrangement Trajectories

Age

Ent

ropy

A20 A23 A26 A29 A32 A35 A38 A41 A44

0.3

0.4

0.5

0.6

0.7

0.8

Professional Trajectories

Age

Ent

ropy

A20 A23 A26 A29 A32 A35 A38 A41 A44

0.3

0.4

0.5

0.6

0.7

0.8 1910−1940 1941−1950 1951−1957

19/11/2008gr 48/95

Page 77: Mining Event Histories: Some New Insights on Personal Swiss …mephisto.unige.ch/pub/publications/gr/slides/bm_rits... · 2011. 9. 23. · MiningEventHistories Mytalkisaboutlifecourses,

Mining Event HistoriesCharacterizing, rendering and clustering sequence dataCharacterizing sets of sequences

Transversal entropy at each time (age) point

Men : Living Arrangement Trajectories

Age

Ent

ropy

A20 A23 A26 A29 A32 A35 A38 A41 A44

0.3

0.4

0.5

0.6

0.7

0.8

Women : Living Arrangement Trajectories

Age

Ent

ropy

A20 A23 A26 A29 A32 A35 A38 A41 A44

0.3

0.4

0.5

0.6

0.7

0.8 1910−1940 1941−1950 1951−1957

19/11/2008gr 49/95

Page 78: Mining Event Histories: Some New Insights on Personal Swiss …mephisto.unige.ch/pub/publications/gr/slides/bm_rits... · 2011. 9. 23. · MiningEventHistories Mytalkisaboutlifecourses,

Mining Event HistoriesCharacterizing, rendering and clustering sequence dataCharacterizing sets of sequences

Transversal entropy at each time (age) point

Men: Professional Trajectories

Age

Ent

ropy

A20 A23 A26 A29 A32 A35 A38 A41 A44

0.2

0.3

0.4

0.5

0.6

0.7

0.8

Women: Professional Trajectories

Age

Ent

ropy

A20 A23 A26 A29 A32 A35 A38 A41 A44

0.2

0.3

0.4

0.5

0.6

0.7

0.8 1910−1940 1941−1950 1951−1957

19/11/2008gr 50/95

Page 79: Mining Event Histories: Some New Insights on Personal Swiss …mephisto.unige.ch/pub/publications/gr/slides/bm_rits... · 2011. 9. 23. · MiningEventHistories Mytalkisaboutlifecourses,

Mining Event HistoriesCharacterizing, rendering and clustering sequence dataCharacterizing sets of sequences

Hypothesis about longitudinal entropies

Cohabitational and professional life trajectoriesbecome less stablemore diversified

Their entropy tends to increase for younger generations.Are increases in professional trajectories related to increases incohabitational trajectories?

19/11/2008gr 51/95

Page 80: Mining Event Histories: Some New Insights on Personal Swiss …mephisto.unige.ch/pub/publications/gr/slides/bm_rits... · 2011. 9. 23. · MiningEventHistories Mytalkisaboutlifecourses,

Mining Event HistoriesCharacterizing, rendering and clustering sequence dataCharacterizing sets of sequences

Hypothesis about longitudinal entropies

Cohabitational and professional life trajectoriesbecome less stablemore diversified

Their entropy tends to increase for younger generations.Are increases in professional trajectories related to increases incohabitational trajectories?

19/11/2008gr 51/95

Page 81: Mining Event Histories: Some New Insights on Personal Swiss …mephisto.unige.ch/pub/publications/gr/slides/bm_rits... · 2011. 9. 23. · MiningEventHistories Mytalkisaboutlifecourses,

Mining Event HistoriesCharacterizing, rendering and clustering sequence dataCharacterizing sets of sequences

Hypothesis about longitudinal entropies

Cohabitational and professional life trajectoriesbecome less stablemore diversified

Their entropy tends to increase for younger generations.Are increases in professional trajectories related to increases incohabitational trajectories?

19/11/2008gr 51/95

Page 82: Mining Event Histories: Some New Insights on Personal Swiss …mephisto.unige.ch/pub/publications/gr/slides/bm_rits... · 2011. 9. 23. · MiningEventHistories Mytalkisaboutlifecourses,

Mining Event HistoriesCharacterizing, rendering and clustering sequence dataCharacterizing sets of sequences

Entropy of cohabitational trajectories

●●●●●●●●●●●●●●

●●

●●●●●●●

1910−1940 1941−1950 1951−1957

0.0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

Men: Living Arrangment Trajectories

1910−1940 1941−1950 1951−19570.

00.

10.

20.

30.

40.

50.

60.

7

Women: Living Arrangment Trajectories

p(F > f ) = .000∗∗∗ p(F > f ) = .073∗all 2 by 2 differences significant coh3 significantly (.02) different from coh1

19/11/2008gr 52/95

Page 83: Mining Event Histories: Some New Insights on Personal Swiss …mephisto.unige.ch/pub/publications/gr/slides/bm_rits... · 2011. 9. 23. · MiningEventHistories Mytalkisaboutlifecourses,

Mining Event HistoriesCharacterizing, rendering and clustering sequence dataCharacterizing sets of sequences

Entropy of professional trajectories

●●

1910−1940 1941−1950 1951−1957

0.0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

Men: Professional Trajectories

1910−1940 1941−1950 1951−19570.

00.

10.

20.

30.

40.

50.

60.

7

Women: Professional Trajectories

p(F > f ) = .002∗∗∗ p(F > f ) = .001∗∗∗coh3 not significantly different from coh2

19/11/2008gr 53/95

Page 84: Mining Event Histories: Some New Insights on Personal Swiss …mephisto.unige.ch/pub/publications/gr/slides/bm_rits... · 2011. 9. 23. · MiningEventHistories Mytalkisaboutlifecourses,

Mining Event HistoriesCharacterizing, rendering and clustering sequence dataCharacterizing sets of sequences

Correlation between cohabitational and professional entropies

Overall Men Women1910-1940 0.08 ∗ 0.11 ∗ 0.19 ∗∗∗1941-1950 0.12 ∗∗ 0.14 ∗∗ 0.30 ∗∗∗1951-1957 0.15 ∗∗ 0.25 ∗∗∗ 0.31 ∗∗∗

19/11/2008gr 54/95

Page 85: Mining Event Histories: Some New Insights on Personal Swiss …mephisto.unige.ch/pub/publications/gr/slides/bm_rits... · 2011. 9. 23. · MiningEventHistories Mytalkisaboutlifecourses,

Mining Event HistoriesCharacterizing, rendering and clustering sequence dataDistances between sequences: Clustering

Section content

3 Characterizing, rendering and clustering sequence dataLife trajectoriesCharacterizing sets of sequences

EntropyLongitudinal within sequence entropies

Distances between sequences: ClusteringHeterogeneity analysis and sequence discriminationMultidimensional Scaling representation of sequences

19/11/2008gr 55/95

Page 86: Mining Event Histories: Some New Insights on Personal Swiss …mephisto.unige.ch/pub/publications/gr/slides/bm_rits... · 2011. 9. 23. · MiningEventHistories Mytalkisaboutlifecourses,

Mining Event HistoriesCharacterizing, rendering and clustering sequence dataDistances between sequences: Clustering

Clustering, Multidimensional scaling and more

Once you are able to compute 2 by 2 distances betweensequences you can among others:Cluster sequencesAnalyse the trajectory heterogeneity (Generalized ANOVA)Make scatter plot representation of sets of sequences usingmultidimensional scaling.

19/11/2008gr 56/95

Page 87: Mining Event Histories: Some New Insights on Personal Swiss …mephisto.unige.ch/pub/publications/gr/slides/bm_rits... · 2011. 9. 23. · MiningEventHistories Mytalkisaboutlifecourses,

Mining Event HistoriesCharacterizing, rendering and clustering sequence dataDistances between sequences: Clustering

Clustering, Multidimensional scaling and more

Once you are able to compute 2 by 2 distances betweensequences you can among others:Cluster sequencesAnalyse the trajectory heterogeneity (Generalized ANOVA)Make scatter plot representation of sets of sequences usingmultidimensional scaling.

19/11/2008gr 56/95

Page 88: Mining Event Histories: Some New Insights on Personal Swiss …mephisto.unige.ch/pub/publications/gr/slides/bm_rits... · 2011. 9. 23. · MiningEventHistories Mytalkisaboutlifecourses,

Mining Event HistoriesCharacterizing, rendering and clustering sequence dataDistances between sequences: Clustering

Clustering, Multidimensional scaling and more

Once you are able to compute 2 by 2 distances betweensequences you can among others:Cluster sequencesAnalyse the trajectory heterogeneity (Generalized ANOVA)Make scatter plot representation of sets of sequences usingmultidimensional scaling.

19/11/2008gr 56/95

Page 89: Mining Event Histories: Some New Insights on Personal Swiss …mephisto.unige.ch/pub/publications/gr/slides/bm_rits... · 2011. 9. 23. · MiningEventHistories Mytalkisaboutlifecourses,

Mining Event HistoriesCharacterizing, rendering and clustering sequence dataDistances between sequences: Clustering

Clustering, Multidimensional scaling and more

Once you are able to compute 2 by 2 distances betweensequences you can among others:Cluster sequencesAnalyse the trajectory heterogeneity (Generalized ANOVA)Make scatter plot representation of sets of sequences usingmultidimensional scaling.

19/11/2008gr 56/95

Page 90: Mining Event Histories: Some New Insights on Personal Swiss …mephisto.unige.ch/pub/publications/gr/slides/bm_rits... · 2011. 9. 23. · MiningEventHistories Mytalkisaboutlifecourses,

Mining Event HistoriesCharacterizing, rendering and clustering sequence dataDistances between sequences: Clustering

Distances between sequences

Edit distance (known as Optimal matching in Social sciences)(Levenshtein, 1966; Needleman and Wunsch, 1970; Abbott andForrest, 1986)

d(x , y) Total cost of insert, deletion and substitution changesrequired to transform sequence x into y .Different solutions depending on indel and substitution costs.

Other metrics proposed by (Elzinga, 2008)LCP: Longest common prefix (also longest common postfix)LCS: Longest common subsequence(same as OM with indel cost = 1, and substitution cost = 2).NMS: Number of matching subsequences...

Elzinga (2008) proposes a nice formalization of these metrics.

19/11/2008gr 57/95

Page 91: Mining Event Histories: Some New Insights on Personal Swiss …mephisto.unige.ch/pub/publications/gr/slides/bm_rits... · 2011. 9. 23. · MiningEventHistories Mytalkisaboutlifecourses,

Mining Event HistoriesCharacterizing, rendering and clustering sequence dataDistances between sequences: Clustering

Distances between sequences

Edit distance (known as Optimal matching in Social sciences)(Levenshtein, 1966; Needleman and Wunsch, 1970; Abbott andForrest, 1986)

d(x , y) Total cost of insert, deletion and substitution changesrequired to transform sequence x into y .Different solutions depending on indel and substitution costs.

Other metrics proposed by (Elzinga, 2008)LCP: Longest common prefix (also longest common postfix)LCS: Longest common subsequence(same as OM with indel cost = 1, and substitution cost = 2).NMS: Number of matching subsequences...

Elzinga (2008) proposes a nice formalization of these metrics.

19/11/2008gr 57/95

Page 92: Mining Event Histories: Some New Insights on Personal Swiss …mephisto.unige.ch/pub/publications/gr/slides/bm_rits... · 2011. 9. 23. · MiningEventHistories Mytalkisaboutlifecourses,

Mining Event HistoriesCharacterizing, rendering and clustering sequence dataDistances between sequences: Clustering

Clustering with OM distances: Dendrograms[1

][1

92]

[290

][3

19]

[360

][1

197]

[133

3][1

412]

[103

6][5

04]

[532

][2

04]

[388

][1

472]

[128

0][3

43]

[40]

[633

][4

2][5

20]

[632

][3

57]

[833

][4

1][6

42]

[944

][1

046]

[112

3][6

73]

[257

][6

85]

[795

][1

068]

[107

4][3

18]

[508

][1

069]

[138

][1

161]

[306

][1

220]

[352

][5

43]

[517

][1

188]

[162

][8

19]

[135

9][6

37]

[409

][9

41]

[121

5][7

06]

[101

1][1

375]

[221

][9

86]

[255

][4

37]

[940

][5

76]

[126

9][4

55]

[139

1][7

60]

[136

4][5

52]

[227

][7

79]

[281

][3

27]

[144

5][3

26]

[270

][1

315]

[465

][1

494]

[743

][1

493]

[120

3][1

355]

[114

][1

022]

[102

3][1

23]

[588

][2

58]

[124

8][4

28]

[210

][1

387]

[415

][7

52]

[712

][9

97]

[292

][1

331]

[524

][5

05]

[957

][3

17]

[439

][9

29]

[955

][1

438]

[528

][1

039]

[122

2][2

73]

[492

][5

65]

[658

][5

42]

[560

][8

44]

[135

6][6

34]

[933

][1

214]

[104

0][1

071]

[145

0][3

24]

[132

6][4

12]

[109

3][7

92]

[15]

[175

][2

24]

[783

][1

304]

[561

][1

075]

[122

][1

283]

[924

][6

40]

[641

][7

26]

[101

2][1

303]

[143

3][1

063]

[998

][4

38]

[463

][9

70]

[105

2][1

330]

[143

6][1

446]

[137

1][3

1][6

7][4

88]

[782

][1

089]

[110

7][1

274]

[80]

[163

][3

40]

[556

][5

74]

[110

8][1

297]

[143

2][5

53]

[402

][7

89]

[519

][9

87]

[103

7][1

279]

[222

][3

07]

[499

][1

201]

[444

][8

18]

[456

][6

60]

[956

][9

64]

[995

][1

374]

[26]

[157

][8

11]

[104

7][5

5][7

9][6

6][8

13]

[468

][1

148]

[116

5][1

357]

[97]

[692

][1

135]

[101

7][1

485]

[269

][1

219]

[113

6][1

133]

[464

][5

30]

[127

1][4

71]

[765

][1

281]

[149

][1

81]

[784

][8

07]

[981

][1

358]

[639

][1

175]

[111

8][1

117]

[747

][7

64]

[133

2][2

40]

[500

][5

39]

[503

][1

256]

[123

4][1

496]

[132

7][1

2][1

8][4

5][4

6][5

6][6

3][9

6][1

32]

[144

][1

59]

[169

][1

80]

[189

][2

15]

[246

][2

63]

[264

][2

95]

[297

][3

58]

[429

][4

41]

[442

][4

57]

[467

][5

06]

[558

][5

62]

[568

][5

69]

[585

][5

95]

[596

][5

98]

[664

][7

05]

[717

][7

46]

[761

][7

72]

[775

][7

76]

[777

][8

55]

[863

][8

67]

[879

][8

97]

[950

][9

96]

[100

5][1

067]

[108

3][1

084]

[110

0][1

101]

[111

3][1

114]

[114

4][1

147]

[125

8][1

260]

[134

4][1

345]

[135

4][1

376]

[149

8][1

499]

[72]

[414

][1

353]

[59]

[938

][4

93]

[945

][5

59]

[850

][2

96]

[144

3][1

019]

[111

6][6

2][5

84]

[952

][2

17]

[251

][9

92]

[583

][6

50]

[675

][1

60]

[494

][1

079]

[250

][6

71]

[23]

[119

1][1

64]

[350

][8

37]

[140

5][1

406]

[141

][7

98]

[566

][9

37]

[103

8][1

259]

[165

][7

90]

[16]

[623

][1

225]

[425

][1

321]

[20]

[22]

[814

][1

264]

[225

][2

37]

[110

9][1

18]

[619

][1

018]

[118

5][9

31]

[216

][3

92]

[146

5][9

34]

[141

9][1

15]

[126

8][1

143]

[122

1][4

96]

[511

][8

62]

[184

][1

142]

[121

0][4

35]

[346

][6

69]

[138

0][5

18]

[115

4][6

36]

[114

9][6

97]

[899

][1

119]

[117

][3

05]

[405

][4

66]

[932

][9

36]

[228

][8

75]

[781

][1

45]

[202

][1

273]

[145

7][1

70]

[147

8][4

79]

[688

][1

322]

[155

][1

93]

[943

][1

207]

[226

][9

06]

[119

6][1

9][7

78]

[127

6][2

35]

[313

][6

96]

[243

][6

74]

[763

][8

96]

[921

][5

34]

[918

][2

38]

[834

][1

488]

[510

][8

68]

[128

4][1

313]

[145

4][8

02]

[766

][1

173]

[144

0][2

36]

[342

][7

33]

[799

][1

470]

[801

][2

84]

[331

][3

02]

[976

][1

213]

[720

][7

70]

[137

][5

36]

[117

9][6

01]

[100

9][1

307]

[294

][9

02]

[137

3][2

06]

[332

][1

430]

[578

][1

073]

[917

][1

302]

[131

1][3

0][1

20]

[690

][8

40]

[418

][4

58]

[128

7][1

177]

[232

][2

85]

[145

5][3

11]

[132

0][4

51]

[809

][6

83]

[124

7][1

79]

[947

][8

15]

[126

3][8

80]

[905

][1

21]

[140

3][2

86]

[140

8][7

08]

[554

][1

34]

[434

][6

57]

[762

][2

76]

[110

4][3

83]

[2]

[651

][6

52]

[102

][9

12]

[201

][2

75]

[391

][1

395]

[133

6][3

90]

[134

0][1

458]

[39]

[627

][1

155]

[820

][1

453]

[50]

[110

3][1

88]

[507

][1

98]

[86]

[711

][1

19]

[128

5][1

349]

[291

][8

00]

[448

][8

49]

[604

][1

126]

[148

9][6

0][4

00]

[404

][5

81]

[913

][4

13]

[871

][1

111]

[771

][8

66]

[119

0][5

79]

[139

7][1

398]

[38]

[985

][7

04]

[124

6][3

61]

[698

][7

10]

[124

5][1

97]

[616

][1

473]

[127

0][9

61]

[836

][8

9][2

47]

[804

][5

45]

[106

][1

267]

[105

][1

024]

[659

][1

085]

[207

][1

346]

[135

2][3

20]

[107

0][2

34]

[373

][3

74]

[573

][1

474]

[106

1][1

449]

[120

2][1

312]

[239

][6

26]

[299

][9

60]

[105

6][1

125]

[14]

[680

][8

89]

[885

][3

62]

[821

][5

3][3

96]

[575

][4

20]

[136

5][6

67]

[54]

[888

][5

72]

[145

6][1

54]

[729

][2

18]

[125

5][1

080]

[118

0][1

206]

[140

][4

19]

[486

][8

60]

[916

][7

3][1

102]

[701

][8

91]

[214

][1

416]

[848

][5

44]

[107

2][3

28]

[138

8][7

93]

[980

][1

360]

[602

][6

72]

[823

][5

93]

[756

][1

235]

[132

9][1

94]

[622

][5

16]

[681

][3

63]

[116

4][5

14]

[702

][7

85]

[138

1][8

12]

[141

3][1

157]

[407

][6

03]

[978

][9

94]

[835

][1

242]

[147

5][4

69]

[113

9][1

277]

[883

][6

87]

[144

2][5

1][1

01]

[85]

[287

][1

090]

[443

][9

04]

[325

][6

78]

[753

][1

409]

[146

8][1

492]

[112

][1

041]

[230

][1

350]

[769

][1

305]

[547

][1

097]

[132

3][9

90]

[108

8][1

128]

[101

3][1

059]

[136

8][1

029]

[131

][9

22]

[100

6][3

67]

[126

1][1

56]

[476

][2

78]

[886

][1

50]

[662

][3

66]

[314

][1

160]

[724

][4

78]

[830

][8

53]

[48]

[118

3][1

193]

[725

][1

288]

[293

][1

112]

[125

2][5

57]

[262

][1

053]

[120

8][5

02]

[768

][8

10]

[124

9][6

1][8

32]

[143

][1

168]

[546

][3

36]

[703

][2

41]

[427

][2

67]

[105

5][1

447]

[100

3][1

167]

[119

5][1

39]

[116

9][9

93]

[109

1][8

39]

[620

][6

35]

[213

][1

158]

[977

][1

134]

[107

6][1

239]

[136

1][1

53]

[629

][1

174]

[828

][8

87]

[100

4][1

166]

[454

][1

393]

[890

][9

35]

[132

8][2

77]

[645

][1

186]

[104

5][1

317]

[146

1][5

09]

[907

][9

08]

[563

][1

057]

[666

][1

184] [3]

[323

][6

53]

[654

][1

266]

[126

][1

394]

[354

][2

83]

[911

][1

015]

[134

1][5

26]

[138

5][3

6][1

29]

[523

][1

78]

[110

6][2

56]

[242

][8

45]

[846

][1

13]

[203

][7

74]

[322

][1

087]

[309

][1

316]

[694

][1

194]

[127

8][6

93]

[126

5][9

19]

[803

][2

74]

[738

][1

205]

[339

][5

40]

[894

][3

71]

[125

0] [7]

[424

][1

82]

[308

][1

490]

[44]

[282

][3

00]

[148

6][6

08]

[100

7][1

189]

[142

6][3

64]

[140

0][7

94]

[124

4][9

82]

[5]

[121

1][4

89]

[723

][1

044]

[112

9][1

427]

[329

][4

03]

[132

4][2

89]

[399

][4

30]

[408

][7

35]

[927

][8

70]

[105

4][1

121]

[127

2][1

290]

[35]

[580

][5

87]

[142

0][2

98]

[303

][3

33]

[341

][1

152]

[453

][1

423]

[621

][1

081]

[379

][7

00]

[452

][4

90]

[431

][5

70]

[676

][7

07]

[21]

[49]

[148

7][1

159]

[137

0][6

5][1

00]

[88]

[791

][4

83]

[668

][7

18]

[594

][8

54]

[825

][1

343]

[529

][8

38]

[104

8][1

229]

[43]

[744

][4

26]

[369

][1

131]

[436

][1

431]

[123

3][1

096]

[87]

[773

][8

72]

[137

8][1

3][5

82]

[605

][1

30]

[416

][2

48]

[749

][9

53]

[954

][2

11]

[113

8][6

86]

[119

2][3

4][4

47]

[103

4][2

52]

[882

][1

87]

[138

6][6

10]

[527

][8

95]

[147

7][2

33]

[116

2][1

163]

[930

][1

231]

[142

2][3

21]

[140

2][4

84]

[577

][1

362]

[37]

[498

][1

300]

[52]

[312

][1

026]

[112

7][1

292]

[93]

[107

][7

48]

[841

][1

301]

[541

][1

294]

[133

8][8

4][1

141]

[910

][1

286]

[140

1][1

47]

[345

][9

68]

[249

][1

031]

[136

7][1

74]

[126

2][6

25]

[691

][7

54]

[145

2][3

44]

[102

1][5

67]

[847

][7

13]

[118

7] [8]

[555

][1

212]

[355

][1

241]

[69]

[571

][3

3][1

467]

[714

][1

48]

[245

][2

72]

[822

][8

65]

[121

7][1

464]

[387

][6

18]

[141

0][2

08]

[900

][1

91]

[624

][6

48]

[117

8][3

16]

[334

][1

216]

[125

4][5

91]

[989

][1

0][9

4][9

5][3

56]

[377

][4

60]

[551

][5

92]

[649

][6

77]

[928

][9

42]

[109

8][1

124]

[121

8][1

407]

[142

8][9

46]

[70]

[115

6][5

99]

[140

4][1

83]

[109

2][7

80]

[521

][8

06]

[817

][9

65]

[105

1][1

237]

[138

3][7

1][1

172]

[338

][1

451]

[423

][1

425]

[135

][9

01]

[151

][1

060]

[533

][7

88]

[372

][1

335]

[401

][6

47]

[564

][1

153]

[874

][1

318]

[138

2][9

1][1

314]

[158

][1

309]

[150

1][2

71]

[966

][2

59]

[148

1][1

001]

[398

][4

46]

[732

][8

58]

[983

][1

36]

[537

][6

09]

[351

][3

86]

[432

][6

89]

[750

][6

06]

[612

][6

15]

[728

][7

55]

[103

3][1

243]

[145

9][7

36]

[903

][1

014]

[113

7][4

7][1

04]

[172

][3

76]

[394

][5

89]

[878

][1

035]

[123

8][1

348]

[142

9][1

480]

[149

7][4

77]

[116

][6

65]

[859

][1

150]

[138

4][1

424]

[330

][7

37]

[105

8][4

17]

[550

][1

463]

[607

][8

3][3

35]

[133

][3

75]

[487

][1

25]

[661

][1

293]

[111

][8

29]

[742

][9

79]

[108

2][1

130] [4]

[385

][5

38]

[684

][7

40]

[876

][1

414]

[141

5][1

46]

[185

][9

23]

[974

][3

10]

[353

][7

30]

[613

][6

14]

[124

][2

65]

[406

][4

62]

[915

][9

91]

[127

5][1

500]

[186

][2

79]

[347

][3

93]

[459

][6

70]

[864

][8

84]

[898

][9

09]

[962

][9

63]

[103

2][1

049]

[110

5][1

176]

[125

7][1

295]

[597

][1

008]

[17]

[119

9][6

46]

[106

5][1

306]

[133

4][2

4][1

351]

[114

0][1

77]

[861

][1

441]

[27]

[209

][4

50]

[90]

[231

][7

22]

[739

][9

58]

[107

7][3

70]

[881

][1

227]

[122

8][1

363]

[365

][1

434]

[411

][1

122]

[395

][9

73]

[797

][1

392]

[474

][7

67]

[481

][7

31]

[869

][1

151]

[25]

[472

][8

51]

[972

][1

016]

[137

2][1

90]

[513

][1

095]

[148

4][3

37]

[525

][9

59]

[118

1][1

310]

[92]

[102

0][1

337]

[137

9][5

12]

[104

2][1

094]

[827

][1

389]

[969

][1

200]

[497

][8

26]

[106

2][1

435]

[147

6][6

8][9

8][4

10]

[656

][7

09]

[719

][1

099]

[129

8][1

469]

[288

][3

01]

[758

][4

70]

[515

][4

61]

[824

][5

86]

[123

2][8

31]

[139

9][7

8][8

57]

[939

][1

28]

[843

][1

396]

[786

][1

291]

[176

][4

82]

[535

][1

96]

[110

][7

21]

[926

][9

71]

[107

8][3

04]

[449

][8

05]

[102

5][1

064]

[111

5] [6]

[57]

[127

][1

42]

[200

][2

53]

[260

][2

68]

[397

][6

31]

[638

][6

43]

[644

][8

42]

[892

][9

49]

[102

8][1

030]

[112

0][1

145]

[114

6][1

198]

[120

4][1

282]

[150

3][7

5][8

2][1

08]

[440

][4

91]

[751

][7

96]

[134

2][1

71]

[548

][1

027]

[106

6][3

59]

[873

][1

209]

[378

][1

002]

[58]

[161

][1

66]

[168

][1

73]

[195

][2

23]

[261

][4

22]

[475

][4

80]

[590

][6

00]

[617

][6

28]

[699

][7

15]

[787

][8

56]

[877

][9

67]

[984

][1

000]

[105

0][1

224]

[130

8][1

319]

[141

7][1

482]

[148

3][1

491]

[150

2][2

9][1

67]

[199

][3

68]

[389

][4

33]

[445

][4

85]

[611

][6

30]

[734

][1

086]

[113

2][1

170]

[132

5][1

377]

[144

8][1

110]

[143

7][1

296]

[109

][2

20]

[229

][2

80]

[384

][4

21]

[655

][9

20]

[975

][2

19]

[380

][1

223]

[147

1][7

4][7

6][7

7][2

12]

[266

][5

49]

[759

][9

48]

[104

3][1

289]

[143

9] [9]

[473

][8

08]

[988

][1

182]

[123

0][1

299]

[64]

[315

][1

52]

[816

][1

444]

[381

][4

95]

[501

][7

45]

[852

][9

51]

[999

][1

010]

[125

3][1

366]

[142

1][1

466]

[757

][9

9][6

63]

[205

][7

41]

[11]

[103

][2

44]

[348

][6

79]

[682

][8

93]

[925

][1

226]

[123

6][1

251]

[133

9][1

347]

[139

0][1

479]

[149

5][2

8][3

2][2

54]

[522

][5

31]

[716

][9

14]

[117

1][1

240]

[136

9][1

411]

[141

8][1

460]

[146

2][8

1][3

49]

[695

][3

82]

[727

]

050

0010

000

1500

020

000

Cohabitational trajectories, Ward method

(OM Distances, Indel=1, Subst. Cost based on Trans. Rate)Individual trajectories

Hei

ght

1 2 5 6 7 13 24 29 30 34 36 37 38 42 43 46 50 53 62 64 68 69 76 84 86 92 95 97 101

104

112

116

117

118

120

124

129

137

139

149

156

158

163

166

168

172

173

177

181

188

190

197

202

206

210

214

215

217

220

224

228

229

230

232

235

236

246

249

254

257

258

264

269

271

272

275

276

278

282

287

289

294

296

297

302

307

308

314

315

318

322

325

330

332

333

337

343

347

348

351

352

353

360

362

365

367

369

372

375

376

378

384

392

393

399

403

412

415

419

421

426

428

434

438

441

451

453

457

461

463

464

466

468

477

478

483

485

489

490

492

494

495

499

505

506

507

514

518

521

525

526

528

532

533

538

540

543

549

550

552

554

558

561

562

566

568

569

570

576

578

579

582

586

588

592

594

595

596

598

599

604

606

613

616

626

629

631

632

633

638

640

648

653

657

661

662

664

669

671

676

679

684

685

689

691

693

700

705

708

710

712

717

720

733

734

736

737

738

740

747

748

752

753

754

756

758

761

769

794

798

799

800

801

802

803

805

808

816

820

827

828

835

840

850

853

856

858

866

868

870

871

875

877

881

884

888

890

892

894

899

902

904

909

913

914

917

924

925

929

937

938

940

942

943

945

948

949

952

958

960

964

966

969

970

971

972

973

974

976

978

983

984

987

990

991

995

996

1000

1004

1006

1011

1017

1018

1020

1023

1024

1025

1027

1032

1033

1034

1038

1040

1043

1051

1057

1059

1066

1070

1071

1073

1075

1077

1087

1089

1090

1094

1096

1112

1113

1114

1115

1120

1121

1123

1128

1130

1152

1157

1162

1166

1169

1171

1175

1176

1183

1184

1185

1187

1188

1189

1191

1195

1197

1199

1201

1204

1208

1211

1213

1215

1219

1222

1228

1238

1243

1247

1253

1258

1261

1262

1263

1265

1267

1270

1274

1275

1277

1279

1282

1290

1297

1302

1303

1307

1309

1313

1318

1320

1324

1325

1329

1330

1332

1333

1339

1341

1343

1348

1358

1360

1363

1365

1366

1367

1371

1374

1376

1377

1383

1387

1392

1396

1398

1404

1409

1410

1415

1417

1418

1419

1420

1422

1425

1427

1428

1431

1432

1438

1441

1444

1445

1447

1448

1449

1450

1456

1463

1464

1470

1473

1476

1479

1489

1490

1494

1495

1498

1499

1501

1408 12 15 89 98 114

150

174

192

260

281

355

391

410

465

471

475

523

531

539

567

600

617

619

635

636

646

651

658

659

678

713

765

789

807

809

818

842

847

886

906

955

1007

1081

1099

1104

1110

1135

1182

1236

1250

1288

1295

1345

1350

1354

1382

1388

1389

1401

1403

1405

1413

1430

1439

1442

1471

1484 87 30

512

0719

610

6911

94 498

1047 64

912

317

119

920

985

525

084

332

065

610

09 922

1052

1165 3

110

1911

01 143

1393 91

135

427

453

447

260

270

785

112

5612

64 182

216

238

266

313

383

406

470

724

766

773

860

915

919

956

1002

1030

1055

1056

1063

1144

1147

1198

1210

1272

1314

1322

1327

1446

1452 99

2 48 331

481

6114

35 179

993

1352 13

444

241

790

710

1513

53 270

1496 79

191

898

1 463

750

491

021

212

35 373

444

723

625

1370 45

973

569

027 41 23

425

127

943

956

367

268

168

373

978

210

2910

9211

2511

4311

9612

0212

3213

0013

4914

86 40

502

82 146

285

370

519

557

781

811

830

837

1045

1048

1091

1203

1223

1229

1242

1455 9

122 20

549

654

559

371

574

175

182

586

111

7011

7312

3412

4412

76 138

1064 72

123 67 75 34

535

936

144

559

060

871

974

376

786

487

310

6111

3113

0413

3413

7314

81 47 385

1337

730

824

335

1159

1381 630

810

1072 91

695

4 10 336

155

364

982

170

786

931

327

509

19 610

1039

1306 93

068

057

413

1114

8513

75 3510

37 255

624

17 20 49 55 110

221

277

310

407

517

839

876

1323

1368 5

792

183

614

16 59 901

1126 43

079

099

7 7857

210

36 7919

484

810

010

08 838

780

119

620

660

262

328

947

1200

1257 62

226

711

90 857 74

655

1317

1468

164

729

905

564

1326

1328

1174

1453 3

772

418

1231 95

013

1213

3814

9110

7610

5311

3414

143

114

54 284

1108

1158 28

679

612

17 280

565

1139

1378 20

445

678

512

9810

41 241

686

529

1399 560

703

1237 26 169

544

1119

985

1466

1168 60 244

1097 24

563 67

311 56 13

114

422

322

730

031

734

935

736

639

740

141

342

342

943

344

945

448

648

851

551

653

654

255

357

357

564

164

466

367

470

976

077

778

481

382

283

492

693

496

399

911

0011

0511

4511

5411

6012

0912

1412

4912

6912

7312

8713

0514

0714

5914

7214

8314

9715

02 72 159

273

524

696

1078

1083

1412 11

148

010

35 559

1179 84

512

614

577

530

180

474

4 88 201

440

776

252

701

1137

157

731

1013

1284 53

098

821

376

387

998

651 13

238

069

277

812

59 295

611

1423 79

789

710

5810

7911

77 189

381

424

455

479

556

121

771

208

727

770

1281 41

410

058 81 99 482

218

779

416

826

953

812

832

1268

154

994

831

1192

420

891

535

687

476

634

883

793

547

1357

764

1088

1474 69

769

851

158

915

311

33 312

1141

334

603

338

1218

1347 37

773 42

714

214

75 107

462

903

371

702

1085

1239 50

057

710

7411

63 21

1167

1193 82

355

511

4913

1514

88 4512

52 239

342

382

762

833

510

8516

528

897

736

377

411

07 374 94

1461 81

911

4810

4614

36 7710

6034

051

311

0213

56 844

1103 89

616

754

682

158

598

910

2211

1113

1613

31 814

1434 19

850

387

288

713

3633

963

914

6511

1811

51 726

667

1117 90

013

7210

5413

6414

92 32 247

583

1361 72

512

21 54

222

1385 66

624

087

434

467

784

112

46 311

1240

324

704

319

1255

1248

458

512

409

927

1180

1245

914

013

4611

81 1639

510

26 309

405

746

957

1310

103

647

1344 85

254

110

65 732

788

1062 44 368

920

422

1283

1308 65

213

91 125

1362

1095

127

147

1129 86

226

878

740

825

693

649

713

55 668

80 469

1395

178

1294

1164 29

912

830

654

893

526

394

446

714

51 6579

588

272

869

913

42 152

303

1116

584

1429 10

218

387

810

01 316

527

675

829

979

1254 39 52 66 113

219

446

493

597

654

749

854

908

959

1028

1289 58 28

310

0310

1411

0611

3212

2012

7113

94 321

1042

1296 23

713

5910

4913

5114

8710

912

216

232

945

249

161

574

510

1014

7715

03 358

975

1044 59

110

1619

575

593

910

6813

1914

58 341

1227

1402

1421 75

038

962

816

196

835

091

274

275

718

584

696

134

645

0 9668

811

5513

35 105

980

207

1379

1278

1340 62

724

211

40 759

1127 89

829

111

2211

524

843

584

986

993

341

132

362

320

326

532

692

312

0514

93 443

614

694

1457

1500 19

310

50 711

642

2513

69 893

1280 33 148

191

394

1156

1380 96

512

41 93 390

447

618

650

932

1299

1462 10

815

126

161

211

53 356

551

621

1233

387

716

941

474

670

28 487

259

1478

1293

133

298

386

520

537

682

951

1012

1084

1251

1460 18

710

2113

0113

014

43 211

184

253

398

501

643

967

1146

1216

1482 39

622

629

011

5014

2410

8611

2413

90 425

437

460

865

867

885

928

1172

1321

1480 14 571

1260 99

810

31 70 581

817

508

1206 16

012

24 473

580

645

176

718

889

7110

67 607

783

1142 66

571

429

313

84 863

400

946

1225 18 379

605

1292

1406 89

511

3813

86 83

292

587

1098

1433 20

076

810

8211

7814

6714

6922

514

00 484

722

243

1426

1437 90 706

1285 88

030

410

612

86 609

1093

1411

1161 79

213

514

40 186

1186

231

601

1230

1136 136

436

815

1109

1226

175

522

388

962

180

402

859

1212

1414 23

369

580

613

9710

80 404

1291 43

244

812

66

050

0010

000

1500

020

000

Professional trajectories, Ward method

(OM Distances, Indel=1, Subst. Cost based on Trans. Rate)Individual trajectories

Hei

ght

LA Prof19/11/2008gr 58/95

Page 93: Mining Event Histories: Some New Insights on Personal Swiss …mephisto.unige.ch/pub/publications/gr/slides/bm_rits... · 2011. 9. 23. · MiningEventHistories Mytalkisaboutlifecourses,

Mining Event HistoriesCharacterizing, rendering and clustering sequence dataDistances between sequences: Clustering

LA, State distribution by age, within cluster

A20 A23 A26 A29 A32 A35 A38 A41 A44

Type 1 : Conjugal Trajectories (16 %)

Fre

q. (

n=23

5)

0.0

0.2

0.4

0.6

0.8

1.0

A20 A23 A26 A29 A32 A35 A38 A41 A44

Type 2 : Parental Trajectories, Slow Transition (19 %)

Fre

q. (

n=28

5)

0.0

0.2

0.4

0.6

0.8

1.0

A20 A23 A26 A29 A32 A35 A38 A41 A44

Type 3: Parental Trajectories, Fast Transition (48 %)

Fre

q. (

n=71

4)

0.0

0.2

0.4

0.6

0.8

1.0

A20 A23 A26 A29 A32 A35 A38 A41 A44

Type 4 : Nestalgic Trajectories (7 %)

Fre

q. (

n=11

0)

0.0

0.2

0.4

0.6

0.8

1.0

A20 A23 A26 A29 A32 A35 A38 A41 A44

Type 5 : Solo or Reconstituted Family Trajectories (11 %)

Fre

q. (

n=15

9)

0.0

0.2

0.4

0.6

0.8

1.0

Biological father and motherOne biological parentOne biological parent with her/his partnerAloneWith partnerPartner and biological childPartner and non biological childBiological child and no partnerFriendsOther

19/11/2008gr 59/95

Page 94: Mining Event Histories: Some New Insights on Personal Swiss …mephisto.unige.ch/pub/publications/gr/slides/bm_rits... · 2011. 9. 23. · MiningEventHistories Mytalkisaboutlifecourses,

Mining Event HistoriesCharacterizing, rendering and clustering sequence dataDistances between sequences: Clustering

LA, Most frequent sequences by cluster

3.4%

3.4%

3%

2.5%

2.5%

2.1%

2.1%

2.1%

2.1%

1.7%

Type 1 : Conjugal Trajectories (16 %)

Fre

q. (

n=23

5)

A20 A23 A26 A29 A32 A35 A38 A41 A44

1.1%

1.1%

1.1%

1.1%

0.7%

0.7%

0.7%

0.7%

0.7%

0.7%

Type 2 : Parental Trajectories, Slow Transition (19 %)

Fre

q. (

n=28

5)

A20 A23 A26 A29 A32 A35 A38 A41 A44

4.5%

3.5%

2.5%

2.4%

2.4%

2.2%

2%

1.8%

1.7%

1.5%

Type 3: Parental Trajectories, Fast Transition (48 %)

Fre

q. (

n=71

4)

A20 A23 A26 A29 A32 A35 A38 A41 A44

61.8%

1.8%1.8%0.9%0.9%0.9%0.9%0.9%0.9%0.9%

Type 4 : Nestalgic Trajectories (7 %)

Fre

q. (

n=11

0)

A20 A23 A26 A29 A32 A35 A38 A41 A44

3.1%

1.9%

1.9%

1.9%

1.9%

1.3%

1.3%

1.3%

1.3%

1.3%

Type 5 : Solo or Reconstituted Family Trajectories (11 %)

Fre

q. (

n=15

9)

A20 A23 A26 A29 A32 A35 A38 A41 A44

Biological father and motherOne biological parentOne biological parent with her/his partnerAloneWith partnerPartner and biological childPartner and non biological childBiological child and no partnerFriendsOther

19/11/2008gr 60/95

Page 95: Mining Event Histories: Some New Insights on Personal Swiss …mephisto.unige.ch/pub/publications/gr/slides/bm_rits... · 2011. 9. 23. · MiningEventHistories Mytalkisaboutlifecourses,

Mining Event HistoriesCharacterizing, rendering and clustering sequence dataDistances between sequences: Clustering

LA, Sequence diversity within cluster

19/11/2008gr 61/95

Page 96: Mining Event Histories: Some New Insights on Personal Swiss …mephisto.unige.ch/pub/publications/gr/slides/bm_rits... · 2011. 9. 23. · MiningEventHistories Mytalkisaboutlifecourses,

Mining Event HistoriesCharacterizing, rendering and clustering sequence dataDistances between sequences: Clustering

LA, Birth year distribution by clusterType 1 : Conjugal Trajectories (16 %)

Birth year

Den

sity

1910 1920 1930 1940 1950 1960

0.00

0.01

0.02

0.03

0.04

0.05

0.06

Type 2 : Parental Trajectories, Slow Transition (19 %)

Birth year

Den

sity

1910 1920 1930 1940 1950 1960

0.00

0.01

0.02

0.03

0.04

0.05

0.06

Type 3: Parental Trajectories, Fast Transition (48 %)

Birth year

Den

sity

1910 1920 1930 1940 1950 1960

0.00

0.01

0.02

0.03

0.04

0.05

0.06

Type 4 : Nestalgic Trajectories (7 %)

Birth year

Den

sity

1910 1920 1930 1940 1950 1960

0.00

0.01

0.02

0.03

0.04

0.05

0.06

Type 5 : Solo or Reconstituted Family Trajectories (11 %)

Birth year

Den

sity

1910 1920 1930 1940 1950 1960

0.00

0.01

0.02

0.03

0.04

0.05

0.06

Overall

Birth year

Den

sity

1910 1920 1930 1940 1950 1960

0.00

0.01

0.02

0.03

0.04

0.05

0.06

19/11/2008gr 62/95

Page 97: Mining Event Histories: Some New Insights on Personal Swiss …mephisto.unige.ch/pub/publications/gr/slides/bm_rits... · 2011. 9. 23. · MiningEventHistories Mytalkisaboutlifecourses,

Mining Event HistoriesCharacterizing, rendering and clustering sequence dataDistances between sequences: Clustering

Prof, State distribution by age, within cluster

A20 A23 A26 A29 A32 A35 A38 A41 A44

Type 1 : Full Time Trajectoires (53 %)

Fre

q. (

n=79

5)

0.0

0.2

0.4

0.6

0.8

1.0

A20 A23 A26 A29 A32 A35 A38 A41 A44

Type 2 : Mixed Part Time − Home Trajectories (13 %)

Fre

q. (

n=15

5)

0.0

0.2

0.4

0.6

0.8

1.0

A20 A23 A26 A29 A32 A35 A38 A41 A44

Type 3 : At Home Trajectories (16 %)

Fre

q. (

n=27

7)

0.0

0.2

0.4

0.6

0.8

1.0

A20 A23 A26 A29 A32 A35 A38 A41 A44

Type 4 : Part Time Trajectories (7 %)

Fre

q. (

n=10

1)

0.0

0.2

0.4

0.6

0.8

1.0

A20 A23 A26 A29 A32 A35 A38 A41 A44

Type 5 : Missing Data (11 %)

Fre

q. (

n=17

5)

0.0

0.2

0.4

0.6

0.8

1.0

MissingFull timePart timeNegative breakPositive breakAt homeRetiredEducation

19/11/2008gr 63/95

Page 98: Mining Event Histories: Some New Insights on Personal Swiss …mephisto.unige.ch/pub/publications/gr/slides/bm_rits... · 2011. 9. 23. · MiningEventHistories Mytalkisaboutlifecourses,

Mining Event HistoriesCharacterizing, rendering and clustering sequence dataDistances between sequences: Clustering

Prof, Most frequent sequences by cluster

56.6%

8.4%

3.8%

2.8%2.4%2.3%1.9%1.8%0.8%0.6%

Type 1 : Full Time Trajectoires (53 %)

Fre

q. (

n=79

5)

A20 A23 A26 A29 A32 A35 A38 A41 A44

2.6%

2.6%

2.6%

1.9%

1.9%

1.9%

1.9%

1.9%

1.3%

1.3%

Type 2 : Mixed Part Time − Home Trajectories (13 %)

Fre

q. (

n=15

5)

A20 A23 A26 A29 A32 A35 A38 A41 A44

5.4%

4%

4%

3.2%

3.2%

3.2%

2.9%

2.2%

2.2%

2.2%

Type 3 : At Home Trajectories (16 %)

Fre

q. (

n=27

7)

A20 A23 A26 A29 A32 A35 A38 A41 A44

5%

5%

5%

4%

2%

2%

2%

2%

2%

2%

Type 4 : Part Time Trajectories (7 %)

Fre

q. (

n=10

1)

A20 A23 A26 A29 A32 A35 A38 A41 A44

34.9%

4.6%

3.4%

3.4%

2.9%

2.3%

2.3%1.7%1.7%1.7%

Type 5 : Missing Data (11 %)

Fre

q. (

n=17

5)

A20 A23 A26 A29 A32 A35 A38 A41 A44

MissingFull timePart timeNegative breakPositive breakAt homeRetiredEducation

19/11/2008gr 64/95

Page 99: Mining Event Histories: Some New Insights on Personal Swiss …mephisto.unige.ch/pub/publications/gr/slides/bm_rits... · 2011. 9. 23. · MiningEventHistories Mytalkisaboutlifecourses,

Mining Event HistoriesCharacterizing, rendering and clustering sequence dataDistances between sequences: Clustering

Prof, Sequence diversity within cluster

19/11/2008gr 65/95

Page 100: Mining Event Histories: Some New Insights on Personal Swiss …mephisto.unige.ch/pub/publications/gr/slides/bm_rits... · 2011. 9. 23. · MiningEventHistories Mytalkisaboutlifecourses,

Mining Event HistoriesCharacterizing, rendering and clustering sequence dataDistances between sequences: Clustering

Prof, Birth year distribution by clusterType 1 : Full Time Trajectoires (53 %)

Birth year

Den

sity

1910 1920 1930 1940 1950 1960

0.00

0.01

0.02

0.03

0.04

0.05

0.06

Type 2 : Mixed Part Time − Home Trajectories (13 %)

Birth year

Den

sity

1910 1920 1930 1940 1950 1960

0.00

0.01

0.02

0.03

0.04

0.05

0.06

Type 3 : At Home Trajectories (16 %)

Birth year

Den

sity

1910 1920 1930 1940 1950 1960

0.00

0.01

0.02

0.03

0.04

0.05

0.06

Type 4 : Part Time Trajectories (7 %)

Birth year

Den

sity

1910 1920 1930 1940 1950 1960

0.00

0.01

0.02

0.03

0.04

0.05

0.06

Type 5 : Missing Data (11 %)

Birth year

Den

sity

1910 1920 1930 1940 1950 1960

0.00

0.01

0.02

0.03

0.04

0.05

0.06

Overall

Birth year

Den

sity

1910 1920 1930 1940 1950 1960

0.00

0.01

0.02

0.03

0.04

0.05

0.06

19/11/2008gr 66/95

Page 101: Mining Event Histories: Some New Insights on Personal Swiss …mephisto.unige.ch/pub/publications/gr/slides/bm_rits... · 2011. 9. 23. · MiningEventHistories Mytalkisaboutlifecourses,

Mining Event HistoriesCharacterizing, rendering and clustering sequence dataHeterogeneity analysis and sequence discrimination

Section content

3 Characterizing, rendering and clustering sequence dataLife trajectoriesCharacterizing sets of sequences

EntropyLongitudinal within sequence entropies

Distances between sequences: ClusteringHeterogeneity analysis and sequence discriminationMultidimensional Scaling representation of sequences

19/11/2008gr 67/95

Page 102: Mining Event Histories: Some New Insights on Personal Swiss …mephisto.unige.ch/pub/publications/gr/slides/bm_rits... · 2011. 9. 23. · MiningEventHistories Mytalkisaboutlifecourses,

Mining Event HistoriesCharacterizing, rendering and clustering sequence dataHeterogeneity analysis and sequence discrimination

Heterogeneity of set of sequences

Sum of squares can be expressed in terms of the distancesbetween each pair of points

SS =n∑

i=1(yi − y)2 =

1n

n∑i=1

n∑j=i+1

(yi − yj)2

=1n

n∑i=1

n∑j=i+1

dij

Setting dij to the OM, LCP, LCS, ... distance, we get ameasure of diversity or heterogeneity of sequences.Can apply ANOVA principle to sequences.

19/11/2008gr 68/95

Page 103: Mining Event Histories: Some New Insights on Personal Swiss …mephisto.unige.ch/pub/publications/gr/slides/bm_rits... · 2011. 9. 23. · MiningEventHistories Mytalkisaboutlifecourses,

Mining Event HistoriesCharacterizing, rendering and clustering sequence dataHeterogeneity analysis and sequence discrimination

Heterogeneity of set of sequences

Sum of squares can be expressed in terms of the distancesbetween each pair of points

SS =n∑

i=1(yi − y)2 =

1n

n∑i=1

n∑j=i+1

(yi − yj)2

=1n

n∑i=1

n∑j=i+1

dij

Setting dij to the OM, LCP, LCS, ... distance, we get ameasure of diversity or heterogeneity of sequences.Can apply ANOVA principle to sequences.

19/11/2008gr 68/95

Page 104: Mining Event Histories: Some New Insights on Personal Swiss …mephisto.unige.ch/pub/publications/gr/slides/bm_rits... · 2011. 9. 23. · MiningEventHistories Mytalkisaboutlifecourses,

Mining Event HistoriesCharacterizing, rendering and clustering sequence dataHeterogeneity analysis and sequence discrimination

Heterogeneity of set of sequences

Sum of squares can be expressed in terms of the distancesbetween each pair of points

SS =n∑

i=1(yi − y)2 =

1n

n∑i=1

n∑j=i+1

(yi − yj)2

=1n

n∑i=1

n∑j=i+1

dij

Setting dij to the OM, LCP, LCS, ... distance, we get ameasure of diversity or heterogeneity of sequences.Can apply ANOVA principle to sequences.

19/11/2008gr 68/95

Page 105: Mining Event Histories: Some New Insights on Personal Swiss …mephisto.unige.ch/pub/publications/gr/slides/bm_rits... · 2011. 9. 23. · MiningEventHistories Mytalkisaboutlifecourses,

Mining Event HistoriesCharacterizing, rendering and clustering sequence dataHeterogeneity analysis and sequence discrimination

Heterogeneity analysisProfessional trajectories by sex

19/11/2008gr 69/95

Page 106: Mining Event Histories: Some New Insights on Personal Swiss …mephisto.unige.ch/pub/publications/gr/slides/bm_rits... · 2011. 9. 23. · MiningEventHistories Mytalkisaboutlifecourses,

Mining Event HistoriesCharacterizing, rendering and clustering sequence dataHeterogeneity analysis and sequence discrimination

Sequence Tree

19/11/2008gr 70/95

Page 107: Mining Event Histories: Some New Insights on Personal Swiss …mephisto.unige.ch/pub/publications/gr/slides/bm_rits... · 2011. 9. 23. · MiningEventHistories Mytalkisaboutlifecourses,

Mining Event HistoriesCharacterizing, rendering and clustering sequence dataMultidimensional Scaling representation of sequences

Section content

3 Characterizing, rendering and clustering sequence dataLife trajectoriesCharacterizing sets of sequences

EntropyLongitudinal within sequence entropies

Distances between sequences: ClusteringHeterogeneity analysis and sequence discriminationMultidimensional Scaling representation of sequences

19/11/2008gr 71/95

Page 108: Mining Event Histories: Some New Insights on Personal Swiss …mephisto.unige.ch/pub/publications/gr/slides/bm_rits... · 2011. 9. 23. · MiningEventHistories Mytalkisaboutlifecourses,

Mining Event HistoriesCharacterizing, rendering and clustering sequence dataMultidimensional Scaling representation of sequences

Multidimensional Scaling: Principle

Let D be a distance matrix between sequences.D computed using OM, LPS, LCS, ... metrics.Multidimensional Scaling consists in

Finding a set of real valued variables (f1, f2) such that theδij =

√(fi1 − fj1)2 + (fi2 − fj2)2 best approximate the

distances between sequences.Plotting the points in the (f1, f2) space.

19/11/2008gr 72/95

Page 109: Mining Event Histories: Some New Insights on Personal Swiss …mephisto.unige.ch/pub/publications/gr/slides/bm_rits... · 2011. 9. 23. · MiningEventHistories Mytalkisaboutlifecourses,

Mining Event HistoriesCharacterizing, rendering and clustering sequence dataMultidimensional Scaling representation of sequences

Multidimensional Scaling: Principle

Let D be a distance matrix between sequences.D computed using OM, LPS, LCS, ... metrics.Multidimensional Scaling consists in

Finding a set of real valued variables (f1, f2) such that theδij =

√(fi1 − fj1)2 + (fi2 − fj2)2 best approximate the

distances between sequences.Plotting the points in the (f1, f2) space.

19/11/2008gr 72/95

Page 110: Mining Event Histories: Some New Insights on Personal Swiss …mephisto.unige.ch/pub/publications/gr/slides/bm_rits... · 2011. 9. 23. · MiningEventHistories Mytalkisaboutlifecourses,

Mining Event HistoriesCharacterizing, rendering and clustering sequence dataMultidimensional Scaling representation of sequences

Multidimensional Scaling: Principle

Let D be a distance matrix between sequences.D computed using OM, LPS, LCS, ... metrics.Multidimensional Scaling consists in

Finding a set of real valued variables (f1, f2) such that theδij =

√(fi1 − fj1)2 + (fi2 − fj2)2 best approximate the

distances between sequences.Plotting the points in the (f1, f2) space.

19/11/2008gr 72/95

Page 111: Mining Event Histories: Some New Insights on Personal Swiss …mephisto.unige.ch/pub/publications/gr/slides/bm_rits... · 2011. 9. 23. · MiningEventHistories Mytalkisaboutlifecourses,

Mining Event HistoriesCharacterizing, rendering and clustering sequence dataMultidimensional Scaling representation of sequences

Multidimensional Scaling

−20 −10 0 10 20

−20

−10

010

2030

40

Multidimensional scaling representation, colored cluster of cohabitation

MDS axis 1

MD

S a

xis

2

●●

●●

●●

●●

●●

● ●

●●

●●

●●

●●

●●

●●

●●

● ●

●●

●●

●●

●●

●●

● ●

● ●

● ●

● ●

● ●

● ●

●●

●●

●●

●●

●●

●●

● ●●●

●●

●●

●●

●●

●●

●●

Type 1 : Conjugal Trajectories (16 %)Type 2 : Parental Trajectories, Slow Transition (19 %)Type 3: Parental Trajectories, Fast Transition (48 %)Type 4 : Nestalgic Trajectories (7 %)Type 5 : Solo or Reconstituted Family Trajectories (11 %)

−20 −10 0 10 20

−20

−10

010

2030

40

Multidimensional scaling representation, colored cluster of professional trajectories

MDS axis 1

MD

S a

xis

2

●●● ●

● ●

●●●● ●

●●●● ●●●

●●

●●●●●●●●●

● ●●

●●●

●●● ●●●●●

● ●●●●●●●●●

●● ●

●●●●

●●●

● ●●●●●●

● ●

●●

●●

●●

●●

●●

●●

●●

●● ●●

Type 1 : Full Time Trajectoires (53 %)Type 2 : Mixed Part Time − Home Trajectories (13 %)Type 3 : At Home Trajectories (16 %)Type 4 : Part Time Trajectories (7 %)Type 5 : Missing Data (11 %)

19/11/2008gr 73/95

Page 112: Mining Event Histories: Some New Insights on Personal Swiss …mephisto.unige.ch/pub/publications/gr/slides/bm_rits... · 2011. 9. 23. · MiningEventHistories Mytalkisaboutlifecourses,

Mining Event HistoriesMining Frequent Episodes

Table of content

1 Sequence Analysis in Social Sciences

2 Survival Trees

3 Characterizing, rendering and clustering sequence data

4 Mining Frequent Episodes

19/11/2008gr 74/95

Page 113: Mining Event Histories: Some New Insights on Personal Swiss …mephisto.unige.ch/pub/publications/gr/slides/bm_rits... · 2011. 9. 23. · MiningEventHistories Mytalkisaboutlifecourses,

Mining Event HistoriesMining Frequent Episodes

Mining Frequent Episodes

(Time stamped) event sequencesWhat can we expect from frequent episodes mining?

GSP (Srikant and Agrawal, 1996)MINEPI, WINEPI (Mannila et al., 1997)TCG, TAG (Bettini et al., 1996)SPADE (Zaki, 2001)

Are there specific issues when applying these methods insocial sciences?

19/11/2008gr 75/95

Page 114: Mining Event Histories: Some New Insights on Personal Swiss …mephisto.unige.ch/pub/publications/gr/slides/bm_rits... · 2011. 9. 23. · MiningEventHistories Mytalkisaboutlifecourses,

Mining Event HistoriesMining Frequent Episodes

Mining Frequent Episodes

(Time stamped) event sequencesWhat can we expect from frequent episodes mining?

GSP (Srikant and Agrawal, 1996)MINEPI, WINEPI (Mannila et al., 1997)TCG, TAG (Bettini et al., 1996)SPADE (Zaki, 2001)

Are there specific issues when applying these methods insocial sciences?

19/11/2008gr 75/95

Page 115: Mining Event Histories: Some New Insights on Personal Swiss …mephisto.unige.ch/pub/publications/gr/slides/bm_rits... · 2011. 9. 23. · MiningEventHistories Mytalkisaboutlifecourses,

Mining Event HistoriesMining Frequent EpisodesWhat Is It About?

Section content

4 Mining Frequent EpisodesWhat Is It About?Example: Counting Alternate Episode StructuresFrequent and discriminant episodes

19/11/2008gr 76/95

Page 116: Mining Event Histories: Some New Insights on Personal Swiss …mephisto.unige.ch/pub/publications/gr/slides/bm_rits... · 2011. 9. 23. · MiningEventHistories Mytalkisaboutlifecourses,

Mining Event HistoriesMining Frequent EpisodesWhat Is It About?

Frequent episodes. What is it?

Episode: Collection of events occurring frequently together.Mining typical (frequent) episodes:

Specialized case of mining frequent itemsets.Time dimension ⇒ Partially ordered events.

More complex than unordered itemsets: User mustspecify time constraints (and episode structure constraints).select a counting method.

19/11/2008gr 77/95

Page 117: Mining Event Histories: Some New Insights on Personal Swiss …mephisto.unige.ch/pub/publications/gr/slides/bm_rits... · 2011. 9. 23. · MiningEventHistories Mytalkisaboutlifecourses,

Mining Event HistoriesMining Frequent EpisodesWhat Is It About?

Frequent episodes. What is it?

Episode: Collection of events occurring frequently together.Mining typical (frequent) episodes:

Specialized case of mining frequent itemsets.Time dimension ⇒ Partially ordered events.

More complex than unordered itemsets: User mustspecify time constraints (and episode structure constraints).select a counting method.

19/11/2008gr 77/95

Page 118: Mining Event Histories: Some New Insights on Personal Swiss …mephisto.unige.ch/pub/publications/gr/slides/bm_rits... · 2011. 9. 23. · MiningEventHistories Mytalkisaboutlifecourses,

Mining Event HistoriesMining Frequent EpisodesWhat Is It About?

Episode structure constraints

For people who leave home within 2 years from their 17, what aretypical events occurring until they get married and have a firstchild?

LH,17

w = 2

??

w = 1

C1

M(0, 4)

(0,3)

(0, 1, 10)

elastic

event constraints

parallel

node constraint

edge constraints

19/11/2008gr 78/95

Page 119: Mining Event Histories: Some New Insights on Personal Swiss …mephisto.unige.ch/pub/publications/gr/slides/bm_rits... · 2011. 9. 23. · MiningEventHistories Mytalkisaboutlifecourses,

Mining Event HistoriesMining Frequent EpisodesWhat Is It About?

Episode structure constraints

For people who leave home within 2 years from their 17, what aretypical events occurring until they get married and have a firstchild?

LH,17

w = 2

??

w = 1

C1

M(0, 4)

(0,3)

(0, 1, 10)

elastic

event constraints

parallel

node constraint

edge constraints

19/11/2008gr 78/95

Page 120: Mining Event Histories: Some New Insights on Personal Swiss …mephisto.unige.ch/pub/publications/gr/slides/bm_rits... · 2011. 9. 23. · MiningEventHistories Mytalkisaboutlifecourses,

Mining Event HistoriesMining Frequent EpisodesWhat Is It About?

Episode structure constraints

For people who leave home within 2 years from their 17, what aretypical events occurring until they get married and have a firstchild?

LH,17

w = 2

??

w = 1

C1

M(0, 4)

(0,3)

(0, 1, 10)

elastic

event constraints

parallel

node constraint

edge constraints

19/11/2008gr 78/95

Page 121: Mining Event Histories: Some New Insights on Personal Swiss …mephisto.unige.ch/pub/publications/gr/slides/bm_rits... · 2011. 9. 23. · MiningEventHistories Mytalkisaboutlifecourses,

Mining Event HistoriesMining Frequent EpisodesWhat Is It About?

Counting methods (Joshi et al., 2001)

20 21 22 23 24

U UUC C C

Searching (U,C)min gap= 1, max gap= 2, win size= 2

indiv. with episode COBJ = 1

windows with episode CWIN = 3

min win. with episode CminWIN = 2

distinct occurrences CDIS_o = 5

dist. occ. without overlap CDIS = 3

19/11/2008gr 79/95

Page 122: Mining Event Histories: Some New Insights on Personal Swiss …mephisto.unige.ch/pub/publications/gr/slides/bm_rits... · 2011. 9. 23. · MiningEventHistories Mytalkisaboutlifecourses,

Mining Event HistoriesMining Frequent EpisodesWhat Is It About?

Counting methods (Joshi et al., 2001)

20 21 22 23 24

U UUC C C

Searching (U,C)min gap= 1, max gap= 2, win size= 2

indiv. with episode COBJ = 1

windows with episode CWIN = 3

min win. with episode CminWIN = 2

distinct occurrences CDIS_o = 5

dist. occ. without overlap CDIS = 3

19/11/2008gr 79/95

Page 123: Mining Event Histories: Some New Insights on Personal Swiss …mephisto.unige.ch/pub/publications/gr/slides/bm_rits... · 2011. 9. 23. · MiningEventHistories Mytalkisaboutlifecourses,

Mining Event HistoriesMining Frequent EpisodesWhat Is It About?

Counting methods (Joshi et al., 2001)

20 21 22 23 24

U UUC C C

Searching (U,C)min gap= 1, max gap= 2, win size= 2

indiv. with episode COBJ = 1

windows with episode CWIN = 3

min win. with episode CminWIN = 2

distinct occurrences CDIS_o = 5

dist. occ. without overlap CDIS = 3

19/11/2008gr 79/95

Page 124: Mining Event Histories: Some New Insights on Personal Swiss …mephisto.unige.ch/pub/publications/gr/slides/bm_rits... · 2011. 9. 23. · MiningEventHistories Mytalkisaboutlifecourses,

Mining Event HistoriesMining Frequent EpisodesWhat Is It About?

Counting methods (Joshi et al., 2001)

20 21 22 23 24

U UUC C C

Searching (U,C)min gap= 1, max gap= 2, win size= 2

indiv. with episode COBJ = 1

windows with episode CWIN = 3

min win. with episode CminWIN = 2

distinct occurrences CDIS_o = 5

dist. occ. without overlap CDIS = 3

19/11/2008gr 79/95

Page 125: Mining Event Histories: Some New Insights on Personal Swiss …mephisto.unige.ch/pub/publications/gr/slides/bm_rits... · 2011. 9. 23. · MiningEventHistories Mytalkisaboutlifecourses,

Mining Event HistoriesMining Frequent EpisodesExample: Counting Alternate Episode Structures

Section content

4 Mining Frequent EpisodesWhat Is It About?Example: Counting Alternate Episode StructuresFrequent and discriminant episodes

19/11/2008gr 80/95

Page 126: Mining Event Histories: Some New Insights on Personal Swiss …mephisto.unige.ch/pub/publications/gr/slides/bm_rits... · 2011. 9. 23. · MiningEventHistories Mytalkisaboutlifecourses,

Mining Event HistoriesMining Frequent EpisodesExample: Counting Alternate Episode Structures

Example: Counting alternate structures (COBJ, no max gap)

0%

5%

10%

15%

20%

25%

30%

Child <

Marr

iage

Marriag

e < C

hild

Child =

Marr

iage

Child <

Job

Job <

Chil

d

Child =

Job

Child <

Edu

c end

Educ e

nd <

Child

Child =

Edu

c end

Marriag

e < Jo

b

Job <

Marr

iage

Marriag

e = Jo

b

Marriag

e < Edu

c end

Educ e

nd <

Marriag

e

Marriag

e = Edu

c end

Job <

Edu

c end

Educ e

nd <

Job

Job =

Edu

c end

Switzerland, SHP 2002 biographical survey (n = 5560).19/11/2008gr 81/95

Page 127: Mining Event Histories: Some New Insights on Personal Swiss …mephisto.unige.ch/pub/publications/gr/slides/bm_rits... · 2011. 9. 23. · MiningEventHistories Mytalkisaboutlifecourses,

Mining Event HistoriesMining Frequent EpisodesFrequent and discriminant episodes

Section content

4 Mining Frequent EpisodesWhat Is It About?Example: Counting Alternate Episode StructuresFrequent and discriminant episodes

19/11/2008gr 82/95

Page 128: Mining Event Histories: Some New Insights on Personal Swiss …mephisto.unige.ch/pub/publications/gr/slides/bm_rits... · 2011. 9. 23. · MiningEventHistories Mytalkisaboutlifecourses,

Mining Event HistoriesMining Frequent EpisodesFrequent and discriminant episodes

Frequent episodes, cohabitational and professional trajectories0.

00.

10.

20.

30.

40.

50.

6

(Bio

logi

cal f

athe

r an

d m

othe

r)

(With

par

tner

>P

artn

er a

nd b

iolo

gica

l chi

ld)

(Bio

logi

cal f

athe

r an

d m

othe

r)−

(With

par

tner

>P

artn

er a

nd b

iolo

gica

l chi

ld)

(Alo

ne>

With

par

tner

)

(Bio

logi

cal f

athe

r an

d m

othe

r>W

ith p

artn

er)

(Bio

logi

cal f

athe

r an

d m

othe

r)−

(Bio

logi

cal f

athe

r an

d m

othe

r>W

ith p

artn

er)

(Bio

logi

cal f

athe

r an

d m

othe

r>A

lone

)

(Bio

logi

cal f

athe

r an

d m

othe

r)−

(Bio

logi

cal f

athe

r an

d m

othe

r>A

lone

)

(Bio

logi

cal f

athe

r an

d m

othe

r>P

artn

er a

nd b

iolo

gica

l chi

ld)

(Bio

logi

cal f

athe

r an

d m

othe

r)−

(Bio

logi

cal f

athe

r an

d m

othe

r>P

artn

er a

nd b

iolo

gica

l chi

ld)

(Bio

logi

cal f

athe

r an

d m

othe

r>W

ith p

artn

er)−

(With

par

tner

>P

artn

er a

nd b

iolo

gica

l chi

ld)

(Bio

logi

cal f

athe

r an

d m

othe

r)−

(Bio

logi

cal f

athe

r an

d m

othe

r>W

ith p

artn

er)−

(With

par

tner

>P

artn

er a

nd b

iolo

gica

l chi

ld)

(Alo

ne>

With

par

tner

)−(W

ith p

artn

er>

Par

tner

and

bio

logi

cal c

hild

)

(Bio

logi

cal f

athe

r an

d m

othe

r)−

(Alo

ne>

With

par

tner

)

(Bio

logi

cal f

athe

r an

d m

othe

r>A

lone

)−(A

lone

>W

ith p

artn

er)

0.0

0.1

0.2

0.3

0.4

0.5

(Ful

l tim

e)

(Edu

catio

n)

(Edu

catio

n>F

ull t

ime)

(Edu

catio

n)−

(Edu

catio

n>F

ull t

ime)

(Ful

l tim

e>A

t hom

e)

(Ful

l tim

e)−

(Ful

l tim

e>A

t hom

e)

(Ful

l tim

e>P

art t

ime)

(At h

ome>

Par

t tim

e)

(Mis

sing

)

(Ful

l tim

e)−

(Ful

l tim

e>P

art t

ime)

(Ful

l tim

e>A

t hom

e)−

(At h

ome>

Par

t tim

e)

(Ful

l tim

e)−

(At h

ome>

Par

t tim

e)

(Par

t tim

e>F

ull t

ime)

(Ful

l tim

e)−

(Ful

l tim

e>A

t hom

e)−

(At h

ome>

Par

t tim

e)

19/11/2008gr 83/95

Page 129: Mining Event Histories: Some New Insights on Personal Swiss …mephisto.unige.ch/pub/publications/gr/slides/bm_rits... · 2011. 9. 23. · MiningEventHistories Mytalkisaboutlifecourses,

Mining Event HistoriesMining Frequent EpisodesFrequent and discriminant episodes

Discriminant episodes, professional trajectories, women

FullTime Mixed AtHome PartTime

(Full time>At home)

0.0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

FullTime Mixed AtHome PartTime

(Full time)−(Full time>At home)

0.0

0.1

0.2

0.3

0.4

0.5

FullTime Mixed AtHome PartTime

(At home>Part time)

0.0

0.1

0.2

0.3

0.4

FullTime Mixed AtHome PartTime Missing

(Full time>Part time)

0.0

0.1

0.2

0.3

0.4

0.5

Mixed AtHome PartTime

(Full time>At home)−(At home>Part time)

0.00

0.05

0.10

0.15

0.20

0.25

0.30

Mixed AtHome PartTime

(Full time)−(At home>Part time)

0.00

0.05

0.10

0.15

0.20

0.25

19/11/2008gr 84/95

Page 130: Mining Event Histories: Some New Insights on Personal Swiss …mephisto.unige.ch/pub/publications/gr/slides/bm_rits... · 2011. 9. 23. · MiningEventHistories Mytalkisaboutlifecourses,

Mining Event HistoriesMining Frequent EpisodesFrequent and discriminant episodes

Discriminant episodes, professional trajectories, men

FullTime Mixed Missing

(Missing)

0.0

0.1

0.2

0.3

0.4

FullTime Mixed PartTime Missing

(Education>Missing)

0.00

0.05

0.10

0.15

0.20

0.25

0.30

0.35

FullTime Mixed PartTime Missing

(Full time)

0.0

0.1

0.2

0.3

0.4

0.5

0.6

FullTime Mixed PartTime Missing

(Education>Full time)

0.00

0.05

0.10

0.15

0.20

0.25

0.30

0.35

FullTime Mixed PartTime Missing

(Education)−(Education>Full time)

0.00

0.05

0.10

0.15

0.20

0.25

0.30

FullTime Mixed PartTime Missing

(Education)

0.0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

19/11/2008gr 85/95

Page 131: Mining Event Histories: Some New Insights on Personal Swiss …mephisto.unige.ch/pub/publications/gr/slides/bm_rits... · 2011. 9. 23. · MiningEventHistories Mytalkisaboutlifecourses,

Mining Event HistoriesSummary

Summary

Data mining approaches (survival trees, clustering sequences,frequent episodes) have promising future in life courseanalysis.

Complement classical statistical outcomes with new insights.

Their use within social sciences raises specific issues:Accounting for multi-level effects when growing survival tree ormining association rules.Handling time varying predictors in survival trees.Selecting relevant counting methods (event dependent)?Suitable criteria for measuring association strength betweenfrequent episodes....

19/11/2008gr 86/95

Page 132: Mining Event Histories: Some New Insights on Personal Swiss …mephisto.unige.ch/pub/publications/gr/slides/bm_rits... · 2011. 9. 23. · MiningEventHistories Mytalkisaboutlifecourses,

Mining Event HistoriesSummary

Summary

Data mining approaches (survival trees, clustering sequences,frequent episodes) have promising future in life courseanalysis.

Complement classical statistical outcomes with new insights.

Their use within social sciences raises specific issues:Accounting for multi-level effects when growing survival tree ormining association rules.Handling time varying predictors in survival trees.Selecting relevant counting methods (event dependent)?Suitable criteria for measuring association strength betweenfrequent episodes....

19/11/2008gr 86/95

Page 133: Mining Event Histories: Some New Insights on Personal Swiss …mephisto.unige.ch/pub/publications/gr/slides/bm_rits... · 2011. 9. 23. · MiningEventHistories Mytalkisaboutlifecourses,

Mining Event HistoriesSummary

Our TraMineR R-package

Let me finish with an Add ...

TraMineR, a free life trajectory mining toolfor the free open source R statistical environment.downloadable from http://cran.r-project.org (CRAN)see also http://mephisto.unige.ch/biomining

19/11/2008gr 87/95

Page 134: Mining Event Histories: Some New Insights on Personal Swiss …mephisto.unige.ch/pub/publications/gr/slides/bm_rits... · 2011. 9. 23. · MiningEventHistories Mytalkisaboutlifecourses,

Mining Event HistoriesSummary

Our TraMineR R-package

Let me finish with an Add ...

TraMineR, a free life trajectory mining toolfor the free open source R statistical environment.downloadable from http://cran.r-project.org (CRAN)see also http://mephisto.unige.ch/biomining

19/11/2008gr 87/95

Page 135: Mining Event Histories: Some New Insights on Personal Swiss …mephisto.unige.ch/pub/publications/gr/slides/bm_rits... · 2011. 9. 23. · MiningEventHistories Mytalkisaboutlifecourses,

Mining Event HistoriesSummary

Our TraMineR R-package

Let me finish with an Add ...

TraMineR, a free life trajectory mining toolfor the free open source R statistical environment.downloadable from http://cran.r-project.org (CRAN)see also http://mephisto.unige.ch/biomining

19/11/2008gr 87/95

Page 136: Mining Event Histories: Some New Insights on Personal Swiss …mephisto.unige.ch/pub/publications/gr/slides/bm_rits... · 2011. 9. 23. · MiningEventHistories Mytalkisaboutlifecourses,

Mining Event HistoriesSummary

Our TraMineR R-package

Let me finish with an Add ...

TraMineR, a free life trajectory mining toolfor the free open source R statistical environment.downloadable from http://cran.r-project.org (CRAN)see also http://mephisto.unige.ch/biomining

19/11/2008gr 87/95

Page 137: Mining Event Histories: Some New Insights on Personal Swiss …mephisto.unige.ch/pub/publications/gr/slides/bm_rits... · 2011. 9. 23. · MiningEventHistories Mytalkisaboutlifecourses,

Mining Event HistoriesSummary

Thank You!Thank You!

19/11/2008gr 88/95

Page 138: Mining Event Histories: Some New Insights on Personal Swiss …mephisto.unige.ch/pub/publications/gr/slides/bm_rits... · 2011. 9. 23. · MiningEventHistories Mytalkisaboutlifecourses,

Mining Event HistoriesAppendixZoomed tree

Divorce, Switzerland, Differences in KM Survival CurvesI

Return

� � � � � � � � � � �

� � � � � � � � � � � � � �

� � � � � � � � � �

� � � � � � � � � � � �

� � � � � � � � � � � � � � �

� � � � � � � � �

� � � � � � � � � � � �

� � � � � � � �� � � � � � � � �

� � � � � � � � � �

� � � � � � � � � � � �

� � � � � � � �� � � � � � � � �

� � � � � � � � � �

� � � � � � � � � � � �

� � � � � � � �� � � � � � � � �

� � � � � � � � �

� � � � � � � � � � � �

� � � � � � � �� � � � � � �

� � � � � � � � � �

� � �

� � � � � � � � � � � �

� � � � � �� � � � � � � � � �

� � � � � � � � � � �

� � � � ! " " #$ � "

� � � � � � � � � � � �

� � � � � � �� � � � � � � � �

� � � � � � � � � �

% & � � � � � � � # � � � ' � � � # � � � �

� � � � � � � �

% & � � � � � � � � # � � ' � � � # � � � � % & � � � � � � � # � � � ' � � � # � � � �

� � � � � � � � � � � �

� � � � � � �� � � � � � � �

� � � � � � � � � �

� � � � � � � � � � �

� � � � � � � � �� � � � � � � � � �

� � � � � � � � � �

� � � � � � � � ( � ) � *� � � � � � � � � �

� � � � � � � �

% & � � � � � � � # � � � � ' � � � # � � � �

� � � � � � � � � � �

� � � � � � � � �� � � � � � � �

� � � � � � � � �

� � � � � � � � � � � �

� � � � � � �� � � � � � �

� � � � � � � � �

$ � "� �

� � � � � � � �

% & � � � � � � � # � � � ' � � � # � � �

� � � � � � � � � � � �

� � � � � � �� � � � � � � �

� � � � � � � � � �

� � � � � � � � � � � �

� � � � � � � �� � � � � � � � �

� � � � � � � � � �

$ � "� �

� � � � � � � �

% & � � � � � � � # � � � � ' � � � # � � �

+ � + �

+

+ � + + � + �19/11/2008gr 89/95

Page 139: Mining Event Histories: Some New Insights on Personal Swiss …mephisto.unige.ch/pub/publications/gr/slides/bm_rits... · 2011. 9. 23. · MiningEventHistories Mytalkisaboutlifecourses,

Mining Event HistoriesAppendixZoomed tree

Return

� � � � � � � � � � �

� � � � � � � � � � � � � �

� � � � � � � � � �

� � � � � � � � � � � �

� � � � � � � � � � � � � � �

� � � � � � � � �

� � � � � � � � � � � �

� � � � � � � �� � � � � � � � �

� � � � � � � � � �

� � � � � � � � � � � �

� � � � � � � �� � � � � � � � �

� � � � � � � � � �

� � � � � � � � � � � �

� � � � � � � �� � � � � � � � �

� � � � � � � � �

� � � � � � � � � � � �

� � � � � � � �� � � � � � �

� � � � � � � � � �

� � �

� � � � � � � � � � � �

� � � � � �� � � � � � � � � �

� � � � � � � � � � �

� � � � ! " " #$ � "

� � � � � � � � � � � �

� � � � � � �� � � � � � � � �

� � � � � � � � � �

% & � � � � � � � # � � � ' � � � # � � � �

� � � � � � � �

% & � � � � � � � � # � � ' � � � # � � � � % & � � � � � � � # � � � ' � � � # � � � �

� � � � � � � � � � � �

� � � � � � �� � � � � � � �

� � � � � � � � � �

� � � � � � � � � � �

� � � � � � � � �� � � � � � � � � �

� � � � � � � � � �

� � � � � � � � ( � ) � *� � � � � � � � � �

� � � � � � � �

% & � � � � � � � # � � � � ' � � � # � � � �

� � � � � � � � � � �

� � � � � � � � �� � � � � � � �

� � � � � � � � �

� � � � � � � � � � � �

� � � � � � �� � � � � � �

� � � � � � � � �

$ � "� �

� � � � � � � �

% & � � � � � � � # � � � ' � � � # � � �

� � � � � � � � � � � �

� � � � � � �� � � � � � � �

� � � � � � � � � �

� � � � � � � � � � � �

� � � � � � � �� � � � � � � � �

� � � � � � � � � �

$ � "� �

� � � � � � � �

% & � � � � � � � # � � � � ' � � � # � � �

+ � + �

+

+ � + + � + �

19/11/2008gr 89/95

Page 140: Mining Event Histories: Some New Insights on Personal Swiss …mephisto.unige.ch/pub/publications/gr/slides/bm_rits... · 2011. 9. 23. · MiningEventHistories Mytalkisaboutlifecourses,

Mining Event HistoriesAppendixFor Further Reading

For Further Reading I

Abbott, A. and J. Forrest (1986). Optimal matching methods forhistorical sequences. Journal of Interdisciplinary History 16,471–494.

Bettini, C., X. S. Wang, and S. Jajodia (1996). Testing complextemporal relationships involving multiple granularities and itsapplication to data mining (extended abstract). In PODS ’96:Proceedings of the fifteenth ACM SIGACT-SIGMOD-SIGARTsymposium on Principles of database systems, New York, pp.68–78. ACM Press.

19/11/2008gr 90/95

Page 141: Mining Event Histories: Some New Insights on Personal Swiss …mephisto.unige.ch/pub/publications/gr/slides/bm_rits... · 2011. 9. 23. · MiningEventHistories Mytalkisaboutlifecourses,

Mining Event HistoriesAppendixFor Further Reading

For Further Reading II

Billari, F. C. (2005). Life course analysis: Two (complementary)cultures? Some reflections with examples from the analysis oftransition to adulthood. In R. Levy, P. Ghisletta, J.-M. Le Goff,D. Spini, and E. Widmer (Eds.), Towards an InterdisciplinaryPerspective on the Life Course, Advances in Life CourseResearch, Vol. 10, pp. 267–288. Amsterdam: Elsevier.

Blossfeld, H.-P. and G. Rohwer (2002). Techniques of EventHistory Modeling, New Approaches to Causal Analysis (2nded.). Mahwah NJ: Lawrence Erlbaum.

Elzinga, C. H. (2008). Sequence analysis: Metric representationsof categorical time series. Sociological Methods and Research.forthcoming.

19/11/2008gr 91/95

Page 142: Mining Event Histories: Some New Insights on Personal Swiss …mephisto.unige.ch/pub/publications/gr/slides/bm_rits... · 2011. 9. 23. · MiningEventHistories Mytalkisaboutlifecourses,

Mining Event HistoriesAppendixFor Further Reading

For Further Reading III

Elzinga, C. H. and A. C. Liefbroer (2007). De-standardization offamily-life trajectories of young adults: A cross-nationalcomparison using sequence analysis. European Journal ofPopulation 23, 225–250.

Gauthier, J.-A. (2007). Empirical categorizations of socialtrajectories: A sequential view on the life course. Thèse,Université de Lausanne, Faculté des sciences sociales et politique(SSP), Lausanne.

Huang, X., S. Chen, and S. Soong (1998). Piecewise exponentialsurvival trees with time-dependent covariates. Biometrics 54,1420–1433.

19/11/2008gr 92/95

Page 143: Mining Event Histories: Some New Insights on Personal Swiss …mephisto.unige.ch/pub/publications/gr/slides/bm_rits... · 2011. 9. 23. · MiningEventHistories Mytalkisaboutlifecourses,

Mining Event HistoriesAppendixFor Further Reading

For Further Reading IV

Joshi, M. V., G. Karypis, and V. Kumar (2001). A universalformulation of sequential patterns. In Proceedings of theKDD’2001 workshop on Temporal Data Mining, San Fransisco,August 2001.

Leblanc, M. and J. Crowley (1992). Relative risk trees for censoredsurvival data. Biometrics 48, 411–425.

Levenshtein, V. (1966). Binary codes capable of correctingdeletions, insertions, and reversals. Soviet Physics Doklady 10,707–710.

Mannila, H., H. Toivonen, and A. I. Verkamo (1997). Discovery offrequent episodes in event sequences. Data Mining andKnowledge Discovery 1(3), 259–289.

19/11/2008gr 93/95

Page 144: Mining Event Histories: Some New Insights on Personal Swiss …mephisto.unige.ch/pub/publications/gr/slides/bm_rits... · 2011. 9. 23. · MiningEventHistories Mytalkisaboutlifecourses,

Mining Event HistoriesAppendixFor Further Reading

For Further Reading VNeedleman, S. and C. Wunsch (1970). A general method

applicable to the search for similarities in the amino acidsequence of two proteins. Journal of Molecular Biology 48,443–453.

Segal, M. R. (1988). Regression trees for censored data.Biometrics 44, 35–47.

Segal, M. R. (1992). Tree-structured methods for longitudinaldata. Journal of the American Statistical Association 87(418),407–418.

Srikant, R. and R. Agrawal (1996). Mining sequential patterns:Generalizations and performance improvements. In P. M. G.Apers, M. Bouzeghoub, and G. Gardarin (Eds.), Advances inDatabase Technologies – 5th International Conference onExtending Database Technology (EDBT’96), Avignon, France,Volume 1057, pp. 3–17. Springer-Verlag.

19/11/2008gr 94/95

Page 145: Mining Event Histories: Some New Insights on Personal Swiss …mephisto.unige.ch/pub/publications/gr/slides/bm_rits... · 2011. 9. 23. · MiningEventHistories Mytalkisaboutlifecourses,

Mining Event HistoriesAppendixFor Further Reading

For Further Reading VI

Zaki, M. J. (2001). SPADE: An efficient algorithm for miningfrequent sequences. Machine Learning 42(1/2), 31–60.

19/11/2008gr 95/95