The evidential value of mobile phone co-location

The evidential value of mobile phone co-location Richard Gill Mathematical Institute, Leiden University http://www.math.leidenuniv.nl/~gill Joint work with Helena van Eijck (master thesis, Statistical Science programme) http://www.math.leidenuniv.nl/nl/theses/382/ Data Science Meetup Utrecht, 23 January 2014

Upload: richard-gill

Post on 10-May-2015


Category:

Science



DESCRIPTION

Criminals are well aware that making a phone call leaves a trace, which might later be used by the police and, later still, in court. They therefore often switch phones, and prefer to use more or less anonymous phones for "business". However, while using one phone for their work activities, they may be using another phone for legitimate business or for ordinary private purposes. This leads to the phenomenon called "co-location": two mobile phones apparently moving together, each separately making calls, as if the two phones were in the same hands. How can one find phones, and then co-locating phones, associated with some crime? Can one decide from a short history of apparent co-location whether or not the two phones were in the same hands? How strong is the evidence in discriminating between two hypotheses: the phones co-locate by chance (defence hypothesis) or they co-locate because they are in the same hands (prosecution hypothesis)? We have to distinguish two phases of "research": exploratory (criminal investigation) and confirmatory (criminal prosecution). I discuss the roles of statistics in these two phases of forensic statistical analysis of mobile phone co-location.

TRANSCRIPT

Page 1: The evidential value of mobile phone colocation

The evidential value of mobile phone co-location

Richard Gill Mathematical Institute, Leiden University

http://www.math.leidenuniv.nl/~gill

Joint work with Helena van Eijck (master thesis, Statistical Science programme)

http://www.math.leidenuniv.nl/nl/theses/382/

Data Science Meetup Utrecht, 23 January 2014

Page 2: The evidential value of mobile phone colocation

The chance of coincidence?

• DNA match

• Fingerprint match

• Handwriting match

• ... and so on ...

Match probability = P(Coincidence | H_defence); or better, Likelihood Ratio (LR) = P(Coincidence | H_defence) : P(Coincidence | H_prosecution)
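A minimal sketch of the difference between the two quantities, using made-up numbers. The function name `likelihood_ratio` and both probabilities are assumptions for illustration only; the ratio is taken in the same order as on the slide (defence over prosecution), so small values favour the prosecution hypothesis.

```python
def likelihood_ratio(p_given_defence: float, p_given_prosecution: float) -> float:
    """Return P(Coincidence | H_defence) / P(Coincidence | H_prosecution)."""
    if not 0.0 <= p_given_defence <= 1.0:
        raise ValueError("p_given_defence must be a probability")
    if not 0.0 < p_given_prosecution <= 1.0:
        raise ValueError("p_given_prosecution must be a probability greater than 0")
    return p_given_defence / p_given_prosecution


# Made-up numbers, for illustration only.
match_probability = 1e-6     # chance of the match if the defence is right
p_match_same_source = 0.5    # chance of the match if the prosecution is right

lr = likelihood_ratio(match_probability, p_match_same_source)
print(f"match probability = {match_probability:g}, LR = {lr:g}")
```

The match probability alone implicitly treats the evidence as certain under the prosecution hypothesis; the ratio makes that second model explicit.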

Page 3: The evidential value of mobile phone colocation

Mobile phone co-location

• Mobile phone co-location: two cell phones used over a long time period in a way consistent with them being carried by one person

Page 4: The evidential value of mobile phone colocation

Visualisation (simulated data)

(all analysis done in [software logo missing from transcript], of course)

Page 5: The evidential value of mobile phone colocation

Visualisation (simulated data)

Page 6: The evidential value of mobile phone colocation

Hariri Case

• 14 February 2005: assassination, Beirut

• Lebanon Police investigation, continued by UNIIIC (2005), and STL (2009)

• 2011: STL publishes indictment

• 2014: trial opens

“The case against the Accused is built in large part on circumstantial evidence. Circumstantial evidence, which works logically by inference and deduction, is often more reliable than direct evidence, which can suffer from first-hand memory loss or eye-witness distortion. It is a recognised legal principle that circumstantial evidence has similar weight and probative value as direct evidence and that circumstantial evidence can be stronger than direct evidence.”

Page 7: The evidential value of mobile phone colocation

http://www.stl-tsl.org/en/the-cases/stl-11-01

Page 8: The evidential value of mobile phone colocation

Analysis of CDR revealed co-locating phones ...

• “Red network” phones associated with surveillance and assassination (covert: anonymous & closed)

• “Blue network” phones associated with logistics, preparation (anonymous but open)

• “Green network” phones associated with chain of command (covert)

• PMPs (personal mobile phones)

• ...

CDR = “Call Data Records”. Per call: cell towers, time, phone numbers
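A hedged sketch of how one such record might be represented, assuming only the three ingredients listed on the slide (cell towers, time, phone numbers). The class `CallRecord`, its field names and the helper `phone_history` are hypothetical and do not reflect the format of the actual case data.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Tuple


@dataclass(frozen=True)
class CallRecord:
    phone: str                     # the phone whose tower usage was recorded
    other_party: str               # the number at the other end of the call
    start: datetime                # time of the call
    cell_towers: Tuple[str, ...]   # IDs of the tower(s) used during the call


def phone_history(records: List[CallRecord], phone: str) -> List[Tuple[datetime, str]]:
    """Time-ordered (time, first tower) pairs for every call made by `phone`."""
    own = [r for r in records if r.phone == phone]
    return sorted((r.start, r.cell_towers[0]) for r in own)


# Made-up records, for illustration only.
records = [
    CallRecord("A", "C", datetime(2005, 1, 3, 14, 0), ("T7", "T8")),
    CallRecord("B", "D", datetime(2005, 1, 3, 9, 30), ("T2",)),
    CallRecord("A", "E", datetime(2005, 1, 3, 8, 15), ("T1",)),
]
history_a = phone_history(records, "A")
```

Note that a history built this way is intermittent: it only says where a phone was at the moments it was used.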

Page 9: The evidential value of mobile phone colocation

How they found co-locating phones

• Given: a “target phone” (already associated with crime)

• Select notable patterns of movement

• Look for candidate co-locators (match same pattern)

• Follow up the “hits” over time: do they de-co-locate? (look for an anomaly)
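The search steps above can be sketched as follows. This is an illustration under stated assumptions, not the investigators' procedure: a "pattern" is taken to be a list of (time, tower) points of the target phone, and a candidate "matches" if it used the same tower within a tolerance of 30 minutes of every point. The function names and the matching rule are hypothetical.

```python
from datetime import datetime, timedelta


def matches_pattern(history, pattern, tolerance=timedelta(minutes=30)):
    """True if every (time, tower) point of the pattern has a nearby call in history."""
    return all(
        any(tower == p_tower and abs(t - p_time) <= tolerance for t, tower in history)
        for p_time, p_tower in pattern
    )


def find_candidates(histories, target, pattern, tolerance=timedelta(minutes=30)):
    """Phones other than the target whose history matches the target's pattern."""
    return sorted(
        phone
        for phone, history in histories.items()
        if phone != target and matches_pattern(history, pattern, tolerance)
    )


# Made-up histories: phone -> list of (time, tower).
day = datetime(2005, 1, 3)
histories = {
    "target": [(day.replace(hour=8), "T1"), (day.replace(hour=12), "T5"), (day.replace(hour=18), "T9")],
    "B": [(day.replace(hour=8, minute=10), "T1"), (day.replace(hour=12, minute=20), "T5"),
          (day.replace(hour=17, minute=45), "T9")],
    "C": [(day.replace(hour=8, minute=5), "T1"), (day.replace(hour=12), "T2")],
}
hits = find_candidates(histories, "target", histories["target"])
```

Each hit would then be followed forwards in time until the two phones first appear in incompatible places (the "anomaly").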

Page 10: The evidential value of mobile phone colocation

Issues

• Texas sharpshooter (testing a hypothesis suggested by the data)

• Likelihood ratio: needs two models

• Is a model of typical behaviour relevant to evaluation of a specific case?

• Is a sample from the population relevant to evaluation of a specific case?

Page 11: The evidential value of mobile phone colocation

Our approach

• Part I: investigate reliability of search procedure

• Part II: quantify evidential value of each specific pair of co-locating phones using permutation approach

Page 12: The evidential value of mobile phone colocation

Interlude: How good is the data?

• Intermittency

• Inaccuracy

Page 13: The evidential value of mobile phone colocation

Does CDR data uniquely characterise you?

Unique in the Crowd: The privacy bounds of human mobility

Yves-Alexandre de Montjoye (1,2), César A. Hidalgo (1,3,4), Michel Verleysen (2) & Vincent D. Blondel (2,5)

(1) Massachusetts Institute of Technology, Media Lab, 20 Ames Street, Cambridge, MA 02139, USA; (2) Université catholique de Louvain, Institute for Information and Communication Technologies, Electronics and Applied Mathematics, Avenue Georges Lemaître 4, B-1348 Louvain-la-Neuve, Belgium; (3) Harvard University, Center for International Development, 79 JFK Street, Cambridge, MA 02138, USA; (4) Instituto de Sistemas Complejos de Valparaíso, Paseo 21 de Mayo, Valparaíso, Chile; (5) Massachusetts Institute of Technology, Laboratory for Information and Decision Systems, 77 Massachusetts Avenue, Cambridge, MA 02139, USA.

We study fifteen months of human mobility data for one and a half million individuals and find that human mobility traces are highly unique. In fact, in a dataset where the location of an individual is specified hourly, and with a spatial resolution equal to that given by the carrier’s antennas, four spatio-temporal points are enough to uniquely identify 95% of the individuals. We coarsen the data spatially and temporally to find a formula for the uniqueness of human mobility traces given their resolution and the available outside information. This formula shows that the uniqueness of mobility traces decays approximately as the 1/10 power of their resolution. Hence, even coarse datasets provide little anonymity. These findings represent fundamental constraints to an individual’s privacy and have important implications for the design of frameworks and institutions dedicated to protect the privacy of individuals.

Derived from the Latin Privatus, meaning “withdraw from public life,” the notion of privacy has been foundational to the development of our diverse societies, forming the basis for individuals’ rights such as free speech and religious freedom [1]. Despite its importance, privacy has mainly relied on informal protection mechanisms. For instance, tracking individuals’ movements has been historically difficult, making them de-facto private. For centuries, information technologies have challenged these informal protection mechanisms. In 1086, William I of England commissioned the creation of the Doomsday book, a written record of major property holdings in England containing individual information collected for tax and draft purposes [2]. In the late 19th century, de-facto privacy was similarly threatened by photographs and yellow journalism. This resulted in one of the first publications advocating privacy in the U.S. in which Samuel Warren and Louis Brandeis argued that privacy law must evolve in response to technological changes [3].

Modern information technologies such as the Internet and mobile phones, however, magnify the uniqueness of individuals, further enhancing the traditional challenges to privacy. Mobility data is among the most sensitive data currently being collected. Mobility data contains the approximate whereabouts of individuals and can be used to reconstruct individuals’ movements across space and time. Individual mobility traces T [Fig. 1A–B] have been used in the past for research purposes [4–18] and to provide personalized services to users [19]. A list of potentially sensitive professional and personal information that could be inferred about an individual knowing only his mobility trace was published recently by the Electronic Frontier Foundation [20]. These include the movements of a competitor sales force, attendance of a particular church or an individual’s presence in a motel or at an abortion clinic.

While in the past, mobility traces were only available to mobile phone carriers, the advent of smartphones and other means of data collection has made these broadly available. For example, Apple® recently updated its privacy policy to allow sharing the spatio-temporal location of their users with “partners and licensees” [21]. 65.5B geo-tagged payments are made per year in the US [22] while Skyhook wireless is resolving 400 M user’s WiFi location every day [23]. Furthermore, it is estimated that a third of the 25B copies of applications available on Apple’s App Store℠ access a user’s geographic location [24,25], and that the geo-location of ~50% of all iOS and Android traffic is available to ad networks [26]. All these are fuelling the ubiquity of simply anonymized mobility datasets and are giving room to privacy concerns.

A simply anonymized dataset does not contain name, home address, phone number or other obvious identifier. Yet, if individual’s patterns are unique enough, outside information can be used to link the data back to an individual. For instance, in one study, a medical database was successfully combined with a voters list to extract

SUBJECT AREAS: APPLIED PHYSICS, APPLIED MATHEMATICS, STATISTICS, COMPUTATIONAL SCIENCE

Received 1 October 2012; accepted 4 February 2013; published 25 March 2013

Correspondence and requests for materials should be addressed to Y.-A. de M. (yva@mit.edu)

SCIENTIFIC REPORTS | 3 : 1376 | DOI: 10.1038/srep01376

NATURE/SCIENTIFIC REPORTS March 2013

Page 14: The evidential value of mobile phone colocation

Does CDR data uniquely characterise you?

NATURE/SCIENTIFIC REPORTS March 2013

function fits the data better than other two-parameter functions such as $\alpha - \exp(\lambda x)$, a stretched exponential $\alpha - \exp x^{\beta}$, or a standard linear function $\alpha - \beta x$ (see Table S1). Both estimators for $\alpha$ and $\beta$ are highly significant ($p < 0.001$) [32], and the mean pseudo-$R^2$ is 0.98 for the $I_{p=4}$ case and the $I_{p=10}$ case. The fit is good at all levels of spatial and temporal aggregation [Fig. S3A–B].

The power-law dependency of $\varepsilon$ means that, on average, each time the spatial or temporal resolution of the traces is divided by two, their uniqueness decreases by a constant factor $\sim (2)^{-\beta}$. This implies that privacy is increasingly hard to gain by lowering the resolution of a dataset.

Fig. 2B shows that, as expected, $\varepsilon$ increases with $p$. The mitigating effect of $p$ on $\varepsilon$ is mediated by the exponent $\beta$ which decays linearly with $p$: $\beta = 0.157 - 0.007p$ [Fig. 4E]. The dependence of $\beta$ on $p$ implies that a few additional points might be all that is needed to identify an individual in a dataset with a lower resolution. In fact, given four points, a two-fold decrease in spatial or temporal resolution makes it 9.3% less likely to identify an individual, while given ten points, the same two-fold decrease results in a reduction of only 6.2% (see Table S1).

Because of the functional dependency of $\varepsilon$ on $p$ through the exponent $\beta$, mobility datasets are likely to be re-identifiable using information on only a few outside locations.

Discussion

Our ability to generalize these results to other mobility datasets depends on the sensitivity of our analysis to extensions of the data to larger populations, or geographies. An increase in population density will tend to decrease $\varepsilon$. Yet, it will also be accompanied by an increase in the number of antennas, businesses or WiFi hotspots used for localizations. These effects run opposite to each other, and therefore, suggest that our results should generalize to higher population densities.

Extensions of the geographical range of observation are also unlikely to affect the results as human mobility is known to be highly circumscribed. In fact, 94% of the individuals move within an average radius of less than 100 km [17]. This implies that geographical extensions of the dataset will stay locally equivalent to our observations, making the results robust to changes in geographical range.

From an inference perspective, it is worth noticing that the spatio-temporal points do not equally increase the likelihood of uniquely identifying a trace. Furthermore, the information added by a point is highly dependent from the points already known. The amount of information gained by knowing one more point can be defined as the reduction of the cardinality of $S(I_p)$ associated with this extra point. The larger the decrease, the more useful the piece of information is. Intuitively, a point on the MIT campus at 3AM is more likely to make a trace unique than a point in downtown Boston on a Friday evening.

This study is likely to underestimate $\varepsilon$, and therefore the ease of re-identification, as the spatio-temporal points are drawn at random from users’ mobility traces. Our $I_p$ are thus subject to the user’s spatial and temporal distributions. Spatially, it has been shown that the uncertainty of a typical user’s whereabouts measured by its

Figure 2 | (A) $I_{p=2}$ means that the information available to the attacker consist of two 7am-8am spatio-temporal points (I and II). In this case, the target was in zone I between 9am to 10am and in zone II between 12pm to 1pm. In this example, the traces of two anonymized users (red and green) are compatible with the constraints defined by $I_{p=2}$. The subset $S(I_{p=2})$ contains more than one trace and is therefore not unique. However, the green trace would be uniquely characterized if a third point, zone III between 3pm and 4pm, is added ($I_{p=3}$). (B) The uniqueness of traces with respect to the number $p$ of given spatio-temporal points ($I_p$). The green bars represent the fraction of unique traces, i.e. $|S(I_p)| = 1$. The blue bars represent the fraction of $|S(I_p)| \le 2$. Therefore knowing as few as four spatio-temporal points taken at random ($I_{p=4}$) is enough to uniquely characterize 95% of the traces amongst 1.5 M users. (C) Box-plot of the minimum number of spatio-temporal points needed to uniquely characterize every trace on the non-aggregated database. At most eleven points are enough to uniquely characterize all considered traces.

[Figure 3, panels A, B and C. Axis labels recoverable from the transcript: "Probability density function" against "Number of interactions"; "Probability density function" against "Median inter-interactions time per user [h]"; "Inhabitants" against "Number of antennas".]

Figure 3 | (A) Probability density function of the amount of recorded spatio-temporal points per user during a month. (B) Probability density function of the median inter-interaction time with the service. (C) The number of antennas per region is correlated with its population ($R^2 = .6426$). These plots strongly emphasize the discrete character of our dataset and its similarities with datasets such as the one collected by smartphone apps.


Page 15: The evidential value of mobile phone colocation

How accurate is CDR location?

• “Deventer murder case”: under “exceptional” atmospheric conditions, a cell phone uses a cell tower 25 km away, rather than nearby cell towers

Forensic Statistics and Graphical Models:

Deventer murder case, phone call A28

Maikel Bargpeter

February 3, 2012

This analysis is mainly based on ‘Leugens over Louwes’.

The main reason Louwes became involved in the Deventer murder case is that he was the accountant of Mrs. Wittenberg and called her on his mobile phone right before the killing. According to Louwes, he was on the A28 highway, 25 km away from Deventer, where the murder took place. So he claims that he is not the killer.

The police found out that the telephone call from Louwes was picked up by a station in Deventer, a strange place given that he claims he was on the A28. Since this makes it possible that he was around Deventer at the time of the call, the prosecutor claims (also based on this ‘fact’) that he is lying and that Louwes is the killer.

The prosecutor does its research and finds the following (summary):

1: After testing (KPN, expert): under normal conditions a phone signal is lost beyond 6.8 km from the station in Deventer. The expert also states, however, that under other conditions it is not impossible that a phone call of this type could be made.

2: Other research shows that no connection could be made from the A28 and that the signal is lost after driving more than 6 to 7 km away from Deventer. Experts claim it is very unlikely that such a connection could be made from the A28.

Unfortunately, most of this research cannot be integrated into the graphical model at first sight.

The only way out is that the normal conditions might have been absent at the time of the phone call. Hans Meijer looked up reports at an institute in the USA and found that these special atmospheric conditions did occur around that time and place.

Under these special conditions it is possible that a phone call could be made from that distance, making it much more likely that Louwes was indeed on the A28 at the time of the phone call.

To fit this into a graphical model, we start with the story of Louwes. He claims he is not the killer and must rely on the fact that under these conditions a call from the A28 is possible, which supports his story. The prosecutor claims such a phone call is not possible at any time. The graphical model I made looks like the following: (see model in GeNie)

Here the data inside the definition is made up and based on the information available, but should be looked up carefully if used in court. If we condition on the fact that extreme conditions hold and that the phone call connected to a station in Deventer, we find a chance of around 11% that he was on the A28, which is not as small as the prosecutor claims.

These missing arguments should be researched better by the prosecutor before stating a possibly false claim.

Page 16: The evidential value of mobile phone colocation

How accurate is CDR location?

• Deventer murder case: under “exceptional” atmospheric conditions, a cell phone uses a cell tower 25 km away, rather than nearby cell towers

event, the probability that this matches the suspect is estimated at 0.60, as it is for the parents and for the other person living near the parents; for A it is estimated at 0.25, for M at 0.40, and for the other person not living near the parental home at 0.25.

6.1.11 Event 11

The five messages of 9 October that characterise this event took place within three quarters of an hour, between half past one and a quarter past two. The cell tower on the Reinaert de Vosstraat was used three times, and the cell towers on the Hugo de Grootkade and the Donker Curtiusstraat were each used once.

The cell towers turn out to lie around the home of M (purple point), with the most frequently used tower situated furthest away. Given the locations of the cell towers, it is most plausible that this matches M best, and the probability is therefore estimated at 0.70. For all the others this is less plausible but not improbable, so their probabilities are estimated at 0.40.

6.1.12 Event 12

This event comprises 20 messages spread over three days. In the morning and the evening of the first day, the cell towers near the parental home are used. The following day, the cell towers in Duivendrecht and Purmerend are used. The user of the phone cannot be identified from this, but it can be inferred from the messages that a transaction takes place the day after. On 11 October the phone is near the Termietengouw in Amsterdam. The reporting officer noted that the cell tower on the Termietengouw lies near the Diopter. A week earlier, Y had received a loan offer to buy a home in this street. In December he actually bought the home. The user of the phone then travels to Almere and afterwards returns to Amsterdam. Next, the cell tower on the Donker Curtiusstraat, which lies near the home of M, is used. Given that the phone was picked up in the morning near the parental home and, two days later, near the property for which the suspect had received a loan offer a week earlier, the probability that he has the phone in his possession is estimated at 0.8. For his parents it is less plausible that they went to view the Diopter and then returned to Amsterdam via Almere; the probability that they have the phone in their possession is estimated at 0.65. For K1 we estimated the probability at 0.55. This event does not point directly to A or M, so we estimated their probabilities at 0.25. For K2 it is lower still, namely 0.20.

6.1.13 Event 13

The only message sent was sent in the vicinity of Rijnstraat 35 in Amsterdam. The cell tower used lies near a through road, possibly in the direction of the home of brother A. Because this is not very precise, we decided not to include this message in the further analysis.

6.1.14 Event 14

This event contains five messages. For one message the location is unknown. In the morning the phone is picked up near the parental home. Two hours later, two different cell towers are used within the same minute: the towers at the Den Briel straat and the Donker Curtiusstraat in Amsterdam.

A possible explanation is that the user of the phone is on the way from the motorway (A10) towards the city centre of Amsterdam. Another explanation could be that the user of the phone was shopping at that moment at the Centrale Markt, located in the grey area between the locations of the two cell towers. This could correspond to the profile of M,

[Maps: event 11 and event 14]

An Amsterdam drugs case: 2 of 19 events (blue = cell towers, purple = addresses associated with suspect)

Page 17: The evidential value of mobile phone colocation

How accurate is CDR location?

• Deventer murder case: under “exceptional” atmospheric conditions, a cell phone uses a cell tower 25 km away, rather than nearby cell towers

RDG, 12 August 2012. Data: Google Latitude; my trip: train

Page 18: The evidential value of mobile phone colocation

End of interlude. Now: Our approach

• Part I: investigate reliability of search procedure

• Part II: quantify evidential value of specific pairs of co-locating phones using permutation approach

Page 19: The evidential value of mobile phone colocation

Part I: the experiment

• Chose one target phone from case

• Identified all notable three-point patterns of movement

• Identified all matches (“hits”) to each pattern

• Followed each hit forwards in time to first dis-location event (“anomaly”)

Page 20: The evidential value of mobile phone colocation

Part I

• Measure mobility, and (phone) activity, of hit and of target, in first four days

• Mobility: km travelled

• Activity: number of calls

• Investigate relation between these four variables and time to first anomaly for our sample of hits

Page 21: The evidential value of mobile phone colocation

Summary

• Dichotomise each of four variables (“high” vs “low”)

• Score each hit by number of highs (0 to 4)
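A minimal sketch of this scoring, assuming the cut-off between "high" and "low" is the sample median of each variable (the slides do not say which cut-off was used). The variable names and the function `sum_scores` are hypothetical.

```python
from statistics import median

# The four variables: mobility (km) and activity (calls) of hit and of target.
VARIABLES = ("hit_km", "hit_calls", "target_km", "target_calls")


def sum_scores(hits):
    """Score each hit by how many of its four variables are above the median."""
    cutoffs = {v: median(h[v] for h in hits) for v in VARIABLES}
    return [sum(h[v] > cutoffs[v] for v in VARIABLES) for h in hits]


# Made-up measurements over the first four days, for illustration only.
hits = [
    {"hit_km": 5, "hit_calls": 3, "target_km": 40, "target_calls": 2},
    {"hit_km": 20, "hit_calls": 10, "target_km": 20, "target_calls": 9},
    {"hit_km": 60, "hit_calls": 25, "target_km": 5, "target_calls": 30},
]
scores = sum_scores(hits)
```

A high score means both phones move and call a lot, so a coincidental co-location should break down quickly.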

Page 22: The evidential value of mobile phone colocation

Chance of anomaly per day is roughly constant

[Plot labels recoverable from the transcript: "Joint"; "Exponential Fit"]

Page 23: The evidential value of mobile phone colocation

Chance of anomaly per day is roughly constant

• Very high: sum score 3 and 4: half life (of time to anomaly) is one day

• Medium: sum score 2: half life is two days

• Very low: sum score 0 and 1: half life is four days

Page 24: The evidential value of mobile phone colocation

If we believe this, then ...

• no anomaly for 10 half lives: 1 in a thousand

• no anomaly for 20 half lives: 1 in a million
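The arithmetic behind these figures, as a short sketch: if the chance of an anomaly per day is constant, the time to first anomaly for a coincidental pair is exponentially distributed, and the probability of seeing no anomaly halves with every half life. The function names are assumptions for illustration.

```python
def survival_probability(days: float, half_life_days: float) -> float:
    """Probability that a coincidental co-location shows no anomaly for `days`."""
    return 0.5 ** (days / half_life_days)


def daily_anomaly_chance(half_life_days: float) -> float:
    """Constant per-day chance of an anomaly implied by a given half life."""
    return 1.0 - 0.5 ** (1.0 / half_life_days)


ten_half_lives = survival_probability(10, 1)      # 1/1024, roughly 1 in a thousand
twenty_half_lives = survival_probability(20, 1)   # 1/1048576, roughly 1 in a million

for half_life in (1, 2, 4):
    print(half_life, round(daily_anomaly_chance(half_life), 3))
```

With a half life of four days, ten half lives correspond to 40 days of uninterrupted co-location.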

Page 25: The evidential value of mobile phone colocation

Conclusion of Part I

• The “chance of coincidence” depends strongly on individual characteristics of the particular two phones

• The investigative procedure is reliable

• first, identify suspects (pattern-hits which continue to co-locate for a few days)

• second, confirm suspects (long term follow-up)

• … so we needn’t worry about the Texas sharpshooter (we’ll analyse long term follow-up data)

• We do have a major reference class problem

Page 26: The evidential value of mobile phone colocation

Part II

• Take two co-locating phones: could this be coincidence?

• We need to compare the observed history of a pair of phones with that of similar pairs of phones of different persons

• Especially: similar activity, similar mobility, frequenting the same locations

• Assumption: if two persons are completely unrelated then we may as well compare Mr X Day A with Mr Y Day B, as Mr X Day A with Mr Y Day A

Page 27: The evidential value of mobile phone colocation

Problems

• “Completely unrelated but similar” persons do live in the same neighbourhood, work in the same neighbourhood, frequent the same shops, cafés, places of worship, beach clubs, sporting events, ...

• We should condition on confounders (not all days are exchangeable)

• Problem of observational (as opposed to experimental) studies: the unknown unknowns

Page 28: The evidential value of mobile phone colocation

Our solution

• Compare history of phone X with artificial histories like phone Y’s, obtained by permuting (shuffling) Y’s days

• Shuffle weekdays and weekend-days separately

• Distance between two histories: total kilometres between consecutive calls, on the same day, of different phones

• Note: “artificial histories” need not be realistic in all respects – they should just be realistic in relevant respects
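The permutation approach above can be sketched as follows. This is an illustration on simulated data under stated assumptions, not the code used in the case: locations are planar (x, y) coordinates in km, call times are hours of the day, "consecutive calls" means neighbours in the time-ordered list of both phones' calls on one day, and all function names are hypothetical.

```python
import math
import random
from datetime import date, timedelta


def km_between(a, b):
    """Straight-line distance between two (x_km, y_km) points."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


def history_distance(hist_x, hist_y):
    """Total km between consecutive calls, on the same day, of different phones.

    Each history maps a date to a list of (hour, (x_km, y_km)) calls.
    """
    total = 0.0
    for day in hist_x.keys() & hist_y.keys():
        calls = sorted(
            [(t, "X", loc) for t, loc in hist_x[day]]
            + [(t, "Y", loc) for t, loc in hist_y[day]]
        )
        for (_, phone1, loc1), (_, phone2, loc2) in zip(calls, calls[1:]):
            if phone1 != phone2:
                total += km_between(loc1, loc2)
    return total


def shuffle_days(hist_y, rng):
    """Permute Y's daily call lists, weekdays and weekend days separately."""
    shuffled = {}
    for weekend in (False, True):
        days = sorted(d for d in hist_y if (d.weekday() >= 5) == weekend)
        contents = [hist_y[d] for d in days]
        rng.shuffle(contents)
        shuffled.update(zip(days, contents))
    return shuffled


def permutation_p_value(hist_x, hist_y, n_perm=199, seed=0):
    """Share of histories (real one included) at least as close to X as the real Y."""
    rng = random.Random(seed)
    observed = history_distance(hist_x, hist_y)
    as_close = sum(
        history_distance(hist_x, shuffle_days(hist_y, rng)) <= observed
        for _ in range(n_perm)
    )
    return (1 + as_close) / (1 + n_perm)


# Simulated example: both phones visit a different place each day for two weeks.
days = [date(2014, 1, 6) + timedelta(days=i) for i in range(14)]
hist_x = {d: [(9.0, (10.0 * i, 0.0))] for i, d in enumerate(days)}
hist_y = {d: [(10.0, (10.0 * i, 0.0))] for i, d in enumerate(days)}
p_value = permutation_p_value(hist_x, hist_y)
```

Shuffling within weekdays and within weekend days keeps each artificial history realistic in the one respect conditioned on here; further confounders would need further strata.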

Page 29: The evidential value of mobile phone colocation

Original vs. Shuffled (simulated data)

Page 30: The evidential value of mobile phone colocation

Findings

• Discovered co-locations are statistically very significant

• In retrospect, we would have done better to use a different similarity measure, etc.

• We reported to the court exactly what we did do, and all that we did do

Page 31: The evidential value of mobile phone colocation

Future research

• Invent better distance measure (model-based LR?) for higher power (note: not for validity)

• Should refine permutation procedure (shuffled histories may be unrealistic when overnight location can vary)

• As we condition on more confounders, reference population shrinks, prior probabilities change – relevant evidence moves out of our analysis but is still relevant