Information fusion methods for location data analysis
Candidate: Alket Cecaj
Supervisor: Prof. Marco Mamei
Doctorate School in Industrial Innovation Engineering
Thesis outline
• Introduction
• Data Fusion for Event Detection and Event Description Using Aggregated CDR
• Re-identification of Anonymized CDR Records Using Information Fusion
• Privacy issues
• Conclusions
Data Fusion and Location data
• Data Fusion
• Location Data types:
- CDR (Call Detail Records), aggregated or individual.
- Geo-tagged social network data or location-based services (LBS) such as Foursquare.
- Location data published as open data, for example census data.
Data fusion for event detection by using aggregated CDR and geo-tagged social network data
• Detecting and describing events happening in urban areas by analysing spatio-temporal data
• Previous work: Laura Ferrari, Marco Mamei, Massimo Colonna (2012): “People get together on special events: Discovering happenings in the city via cell network analysis”. In: Pervasive Computing and Communications Workshops (PERCOM Workshops), 2012 IEEE International Conference on.
• Publication: Cecaj Alket, Marco Mamei (2016): “Data Fusion for City Life Event Detection”. In: Journal of Ambient Intelligence and Humanized Computing, pp. 1–15.
The dataset: spatio-temporal aggregation
Spatial Aggregation
Temporal aggregation
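As a rough illustration of the two aggregation steps, the sketch below bins individual location events into square grid cells and one-hour slots, then counts the events per (cell, hour) pair. The cell size, the hourly slot, the toy events and all function names are assumptions made for this example; the slides do not state them.

```python
from collections import Counter
from datetime import datetime
import math

CELL_DEG = 0.01  # assumed grid cell size in degrees (not given in the slides)

def cell_of(lat, lon, cell_deg=CELL_DEG):
    """Spatial aggregation: map a coordinate to the index of its grid cell."""
    return (math.floor(lat / cell_deg), math.floor(lon / cell_deg))

def hour_of(ts):
    """Temporal aggregation: truncate a timestamp to the start of its hour."""
    return ts.replace(minute=0, second=0, microsecond=0)

def aggregate(events):
    """Count events per (cell, hour); `events` yields (lat, lon, datetime)."""
    counts = Counter()
    for lat, lon, ts in events:
        counts[(cell_of(lat, lon), hour_of(ts))] += 1
    return counts

# Invented toy events: two fall in the same cell and hour, one elsewhere.
events = [
    (45.4641, 9.1902, datetime(2013, 11, 3, 20, 5)),
    (45.4649, 9.1907, datetime(2013, 11, 3, 20, 40)),
    (45.4781, 9.1240, datetime(2013, 11, 3, 21, 10)),
]
counts = aggregate(events)
```

The resulting counts per cell and time slot are the kind of series an outlier detector can then be run on.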
Outlier detection method
IQR method : [LB,UB] = [Q25 – k*IQR, Q75 + k*IQR]
M method : [LB,UB] = [Q50 – k*Q50, Q50 + k*Q50]
Q75 method : [LB,UB] = [Q25 – k*Q25, Q25 + k*Q75]
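The bounds above can be sketched in a few lines. The example below covers only the IQR and M methods as written on the slide; the Q75 method is left out to keep the sketch short. The value of k, the quantile interpolation method, the toy counts and the function names are assumptions, not taken from the thesis.

```python
import statistics

def quartiles(values):
    """Return (Q25, Q50, Q75) of a sequence of counts."""
    q25, q50, q75 = statistics.quantiles(values, n=4, method="inclusive")
    return q25, q50, q75

def iqr_bounds(values, k):
    """IQR method: [Q25 - k*IQR, Q75 + k*IQR]."""
    q25, _, q75 = quartiles(values)
    iqr = q75 - q25
    return q25 - k * iqr, q75 + k * iqr

def median_bounds(values, k):
    """M method: [Q50 - k*Q50, Q50 + k*Q50]."""
    _, q50, _ = quartiles(values)
    return q50 - k * q50, q50 + k * q50

def outliers(values, bounds):
    """Indices of values outside [LB, UB]; unusually high ones are candidate events."""
    lb, ub = bounds
    return [i for i, v in enumerate(values) if v < lb or v > ub]

# Invented hourly counts for one cell; the last value is an obvious spike.
history = [10, 12, 11, 13, 12, 11, 10, 12, 60]
spikes = outliers(history, iqr_bounds(history, k=1.5))
```

A larger k widens the interval, so fewer time slots are flagged; this is the parameter that trades precision against recall.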
Groundtruth dataset
Football matches
Fairs
Protests
Other events, large crowds
Events happening in the period of time the data covers
Measuring precision and recall of the system
True positives (tp)
False positives (fp)
False negatives (fn)
Precision = tp / (tp + fp)
Recall = tp / (tp + fn)
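A minimal sketch of how these two measures can be computed when detected events and ground-truth events are represented as sets. The (place, date) representation and the sample events are invented for illustration.

```python
def precision_recall(detected, ground_truth):
    """Compute precision and recall from two sets of event identifiers."""
    tp = len(detected & ground_truth)   # detected and really happened
    fp = len(detected - ground_truth)   # detected but not in the ground truth
    fn = len(ground_truth - detected)   # happened but not detected
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Hypothetical events identified by (place, date).
detected = {("stadium", "2013-11-03"), ("fair", "2013-11-09"), ("square", "2013-11-12")}
ground_truth = {
    ("stadium", "2013-11-03"),
    ("fair", "2013-11-09"),
    ("station", "2013-11-15"),
    ("stadium", "2013-11-24"),
}
precision, recall = precision_recall(detected, ground_truth)
```

Here tp = 2, fp = 1 and fn = 2, so precision is 2/3 and recall is 1/2.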
Precision and recall of the event detection system: CDR
By combining the results from the two datasets
• Improvement of the precision-recall performance of the method
• The improvement is limited in the long run by the main dataset.
• The same improvement can also be observed when joining the results of the other datasets.
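The slides do not say how the two result sets are combined. One simple possibility, shown here purely as an assumption, is to take the union or the intersection of the events detected in each dataset and compare the resulting scores. The event labels are invented.

```python
def fuse(cdr_events, social_events, mode="union"):
    """Combine two sets of detected events (hypothetical fusion rule)."""
    if mode == "union":
        return cdr_events | social_events
    if mode == "intersection":
        return cdr_events & social_events
    raise ValueError(f"unknown mode: {mode}")

def score(detected, truth):
    """Return (precision, recall) of a set of detections."""
    tp = len(detected & truth)
    precision = tp / len(detected) if detected else 0.0
    recall = tp / len(truth) if truth else 0.0
    return precision, recall

truth = {"A", "B", "C", "D"}   # toy ground-truth events
cdr = {"A", "B", "X"}          # toy detections from CDR data
social = {"B", "C", "Y"}       # toy detections from social data

cdr_only = score(cdr, truth)
union = score(fuse(cdr, social, "union"), truth)
both = score(fuse(cdr, social, "intersection"), truth)
```

In this toy case the union raises recall from 0.5 to 0.75, while the intersection raises precision to 1.0 at the cost of recall.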
Improving event detection results by data fusion
Using the CDR data alone, events can be detected but not described:
• By joining the results, the datasets can complement and enrich each other.
• In this case the social dataset can be used to describe the events semantically.
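One way to picture the semantic description step is to collect the geo-tagged posts that fall inside a detected event's cell and time window and report their most frequent terms. The sketch below is an assumption about how this could be done, not the method used in the thesis; the record fields, the stop-word list and the sample posts are hypothetical.

```python
from collections import Counter
from datetime import datetime
import re

STOPWORDS = {"the", "a", "at", "in", "of", "and", "to", "is", "what"}

def describe_event(event, posts, top_n=3):
    """Return the most frequent terms of the posts falling inside the event."""
    terms = Counter()
    for post in posts:
        same_cell = post["cell"] == event["cell"]
        in_window = event["start"] <= post["time"] < event["end"]
        if same_cell and in_window:
            words = re.findall(r"[a-z]+", post["text"].lower())
            terms.update(w for w in words if w not in STOPWORDS)
    return [term for term, _ in terms.most_common(top_n)]

# Invented event (as detected from CDR data) and invented geo-tagged posts.
event = {"cell": (1, 2), "start": datetime(2013, 11, 3, 20), "end": datetime(2013, 11, 3, 23)}
posts = [
    {"cell": (1, 2), "time": datetime(2013, 11, 3, 20, 30), "text": "Great match at the stadium"},
    {"cell": (1, 2), "time": datetime(2013, 11, 3, 21, 15), "text": "What a match, goal!"},
    {"cell": (5, 7), "time": datetime(2013, 11, 3, 21, 0), "text": "Dinner in town"},
]
keywords = describe_event(event, posts)
```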
Data fusion for Event description
Re-identification of CDR data by using social network geo-tagged data
Information fusion for the de-anonymization of anonymized CDR data.
de Montjoye, Y. et al. (2013). “Unique in the crowd: The privacy bounds of human mobility”. In: Scientific Reports 3, pp. 161–180.
Cecaj, Alket, Marco Mamei, and Franco Zambonelli (2015). “Re-identification and Information Fusion Between Anonymized CDR and Social Network Data”. Journal of Ambient Intelligence and Humanized Computing, pp. 1–14.
CDR and social data: event distribution and radius of gyration (R.G.)
Mobility measures and uniqueness of users' mobility (unique in the crowd)
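Assuming that "R.G" on the slide above stands for the radius of gyration, a common mobility measure, it can be sketched as the root mean square distance of a user's points from their centre of mass. The sketch approximates the centre of mass by the mean latitude and longitude, which is reasonable only at city scale; the function names are illustrative.

```python
import math

EARTH_RADIUS_KM = 6371.0

def haversine_km(p, q):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*p, *q))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def radius_of_gyration(points):
    """Root mean square distance (km) of the points from their centre of mass."""
    n = len(points)
    center = (sum(lat for lat, _ in points) / n,
              sum(lon for _, lon in points) / n)
    return math.sqrt(sum(haversine_km(p, center) ** 2 for p in points) / n)

# A user always seen at the same place has radius 0;
# two points one degree apart on the equator give roughly 55.6 km.
r_static = radius_of_gyration([(44.65, 10.93)] * 5)
r_two = radius_of_gyration([(0.0, 0.0), (0.0, 1.0)])
```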
Knowledge extraction : uniqueness of traces
Knowledge extraction : uniqueness of mobility traces
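The uniqueness measure from the "unique in the crowd" line of work can be sketched as follows: draw p points at random from a user's trace and check whether any other user's trace also contains all of them. Representing a trace as a set of (cell, hour) pairs, and the toy traces themselves, are assumptions made for this example.

```python
import random

def is_unique(user, traces, p, rng):
    """True if p random points of `user` appear together in no other trace."""
    sample = set(rng.sample(sorted(traces[user]), p))
    matches = [u for u, trace in traces.items() if sample <= trace]
    return matches == [user]

def uniqueness(traces, p, seed=0):
    """Fraction of users (with at least p points) identified by p random points."""
    rng = random.Random(seed)
    eligible = [u for u, trace in traces.items() if len(trace) >= p]
    if not eligible:
        return 0.0
    return sum(is_unique(u, traces, p, rng) for u in eligible) / len(eligible)

# Toy traces: u1 and u2 share two of their three points, u3 shares none.
traces = {
    "u1": {("A", 8), ("B", 9), ("C", 18)},
    "u2": {("A", 8), ("B", 9), ("D", 18)},
    "u3": {("E", 8), ("F", 9), ("G", 18)},
}
all_points = uniqueness(traces, p=3)
one_point = uniqueness(traces, p=1)
```

With all three points known every toy user is unique; with a single point the result depends on which point is drawn for u1 and u2.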
• Given that CDR user Ci has Ni events (points) in common with FTi, how likely is it that the two users are the same?
• The question is both novel (no other work addresses it in this domain) and fundamental
• Conditional probability
• Even if the percentage is low, in a dataset of millions of users a considerable number of them can be identified.
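The conditional probability in question can be illustrated with Bayes' rule. The sketch below counts the points two traces have in common and then turns two assumed likelihoods into a posterior probability that the users are the same person. This is a generic formulation, not necessarily the one used in the thesis, and every number in it is invented.

```python
def common_points(cdr_trace, social_trace):
    """Number of (cell, hour) points shared by the two traces."""
    return len(cdr_trace & social_trace)

def posterior_same(prior, lik_same, lik_diff):
    """P(same user | N common points) by Bayes' rule.

    prior    -- P(same) before looking at the traces, e.g. 1 / number of candidates
    lik_same -- P(N common points | same user)       (assumed value)
    lik_diff -- P(N common points | different users) (assumed value)
    """
    evidence = lik_same * prior + lik_diff * (1 - prior)
    return lik_same * prior / evidence if evidence else 0.0

cdr_trace = {("A", 8), ("B", 9), ("C", 18), ("D", 22)}
social_trace = {("A", 8), ("C", 18), ("D", 22)}
n_common = common_points(cdr_trace, social_trace)

# Invented numbers: one candidate in a thousand, and three shared points being
# 5000 times more likely for the same user than for two different users.
p_same = posterior_same(prior=0.001, lik_same=0.5, lik_diff=0.0001)
```

Even with a small prior, a few shared points that are rare among unrelated users push the posterior close to 1, which is why a low overall percentage can still mean many identifiable users in a large dataset.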
Re-identification : probabilistic approach
Conclusions
• Information fusion as an enabling process for novel applications
- Future work oriented towards the “structured data fusion” idea
• Privacy
- anonymity vs. re-identification and the remaining utility of the data
- variations of existing privacy-preserving techniques (e.g. differential privacy)
Publications
• Nicola Bicocchi, Alket Cecaj, Damiano Fontana, Marco Mamei, Andrea Sassi, Franco Zambonelli: “Collective Awareness for Human ICT Collaboration in Smart Cities”. IEEE WETICE International Conference on State-of-the-Art Research in Enabling Technologies for Collaboration, 17-20 2013.
• Alket Cecaj, Marco Mamei, Nicola Bicocchi: “Re-identification of Anonymized CDR Datasets Using Social Network Data”. IEEE PerCom International Conference on Pervasive Computing and Communications. Budapest, Hungary, 24-28, 2014.
• Cecaj Alket, Marco Mamei (2016): “Data Fusion for City Life Event Detection”. In: Journal of Ambient Intelligence and Humanized Computing, pp. 1–15.
• Nicola Bicocchi, Alket Cecaj, Damiano Fontana, Marco Mamei, Andrea Sassi, Franco Zambonelli (2014): “Social Collective Awareness in Socio-Technical Urban Superorganisms”. In: Social Collective Intelligence: Combining the Powers of Humans and Machines to Build a Smarter Society, Part III, Applications and Case Studies, p. 227.
• Cecaj, Alket, Marco Mamei, and Franco Zambonelli (2015). “Re-identification and Information Fusion Between Anonymized CDR and Social Network Data”. In: Journal of Ambient Intelligence and Humanized Computing, pp. 1–14.