IDENTIFYING SOCIAL MARKERS FROM NETWORK DATA BASED ON LOCATION, MOBILITY AND PROXIMITY
By
UDAYAN KUMAR
A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT
OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY
UNIVERSITY OF FLORIDA
2012
© 2012 Udayan Kumar
A Sanskrit saying which means “Good rapport and friendship develop among those who share a similar outlook on life and hobbies. Thus, deer flock with deer, cows with cows, and horses with horses. In the same manner, fools frequent fools and the wise
bond with the wise.”
ACKNOWLEDGMENTS
From my initial thoughts of pursuing a PhD to actually finishing it, almost every step
was confusing and the end of the tunnel was never visible. However, the constant support
and encouragement I received along the way kept the hope alive; one by one all the
pieces of the puzzle fell into place. Looking back, I feel that there are more people to
acknowledge than I can possibly remember: my family members, teachers, friends,
and even random strangers who were tolerant enough to listen to my crazy ideas and
give me their point of view. Taking a broader view, however,
I would like to thank not only these people but also their parents because obviously
these people are here because of their parents. But their parents also had parents
who themselves had parents, so I would like to thank everyone on this chain going
backwards all the way up to the first living creature on Earth. I am also thankful to the
creator of life on the Earth and the creator of the Earth. Obviously, life would not have
been possible without the creation of the Sun and the rest of the Universe. So I want to thank
the creator of the Universe. But this makes me wonder: why would anybody undertake
such a giant enterprise? That is, creating the whole universe, with billions and billions
of galaxies, stars, planets and life forms. Maybe this is someone’s PhD project. So, am
I a simulation object? In that case, I want to withdraw all my thanks; this was going to
happen anyway!
TABLE OF CONTENTS
page
ACKNOWLEDGMENTS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
LIST OF TABLES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
LIST OF FIGURES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
ABSTRACT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
CHAPTER
1 INTRODUCTION . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
2 USER CLASSIFICATION AND FEATURE EXTRACTION FROM WLAN TRACES 20
2.1 Approach . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
2.1.1 Location Based Classification (LBC) . . . . . . . . . . . . . . . . . 24
2.1.1.1 Individual Behavior based Filtering (IBF) . . . . . . . . . 25
2.1.1.2 Group Behavior based Filtering (GBF) . . . . . . . . . . . 26
2.1.1.3 Hybrid Filtering (HF) . . . . . . . . . . . . . . . . . . . . 32
2.1.2 Name Based Classification (NBC) . . . . . . . . . . . . . . . . . . 33
2.2 Validation of Location Based Classification . . . . . . . . . . . . . . . . . 34
2.2.1 Temporal Consistency Validation Using Adjacent Months . . . . . . 35
2.2.2 IBF vs GBF . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
2.2.3 Cross Validation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
2.3 User Behavior Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
2.3.1 User Spatial Distribution . . . . . . . . . . . . . . . . . . . . . . . . 39
2.3.2 Average Duration or Temporal Analysis . . . . . . . . . . . . . . . . 41
2.3.3 Device Preference . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
2.4 Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
2.4.1 Mobility Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
2.4.2 Protocol Design . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
2.4.3 Privacy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
2.4.4 Resource Management . . . . . . . . . . . . . . . . . . . . . . . . 46
2.5 Conclusion And Future Work . . . . . . . . . . . . . . . . . . . . . . . . . 47
3 BREAKING ANONYMITY IN WLAN TRACES . . . . . . . . . . . . . . . . . . . 49
3.1 Information In WLAN Traces . . . . . . . . . . . . . . . . . . . . . . . . . 51
3.2 Need For Anonymity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
3.3 Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
3.4 Attack Scenarios . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
3.4.1 Identify Your Own MAC In Trace . . . . . . . . . . . . . . . . . . . . 57
3.4.2 Identifying Building Codes . . . . . . . . . . . . . . . . . . . . . . . 57
3.4.3 Identifying A Person . . . . . . . . . . . . . . . . . . . . . . . . . . 58
3.4.4 Multiple Filtering . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
3.5 Analysis and Mitigation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
3.5.1 Theoretical Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . 61
3.5.2 Practical/Trace Analysis . . . . . . . . . . . . . . . . . . . . . . . . 62
3.6 Conclusions and Future Work . . . . . . . . . . . . . . . . . . . . . . . . . 65
4 AN ENCOUNTER-BASED FRAMEWORK FOR TRUST . . . . . . . . . . . . . 67
4.1 Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70
4.2 Architectural Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72
4.2.1 Design Goals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72
4.2.2 Overall Design . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
4.3 Trust Adviser Filters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74
4.3.1 Aggregation Based Similarity . . . . . . . . . . . . . . . . . . . . . 75
4.3.1.1 Frequency of Encounters (FE) . . . . . . . . . . . . . . . 75
4.3.1.2 Duration of Encounters (DE) . . . . . . . . . . . . . . . . 76
4.3.2 Behavior Based Similarity . . . . . . . . . . . . . . . . . . . . . . . 76
4.3.2.1 Profile Vector (PV) . . . . . . . . . . . . . . . . . . . . . 76
4.3.2.2 Location Vector (LV) . . . . . . . . . . . . . . . . . . . . 77
4.3.2.3 Behavior Matrix (BM) . . . . . . . . . . . . . . . . . . . . 77
4.3.3 Hybrid Filter (HF) . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78
4.4 Anomaly Detection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80
4.4.1 Detection Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81
4.4.2 Attacker Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
4.5 Trace Based Evaluation and Analysis . . . . . . . . . . . . . . . . . . . . 86
4.5.1 Traces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86
4.5.2 Filter Evaluations . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88
4.5.2.1 Statistical Characterization . . . . . . . . . . . . . . . . . 90
4.5.2.2 Correlation . . . . . . . . . . . . . . . . . . . . . . . . . . 90
4.5.2.3 Stability . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91
4.5.2.4 Graph Analysis . . . . . . . . . . . . . . . . . . . . . . . . 91
4.5.2.5 Anomaly Detection . . . . . . . . . . . . . . . . . . . . . . 92
4.5.3 Selfishness & Trust Routing in DTN . . . . . . . . . . . . . . . . . . 93
4.6 Survey and Implementation Based Validation . . . . . . . . . . . . . . . . 96
4.6.1 Survey . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 96
4.6.2 iTrust Application . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99
4.6.2.1 Application Evaluation . . . . . . . . . . . . . . . . . . . . 100
4.6.2.2 Energy Efficiency . . . . . . . . . . . . . . . . . . . . . . 105
4.6.2.3 Location Estimation . . . . . . . . . . . . . . . . . . . . . 105
4.7 Discussion: Other Trust Inputs . . . . . . . . . . . . . . . . . . . . . . . . 106
4.7.1 Blacklist & Whitelist . . . . . . . . . . . . . . . . . . . . . . . . . . . 106
4.7.2 Recommendation & Reputation Systems . . . . . . . . . . . . . . . 107
4.7.3 Contextual & Event Information . . . . . . . . . . . . . . . . . . . . 107
4.7.4 Combined Trust Recommendation . . . . . . . . . . . . . . . . . . 108
4.8 Conclusion and Future Work . . . . . . . . . . . . . . . . . . . . . . . . . 109
5 CONCLUSION AND FUTURE WORK . . . . . . . . . . . . . . . . . . . . . . . 112
APPENDIX
A CODE SNIPPETS FROM iTrust APPLICATION . . . . . . . . . . . . . . . . . . 115
A.1 Energy Efficient Scanning . . . . . . . . . . . . . . . . . . . . . . . . . . . 115
A.2 Calculating LV (Sec. 4.3.2) . . . . . . . . . . . . . . . . . . . . . . . . . . 116
B ENERGY EFFICIENT DEVICE DISCOVERY . . . . . . . . . . . . . . . . . . . 118
B.1 Available Directions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 118
B.2 Evaluation Techniques . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 118
B.3 Current Progress . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 119
B.3.1 Combining WiFi And Bluetooth Scanning . . . . . . . . . . . . . . . 120
B.4 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 121
C USER BEHAVIOR ANALYSIS . . . . . . . . . . . . . . . . . . . . . . . . . . . . 123
C.0.1 Spatial Distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . 123
C.0.2 Temporal Distribution . . . . . . . . . . . . . . . . . . . . . . . . . . 123
D SURVEY FORM - iTrust VALIDATION . . . . . . . . . . . . . . . . . . . . . . . 126
REFERENCES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 127
BIOGRAPHICAL SKETCH . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 133
LIST OF TABLES
Table page
2-1 Average Silhouette Width for Sororities and Fraternities from Universities U1 and U2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
2-2 Results of classification of users from U1 (LBC) and U2 (NBC). ‘Common’ signifies the users that were common to both the male and female populations . . . 34
2-3 Similarity in the user population selected after filtering fraternity users for U1 . 36
2-4 Similarity in the user population selected after filtering sorority users for U1 . . 37
2-5 Validation - comparing users selected by IBF and GBF for U1 . . . . . . . . . . 37
2-6 Cross validation of LBC by NBC for U2 . . . . . . . . . . . . . . . . . . . . . . . 38
3-1 WLAN trace sample: before and after anonymization . . . . . . . . . . . . . . . 51
3-2 Fields present in each record of the wired trace, basically an IP header . . . . . 53
3-3 Result of finding users with similar location-visiting sequences with varying duration of the trace . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
4-1 Overhead of filters in terms of processing and storage. Here m is the total no. of records in the encounter file, n is the no. of unique encountered users, l is the no. of locations visited, and d represents the no. of days used for BM calculations. We also assume that m >> n. . . . . . . . . . . . . . . . . . . . . . . . . . 80
4-2 Facts about studied traces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87
4-3 False positives and negatives while using the proposed anomaly detection (in percentage) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93
B-1 Accuracy loss using traces for 20 users. EE4 means that 4 times the minimum scan period is the upper bound on the scan interval; similarly, in EE8 the upper bound is 8 times. This result used Bluetooth traces only. Lower values are better . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 120
B-2 Scan efficiency using traces for 20 users. EE4 means that 4 times the minimum scan period is the upper bound on the scan interval; similarly, in EE8 and EE16 it is 8 and 16 times, respectively. This result used Bluetooth traces only. Higher values are better . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 121
B-3 s/e ratio for Star, MIMD and FIBO algorithms . . . . . . . . . . . . . . . . . . . 121
B-4 Combining Wi-Fi and Bluetooth scanning . . . . . . . . . . . . . . . . . . . . . 122
C-1 Spatial Distribution of Users at U2 . . . . . . . . . . . . . . . . . . . . . . . . . 124
C-2 Spatial Distribution of Users at U1 . . . . . . . . . . . . . . . . . . . . . . . . . 124
C-3 Average Duration of Users at U2 . . . . . . . . . . . . . . . . . . . . . . . . . . 124
C-4 Average Duration of Users at U1 . . . . . . . . . . . . . . . . . . . . . . . . . . 125
LIST OF FIGURES
Figure page
2-1 Query based user grouping technique . . . . . . . . . . . . . . . . . . . . . . . 22
2-2 A sample trace database snapshot . . . . . . . . . . . . . . . . . . . . . . . . . 22
2-3 Gender grouping in Fraternities and Sororities . . . . . . . . . . . . . . . . . . . 24
2-4 Session count for fraternity and sorority users . . . . . . . . . . . . . . . . . . . 27
2-5 Session count for fraternity and sorority users . . . . . . . . . . . . . . . . . . . 28
2-6 Session count for fraternity and sorority users . . . . . . . . . . . . . . . . . . . 30
2-7 Session count for fraternity and sorority users . . . . . . . . . . . . . . . . . . . 31
2-8 Session count for fraternity and sorority users . . . . . . . . . . . . . . . . . . . 31
2-9 Session count for fraternity and sorority users . . . . . . . . . . . . . . . . . . . 32
2-10 Comparison of user distribution across the university U1 campus (in percentage) 40
2-11 Comparison of user distribution across the university U2 campus (in percentage) 41
2-12 Average duration of males and females in different areas of the university U1 campus . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
2-13 Average duration of males and females in different areas of the university U2 campus . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
2-14 Device distribution by manufacturer at university U1 . . . . . . . . . . . . . . . 44
2-15 Device distribution by manufacturer at university U2 . . . . . . . . . . . . . . . 45
3-1 Attacker capabilities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
3-2 Percentage of users found when 111 filters based on gender+major+manufacturer are applied . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
3-3 UL at n = 5 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
3-4 Results of the combination generation and sequence matching for 230 randomly chosen users out of 27K users belonging to the month of Nov 2007. This graph shows P_i and n_i . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
4-1 Block Diagram overview of the iTrust architecture. Dotted lines indicate modulesneeded by iTrust. Shaded blocks indicate modules discussed in this work. . . . 73
4-2 Location Vector LV for a user . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77
4-3 Behavior Matrix for a user . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79
4-4 The growth of trust score using the FE filter for a specific user. Each line corresponds to an encountered user. . . . . . . . . . . . . . . . . . . . . . . . . . . 85
4-5 The growth of trust score using the FE filter under the attacker model. Each line corresponds to an instance of an attacker generated by the model. . . . . . . . 85
4-6 Similarity scores for various filters for all the encountered pairs of users in Nov 2007 from the U1 trace . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87
4-7 Correlation between the trusted lists produced by various filters at T=40% . . . 88
4-8 Comparison of trust lists belonging to different histories for various filters at T=40% (note that the y-axis scale for DE, FE, and LV−C starts at 85%, and for LV−D and BM the scale starts at 35%) . . . . . . . . . . . . . . . . . . . . . . . 89
4-9 Normalized Clustering Coefficient and Normalized Path Length . . . . . . . . . 92
4-10 Flow chart for iTrust routing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95
4-11 Average unreachability with varying Trust and Selfishness using DE filter . . . 97
4-12 Hybrid filter results when T=40%. Numbers in the legend indicate the ratio of the score from each filter. For example, 1211 implies αDE = 0.2, αFE = 0.4, αLV−D = 0.2, and αBM = 0.2, and 0100 implies αDE = 0, αFE = 1, αLV−D = 0, and αBM = 0 (Sec. 4.3.3) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 98
4-13 Survey results showing users’ propensity to communicate with other users in various communication scenarios . . . . . . . . . . . . . . . . . . . . . . . . . . 98
4-14 Illustration of iTrust’s components and their interactions . . . . . . . . . . . . . 99
4-15 Screenshots of the iTrust application. Fig. A shows the main screen, where encountered users are sorted by filter score; current encounters are marked with green circles and trusted users are shown in blue. Fig. B shows details for an encountered user. Fig. C shows user encounters on a map. Fig. D shows the registration screen for the optional user-information discovery service. Fig. E shows the screen where the display order of encountered users can be modified. Fig. F shows the screen to select weights for the Hybrid filter (in the app it is referred to as the combined filter). Fig. G shows the screen where a user can check self statistics regarding encounters; it also shows the number of scans saved due to the use of the energy efficient scanner. Fig. H shows the menu, which allows the user to jump from one screen to another. . . . . . . . . . . . . . . 101
4-16 Continuation of screenshots of the iTrust application. Fig. A shows the settings screen. Fig. B shows the number of encounters the user had with a particular user over a period of time; this feature allows a user to learn more about encountered users. Fig. C shows a graph from the Self-Stat screen of the application; here the graph shows the total number of encounters this user had with respect to time. Fig. D shows the about page with author information and a web link for iTrust. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102
4-17 iTrust evaluations based on application usage. Fig. A shows the percentage of trusted users in the top 1 to 10 users, top 11 to 20 users, etc., for each filter. Fig. B shows the percentage of total trusted users in the top 1 to 10, 11 to 20, etc. Fig. C shows the fraction of encountered users needed (from the top) to capture x% of trusted users for each filter. Fig. D shows the Normalized Discounted Cumulative Gain score for iTrust recommendations. . . . . . . . . . 103
A-1 Evolution of features in the iTrust app based on feedback from users. . . . . . 115
Abstract of Dissertation Presented to the Graduate School
of the University of Florida in Partial Fulfillment of the
Requirements for the Degree of Doctor of Philosophy
IDENTIFYING SOCIAL MARKERS FROM NETWORK DATA BASED ON LOCATION, MOBILITY AND PROXIMITY
By
Udayan Kumar
December 2012
Chair: Ahmed Helmy
Major: Computer Engineering
The ubiquitous spread of mobile devices, global connectivity and the tight coupling of
mobile phones with their users have led to an era in which mobile phones have become
the alter egos of their users. Mobile devices accompany users to places where not even
the closest of family and friends are allowed (e.g., offices, meetings and conferences,
among other places). Access to the movement and network access logs from mobile devices
can shed light on human behavior, which in turn can be used to solve several research
challenges.
In this work, we present our measurements, analysis and designs obtained by
utilizing network traces collected at both the personal and group levels. We have used
network traces from several thousand devices to understand, identify and extract
social markers or characteristics. The social markers we have studied include social
grouping based on gender, proximity-based trust and the difficulty of anonymizing traces
because of mobility.
In the first part, we discuss how social-grouping information can be extracted from
anonymized network traces. Using a gender-based case study, we demonstrate our
approach, along with different methods to validate the results. In the second part, we
study the fundamental trade-off between the utility of WLAN traces and the privacy of the
users. We show how the privacy of users in anonymized traces can be compromised. In the
third and final part, we implement and evaluate an effective framework to establish
trust in mobile networks through a protocol that we call iTrust. We present results of
our trace-based analysis and of a user study based on the deployment of the iTrust
mobile application.
CHAPTER 1
INTRODUCTION
The ubiquitous spread of mobile devices, global connectivity and the tight coupling of
mobile phones with their users have led to an era in which mobile phones have become
the alter egos of their users. Mobile devices accompany users to places where not even
the closest of family and friends are allowed (e.g., offices, meetings and conferences,
among other places). This tight coupling can be used not only to provide connectivity to the
Internet but also to provide personalized services based on the behavior patterns of
the user. An example is a game application that customizes itself based on the free
time a user has. Let's say that a user always commutes to work using public transport
and plays games on the mobile device along the way. A device that can detect this
context can pass the approximate duration of the commute to the game application,
allowing it to generate a game that can be finished during the commute. Here the
phone reads sensors such as the GPS and accelerometer to infer user context. The
general idea is that we can use or design sensors that can sense everything experienced
by a user. Once we have these sensor readings, everything presented to a user can
be customized.
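The commute-detection step in the example above could be sketched roughly as follows. This is a minimal, hypothetical sketch, not part of the system described in this work: the function name, the speed threshold, and the use of timestamped speed samples are all illustrative assumptions. It treats the longest contiguous run of above-threshold speed readings (derived, say, from GPS fixes) as the commute leg and reports its duration, which could then be handed to the game application.

```python
from datetime import datetime, timedelta

def estimate_commute_minutes(samples, speed_threshold_kmh=10):
    """Estimate commute duration from timestamped (time, speed_kmh) samples.

    A contiguous run of samples at or above the speed threshold is treated
    as one commute leg; the duration of the longest such run is returned
    in minutes.
    """
    best = current = timedelta(0)
    prev_time = None
    for t, speed in samples:
        if speed >= speed_threshold_kmh and prev_time is not None:
            current += t - prev_time  # extend the current moving leg
            best = max(best, current)
        else:
            current = timedelta(0)  # stationary sample ends the leg
        prev_time = t
    return best.total_seconds() / 60

# Example: samples every 10 minutes; the user moves for three intervals.
base = datetime(2012, 1, 1, 8, 0)
samples = [(base + timedelta(minutes=10 * i), s)
           for i, s in enumerate([0, 30, 35, 40, 0])]
print(estimate_commute_minutes(samples))  # 30.0
```

A real implementation would also need to smooth noisy sensor readings and recognize recurring commutes across days before trusting the estimate.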
Applications in the above example were based on the behavior sensing from a
single device; what if we have access to sensor information from all devices? Can
we predict traffic congestion even before it happens by considering the total number
of people heading towards a freeway? Can we study the movement patterns of a
population to predict the spread of infectious diseases? Can we guess the type of
relationships existing between a pair of users? Several crowd-sourcing applications have
been developed to collect data from a large pool of users (if not all) to get a global view.
The challenges that remain even after gaining access to this kind of data include
handling the data (its scale can be huge; imagine that a data tuple is generated
for every person at every minute), developing algorithms that can generate meaningful
inferences, and addressing the implications for the privacy of the users. In many cases,
obtaining inferences is challenging due to the non-existence of standard models and
theorems that can relate sensor readings to human behavior characteristics. It is also
difficult to validate and verify inference propositions, as accessing the ground truth is a
challenge in itself because of the scale of data collection (from several hundred to billions
of users).
In this work, we present our measurements, analysis and designs obtained by
utilizing network traces collected at both the personal and group levels. We have restricted
ourselves to analysis and design based on location, mobility and proximity
features. We have attempted to relate these features to social characteristics or
markers, including social grouping based on gender, homophily-based trust and the
implications of mobility for the anonymization of traces. Due to the lack of any suitable
verification methods, we have developed our own. We believe that several applications
can benefit not only from our analysis techniques and results, but also from the
verification methods developed in this work.
Access to these social markers and context can allow researchers to understand
and model characteristics of human behavior, create new services, and make applications
context aware, among other possibilities. For example, it has been shown that the
random mobility model does not capture actual human mobility [16] and that
information on social and community structures can yield better mobility models [37].
Recently, researchers have shown how context can be sensed and used to provide new
applications [51]. In the last part of this work, we present how the social science principle of
homophily can be measured using mobile devices and how it can be used to generate
trust in the network. The understanding of social data and user behavior has led to the
development of a new field of study called Computational Sociology [52, 71].
In this work, we present methods to extract social markers such as gender-based
grouping and proximity-based trust. The challenges we faced in accomplishing this task
include the non-availability of any kind of personal information about the users (mainly due
to anonymization for privacy protection). Thus, we have not only developed methods to
extract this kind of information from anonymized traces (data-sets), but also developed
methods to validate our results in the absence of ground truth. This work is divided
into three main topics: 1. user classification into groups, with a case study on gender-based
grouping, 2. challenges of anonymizing traces due to mobility, and 3. proximity-based
trust.
In the study on user classification, we present two novel scientific techniques
to classify WLAN users into social groups. The first technique maps the traces
onto buildings (e.g., department buildings, libraries, sororities and fraternities) to extract
affiliation and gender information based on network usage statistics. The second
technique utilizes directory (phone-book) information that can be linked to WLAN users
to extract useful information. For example, the usernames of WLAN users (if available)
can be matched against first-name databases to infer a user's gender. As a case study,
we perform classification and behavior analysis of users by gender. Extensive WLAN
traces from two major universities, collected over three years, are analyzed. Results
from the two methods are cross-validated and show more than 90% correspondence.
Comparing usage patterns yielded interesting results, including that males spend more
time online than females and that females prefer Apple computers over PCs.
In the second part, we study the fundamental trade-off between the utility of WLAN
traces and the privacy of the users. The study provides several realistic case studies in
which privacy attacks may be conducted. We then provide an analysis of these attacks
and of the drawbacks of existing anonymization techniques. Our initial quantitative analysis
to estimate mobile users' k-anonymity in WLAN traces shows surprisingly unique usage
patterns, which may compromise anonymity. The main contribution of this work is to
articulate the compelling challenges facing the anonymization of wireless network traces
and to shed light on the answer to an intriguing question: just how private are wireless
network traces?
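To make the notion of k-anonymity in location traces concrete, the following sketch (hypothetical illustrative code, not the measurement tool used in this study) computes the anonymity set size for each user: the number of users whose sequence of visited access points is identical. A user with k = 1 is uniquely identifiable from the sequence alone, even in an anonymized trace.

```python
from collections import Counter

def anonymity_sets(user_sequences):
    """Return, for each user, the size k of their anonymity set.

    user_sequences: dict mapping user_id -> tuple of AP ids in visit order.
    A user is k-anonymous if k - 1 other users share the exact same
    location-visiting sequence; k = 1 means the sequence is unique.
    """
    counts = Counter(user_sequences.values())
    return {user: counts[seq] for user, seq in user_sequences.items()}

# Example: two users share a sequence, one is unique (k = 1).
seqs = {"u1": ("ap1", "ap2"), "u2": ("ap1", "ap2"), "u3": ("ap3",)}
print(anonymity_sets(seqs))  # {'u1': 2, 'u2': 2, 'u3': 1}
```

Even this exact-match version tends to report small anonymity sets for location sequences of any length, which is the effect behind the "surprisingly unique usage patterns" noted above; an attacker with only partial or approximate knowledge of a victim's sequence would need a fuzzier matcher, but the idea is the same.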
For the third part, we implement and evaluate an effective framework to establish
trust in mobile networks through a protocol that we call iTrust. The goal of iTrust is
to provide accurate and robust trust scores for encountered devices in an efficient,
privacy-preserving and resilient manner. We borrow from the social science principle
of homophily: the tendency of individuals to interact with and trust similar others. We
introduce and analyze a family of encounter-based trust adviser filters that make trust
recommendations based on encounter frequency, encounter duration, a location
behavior vector, and a location-preference behavior matrix. We present a proof-of-concept
application for Android and Linux-based mobile devices. We also conduct a user study to
validate the trust recommendations generated by iTrust. With this trust, several potential
applications can be enabled, including mobile social networking, building groups and
communities of interest, localized alert and emergency notification, and context-aware
and similarity-based networking.
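As a rough illustration of the aggregation-based filters, the sketch below (hypothetical code; the function name, the default weights, and the max-normalization are illustrative assumptions, not the exact iTrust formulas) combines encounter frequency (FE) and encounter duration (DE) into a single score per encountered peer, in the spirit of the weighted combination used by the Hybrid filter.

```python
def trust_scores(encounters, w_fe=0.5, w_de=0.5):
    """Combine Frequency of Encounters (FE) and Duration of Encounters (DE)
    into one hybrid trust score per peer, as a weighted sum of normalized
    filter outputs.

    encounters: list of (peer_id, duration_seconds) records from the
    local encounter log; only locally observed data is used, so no
    information about the peer needs to be shared.
    """
    freq, dur = {}, {}
    for peer, d in encounters:
        freq[peer] = freq.get(peer, 0) + 1
        dur[peer] = dur.get(peer, 0.0) + d
    # Normalize each filter by its maximum so the weights are comparable.
    max_f, max_d = max(freq.values()), max(dur.values())
    return {p: w_fe * freq[p] / max_f + w_de * dur[p] / max_d for p in freq}

# Example: peer "x" is met twice briefly, peer "y" once for longer.
log = [("x", 100), ("x", 100), ("y", 200)]
print(trust_scores(log))  # {'x': 1.0, 'y': 0.75}
```

Because every input is observed locally, this kind of filter can rank encountered peers without querying any central authority, which is the privacy-preserving property the framework aims for.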
The contributions of this work fall into two categories: 1. intellectual
contributions and 2. effort contributions.
Intellectual contributions include:
1. Applied existing data-mining techniques to classify network users into social groups. Created methods to statistically validate the results in the absence of ground truth.

2. Identified techniques to break the anonymization of traces by capitalizing on the mobility of a user.

3. Introduced methods to infer/recommend trust/friendship among encountering users, using several outlooks. We proposed several privacy-preserving filters, or metrics, that can be used to measure similarity.
Effort contributions include:
1. iTrust and Profile-Cast implementation on mobile devices.
2. Collection of UF wireless network traces.
3. Creation of a Bluetooth trace library.
4. Development of several basic building blocks, such as scanners and parsers, for the Android, Nokia and Openmoko platforms.
In the following chapters, we present each piece of work in detail, starting with user
classification into groups, then the anonymization of traces, and finally proximity-based
trust.
CHAPTER 2
USER CLASSIFICATION AND FEATURE EXTRACTION FROM WLAN TRACES
In future mobile networks, with many handheld devices tightly coupled with a user,
communication performance is bound to user mobility and behavior. This applies to
various kinds of mobile networks, including cellular networks, but more particularly
ad-hoc and delay tolerant networks (DTNs), because every node may act as a router
and the network may be infrastructure-less. In such an environment, it is imperative
to understand the various aspects of user behavior, including mobility, commonalities,
differences in preference, and net activity between classes of users, in order to design
efficient protocols and effective network services.
We propose a new approach to the classification and feature analysis of user behavior
based on social grouping, using a set of techniques that can provide
information about a user from a social perspective. The best source of information about
real user mobility and network usage comes from WLAN (Wireless LANs) traces. These
traces have been used in many studies whenever real user data is required. They
have previously been used to validate mobility models [37, 65] and to understand user
associations [36], among other uses. In this work, we propose to use WLAN traces
(generally considered for studying network characteristics) to mine the social behavior of
users based on gender, majors, and other interest groups. We present a general
methodology with an example case study of grouping by gender, and investigate gender
gaps in WLAN usage. The lack of such empirical data poses an interesting challenge
and raises several research (and privacy) questions: How can we meaningfully infer
gender information from such anonymous traces? Does gender information influence
user behavior and preference in a significant and consistent manner? Finally, what is the
impact of these findings on network modeling, protocol and service design in the future?
Gender-based studies have been conducted in the past on issues such as
differences in technology adoption for the wired Internet [30]. This work is the first,
to our knowledge, to scientifically analyze WLAN usage patterns in mobile societies
across user groups. Our study begins by introducing a location-based method for
gender classification on campus. It provides robust filters, based on individual and
group network behavior, in addition to clustering techniques, to identify males and
females with high confidence. We analyze extensive wireless LAN traces collected
over 3 years from 2 major universities, covering more than 50,000 users. The findings
are cross-validated with ground truth from the name-based method and yield over 90%
success. Once the gender classification is performed, a thorough investigation of the
spatio-temporal characteristics of the gender based network activity is conducted.
Among the parameters we have considered for evaluating the gender gaps, we found
enough statistical evidence to conclude that (for the traces used in our study) usage
patterns of males and females are different, and that gender does affect user activity
and vendor preference. We believe that such attributes will certainly enhance the
understanding of the mobile society and is essential to provide efficient network
protocols and services in the future. Our findings also indicate that the problem of
mobile user privacy should be re-visited.
Contributions: This paper makes the following contributions: i. class and gender
inference methods based on location, usage, and name filtering from extensive WLAN
traces; ii. the first gender-based trace-driven analysis of mobile societies, including a
study of majors and device preferences; iii. identification of unique features in the
studied groupings that suggest consistent behavior and the design of potential future
applications.
The rest of the paper is organized as follows: Sec. 2.1 discusses multiple techniques
for user classification, followed by Sec. 2.2, which provides several methods for
validating the classification. Sec. 2.3 provides the gender-based feature analysis
and results, and Sec. 2.4 discusses potential applications. Conclusions and future
work are presented in Sec. 2.5.
Figure 2-1. Query based user grouping technique
Figure 2-2. A sample trace database snapshot
2.1 Approach
In this work, we use WLAN traces to understand the usage characteristics and behavior
patterns of social groups. WLAN traces are logs of user associations with wireless
access points (APs). Traces generally contain the machine's MAC address, association
time, duration, and the associated AP. The MAC address is always anonymized to protect
the privacy of the user. How can we begin to classify all the students into social groups,
such as gender and study major, using only publicly available information and traces
[7][41]? Obtaining a meaningful classification from this partial information is the main
challenge that we address in this work. Ideally, we would want to classify all users into
groups. As a first step in this direction, we present a general technique that can be used
to classify a smaller section of WLAN users into groups. Doing so for all users remains a
challenge, as we shall see. Instead, we focus on obtaining a sample significant enough
for statistical analysis.
Our technique works on raw WLAN SNMP and SYSLOG traces. The traces are
accumulated over a time period and parsed into a standard format, as shown in Fig. 2-1.
We use the location information of the APs, in the form of the buildings in which they are
located; this helps to identify the geographic locations of a user at a later stage. The
mobility of users can be tracked by following the approximate geographic locations of
the APs. The processed data is fed into a database on which SQL queries can be run
easily (and generically) to extract information of interest. Fig. 2-2 illustrates the generic
trace database layout used in our experiment. The fields include the following: 1. the
anonymized MAC addresses of the wireless devices logged onto the WLAN, 2. the
session start time (in seconds), 3. the AP with which the wireless device associated, 4.
the duration of the association with the AP, 5. the manufacturer of the wireless card
(inferred from the partial MAC address), and 6. the building in which the AP is located
(inferred from a map); this last field is external to the actual traces. Two-dimensional
coordinates can also be built into the database, based on a campus grid map, to allow
mobility-based queries. If more information, such as usernames, is available, we can
add more fields to the database. The advantage of a standard schema is that similar
queries can be used on traces coming from multiple sources. We have used this same
database framework to analyze traces from USC [41], Dartmouth [7], UF, and UNC [3];
the method is general and applicable to many traces (campus and urban) and several
grouping criteria.
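As a concrete illustration, the standardized schema and a generic query can be sketched with an in-memory database; the column names and sample rows below are hypothetical stand-ins, not the exact fields or data of the actual traces:

```python
import sqlite3

# Minimal sketch of the trace schema described above (field names are
# illustrative, not the exact column names used in the study).
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE sessions (
        mac      TEXT,    -- anonymized MAC address
        start    INTEGER, -- session start time (seconds)
        ap       TEXT,    -- access point identifier
        duration INTEGER, -- association duration (seconds)
        vendor   TEXT,    -- card manufacturer, inferred from the MAC prefix
        building TEXT     -- building housing the AP (external map lookup)
    )""")
conn.executemany(
    "INSERT INTO sessions VALUES (?, ?, ?, ?, ?, ?)",
    [("u1", 1000, "ap7",  600, "VendorA", "CSE"),
     ("u1", 5000, "ap7", 1200, "VendorA", "CSE"),
     ("u2", 2000, "ap3",  300, "VendorB", "Law")])

# Generic query: per-user activity in a building of interest.
rows = conn.execute("""
    SELECT mac, COUNT(*) AS n_sessions, SUM(duration) AS total_dur
    FROM sessions WHERE building = 'CSE'
    GROUP BY mac""").fetchall()
print(rows)  # [('u1', 2, 1800)]
```

Because the schema is shared across trace sources, the same query works unchanged on any campus's parsed traces.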
The trace collection process, environment, and anonymization used have a great impact
on the utility of the traces, and traces coming from different sources may differ
completely in processing and available information. It is very difficult to find one general
method that would classify users in all settings; therefore we propose multiple methods.
As it is very difficult to obtain this data in the first place, it is even more difficult to
validate it. We have used several statistical methods to gain confidence in the
classification, and cross-validated it with the name-based approach, the closest possible
to ground truth at a large scale.
Figure 2-3. Gender grouping in Fraternities and Sororities
We use traces from two universities, U1 and U2 (names withheld for privacy reasons),
that provide the information shown in Fig. 2-2, except that the U2 traces also provide
usernames. The traces from U1 are from Feb 2006, Oct 2006, and Feb 2007, and the
traces from U2 are from Nov 2007 and Apr 2008. The grouping parameter we
investigate in this work is gender. For this categorization, we propose two novel
techniques, Location Based Classification (LBC) and Name Based Classification (NBC),
and subsequently examine and discuss their advantages.
2.1.1 Location Based Classification (LBC)
Most US universities have sororities (female organizations) and fraternities (male
organizations) as social organizations. The buildings that house these organizations
also serve as residences for most of their members. Given the physical locations of APs
on campus, APs located in sororities and fraternities are identified, and the users
associated with them are classified as females or males, respectively. Fig. 2-3 illustrates
how grouping is done in this setting. This method can also be used to classify users
by other grouping criteria, such as study major; for example, all users associating with
the Computer Science building APs can be classified as Computer Science majors.
Since a wireless network may be used by anyone in physical proximity to the AP, this
kind of classification will also capture unrelated users or visitors accessing these APs,
which can make the classification inaccurate. We next present techniques to separate
regular users from visitors at an AP.
Filtering: LBC requires filtering, as fraternities and sororities have both male and
female visitors; without further refinement, the method would not be accurate. But even
if we establish the presence of visitors, how can we filter them from our classification?
First, visitors are infrequent users of the network in the visited locations. Second, we
expect a significant difference between residents and visitors in terms of network
activity (number and duration of on-line sessions). Third, a user who is a visitor at one
location can be a regular user at some other location. Hence, we can define a visitor as
a user with fewer sessions and shorter session durations than the average user at that
location (group behavior), or as a user who has more sessions and longer online
duration at other locations (individual behavior). Our filtering techniques rate users on
two metrics: the number of sessions and the session duration. Once we rate all users
on these two metrics, we apply cut-off thresholds to determine regular users. Filtering
can be performed on these ratings considering individual and/or group behavior, as
described in the rest of this section.
2.1.1.1 Individual Behavior based Filtering (IBF)
In Individual Behavior based Filtering (IBF), we estimate the probability of a user being
male or female by counting the number of sessions and measuring the duration he/she
spends in fraternities versus sororities, using the equations below.
The probability of a user being male, considering only session counts at fraternities
and sororities is given by:
P_CM(u) = C_f(u) / (C_f(u) + C_s(u))
where C_f(u) gives the session count for user u in fraternities and C_s(u) gives the
session count for user u in sororities.
Similarly, the probability of a user being male, considering only session durations at
fraternities and sororities is given by:
P_DM(u) = D_f(u) / (D_f(u) + D_s(u))
where D_f(u) gives the total duration of sessions for user u in fraternities and D_s(u)
gives the total duration of sessions for user u in sororities.
Fig. 2-4 shows the users who visited fraternities and/or sororities, in decreasing order
of P_CM(u) and P_DM(u), for the traces from university U1. An interesting observation is
that both P_CM and P_DM follow a similar trend, with a sudden drop (transition) from
1 to 0 (between the 500th and 700th user) that essentially separates males from females.
In Fig. 2-4A, out of 1119 users, a large number (∼425) have probability 1 of being male;
these users never associated with sorority APs. A large number (∼362) of users never
associated with fraternity APs (P_CM = 0 and P_DM = 0), whom we can classify as
females. Because fraternities and sororities have visitors, many males will have
probability less than 1 (and vice versa for females); if we considered only users with
probability exactly 1 or 0, we would discard many legitimate users who visited and used
the WLAN at other locations (sororities for males and fraternities for females).
We have instead classified all users having P_CM > 0.80 and P_DM > 0.80 as males,
and those with P_CM < 0.20 and P_DM < 0.20 as females, following the 80-20 rule
(Pareto principle): 80% of the regular users should fall in the top 20% of probability.
The remaining users are discarded from our study. The results from university U2 are
similar (Fig. 2-5). IBF is generic and can also be used for other grouping criteria, such
as study major.
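The IBF computation described above (per-user male probabilities followed by the 80/20 cut-offs) can be sketched as follows; the function name, input dictionaries, and sample data are illustrative, not the actual trace-processing code:

```python
def ibf_classify(frat_counts, sor_counts, frat_dur, sor_dur, hi=0.80, lo=0.20):
    """Individual Behavior based Filtering (illustrative sketch).

    Each argument maps an anonymized MAC to its session count / total
    duration in fraternities (f) or sororities (s)."""
    labels = {}
    for u in set(frat_counts) | set(sor_counts):
        cf, cs = frat_counts.get(u, 0), sor_counts.get(u, 0)
        df, ds = frat_dur.get(u, 0), sor_dur.get(u, 0)
        if cf + cs == 0 or df + ds == 0:   # no usable activity
            labels[u] = "discarded"
            continue
        p_cm = cf / (cf + cs)              # count-based male probability
        p_dm = df / (df + ds)              # duration-based male probability
        if p_cm > hi and p_dm > hi:
            labels[u] = "male"
        elif p_cm < lo and p_dm < lo:
            labels[u] = "female"
        else:
            labels[u] = "discarded"        # ambiguous: dropped from the study
    return labels

labels = ibf_classify(
    frat_counts={"a": 20, "b": 1, "c": 5},
    sor_counts={"a": 1, "b": 30, "c": 5},
    frat_dur={"a": 9000, "b": 200, "c": 500},
    sor_dur={"a": 300, "b": 8000, "c": 700})
print(labels)  # {'a': 'male', 'b': 'female', 'c': 'discarded'}
```

Requiring both probabilities to clear the threshold makes the filter conservative: a user must look male (or female) by session count and by total duration before being kept.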
2.1.1.2 Group Behavior based Filtering (GBF)
In Group Behavior based Filtering (GBF), we filter a user based on where his or her
usage pattern lies with respect to all the users at a particular location. GBF is also
useful when traces are available only from a limited number of buildings, so that IBF
cannot be applied. For example, suppose that at a particular location we discover that
the average session duration of regular users is 3000 sec and their session
Figure 2-4. Users visiting a fraternity and/or sorority, in decreasing order of their male probability, at university U1. A) Feb 2006. B) Oct 2006. C) Feb 2007.
count is 10 in a period of one month. All users who at least meet these criteria become
regular users and are classified as male or female based on the location; everyone else
is considered a visitor and is removed. Finding these thresholds is not a trivial task, as
they vary from building to building and may also change with time. For this task we
employ clustering techniques [18] (a key method of unsupervised learning) to partition
our data into regular users and visitors.
Clustering: Clustering can be used to divide a set of users into several subsets
such that the users in each subset are most similar based on WLAN usage metrics
(duration, session count, distinct login days). Of the two general categories of clustering
Figure 2-5. Users visiting a fraternity and/or sorority, in decreasing order of their male probability, at university U2. A) Nov 2007. B) Apr 2008.
algorithms, namely hierarchical and partitioning schemes, we choose a robust
partitioning method called Partitioning Around Medoids (PAM) [45]. This method has a
distinct advantage over standard k-means [18] in that it minimizes a dissimilarity score
within each cluster, making the clusters robust to outliers. It also provides a method
based on silhouette widths and plots for estimating cluster quality. The average
silhouette width is useful for estimating the number of clusters present in the data
(often a challenging job in cluster analysis). One has to run PAM several times,
each time with a different number of clusters, and then compare the resulting silhouette
widths. The cluster count that produces the maximum average width is the best
clustering possible. The average width can also be used to estimate the quality of the
clustering: above 0.70 indicates strong clustering, 0.50–0.70 a reasonable structure,
and below 0.50 a weak structure [45].
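The average silhouette width used above to judge cluster quality can be sketched in plain Python (the study uses PAM's own implementation [45]; here the cluster labels and the two usage profiles are supplied by hand as hypothetical examples):

```python
def avg_silhouette_width(points, labels):
    """Average silhouette width for a given clustering (plain-Python sketch).
    `points` are feature vectors such as (session count, distinct days);
    `labels` are cluster ids produced by a clustering algorithm like PAM."""
    def dist(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5

    clusters = {}                       # cluster id -> list of point indices
    for i, l in enumerate(labels):
        clusters.setdefault(l, []).append(i)

    widths = []
    for i, l in enumerate(labels):
        own = [j for j in clusters[l] if j != i]
        if not own:                     # singleton cluster: width defined as 0
            widths.append(0.0)
            continue
        # a = mean distance to own cluster, b = mean distance to nearest other
        a = sum(dist(points[i], points[j]) for j in own) / len(own)
        b = min(sum(dist(points[i], points[j]) for j in idxs) / len(idxs)
                for l2, idxs in clusters.items() if l2 != l)
        widths.append((b - a) / max(a, b))
    return sum(widths) / len(widths)

# Two well-separated usage profiles (visitors vs. residents) score high:
visitors = [(1, 2), (2, 1), (1, 1)]
residents = [(20, 25), (22, 24), (21, 26)]
score = avg_silhouette_width(visitors + residents, [0, 0, 0, 1, 1, 1])
print(round(score, 2))  # close to 1, i.e. strong clustering
```

Re-running the clustering for k = 2, 3, ... and keeping the k with the largest average width is exactly the model-selection procedure described in the text.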
We use PAM to distinguish visitors from regular users (i.e., residents). We use the
number of distinct days of login, the session count, and the sum of session durations as
the metrics for user evaluation. These metrics help identify, and thus separate, users
who make several sessions on only a few days (possibly visitors) from users who make
sessions every day. We applied this clustering technique to the sorority and fraternity
user traces from both universities U1 and U2, and found that the best cluster count in
each case is 2. In each set the average silhouette width is above 0.65, with 0.84 being
the maximum in one of the cases (more results in Tab. 2-1). A cluster count of 2
matches our intuition of regular users and visitors and separates them by usage
behavior in that particular building/location; the high average silhouette width indicates
the high quality of the clustering. Detailed results of GBF are in the middle column of
Tab. 2-2.
Fig. 2-6 shows the effect of total session duration, total number of sessions, and
unique days of login on the clustering of users. We can see a clear drop in the number
of sessions and unique days when the cluster label changes from 2 to 1 (the 2nd cluster
signifies residents). We notice that at the beginning of cluster 1 there is a spike in total
duration, but these users are still not included among the regular users, as their
numbers of sessions and unique days of login are comparatively lower than those of
users in cluster 2. Clustering ensures that all three metrics are incorporated in the
decision. Similar results are obtained for the other traces from university U1 (Fig. 2-7)
and from U2 (Figs. 2-8 and 2-9). GBF is generic and can be used to identify other social
groupings, such as study major, which we will investigate in future research.
Table 2-1. Average silhouette width for sororities and fraternities at universities U1 and U2

               U1                               U2
               Feb 2006   Oct 2006   Feb 2007   Nov 2007   Apr 2008
  Fraternity   0.72       0.74       0.75       0.84       0.78
  Sorority     0.65       0.72       0.69       0.78       0.76
Figure 2-6. Clustering results for university U1 sororities. A) Feb 2006. B) Oct 2006. C) Feb 2007.
Figure 2-7. Clustering results for university U1 fraternities. A) Feb 2006. B) Oct 2006. C) Feb 2007.
Figure 2-8. Clustering results for university U2 fraternities. A) Nov 2007. B) Apr 2008.
Figure 2-9. Clustering results for university U2 fraternities. A) Nov 2007. B) Apr 2008.
2.1.1.3 Hybrid Filtering (HF)
As we do not know the ground truth or have real data about the users, it is
difficult to validate the results of these classifications; yet a meaningful analysis
requires validated classification. We validate LBC via multiple techniques in Sec. 2.2. In
one of them, we compare the results from IBF and GBF; the results are tabulated in
Tab. 2-5. We find that both methods largely select the same set of users, which should
be the case, as both attempt to identify regular users (males in fraternities and females
in sororities). Therefore, for higher-confidence classification and analysis in the later
sections of the paper, we choose the users selected by both filtering methods. We call
this method Hybrid Filtering (HF), as it uses results from both IBF and GBF. By doing so
we successfully classify the majority of the users (more than 90% of the users selected
by GBF are also selected by IBF, as shown in Tab. 2-5).
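The HF step reduces to a set intersection over the users kept by each filter; a small sketch with hypothetical sets of anonymized MACs:

```python
# Hybrid Filtering: keep only the users selected by BOTH filters.
# The MAC identifiers below are hypothetical.
ibf_males = {"m1", "m2", "m3", "m4"}   # regular users according to IBF
gbf_males = {"m2", "m3", "m4", "m5"}   # regular users according to GBF

hf_males = ibf_males & gbf_males       # users both methods agree on
overlap = len(hf_males) / len(gbf_males)  # fraction of GBF users also in IBF
print(sorted(hf_males), overlap)  # ['m2', 'm3', 'm4'] 0.75
```

The overlap ratio is the same quantity reported in Tab. 2-5 (over 90% in the actual traces) that motivates trusting the intersection.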
Our proposed LBC scheme is generic and can classify users into social groups
whenever those groups have inherent location preferences (sororities are female
residences, Computer Science majors have strong ties to the Computer Science
buildings, a theater group meets often at the auditorium). One thing to note is that LBC and its filtering
techniques do not need access to unanonymized MAC addresses; as long as the MAC
addresses are consistently anonymized, LBC is applicable. This property makes LBC
usable with most of the available WLAN traces. Next, we present the Name Based
Classification (NBC) technique, an alternative to LBC.
2.1.2 Name Based Classification (NBC)
In this technique, we use the usernames of the WLAN users, which are sometimes
available in the traces. This field may be obtainable on campuses and in enterprises
that require an authorization mechanism, such as passwords, to access the WLAN.
Including the username should not affect the privacy of the user, as usernames are not
private and usually cannot identify a person. We approach our classification problem by
exploiting the fact that university U2, from which these traces were collected, provides
usernames and maintains a directory. This directory can be searched by (WLAN)
username, and users have the option of not listing their names in the phone book. This
means that we can find the first names corresponding to the usernames of the users
who have made their information available in the phone book. We then take the lists of
the top 1000 male and female first names from the US Social Security Administration
website [2] and remove the names present in both lists (neutral names), obtaining lists
of the most popular male-only and female-only names. We match these lists against the
names found in the phone book, thus finding the gender of the users [32, 66]. With this
technique the visitor problem does not arise, so no filtering is needed.
We observe that the names from the US Social Security lists may fail to classify
foreign students and users with less common names into gender groups; this, however,
is a limitation not of our method but of the name database, and using a more
comprehensive database should yield better classification. In this paper, however, we
are concerned with a general methodology for classifying WLAN users; the details of
acquiring a better database are out of the scope of the paper.
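The NBC pipeline (remove neutral names, then match directory first names against the male-only and female-only lists) can be sketched as follows; the name fragments are illustrative stand-ins for the SSA top-1000 lists:

```python
def build_gender_lists(male_names, female_names):
    """Drop names that appear on both lists ('neutral' names)."""
    males, females = set(male_names), set(female_names)
    neutral = males & females
    return males - neutral, females - neutral

def nbc_classify(first_name, male_only, female_only):
    """Classify a directory first name against the filtered lists."""
    n = first_name.lower()
    if n in male_only:
        return "male"
    if n in female_only:
        return "female"
    return "unclassified"   # neutral, foreign, or absent from the database

# Hypothetical fragments of the SSA top-1000 lists ('taylor' is neutral):
male_only, female_only = build_gender_lists(
    ["james", "john", "taylor"], ["mary", "linda", "taylor"])
print(nbc_classify("Mary", male_only, female_only))    # female
print(nbc_classify("Taylor", male_only, female_only))  # unclassified
```

The "unclassified" outcome is exactly the 'Common'/unmatched population excluded from both gender sets in Tab. 2-2.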
Table 2-2. Results of classification of users from U1 (LBC) and U2 (NBC). 'Common' signifies users that appeared in both the male and female populations.

                  U1-IBF                          U1-GBF                          U2-NBC
                  Feb 2006  Oct 2006  Feb 2007   Feb 2006  Oct 2006  Feb 2007   Nov 2007  Apr 2008
  Total Users     16416     22405     20302      16416     22405     20302      27068     29982
  Males (only)    506       553       545        451       437       417        5245      5807
  Females (only)  513       570       509        441       456       410        5955      6817
  Common          0         0         0          22        37        29         0         0
Using NBC, we could classify 11,000 of the 27,000 users in the Nov 2007 trace period
as male or female, and 12,500 of the 30,000 users in the Apr 2008 trace period, at
university U2. Some of the users from both trace periods were marked as 'Common',
since their names appeared in both the male and female name lists; for the purpose of
this study, 'Common' users were excluded from both the male and female user sets.
Details of the classification are listed in Tab. 2-2.
Compared to NBC, LBC requires less information (no username needed); however,
we need a way to validate LBC. One way is to compare the classification results of LBC
with those of NBC, as shown in Sec. 2.2.3. The NBC method is much closer to the
ground truth, but its use is limited, as usernames are available in only a very few of the
currently available traces. Once the correctness of LBC is established, it can become
the primary method for classification.
2.2 Validation of Location Based Classification
Validation of LBC is needed to raise confidence in the results from U1, i.e., that users
classified as visitors are indeed visitors and not regular users of that access point
(males in the case of fraternities, females in the case of sororities). Validation against
the ground truth is difficult, especially as we have developed the methods for publicly
available traces and information. Even with access to students' university records, we
would not be able to match them with students' devices (especially when MAC
addresses are anonymized). One could conduct surveys of the 50,000 users on each
campus, but the results would likely be incomplete and noisy (erroneous),
aside from the enormous effort and resources required, if it were possible at all.
Instead, we have devised three statistical methods to validate our filtering mechanisms.
The first finds the regular users in trace sets from adjacent months and compares the
lists to see how many users are common (temporal consistency). The second compares
the results from IBF and GBF to check their similarity. The third takes the classification
obtained with NBC and compares it with the results of LBC, since NBC should be very
close to the ground truth. The methods are discussed in detail below.
2.2.1 Temporal Consistency Validation Using Adjacent Months
In this method of validation, we consider a pair of month-long trace sets from adjacent
months in the same semester (such as February 2006 and March 2006 of the Spring
2006 semester) and use the IBF, GBF, and HF filtering techniques to find how many
users are common to the two months before and after filtering. The assumption is that
the set of users living in fraternities and sororities does not change from one month to
the next within a semester. If the percentage of common users increases after filtering,
then the method is likely identifying regular users correctly. Tab. 2-3 and Tab. 2-4 show
the results we obtain for fraternity and sorority users. For fraternities, before filtering,
the percentage of common MACs in two consecutive months is around 60% to 64%;
after filtering, it rises to between 72% and 80% under all three filtering schemes. For
sororities, the common users are between 66% and 72% before filtering, and the
percentage shoots up to between 80% and 93% after filtering. This shows that the
filtering schemes are selecting regular users, as the percentage of common users rises
dramatically after filtering.
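The temporal-consistency check reduces to a set intersection over the MACs selected in adjacent months; a sketch with hypothetical user sets (the percentage is taken relative to month (a), as in Tab. 2-3):

```python
def temporal_consistency(month_a, month_b):
    """Percentage of month (a)'s users also seen in month (b).
    Inputs are sets of anonymized MACs selected in each month."""
    common = month_a & month_b
    return 100.0 * len(common) / len(month_a)

# Hypothetical populations: 100 users each, 60 in common.
before = temporal_consistency({"u%d" % i for i in range(100)},
                              {"u%d" % i for i in range(40, 140)})
print(before)  # 60.0
```

Running the same function on the pre-filter and post-filter user sets, and comparing the two percentages, reproduces the before/after columns of Tabs. 2-3 and 2-4.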
2.2.2 IBF vs GBF
The LBC technique in Sec. 2.1.1 includes two main filtering techniques, IBF and
GBF. Both use location information to identify gender; however, their cut-off thresholds
Table 2-3. Similarity in the user population selected after filtering fraternity users for U1

  Before Filtering
  Month(a)  Month(b)      # Users(a)  # Users(b)  Common  % Users
  Feb2006   Mar-Apr2006   1350        1441        816     60.4
  Oct2006   Nov2006       1520        1572        969     63.8
  Feb2007   Mar-Apr2007   1692        1875        1050    62.1

  After Filtering - IBF
  Month(a)  Month(b)      Male(a)  Male(b)  Common Males  % Common
  Feb2006   Mar-Apr2006   506      507      386           76.2
  Oct2006   Nov2006       553      518      401           72.5
  Feb2007   Mar-Apr2007   545      613      407           76.5

  After Filtering - GBF
  Month(a)  Month(b)      Male(a)  Male(b)  Common Males  % Common
  Feb2006   Mar-Apr2006   473      463      378           80.0
  Oct2006   Nov2006       474      445      371           78.27
  Feb2007   Mar-Apr2007   446      482      354           79.4

  After Filtering - HF
  Month(a)  Month(b)      Male(a)  Male(b)  Common Males  % Common
  Feb2006   Mar-Apr2006   416      409      332           79.8
  Oct2006   Nov2006       418      387      327           78.2
  Feb2007   Mar-Apr2007   399      419      311           77.9
for filtering regular users from visitors are set differently. Comparing the results of the
two methods provides another validation mechanism. Tab. 2-5 compares the filtering
results for three month-long traces (Feb 2006, Oct 2006, Feb 2007) from university U1.
More than 400 (75%) of the users are consistently common to both methods. This high
degree of similarity validates the filtering: both methods remove visitors and arrive at
similar sets of regular users (increasing confidence in our results). We note that GBF is
more conservative (fewer regular users) than IBF, which can be attributed to the fact
that GBF compares against the usage attributes (session count, duration, distinct days
of login) of an average user (via clustering), which can be higher than those of a regular
user selected by IBF. For the user behavior analysis in the following section, we only
Table 2-4. Similarity in the user population selected after filtering sorority users for U1

  Before Filtering
  Month(a)  Month(b)      # Users(a)  # Users(b)  Common  % Users
  Feb2006   Mar-Apr2006   991         1155        717     72.3
  Oct2006   Nov2006       1264        1305        844     66.8
  Feb2007   Mar-Apr2007   1169        1327        821     70.2

  After Filtering - IBF
  Month(a)  Month(b)      Female(a)  Female(b)  Common Females  % Common
  Feb2006   Mar-Apr2006   513        536        450             87.7
  Oct2006   Nov2006       570        557        461             80.9
  Feb2007   Mar-Apr2007   509        511        417             81.9

  After Filtering - GBF
  Month(a)  Month(b)      Female(a)  Female(b)  Common Females  % Common
  Feb2006   Mar-Apr2006   463        474        429             92.7
  Oct2006   Nov2006       493        456        432             87.6
  Feb2007   Mar-Apr2007   439        458        405             92.3

  After Filtering - HF
  Month(a)  Month(b)      Female(a)  Female(b)  Common Females  % Common
  Feb2006   Mar-Apr2006   435        449        402             92.4
  Oct2006   Nov2006       454        432        401             88.3
  Feb2007   Mar-Apr2007   406        413        367             90.4
Table 2-5. Validation: comparing users selected by IBF and GBF for U1

  Month     Gender  IBF  GBF  HF
  Feb 2006  Male    506  451  416
            Female  513  441  435
  Oct 2006  Male    553  437  418
            Female  570  456  454
  Feb 2007  Male    545  417  399
            Female  509  410  406
consider the users selected by both filtering methods, also referred to as Hybrid
Filtering (HF).
2.2.3 Cross Validation
NBC does not classify all users as either male or female (Sec. 2.1.2); however, the
classification it does make has a low error rate, because it uses statistics from real data
from the US Social Security Administration. Using this property of NBC, we can
estimate the error bound for LBC. The availability of this error percentage can help in realizing
Table 2-6. Cross validation of LBC by NBC for U2

  Month     |FL|  |FL ∩ MN|  Ef     |ML|  |ML ∩ FN|  Em
  Nov 2007  1280  74         0.058  334   25         0.074
  Apr 2008  1690  123        0.072  349   29         0.083
the error margins for LBC. To calculate the error bounds, the users (from sororities
and fraternities) classified by LBC as females and males are put in sets FL and ML,
respectively.
Using NBC, we classify all users from fraternities and sororities, putting females in
set FN and males in set MN, and remove the unclassified users (those whose names
appeared in both the male and female databases, or in neither). The error in the female
classification by LBC is then given by Ef = |FL ∩ MN| / |FL|, and the error in the male
classification by LBC is given by Em = |ML ∩ FN| / |ML|.
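The error bounds can be computed directly from the four sets; a sketch with hypothetical user-id sets (the real sets come from the anonymized traces):

```python
# Error bounds of LBC measured against NBC (hypothetical user-id sets).
FL = {"f%d" % i for i in range(100)}          # LBC-classified females
ML = {"m%d" % i for i in range(50)}           # LBC-classified males
MN = {"f1", "f2", "f3"} | {"m%d" % i for i in range(45)}  # NBC males
FN = {"m48", "m49"} | {"f%d" % i for i in range(4, 90)}   # NBC females

Ef = len(FL & MN) / len(FL)   # LBC females that NBC says are male
Em = len(ML & FN) / len(ML)   # LBC males that NBC says are female
print(Ef, Em)  # 0.03 0.04
```

The same two ratios, computed over the actual U2 sets, yield the Ef and Em columns of Tab. 2-6.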
Tab. 2-6 shows the results of the cross validation of LBC by NBC. We performed this
analysis on the trace sets from university U2, which provides usernames along with the
information about the APs located in sororities and fraternities, allowing us to perform
both NBC and LBC. For the Apr 2008 traces from university U2, the set FL has 1690
users after LBC, and Ef equals 7.2%; the set ML has 349 users, and Em is 8.3%.
Similarly, in the Nov 2007 traces, Ef and Em are 5.8% and 7.4%, respectively. The low
error values further increase our confidence in LBC and validate the classification
method.
To summarize, our location-based classification LBC (with three filtering
techniques: IBF, GBF, and HF) is supported by three validation techniques.
Validation ensures that the users selected by the filtering are indeed the regular
users, which means selecting females in sororities and males in fraternities.
The statistical errors of the filtering were below 10%, and the confidence was
found to be over 90%.
2.3 User Behavior Analysis
Classification of users into social groups is the first step in understanding the
usage differences between the groups. The classification techniques discussed in
Sec. 2.1 take all the WLAN users and divide them into various sets (depending on
the grouping criterion). For the gender based grouping, we have three sets : Male,
Female and Unclassified (grouping could not be determined). These groups can
now be evaluated on multiple metrics depending on the application. In this work we
have considered three generic metrics (not corresponding to any application). We
investigate the spatio-temporal distribution for wireless usage across genders in addition
to vendor preference. The main aim of these metrics is to examine the existence of
differences between the groups. We attempt to identify differences that are statistically
significant and consistent across the multiple traces we have studied. One observation
to make here is that it may not be necessary that such differences hold true in different
campuses or time-periods. However, knowledge of these differences (even existence)
may be important to protocols and services targeted at these groups of users. The three
metrics we use are:
a. WLAN Usage and Gender Spatial Distribution: What are the trends in WLAN usage across different areas (buildings) on campus?
b. Average Online Time (Temporal Distribution): Are there trends in the average online times of users, and can differences be identified based on gender and areas (buildings) within the campus?
c. Manufacturer Preferences: Which device vendors do the different genders prefer? To what degree does gender affect the choice of vendor?
2.3.1 User Spatial Distribution
One example of such a metric is the spatial distribution of the users. This metric
identifies where the classified (regular) users spend most of their time. For example, by
searching for the female users in the complete trace, we can find the locations they
visited. We refer to these locations as "areas", since they also represent the major/department
39
Figure 2-10. Comparison of user distribution across the university U1 campus (inPercentage)
housed at that location. Here we only look into major trends by the active user. A user is
considered active (regular) at an area based on GBF. Differences in the number of users
of each gender can tell us about the building preferences of the genders. Fig. 2-10
and Fig. 2-11 show the percentage distributions for males and females at universities U1
and U2 across various buildings. At both universities, we can see that there are more males
than females in the areas of Economics (by 39% at U1 and 33% at U2), Engineering (5% at
U1 and 89% at U2) and Law (by 83% at U1 and 6% at U2). The Law area information for
Feb 2007 is an outlier, as we do not see any male students during that period. Females
outnumber males in the areas of Social Science (by 16% at U1 and 3% at U2) and
Sports (by 41% at U1 and 2% at U2). The trends at U1 and U2 are opposite for the
area of Music (U1 has 40% more females, whereas U2 has 33% more males). For more
details see [47].
The existence of locations that are consistently preferred by one of the two genders
highlights the difference in WLAN usage between them. Many of the trends hold even
across the two campuses. We believe this can be beneficial to several applications, as
discussed in Sec. 2.4.
[Figure: bar chart of female and male user population percentages per area (Admin, Communication, Economics, Engineering, Law, Music, Social Science, Sports) for the Nov 2007 and Apr 2008 traces]

Figure 2-11. Comparison of user distribution across the university U2 campus (in percentage)
2.3.2 Average Duration or Temporal Analysis
The average duration of a session for males and females gives us an understanding of
the extent of WLAN usage in different areas. From Fig. 2-12 and Fig. 2-13, we observe
that males on average have longer sessions than females in most of the areas (on
average by more than 9%, in extreme cases by as much as 200%). On average, male
users tend to stay at certain places, as WLAN users, for longer times than females.
At both universities, we see that females consistently have a higher average duration
than males in the areas of Social Science (by 12.8% at U1 and 10% at U2) and Sports
(by 17.2% at U1 and 8% at U2). Males consistently have higher session durations at both
universities in the areas of Engineering (by 76% at U1 and 15.4% at U2) and Music (by
39.9% at U1 and 36.8% at U2). Females at university U1 consistently have a higher
average duration in the area of Communication (by 12%), whereas males have higher
session durations at university U2 (by 10%). We also see a clear trend at university
U2 that males have higher session durations in the area of Economics.
Another observation of interest is that the average duration per session decreases from
Feb 2006 to Feb 2007 (from 2789 sec to 2454 sec) in almost all the cases for university
[Figure: bar chart of average session duration (seconds) per area for males and females in the Feb 2006, Oct 2006, and Feb 2007 traces]

Figure 2-12. Average duration of males and females in different areas of the university U1 campus
U1 campus; we observe a similar trend at university U2 (from 3800 sec in Nov 07 to 3609
sec in Apr 08). This points to the possibility that students are becoming more mobile,
and thus have shorter sessions at the same location.
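The duration metric itself is a grouped mean over session records. A sketch, again under an assumed (gender, area, duration) record layout that is not the literal trace format:

```python
from collections import defaultdict

def average_session_duration(sessions):
    """Mean session duration in seconds per (gender, area) pair.

    `sessions` is an iterable of (gender, area, duration_secs) tuples
    drawn from the association trace after gender classification.
    """
    totals = defaultdict(lambda: [0, 0])    # (gender, area) -> [sum, count]
    for gender, area, duration in sessions:
        acc = totals[(gender, area)]
        acc[0] += duration
        acc[1] += 1
    return {key: s / n for key, (s, n) in totals.items()}
```

Running this per trace sample and comparing the resulting means across samples surfaces the downward trend in session duration noted above.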
While in some cases the trends were similar across genders, in several scenarios
we do find differences in WLAN usage between the genders. Some of these
differences were found to be significant and spatio-temporally consistent even across
campuses: females' wireless activity is stronger in the Social Science and Sports areas,
whereas males' activity is stronger in Engineering and Music. In other scenarios each
university campus had a different trend specific to it. These findings are likely to have a
significant impact on usage modeling in wireless networks.
2.3.3 Device Preference
In many available traces, partial MAC anonymization is done, such that the top three
octets of the address (which identify the manufacturer) are left unchanged. Traces
from both U1 and U2 use partial anonymization. These top octets can be used to
[Figure: bar chart of average session duration (seconds) per area for males and females in the Nov 2007 and Apr 2008 traces]

Figure 2-13. Average duration of males and females in different areas of the university U2 campus
find the preferred vendors for the groups (Male and Female). For this metric, we
consider only the major vendors (by number of users).
Fig. 2-14 and Fig. 2-15 show the number of users per vendor at universities U1 and
U2. At university U1, it is interesting to note that Apple computers are more popular
amongst females than males, while Intel devices are more popular amongst males. For
example, using the Feb 2006 traces we find that 25% of the males use Apple and 32%
use Intel, i.e., there are 28% more male Intel users than male Apple users. In the case
of females, 30% use Apple and 27% use Intel, so 12% more female users
use Apple than Intel. To test whether gender introduces a bias towards specific vendors,
we use the Chi-Square statistical significance test. The Chi-Square test shows with
90% confidence that there is a dependence between gender and vendor/brand. This holds
true for all three trace sets from university U1. We also notice a consistent increase in
the percentage of Apple computer users of both genders over the three trace samples.
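The bias test reduces to a standard Pearson chi-square computation on a 2x2 contingency table of gender versus vendor. A sketch in pure Python; the counts below are illustrative, not the actual U1 numbers:

```python
def chi_square_2x2(table):
    """Pearson chi-square statistic for a 2x2 contingency table,
    e.g. rows = gender (male, female), columns = vendor (Apple, Intel)."""
    row = [sum(r) for r in table]
    col = [table[0][j] + table[1][j] for j in range(2)]
    n = sum(row)
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row[i] * col[j] / n   # expected count under independence
            chi2 += (table[i][j] - expected) ** 2 / expected
    return chi2

CRITICAL_90 = 2.706   # chi-square critical value for df = 1 at 90% confidence

# Hypothetical user counts: [Apple, Intel] for males (top) and females (bottom).
counts = [[250, 320],
          [300, 270]]
gender_vendor_bias = chi_square_2x2(counts) > CRITICAL_90
```

If the statistic exceeds the df = 1 critical value, the independence hypothesis is rejected at the 90% confidence level, i.e., vendor choice depends on gender.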
For comparison of the results from university U1 with those from university U2, for this
case only, we considered users only from fraternities and sororities at university U2.
[Figure: bar chart of user percentage per manufacturer (Apple, Intel, Askey, Gemtek, Netgear, Hon Hai, Enterasys, D-Link, Linksys) for males and females in the Feb 2006, Oct 2006, and Feb 2007 traces]

Figure 2-14. Device distribution by manufacturer at university U1
The classification of users was performed using LBC (similar to university U1). At
university U2, we do not find trends similar to university U1; we see that both
genders consistently prefer Intel devices over Apple devices. We tend to
believe that the preferences of WLAN users can vary with geographic location and
factors such as the affluence of the surrounding community and the presence of an
Apple store on campus, among others.
We also observe that vendors like Enterasys, Linksys, D-Link, and Askey Corp.
show a decreasing trend in terms of percentage of users. One of the reasons is that
these manufacturers mostly make external Wi-Fi devices for old laptops (with no built-in
Wi-Fi NICs). Currently almost all new laptops come with built-in Wi-Fi, so the number of
users of external devices is decreasing.
These results indicate once more that there are statistically significant differences in
the usage patterns of the two genders. One possible implication of this device preference
is that PC virus or malware propagation in some female groups may be less effective,
which will have a direct impact on security studies in future wireless societies, as in
DTNs [77].
[Figure: bar chart of user percentage per manufacturer (Apple, Intel, Askey, Gemtek, Netgear, Hon Hai, ASUSTek) for males and females in the Nov 2007 and Apr 2008 traces]

Figure 2-15. Device distribution by manufacturer at university U2
2.4 Applications
The analysis of user behavior in the previous section highlights that statistically
significant differences exist in the usage patterns of the two genders. There can
be several metrics on which a group of users can be evaluated and their behavior
quantified. The results from these metrics can then be applied to an existing or new
application to make it context sensitive. In this section, we discuss a few applications
that will benefit from the quantified differences among the groups, such as mobility
modeling and protocol design. We also discuss the impact of this analysis on user privacy,
wireless network deployment, and resource management, among others. For lack of
space, further details of the applications are omitted.
2.4.1 Mobility Models
Mobility models are important tools to understand user movements and create
models on which protocols can be tested. The knowledge of groups can be used
to re-evaluate mobility models such as TVC [37], IMPORTANT [16], and several
others [15]. This enhancement can allow us to model/evaluate social groups on
'behavioral' aspects, load (session durations), and density, among others. This kind
of study is only possible using the methods presented in this work; other
methods, such as surveying 50,000 users, would require tremendous effort and may
still have similar error rates.
2.4.2 Protocol Design
Protocol and service design in mobile ad-hoc networks can take the features of
various groups into account when evaluating performance. It has been shown in
Profile-Cast [39] that by considering the behavior of users (profiles), one can create
efficient protocols for mobile ad-hoc networks. That work, however, does not consider
differences among groups of people. It has also been shown that users with similarities
meet often and have closer ties [60]. Do similar people (belonging to the same group)
have higher chances of meeting often? Can this knowledge increase message delivery
success? Our method helps in identifying the social groups; however, further investigation
is needed, such as combining this group information with services such as Profile-Cast.
2.4.3 Privacy
A major impact of this work is bringing the privacy issues associated with traces
to the forefront. Determining gender from traces that were anonymized exposes
weaknesses in current anonymization techniques. It may be argued that anonymization
of location information could prevent this kind of classification; however, this not only
decreases the utility of the traces, but the authors in [48] also show that location
anonymization can be easily undone. The primary reason is the unique session patterns
of WLAN users. Anonymizing WLAN traces while maintaining their utility is a challenging
task, and our work also points at this significant problem.
2.4.4 Resource Management
Knowledge of group behavior can also be helpful in planning WLAN resource
deployment and capacity modeling. Questions like how the usage would change
if more admissions are given to computer science students versus law students, or to
females versus males, can now be answered in a better light.
We all have intuitions about where and how a certain group of users may use the WLAN;
our method allows us to quantify these intuitions. We believe that the methods discussed
in this work are a fundamental step towards many interesting studies in the future.
2.5 Conclusion And Future Work
In this study, we propose novel methods that use WLAN traces to classify WLAN
users into social groups based on features such as gender and study major, among
others. The work presents a general framework that can be applied to traces coming
from multiple sources. As an example, traces from two university campuses have
been used and gender-based classification has been performed. Multiple techniques
for grouping users are discussed, since each has slight advantages in certain
scenarios. The study cross-validates the results by comparing the results provided by
each of the classification methods.
Results from this research are based on a sample of the user population, since
gender may be identified based on sorority and fraternity wireless access point
associations or based on a name filter. We find that there is a distinct difference in WLAN
usage patterns for the different genders, even with similar population sizes. The availability
of results comparing groups of users allows researchers to quantify the behavioral
differences between the groups. We see that these trends and characteristics are
consistent over periods of time and across different semesters, and sometimes even
across university campuses. We also see some trends that are not consistent across
the two university campuses, like the vendor preference. At one university females show
a statistical trend of preference towards Apple computers; however, no similar
observation is made at the other university. We think that some social characteristics
depend on the location of the university campus and the facilities around it (like the
presence of an Apple store, or an affluent population). Even though the results vary
with time and location, it may be essential for a protocol designer of mobile networks to
understand the characteristics of this network.
Interestingly, we were able to classify users into males and females and were
successful in obtaining their vendor preferences based on an analysis of anonymized
traces (the university U1 study did not use usernames). We were also able to validate our
results. This raises several privacy issues. Can private information of individuals be
identified by analyzing anonymized traces? What kind of anonymization algorithms
should be used for mobile network traces? And how can such algorithms provide a
notion of k-anonymity [76] for the mobile society while retaining useful information for
researchers? These questions bear further research, and we plan to address them in
our future work.
In the future, we plan to prepare mathematical models that can represent a user
in a particular group. This process would allow us to understand the various features
that represent a user's WLAN usage characteristics. It would also allow us to classify
users into groups by looking at the features only. A user model would also be useful in
tailoring multicast and profile-cast protocols to incorporate group behavior.
We hope that this study opens the door for other mobile social networking studies
and profile-based service designs based on sensing human societies.
CHAPTER 3
BREAKING ANONYMITY IN WLAN TRACES
The advent of portable/mobile devices and the availability of ubiquitous network
coverage using heterogeneous wireless technologies like Wi-Fi (IEEE 802.11), GPRS,
3G, and WiMAX has allowed humans to browse information on the go. From sharing a
computing device at home, the office, or a commercial establishment, we have come to an
era where these devices have become very personal and customized to the user's taste.
A major impact of this change (apart from all the benefits of being mobile) is that these
devices have become sensors of the human society. As these devices remain with
their owners for many hours a day, they can capture large amounts of user behavior
patterns, which can be made available to researchers. On one hand, the study of such
data can be used to develop a better understanding of human behavior and provide
improved services; on the other hand, the availability of this kind of data can be considered
an infringement on the privacy of the user.
Several researchers use WLAN traces for research and analysis purposes, such as
examining the usage behavior of users [35, 38, 49], discovering characteristics for
developing network protocols [39], or studying user mobility patterns [25, 37, 65]. Many
of the WLAN traces are publicly available [7, 41]. It is, therefore, important to understand
how the privacy of WLAN users is affected. In this work, we investigate the extent of
users' private information that can be extracted from anonymized Wireless Local Area
Network (WLAN) traces. Even though most of the trace libraries anonymize/sanitize
the traces to protect users' privacy, we present several methods that can be used to
reverse the anonymization. We attempt to expose the weaknesses in the currently used
anonymization techniques and bring the attention of the WLAN research community to this
fundamental problem. We find that WLAN traces are unique in the sense that human
movement patterns, which can have unique signatures, get embedded in them. These
signatures can later be combined with publicly available information from such sources
as directories or schedules to identify a user even after anonymization. Despite the
importance of privacy issues in WLAN traces, there is a lack of significant research in
this field. The purpose of this study, therefore, is to shed light on the need for better
anonymization techniques and to identify a rich set of plausible scenarios in which
anonymity can be compromised.
The issues of privacy and anonymization have always been present in network
traces, and researchers have faced challenges in anonymizing wired traces [69].
Recently, wireless traces have also been collected and archived at online public
libraries like CRAWDAD [7] and MobiLib [41], which collectively hold well over 50 traces.
As these are pervasively captured user information, several questions have been raised
about the process of collecting traces [12, 74]. Techniques are being researched such
that users themselves can share their traces [73]. However, the pertinent question
that still remains unanswered is, once traces are collected, how can they be
prepared for distribution such that they have good utility and do not
compromise the privacy of the users? Our efforts are targeted at this question, which has
become even more challenging with WLAN traces, as we shall discuss in this chapter.
In this work, we present our analysis of the currently used anonymization methods and
their shortcomings.
The next section presents the information available in the WLAN traces. Sec.
3.2 presents example scenarios where identifying a user and monitoring his usage
pattern can be detrimental to his privacy. These cases justify the need for fail-proof
anonymizing/sanitizing of WLAN traces. We discuss prevalent methods of anonymizing
WLAN traces in Sec. 3.3, following which we discuss attack scenarios and methods
that can be used to break WLAN anonymization. Sec. 3.4 presents an analysis of how
the anonymization can be broken. Sec. 3.5 provides an analysis of the attacks and
discusses different possible approaches that can be used to prevent the invasion of
privacy, though this remains an open question. In the last section, we summarize our
findings and present directions for future research.
3.1 Information In WLAN Traces
WLAN traces are logs of user associations with wireless Access Points (APs). A
generic information tuple, after some processing of the raw trace, contains the MAC ID,
start time, duration, and access point/location.
Table 3-1. WLAN trace sample: before and after anonymization

a. Sample un-anonymized trace
MAC                Start Time                 Duration(sec)  AP/Location
00:11:22:33:44:55  01 Jun 2008 21:00:51 GMT   3000 secs      CS building AP1
11:22:33:44:55:66  01 Jun 2008 21:01:30 GMT     10 secs      ECE building AP2
01:02:03:04:05:06  01 Jun 2008 22:11:00 GMT    200 secs      MSL building AP1
10:20:30:40:50:60  01 Jun 2008 22:15:30 GMT    600 secs      MACA building AP1
11:22:33:44:55:66  01 Jun 2008 22:23:10 GMT    180 secs      CS building AP3

Anonymization applied per field: MAC (partial and consistent), Start Time (no change), Duration (no change), AP/Location (location anonymization).

b. Sample anonymized trace
MAC            Start Time                 Duration(sec)  AP/Location
00:11:22:0353  01 Jun 2008 21:00:51 GMT   3000 secs      AcadBldg10AP1
11:22:33:0521  01 Jun 2008 21:01:30 GMT     10 secs      AcadBldg2AP2
01:02:03:9877  01 Jun 2008 22:11:00 GMT    200 secs      Library5AP1
10:20:30:3260  01 Jun 2008 22:15:30 GMT    600 secs      AcadBldg22AP1
11:22:33:0521  01 Jun 2008 22:23:10 GMT    180 secs      AcadBldg10AP3
A snapshot from an un-anonymized trace is shown in Table 3-1a. Some traces
may provide more information, such as the username. For the sake of simplicity, we
have considered a basic tuple similar to the one shown in Table 3-1. Using a tuple with
less information does not make breaking anonymity any easier; on the contrary,
compromising anonymity with less information is more difficult.
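For the analysis that follows, the generic tuple of Table 3-1 can be represented programmatically. A sketch; the whitespace-separated line layout in parse_line is a hypothetical serialization, not the exact format of any trace library:

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass
class AssociationRecord:
    """One tuple of a (possibly anonymized) WLAN association trace."""
    mac: str          # device MAC address, possibly partially anonymized
    start: datetime   # session start time
    duration: int     # session duration in seconds
    location: str     # AP/location, e.g. "CS building AP1" or "AcadBldg10AP1"

def parse_line(line):
    """Parse one whitespace-separated line of the assumed form
    '<mac> <ISO-8601 start> <duration> <location>'."""
    mac, start, duration, location = line.split(maxsplit=3)
    return AssociationRecord(mac, datetime.fromisoformat(start),
                             int(duration), location)
```

All subsequent attacks operate on collections of such records, filtering on time, duration, and location.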
3.2 Need For Anonymity
Although the implications of losing privacy in the real world are well known, in this
section we discuss the implications related to the loss of privacy in WLAN traces. As
Table 3-1 shows, the MAC address is one of the fields in the traces. This field is the
link-layer address of the hardware/device used to access the WLAN network. Users
generally do not change their MAC addresses between sessions (perhaps due to a lack
of tools that do it effortlessly, or due to a lack of awareness), and current protocols do
not allow a user to change his MAC address during a session. This implies that the MAC
address becomes a permanent identifier of the machine. Since most of the machines
using wireless are portable, they are less frequently shared by people. The MAC address
thus becomes associated with the person and hence becomes his/her identifier. If we
know the MAC address of a device and its user, then we can search for that user in the
WLAN traces and essentially know the places visited. The MAC address of a device can
be found by various methods, such as sniffing the wireless channel.
Greenstein et al. [34], with the help of case studies, have shown how capturing and
analyzing 802.11 protocol packets can be used to invade user privacy. The cases
we present show threats similar to those in [34]; however, we are
using only the WLAN traces and are not coupling them with actively captured data packets.
In our case, the threats become even more serious because the attacker need not be
present in the same geographic location as the victim (traces are available
on the Internet [7, 41]). Tracking the attacker can also be difficult, due to the fact that
some of the WLAN traces are publicly available with little or no security checks or
logging mechanisms. Below are some cases that show possible attacks on user privacy:
1. One can prove someone's presence at a location by showing the association of his machine with an AP located in that vicinity.
2. If one knows the MAC-to-name mapping of a user, he/she can trace the user by finding the location of the AP with which the user associates. Therefore, he/she can obtain the user's daily activity pattern/schedule (imagine if a thief knows exactly when one is going to be away from the house, or in which time interval nobody is in the office).

Table 3-2. Fields present in each record of a wired trace, essentially an IP header
Version          Header Length        Type of Service
Identification   Flags                Fragment Offset
Time to Live     Protocol             Header Checksum
Source Address   Destination Address  Options
Data
3. By looking at the MAC addresses associated with a particular AP with which a user associates, one can make a guess about the people the user is meeting with. If the MAC-address-to-name mapping is available for all MACs, this would be a trivial task.
4. The information can be used as forensic evidence against the user (or as an alibi).
These scenarios show some of the possible privacy infringements if WLAN traces
are made available without anonymization. Trace providers are aware of these concerns
and therefore anonymize the traces before making them public. In this study, however,
we show that the anonymization techniques used can be compromised and users can
be identified to some extent even after anonymization. The next section provides an
in-depth discussion of the anonymization techniques used for WLAN traces, which will
allow us to better appreciate the attacks as well as the complexities involved in
anonymizing the traces.
3.3 Related Work
Wired network traces have existed for some time, and many libraries have been
created for sharing them [4, 6]. Researchers have developed several anonymization
techniques [63, 69, 81] for wired network traces, and several tools have been developed,
such as Tcpmkpub [69] and Tcpdpriv [62]. We looked into these traces and techniques
to investigate whether they can be applied to WLAN traces. We found, however, that even
though highly sophisticated techniques have been proposed to anonymize wired
traces, they are not completely unbreakable [27]. In addition, there are fundamental
differences between wired and wireless LAN (WLAN) traces, which make it difficult
to apply wired-trace anonymization to WLAN traces. In terms of anonymization goals,
in wired traces the goal is to prevent the discovery of identities of network resources
and the leakage of security policies [69]; anonymization of WLAN traces, however, also
requires protection of the user's identity [34, 68], as the network resources are personal
devices. Wired traces (also called netflow) have fields as shown in Table 3-2, which is
essentially an IP header (IPv4). WLAN traces can have this information along with other
information, as in Table 3-1a, which is generated by the association and disassociation of
devices with the access points (APs). As this feature is unique to WLAN usage, we face
new challenges in anonymization. We can see that a complete WLAN trace (along with
netflow) is a superset of a wired trace (netflow only). In WLANs, IP addresses are
generally assigned using the DHCP protocol, and the subnet varies with the WLAN
access location. This reduces the chances of the same machine getting the same address
on every session, which in wired traces can be considered 100% (assuming static
assignments only). This makes the anonymization of netflow information from WLAN
traces much simpler than for wired traces.
We find that in many studies [25, 35, 36, 39, 49] regarding WLAN traces, researchers
have only used association traces such as those shown in Table 3-1. In fact, most of the
WLAN trace libraries [7, 41] do not have netflow traces as comprehensive as their
association traces. One of the reasons is the greater relevance of association traces
over netflow data in WLANs. Netflow information (as in wired traces) is usually used
to understand the behavior of applications [44, 64], to detect anomalies in the
network [5, 82], for network protocol design, and for network planning [28, 75]. Wireless
traces have been used for network planning [25, 35], understanding user behavior [25, 36, 49],
DTN protocol design [39], and understanding societal interaction with technology [49].
Overall, we see that even though a rich set of techniques is available for wired
traces, their applicability to WLAN traces seems insufficient for the above reasons; for
similar reasons, the attacks on WLAN traces are quite different from the attacks on
wired traces.
Although anonymization is a very important step in releasing WLAN traces, we
could not find any published work that deals with the techniques most suitable for
WLANs. Most of the techniques used have not been thoroughly investigated in the light
of WLAN traces. This will become clearer in the next section, where we talk about the
possible attacks and the drawbacks of the existing methods. The rest of this section
examines the anonymization techniques currently used for WLAN traces.
Current techniques: Anonymization of WLAN traces is done on a field-by-field
basis [41, 46]. Either a field is fully anonymized (mapped to a random number) or only
a portion of the field is anonymized. In traces having multiple sessions per MAC
address, trace providers can either randomize the MAC address to a unique value
for each session, or use the same anonymization mapping of the MAC address for all
sessions (consistent mapping). This step decides the information content and utility of
the traces: a consistent mapping for each MAC throughout the trace provides the ability
to track a user through multiple sessions. The majority of the traces available at
MobiLib [41] and CRAWDAD [7] provide consistent mappings.
Some traces, like the Dartmouth traces [46] at CRAWDAD [7], anonymize the location
field by giving building-level granularity for the AP's location, or by anonymizing the
building name with code names such as AcadBldg10AP3 [46], which signifies an AP
(numbered 3) located in a building used for academic purposes. In this case, all the
buildings are grouped into building classes such as AcadBldg, LibraryBldg, etc. Table 3-1b
shows how WLAN traces look when anonymized with consistent and partial MAC
anonymization and reduced location information. We will attempt to extract private
information from traces that have been anonymized using this technique, as it is
used by many trace providers [46].
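A minimal sketch of this partial, consistent anonymization scheme. The keyed-hash construction and the key are our assumptions; trace providers do not document their exact mapping functions:

```python
import hashlib
import hmac

SECRET_KEY = b"trace-provider-secret"   # hypothetical key held by the provider

def anonymize_mac(mac):
    """Keep the top three octets (vendor OUI) and replace the host octets
    with a truncated keyed hash, so the same real MAC always yields the
    same pseudonym (consistent mapping) without being trivially reversible."""
    oui, host = mac[:8], mac[9:]
    digest = hmac.new(SECRET_KEY, host.encode(), hashlib.sha256).hexdigest()
    return f"{oui}:{digest[:4]}"

def anonymize_location(building_class, ap_number):
    """Replace the building name with its class-level code name, keeping
    the AP number, e.g. ('AcadBldg10', 3) -> 'AcadBldg10AP3'."""
    return f"{building_class}AP{ap_number}"
```

Note that the consistency property, which is what preserves the traces' research utility, is exactly what the attacks in the next section exploit.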
Figure 3-1. Attacker capabilities
3.4 Attack Scenarios
In this section, we present techniques by which user privacy can theoretically be
compromised. Fig. 3-1 shows attacker capabilities in terms of the information related to
the trace collection environment he can access. The attacker is assumed to have access
to the anonymized traces in all scenarios. In this work we are, however, not dealing with
all the possible scenarios, as our aim here is to bring forth the shortcomings of the current
anonymization, which can be achieved even if we can break the anonymization for one
case. We consider two possible attack scenarios: one where the attacker can inject
data into the traces by accessing the WLAN network (Sec. 3.4.1, 3.4.2 and 3.4.3), and
a second where the attacker has physical access to the campus but cannot access the
WLAN network (Sec. 3.4.4). If we can identify the anonymized MAC address in the traces
for any user, we will consider that anonymity has been compromised. This is justified
since the main purpose of anonymization is to prevent user identification. Using this
definition of compromise, we will show how an attacker can identify his own anonymized
MAC address and then how he can identify any other user's MAC address.
3.4.1 Identify Your Own MAC In Trace
Using the definition of anonymity compromise, even if an attacker can only identify
his own MAC address, it should be considered a failure of the anonymization technique.
Although this is not a serious breach of privacy per se, an attacker can use this
information to find out building codes and identify the MAC addresses of other users.
The steps for obtaining one's own MAC address are as follows:
1. Go to a WLAN-covered area on the campus, at a time when it is not frequently visited and WLAN usage is at a minimum (find this pattern from the previous traces).
2. Associate with an AP belonging to the campus network, and mark the start time and end time.
3. If there are some people around the area, move to a new location which is at least 100 ft away (beyond the range of the previous wireless AP) and repeat Step 2.
4. Now go back to study the traces and find all the (anonymized) MAC addresses which log in and log out at the same times at the two locations visited.
5. If there are several such MAC addresses, repeat this experiment from Steps 1 to 4 and take an intersection of the MAC sets. In the end, there should be only one MAC address left after the intersection.
This will yield one's own MAC address mapping in the traces. In Sec. 3.5, we
mathematically show that even in a large environment (over 500 APs), at most 5 iterations
of Steps 1 to 4 are enough to identify one's own MAC address.
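The intersection in Step 5 is straightforward to implement. A sketch, where the record layout and the 60-second matching slack are assumptions rather than parameters from the traces themselves:

```python
def candidate_macs(trace, observations, slack=60):
    """Anonymized MACs whose sessions match every noted visit.

    `trace` is a list of (anon_mac, location, start_ts, end_ts) records;
    `observations` is the attacker's own notebook, as (location, start_ts,
    end_ts) tuples. Each extra observation intersects away more candidates,
    ideally leaving only the attacker's own pseudonym.
    """
    candidates = None
    for loc, start, end in observations:
        matching = {mac for mac, l, s, e in trace
                    if l == loc and abs(s - start) <= slack
                    and abs(e - end) <= slack}
        candidates = matching if candidates is None else candidates & matching
    return candidates or set()
```

Because the candidate set can only shrink with each observation, visiting a second, sparsely used location (Step 3) is usually enough to disambiguate.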
3.4.2 Identifying Building Codes
Identifying the building codes is useful for finding users at a particular location. An
attacker who knows his anonymized MAC address can visit all the buildings of interest
on the campus and mark his login and logout time at each building. While looking back
at the trace, he can reverse-map all the anonymized building codes to the actual building
names by correlating the timings in his notes with the trace.
3.4.3 Identifying A Person
Once we have the building codes, one can target a specific person, follow him,
and mark his device's start or end times (observing the opening and closing of the
laptop lid). Filtering the traces with this approximate timing information and the building
information, one should not get many sessions. If one does, one can repeat this process
and zero in on a single MAC address belonging to the target (publicly available schedules
and status messages on social networking websites can also be used to find approximate
login and logout timings).
To discover the mapping of a large number of MAC addresses to their real MAC addresses, one can sniff all the wireless traffic at a location (AP) whose trace mapping is already known, parsing the captured data for messages which clearly show that a machine is trying to associate with the AP [68]. In this case, we have the precise time of the user's log-in as well as the MAC address and location, so identifying his anonymized MAC should be trivial. Once we know the mapping to the real MAC address in the traces, we can track that person anywhere on the campus.
Using the above methods, in theory, an attacker can track any person throughout the campus, causing a breach of privacy. This exposes a serious shortcoming of the prevalent anonymization methods: a privacy attack is possible without much effort. If one does not have access to the campus Wi-Fi, one can ask a friend, or use one's social skills to persuade a complete stranger to help. We also observe that even if the trace providers do not release traces on a daily basis, a careful planner can conduct several such experiments, wait for the trace provider to release the trace, and then perform the attack.
3.4.4 Multiple Filtering
In the above-described methods, the attacker must be able to inject data into the trace-collection system (i.e., must be authorized to access the WLAN). We now consider an attacker with no ability to access (and inject data into) the WLAN.
He is limited to physical access to the trace-collection environment. Researchers have attempted to classify WLAN users based on their gender [49]. We extend this idea further by grouping users into categories such as gender, login time, building, and device manufacturer, and then attempt to identify users who appear under multiple categories (i.e., find the intersection). In each individual category the group size is large (∼100); however, when we intersect the groups, the size drops rapidly. For example, filtering for a female student going to the Law building in the morning with an Apple computer resulted in a single user. This finding has privacy implications: just by watching a female student walking to the law school building with an Apple device in hand, an attacker could find the anonymized MAC ID of the student in the traces. Once that is accomplished, the attacker can trace the student's movement throughout the campus, a serious breach of privacy. We conducted an analysis to examine how many users can be identified using a filter combining gender, study major, and network card manufacturer (on a Feb 2006 trace downloaded from MobiLib [41]). We found that of 111 different filters (formed by different combinations of gender, study major, and manufacturer), 35% resulted in a single user and 60% of the cases had fewer than 3 users (Fig. 3-2). We repeated the analysis for three different trace periods (Feb 2006, Oct 2006, Feb 2007) and found similar results. We also used different filters, such as gender-major-time, and again obtained similar results. This method exposes a major flaw in the anonymization technique.
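The multiple-filtering idea can be sketched as a set intersection over annotated trace users. The records and attribute values below are hypothetical stand-ins for the annotations derived from the real traces:

```python
# Sketch of multiple filtering: each attribute alone leaves a large
# anonymity set, but intersecting categories often isolates one user.
# User records and attribute values are hypothetical stand-ins.

users = [
    {"mac": "anonA", "gender": "F", "building": "Law", "vendor": "Apple"},
    {"mac": "anonB", "gender": "F", "building": "Law", "vendor": "Dell"},
    {"mac": "anonC", "gender": "M", "building": "Law", "vendor": "Apple"},
    {"mac": "anonD", "gender": "F", "building": "CS",  "vendor": "Apple"},
]

def apply_filter(users, **criteria):
    """Anonymized MACs of users matching every given attribute."""
    return {u["mac"] for u in users
            if all(u[k] == v for k, v in criteria.items())}

print(apply_filter(users, gender="F"))                 # three candidates
print(apply_filter(users, gender="F", building="Law",
                   vendor="Apple"))                    # a single user
```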
3.5 Analysis and Mitigation
The attacks described in the previous section were feasible because the attacker could identify unique WLAN usage in the traces. The attacker could identify the MAC address of his machine by creating usage patterns that were unique in that trace-collection environment. Patterns form because MAC addresses are consistently anonymized. Therefore, considering all the sessions made by a device (identified by
Figure 3-2. Percentage of the number of users found when 111 filters based on gender+major+manufacturer are applied
MAC address), one can identify individual usage sequences from fields in the trace such as location, start time, and duration. For example, a user who starts using the WLAN every day around 9 am creates a pattern with respect to start time. This pattern may not be unique, as there may be several users starting WLAN usage around 9 am; however, one can reduce the search space, or even make the pattern unique, by combining location and duration patterns with the start time. Consider employees working in the same office space with the same office hours and workload. They would have similar start-time, location, and duration patterns. However, if the office and the residences share a common WLAN service (say, a city-wide Wi-Fi, or students living on campus), the location, start time, and duration of WLAN usage at the residences would differ for all the users (unless every employee has the same residence and follows a similar lifestyle!). The argument here is that users can have sufficiently unique usage, which can be used to identify them even though the traces are anonymized. In the next two subsections we present our reasoning in support of this argument, with both a theoretical and a practical analysis on real WLAN traces.
Figure 3-3. UL at n = 5, plotted against the number of Access Points (a) and the percentage of APs a user visits (pu); the vertical axis shows the number of unique usage patterns.
3.5.1 Theoretical Analysis
Mathematically, it can be shown that each field in the trace can create an enormous number of patterns. For the sake of simplicity, we consider only the patterns generated by the location field, because similar equations apply to the other fields. Let UL be the number of unique usage patterns possible using the location field only:

UL(a, pu, n) = C(a, a·pu) · (a·pu)^n

where a is the total number of Access Points/locations, pu is the percentage of the total Access Points/locations a user visits, n is the number of sessions, and C(a, k) denotes the number of ways to choose k of the a Access Points. Fig. 3-3 shows the distribution of UL: it is the product of the number of ways a·pu Access Points can be selected out of the total a Access Points, C(a, a·pu), and the number of ways a·pu Access Points can appear in n sessions, (a·pu)^n.
As an example, consider a university campus with hundreds of buildings, say the University of Florida (UFL), which has over 500 wireless access points, so the location field can take 500 different values. It has been shown that users generally use less than 5% of the Access Points 90% of the time [17, 36]. Therefore, in our case (a = 500 APs), we assume each person uses only 5% (= pu) of them (a·pu = 25). Because in a pattern not only the locations visited but also the order of visiting
Table 3-3. Result of finding users with similar location visiting sequences with varying duration of the trace

                        5 Nov   5-11 Nov  5-18 Nov  Nov     Aug-Dec  Aug 2007-
                        2007    2007      2007      2007    2007     Jul 2008
Total users             9844    17602     22333     27068   47766    52217
100% match score
  users                 4288    4847      4969      4461    4288     4880
  > 1 session           1477    1872      2061      1928    1840     2186
  > 5 sessions          31      121       108       131     187      235
90% match score
  users                 4291    4494      5300      4879    4743     5486
  > 1 session           1480    2018      2391      2345    2294     2791
  > 5 sessions          34      268       439       548     642      839
80% match score
  users                 4473    6068      6924      6872    7484     8954
  > 1 session           1662    3092      4015      4339    5036     6260
  > 5 sessions          113     1085      1777      2272    3057     3930
a location is important, we can see that the total number of combinations of APs people can choose from is C(500, 25) ≈ 10^42. Assuming that the traces contain only 5 sessions per user (n = 5), the total number of paths possible for a user using 25 APs is 25^5 = 9765625 ≈ 10^7. Therefore, the total number of unique location patterns possible is UL ≈ 10^42 × 10^7 = 10^49. The total number of students at UFL is ∼5 × 10^4, so the theoretical number of unique location patterns per user is ≈ 2 × 10^44. Even though this is a very loose upper bound and in reality the number is smaller, it shows the enormous number of possible unique patterns that can be generated using just one field (location). This implies that theoretically every user can form a unique pattern in a short time, which can be used to identify him. It further implies that sanitization techniques cannot work well if only the fields are anonymized; one should aim to anonymize the patterns. One way is to use inconsistent MAC anonymization, which is extremely detrimental to the utility of the traces, the very reason traces are shared. A fundamental question about the relationship between utility and anonymization/privacy is evident here, which we plan to discuss in future work.
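The order-of-magnitude estimates can be checked directly; a minimal sketch under the same assumptions (a = 500 APs, pu = 5%, n = 5 sessions):

```python
# Compute the exact values behind the U_L estimate:
# U_L(a, pu, n) = C(a, a*pu) * (a*pu)^n
import math

a, pu, n = 500, 0.05, 5
visited = round(a * pu)                    # a*pu = 25 APs per user

ap_choices = math.comb(a, visited)         # C(500, 25), on the order of 10^42
orderings = visited ** n                   # 25^5 = 9765625
UL = ap_choices * orderings                # unique location patterns

students = 5 * 10**4                       # ~50K users at UFL
print(f"C(500,25) ~ 10^{len(str(ap_choices)) - 1}")
print(f"U_L ~ 10^{len(str(UL)) - 1}, "
      f"~10^{len(str(UL // students)) - 1} patterns per user")
```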
3.5.2 Practical/Trace Analysis
To check the validity of the theoretical limits discussed above, we performed an experiment on WLAN traces from UFL covering a period of one year. Tab. 3-3 presents the findings for users having the same location-visiting sequence. We calculate and distinguish users based on the location field using the Longest Common Subsequence algorithm [26]. We
Figure 3-4. Results of the combination generation and sequence matching for 230 randomly chosen users out of the 27K users belonging to the month of Nov 2007. This graph shows Pi and ni for each user.
find the number of users whose location-visiting pattern matches at least one other user's, considering several time periods (1 day to 1 year) and listing the total number of WLAN users in each period. Tab. 3-3 also shows the number of users who had more than 1 and more than 5 sessions. The results support the insight behind the theoretical limits. We notice that over a period of one year, only 4880 users out of ∼52K (9%) had a location-visiting sequence matching one or more other users, if we consider a 100% match. This means that almost 91% of users have a distinct location-visiting sequence, and an attacker following a user can later identify him/her in the traces with probability greater than 0.9. Another result that further supports this statement is that only 235 users (0.45%) who share a location-visiting sequence with other users have more than 5 sessions (in the case of a 100% match score). This further strengthens the theoretical limit discussed earlier (interestingly, we found that most of these users had logged in to the same access point throughout their multiple sessions).
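The match scores in Tab. 3-3 can be read as an LCS-based similarity between two users' location-visit sequences. A minimal sketch follows; normalizing by the shorter sequence is an assumption, and the exact scoring used on the UFL traces may differ:

```python
# Compare two location-visit sequences with the Longest Common
# Subsequence (LCS); the match score is the LCS length relative to the
# shorter sequence (an assumed normalization for illustration).

def lcs_length(a, b):
    """Classic O(len(a)*len(b)) dynamic-programming LCS length."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, 1):
        for j, y in enumerate(b, 1):
            dp[i][j] = dp[i-1][j-1] + 1 if x == y else max(dp[i-1][j],
                                                           dp[i][j-1])
    return dp[len(a)][len(b)]

def match_score(seq1, seq2):
    return lcs_length(seq1, seq2) / min(len(seq1), len(seq2))

u1 = ["AP1", "AP7", "AP3", "AP7", "AP9"]
u2 = ["AP1", "AP3", "AP7", "AP9"]
print(match_score(u1, u2))   # 1.0: u2's visits appear, in order, within u1
```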
We also attempted to identify the source of these sequences, which become unique in a short time span. We note that not only can each field be used to form unique sequences, but several fields may be combined to form them. We generated various sequences using several combinations of the location field for a user, maintaining the temporal ordering in the combinations. This helps us identify how much information an attacker may obtain about a user even if the attacker follows him for only a few sessions, in which case the attacker would find information holes in the observed sequence: for example, he may be able to observe only the 2nd, 3rd, 6th, 8th and 10th sessions of a user. We investigated 230 randomly selected users from a set of 27K users appearing in the Nov 2007 WLAN traces from UFL. For each user, we created all possible combinations of sequences of length 5 using the location field, maintaining temporal order (earlier we saw that users with more than 5 sessions have a higher chance of being unique). Each combination represents a possible sequence an attacker may capture by following a user, assuming the attacker may not capture all of the user's sessions; this simulates loss while capturing user information. We then searched for these sequences in the traces belonging to all 27K users. Let Pi be the percentage of matches for user i, defined as Pi = Mi / C(ni, 5), where C(ni, 5) is the total number of length-5 sequence combinations possible for user i, ni is the number of sessions of user i, and Mi is the number of those sequences found in the traces of the 27K users. Fig. 3-4 shows the results of this experiment. We find that 78 of the 230 users had fewer than 5 sessions in the whole month and were discarded. For the rest, we plot Pi in descending order along with ni. One interesting result is that even when the total number of combinations generated is very high (e.g., ni = 100, C(100, 5) = 75287520 ≈ 7.5 × 10^7), the number of matches remains very low (e.g., Mi = 81). This indicates that if the location information of 5 sessions of a user is available in temporal order, even with much location information missing intermittently, there is still a very high chance of identifying the user in the trace.
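The Pi computation can be sketched as follows. The sequences are hypothetical; itertools.combinations preserves the temporal order of sessions, matching the construction above:

```python
# Sketch of P_i: enumerate all temporally ordered length-5 location
# subsequences of user i (C(n_i, 5) of them) and count how many also
# occur, in order, in some user's full session sequence in the trace.
from itertools import combinations

def occurs_in_order(sub, seq):
    """True if `sub` is a (possibly non-contiguous) subsequence of `seq`."""
    it = iter(seq)
    return all(loc in it for loc in sub)   # `in` consumes the iterator

def p_i(user_seq, all_seqs, k=5):
    """Fraction of the C(n_i, k) subsequences matched; needs n_i >= k."""
    subs = list(combinations(user_seq, k))
    matches = sum(any(occurs_in_order(s, seq) for seq in all_seqs)
                  for s in subs)
    return matches / len(subs)

target = ["A", "B", "C", "D", "E", "F"]    # n_i = 6 observed sessions
others = [["A", "B", "C", "X", "D", "E"]]  # sequences of other users
print(p_i(target, others))                 # 1 of C(6,5)=6 subsequences matches
```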
Based on the analysis we conducted, there are two ways of mitigating the attacks discussed in the previous section. One is to manipulate the traces in such a
manner that no one can identify unique patterns; the other is to prevent the linking of usage patterns to users. These two abstract ideas can be applied to the traces independently of each other. If one can identify a usage pattern but cannot assign it to a specific user, one can never be sure of identifying the correct user or the correct pattern of the user. On the other hand, if we can prevent the linking of usage patterns to users, then no matter how many unique usage patterns one identifies, one cannot link them back to a user. Each method should individually provide sufficient privacy for the users. For the first method, many techniques exist in the literature, such as k-anonymity [76] and l-diversity [57]. For the second, we need to devise techniques that can obscure the linking information.
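As an illustration of the first direction, a trace release could be checked so that every quasi-identifier pattern is shared by at least K users, suppressing the rest. This is only a minimal sketch of the k-anonymity idea, not the technique of [76] verbatim:

```python
# Minimal k-anonymity-style check for trace release: keep a session only
# if its combination of quasi-identifying fields is shared by at least
# k distinct users; otherwise suppress it.
from collections import defaultdict

def k_anonymize(sessions, quasi_ids, k=5):
    groups = defaultdict(list)
    for s in sessions:
        groups[tuple(s[f] for f in quasi_ids)].append(s)
    released = []
    for group in groups.values():
        if len({s["user"] for s in group}) >= k:   # k users share pattern
            released.extend(group)
    return released

sessions = [{"user": u, "location": "LIB", "hour": 9} for u in range(5)]
sessions.append({"user": 99, "location": "LAW", "hour": 7})  # unique pattern
out = k_anonymize(sessions, ["location", "hour"], k=5)
print(len(out))   # 5: the uniquely identifying LAW session is suppressed
```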
3.6 Conclusions and Future Work
We have uncovered a serious problem in the way WLAN traces are anonymized. We believe this kind of attack is possible because WLAN traces have human behavior patterns embedded in them, which can easily be observed by an attacker following the victim. The aim of any privacy-protecting technique should be to ensure that even if an attacker has access to all the publicly available information about a user or a group of users (but not the mapping between anonymized MACs and real MACs), he cannot reduce the sample size below a number, say K. This K should be a parameter configurable by the trace-releasing authority.
In the future, we plan to work on the feasibility of anonymizing with techniques such as perturbation, and on releasing traces in multiple formats, e.g., with no location or time information. We would also like to investigate in further detail how fields such as start time, duration, and location are responsible for generating unique patterns; this may be due to atomic properties of these fields such as periodicity and history. We would like to build a system that generates anonymized traces according to the security clearance of the requesting user; this would allow us to serve traces with varying anonymization and privacy criteria and would make the traces more usable. We also plan to investigate whether the k-anonymity model [76] can be applied to WLAN traces.
The findings in this work certainly call for new research in the area of WLAN trace anonymization and privacy, the details of which we will pursue in future work.
CHAPTER 4
AN ENCOUNTER-BASED FRAMEWORK FOR TRUST
The success of future mobile applications hinges on their wide adoption and acceptance by mobile users through increased interaction and cooperation. These factors become particularly crucial for emerging classes of mobile networks that involve peer-to-peer networking, such as mobile ad hoc networks (MANETs), sensor networks, and delay-tolerant networks (DTNs). This study introduces and investigates a new mobile application that aims to improve interaction and cooperation by leveraging social connections and by gaining confidence and trust in new opportunistic encounters.
The establishment of trustworthy networking is of prime importance, since most
interactions rely on trust establishment. This challenging problem is further exacerbated
by the uncertainty and dynamics in mobile networks. Furthermore, in MANETs and
DTNs cooperation and trustworthy networking are imperative to the construction and
operation of the network, without which these networks would fail.
Several factors pose great challenges to the practical and effective study and establishment of trust and confidence. First, conventional reputation- and credit-based systems rely on prior interaction to score trust. In the absence of such prior interactions (due to the introduction of a new technology or a psychological barrier), such systems are not effective. We refer to this problem as the trust bootstrap problem, and its solution is essential for jumpstarting trustworthy operation. Second, the utility of a trust system is difficult to validate against ground truth. Trust is a social trait; it is subjective and contextual. Only through deployment and testing can the efficacy of such a system be evaluated. Third, attacks to gain unwarranted trust are harder to detect due to mobility, resource-constrained devices, and lack of infrastructure. A secure trust system should be stable and resilient against attacks.
At the same time, several unique characteristics of mobile networks provide
new opportunities to tackle the above challenges. The use of short range radios
67
(e.g., Bluetooth, Wi-Fi) enables detection and utilization of proximity and encounters.
Encounters represent an interesting primitive that can be used to construct abstractions
for reasoning probabilistically about trust, and for establishing encounter-based
keys [24, 55] that can seed future secure communications. In addition, the increased
capabilities of mobile devices, in terms of computation, storage, communication and
sensing, can add important contextual information to encounters, such as locations,
events, and statistical history. Processing such information could augment the user's network view and awareness, helping to score the trustworthiness of other nodes and to establish encounter keys or challenges (through out-of-band face-to-face exchanges). Furthermore, the tight coupling between users and mobile devices enables new and accurate ways to establish behavioral profiles that can be used to fine-tune the trust processing, e.g., by adding more weight to trusted locations. It is the fusion and integration of these multi-dimensional data that holds the promise of establishing trustworthy opportunistic networking in ways that were not possible before, and that are not possible in wired networks, which lack the notion of physical proximity.
This study introduces a systematic framework and new protocol for gathering and
processing the above information to gain confidence and trust1. Our protocol is fully distributed, self-bootstrapping, and integrates attack-resilience mechanisms. The core of our method utilizes a trust-adviser algorithm that employs a set of parameterized trust filters. The trust filters analyze mobile encounters, proximity, location, and context data in novel ways to augment the user's network view and awareness. Their goal is to identify opportunities for trust (or attack prevention) based on weighted filter scores, which are coupled with the user's input and encounter keys to build a trustworthy node list.
1 We shall use the term 'trust' to indicate confidence and opportunities to exchange encounter keys in mobile networks.
Focus is given to the investigation of the relationship (or lack thereof) between behavior
similarity (i.e., network homophily) and trustworthy mobile networking.
Effective establishment of trustworthy networked mobile communities can enable several potential applications, including mobile social networking, the formation of interest communities and support groups (in health care and education), localized response and emergency notification, context-aware and similarity-based networking [8, 39], and worm vaccination [77].
Our protocol’s mechanistic design and implementation strive to achieve the following
main design goals: stability, scalability, efficiency, distributed operation, and resilience. In
addition, careful thought is given to utility, accuracy and simplicity of the application.
Evaluation of the proposed trust adviser filters and app is a three-phase process: i- statistical analysis of real-world mobile network traces, ii- extensive trace-driven simulation of the framework components, and iii- prototype implementation and participatory testing on smartphones. First, we use wireless network traces from 3 major university campuses, spanning 9 months with over 70K users and 150 million encounters. We find that several filters possess desirable stability characteristics, and that trust scores in general form a small world. Resilience to attacks (using anomaly detection) achieves less than 10% false positives and 7% false negatives. Second, we measure the effectiveness of ConnectEnc on epidemic routing in DTNs with selfishness, using the new trust routing engine, and obtain stable trust routing without sacrificing network performance. Third, we conduct a series of surveys and participatory experiments to evaluate the performance of ConnectEnc against ground truth. We find that users' willingness to trust others in a mobile network has a statistically strong correlation with their behavioral similarity. Further, the ConnectEnc filters can capture 80% of the already known users within the top 25% of encountered users.
Key contributions of this work include: 1. introducing a framework to augment the mobile user's perception and awareness of the network neighborhood by fusing
multi-dimensional encounter and contextual data, 2. analyzing various trust adviser filters with extensive network traces, 3. proposing a model for anomaly or attacker detection, 4. developing a mobile app 'ConnectEnc' that integrates the filters and contextual information to aid user trust classification, and 5. deploying ConnectEnc as a proof of concept and evaluating the system against ground truth via participatory testing.
4.1 Related Work
Several researchers have proposed novel approaches to establishing trust and cooperation in ad hoc networks and DTNs using credit- and reputation-based schemes, incentive-based schemes, and game theory.
The reputation-based schemes target better peer selection by rating the trust and cooperation of nodes in a mobile ad hoc network based on records of previous interactions and transfers. In [20], a node detects misbehavior locally by observation and use of second-hand information. In [19], a fully distributed reputation system is proposed that can cope with false information, where each node maintains a reputation rating and a trust rating for other nodes. In [14, 29, 70], analyses of reward provisions and punishment are conducted based on game-theoretic approaches to provide incentives for message delivery. In [13], the authors derive performance and optimization statistics to measure the success in delivery probability for a message, covering both cooperative and non-cooperative scenarios. The study in [67] analyzes the effect of cooperation on three different routing algorithms: the authors investigate the performance of epidemic, two-hop relaying, and binary spray-and-wait routing to model a node's probability of either dropping or forwarding a message. The incentive-based credit schemes rank trust for neighboring nodes. In [22], the authors propose a game-theoretic model to discourage selfish behavior and stimulate cooperation by leveraging Nash equilibria with socially optimal behavior. In [84], the authors propose a pricing mechanism to give credits
to nodes that participate in the message-forwarding mechanism. Cooperation is developed based on the number of messages transferred by the users.
A common theme in these works is the reliance on device interaction to evolve the trust scores. Inherently, this creates an undesirable circular dependence: interaction requires technology adoption (say, of ad hoc networks or DTNs), which in turn requires trust. Hence, there is a compelling need for a bootstrap mechanism for trust, which we address directly in our design. Furthermore, other studies do not utilize encounter context, which we focus on in this work. Our work contributes towards solving this challenge by providing inputs from a user's location preferences and contextual (e.g., social) behavior. It then uses the trust established via iTrust to establish further trustworthy communication in various types of mobile networks, including, but not limited to, ad hoc networks and DTNs.
Message delivery mechanisms in ad hoc, sensor, and delay-tolerant networks necessarily require node cooperation. In reality, however, some nodes may not cooperate due to selfishness or lack of trust, and such a lack of cooperation may largely disconnect or partition the network. Selfish nodes (or free riders) [61] could exploit network services but refuse to forward messages. An analytical model that builds the concept of trust is discussed in [42, 58]; the authors show that trust supports cooperation and is heavily based on the interactions and bonds that govern behavior in ad hoc and opportunistic scenarios. Other approaches, discussed in [24, 55, 59], propose explicit authentication mechanisms to generate trust and cooperation in the network. These approaches are better suited to small groups [55] and require the exchange of public keys and the installation of a private key on the user's device [24]. We shall borrow from these works for the establishment of opportunistic encounter keys in our trust framework.
A few studies [53, 56] have attempted to use encounter information for routing in DTNs. These protocols contribute towards improved prediction and routing in DTNs; however, the relation between encounters and trust has not been investigated against ground truth. The focus of this work is to establish the relationship between encounter statistics, stability, location, and context on the one hand and trust on the other, through thorough systematic analysis as well as surveys and experimentation. Ours is the first work we are aware of to contribute to this area of research.
4.2 Architectural overview
In this section, we describe the design goals and major components of iTrust and their functionality. We begin with the design goals, then present a high-level diagram of the design, and then describe each of the modules in the following sections.
4.2.1 Design Goals
The main design goals for the iTrust protocol include:
1. Accuracy - The recommendations should be as close to the user's perception as possible. We achieve this by utilizing state-of-the-art trust advisers and adapting recommendations based on the user's usage of the protocol.
2. Robustness - The trust recommendations should be stable over time and insensitive to minor, temporary changes and noise in user behavior. Outliers and anomalies should be detected and removed.
3. Energy Efficiency - Mobile devices are energy constrained. iTrust should strive to minimize its use of device resources in terms of computation, storage, and communication.
4. Distributed Operation - iTrust should be able to provide all its functionality in a distributed fashion, without the need for a centralized infrastructure or trusted third party.
5. Privacy Preservation - Use of the protocol should not affect the privacy of the user. All operations should be performed locally on the user's device. Information about the user, if any, should be sent off the device only on the user's command.
6. Resilience - The system should function properly in the face of intrusion attacks and selfishness. We propose an anomaly detection technique to avoid intrusion attacks. Selfishness, or lack of cooperation, is investigated and analyzed, especially in the context of ad hoc and delay-tolerant networking.
Figure 4-1. Block diagram overview of the iTrust architecture. Dotted lines indicate modules needed by iTrust. Shaded blocks indicate modules discussed in this work.
Other goals include: the ability of the protocol to augment and integrate with other reputation- and credit-based trust systems, the capability to bootstrap trust (without requiring device cooperation), and the flexibility to utilize other user preferences and information (through external sources and social networks) in the future.
4.2.2 Overall Design
Fig. 4-1 provides an architectural overview of the iTrust framework and its interconnections with related subsystems. The main components of the iTrust engine (shaded blocks) are: a. trust adviser filters, b. trust recommendation generator, c. weight generator, and d. anomaly detector. Other modules (inside the dotted line) needed by iTrust include: a. radio scanner and b. locator.
The ‘Trust Adviser Filter’ block generates trust scores using a family of filters (described in the next section). The different trust lists (produced by the different filters) are fed into the ‘Trust Recommendation Generation’ module. This block combines all the trust filter results with input from the anomaly detection, recommendation system, reputation system, and black and white lists, using the weights generated by the ‘Weight Generator’. The ‘Weight Generator’ uses built-in weight scores and adapts itself using the selections made by the user. The ‘Anomaly Detection’ module provides recommendations regarding suspicious encounter activities, and can also take the user's input if needed. The ‘Short Range Radio Scanning’ module provides basic encounter information. Similarly, the ‘Location Information’ module provides the device's positioning data to the ‘Trust Adviser Filters’. Other modules, such as ‘Reputation’ and ‘Recommendation’, provide extra functionality and would be based on already existing techniques.
With this conceptual understanding of the system, we now describe each of the modules shown in Fig. 4-1.
4.3 Trust Adviser Filters
The trust adviser filters constitute the heart of iTrust. Their function is to provide meaningful, stable trust scores for encountered devices.
The primary motivations of our work are to: a. encourage interaction in mobile societies and the adoption of new mobile services (e.g., mobile social networks), and b. establish network connectivity in the context of ad hoc networks and DTNs. Trust can inspire cooperation in networks, particularly infrastructure-less ones. Here, trust means that a user: 1. is willing to interact through the network with trusted nodes, and 2. in DTNs, is ready to accept a message for a trusted user and genuinely attempt to route it. To develop trust between a pair of users, we leverage the proximity of mobile users (when their devices come within radio range) together with encounter, location, and context information. Several properties of nodal encounter behavior have been investigated in [40].
Our primary reasons for choosing encounters and proximity as measures for generating trust are inspired by the work on homophily [60]. The principle of homophily suggests a strong correlation between similarity of interest and the frequency of meeting and interaction. Trusting frequently encountered users would thus mean trusting similar people (e.g., work colleagues or classmates). This trust can have social incentives too.
Second, when users are within radio range of each other (∼15 m for Bluetooth), they can potentially exchange out-of-band information, including identity information and cryptographic keys [24]. Such proximity-based out-of-band information exchange is not possible in wired networks (inherently relational graphs, as two terminals may be geographically far apart) but can be utilized in mobile networks (inherently spatial graphs).
The challenge is to find methods that can successfully discover potential similarities between users. We refer to these methods as Trust Adviser Filters. In the implementation, a user decides which users to trust, and the filters serve as an adviser; thus, users have full control over the selection of trusted users. The filters act as a scoring system that recommends the users who are most similar to the user. We have classified the filters into two major categories (aggregation based and behavior based) according to the similarity they measure. A third category (hybrid filters) combines the results from the two main groups of filters to produce a trust score.
4.3.1 Aggregation Based Similarity
These filters aggregate the encounter data using statistical methods and provide a
measure of encounter-based similarity. We present two such filters based on frequency
and duration of encounters.
4.3.1.1 Frequency of Encounters (FE)
One of the basic filters estimates the similarity of a user pair by the number of
times they encounter each other (an encounter is the event in which one device is within
radio range of another, close enough to allow device discovery). This filter assumes that
the more often two devices meet, the more similar (and thus more trustworthy) their users
are. On this assumption (which may not always hold), we design the FE filter, which
counts the encounters of the user with every other user. To obtain the trust list from the
FE filter, we sort all encountered users by their number of encounters and select the top
users according to the trust (T) value.
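As a concrete illustration, the counting and top-T% selection described above can be sketched in a few lines of Python. This is our own illustrative sketch, not the dissertation's implementation; the function name and the flat list of encountered peer IDs are assumptions.

```python
from collections import Counter

def fe_trust_list(encounters, t_percent):
    """FE filter sketch: rank encountered users by encounter count
    and keep the top T% of them as the trusted list."""
    counts = Counter(encounters)  # peer id -> number of encounters
    ranked = [uid for uid, _ in counts.most_common()]
    top_n = max(1, int(len(ranked) * t_percent / 100))
    return ranked[:top_n]

# 'b' is encountered most often, so it tops the ranking.
trusted = fe_trust_list(['a', 'b', 'b', 'c', 'b', 'a'], t_percent=50)
```

The DE filter below follows the same shape, with per-pair durations summed in place of counts.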
4.3.1.2 Duration of Encounters (DE)
The percentage of time spent by one user with another is another measure of
similarity: the more time users spend together, the more similar (and trustworthy)
they are likely to be. On this basis we design the DE filter, which tracks the duration
of time spent by the user with every other user. From the resulting ordered list, the
DE filter selects the top trusted users according to the T value.
4.3.2 Behavior Based Similarity
Behavior-based filters measure similarity through location visitations and
preferences. We couple location information with encounters to determine the similarity
between users.
4.3.2.1 Profile Vector (PV):
To capture behavioral characteristics, we design the PV filter, which stores a user's
location visitations in a one-dimensional vector. The filter assumes that the device has
some localization capability, which is common in today's devices. Each device maintains
a vector whose columns represent the different locations visited by the user; the value
stored in each cell is either the duration or the count of the sessions at that location.
The vector is updated on each location visit.
To obtain a similarity score, this vector is exchanged with the other user and the
inner product of the two vectors is computed. The score is higher when the two PVs are
similar, and zero when the users have no visited location in common. Locations are
implicitly weighted by the count or duration spent there; we can also let the user assign
explicit weights to locations.
However, this filter is not privacy preserving and opens the system to attacks, since
a user can tamper with its own vector; there are also communication costs in exchanging
the vectors. This problem is solved by the LV filter, at the cost of having less
information with which to compute similarity scores.

Figure 4-2. Location Vector LV for a user
4.3.2.2 Location Vector (LV):
The LV filter is very similar to PV, except that a user maintains a vector not only for
itself but also for each encountered user. The columns of the vectors represent the
different locations visited by a user, and the values stored in each cell indicate either
the duration (LV-D) or the count (LV-C) of the sessions at that location. For every
encounter, the vector for the encountering node is updated with respect to the encounter
location (see the illustration in Fig. 4-2).

Since the vectors for all encountered users are maintained locally on the device,
LV requires no exchange of vectors among users to calculate similarity. This is more
privacy preserving and more resilient to attacks, since only first-hand information is
used (equivalent to what the user might have observed directly). The privacy comes at the
cost of extra storage for a vector per encountered user; considerable storage
optimization is achieved by storing, for each encountered user, only the locations where
encounters happened. Similarity is computed as in PV.
4.3.2.3 Behavior Matrix (BM)
The behavior matrix captures a spatio-temporal representation of user behavior.
Columns of the matrix denote locations and rows represent time units (here a day, for
simplicity). Each cell stores the fraction of the online time spent by the user at a
particular location on a particular day (see Fig. 4-3). Each user maintains their own
matrix. To obtain a similarity score, users can exchange and compare the two matrices.
To make the behavior similarity check efficient (in space and computation) and
privacy preserving (only a summary of the matrix is exchanged), the two users exchange
the eigen-behavior summary of the behavior matrix, generated using Singular Value
Decomposition (SVD). SVD is applied to a behavior matrix M such that

M = U · Σ · V^T,    (4-1)

where a set of eigen-behavior vectors v_1, v_2, ..., v_rank(M), summarizing the important
trends in the original matrix M, is obtained from matrix V, with corresponding weights
w_{v_1}, w_{v_2}, ..., w_{v_rank(M)} calculated from the singular values in Σ. This set of
vectors is referred to as the behavioral profile of the particular user, denoted BP(M),
as it summarizes the important trends in the user's behavioral pattern. The behavioral
similarity metric between two users' association matrices A and B is defined over their
behavioral profiles, with vectors a_i and b_j and corresponding weights, as

Sim(BP(A), BP(B)) = Σ_{i=1}^{rank(A)} Σ_{j=1}^{rank(B)} w_{a_i} w_{b_j} |a_i · b_j|,    (4-2)

which is essentially the weighted cosine inner product between the two sets of
eigen-behavior vectors.
4.3.3 Hybrid Filter (HF)
Each filter provides a different perspective on an encounter or behavioral aspect.
The hybrid filter provides a systematic and flexible mechanism to combine the scores
from all filters and present a unified score to the user. The selection of weights for
the various filters depends on several factors, including the user's preferences and
feedback (see Sec. 4.6.1) and application requirements. A generic Hybrid Filter score (H) for a
user U_j can be generated as

H(U_j) = Σ_{i=1}^{n} α_i F_i(U_j),    (4-3)

where F_i(U_j) is the normalized score for user U_j according to filter i, α_i is the
weight given to filter score F_i, and n is the total number of filters used. We select
the α_i such that Σ α_i = 1 and 0 ≤ α_i ≤ 1.

Figure 4-3. Behavior Matrix for a user
Note that our design (Fig. 4-1) provides feedback to the system based on user
selections. This feedback can be used to make the weights adaptive.
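Eq. 4-3 is a plain convex combination of the per-filter scores, as in this minimal sketch (names are illustrative; the weight check mirrors the constraint Σ α_i = 1):

```python
def hybrid_score(filter_scores, weights):
    """Hybrid Filter sketch (Eq. 4-3): weighted sum of normalized
    per-filter scores for one user. Weights are assumed to sum to 1."""
    assert abs(sum(weights) - 1.0) < 1e-9
    return sum(a * f for a, f in zip(weights, filter_scores))

# Equal weights over, say, FE, DE and LV-D scores for one user.
h = hybrid_score([0.9, 0.6, 0.3], [1/3, 1/3, 1/3])
```

Adaptive weighting based on user feedback would simply update the `weights` vector between invocations.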
Decay of filter scores: Social science studies have shown that social relationships
are dynamic and require frequent interaction to prevent decay. The strength of a
relationship wanes as the time between interactions grows, following an exponential
decay whose half-life depends on the relationship type [21] (3.5 years for family,
6 months for colleagues). Configurable decay is integrated in our ConnectEnc app, with
the default half-life set to 6 months.
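Such half-life decay can be sketched as below; the function name is ours, and the 180-day default stands in for the 6-month ConnectEnc default:

```python
def decayed_score(score, days_since_last_encounter, half_life_days=180):
    """Exponential score decay sketch: the score halves every
    half_life_days without interaction (default ~6 months)."""
    return score * 0.5 ** (days_since_last_encounter / half_life_days)

# A score of 100 decays to 50 after one half-life.
```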
Table 4-1. Overhead of filters in terms of processing and storage. Here m is the total
number of records in the encounter file, n is the number of unique encountered users,
l is the number of locations visited, and d represents the number of days used for BM
calculations. We also assume that m >> n.

Filter  Processing Overhead  Storage Overhead
FE      O(m)                 O(n)
DE      O(m)                 O(n)
PV      O(m)                 O(l)
LV      O(m)                 O(nl)
BM      O(m)                 O(ld^2) for SVD
HF      O(n)                 O(n)
4.4 Anomaly Detection
Incorporating resilience to attacks is a primary requirement for our design. Here, the
attack on the trust system is an attempt by an untrusted user (e.g., a stranger)
to gain the system's trust in a relatively short time by injecting many encounter
events (e.g., via stalking). Growth of trust scores in this fashion can be considered
an anomaly, and a specialized anomaly detection system is needed to combat such
attacks. Since iTrust scores each encountered node individually, we presently consider
single-attacker scenarios.
An attacker would want to get onto the trusted list as soon as possible, for
maximum effect at limited effort. The goal of the trust system design is then to
raise the effort needed for a successful attack considerably, to no less than that of
genuine trusted nodes and friends, which may entail weeks of consistent encounters
at trusted locations by the attacker. The spatio-temporal granularity used in our adviser
filters determines this attack effort and provides the anomaly we aim to detect.
Note that in our implementation, a user can approve or reject any node before it is
added to the trusted list. The role of anomaly detection is then to raise a red flag
when an attack is suspected.
Our anomaly detection approach investigates the evolution of encounter patterns
and trust scores over time, and does not require information exchange between nodes.
Normal operation is observed as regular users are encountered over time. The
anomaly detection mechanism considers the slope of the growth of encounter statistics
(including frequency, duration, or behavioral similarity as defined by the trust adviser
filters). The detection system learns normal behavior over time and uses deviations
from the normal to detect suspect nodes and trigger user alerts. Admittedly, this
approach is promising when the user's behavior is regular; in situations where
encounter patterns fluctuate considerably (e.g., during irregular events, trips, or a
city change), a re-evaluation of this approach is warranted (part of future work).
4.4.1 Detection Model
For attacker detection, we integrate scores from the various filters with location
information as available. For example, using FE, the slope and standard deviation
of the growth of the trust score per user can be used to identify outliers, marking them
as attackers. With the LV filter, attackers can be identified by comparing differences
in scores across locations (users encountered at more locations than others).

Here, we use the FE filter as an example to design the anomaly detection system. For a
user, we define a function FE(i, T) that yields the FE score for encountered user i after
time T.
Since we are calculating the slope of trust-score growth over time, the time interval
can be defined in two ways, and therefore the slope is defined in two ways. The interval
can either be the total number of days since the first encounter, or the number of days
on which an encounter actually happened. Both are necessary to ensure that an attacker
who waits a long time after an initial encounter and then packs multiple encounters into
a short period does not go undetected because of a slow overall growth slope. The two
slopes are called γ1 and γ2:

γ1_i(T) = FE(i, T) / C1(i, T),    (4-4)

γ2_i(T) = FE(i, T) / C2(i, T),    (4-5)

where C1(i, T) gives the number of days on which an encounter happened with user i, and
C2(i, T) gives the number of days since the first encounter with user i.
To distinguish an attacker from other users, we select the neighbors of user i in
terms of number of encounters with the user, creating a set S_{i,T}. Here, S_{i,T} is the
set of all users k (0 < k < n, k ≠ i) who encountered the user and also satisfy

|FE(i, T) − FE(k, T)| ≤ x,    (4-6)

where x is a parameter set by the user. The input x determines the number of users
(the size of the neighborhood) considered when identifying an attacker (anomaly). The
users in S_{i,T} are similar to suspect user i in that they have similar FE scores. From
the users in S_{i,T}, we can determine the mean and standard deviation of the neighbors'
slopes. The mean (μ1) and standard deviation (σ1) are calculated as shown below (μ2 and
σ2 are obtained similarly):

μ1 = ( Σ_{u ∈ S_{i,T}} γ1_u(T) ) / |S_{i,T}|,    (4-7)

σ1 = sqrt( ( Σ_{u ∈ S_{i,T}} (γ1_u(T) − μ1)² ) / |S_{i,T}| ).    (4-8)
A user i is classified as an attacker if its slope (γ1 or γ2) is greater than μ1 +
(κ × σ1) or μ2 + (κ × σ2), respectively. Here, κ is a multiplying factor ranging from 1
to 3, which we investigate with the traces to discover the optimal performance of our
detection. This kind of detection is referred to in the academic literature as
nearest-neighbor-based anomaly detection [23]. In essence, we consider all encountered
users whose scores are similar to that of the user being evaluated (neighborhood size
controlled by x); if the slope (by both measures) of the user differs from that of the
neighbors, the user is suspected to be an attacker and flagged for evaluation.
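The neighborhood test of Eqs. 4-6 through 4-8 can be sketched as follows for one slope measure. This is an illustrative sketch: the dict-based inputs and function name are our assumptions, and precomputed slopes stand in for γ1_u(T).

```python
import math

def is_suspect(i, fe, slope, x=50, kappa=1.0):
    """Nearest-neighbor anomaly sketch: flag user i if its trust-score
    growth slope exceeds the mean slope of similarly-scored users by
    more than kappa standard deviations. `fe` and `slope` map user ids
    to FE scores and growth slopes; x bounds the neighborhood (Eq. 4-6)."""
    neighbors = [k for k in fe if k != i and abs(fe[i] - fe[k]) <= x]
    if not neighbors:
        return False  # no comparable users to measure against
    mu = sum(slope[k] for k in neighbors) / len(neighbors)           # Eq. 4-7
    sigma = math.sqrt(sum((slope[k] - mu) ** 2 for k in neighbors)
                      / len(neighbors))                              # Eq. 4-8
    return slope[i] > mu + kappa * sigma

# A user whose score grows far faster than its FE-score neighbors is flagged.
fe = {'i': 100, 'a': 90, 'b': 110, 'c': 105}
slope = {'i': 50.0, 'a': 2.0, 'b': 2.5, 'c': 1.5}
```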
4.4.2 Attacker Model
Evaluating the anomaly detection system designed in Sec. 4.4 is challenging because
we do not know how to model attackers. We assume that this kind of attack would not
occur before the service is available for general use, so our traces will not contain
any patterns belonging to the attacker discussed here. This makes detection and
validation difficult. To deal with this challenge, we present an attacker model created
specifically to beat the anomaly detection designed above (it is just one of many
possible attacker models).

We have created a parametrized model of the attacker based on the number of
encounters, the maximum days available, and the periodicity of encounters. The number of
encounters is how many encounters the attacker will have; in the simulation this is kept
close to the minimum number needed to overcome the trust threshold. Max days is the
length of the period in which the attacker can have encounters. Periodicity of
encounters captures the pattern of the encounters. The attacker follows a periodic
encounter pattern, since studies (cite sungwook globecom) have shown that users exhibit
periodic encounter behavior (such as a weekly pattern), though the period may vary from
user to user. The attacker would like to follow the pattern displayed by other
encounters so as to reduce suspicion (even though, in reality, an attacker may only
guess and not accurately obtain the periodicity information). In this work we consider
a time granularity of days, i.e., we consider cumulative encounters on a per-day basis.
Second-, minute-, hour-, or week-based granularities are also possible; the effects of
changing the time granularity are not discussed here and are left as future work.
Using the periodicity information, we identify the days on which the attacker has
encounters (restricted by Max days), then distribute the sessions equally across those
days. In our simulations we vary Max days from 1 to 30 (the trace we consider for
anomaly detection is 30 days long). For each value of Max days, we compute the attacker
pattern (AP). This AP is then injected back into the traces and the anomaly detector is
run on the entire trace to detect it. Algorithm 1 describes the attacker model.
Input: time period allowed for attack (MaxDay), average days (AvgDay), Number of
Encounters (NumEnct)
Output: Attacker Pattern (AP[])

for i ← 0 to MaxDay do
    AP[i] ← 0
end
EncDay ← NumEnct / (AvgDay ≤ MaxDay ? AvgDay : MaxDay)
period ← ceil(MaxDay / (AvgDay ≤ MaxDay ? AvgDay : MaxDay) − 0.5)
left ← 0
for i ← 0 to MaxDay, step = period do
    if AvgDay == 0 then
        break
    end
    AP[i] ← EncDay
    left ← left + EncDay
    AvgDay ← AvgDay − 1
end
left ← NumEnct − left
j ← 0
while left ≠ 0 do
    AP[j] ← AP[j] + 1
    left ← left − 1
    j ← j + period
    if j ≥ MaxDay then
        j ← 0
    end
end
for j ← 1 to MaxDay do
    AP[j] ← AP[j] + AP[j−1]
end
Algorithm 1: Attacker model for anomaly detection
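Algorithm 1 can be transcribed into a runnable sketch as below. This is our illustrative Python port, not the dissertation's code: names are ours, integer division is assumed for the per-day encounter count, and Python's `round` stands in for ceil(x − 0.5); AvgDay ≥ 1 is assumed.

```python
def attacker_pattern(max_day, avg_day, num_enct):
    """Sketch of Algorithm 1: spread num_enct encounters over at most
    max_day days, on roughly avg_day encounter days spaced one period
    apart, then return the cumulative encounter count per day."""
    days = min(avg_day, max_day)              # assumes avg_day >= 1
    enc_per_day = num_enct // days
    period = max(1, round(max_day / days))    # ~ceil(max_day/days - 0.5)
    ap = [0] * max_day
    placed = 0
    remaining_days = days
    for i in range(0, max_day, period):       # periodic encounter days
        if remaining_days == 0:
            break
        ap[i] = enc_per_day
        placed += enc_per_day
        remaining_days -= 1
    left = num_enct - placed                  # spread any remainder
    j = 0
    while left > 0:
        ap[j] += 1
        left -= 1
        j += period
        if j >= max_day:
            j = 0
    for j in range(1, max_day):               # cumulative score over time
        ap[j] += ap[j - 1]
    return ap
```

For example, `attacker_pattern(10, 5, 20)` places 4 encounters every second day and accumulates to the full 20 by the last day.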
Figure 4-4. The growth of trust score using the FE filter for a specific user. Each line
corresponds to an encountered user.
Figure 4-5. The growth of trust score using the FE filter under the attacker model.
Each line corresponds to an instance of an attacker generated by the model.
Fig. 4-4 shows sample trust-score growth using the FE filter. We can observe how the
trust score for encountered users progresses for one user. Notice that around days 13,
17, 23, and 25 most of the score curves slant upward; it is on these days that the
scores increase. The score change (i.e., encounters) does not happen every day; it
happens on certain days and is broadly periodic. This periodic property is captured in
our attacker model too. Fig. 4-5 shows multiple attacker encounter patterns targeted at
a specific user. The curves are similar to those in the previous figure, but denser
(since there are many more patterns here than in Fig. 4-4) and the score building starts
earlier. (This should not make much difference: in the slope calculation γ1 we only
consider the exact days of encounter, so the total length of the period has no effect.)
4.5 Trace Based Evaluation and Analysis
In this section, we evaluate the design of iTrust filters including anomaly detection
and analyze the effects of recommendations on DTN routing with selfish nodes. Since
much of the following analysis uses WLAN traces, we begin by describing the traces
used and then proceed to the evaluation.
4.5.1 Traces
To evaluate our design, we consider anonymized trace sets from three universities
(see Tab. 4-2 for more details). Tab. 3.1 shows a sample trace used in this work. The
advantage of using WLAN traces is that they are much closer to reality in terms of user
mobility than existing synthetic mobility models. However, like other real traces, they
contain a small percentage of noise and error. We assume that users associating with the
same wireless access point encounter each other, since the range of an access point is
generally less than 50 meters indoors and most of the traces come from indoor usage.
Also, since only a few users change or modify the MAC address of their devices, we
assume that a MAC address uniquely identifies a device and is always associated with a
single user (a few users may share devices).
Table 4-2. Facts about studied traces

Trace Source              U1                  USC [41]            Dartmouth [7]
Time/duration of trace    Fall 2007           Spring 2007         Fall 2005
Start/End time            09/01/07-11/30/07   01/01/07-03/30/07   09/01/05-11/30/05
Unique Locations          845 APs             137 buildings       133 APs
Unique MACs analyzed      34694               32084               4906
Figure 4-6. Similarity scores of the filters for all encountered pairs of users in Nov.
2007 from the U1 trace. A. Frequency of Encounter (FE). B. Duration of Encounter (DE).
C. Location Vector (LV-D). D. Behavior Matrix (BM).
Figure 4-7. Correlation between the trusted lists produced by the various filters at
T = 40%. A. U1. B. Dartmouth. C. USC.
4.5.2 Filter Evaluations
Using the traces, we investigate four properties of the filters: (1) the ability of
the filters to distinguish between different encounters (statistical characterization),
(2) correlation among filter results, (3) stability over time, and (4) small-world
characteristics. We then discuss the results of anomaly detection.
To generate the trust scores from the various filters, the WLAN trace is converted
into an encounter trace for each user by determining and storing all the other users who
had overlapping sessions with this user at the same access points (locations). Filters take the
Figure 4-8. Comparison of trust lists built from different history lengths for the
various filters at T = 40% (note that the y-axis scale for DE, FE, and LV-C starts at
85%, and for LV-D and BM at 35%). A. Duration of Encounter (DE). B. Frequency of
Encounter (FE). C. Location Vector - Count (LV-C). D. Location Vector - Duration
(LV-D). E. Behavior Matrix (BM).
encounter trace as input and produce a ranked list based on the similarity measure
used by that filter. For analysis, we pick the top T% of users from these ranked lists.
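The trace conversion step can be sketched as below: two users encounter each other when their sessions at the same access point overlap in time. This pairwise O(n²) sketch is illustrative only (a real converter would sort sessions by AP and time); the tuple layout and names are our assumptions.

```python
from itertools import combinations

def encounters_from_sessions(sessions):
    """Convert WLAN sessions into encounter events. Each session is a
    (user, ap, start, end) tuple; two different users encounter each
    other when their sessions at the same AP overlap in time."""
    events = []
    for s, t in combinations(sessions, 2):
        same_ap = s[1] == t[1]
        overlap = s[2] < t[3] and t[2] < s[3]  # time intervals intersect
        if same_ap and s[0] != t[0] and overlap:
            events.append((s[0], t[0], s[1]))
    return events

sessions = [('u1', 'ap1', 0, 10), ('u2', 'ap1', 5, 15), ('u3', 'ap2', 0, 10)]
# u1 and u2 overlap at ap1; u3 is at a different AP and matches nobody.
```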
4.5.2.1 Statistical Characterization
The proposed filters are justified if the scores generated for encountered users allow
us to discern them. In this section, we consider a one-month-long WLAN trace from U1
(the other traces have similar characteristics) and present the distribution of trust
scores for all filters, over all encountered pairs (see Fig. 4-6; LV-C's characteristics
are similar to LV-D's).
We notice that for the FE filter, 3,000 users have over 1,000 encounters each in a
month, and more than 15,000 users (over 2/3 of the population) have over 100
encounters. Similarly, for the DE filter, the average encounter duration for more than
20,000 users is over 1,000 seconds. Results from the LV-D filter show that a large
number of user pairs have a low score (close to zero), which may mean that most users
are not similar to each other; only a few user pairs have a high similarity score. The
BM filter score, like LV's, is close to zero for most user pairs and high for a few.
These results justify our choice of filters for distinguishing encountered users.
4.5.2.2 Correlation
We examine the degree of similarity (correlation) among the trust lists from
different filters. High similarity indicates redundancy among the filters, while low
similarity implies orthogonality of the trust recommendations. For this investigation,
we consider 9-week-long traces and create trust lists at T = 40% for varying lengths of
encounter history (at 1-week intervals); results for other T values show similar trends.

As Fig. 4-7 shows, the trends are similar across the traces. The LV-D and LV-C filter
results show ~70% similarity as the lists stabilize around 9 weeks of history. FE vs. DE
stabilizes around 60% to 70%. The remaining filter pairs stabilize between 30% and 55%,
meaning they produce different trust lists. This low similarity indicates that the
filters are not redundant and can be used to generate a rich set of recommendations.
4.5.2.3 Stability
Fluctuations in trust recommendations over time could confuse users, so it is
imperative to examine the stability of the trust recommendations. We investigate the
stability of trust lists at T = 40% using 9 weeks of U1 traces (other T values and
traces show similar trends). Stability is examined by comparing trust lists built from
histories of different lengths.

More than 90% similarity is found between the 1-week and 9-week lists for the DE, FE,
and LV-C filters (see Fig. 4-8), implying that users selected in the first week of
encounters remain in the trust list after 9 weeks of encounter history. The BM filter
shows high stability when the difference in history is less than 2 weeks (~80%) but
falls to 55% between 1 week and 9 weeks. The LV-D filter shows similarity of about 40%
between any two lists, implying that the list changes by 60% every week. This indicates
that users may encounter each other regularly (hence the stability of LV-C) but spend
different amounts of time together from week to week. Overall, we note that some filters
(DE, FE, and LV-C) stabilize with just 1 week of history, which makes them suitable for
recommendations when the trust history is short; the interval between trust-list
regenerations can also be long (reducing processing requirements). As the stability of
the LV-D filter is comparatively low, its trust list may need to be recomputed weekly.
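The list-overlap measure used in these comparisons can be sketched as below. The exact normalization is an assumption on our part (here, overlap relative to the shorter list); the function name is illustrative.

```python
def list_similarity(list_a, list_b):
    """Stability-measure sketch: percentage of users common to two trust
    lists (normalized by the shorter list), for comparing lists built
    from different lengths of encounter history."""
    a, b = set(list_a), set(list_b)
    if not a or not b:
        return 0.0
    return 100.0 * len(a & b) / min(len(a), len(b))

# Two 10-entry lists sharing 9 entries are 90% similar.
```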
4.5.2.4 Graph Analysis
We analyze the effect of trust on the network graph and compare it with regular and
random graphs while increasing the trust threshold T (using the DE filter; other filters
show similar results). An edge is added between a pair of nodes when at least one of
them trusts the other (undirected graph). We note that the clustering coefficient (CC)
[11] of the network increases with T while the path length (PL) decreases. For example,
using the 9-week U1 trace, CC is 0.171 at T = 10% and becomes 0.201 at T = 100%; in the
same scenario, PL decreases from 3.64 to 2.59. More than 99% of the nodes were connected
even at T = 10%.
Figure 4-9. Normalized Clustering Coefficient and Normalized Path Length vs. trust
percentage for different lengths of traces. A. UF. B. Dartmouth. C. USC.
A small-world analysis is performed as described in [11]. We find that the normalized
CC (NCC) is close to the CC of a regular graph while the normalized PL (NPL) is close to
the PL of a random graph (Fig. 4-9 shows NCC and NPL for different trace lengths and
values of T). The network created by the trust lists thus appears to be a small-world
network.
4.5.2.5 Anomaly Detection
Here we analyze the effectiveness of the proposed anomaly detection system.
Evaluating the system designed in Sec. 4.4 is challenging: since the iTrust service is
not yet deployed, no attack patterns or models exist. Therefore, to evaluate our anomaly
detection system, we created an attacker model (just one of the possible models for
attackers).

To mimic users' encounter patterns, which are periodic with a period determined by
individual user behavior, we kept the attacker's encounters periodic. The period is
obtained from the victim's encounter pattern (so the attacker can avoid obvious
suspicion).
The number of encounters needed to get into a victim's trust list is also known. The
only tunable parameter is the number of days in which the attacker wants to achieve the
required encounter score (the detailed algorithm is in [1]). For the evaluation, we
varied the number of days from 1 to 30 (the trace is from U1 and 30 days long). Forty
users were analyzed: 20 with the maximum number of encounters and 20 with an average
number of encounters in the 30-day trace.
To validate our model and detection scheme we use false positives and false negatives
as metrics: the percentage of regular users identified as attackers counts as false
positives, whereas the percentage of failures to identify attackers counts as false
negatives. As Tab. 4-3 shows, the percentage of false positives decreases as we increase
the size of the set S_{i,T}, and the percentage of false negatives increases as we
increase the κ factor on the standard deviation (FE filter scores were used). The best
detection occurs at κ = 1 and neighborhood size |S_{i,T}| = 10. These results are
promising, yet warrant further analysis to optimize and create better attacker models
and detection systems (outside the scope of this study). Nevertheless, they show that
iTrust can work with anomaly detection and can flag suspected users.
Table 4-3. False positives and negatives (in percent) using the proposed anomaly
detection

κ   |S_{i,T}|   False +ve   False -ve
1   5           10.03       8.30
1   10          8.15        6.27
1   15          9.73        6.44
2   5           3.80        20.11
2   10          2.77        19.97
2   15          2.16        19.38
3   5           3.18        48.27
3   10          1.12        44.24
3   15          0.98        42.04
4.5.3 Selfishness & Trust Routing in DTN
DTNs are one of the network scenarios in which iTrust can operate. DTNs are
infrastructure-less networks that rely on the cooperation of nodes. Since nodes spend
their own resources routing messages, they may only route messages for nodes they know,
or when they have some incentive. In these scenarios (where nodes are selfish), we find
that using iTrust improves network connectivity and routing performance.

To examine the effectiveness of iTrust, we introduce selfishness and use epidemic
routing [79] as a tool to study the performance of a routing protocol over the WLAN
traces. Selfishness is defined as the probability (S) that a node will not accept and
route packets for a node it does not trust. Epidemic routing performs controlled
flooding and has been shown to provide a lower bound on performance in terms of hops and
time needed, as well as an upper bound on reachability. These properties make it an
appropriate tool for our evaluation.
Fig. 4-10 shows the flow chart for iTrust routing inside each node. When a node
receives a message from a trusted sender, it accepts the packet and attempts to route
it. Otherwise, the node accepts the packet based on factors such as user-configured
selfishness; here, we accept packets from untrusted nodes according to the selfishness
probability S. For the simulations, nodes are trusted (as recommended by iTrust) based
on the T value.
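The per-packet decision in the flow chart reduces to a small sketch: accept from trusted senders unconditionally, otherwise accept with probability 1 − S. Names are illustrative; the injectable `rng` parameter is our addition for testability.

```python
import random

def accept_packet(sender, trusted, selfishness, rng=random.random):
    """iTrust forwarding-decision sketch: always accept from trusted
    senders; otherwise accept with probability 1 - S, where S is the
    node's selfishness."""
    if sender in trusted:
        return True
    return rng() >= selfishness

# A fully selfish node (S = 1) only serves its trusted list.
```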
The performance of epidemic routing is measured using three metrics: unreachability,
delay, and overhead. Unreachability is the number of nodes, out of all receivers, that
could not be reached from a given source. Delay is the ratio of the average time taken
by a message to reach all possible receivers to the maximum possible delay. Overhead is
the average number of hops a message takes to reach all possible receivers via the
shortest path. Since overhead and delay vary directly with unreachability, we show only
unreachability results (the others are available in [1]).
Fig. 4-11 shows the average unreachability for various combinations of trust and
selfishness using the DE filter (results from other filters show a similar trend). Using the
first 60 days of traces, we create preliminary trust lists, after which we run epidemic routing
Figure 4-10. Flow chart for iTrust routing
for a period of the next 30 days. Trust lists are updated weekly during the run of epidemic
routing (to mimic a mobile device, as computing the trust list after every encounter or daily
would be resource intensive for the device). Around 800 nodes are randomly selected as
sources for epidemic routing. During a round, only one node sends a message, and
we measure the unreachability of that message. Each point on the graph
represents the average unreachability over 800 rounds (one per sender).
Intuitively, selfishness should cripple connectivity in the network. Fig. 4-11
shows that network unreachability increases as S increases (with T = 0). To the
benefit of our scheme, we find that as trust is introduced into the network, the effect of
selfishness is reduced. Here we use the trust list from the DE filter (other filters show a
similar trend). For U1, when T = 0% and S = 0.9, unreachability increases by 83% from
the case when S = 0; however, adding trust T = 40% (S = 0.8) limits the increase
to only 31%. Likewise, for Dartmouth, when T = 0 and S = 0.9, unreachability increases
by 40% from the case when S = 0, whereas adding trust T = 40% (S = 0.9) limits the
increase to only 10%. For USC, T = 0 and S = 0.9 increases unreachability by 1.7%
over the case when S = 0, and adding trust T = 40% (S = 0.9) brings the increase down
to only 0.48%. The effect of trust is higher when selfishness is high, which makes iTrust
more suitable for networks with high selfishness. The effect of trust is not significant in
the USC traces, which could be a result of the high unreachability in that network
even at S = 0 (five times that of U1 or Dartmouth). Also, adding selfishness does not
increase unreachability significantly for USC.
We now compare the performance of the individual filters with a few possible
Hybrid Filters (Sec. 4.3.3). For this purpose, we use the U1 traces (as the trends from
other traces are similar) and vary the weights from 0 to 2 (see Fig. 4-12). The highest
unreachability is produced by using only the BM filter score and the lowest by using
only the FE filter. A combination of filters at equal weights yields unreachability close
to that of the FE filter and much better than that of the BM filter. This analysis gives us
two important results: first, a combination of filter scores can produce results close to
the best individual filter (and also avoids user confusion); second, iTrust can use equal
weights as the default configuration for combining the filter scores.
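The weighted combination behind the Hybrid Filter can be sketched as below. The filter names and score values are illustrative; the weights correspond to the legend convention of Fig. 4-12 (e.g., 1211 means DE = 1, FE = 2, LV-D = 1, BM = 1) and are renormalized to sum to 1.

```python
def hybrid_score(scores, weights):
    """Weighted combination of per-filter trust scores.

    `scores` and `weights` map filter names to normalized scores and
    user-chosen weights; weights are renormalized to sum to 1.
    A sketch with illustrative names, not the exact iTrust code.
    """
    total = sum(weights.values())
    return sum(weights[f] * scores[f] for f in scores) / total

# Hypothetical per-filter scores for one encountered user
scores = {'DE': 0.8, 'FE': 0.6, 'LV-D': 0.4, 'BM': 0.2}
# Equal weights (the suggested default) vs. the "1211" combination
default = hybrid_score(scores, {'DE': 1, 'FE': 1, 'LV-D': 1, 'BM': 1})
combo_1211 = hybrid_score(scores, {'DE': 1, 'FE': 2, 'LV-D': 1, 'BM': 1})
```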
4.6 Survey and Implementation Based Validation
To validate the approach of iTrust against ground truth, we employed surveys and
user feedback from the iTrust application.
4.6.1 Survey
To investigate the trust needs of users and the importance they give to trust, we
conducted a survey at a major computer networking conference. Even though this is a
biased sample of survey takers, this population has a good understanding of computer
networks. We received 32 usable responses. Participants were asked to indicate their
willingness to communicate (using ad hoc networks or DTNs) under different scenarios
on a scale of 1 to 10.
As Fig. 4-13 shows, the willingness of users to cooperate with unknown users
is low (mean 2.31). However, willingness increases when users have knowledge
of the encounter history. This reinforces iTrust's approach of using encounters
to build trust in the network. We also observe that users give more importance to
combined scores (both FE and DE scores are high) than to individual scores (only FE is
high, or only DE is high). This justifies iTrust's use of the Hybrid Filter for combining trust
recommendations.
[Three plots of Unreachability vs. Trust (T = 0% to 100%), one curve per selfishness value S = 0.1 to 0.9: A. U1, B. Dartmouth, C. USC]
Figure 4-11. Average unreachability with varying Trust and Selfishness using DE filter
[Plot of Unreachability vs. Selfishness (S = 0.0 to 1.0) for the filter-weight combinations 1111, 1112, 1121, 1211, 2111, 1000, 0100, 0010, and 0001]
Figure 4-12. Hybrid filter results when T = 40%. Numbers in the legend indicate the ratio of scores from each filter. For example, 1211 implies αDE = 0.2, αFE = 0.4, αLV−D = 0.2, and αBM = 0.2, and 0100 implies αDE = 0, αFE = 1, αLV−D = 0, and αBM = 0 (Sec. 4.3.3)
[Bar chart of Willingness to Communicate (scale 0 to 10) for four scenarios: No Information; High FE, Low DE; High DE, Low FE; High FE and DE]
Figure 4-13. Survey results showing users' propensity to communicate with other users in various communication scenarios
Standard deviations in the results suggest that although most users want information about
encountered users before cooperating, the importance of the individual filters may vary.
This flexibility is available in iTrust's Hybrid Filter by assigning weights according
to the user's preferences.
Figure 4-14. Illustration of iTrust's components and their interactions
4.6.2 iTrust Application
To show the viability of iTrust and to validate our design with user studies, we have
implemented most of its core features for mobile platforms. Currently, iTrust
is available for the Android platform and the Linux-based Nokia N810 tablet. It provides
the ability to rate encountered users based on the FE, DE, LV, and Hybrid filters.
Encountered users can be sorted by any filter, and the weights for the Hybrid filter are
user configurable. If some of the encountered users are currently discoverable, their
listing shows a green circular mark, as in Fig. 4-15A. The application provides built-in
facilities for scanning Bluetooth devices and wireless access points (used for
localization, since GPS is energy-expensive; the user can select GPS if needed).
On selecting a particular user, encounter details (Fig. 4-15B) are presented, and by
clicking the map option one can see the encounter locations on a map (Fig. 4-15C).
Encountered devices can be rated for trust by the user on a scale from -2 (no trust) to
2 (high trust). This allows users to store their evaluations of encountered devices, which
can also be used by other applications on the user's device.
The application block diagram is shown in Fig. 4-14. The arrows in the diagram
represent how encounter data flows through the application. The basic blocks of iTrust
are Bluetooth and Wi-Fi scanning: Bluetooth scanning is used to discover and record
Bluetooth devices, and Wi-Fi scanning is used to obtain localization information. Traces
from both scanners are then parsed and given to the filters. Encounters are then
rated and ranked by the filters and, based on the weights of the hybrid filter, a combined
score is also generated and saved. The user can also choose to update locations, which
entails contacting a third-party server such as Google or Skyhook to get location data
based on the Wi-Fi AP data (users can also switch to the more power-hungry GPS for
localization). This allows users to visualize encounters on a map.
In the application, we have also added an optional discovery service that can
show more information about the encountered user, such as name, email, social profile
link, and personal web page. This service allows users to weed out potentially
uninteresting or unsuitable encountered users before initiating contact and key exchanges.
Fig. 4-15D shows how a user can register the device and provide information
about him/herself so that other encountering users can find out more about this user. When
the privacy option is selected, the information is shared with a user only when approved by
this user. To look up information from this registry, users click on the name of
the encountered device on the screen showing encounter details.
4.6.2.1 Application Evaluation
We asked a group of 30 CS students (graduate and undergraduate) to run the iTrust
app for a month. Users who already owned Android phones ran iTrust on them; the rest
were given Nokia N810 devices. Users were asked to mark the devices they trust in the
application. Out of the 30 students, we received usable traces (at least one month
long) from 22 users. On average, each user marked 15 trusted users and encountered
175 unique devices. We use this data to investigate whether behavioral similarity, as
captured by the trust filters, is correlated with trusted-user
Figure 4-15. Screenshots of the iTrust application. Fig. A shows the main screen, where encountered users are sorted by filter score; current encounters are marked with green circles, and trusted users are shown in blue. Fig. B shows details for an encountered user. Fig. C shows user encounters on a map. Fig. D shows the registration screen for the optional user-information discovery service. Fig. E shows the screen where the display order of encountered users can be modified. Fig. F shows the screen for selecting weights for the Hybrid filter (referred to in the app as the combined filter). Fig. G shows the screen where the user can check self statistics regarding encounters; it also shows the number of scans saved due to the use of the energy-efficient scanner. Fig. H shows the menu, which allows the user to jump from one screen to another.
Figure 4-16. Continuation of screenshots of the iTrust application. Fig. A shows the settings screen. Fig. B shows the number of encounters the user had with a particular user over a period of time; this feature allows a user to learn more about encountered users. Fig. C shows a graph from the Self-Stat screen of the application; here the graph shows the total number of encounters this user had over time. Fig. D shows the about page with author information and a web link for iTrust.
identification. We note that not all encountered users who may be trusted (or not
trusted) were necessarily marked. Also, only discoverable Bluetooth devices are
captured; trusted users whose Bluetooth is not discoverable will not be shown. This issue
will be of lesser concern as the adoption of iTrust increases.
We rated the recommendations of iTrust for each of the five filters (including the
Hybrid Filter with equal weights) on four metrics: 1. the number of trusted users in
the rank ranges top 1 to 10, 11 to 20, etc. (also known as the Precision metric in the
Information Retrieval literature); 2. the percentage of total trusted users in the top 1 to
10, 11 to 20, etc.; 3. the fraction of encountered users needed (from the top) to capture
x% of trusted users for each filter; and 4. Normalized Discounted Cumulative Gain
(NDCG) [43].
For metrics 1, 2, and 3, we considered users in descending order of their encounter
score under each filter. For metric 1, we then counted the number of trusted users (as
marked by the user) among the top 1 to 10 users, 11 to 20 users, and so on. Fig. 4-17A shows the
[Four plots comparing the FE, DE, LVC, LVD, and Hybrid filters: A. percentage of trusted users per rank range (1 to 10, 11 to 20, ..., 80 to end); B. percentage of total trusted users per rank range; C. percentage of encountered users (in descending filter score) vs. percentage of trusted users included; D. NDCG score per filter]
Figure 4-17. iTrust evaluations based on application usage. Fig. A shows the percentage of trusted users in the top 1 to 10 users, 11 to 20 users, etc., for each filter. Fig. B shows the percentage of total trusted users in the top 1 to 10, 11 to 20, etc. Fig. C shows the fraction of encountered users needed (from the top) to capture x% of trusted users for each filter. Fig. D shows the Normalized Discounted Cumulative Gain score for iTrust recommendations.
graph for this metric for all the filters. It shows that, on average, out of the top 10 users
ranked by the FE, DE, and Hybrid filters, 5 (50%) or more are marked trusted. The
LV filters' top 10 ranks contain 3 to 4 trusted users on average; however, if we consider
the top 20 users, all filters capture 6-8 trusted users (more than 50% of the total trusted
users). The number of trusted users in the remaining ranges continues to fall, except in
the last range, which contains all users ranked beyond 80. For all the filters, there is a
strong, statistically significant correlation between the score and the rank of trusted users
(e.g., for LVC, r = 0.84, p < 0.01). This shows that users' willingness to trust others in a
mobile network is statistically correlated with their behavioral similarity as captured by iTrust.
Results for metric 2 are similar to metric 1, as shown in Fig. 4-17B. We note that, of
all the users marked trusted, more than 50% fall in ranks 1 to 10 (except for the LV
filters), and almost 80% of the trusted users are captured in ranks 1 to 20.
Metric 3 measures the fraction of encountered users needed (from the top) to capture
x% of trusted users for each filter. This metric shows that 80% of the trusted users
are captured by the top 25% of encountered users as ranked by the filters, and there is
a strong, statistically significant correlation (Fig. 4-17C).
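The rank-based computations behind metrics 1 and 3 can be sketched as follows (a sketch with illustrative names; `ranked_users` is assumed sorted by descending filter score):

```python
def trusted_per_range(ranked_users, trusted, bucket=10):
    """Metric 1: trusted-user counts per rank range (top 1-10, 11-20, ...)."""
    return [sum(1 for u in ranked_users[i:i + bucket] if u in trusted)
            for i in range(0, len(ranked_users), bucket)]

def fraction_needed(ranked_users, trusted, target=0.8):
    """Metric 3: fraction of top-ranked users needed to cover `target`
    (a fraction) of all trusted users."""
    need = target * len(trusted)
    seen = 0
    for rank, user in enumerate(ranked_users, start=1):
        seen += user in trusted
        if seen >= need:
            return rank / len(ranked_users)
    return 1.0  # target never reached; all users were needed
```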
Metric 4 is based on the DCG measure, which is used to evaluate the effectiveness
of search engines by giving a higher score to search results that are more relevant.
Normalized DCG (NDCG) is the ratio of DCG to IDCG (ideal DCG). The IDCG is
calculated from the best possible result (in our case, all trusted users ranked first,
followed by all non-trusted users). NDCG therefore tells us how far the current results
are from the ideal. We note from Fig. 4-17D that iTrust is able to capture close to 70%
of the IDCG via the FE, DE, and Hybrid filters and close to 50% of the IDCG via the LV
filters. This shows that our recommendations are relevant and close to the ideal case.
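For binary relevance (trusted = 1, non-trusted = 0), the NDCG computation described above can be sketched as below; names are illustrative, and the logarithmic discount follows the standard DCG definition [43].

```python
import math

def ndcg(ranked_users, trusted):
    """NDCG with binary relevance: trusted users have relevance 1.

    The ideal ordering (IDCG) places all trusted users first; NDCG is
    the ratio DCG / IDCG. Sketch with illustrative names.
    """
    rels = [1 if u in trusted else 0 for u in ranked_users]

    def dcg(r):
        # Standard log2 position discount: rel_i / log2(i + 1), 1-indexed
        return sum(rel / math.log2(i + 2) for i, rel in enumerate(r))

    ideal = sorted(rels, reverse=True)
    return dcg(rels) / dcg(ideal) if any(rels) else 0.0
```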
We also note that there are users who rank highly yet are not marked trusted. We
believe these may be encountered users who are very similar to the user and could
provide new interaction opportunities.
Other observations from the deployment include that almost 70% of users preferred
equal weights for the Hybrid filter. The amount of storage used by the application was
6.2 MB on average, with filter scores taking only 98 KB; the rest was occupied by the
encounter traces. This shows that the storage overhead of the iTrust filters is quite small
compared to the raw traces. The raw traces can be removed from the device after
processing to save space. At this rate, 75 MB is needed to store traces for a whole year.
4.6.2.2 Energy Efficiency
Scanning for Bluetooth and Wi-Fi devices consumes the most power (since the
scanning process is periodic). After examining the traces (which were scanned at
1-minute intervals), we noted that, due to spatial locality, scanning rounds can be
skipped when the same devices are found in consecutive rounds, assuming the user
remains in the same location with the same devices. Based on this assumption of
spatial locality, we have designed and implemented an energy-efficient scanning
algorithm for iTrust. More details are in Appendix B.
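One way to exploit this spatial locality is an adaptive back-off on the scan interval, as in the following sketch. The doubling policy, parameter names, and the 60 s/480 s values are our illustrative assumptions, not the algorithm of Appendix B.

```python
def next_scan_interval(prev_devices, curr_devices, interval,
                       base=60, cap=480):
    """Back-off scan scheduling under the spatial-locality assumption.

    If two consecutive scans return the same device set, the user is
    assumed stationary and the interval doubles (up to `cap` seconds);
    any change resets it to the base period.
    """
    if prev_devices == curr_devices:
        return min(interval * 2, cap)   # same neighborhood: scan less often
    return base                         # environment changed: scan normally
```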
4.6.2.3 Location estimation
To calculate meaningful Location Vectors, at least building-level granularity
is needed (the required granularity may also depend on the trust context). On a
mobile device, location can be estimated using several techniques. Some standard
techniques are: 1. GPS (does not work well indoors and has "warm-up" delays); 2.
Wi-Fi signals (may not be very accurate and may not work everywhere); 3. cell tower ID
(does not work on devices without phone functionality). GPS may be the most
accurate localization technology, but it needs the most energy, and the Wi-Fi/cell tower
ID methods need an online database lookup to obtain coordinates [50, 54]. In our
application, location coordinates are not needed immediately (only when encounter
locations are shown on the map), and since we have observed that users exhibit spatial
locality, scanning Wi-Fi signals and cell towers suffices. Once in a while, the app can
fetch the mapping from Wi-Fi APs and cell towers to location coordinates, and because
of the spatial locality in user movement patterns, we only need to fetch mappings for
locations that have not been visited previously. This scheme saves energy (by not using
GPS every time) at the cost of communication (which can wait until the phone is fully
charged and connected to a high-speed network) and accuracy.
When using Wi-Fi/cell ID for localization, we want to minimize communication costs
(do as few coordinate lookups as possible). To reduce lookups, we cache the Wi-Fi AP
sets whose coordinates we know (each scan yields a set of APs, whose size depends
on the location). Upon getting a set of Wi-Fi APs from a new scan, we search the
cached AP sets whose coordinates we already know. A few challenges with this
scheme are: 1. the APs scanned at the same location may change with time (an AP
could have moved to another location, or may not be detected in every scan due to
scanning noise); 2. since every localization scheme using Wi-Fi signals employs
heuristics for location estimates (the accuracy of the Google database is ≥ 150 m),
different AP sets may yield the same coordinates (collisions in location space). To solve
the first challenge, we take the intersection of the two sets (one cached and one
recently scanned), and if the intersection is greater than a threshold percentage (say
30%), the two sets are considered the same. For the second challenge, we currently
do not have a worked-out solution.
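The intersection heuristic for the first challenge can be sketched as follows. Measuring the overlap relative to the smaller set is our assumption (the text does not fix the denominator), and all names are illustrative.

```python
def match_cached_location(scanned, cache, threshold=0.3):
    """Match a fresh Wi-Fi AP scan against cached AP sets with known
    coordinates, per the 30% intersection heuristic.

    `cache` is a list of (ap_set, coordinates) pairs. Returns the cached
    coordinates on a match, or None to signal that an online lookup
    (e.g., against a Wi-Fi positioning database) is required.
    """
    for cached_aps, coords in cache:
        smaller = min(len(scanned), len(cached_aps))
        # Overlap relative to the smaller set (an assumption)
        if smaller and len(scanned & cached_aps) / smaller > threshold:
            return coords
    return None  # new location: fetch coordinates from a lookup service
```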
It is important to create a new location field only when the user visits a new location;
otherwise, the actual time spent at a location can get fragmented or fused and may
result in incorrect scores. We note that sometimes, when two AP sets overlap only
slightly (say 10%), address lookups for both may return the same coordinates. We plan
to address these issues in the future.
4.7 Discussion: Other Trust Inputs
As explained in Sec. 4.3, iTrust was architected to potentially integrate with other
trust components and subsystems, including blacklisting, other recommendation
systems, and contextual information.
4.7.1 Blacklist & Whitelist
A blacklist contains a list of devices that have been marked by the user as
untrustworthy (either explicitly, or after agreeing with an anomaly-detection flag) and
that should not be trusted regardless of their similarity scores. A whitelist contains a
list of devices to be trusted regardless of the similarity scores. The blacklist/whitelist
module was implemented in the iTrust app to allow a user to override the trust adviser
filters' scores. The rate of system overriding was one accuracy metric considered in the
evaluation.
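The override logic can be sketched as below; the function and the 0/1 override values are illustrative, not the iTrust implementation.

```python
def effective_trust(device, filter_score, whitelist, blacklist):
    """Apply blacklist/whitelist overrides to a trust adviser score.

    Blacklisted devices are never trusted; whitelisted devices are always
    trusted; otherwise the filter score stands.
    """
    if device in blacklist:
        return 0.0   # never trusted, regardless of similarity
    if device in whitelist:
        return 1.0   # always trusted, regardless of similarity
    return filter_score
```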
4.7.2 Recommendation & Reputation Systems
Several techniques for recommendation systems have been proposed [33, 72], and
iTrust is designed to integrate with or adopt such (or similar) recommendation systems.
iTrust can also bootstrap a recommendation system, since recommendation-system
scores start to evolve only after initial direct interactions.
Furthermore, a trusted node may, over a period of time, start showing malicious
network behavior (e.g., dropping packets frequently). Reputation systems attempt to
detect such nodes and can be integrated with iTrust to allow iTrust users to detect and
remove devices that at one time showed high trust potential but later turned malicious
or selfish. An example reputation system is [19]; it considers second-hand information,
where users maintain reputations only for the users they communicate with (for iTrust,
this can be all encountered users). One challenge here would be to keep the
communication costs to a minimum and to detect false advice. Such an integrated
system could also keep track of incorrect recommendations and failed message-routing
information.
4.7.3 Contextual & Event Information
The context of an encounter, e.g., the event and/or location, is sometimes more
important than the encounter statistics per se. Examples of such scenarios include a
conference that only allows registered users to enter the venue, or a secure building
that requires special permits to enter. In these cases, a user may be willing to trust
users regardless of encounter frequency or duration.
For these scenarios, iTrust provides a module that can change the trust recommendation
based on the context and location of the user. Here, context-sensing systems [8, 71]
or user input can be used to infer the context.
4.7.4 Combined Trust Recommendation
iTrust needs to provide easily understandable information to the user. Providing
scores from independent modules separately may confuse the user. As a first step
toward simplifying the output, we created the Hybrid Filter, combining the trust filter
scores. A similar idea can be used to combine the scores from all the modules
discussed above and generate a single trust score for an encountered user. The scores
can be combined as follows:
\[
T(U_j) = \kappa\left(\delta\, H(U_j) + (1-\delta)\sum_{i=1}^{m}\beta_i R_i(U_j)\right) + (1-\kappa)\,\mathrm{Context}(U_j), \tag{4--9}
\]
where T(U_j) represents the combined trust recommendation for the encountered user
U_j; it always lies between 0 (no trust) and 1 (maximum trust). H(U_j) is the score from
the Hybrid Filter (Sec. 4.3.3). The \beta_i represent the weights for the other normalized
trust-related inputs R_i, such as anomaly detection, recommendation systems, and
reputation systems, with \sum_{i=1}^{m}\beta_i = 1. The factor \delta decides the
combination ratio of the Hybrid Filter and the other trust-related inputs; \delta varies
between 0 and 1, so the combined score is also between 0 and 1. Context(U_j) is the
function that gives the context score for trusting U_j; its output also varies from 0 to 1,
and the contribution of context to the trust score is controlled by the parameter \kappa.
If the user U_j is on the whitelist, iTrust does not have to evaluate this user, as it is
already trusted; if a user is on the blacklist, it cannot be trusted (trust scores are
disregarded).
The challenge lies in finding the correct weights (\beta, \delta, \kappa) to combine the
different inputs. These weights depend on user preferences and applications. From the
survey we conducted, it is clear that there is no single weighting scheme acceptable to
all users (more details in Sec. 4.6.1). One possible way to overcome this challenge is to
start the system with standard weights and have it adapt (set weights) according to the
selections and feedback given by the user.
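The combination in Eq. 4-9 can be sketched directly in code. The function and parameter names are illustrative; in practice the weights would come from user preferences, as discussed above.

```python
def combined_trust(hybrid, others, betas, context, delta, kappa):
    """Combined trust recommendation T(U_j) per Eq. 4-9.

    `hybrid` is H(U_j); `others` are the normalized inputs R_i(U_j) with
    weights `betas` summing to 1; `context` is Context(U_j). All inputs
    and delta, kappa lie in [0, 1], so the result does too.
    """
    assert abs(sum(betas) - 1.0) < 1e-9  # the beta_i must sum to 1
    inner = delta * hybrid + (1 - delta) * sum(
        b * r for b, r in zip(betas, others))
    return kappa * inner + (1 - kappa) * context
```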
4.8 Conclusion and Future Work
This work introduces iTrust, an effective encounter-based framework for trust
establishment in mobile communities in an efficient, privacy-preserving, and resilient
manner. iTrust is driven by trust adviser filters that take advantage of the increased
sensing capabilities of mobile devices and their close association with users, which
enables them to capture behavioral similarity with encountered devices and assess
levels of trust.
We use four novel encounter-based trust adviser filters, based on encounter
frequency, duration, location behavior-vector, and behavior-matrix, to generate trust
recommendations. iTrust provides scores reflecting the level of trust to aid the user in
choosing trustworthy nodes in coordination with personal preferences, location priorities,
contextual information, and/or encounter-based keys. The calculations are done in a
fully distributed fashion, which eliminates the need for any server or trusted third party.
Results from three phases of evaluation reveal that several filters possess high stability
and that trust forms a small world among trusting users. Further, resilience to attack
using anomaly detection achieves less than 10% false positives and 7% false negatives.
Selfishness analysis using trust-based epidemic routing shows that it is possible to
efficiently use meaningful, stable trust routing without sacrificing network performance
in DTNs. Ultimately, a series of surveys and participatory experiments consolidates our
belief that users' willingness to trust other devices is highly correlated with behavioral
similarity. Feedback from the iTrust application shows that users favor the hybrid filter,
whose recommendations conform with 80% of users' selections.
iTrust has been designed to inspire several potential applications that can be
enabled in the future. However, a few avenues require further research. In the future,
we plan to address questions such as handling multiple devices belonging to one user.
In addition, addressing issues arising from MAC address spoofing is part of future
research (several crypto-based and non-crypto-based techniques exist [83]). Future
work will also include analysis of other filters for measuring behavioral similarity. We
further want to develop and deploy iTrust for popular mobile platforms and study the
effects of its usage on a larger scale.
With the release of the iTrust application, we can measure encounters to
generate trust scores. In the future, we would like to investigate how the exchange of
indirect trust recommendations affects the trust scores (transitivity of trust). The trust
framework presented in this study sits below the application layer on the mobile platform
and can provide trust scores to any requesting application on the mobile device. In
the future, we plan to build applications that can benefit from trust scores. An example
of such an application is crowd-sourcing: in crowd-sourcing applications, users
report observations (regarding gas prices in their neighborhood [9], restaurant
reviews, freeway traffic [10], etc.). With the knowledge of who uploaded the data (is this
person in my trust list?), the phone can automatically highlight information coming from
trusted sources, which may be more believable.
The knowledge of encounters, and the trust established from them, can be used
to provide emergency services; an example application, SOS [78], utilizes iTrust scores
to alert trustworthy users in the neighborhood in case of emergency. With the
inclusion of anomaly detection, iTrust can also generate lists of possible threats in the
surroundings (such as the presence of a stalker). In the iTrust application, a user can rate
another user on a range of levels from not trusted at all to fully trusted. These levels can
also be utilized by applications such as SOS to automatically assess the threat level.
Since iTrust generates trust scores via encounter information, it can also be used to
identify users with similar interests. This information can be used to automatically form
support or meetup groups. We would like to investigate how successful an encounter-based
scheme can be in discovering users with similar interests. We would also like
to examine how an encounter-measuring system in places such as hospitals can
be used to evaluate patient care (e.g., the number of doctor visits) and to
forensically examine and quarantine the spread of pathogens in a hospital by looking at
the encounter history.
iTrust can also be applied to communication scenarios where the existing infrastructure
cannot be trusted (an extreme example is a section of the population revolting
against a regime that monitors all communication). In these cases, if the revolting
section of the population has been using iTrust and has established pairwise security
keys, they can communicate over any medium (including ad hoc networks and DTNs)
by encrypting the messages. Here the role of iTrust is to identify the users with whom a
given user might want to communicate later, and thus to enable key exchanges with
only the relevant users. For this scenario and others, we would like to investigate the
correlation between trust level and frequency of communication between users.
There is a need to conduct more research in order to understand how trust can be
established in mobile societies. We hope that this research contributes to that effort.
CHAPTER 5
CONCLUSION AND FUTURE WORK
In this work, we propose several techniques to infer complex social relationships
and patterns using network data. We propose novel methods that use WLAN traces
to classify WLAN users into social groups based on features such as gender and
study major, among others. The work presents a general framework that can be applied
to traces coming from multiple sources. As an example, traces from two university
campuses have been used, and gender-based classification is performed.
Multiple techniques for grouping users are discussed, since each one has slight
advantages in certain scenarios. The study cross-validates the results by comparing the
results provided by each of the classification methods.
We uncovered a serious problem in the way WLAN traces are anonymized. We
believe that this kind of attack is possible because WLAN traces have human mobility
patterns embedded in them, which can be easily observed by an attacker following the
victim. The aim of any privacy-protecting technique should be to ensure that even if an
attacker has access to all the publicly available information about a user or a group of
users (but not the mapping between anonymized MACs and real MACs), he should not
be able to reduce the sample size below a number, say K. This K should be a parameter
configurable by the trace-releasing authority.
This work proposes iTrust, an effective encounter-based framework for trust
establishment in mobile communities in an efficient, privacy-preserving, and resilient
manner. iTrust is driven by trust adviser filters that take advantage of the increased
sensing capabilities of mobile devices and their close association with users,
which enables them to capture behavioral similarity with encountered devices and
assess levels of trust. We use four novel encounter-based trust adviser filters, based on
encounter frequency, duration, location behavior-vector, and behavior-matrix, to generate
trust recommendations. iTrust provides scores reflecting the level of trust to aid the
user in choosing trustworthy nodes in coordination with personal preferences, location
priorities, contextual information, and/or encounter-based keys. The calculations are
done in a fully distributed fashion, which eliminates the need for any server or trusted
third party. Results from three phases of evaluation reveal that several filters possess high
stability and that trust forms a small world among trusting users. Further, resilience to
attack using anomaly detection achieves less than 10% false positives and 7% false
negatives. Selfishness analysis using trust-based epidemic routing shows that it is
possible to efficiently use meaningful, stable trust routing without sacrificing network
performance in DTNs. Ultimately, a series of surveys and participatory experiments
consolidates our belief that users' willingness to trust other devices is highly correlated
with behavioral similarity. Feedback from the iTrust application shows that users favor the
hybrid filter, whose recommendations conform with 80% of users' selections.
In the future, we want to study user behavior from the perspective of buildings and
locations. This will allow us to find trends in user behavior based on study major and
building preferences.
The ability to classify users into social groups can allow us to create models for
different groups of users based on usage characteristics. These models can not only
be used to understand different users' characteristics but can also be used to filter users
that our proposed schemes could not. We also want to test whether homophily based
on encounters exists among different social groups. Another research direction we
would like to pursue is examining packet or NetFlow traces to understand the effects of
social-group affiliations on browsing characteristics.
For the privacy and anonymity work, we want to design anonymization
schemes that are application specific. For example, the traces could be
anonymized such that routing protocols can be tested on them without any privacy leak.
This may allow us to maintain privacy and yet utilize traces for research purposes. One
direction for Mobile Ad hoc routing protocol testing would be to anonymize the
traces (such that they are completely privacy preserving) without affecting the encounter
probabilities between pairs of users.
APPENDIX A
CODE SNIPPETS FROM ITRUST APPLICATION
In this appendix we present some sections of iTrust code along with a block
diagram showing the evolution of the iTrust application with each release. The first
version of the application was an internal release to the members of our research group.
Based on the feedback we received, more features were added and the application was
then released to a group of 30 students. This led to thorough testing of the application.
The users complained most about the unavailability of device-to-name mapping and
about energy efficiency. These features were added in version 3 of the iTrust application.
The approximate evolution of the development, including the features added, is shown in
Fig. A-1. The text over the arrows connecting one block to another depicts the features
and functionality requested by the users.
In the following sections, we present some of the code snippets that have been
developed for the iTrust application. We hope that these snippets provide sufficient
implementation details about the iTrust app.
[Figure A-1 block diagram. Version 1: (1) scanner; (2) FE, DE, LV filters; (3) mark
trusted users. Version 2, added after requests for more encounter info and automatic
collection of traces: (1) encounters on map; (2) facility to upload traces to server;
(3) automatic error reporting; (4) combined filter; (5) show currently encountering
devices. Version 3, added after requests for longer battery life and name lookup:
(1) energy-efficient scanning; (2) MAC address to name lookup; (3) graphs to show
encounter statistics; (4) users can be trusted on a scale.]
Figure A-1. Evolution of features in the iTrust app based on feedback from users.
A.1 Energy Efficient Scanning
The iTrust application has three algorithms for Bluetooth and Wi-Fi scanning.
The simplest of them is an infinite loop with a 100-second sleep between consecutive
executions; in each cycle it scans for both Wi-Fi and Bluetooth devices. The other two
algorithms perform energy-efficient scanning. The code in Listing A.1 illustrates the
algorithm used to decide the scanning interval. The input parameter state is set to zero if
any new device is found and to one otherwise.
public static int[] fibo = {0, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89};

int calSkipFactor(int state) {
    if (state == 0) {
        factor = 1;
    } else if (state == 1) {
        // MaxThres indicates the maximum value allowed in the FIBO series
        if (fibo[factor - 1] < MaxThres) {
            factor++;
        }
    }
    return fibo[factor - 1];
}
Listing A.1. Function for calculating how many scanning periods to skip. The input
parameter state is 0 if a new device is found and 1 otherwise. More about
the FIBO algorithm is in Appendix B.
A.2 Calculating LV (Sec. 4.3.2)
public int calLvScore(TreeMap<Integer, EncLocation> userMap,
        float sumCU2, float sumDU2) { // score is calculated w.r.t. userMap
    EncLocation l1 = null, l2 = null;
    float sumCU1 = 0, sumDU1 = 0, prodC = 0, prodD = 0;
    Collection c = locMap.values();
    Iterator itr = c.iterator();
    while (itr.hasNext()) {
        l1 = (EncLocation) itr.next();
        if ((l2 = userMap.get(l1.getLocId())) == null) {
            Log.e(TAG, "EncUser: userMap is missing values present in locMap .. impossible");
            return -1;
        }
        sumCU1 += (float) l1.getCount() * (float) l1.getCount();
        sumDU1 += (float) l1.getDuration() * (float) l1.getDuration();
        prodC += (float) l1.getCount() * (float) l2.getCount();
        prodD += (float) l1.getDuration() * (float) l2.getDuration();
    }
    score[2] = (float) (prodC / Math.sqrt(sumCU1 * sumCU2));
    score[3] = (float) (prodD / Math.sqrt(sumDU1 * sumDU2));
    return 0;
}
Listing A.2. Function that calculates the LV values for a user. 'userMap' contains
location-visit data for the owner of the device and 'locMap' contains the
encounter information along with the location for a particular encountering
device.
APPENDIX B
ENERGY EFFICIENT DEVICE DISCOVERY
Efficient use of energy is essential for always-on mobile applications such
as iTrust. We have looked into some aspects of this problem and developed an
energy-efficient scanner for iTrust, as discussed earlier. We use this space to provide
more details about our technique.
B.1 Available Directions
Below are some of the directions that can be used to design energy-efficient
device discovery for both Bluetooth and Wi-Fi. In each of the methods the core idea is to
avoid or reduce scanning when no new devices are being discovered. The challenge,
however, is not to miss any new devices.
1. Use the current scan response to determine the next scanning time.
2. Use temporal locality: use the weekly pattern to predict the number of encounters
per hour of the week; the system has to maintain a time table of 7 days x 24 hours.
3. Use spatial locality: use location information to predict encounters; a new location
may need aggressive scans.
Since the scanning process is very similar in Bluetooth and Wi-Fi, any technique
developed for Bluetooth can be used for Wi-Fi and vice versa: the scanning
characteristics of Bluetooth are similar to those of Wi-Fi, so skipping scans in Bluetooth
has an equivalent effect on Wi-Fi. Details about scanning are explained in [31].
B.2 Evaluation Techniques
Through the deployment of iTrust, we have collected at least one month of traces
from 20 users, and some of the users have used iTrust for more than one year. These
traces include both Bluetooth and Wi-Fi scans performed at 100-second intervals. For
the evaluation and comparison of energy-efficient methods, we propose to use these
traces as the ground truth. An energy-efficient algorithm takes these traces as
input and produces an output trace based on the algorithm. By comparing the input
and output traces and the number of scans performed, we can compare the
effectiveness of the energy-efficient algorithms.
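The trace-replay evaluation described above can be sketched as follows. This is our reconstruction, not the evaluator's published source: the `Scan` representation, the `skipPolicy` interface, and the metric definitions are assumptions chosen to match the prose (skipped rounds repeat the last scan; error is the fraction of device sightings missing from the output trace; saving is the fraction of scans avoided).

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Function;

// Hypothetical sketch of the trace-replay evaluation: the ground-truth trace
// (one scan per 100 s round) is fed to a scanning policy that decides how
// many upcoming rounds to skip; skipped rounds repeat the last scan result.
public class TraceReplay {

    // skipPolicy maps the scan state (0 = new device found, 1 = otherwise)
    // to the number of upcoming scan rounds to skip.
    public static double[] evaluate(List<Set<String>> groundTruth,
                                    Function<Integer, Integer> skipPolicy) {
        List<Set<String>> output = new ArrayList<>();
        Set<String> seen = new HashSet<>();
        Set<String> lastScan = new HashSet<>();
        int skip = 0, scansPerformed = 0;
        for (Set<String> round : groundTruth) {
            if (skip > 0) {              // no scan: assume last result repeats
                skip--;
                output.add(lastScan);
                continue;
            }
            scansPerformed++;            // scan actually executed
            lastScan = round;
            output.add(round);
            boolean foundNew = !seen.containsAll(round);
            seen.addAll(round);
            skip = skipPolicy.apply(foundNew ? 0 : 1);
        }
        // accuracy loss: % of ground-truth sightings absent from the output
        long truthSightings = groundTruth.stream().mapToLong(Set::size).sum();
        long matched = 0;
        for (int i = 0; i < groundTruth.size(); i++) {
            for (String d : groundTruth.get(i)) {
                if (output.get(i).contains(d)) matched++;
            }
        }
        double errorPct = 100.0 * (truthSightings - matched) / truthSightings;
        double savingPct = 100.0 * (groundTruth.size() - scansPerformed)
                                 / groundTruth.size();
        return new double[]{errorPct, savingPct};
    }
}
```

Any of the algorithms in Sec. B.3 can be plugged in as the policy, which lets them all be compared on identical input traces.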
B.3 Current Progress
Currently, we have only looked into algorithms that use the current scan response to
determine the next scanning time period. These methods generally work by looking at
the number of devices (new and already seen) found in the current scan to determine
the sleep interval before the next scan. Researchers have developed several algorithms,
including STAR [80] and others [31]. Only STAR is evaluated using real traces; the rest
use some kind of artificial traces. Hence, to test our proposed algorithms, we have
compared against only the STAR algorithm. Our two proposed algorithms are: one
based on multiplicative increase and multiplicative decrease (MIMD) (similar to [31]) and
another based on the growth rate of the Fibonacci series.
STAR Algorithm: estimates the arrival rate based on the number of new devices
detected in the current scan round and also increases the scan rate if the current time is
later than 8 am.
MIMD Algorithm (EE): doubles the current scan time interval if no new device is
found (with an upper bound on the interval). On detecting a new device, the scan time
interval is reduced to the minimum possible period.
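The MIMD adaptation can be sketched as below. The class and method names are ours, not from the iTrust source; the 100-second minimum period comes from the scanner description above, and the cap parameter corresponds to the EE4/EE8/EE16 variants (4, 8 or 16 times the minimum period).

```java
// Sketch of the MIMD (EE) scan-interval adaptation: double the interval when
// a scan finds no new device, reset to the minimum when a new device appears.
public class MimdScanner {
    public static final int MIN_PERIOD_SEC = 100; // base scan period of iTrust

    private final int maxPeriodSec;
    private int periodSec = MIN_PERIOD_SEC;

    // cap = 4 gives the EE4 variant, cap = 8 gives EE8, and so on
    public MimdScanner(int cap) {
        this.maxPeriodSec = cap * MIN_PERIOD_SEC;
    }

    // Called after each scan; returns the sleep interval before the next scan.
    public int nextInterval(boolean newDeviceFound) {
        if (newDeviceFound) {
            periodSec = MIN_PERIOD_SEC;                        // scan aggressively again
        } else {
            periodSec = Math.min(periodSec * 2, maxPeriodSec); // back off, bounded
        }
        return periodSec;
    }
}
```

The cap is what trades accuracy for savings: a larger cap lets the device sleep longer in stable environments but risks missing short encounters.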
Fibonacci Series based Algorithm (FIBO): uses the Fibonacci series to decide
the number of scan cycles to skip (otherwise similar to EE). The growth is 0, 1, 1, 2, 3, 5,
8, 13, 21 and so on.
We have compared the above three algorithms for efficiency (scans saved) and
accuracy (encounters not missed). We measure accuracy by counting the occurrences
of each device in the trace produced by each method. To make the efficiency metric
independent of accuracy, we assume that during the intervals when no scanning is
performed, each algorithm reports that the last encountered devices are still being seen
(which is why it skipped scanning).
Table B-1 shows the accuracy results from running the different energy-saving
algorithms. We note that our scheme EE4 outperforms both STAR and FIBO and also
shows a lower standard deviation. The comparison of efficiency is in Tab. B-2. EE16
gives the best savings, followed by EE8, FIBO16 and then STAR.
However, we want an algorithm that is both accurate and efficient at the same
time, so we have devised a new metric called the 's/e' ratio: the ratio between the
efficiency (saving) and the accuracy loss (error). A scheme that provides more saving
with less error has a higher 's/e' ratio than one providing similar saving but a worse error
rate. To choose an algorithm, one may first decide on the savings needed (based on the
current energy budget) and then choose the algorithm that performs best on 's/e'. The
's/e' values are listed in Tab. B-3.
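As a concrete check of the metric, dividing each algorithm's saving (Tab. B-2) by its accuracy loss (Tab. B-1) reproduces the 's/e' values in Tab. B-3 to within rounding:

```java
// 's/e' ratio: scan saving (%) divided by accuracy loss (%).
public class SeRatio {
    public static double seRatio(double savingPct, double errorPct) {
        return savingPct / errorPct;
    }
}
```

For example, EE4 gives 57.81 / 7.45 = 7.76 and EE16 gives 70.81 / 13.65 = 5.19, matching the table.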
Table B-1. Accuracy loss using traces for 20 users. EE4 means 4 times the minimum
scan period is the upper bound of the scan interval; similarly, in EE8 the
upper bound on the skip period is 8 times. This result used Bluetooth traces
only. Lower values are better.
Algo.    Average  Std. Dev.
STAR     9.97     7.49
EE4      7.45     4.38
EE8      10.45    5.84
EE16     13.65    6.81
FIBO4    8.24     3.90
FIBO8    8.58     3.95
FIBO12   10.93    5.42
FIBO16   12.26    6.04
B.3.1 Combining Wi-Fi And Bluetooth Scanning
We now present the results of combining Wi-Fi and Bluetooth scanning with the
energy-efficient scanner. The scan time interval now depends on the results of both
the Wi-Fi and Bluetooth scans. Wi-Fi scanning has the following properties that differ
from Bluetooth scans: i. the majority of access points are stationary; ii. it is possible to miss out
Table B-2. Scan efficiency using traces for 20 users. EE4 means 4 times the minimum
scan period is the upper bound of the scan interval; similarly, in EE8 and
EE16 it is 8 and 16 times respectively. This result used Bluetooth traces only.
Higher values are better.
Algo.    Average  Std. Dev.
STAR     64.64    8.22
EE4      57.81    9.56
EE8      66.45    11.56
EE16     70.81    13.12
FIBO4    60.28    11.68
FIBO8    62.79    12.86
FIBO12   64.87    12.80
FIBO16   66.11    14.40
Table B-3. s/e ratio for STAR, MIMD and FIBO algorithms
Algo.    s/e
STAR     6.49
EE4      7.76
EE8      6.36
EE16     5.19
FIBO4    7.31
FIBO8    7.32
FIBO12   5.93
FIBO16   5.39
on an AP even though it has strong signal strength at the location; and iii. the range of
a Wi-Fi AP is much larger than that of a Bluetooth device. Using these properties of
Wi-Fi, we made the matching of scanned APs less stringent: if the number of APs
common to two scans is greater than the number of APs seen in only one of them, and
the number of common APs is greater than 0, we consider the device to be at the same
location (same set of APs seen). This is slightly different from Bluetooth scanning,
where exactly the same set of devices is required for two scans to be considered the
same. Also, due to the application requirements of iTrust, we cannot let Wi-Fi and
Bluetooth work independently of each other.
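The AP-matching rule described above can be sketched as follows. This is our reconstruction from the prose, reading "distinct APs" as the APs seen in only one of the two scans; the iTrust source may differ in details.

```java
import java.util.HashSet;
import java.util.Set;

public class ApMatcher {
    // Two Wi-Fi scans are treated as the same location when they share more
    // APs than they differ in, and share at least one AP. Bluetooth, by
    // contrast, requires the two scans to contain exactly the same devices.
    public static boolean sameLocation(Set<String> scanA, Set<String> scanB) {
        Set<String> common = new HashSet<>(scanA);
        common.retainAll(scanB);                 // APs seen in both scans
        Set<String> distinct = new HashSet<>(scanA);
        distinct.addAll(scanB);
        distinct.removeAll(common);              // APs seen in only one scan
        return common.size() > 0 && common.size() > distinct.size();
    }
}
```

The looser rule absorbs the fact that an AP with a strong signal may still be missed in an individual Wi-Fi scan.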
B.4 Conclusion
We note that the STAR, EE4 and FIBO4 algorithms perform closely, but the EE4
algorithm is a clear winner in terms of the 's/e' ratio, followed by the FIBO algorithm. We
note that the EE and FIBO algorithms are parametric. In the event that accuracy can be sacrificed to save
Table B-4. Combining Wi-Fi and Bluetooth scanning
Algo.    Error   Saving  s/e
STAR     11.47   65.84   5.74
EE4      7.45    54.42   7.30
EE8      10.94   63.03   5.76
EE16     14.66   67.54   4.60
FIBO4    8.23    56.75   6.89
FIBO8    9.09    59.27   6.52
FIBO16   11.76   62.20   5.29
energy, a higher threshold for the scan time interval can be selected, which is not
possible in the STAR algorithm; thus EE and FIBO provide a mechanism for selecting
the grade of efficiency. The current implementation of iTrust uses the EE4 and FIBO4
algorithms for performing energy-efficient scanning.
APPENDIX C
USER BEHAVIOR ANALYSIS
Below are results from all the areas we could identify at the universities.
C.0.1 Spatial Distribution
The details are in Tab. C-2 and Tab. C-1.
C.0.2 Temporal Distribution
Details are in Tab. C-4 and Tab. C-3.
Table C-1. Spatial distribution of users at U2
Area                       Male-Oct07  Female-Oct07  Male-Nov07  Female-Nov07  Male-Mar08  Female-Mar08  Male-Apr08  Female-Apr08
Administration             1152        1140          1096        1136          1254        1327          1507        1656
Agriculture and biology    197         113           180         127           339         264           471         343
Architecture               543         587           464         569           589         651           708         784
Biology                    330         331           339         337           411         435           524         515
Bookstore                  172         125           145         128           247         176           333         264
Economics                  846         591           907         666           965         704           1118        867
Cafeteria food             278         223           263         205           301         248           332         287
Computer Engineering       975         763           930         789           1080        841           1300        1097
Fine Arts                  488         543           410         505           524         610           642         787
Fraternity                 254         84            246         113           268         123           326         184
Health sport human         556         460           495         450           562         598           679         806
Infirmary                  151         124           161         152           203         202           159         148
Communication              406         418           411         475           445         538           545         659
Law                        566         511           558         523           522         495           696         656
Music                      122         105           111         70            230         197           330         308
Philosophy and Stati       94          109           119         124           121         92            152         163
Psychology                 78          71            80          83            87          77            116         106
Recreation food cafeteria  192         273           137         254           154         302           111         263
Social Science             818         815           818         880           858         833           1043        1042
Sorority                   271         969           299         959           331         991           529         1138
Space science and CNS      321         229           282         258           377         321           224         236
Sport recreation           119         121           85          103           131         136           148         131
Theater                    121         139           121         146           131         143           155         211
University Auditorium      43          41            45          37            48          48            61          58
Engineering                1900        895           1784        888           2033        1139          2437        1371
Library                    3767        3749          3415        3667          3556        3903          4497        4968
Table C-2. Spatial distribution of users at U1
Area           Male-Feb2006  Female-Feb2006  Male-Oct2006  Female-Oct2006  Male-Feb2007  Female-Feb2007
Accounts       11            5               21            22              15            14
Admin          7             9               13            16              7             10
Chemistry      9             8               9             7
Communication  96            81              115           109             19            48
Economics      37            26              69            58              56            36
Engineering    26            35              37            37              44            31
Law            3             1               5             2               0             3
Medicines      6             3               6             8               7             15
Music          9             11              7             12              4             10
Residence      42            48              53            47              52            49
Social         88            113             143           161             110           128
Sports         16            19              12            21              4             11
Table C-3. Average duration of users at U2
Area                       Male-Oct07  Female-Oct07  Male-Nov07  Female-Nov07  Male-Mar08  Female-Mar08  Male-Apr08  Female-Apr08
Administration             2830.54     2674.35       2708.18     2515.91       3005.99     2735.44       2535.49     2756.56
Agriculture and biology    5496.84     2835.95       4605.61     2804.1        6646.08     5334.13       4045.3      3166.2
Architecture               3102.69     4472.13       3819.61     5723.87       3990.28     4247.61       3774.16     4221.17
Biology                    2855.78     3770.86       3259.26     3801.92       2643.61     2385.45       2397.15     2471.11
Bookstore                  1425.17     1717.32       1720.15     737.4         1568.72     1398.41       1238.44     1485.88
Communication              3062.22     2974.99       3240.94     3067.82       2652.34     2693          2830.52     2758.33
Cafeteria food             1322.97     1755.13       1779.43     1332.48       1617.81     1283.37       1655.4      1546.05
Computer Engineering       2226.74     2017.67       2387.85     2070.76       2266.88     1735.37       2613.42     2038.3
Fine Arts                  3723.13     3234.67       4788.84     3702.77       3945.24     3509.83       4439.7      3519.1
Fraternity                 6102.32     3132.25       5627.62     2724.89       6250.4      2825.14       6041.43     2275.41
Health sport human         2021.73     2345.55       1719.18     2161.11       2063.47     1895.39       2083.5      2004.84
Infirmary                  851.93      1702.52       885.8       1224.36       978.22      1114.76       1392.41     1140.61
Journalism                 1895.75     2125.34       2288.49     2179.88       1976.58     1801.81       2143.43     1880.18
Law                        3191.82     3212.97       3430        3614.9        3849.59     3760.19       4555.09     4695.18
Music                      1911.7      1711.29       2565.34     1851.83       1767.29     1167.49       1764.87     1210.22
Philosophy and Stati       4464.41     2168.24       2484.02     2475.97       2923.91     1469          3576.86     2241.14
Psychology                 4317.27     5591.4        3740.35     4841.61       4541.85     3262.46       3415.07     4058.18
Recreation food cafeteria  3346.32     3949.6        3977.89     4763.73       2754.9      2955.62       2528.34     3130.86
Social Science             1513.08     1809.37       1582.34     1858.61       1728.01     1643.11       1563.03     1736.9
Sorority                   3681.18     5881.25       4396.69     5658.94       2035.76     5035.05       2131.98     5171.22
Space science and CNS      2200.49     1492.75       2082.87     1681.06       1819.9      1423.21       3427.35     1895.1
Sport recreation           1489.49     1683.24       2230.28     1600.73       1064.57     1763.8        941.31      1141.93
Theater                    1548.75     1810.34       1791.42     1658.96       2434.57     2035.92       2377.37     2109.12
University Auditorium      3088.45     3131.85       2902.46     4571.47       1362.95     1902.15       1497.05     1852.46
Engineering                2696.45     2361.65       2693.97     2433.75       2664.03     2167.38       2825.3      2486.6
Library                    3953.34     4156.35       4168.5      4531.48       3875.23     4067.77       4388.33     4618.98
Table C-4. Average duration of users at U1
Area           Male-Feb2006  Female-Feb2006  Male-Oct2006  Female-Oct2006  Male-Feb2007  Female-Feb2007
Accounts       1108          636             956.65        1114.98         484           1206
Admin          835           1612            346.89        1162.18         536           432
Chemistry      1806          1411            842.24        896             900           720
Communication  1862          2007            1474.38       1417.27         1838          2758
Economics      2044          1587            1826.88       2204.25         1729          2745
Engineering    2797          1834            2341.09       2380.02         2181          782
Law            1545          2096            4776.09       1468.76         91            3528
Medicine       2860          963             1562.8        1940.78         1723          2450
Music          2354          1395            1090.04       686.81          493           534
Residence      2341          1510            1491.92       1185.63         1861          1401
Social         2341          2787            2162.14       2336.53         2008          2243
Sports         1652          2191            636.22        895.02          650           594
APPENDIX D
SURVEY FORM - ITRUST VALIDATION
SURVEY: Encounter-based Trust
Udayan Kumar and Ahmed Helmy
{ukumar, helmy}@cise.ufl.edu, University of Florida, Gainesville.
[Assume your device has enough battery and computation power. Also, your device runs a Bluetooth scanner program that records the number, duration and location of encounters with other devices. An encounter occurs when two devices appear in the radio range of each other.]
Please rate (on a scale of 1 through 10) your willingness to cooperate with other peer
devices to set up an Ad Hoc or Delay Tolerant Network (DTN).
1. If your device does not have any information about other devices (Strangers)?
2. If your device identifies another device as frequently-encountered (e.g., more than10 times in the last week)?
3. If your device identifies the other device as encountered for a long duration (e.g., for more than 5 hrs total in the last week) but infrequently (e.g., less than 4 times in the last week)?
4. If your device identifies the other device as encountered with both high frequency and long durations?
5. If the encounter locations are visited frequently by your device.
6. If the encounter locations have restricted access (e.g., mobicom or NSF).
7. Rate each of the factors that would most affect your willingness to accept amessage:
a. Frequency of encounters
b. Duration of encounters
c. Location visited
8. What do you think is the most important combination of the above factors to have you trust others to cooperate in an Ad Hoc or DTN setting? (e.g., do you need all factors (frequency, duration, locations), or are encounters in restricted locations enough?)
Other Comments:
Your participation in this study is completely voluntary. There are no anticipated risks, compensation or other direct benefits to you as a participant in this study. You are
free to withdraw your consent to participate and may discontinue your participation in the study at any time without consequence.
REFERENCES
[1] Supplementary Information. Available from: https://sites.google.com/site/confanon/.
[2] Popular baby names, September 2007. Available from: http://www.ssa.gov/OACT/babynames/.
[3] UNC/FORTH: Repository of traces and models for wireless networks, SyslogDataset #2, August 2007. Available from: http://netserver.ics.forth.gr/datatraces/.
[4] The Passive Measurement and Analysis Project, June 2008. Available from:http://pma.nlanr.net/.
[5] Predict: Protected Repository for the defense of the Infrastructure Against CyberAttacks, June 2008. Available from: http://www.predict.org.
[6] The Skitter Project, June 2008. Available from: http://www.caida.org/tools/measurement/skitter/.
[7] CRAWDAD, August 2008. Available from: http://crawdad.cs.dartmouth.edu/data.php.
[8] The metrosense project, Feb 2011. Available from: http://metrosense.cs.dartmouth.edu/projects.html.
[9] GasBuddy, 2012. Available from: http://gasbuddy.com/.
[10] Participatory Sensing, 2012. Available from: http://participatorysensing.org/.
[11] R. Albert and A.-L. Barabási. Statistical mechanics of complex networks. Rev. Mod.Phys., 74:47–97, 2002.
[12] Mark Allman and Vern Paxson. Issues and etiquette concerning use of sharedmeasurement data. In IMC ’07: Proceedings of the 7th ACM SIGCOMM conferenceon Internet measurement, pages 135–140, New York, NY, USA, 2007. ACM.
[13] Eitan Altman. Competition and cooperation between nodes in delay tolerantnetworks with two hop routing. In NET-COOP, 2009.
[14] Eitan Altman, Arzad A. Kherani, Pietro Michiardi, and Refik Molva. Non-cooperative forwarding in ad-hoc networks. Technical report,PIMRC, 2004.
[15] Fan Bai and Ahmed Helmy. Chapter 1 A SURVEY OF MOBILITY MODELS inWireless Adhoc Networks. Springer, 2006.
[16] Fan Bai, Narayanan Sadagopan, and Ahmed Helmy. The IMPORTANT frameworkfor analyzing the impact of mobility on performance of routing protocols for adhocnetworks. AdHoc Networks Journal, 1:383–403, 2003.
[17] Magdalena Balazinska and Paul Castro. Characterizing mobility and network usagein a corporate wireless local-area network. In ACM MobiSys, 2003.
[18] Christopher M. Bishop. Pattern Recognition and Machine Learning (InformationScience and Statistics). Springer, August 2006.
[19] Sonja Buchegger and Jean-Yves Le Boudec. A robust reputation system for mobilead-hoc networks. In P2PEcon, 2003.
[20] Sonja Buchegger and Jean-Yves Le Boudec. Self-Policing Mobile Ad-HocNetworks by Reputation. IEEE Comm. Mag., 43(7):101, 2005.
[21] Ronald S. Burt. Decay functions. Social Networks, 22(1):1 – 28, 2000.
[22] Levente Buttyan and et al. Barter-based cooperation in delay-tolerant personalwireless networks. In WoWMoM, 2007.
[23] Varun Chandola, Arindam Banerjee, and Vipin Kumar. Anomaly detection: Asurvey. ACM Comput. Surv., 2009.
[24] Chia-Hsin Owen Chen and et al. Gangs: gather, authenticate ’n group securely. InMobiCom ’08, 2008.
[25] G. Chen, H. Huang, and M. Kim. Mining frequent and periodic association patterns.Computer Science TR2005-550, Dartmouth College, 2005.
[26] Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, and Clifford Stein.Introduction to Algorithms (second edition ed.), pages 350–355. MIT Press andMcGraw-Hill, 2001.
[27] Scott E. Coull, Charles V. Wright, Fabian Monrose, Michael P. Collins, andMichael K. Reiter. Playing devils advocate: Inferring sensitive information fromanonymized network traces. In Proc. of the 14th Annual Network and DistributedSystem Security Symposium, pages 35–47, 2007.
[28] Mark E. Crovella and Azer Bestavros. Self-similarity in world wide web traffic:Evidence and possible causes. IEEE/ACM Transactions on Networking, 5:835–846,1997.
[29] Jon Crowcroft, Richard Gibbens, Frank Kelly, and Sven Ostring. Modellingincentives for collaboration in mobile ad hoc networks. Performance Evaluation, 57,2004.
[30] Ruby Roy Dholakia and et al. The Internet Encyclopedia, chapter Gender andInternet Usage. Wiley, 2003.
[31] Catalin Drula, Cristiana Amza, Franck Rousseau, and Andrzej Duda. Adaptiveenergy conserving algorithms for neighbor discovery in opportunistic bluetoothnetworks. IEEE J.Sel. A. Commun., 25(1), January 2007.
[32] A.C. Gallagher and T.H. Chen. Estimating age, gender, and identity using firstname priors. In CVPR, 2008.
[33] Elizabeth Gray, Jean-Marc Seigneur, Yong Chen, and Christian Jensen. Trustpropagation in small worlds. In Trust management, 2003.
[34] Ben Greenstein, Ramakrishna Gummadi, Jeffrey Pang, Mike Y. Chen, TadayoshiKohno, Srinivasan Seshan, and David Wetherall. Can ferris bueller still have his dayoff? protecting privacy in the wireless era. In HOTOS’07: Proceedings of the 11thUSENIX workshop on Hot topics in operating systems, pages 1–6, Berkeley, CA,USA, 2007. USENIX Association.
[35] T. Henderson, D. Kotz, and I. Abyzov. The changing usage of a maturecampus-wide wireless network. In ACM MobiCom ’04, September 2004.
[36] W. Hsu and A. Helmy. On modeling user associations in wireless lan traces onuniversity campuses. In WiNMee, 2006.
[37] W. Hsu, T. Spyropoulos, K. Psounis, and A. Helmy. Modeling time-variant usermobility in wireless mobile networks. In Proc. IEEE INFOCOM, May 2007.
[38] Weijen Hsu, Debojyoti Dutta, and Ahmed Helmy. Mining behavioral groups in largewireless lans. In MobiCom, 2007.
[39] Weijen Hsu, Debojyoti Dutta, and Ahmed Helmy. Profile-Cast: Behavior-awaremobile networking. In IEEE WCNC, 2008.
[40] Weijen Hsu and Ahmed Helmy. On nodal encounter patterns in wireless lan traces.In WiNMee, 2006.
[41] Weijen Hsu and Ahmed Helmy. MobiLib, June 2008. Available from: http://nile.cise.ufl.edu/MobiLib/.
[42] Peter Hwang and Willem P. Burgers. Properties of trust: An analytical view.Organizational Behavior and Human Decision Processes, 69(1):67–73, January1997.
[43] Kalervo Jarvelin and Jaana Kekalainen. Cumulated gain-based evaluation of irtechniques. ACM Trans. Inf. Syst., October 2002.
[44] T. Karagiannis, A. Broido, N. Brownlee, K.C. Claffy, and M. Faloutsos. Is p2p dyingor just hiding? [p2p traffic measurement]. GLOBECOM, 2004.
[45] Leonard Kaufman and Peter J. Rousseeuw. Finding Groups in Data: An Introduc-tion to Cluster Analysis. Wiley-Interscience, March 1990.
[46] David Kotz, Tristan Henderson, and Ilya Abyzov. CRAWDAD data set dartmouth/campus (v. 2007-02-08). Downloaded fromhttp://crawdad.cs.dartmouth.edu/dartmouth/campus, February 2007.
[47] Udayan Kumar and Ahmed Helmy. User classification and feature extraction fromwlan traces: A gender-based case study (detailed technical report). Available from:http://www.cise.ufl.edu/~ukumar/techreport-gender.pdf.
[48] Udayan Kumar and Ahmed Helmy. Human behavior and challenges of anonymizingWLAN traces. In IEEE GLOBECOM, 2009.
[49] Udayan Kumar, Nikhil Yadav, and Ahmed Helmy. Gender-basedgrouping of mobile student societies. In MODUS Workshop, IPSN 2008,http://www.motorola.com/innovators/ModusWorkshop/Gender_Based.pdf,2008.
[50] Anthony LaMarca, Yatin Chawathe, Sunny Consolvo, Jeffrey Hightower, IanSmith, James Scott, Timothy Sohn, James Howard, Jeff Hughes, Fred Potter,Jason Tabert, Pauline Powledge, Gaetano Borriello, and Bill Schilit. Place lab:device positioning using radio beacons in the wild. In Proceedings of the Thirdinternational conference on Pervasive Computing, 2005.
[51] N.D. Lane, E. Miluzzo, Hong Lu, D. Peebles, T. Choudhury, and A.T. Campbell. Asurvey of mobile phone sensing. Communications Magazine, IEEE, 48(9), sept.2010.
[52] David Lazer, Alex Pentland, Lada Adamic, Sinan Aral, Albert-László Barabási, DevonBrewer, Nicholas Christakis, Noshir Contractor, James Fowler, Myron Gutmann,Tony Jebara, Gary King, Michael Macy, Deb Roy, and Marshall Van Alstyne.Computational social science. Science, 323(5915):721–723, 2009.
[53] Qinghua Li, Sencun Zhu, and Guohong Cao. Routing in socially selfish delaytolerant networks. In Infocom, 2010.
[54] Kaisen Lin, Aman Kansal, Dimitrios Lymberopoulos, and Feng Zhao.Energy-accuracy trade-off for continuous mobile device location. In MobiSys,2010.
[55] Yue-Hsun Lin and et al. Spate: small-group pki-less authenticated trustestablishment. In MobiSys, 2009.
[56] Anders Lindgren, Avri Doria, and Olov Schelen. Probabilistic routing inintermittently connected networks. LNC, pages 239–254, 2004.
[57] Ashwin Machanavajjhala, Johannes Gehrke, and Daniel Kifer. l-diversity: Privacybeyond k-anonymity. pages 24–24, April 2006.
[58] Sergio Marti, T. J. Giuli, Kevin Lai, and Mary Baker. Mitigating routing misbehaviorin mobile ad hoc networks. In Mobicom, 2000.
[59] Jonathan M. McCune, Adrian Perrig, and Michael K. Reiter. Seeing Is Believing:using camera phones for human authentication. Int. J. Secur. Netw., 4(1/2):43–56,2009.
[60] Miller Mcpherson, Lynn S. Lovin, and James M. Cook. Birds of a feather:Homophily in social networks. Annual Review of Sociology, 27(1):415–444,2001.
[61] Pietro Michiardi and Refik Molva. Simulation-based analysis ofsecurity exposures in mobile ad hoc networks. In European Wireless Conference,2002.
[62] Greg Minshall. Tcpdpriv, 1996.
[63] Jeffrey C. Mogul and Martin Arlitt. Sc2d: an alternative to trace anonymization. InMineNet ’06: Proceedings of the 2006 SIGCOMM workshop on Mining networkdata, pages 323–328, New York, NY, USA, 2006. ACM.
[64] D. Moore, V. Paxson, S. Savage, C. Shannon, S. Staniford, and N. Weaver. Insidethe slammer worm. Security & Privacy, IEEE, 1(4):33–39, July-Aug. 2003.
[65] Mirco Musolesi and Cecilia Mascolo. A community based mobility model for ad hocnetwork research. In ACM REALMAN, 2006.
[66] Martin O’Connell and Gretchen E Gooding. The use of first names to evaluatereports of gender and its effect on the distribution of married and unmarried couplehouseholds. In Population Association of America (PAA) 2006 Annual Meeting,2006.
[67] A. Panagakis, A. Vaios, and I. Stavrakakis. On the Effects of Cooperation in DTNs.In COMSWARE, 2007.
[68] Jeffrey Pang, Ben Greenstein, Ramakrishna Gummadi, Srinivasan Seshan, andDavid Wetherall. 802.11 user fingerprinting. In MobiCom ’07: Proceedings of the13th annual ACM international conference on Mobile computing and networking,pages 99–110, New York, NY, USA, 2007. ACM.
[69] Ruoming Pang, Mark Allman, Vern Paxson, and Jason Lee. The devil and packettrace anonymization. SIGCOMM Comput. Commun. Rev., 36(1):29–38, 2006.
[70] Vikram Srinivasan, Pavan Nuggehalli, Carla F.Chiasserini, and Ramesh R. Rao. Cooperation in wireless ad hoc networks. InIEEE Infocom, 2003.
[71] Kiran K. Rachuri and et al. Emotionsense: a mobile phones based adaptiveplatform for experimental social psychology research. In Ubicomp, 2010.
[72] Glenn Shafer. Perspectives on the theory and practice of belief functions. Int.Journal of Approximate Reasoning, 1990.
[73] Katie Shilton, Jeffrey A. Burke, Debra Estrin, Mark Hansen, and Mani Srivastava.Participatory privacy in urban sensing. In MODUS: International Workshop onMobile Device and Urban Sensing, 2008.
[74] Douglas C. Sicker, Paul Ohm, and Dirk Grunwald. Legal issues surroundingmonitoring during network research. In IMC ’07: Proceedings of the 7th ACMSIGCOMM conference on Internet measurement, pages 141–148, New York, NY,USA, 2007. ACM.
[75] Neil Spring, Ratul Mahajan, David Wetherall, and Thomas Anderson. Measuringisp topologies with rocketfuel. IEEE/ACM Trans. Netw., 12(1):2–16, 2004.
[76] Latanya Sweeney. k-anonymity: a model for protecting privacy. InternationalJournal on Uncertainty, Fuzziness and Knowledge-based Systems, 10(5):557–570,March 2002.
[77] Sapon Tanachaiwiwat and Ahmed Helmy. Worm Propagation and Interactionin Mobile Networks in Handbook on Security and Networks. World ScientificPublishing Co., 2010.
[78] G.S. Thakur, M. Sharma, and A. Helmy. Shield: Social sensing and help inemergency using mobile devices. In GLOBECOM, pages 1–5. IEEE, 2010.
[79] Amin Vahdat and David Becker. Epidemic routing for partially-connected ad hocnetworks. Technical report, Duke University, 2000.
[80] Wei Wang, Vikram Srinivasan, and Mehul Motani. Adaptive contact probingmechanisms for delay tolerant applications. In Proceedings of the 13th annual ACMinternational conference on Mobile computing and networking, MobiCom, 2007.
[81] Jun Xu, Jinliang Fan, Mostafa H. Ammar, and Sue B. Moon. Prefix-preservingip address anonymization: Measurement-based security evaluation and a newcryptography-based scheme. In Computer Networks, pages 280–289, 2002.
[82] Bojan Zdrnja, Nevil Brownlee, and Duane Wessels. Passive monitoring of dnsanomalies. In DIMVA, pages 129–139, 2007.
[83] Kai Zeng, Kannan Govindan, and Prasant Mohapatra. Non-cryptographicauthentication and identification in wireless networks. Wireless Communications,2010.
[84] Sheng Zhong, Jiang Chen, and Richard Yang. Sprite: A Simple, Cheat-Proof,Credit-Based System for Mobile Ad-Hoc Networks. In INFOCOM, 2002.
BIOGRAPHICAL SKETCH
Udayan Kumar received his B.Tech. degree from DA-IICT, Gandhinagar, India, and
his M.S. degree in computer engineering from the University of Florida. He started his
Ph.D. in computer engineering at the University of Florida in 2008. His research interests
include understanding users' social behavior from network traces and utilizing the
behavior patterns to develop new insights and applications.