gdc: group discovery using co-location traces

GDC: Group Discovery using Co-location Traces. Steve Mardenfeld, Daniel Boston, Susan Juan Pan, Quentin Jones, Adriana Iamnitchi, Cristian Borcea. Department of Computer Science, New Jersey Institute of Technology; Department of Information Systems, NJIT

Post on 23-Feb-2016

Page 1: GDC: Group Discovery using Co-location Traces

GDC: Group Discovery using Co-location Traces

Steve Mardenfeld, Daniel Boston, Susan Juan Pan, Quentin Jones†, Adriana Iamnitchi‡, Cristian Borcea

Department of Computer Science, New Jersey Institute of Technology
†Department of Information Systems, NJIT
‡Department of Computer Science, USF

Page 2: GDC: Group Discovery using Co-location Traces

Physical Groups

Informally: groups of people that meet face to face
◦ Formal definition: Homans’ sociology book “The Human Group”
Groups can be used in social or socially aware applications
◦ Recommender systems: recommend concerts to people who go to concerts together
◦ Data forwarding in delay-tolerant ad hoc networks: give priority to members of the same group as the destination when selecting the next hop
How to detect groups automatically?

Page 3: GDC: Group Discovery using Co-location Traces

Group Detection Using Location Traces

Users carry mobile phones and upload location to central server
Server analyzes location traces to detect groups
In previous work, we developed an algorithm for group/place detection
◦ Achieved 96% accuracy with low false positives

Problems:
◦ Location privacy
◦ Battery power

Page 4: GDC: Group Discovery using Co-location Traces

GDC: Use Bluetooth Co-location Traces

Advantages
◦ Improved location privacy
◦ Low power consumption
◦ Practicality due to Bluetooth ubiquity in mobile phones
◦ Accuracy due to Bluetooth transmission range

[Figure: phones upload Bluetooth co-location records over the Internet]

User   Seen   Time
A      B      1:00
B      A      1:05
A      B      1:07
B      C      1:05
A      C      1:07

Page 5: GDC: Group Discovery using Co-location Traces

Challenges

Attendance at a group is variable
People may be merely passing near a group, not remaining part of it
Group members spend different lengths of time with the group
Sampling frequency and user mobility can affect data completeness
Each user may have a different perspective on the same meeting

Page 6: GDC: Group Discovery using Co-location Traces

Outline

GDC Algorithm
User Study Results
Distributed GDC
Conclusions

Page 7: GDC: Group Discovery using Co-location Traces

GDC in a Nutshell

Transform raw Bluetooth records into meeting records between pairs of users
Discover and record all combinations of users appearing at the same meeting (user clusters)
Resolve differences in user perspectives on shared clusters
Select all significant clusters and output as user groups

Page 8: GDC: Group Discovery using Co-location Traces

Creating Pair-wise Meeting Records

Raw Bluetooth records collected by each phone:

Time Stamp   User    User With
11:02:01     djb38   jp238
11:02:01     djb38   mak43
11:04:14     djb38   jp238
11:04:14     djb38   mak43
11:07:05     djb38   mak43

Time Stamp   User    User With
11:02:15     jp238   djb38
11:02:15     jp238   mak43
11:05:02     jp238   mak43
11:07:50     jp238   djb38
11:07:50     jp238   mak43

Time Stamp   User    User With
11:01:30     mak43   jp238
11:01:30     mak43   djb38
11:04:18     mak43   jp238
11:10:10     mak43   jp238

Records merged per user:

User mak43
Time Stamp   User With
11:01:30     jp238
11:01:30     djb38
11:02:01     djb38
11:02:15     jp238
11:04:14     djb38
11:04:18     jp238
11:05:02     jp238
11:07:05     djb38
11:07:50     jp238
11:10:10     jp238

User djb38
Time Stamp   User With
11:01:30     mak43
11:02:01     jp238
11:02:01     mak43
11:02:15     jp238
11:04:14     jp238
11:04:14     mak43
11:07:05     mak43
11:07:50     jp238

User jp238
Time Stamp   User With
11:01:30     mak43
11:02:01     djb38
11:02:15     djb38
11:02:15     mak43
11:04:14     djb38
11:04:18     mak43
11:05:02     mak43
11:07:50     djb38
11:07:50     mak43
11:10:10     mak43

Meeting records with MG = 5 min:

User mak43
User With   Start Time   End Time
jp238       11:01:30     11:10:10
djb38       11:01:30     11:07:05

User djb38
User With   Start Time   End Time
jp238       11:02:01     11:07:50
mak43       11:01:30     11:07:05

User jp238
User With   Start Time   End Time
mak43       11:01:30     11:10:10
djb38       11:02:01     11:07:05

Meeting records with MG = 2 ½ min:

User mak43
User With   Start Time   End Time
jp238       11:01:30     11:04:18
jp238       11:07:50     11:10:10
djb38       11:01:30     11:04:14

User djb38
User With   Start Time   End Time
jp238       11:02:01     11:04:14
mak43       11:01:30     11:04:14

User jp238
User With   Start Time   End Time
mak43       11:01:30     11:05:02
mak43       11:07:50     11:10:10
djb38       11:02:01     11:04:14

Decreasing Meeting Granularity (MG) from 5 min to 2 ½ min produces noticeable changes
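The effect of MG can be illustrated with a minimal Python sketch. It assumes that consecutive sightings of the same user are merged into one meeting as long as the gap between them does not exceed MG, and that a meeting made of a single sighting is dropped. Both rules, and the names `build_meetings` and `at`, are assumptions made for illustration and are not taken from the slides.

```python
from datetime import datetime, timedelta

def build_meetings(sightings, mg):
    """Merge sighting times into (start, end) meetings.

    A new meeting starts whenever the gap to the previous sighting
    exceeds the meeting granularity `mg` (assumed rule).
    """
    meetings = []
    start = prev = None
    for t in sorted(sightings):
        if start is None:
            start = prev = t
        elif t - prev <= mg:
            prev = t  # still the same meeting, extend its end
        else:
            if prev > start:  # assumption: drop single-sighting meetings
                meetings.append((start, prev))
            start = prev = t
    if start is not None and prev > start:
        meetings.append((start, prev))
    return meetings

def at(hms):
    """Parse an HH:MM:SS time stamp as used in the example tables."""
    return datetime.strptime(hms, "%H:%M:%S")

# Merged sightings between mak43 and jp238 from the example records
seen = [at(s) for s in ("11:01:30", "11:02:15", "11:04:18",
                        "11:05:02", "11:07:50", "11:10:10")]
coarse = build_meetings(seen, timedelta(minutes=5))    # one long meeting
fine = build_meetings(seen, timedelta(seconds=150))    # split into two
```

With MG = 5 min every gap is bridged, so the sketch yields a single meeting. With MG = 2 ½ min the 168-second gap between 11:05:02 and 11:07:50 splits the record in two.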

Page 9: GDC: Group Discovery using Co-location Traces

Creating User Clusters

Pair-wise meeting records (input):

User mak43
User With   Start Time   End Time
jp238       11:01:30     11:10:10
djb38       11:01:30     11:07:05

User djb38
User With   Start Time   End Time
jp238       11:02:01     11:07:50
mak43       11:01:30     11:07:05

User jp238
User With   Start Time   End Time
mak43       11:01:30     11:10:10
djb38       11:02:01     11:07:05

User clusters (output):

User mak43
Users With     Time Spent
jp238, djb38   00:05:35
jp238          00:08:40
djb38          00:05:35

User djb38
Users With     Time Spent
jp238, mak43   00:05:04
jp238          00:05:49
mak43          00:05:35

User jp238
Users With     Time Spent
djb38, mak43   00:05:04
djb38          00:05:04
mak43          00:08:40
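One way to obtain such per-user clusters is to intersect the pair-wise meeting intervals. The Python sketch below is an illustration under assumptions, not the authors' implementation: times are given in seconds after 11:00:00, and the function name `user_clusters` is hypothetical. It splits the timeline at every meeting boundary and credits each slice to every combination of users present in it.

```python
from collections import defaultdict
from itertools import combinations

def user_clusters(meetings):
    """meetings: list of (other_user, start_s, end_s) from one user's view.

    Returns {frozenset(users): seconds spent with all of them at once}.
    """
    bounds = sorted({t for _, s, e in meetings for t in (s, e)})
    spent = defaultdict(int)
    for lo, hi in zip(bounds, bounds[1:]):
        # Users whose meeting covers the whole slice [lo, hi]
        present = sorted({u for u, s, e in meetings if s <= lo and hi <= e})
        # Credit the slice to every non-empty combination of those users
        for size in range(1, len(present) + 1):
            for combo in combinations(present, size):
                spent[frozenset(combo)] += hi - lo
    return dict(spent)

# mak43's meetings with MG = 5 min, in seconds after 11:00:00
mak43_view = [("jp238", 90, 610), ("djb38", 90, 425)]
clusters = user_clusters(mak43_view)
```

For mak43 this gives 335 s (00:05:35) with both users, 520 s (00:08:40) with jp238, and 335 s with djb38. Enumerating every combination is what makes this step exponential in the number of users present.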

Page 10: GDC: Group Discovery using Co-location Traces

Creating Global Clusters

Resolve Perspective Differences
◦ Use Minimum Group Time (MGT)
◦ Use Minimum Group Meeting Frequency (MGMF)

User clusters (input):

User mak43
Users With     Time Spent
jp238, djb38   00:05:35
jp238          00:08:40
djb38          00:05:35

User djb38
Users With     Time Spent
jp238, mak43   00:05:04
jp238          00:05:49
mak43          00:05:35

User jp238
Users With     Time Spent
djb38, mak43   00:05:04
djb38          00:05:04
mak43          00:08:40

Global clusters (output):

Cluster               Minimum Time   Min. Frequency
djb38, jp238, mak43   00:05:04       1
djb38, mak43          00:05:35       1
djb38, jp238          00:05:04       1
jp238, mak43          00:08:40       1
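A reading that is consistent with the example table is that each global cluster keeps the smallest time and the smallest meeting count reported by any of its members. The Python sketch below follows that reading. The function name `global_clusters` and the input layout are assumptions for illustration, and times are in seconds.

```python
def global_clusters(views):
    """views: {owner: {frozenset(others): (seconds, meetings)}}.

    Returns {frozenset(all members): (min seconds, min meetings)},
    taking the minimum over every member's perspective (assumed rule).
    """
    merged = {}
    for owner, clusters in views.items():
        for others, (secs, freq) in clusters.items():
            members = others | {owner}  # add the reporting user
            if members in merged:
                old_secs, old_freq = merged[members]
                secs, freq = min(secs, old_secs), min(freq, old_freq)
            merged[members] = (secs, freq)
    return merged

# Per-user clusters from the example, times in seconds, one meeting each
views = {
    "mak43": {frozenset({"jp238", "djb38"}): (335, 1),
              frozenset({"jp238"}): (520, 1),
              frozenset({"djb38"}): (335, 1)},
    "djb38": {frozenset({"jp238", "mak43"}): (304, 1),
              frozenset({"jp238"}): (349, 1),
              frozenset({"mak43"}): (335, 1)},
    "jp238": {frozenset({"djb38", "mak43"}): (304, 1),
              frozenset({"djb38"}): (304, 1),
              frozenset({"mak43"}): (520, 1)},
}
merged = global_clusters(views)
```

For the pair djb38 and jp238 the two perspectives disagree (349 s versus 304 s), and the sketch keeps the smaller value, 304 s (00:05:04).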

Page 11: GDC: Group Discovery using Co-location Traces

Selecting the User Groups

Identify and remove subgroups of significant groups
◦ Keep a subgroup if it meets double the time of the group that includes it

Cluster               Minimum Time
djb38, jp238, mak43   00:05:04
djb38, mak43          00:05:35
jp238, mak43          00:10:40

Group                 Min. Time
djb38, jp238, mak43   00:05:04
jp238, mak43          00:10:40
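The selection rule can be sketched in Python as follows. The sketch assumes that a cluster is significant when it reaches both thresholds (MGT and MGMF), and that "double the time" means at least twice the time of every significant cluster that contains it. The function name `select_groups` and the 300-second MGT used in the example are hypothetical.

```python
def select_groups(clusters, mgt, mgmf=1):
    """clusters: {frozenset(members): (seconds, meetings)}.

    Keeps significant clusters and drops any subgroup that does not
    meet for at least twice as long as each group containing it.
    """
    significant = {m: v for m, v in clusters.items()
                   if v[0] >= mgt and v[1] >= mgmf}
    groups = {}
    for members, (secs, freq) in significant.items():
        # Proper supersets of this cluster among the significant ones
        supersets = [s for s in significant if members < s]
        if all(secs >= 2 * significant[s][0] for s in supersets):
            groups[members] = (secs, freq)
    return groups

# Clusters from the example table, times in seconds
clusters = {
    frozenset({"djb38", "jp238", "mak43"}): (304, 1),
    frozenset({"djb38", "mak43"}): (335, 1),
    frozenset({"jp238", "mak43"}): (640, 1),
}
groups = select_groups(clusters, mgt=300)  # hypothetical MGT of 300 s
```

Here the pair jp238 and mak43 (640 s) survives because it exceeds twice the 304 s of the three-user group, while the pair djb38 and mak43 (335 s) is absorbed into the larger group.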

Page 12: GDC: Group Discovery using Co-location Traces

Complexity Analysis

R - total number of Bluetooth records
N - total number of users in the dataset
L - maximum number of users in a group
◦ Small value because relatively few users are in the transmission range (10m)
◦ Our experiments: max = 15, avg = 6.8

Creating Pair-Wise Meeting Records   O(R)
Creating User Clusters               O(R * 2^L)
Creating Global Clusters             O(N * 2^L)
Selecting the User Groups            O(R * 2^L)

Total Complexity                     O(R * 2^L), R >> N
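The total follows from summing the four steps. The short derivation below spells this out, reading the exponential factor as the number of subsets of at most L users, and plugs in the L values reported for the experiments. The rounded figure for the average case is an approximation.

```latex
\begin{align*}
T(R, N, L) &= O(R) + O(R \cdot 2^{L}) + O(N \cdot 2^{L}) + O(R \cdot 2^{L}) \\
           &= O\big((R + N) \cdot 2^{L}\big) \\
           &= O(R \cdot 2^{L}) \qquad \text{since } R \gg N, \\[4pt]
2^{15} &= 32768 \quad (\text{max } L = 15), \qquad
2^{6.8} \approx 111 \quad (\text{avg } L = 6.8).
\end{align*}
```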

Page 13: GDC: Group Discovery using Co-location Traces

Evaluation

Goals
◦ Analyze effect of group meeting frequency and time
◦ Compare GDC and K-Clique
  ◦ K-Clique uses a time threshold to select graph edges and analyzes the graph for k-cliques
Experiments
◦ Collect data from mobile phones carried by 100+ volunteer students on campus for one month
◦ Run GDC and K-Clique on collected data
  ◦ Also tested on Reality Mining data from MIT
◦ Ask users to rank groups using a Likert scale from 1 to 5, where 5 is best
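The thresholding idea behind K-Clique can be sketched in a few lines of Python. This is a simplified illustration only: it keeps an edge when two users' total contact time reaches the threshold and then enumerates fully connected sets of k users by brute force. The K-Clique algorithm used in the evaluation may differ, for example in how cliques are merged into communities. The function name `k_cliques` and the contact times are hypothetical.

```python
from itertools import combinations

def k_cliques(pair_seconds, threshold, k):
    """pair_seconds: {frozenset({u, v}): total contact time in seconds}.

    Returns every set of k users in which each pair passes the threshold.
    """
    edges = {pair for pair, secs in pair_seconds.items() if secs >= threshold}
    nodes = sorted({u for pair in edges for u in pair})
    return [set(c) for c in combinations(nodes, k)
            if all(frozenset(p) in edges for p in combinations(c, 2))]

# Hypothetical pair-wise contact times in seconds
pair_seconds = {
    frozenset({"A", "B"}): 2400,
    frozenset({"B", "C"}): 2100,
    frozenset({"A", "C"}): 2600,
    frozenset({"C", "D"}): 500,
}
cliques = k_cliques(pair_seconds, threshold=2000, k=3)
```

The sketch reports A, B, and C as a 3-clique from pair-wise totals alone. Those totals do not show whether the three users were ever together at the same time, which is the situation a cluster-based approach avoids by only considering users seen at the same meeting.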

Page 14: GDC: Group Discovery using Co-location Traces

Data Collection Details

78 users each contributed less than 24 hours of recorded data

Sparse data: random volunteers, many students are commuters

Demographics: 72% male, 28% female, 25% graduate, 75% undergraduate

[Figure: histogram of Number of Users by Number of Hours of recorded data (bins from 0 to 300 and "More"), showing Frequency and Cumulative %]

Page 15: GDC: Group Discovery using Co-location Traces

Effect of Meeting Time and Frequency

Detection accuracy increases significantly with meeting frequency and total meeting time

[Figure: Rating Frequency (Very Bad, Bad, OK, Good, Very Good) by Minimum Group Time: 2000-3000, 3000-5000, 5000-7000, >7000]

[Figure: Rating Frequency (Very Bad, Bad, OK, Good, Very Good) by Group Meeting Frequency: 1, 2, 3-4, 5 and Greater]

Page 16: GDC: Group Discovery using Co-location Traces

GDC vs. K-Clique

Overall, GDC groups were rated 30% better than those of the popular K-Clique algorithm
◦ GDC groups are guaranteed to meet
◦ Not all K-Clique groups meet
Some GDC groups are rated poorly because members don’t know each other's names

[Figure: Percentage of Total Ratings by Rating Category (Very Bad, Bad, OK, Good, Very Good) for K-Clique and GDC]

GDC: MGT = 2000s, MGMF = 2
K-Clique: Threshold 2000s

Page 17: GDC: Group Discovery using Co-location Traces

GDC Groups: NJIT Dataset vs. Reality Mining Dataset

Group distributions as a function of size are relatively similar despite the fact that Reality Mining is a denser dataset

[Figure: Percentage of Total Groups by Group Size (3 to 13) for the Reality Mining and NJIT datasets]

NJIT: MGT = 2000s, MGMF = 1
Reality Mining: MGT = 18000s, MGMF = 9 (normalized for 9 months)

Page 18: GDC: Group Discovery using Co-location Traces

Outline

GDC Algorithm
User Study Results
Distributed GDC
Conclusions

Page 19: GDC: Group Discovery using Co-location Traces

Distributed GDC (D-GDC)

GDC executed on the phones
Benefits
◦ Better privacy
  ◦ Avoid “Big Brother” scenario
  ◦ Ability to control message exchange on a per-case basis
◦ Resiliency: no bottleneck and no single point of failure
◦ Flexibility: each user controls how often to run D-GDC

Page 20: GDC: Group Discovery using Co-location Traces

D-GDC Implementation

Collect Bluetooth records locally through message exchange
◦ No global aggregation like in GDC
Control exchange with heuristic policies
◦ These policies can be specified by users
◦ Allows greater individual privacy control
Run the remainder of GDC locally on the device
Evaluated using replay simulation over our real traces

Page 21: GDC: Group Discovery using Co-location Traces

Preliminary Results

Overall similarity: compute similarity of each user’s GDC groups against the closest matches in D-GDC and average the results
Compared D-GDC with a version running only on data collected locally by phones
◦ D-GDC performs significantly better than local-only version

                               D-GDC    Local only
Average similarity             77.33%   58.24%
Groups with similarity > 90%   59.77%   19.14%
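The overall similarity computation can be sketched in Python as follows. The slides do not say which similarity measure is used, so Jaccard similarity between member sets is assumed here purely for illustration. The function names and the sample groups are hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity of two non-empty member sets (assumed metric)."""
    return len(a & b) / len(a | b)

def overall_similarity(gdc_groups, dgdc_groups):
    """Both arguments: {user: list of member sets found for that user}.

    Each GDC group is scored against its closest D-GDC match for the
    same user, and the scores are averaged over all GDC groups.
    """
    scores = []
    for user, reference in gdc_groups.items():
        candidates = dgdc_groups.get(user, [])
        for group in reference:
            best = max((jaccard(group, c) for c in candidates), default=0.0)
            scores.append(best)
    return sum(scores) / len(scores) if scores else 0.0

# Hypothetical groups for two users
gdc = {"u1": [{"u1", "u2", "u3"}], "u2": [{"u1", "u2"}]}
dgdc = {"u1": [{"u1", "u2"}, {"u1", "u2", "u3", "u4"}], "u2": []}
score = overall_similarity(gdc, dgdc)
```

In this example u1's group best matches the four-member D-GDC group (3 shared members out of 4), while u2's group has no D-GDC counterpart and scores 0.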

Page 22: GDC: Group Discovery using Co-location Traces

Conclusion

Physical groups enable new socially-aware features in applications
GDC: practical, high-accuracy, no location collection
◦ Validated by users and outperforms K-Clique by 30%
◦ Higher accuracy can be achieved by increasing frequency and time parameters
A decentralized version improves privacy and produces promising results

Page 23: GDC: Group Discovery using Co-location Traces

Thank You!

Mobius project: http://www.cs.njit.edu/~borcea/mobius/

Acknowledgement: NSF grants CNS-0831753 and CNS-0834585