Using geolocated Twitter traces to infer residence and mobility Nigel Swier, Bence Kormaniczky, and Ben Clapperton

Posted on 19 January 2016

Background

• ONS Big Data Project: This is one of four pilots exploring the use of big data for official statistics

• Users tweeting from a smartphone have an option to provide a GPS location

• 300,000-plus such tweets sent daily within GB

• Data are relatively accessible

• Can these data be used to infer residence and mobility patterns?

Age Distribution of UK Twitter Users

Data Acquisition

• Target data: All geolocated tweets sent within Great Britain between 1 April 2014 and 31 October 2014

• Combination of data collected via the Twitter API and data procured from GNIP

• 81.4 million tweets

• Stored as JSON documents in MongoDB
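A geolocated tweet carries its GPS position inside the tweet JSON. The sketch below is a minimal illustration, assuming tweets in the Twitter API v1.1 format, where the `coordinates` field is a GeoJSON Point ordered [longitude, latitude], already parsed into Python dicts (for example, documents read back from MongoDB). GNIP's activity-stream records lay out location differently and are not handled. The function names and the `GB_BBOX` rectangle are hypothetical; a coarse box like this is only a first-pass filter, not the project's method.

```python
from typing import Iterable, Iterator, Optional, Tuple

# Hypothetical coarse bounding box around Great Britain: (lon_min, lat_min, lon_max, lat_max).
# It also admits parts of Ireland and the surrounding sea; a production filter
# would test points against a GB boundary polygon instead.
GB_BBOX = (-8.7, 49.8, 1.9, 60.9)


def tweet_point(tweet: dict) -> Optional[Tuple[float, float]]:
    """Return (lon, lat) from a v1.1 tweet's GeoJSON `coordinates` field, or None."""
    coords = tweet.get("coordinates")
    if not coords or coords.get("type") != "Point":
        return None  # tweet has no exact GPS point
    lon, lat = coords["coordinates"]  # GeoJSON order is [longitude, latitude]
    return float(lon), float(lat)


def gb_geotagged(
    tweets: Iterable[dict], bbox: Tuple[float, float, float, float] = GB_BBOX
) -> Iterator[dict]:
    """Yield a slim record for each tweet whose GPS point falls inside bbox."""
    lon_min, lat_min, lon_max, lat_max = bbox
    for tweet in tweets:
        point = tweet_point(tweet)
        if point is None:
            continue
        lon, lat = point
        if lon_min <= lon <= lon_max and lat_min <= lat <= lat_max:
            yield {
                "user_id": tweet["user"]["id_str"],
                "created_at": tweet["created_at"],
                "lon": lon,
                "lat": lat,
            }
```

Keeping only user, time and position makes the later per-user steps (persistence, clustering) independent of the raw JSON layout.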

Distribution of user activity

Distribution of persistence levels

Chart axes: user frequency; count. Users with geolocated tweets on just one day are not shown.
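The slide does not define "persistence". One plausible reading, consistent with the note that users active on just one day are excluded, is the number of distinct days on which a user sent geolocated tweets. A minimal sketch under that assumption; the function names are hypothetical.

```python
from collections import Counter, defaultdict
from datetime import datetime
from typing import Dict, Iterable, Set, Tuple


def persistence_by_user(records: Iterable[Tuple[str, datetime]]) -> Dict[str, int]:
    """Count the distinct calendar days on which each user sent geolocated tweets."""
    days: Dict[str, Set] = defaultdict(set)
    for user_id, ts in records:
        days[user_id].add(ts.date())
    return {user: len(dates) for user, dates in days.items()}


def persistence_distribution(persistence: Dict[str, int], min_days: int = 2) -> Counter:
    """Map persistence level -> number of users, dropping users below min_days."""
    return Counter(n for n in persistence.values() if n >= min_days)
```

The default `min_days=2` mirrors the chart's exclusion of one-day users. v1.1 `created_at` strings such as "Tue Apr 01 08:00:00 +0000 2014" can be parsed with `datetime.strptime(s, "%a %b %d %H:%M:%S %z %Y")` before calling these functions.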

Geolocated Twitter volumes by device type, Great Britain, 15 August to 31 October 2014

Lots of activity in different places, but where does this person* live?

* This example is based on real data but has been altered to prevent identification

DBSCAN

DBSCAN (Density-Based Spatial Clustering of Applications with Noise)

• ε = distance (radius) defining a point's neighbourhood

• minpts = minimum number of points needed to define a cluster

Developed by Ester et al. (1996)
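To make the two parameters concrete, here is a compact, unoptimised DBSCAN sketch in the spirit of Ester et al. (1996). It assumes points in a projected coordinate system measured in metres (such as British National Grid eastings and northings), so `eps` is a distance in metres. The function names are illustrative, and the ε and minpts values the authors used are not given on the slide. Neighbourhood queries are brute force (quadratic in the number of points), which is acceptable for one user's tweets.

```python
from math import hypot
from typing import List, Optional, Sequence, Tuple

NOISE = -1  # label for points that belong to no cluster


def region_query(points: Sequence[Tuple[float, float]], i: int, eps: float) -> List[int]:
    """Indices of all points within eps of point i (point i included)."""
    xi, yi = points[i]
    return [j for j, (x, y) in enumerate(points) if hypot(x - xi, y - yi) <= eps]


def dbscan(points: Sequence[Tuple[float, float]], eps: float, minpts: int) -> List[int]:
    """Label each point with a cluster index (0, 1, ...) or NOISE.

    A point is a core point if at least `minpts` points, itself included,
    lie within `eps` of it. Clusters grow from core points; non-core points
    reachable from a core point join its cluster as border points.
    """
    labels: List[Optional[int]] = [None] * len(points)  # None = not yet visited
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbours = region_query(points, i, eps)
        if len(neighbours) < minpts:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        cluster += 1
        labels[i] = cluster
        seeds = list(neighbours)
        k = 0
        while k < len(seeds):
            j = seeds[k]
            k += 1
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: reachable, but not core
            if labels[j] is not None:
                continue  # already assigned (or just claimed as a border point)
            labels[j] = cluster
            j_neighbours = region_query(points, j, eps)
            if len(j_neighbours) >= minpts:
                seeds.extend(j_neighbours)  # j is core: keep expanding from it
    return labels  # every entry is now an int
```

For larger inputs, scikit-learn's `sklearn.cluster.DBSCAN(eps=..., min_samples=...)` implements the same algorithm with indexed neighbourhood queries and also labels noise as -1.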

Map legend: raw data, cluster centroid, noise

Cluster_id   Northing   Easting   Count   Type
60033_1      105?31     530?02    28      Residential
60022_2      104?41     530?94    4       Residential
60033_6      182?46     532?10    13      Commercial
60033_13     104?56     531?17    3       Commercial
60033_15     179?30     533?95    3       Commercial
60033_21     165?47     532?51    3       Commercial

Most likely lives here: “Dominant Residential Cluster”
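The slide does not spell out how the dominant residential cluster is chosen. One simple reading, assumed here, is the cluster classified as residential (for example via an AddressBase lookup, as the conclusions suggest) that holds the largest count; applied to the table above it selects 60033_1, with a count of 28. The `Cluster` type and function name are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class Cluster:
    cluster_id: str
    count: int         # assumed: number of geolocated tweets in the cluster
    address_type: str  # e.g. "Residential" or "Commercial", from an address lookup


def dominant_residential(clusters: Iterable[Cluster]) -> Optional[Cluster]:
    """Return the residential cluster with the largest count, or None if there is none."""
    residential = [c for c in clusters if c.address_type == "Residential"]
    return max(residential, key=lambda c: c.count, default=None)


clusters = [
    Cluster("60033_1", 28, "Residential"),
    Cluster("60022_2", 4, "Residential"),
    Cluster("60033_6", 13, "Commercial"),
    Cluster("60033_13", 3, "Commercial"),
    Cluster("60033_15", 3, "Commercial"),
    Cluster("60033_21", 3, "Commercial"),
]
home = dominant_residential(clusters)  # the 60033_1 cluster
```

Note that the commercial cluster 60033_6 is ignored even though it is the second largest, which is the point of classifying anchor points before choosing a residence.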

Time-of-day profile by address type
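A profile of this kind can be built by tagging each tweet with the address type of the cluster it falls in and normalising hourly counts within each type, so that types with very different volumes can share one chart. A minimal sketch under those assumptions, with a hypothetical function name; timestamps should first be converted to UK local time, since v1.1 `created_at` values are in UTC.

```python
from collections import defaultdict
from datetime import datetime
from typing import Dict, Iterable, List, Tuple


def time_of_day_profile(records: Iterable[Tuple[str, datetime]]) -> Dict[str, List[float]]:
    """For each address type, return the share of its tweets sent in each hour 0-23."""
    counts: Dict[str, List[int]] = defaultdict(lambda: [0] * 24)
    for address_type, ts in records:
        counts[address_type][ts.hour] += 1
    profiles: Dict[str, List[float]] = {}
    for address_type, hourly in counts.items():
        total = sum(hourly)  # always > 0: a type only appears once it has a tweet
        profiles[address_type] = [n / total for n in hourly]
    return profiles
```

Each profile sums to 1, so the shapes (for example evening peaks at residential addresses) are compared rather than raw volumes.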

Geolocated penetration rates* by local authority

* Users whose dominant residential cluster spans a date range of at least one month
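The slide gives no formula for the penetration rate. A plausible reading, assumed here, is the number of users whose dominant residential cluster lies in a local authority and spans at least one month, divided by that authority's resident population. The sketch expresses the rate per 1,000 residents and approximates "one month" as 30 days; both choices, the `UserHome` type and the population input are hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, Iterable


@dataclass
class UserHome:
    user_id: str
    local_authority: str  # LA containing the user's dominant residential cluster
    first_seen: date      # date of the first tweet in that cluster
    last_seen: date       # date of the last tweet in that cluster


def penetration_rates(
    homes: Iterable[UserHome],
    population: Dict[str, int],
    min_days: int = 30,  # hypothetical stand-in for "at least one month"
    per: int = 1000,
) -> Dict[str, float]:
    """Qualifying users per `per` residents, for each LA with a positive population."""
    counts: Dict[str, int] = {}
    for h in homes:
        if (h.last_seen - h.first_seen).days >= min_days:
            counts[h.local_authority] = counts.get(h.local_authority, 0) + 1
    return {la: per * counts.get(la, 0) / pop for la, pop in population.items() if pop > 0}
```

Authorities with no qualifying users still appear with a rate of 0, which keeps the map complete.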

Student mobility

Conclusions

• Twitter may be useful for identifying short-term mobility patterns

• DBSCAN can identify anchor points and AddressBase can classify them

• Results are indicators, NOT estimates; it may be possible to produce new de facto population statistics

• Twitter could help inform public policy, but we need to be extremely alert to source changes

Next Steps

• Technical report to be published shortly

• Development of methods for inferring socio-demographic characteristics

• Development of an estimation framework (including a benchmarking survey)