Transcript
Page 1: Spatio-temporal demographic classification of the Twitter users

Spatio-temporal demographic classification of the Twitter users

Paul Longley, Muhammad Adnan, Guy Lansley

Department of Geography, University College London

Web: http://www.uncertaintyofidentity.com

Page 2: Spatio-temporal demographic classification of the Twitter users

Outline

1. Introduction

• Geodemographics

• Social Media Geodemographics

2. Twitter

3. A geo-temporal demographic classification of Twitter users

• Residence of Twitter users

• Ethnic classification of Twitter users

• Age classification of Twitter users

• Computing the demographic classification

Page 3: Spatio-temporal demographic classification of the Twitter users

Introduction

• Geodemographics

• “Analysis of people by where they live” [1]

• Night time characteristics of the population

• Social Media Geodemographics

• Moving beyond the night time geography

• Who: Ethnicity, Gender, and Age of social media users

• When: What time of day conversations happen

• Where: Where social media conversations happen

[1] Sleight, P. (2004). Targeting Customers: How to Use Geodemographic and Lifestyle Data in Your Business.

Page 4: Spatio-temporal demographic classification of the Twitter users

Twitter (www.twitter.com)

• Online social-networking and micro-blogging service

• Launched in 2006

• Users can send messages of 140 characters or less

• Approximately 200 million active users [2]

• 350 million tweets daily

• In 2013, the UK and London were ranked 4th and 3rd, respectively, by number of posted tweets [3]

[2] Twitter. 2012. What is Twitter? Retrieved 31st December, 2012, from https://business.twitter.com/basics/what-is-twitter/.

[3] Bennet, S. 2013. Revealed: The Top 20 Countries and Cities of Twitter [STATS]. Retrieved 31st December, 2013, from http://www.mediabistro.com/alltwitter/twitter-top-countries_b26726.

Page 5: Spatio-temporal demographic classification of the Twitter users

Data available through the Twitter API

• User Creation Date

• Followers

• Friends

• User ID

• Language

• Location

• Name

• Screen Name

• Time Zone

• Geo Enabled

• Latitude

• Longitude

• Tweet date and time

• Tweet text

Page 6: Spatio-temporal demographic classification of the Twitter users

Twitter data for the case study

• Approx. 8 million geo-tagged tweets (Jan – Dec, 2013)

• Sent by 385,050 unique users

• 155,249 users sent 5 or more tweets (7.6 million tweets)
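The filtering step on this slide (keeping only users with 5 or more tweets) can be sketched as below. This is a minimal illustration, not the authors' code: the function name `active_users` and the sample user IDs are hypothetical.

```python
from collections import Counter


def active_users(tweet_user_ids, min_tweets=5):
    """Return the set of user IDs that sent at least `min_tweets` tweets."""
    counts = Counter(tweet_user_ids)
    return {user for user, n in counts.items() if n >= min_tweets}


# Hypothetical sample: one user ID per geo-tagged tweet.
sample_ids = ["u1"] * 6 + ["u2"] * 2 + ["u3"] * 5
kept = active_users(sample_ids)
```

Only tweets from the retained users would then be carried forward into the classification.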

Page 7: Spatio-temporal demographic classification of the Twitter users

Variables for creating a geo-temporal classification

1. Residence

• Where Twitter users live

2. Ethnicity

• Probable ethnic origins of Twitter users

3. Age

• Probable age of Twitter users

4. Land Use Category of a Tweet message

• Residential; Non-domestic building; Park etc.

5. Temporal Scales

• Day, Afternoon, Night, Peak travel hours

Page 8: Spatio-temporal demographic classification of the Twitter users

Residence of Twitter Users

• A 170 m × 170 m grid was used to find the probable residence of users

• A probable residence was found for 75,522 users
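One way a grid-based residence estimate could work is sketched below. The slides state only the 170 m cell size; everything else here is an assumption made for illustration: that coordinates are projected eastings and northings in metres, that the probable residence is the cell holding the largest share of a user's tweets, and the `min_share` threshold.

```python
from collections import Counter

CELL_SIZE_M = 170  # grid resolution stated on the slide


def grid_cell(easting, northing, cell_size=CELL_SIZE_M):
    """Map projected coordinates in metres to an integer (column, row) cell."""
    return (int(easting // cell_size), int(northing // cell_size))


def probable_residence(points, min_share=0.5):
    """Return the most-tweeted cell, or None if it holds too small a share.

    `min_share` is an illustrative threshold, not a value from the study.
    """
    if not points:
        return None
    cells = Counter(grid_cell(e, n) for e, n in points)
    cell, count = cells.most_common(1)[0]
    return cell if count / len(points) >= min_share else None


# Hypothetical (easting, northing) tweet locations for one user.
user_points = [
    (530070.0, 180040.0),
    (530100.0, 180100.0),
    (530200.0, 180150.0),
    (531500.0, 182000.0),
]
home = probable_residence(user_points)
```

A threshold of this kind would explain why a residence is found for only a subset of users: those whose tweets are too dispersed get no assignment.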

Page 9: Spatio-temporal demographic classification of the Twitter users

Extracting demographic attributes of Twitter users by using their forenames and surnames

A name is a statement of the bearer’s cultural, ethnic, and linguistic identity [4]

[4] Mateos P, Longley P A, O’Sullivan D 2011. Ethnicity and population structure in personal naming networks. PloS ONE (Public Library of Science) 6 (9) e22943.

Page 10: Spatio-temporal demographic classification of the Twitter users

Analysing Names on Twitter

• Some examples of NAME variations on Twitter

• Approx. 68% of the accounts have real names

Fake Names

Castor 5.

WHAT IS LOVE?

MysticMind

KIRILL_aka_KID

Vanessa

Justin Bieber Home

Real Names

Kevin Hodge

Andre Alves

Jose de Franco

Carolina Thomas, Dr.

Prof. Martha Del Val

Fabíola Sanchez Fernandes

Page 11: Spatio-temporal demographic classification of the Twitter users

Onomap: Names to Ethnicity classification

• Onomap was created by clustering names of 1 billion individuals around the world

• Onomap (www.onomap.org) was applied to forename – surname pairs

Kevin Hodge (English)

Pablo Mateos (Spanish)
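Before a names classifier can be applied, display names have to be reduced to forename and surname pairs, and strings that are not personal names have to be discarded. The sketch below shows one simple heuristic for that step. It is an assumption, not the authors' procedure: the title list and the rules are illustrative, and it does not call Onomap itself.

```python
import re

# Illustrative list of honorifics to strip; not taken from the study.
TITLES = {"dr", "prof", "mr", "mrs", "ms", "miss"}


def split_name(display_name):
    """Return (forename, surname), or None if the string does not look like a name."""
    # Split on whitespace and commas, then drop honorifics such as "Dr." or "Prof.".
    tokens = [t for t in re.split(r"[\s,]+", display_name.strip()) if t]
    tokens = [t for t in tokens if t.rstrip(".").lower() not in TITLES]
    if len(tokens) < 2:
        return None  # a single token gives no forename and surname pair
    for token in tokens:
        # Reject digits, underscores and punctuation other than hyphen or apostrophe.
        if not all(ch.isalpha() or ch in "-'" for ch in token):
            return None
    return tokens[0], " ".join(tokens[1:])


pairs = [split_name(n) for n in ["Carolina Thomas, Dr.", "KIRILL_aka_KID", "Castor 5."]]
```

A heuristic like this still lets through strings such as "Justin Bieber Home", so a lookup against known forenames and surnames would be needed to reject them.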

Page 12: Spatio-temporal demographic classification of the Twitter users

Age estimation from ‘forenames’

• Monica dataset provided by CACI Ltd, UK

• Supplemented with UK birth certificate records

[5] Longley, P., Adnan, M., Lansley, G. 2013. “The geo-temporal demographics of Twitter usage”. Environment and Planning A. (In Press)

Page 13: Spatio-temporal demographic classification of the Twitter users

Age distribution of Twitter users

Twitter Users vs. 2011 Census (Greater London)

[5] Longley, P., Adnan, M., Lansley, G. 2013. “The geo-temporal demographics of Twitter usage”. Environment and Planning A. (In Press)

Page 14: Spatio-temporal demographic classification of the Twitter users

Land-use Categories

• Every tweet message was assigned a land-use category
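Assigning a land-use category to a tweet amounts to a point-in-polygon lookup. The sketch below uses a standard ray-casting test against hypothetical parcels; the square polygons and coordinates are invented for illustration, and only the category labels come from the slides.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test: True if (x, y) lies inside the polygon's vertex ring."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def land_use_for(x, y, parcels, default="All other land uses"):
    """Return the category of the first parcel containing the point."""
    for category, polygon in parcels:
        if point_in_polygon(x, y, polygon):
            return category
    return default


# Hypothetical parcels: (category, polygon vertices).
parcels = [
    ("Green-spaces", [(0, 0), (10, 0), (10, 10), (0, 10)]),
    ("Residential location", [(10, 0), (20, 0), (20, 10), (10, 10)]),
]
category = land_use_for(15, 5, parcels)
```

With millions of tweets, a spatial index or a GIS library would normally replace this linear scan.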

Page 15: Spatio-temporal demographic classification of the Twitter users

Variables for creating a geo-temporal classification

1. Residence

V1: Tweet made near probable London residence

V2: Tweeter lives ‘outside the UK’

V3: Tweeter lives in the rest of the UK outside London

2. Total Number of Tweets

V4: Total number of tweets made by the user

3. Ethnicity

V5: West European

V6: East European

V7: Greek or Turkish

V8: South East Asian

V9: Other Asian

V10: African & Caribbean

V11: Jewish

V12: Chinese

V13: Other minority

4. Age

V14: <=20

V15: 21 - 30

V16: 31 - 40

V17: 41 - 50

V18: 50+

5. Tweets outside the UK

V19: In West Europe (not including UK)

V20: In East Europe

V21: In North America

V22: In Central or South America

V23: In Australasia

V24: In Africa

V25: In Middle East

V26: In Asia

V27: In Paris

Page 16: Spatio-temporal demographic classification of the Twitter users

Variables for creating a geo-temporal classification

6. Number of countries visited

V28: Number of countries tweeter has visited

7. London Land Use Category

V29: Residential location

V30: Non-domestic buildings

V31: Transport links and locations

V32: Green-spaces

V33: All other land uses

8. 2011 London Output Area Classification

V34: Intermediate Lifestyles

V35: High Density and High Rise Flats

V36: Settled Asians

V37: Urban Elites

V38: City Vibe

V39: London Life-Cycle

V40: Multi-Ethnic Suburbs

V41: Ageing-City Fringe

9. Temporal Scales

V42: Morning Peak Hours

V43: Week Day

V44: Afternoon

V45: Week Night

V46: Weekend
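The temporal variables V42 to V46 could be derived from each tweet's timestamp roughly as follows. The slides do not give the hour boundaries or say whether the categories are mutually exclusive, so the cut-offs below (07:00 to 10:00 for the morning peak, 12:00 to 18:00 for the afternoon, and so on) are assumptions chosen purely for illustration.

```python
from datetime import datetime


def temporal_scale(ts):
    """Assign a timestamp to one temporal category (hour cut-offs are assumed)."""
    if ts.weekday() >= 5:  # Saturday = 5, Sunday = 6
        return "Weekend"
    hour = ts.hour
    if 7 <= hour < 10:
        return "Morning Peak Hours"
    if 10 <= hour < 12:
        return "Week Day"
    if 12 <= hour < 18:
        return "Afternoon"
    return "Week Night"


# 3 June 2013 was a Monday; 8 June 2013 was a Saturday.
monday_morning = temporal_scale(datetime(2013, 6, 3, 8, 30))
saturday_morning = temporal_scale(datetime(2013, 6, 8, 8, 30))
```

Counting a user's tweets per category and dividing by that user's total would give proportions suitable as clustering inputs.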

Page 17: Spatio-temporal demographic classification of the Twitter users

Computing the geo-temporal classifications

• Segmentations were created using the K-means clustering algorithm

• K-means tries to find cluster centroids by minimising

$$V = \sum_{x=1}^{n} \sum_{y=1}^{n} (z_{xy} - \mu_{y})^{2}$$

• Seven clusters

• Group A: London Residents

• Group B: Commuting Professionals

• Group C: Student Lifestyle

• Group D: The Daily Grind

• Group E: Spectators

• Group F: Visitors

• Group G: Workplace and tourist activity
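The K-means step on this slide can be illustrated with a minimal pure-Python version of Lloyd's algorithm. This is a sketch under stated assumptions, not the authors' implementation: the toy 2-D points, the naive initialisation from the first k points, and the fixed iteration count are all illustrative, whereas the study clustered users on 46 variables into seven groups.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def kmeans(points, k, iterations=20):
    """Lloyd's algorithm with a naive initialisation (first k points)."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


def within_cluster_ss(points, labels, centroids):
    """Objective that K-means minimises: total squared distance to centroids."""
    return sum(squared_distance(p, centroids[lab]) for p, lab in zip(points, labels))


# Toy data: two well-separated groups of three points each.
points = [(0.0, 0.0), (10.0, 10.0), (0.0, 1.0), (10.0, 11.0), (1.0, 0.0), (11.0, 10.0)]
labels, centroids = kmeans(points, k=2)
wss = within_cluster_ss(points, labels, centroids)
```

In practice the input variables would be standardised first, and the algorithm would be run from several random starts, keeping the solution with the lowest objective value.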

Page 18: Spatio-temporal demographic classification of the Twitter users

Group A: London Residents

• Tweets made near primary residential locations

• Tweets made on weeknights or weekends

Page 19: Spatio-temporal demographic classification of the Twitter users

Group B: Commuting Professionals

• Tweets made from

• Transport locations

• ‘Urban Elites’ LOAC classification

• Tweets made by individuals of intermediate age (21-30)

Page 20: Spatio-temporal demographic classification of the Twitter users

Group F: Visitors

• Tweeters live outside London

• Tweets originated from residential land uses

• Mixed age groups

Page 21: Spatio-temporal demographic classification of the Twitter users

Group G: Workplace and tourist activity

• Tweets sent from non-domestic buildings

• Full range of Twitter age cohorts

• Tweets originate from a mix of residents and international visitors

Page 22: Spatio-temporal demographic classification of the Twitter users

Conclusion

• Geo-temporal demographic classifications

• Census (night time geography)

• Social media data (day and travel time geography)

• Issues of representation

• An insight into the residential and travel geographies of individuals

• An insight into the spatial activity patterns of different kinds of social media users

Page 23: Spatio-temporal demographic classification of the Twitter users

Any Questions?

Thank you for Listening

