uncertainty of identity: classifying twitter data

Post on 12-May-2015

Category: Education

DESCRIPTION

This presentation proposes methods for classifying Twitter data. Online social networks have grown rapidly all over the world in recent years. Here we present an analysis of Twitter data that aims to identify aspects of cultural and ethnic identity.

TRANSCRIPT

Uncertainty of Identity: Classifying Twitter Data

Muhammad Adnan (and Prof. Paul Longley)

University College London

Uncertainty of Identity: Project Aims

• A combined project between UCL, City University, and University of Birmingham

• Combining real and virtual world datasets to better understand the identity of individuals

• Real world datasets (Surname data, socio-economic datasets)

• Virtual world datasets (Email addresses, Social media accounts)

My research interests

• Data mining

• Analysis of Twitter data

• Visualisation of the data

Twitter (www.twitter.com)

• Online social-networking and micro blogging service

• Launched in 2006; six years later, Twitter has 500 million active users

• Generates 350 million tweets daily

• One of the top 10 most visited websites on the internet

• Twitter API can be used to download live tweets

Twitter API’s data

• User Creation Date
• Followers
• Friends
• User ID
• Language
• Location
• Name
• Screen Name
• Time Zone
• Geo Enabled
• Latitude
• Longitude
• Tweet date and time
• Tweet text
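The fields listed above can be pulled out of a downloaded tweet record with a few lines of Python. This is a minimal sketch, not the authors' pipeline: it assumes the JSON key names of the Twitter REST API v1.1 tweet object (`created_at`, `text`, `coordinates`, and the nested `user` object), and the sample record, including the screen name, is hand-written for illustration only.

```python
import json
from datetime import datetime

# Hand-written sample record for illustration only; it is not real data.
# Key names are assumed to follow the Twitter REST API v1.1 tweet object.
SAMPLE_TWEET = json.loads("""
{
  "created_at": "Mon May 11 09:30:00 +0000 2015",
  "text": "Good morning from London",
  "coordinates": {"type": "Point", "coordinates": [-0.1278, 51.5074]},
  "user": {
    "id": 12345,
    "name": "Kevin Hodge",
    "screen_name": "khodge",
    "created_at": "Tue Mar 02 12:00:00 +0000 2010",
    "followers_count": 120,
    "friends_count": 80,
    "lang": "en",
    "location": "London",
    "time_zone": "London",
    "geo_enabled": true
  }
}
""")

# Timestamp layout used by v1.1 records, e.g. "Mon May 11 09:30:00 +0000 2015".
TWITTER_TIME_FORMAT = "%a %b %d %H:%M:%S %z %Y"


def parse_time(value):
    """Parse a v1.1-style timestamp, or return None when it is missing."""
    return datetime.strptime(value, TWITTER_TIME_FORMAT) if value else None


def flatten_tweet(tweet):
    """Map one tweet record onto the flat list of fields from the slide."""
    user = tweet.get("user") or {}
    point = tweet.get("coordinates") or {}
    # GeoJSON points are ordered (longitude, latitude).
    lon, lat = point.get("coordinates") or (None, None)
    return {
        "user_created": parse_time(user.get("created_at")),
        "followers": user.get("followers_count"),
        "friends": user.get("friends_count"),
        "user_id": user.get("id"),
        "language": user.get("lang"),
        "location": user.get("location"),
        "name": user.get("name"),
        "screen_name": user.get("screen_name"),
        "time_zone": user.get("time_zone"),
        "geo_enabled": user.get("geo_enabled"),
        "latitude": lat,
        "longitude": lon,
        "tweet_datetime": parse_time(tweet.get("created_at")),
        "tweet_text": tweet.get("text"),
    }


row = flatten_tweet(SAMPLE_TWEET)
```

Tweets without a geotag simply yield `None` for latitude and longitude, so the same function can be applied to every record before any filtering.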

Classifying Twitter Data to ethnic origins

• Some examples of NAME variations on Twitter

Real Names

Kevin Hodge

Andre Alves

Jose de Franco

Carolina Thomas, Dr.

Prof. Martha Del Val

Fabíola Sanchez Fernandes

Fake Names

Castor 5.

WHAT IS LOVE?

MysticMind

KIRILL_aka_KID

Vanessa

Petuna
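One way to separate the two groups above before classification is a simple rule-based filter on the NAME field. The sketch below is a hypothetical heuristic, not the method used in the project: it strips a small, assumed list of titles, requires at least two letter-only tokens, and treats the first token as the forename and the rest as the surname.

```python
import re

# Assumed list of titles to ignore; extend as needed.
TITLES = {"dr", "prof", "mr", "mrs", "ms"}

# A name-like token: letters only (any alphabet), optionally joined by
# an internal hyphen or apostrophe. Digits and underscores are rejected.
NAME_TOKEN = re.compile(r"^[^\W\d_]+(?:['-][^\W\d_]+)*$")


def split_name(raw):
    """Return (forename, surname) for a plausible real name, else None."""
    tokens = [t.strip(".,") for t in raw.split()]
    tokens = [t for t in tokens if t and t.lower() not in TITLES]
    if len(tokens) < 2:
        return None  # single words such as "Vanessa" cannot form a pair
    if not all(NAME_TOKEN.match(t) for t in tokens):
        return None  # digits, underscores or punctuation inside a token
    # Assumption: first token is the forename, the remainder the surname.
    return tokens[0], " ".join(tokens[1:])


examples = [
    "Kevin Hodge", "Carolina Thomas, Dr.", "Prof. Martha Del Val",
    "Fabíola Sanchez Fernandes", "Castor 5.", "WHAT IS LOVE?",
    "MysticMind", "KIRILL_aka_KID", "Vanessa",
]
pairs = {name: split_name(name) for name in examples}
```

A filter this simple will still let through multi-word phrases made only of letters, so it is best read as a first pass ahead of a proper name classifier.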

Top Twitter Users

Where they tweet from:

Surname: JONES

Where they tweet from:

Surname: DEE

Where they tweet from:

Surname: SHAH

Classifying Twitter Data to ethnic origins

• Applied ONOMAP (www.onomap.org) on FORENAME + SURNAME pairs

Kevin Hodge (ENGLISH)

Andre de Franco (ITALIAN)
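The idea of labelling a forename and surname pair can be illustrated with a toy dictionary lookup. This is not ONOMAP's algorithm or interface, neither of which is described in the slides; the tables, the function name and the surname-first rule are all assumptions, and the only entries are the two examples shown above.

```python
# Toy lookup tables: hypothetical stand-ins for a real name classifier.
# The entries only mirror the two examples on the slide.
TOY_SURNAME_GROUPS = {"hodge": "ENGLISH", "de franco": "ITALIAN"}
TOY_FORENAME_GROUPS = {"kevin": "ENGLISH"}


def classify_pair(forename, surname):
    """Label a (forename, surname) pair, preferring the surname match."""
    surname_group = TOY_SURNAME_GROUPS.get(surname.strip().lower())
    forename_group = TOY_FORENAME_GROUPS.get(forename.strip().lower())
    # Assumption: fall back to the forename only when the surname is unknown.
    return surname_group or forename_group or "UNCLASSIFIED"


labels = [
    classify_pair("Kevin", "Hodge"),
    classify_pair("Andre", "de Franco"),
    classify_pair("Vanessa", "Petuna"),
]
```

Names that match neither table are kept as `UNCLASSIFIED` rather than guessed, which keeps unknown cases visible when the labels are later mapped.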

Twitter Ethnicity Maps

Map legend: English, Scottish, Welsh, Italian, Pakistani, Chinese, Spanish, Indian, Polish, German, French, Portuguese, Bangladeshi, African, Irish

Twitter Ethnicity Maps

Map legend: Spanish, German

Twitter Ethnicity Maps

Map legend: French, African

Twitter Ethnicity Maps

Map legend: English, Italian, Pakistani, Indian, Turkish, Greek, Bangladeshi, Spanish, German, French, Portuguese, Sikh

Twitter Ethnicity Maps

Map legend: Chinese, Polish, Jewish, Swedish, Nigerian, Somalian, Ghanaian, Sri Lankan, Danish

Twitter Ethnicity Maps

Map legend: Chinese, Polish, Jewish, Swedish, Somalian, Ghanaian

Twitter Ethnicity Maps

http://www.guardian.co.uk/news/datablog/

London

Which places are they talking about?

• Tweets containing ‘London’ in their text string

• Applying text matching algorithms to remove tweets that contain places which are not London, e.g. London Road or London, Ontario
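The text matching step can be sketched with two regular expressions: one that finds the word "London" and one that blanks out known false hits first. The exclusion patterns below cover only the two examples named on the slide and are an assumption about how such a filter might look, not the algorithm actually used.

```python
import re

# Matches "London" as a whole word, in any letter case.
LONDON = re.compile(r"\blondon\b", re.IGNORECASE)

# Hypothetical exclusion list built from the slide's two examples.
NOT_LONDON = re.compile(
    r"\blondon\s+(?:road|rd)\b"       # street names such as "London Road"
    r"|\blondon\s*,\s*ontario\b",     # the city in Ontario
    re.IGNORECASE,
)


def mentions_london_city(text):
    """True if the text mentions London outside the excluded phrases."""
    # Blank out the false hits, then look for any remaining mention.
    without_false_hits = NOT_LONDON.sub(" ", text)
    return bool(LONDON.search(without_false_hits))


tweets = [
    "Lovely day in London",
    "Stuck on London Road again",
    "Greetings from London, Ontario",
    "London Road is closed, heading into London",
]
kept = [t for t in tweets if mentions_london_city(t)]
```

Removing the excluded phrases before searching, rather than rejecting the whole tweet, keeps tweets that mention both a false hit and the city itself.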

New York

Which places are they talking about?

Madrid

Which places are they talking about?

Twitter Language Maps

Conclusion

• Use of social media is increasing day by day

• Social-media datasets can give an insight into people’s behaviour in virtual worlds

• Investigating ethnic origins in other countries could support inferences about migration trends in developed and developing countries

• Future work will involve the investigation of Foursquare and Facebook data

Any Questions?

Thank you for listening

Web: http://www.uncertaintyofidentity.com

Email: m.adnan@ucl.ac.uk

Twitter: @gisandtech
