An interactive method for inferring demographic attributes

Valentina Beretta, Daniele Maccagnola, Timothy Cribbin and Enza Messina (University of Milano-Bicocca, Brunel University)

Upload: daniele-maccagnola

Post on 15-Feb-2017

Page 1: An interactive method for inferring demographic attributes

Valentina Beretta, Daniele Maccagnola, Timothy Cribbin and Enza Messina

University of Milano-Bicocca, Brunel University

Page 2: An interactive method for inferring demographic attributes

Gathering User Data in Social Media

• Social media offer social scientists an unprecedented opportunity to collect data about people and their characteristics

2 TweetClass - Hypertext 2015 - Cyprus

Traditional survey methods
• More reliable
• More expensive
• Much slower

VS

Social Media Analytics
• Less reliable
• Huge amount of data
• Users share their views for free

Page 3: An interactive method for inferring demographic attributes

Many characteristics make SM-based research more attractive:

• Collection of large datasets is fast and relatively cheap
• SM users tend to comment in a responsive, ad hoc manner (their opinion is therefore more timely than “designed” research)
• The perceived anonymity of SM leads to more “honest” responses

Page 4: An interactive method for inferring demographic attributes

Demographic Data in Social Media

• However, a key barrier in SM data collection is the absence of explicit or reliable demographic attribute data (specifically age and gender)
• Without ready demographic data, researchers make subjective judgements by examining qualitative characteristics of the users (their post content or visual profile)

• This is very time-consuming!

• Automatic methods can be used, but they are not always reliable

Page 5: An interactive method for inferring demographic attributes

TweetClass

A semi-automatic framework that combines automatic classification with a user interface for manually refining ambiguous cases

Page 6: An interactive method for inferring demographic attributes

Outline

• Dataset Collection
• Insights on TweetClass
  • Automatic classification
  • Manual refinement
• Cognitive Walkthrough and Summative Evaluation
  • First trial (performed by experts)
  • Results and improvement
  • Second trial (performed by general users)
  • Quantitative and qualitative results

Page 7: An interactive method for inferring demographic attributes

Dataset Collection

• Considered classes:
  • Gender: Male / Female
  • Age: less than 30 years old / more than 30 years old
• Dataset collected from Twitter
• User labeling:
  • Gender: manually determined
  • Age: automatic search + manual refinement

Page 10: An interactive method for inferring demographic attributes

Automatic Classification

• Machine learning methods are not the main focus of this work
• For age classification, we tried several classifiers and chose the best one based on accuracy
• Two models trained on separate datasets (only male and only female) perform better than a single model
• Gender classification was based on the 40N database [Michael 2007]
  • Three possible classes: F (female), M (male) and U (unknown)
  • Unknown users will be refined afterwards
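The three-class gender scheme can be illustrated with a minimal sketch. The name table below is a tiny hypothetical stand-in and does not reproduce the 40N database; the function `classify_gender` and its first-token matching rule are also assumptions made for illustration, since the slides do not describe how names are matched.

```python
# Hypothetical stand-in for a first-name database (not the actual 40N data).
NAME_GENDER = {
    "maria": "F",
    "john": "M",
    "andrea": "U",  # ambiguous across countries, so left for manual refinement
}


def classify_gender(display_name: str) -> str:
    """Return 'F', 'M' or 'U' (unknown) from the first token of a display name."""
    tokens = display_name.strip().lower().split()
    if not tokens:
        return "U"
    # Names missing from the table also fall into the unknown class.
    return NAME_GENDER.get(tokens[0], "U")


users = ["Maria Rossi", "john", "Andrea R.", "xX_user_Xx", ""]
labels = {name: classify_gender(name) for name in users}

# Users labelled 'U' are the ones handed over to the manual refinement phase.
to_refine = [name for name, label in labels.items() if label == "U"]
```

Anything the lookup cannot resolve is deliberately labelled U rather than guessed, which keeps the automatic step conservative and leaves the ambiguous cases to the human annotator.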

Page 12: An interactive method for inferring demographic attributes

Refinement Phase

• Gender refinement is performed manually on users classified as “unknown”
• The end-user is shown several pieces of information about the user to classify:
  • The user name, screen name and description
  • The user photo and background image
  • A subset of the tweets posted by the user
• The end-user can then select the most appropriate class

Page 14: An interactive method for inferring demographic attributes

Refinement Phase (cont.)

• For age refinement, we introduce a confidence level that indicates how “confident” the automatic classifier is in its prediction
• The value of the confidence level varies between 0.5 (no confidence at all) and 1 (complete confidence)
• The end-users refine only the age class of users whose confidence level is below a certain threshold
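The thresholding step can be sketched as follows. The slides do not say how the confidence level is computed; this sketch assumes it is the probability of the predicted class in a two-class model, which naturally falls between 0.5 and 1. The names `confidence` and `select_for_refinement`, the sample probabilities and the threshold of 0.7 are all hypothetical.

```python
from __future__ import annotations


def confidence(p_over_30: float) -> float:
    """Assumed confidence of a binary prediction: probability of the predicted class."""
    return max(p_over_30, 1.0 - p_over_30)


def select_for_refinement(probabilities: dict[str, float], threshold: float) -> list[str]:
    """Return the users whose age prediction falls below the confidence threshold."""
    return [user for user, p in probabilities.items() if confidence(p) < threshold]


# Hypothetical classifier outputs: P(user is more than 30 years old).
probabilities = {"user_a": 0.95, "user_b": 0.55, "user_c": 0.20, "user_d": 0.35}

# Only low-confidence predictions are queued for manual age refinement.
queue = select_for_refinement(probabilities, threshold=0.7)
```

Raising the threshold sends more users to manual refinement, trading annotation time for label quality; lowering it relies more heavily on the automatic classifier.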

Page 16: An interactive method for inferring demographic attributes

Cognitive Walkthrough

• To understand whether the interface we designed is intuitive and easy to use, we performed a formal evaluation using a method called cognitive walkthrough
• The aim is to determine how easily naïve users can employ the UI to achieve their objectives at each step of the task
• Special attention is paid to how well the interface supports “exploratory learning”, i.e. first-time use without formal training

Page 17: An interactive method for inferring demographic attributes

Cognitive Walkthrough

• The usability analyst studied how the end-user progressed through the steps of TweetClass, and asked the following questions:

  • Will the user try to achieve the right effect?
  • Will the user notice that the correct action is available?
  • Will the user associate the correct action with the effect to be achieved?
  • If the correct action is performed, will the user see that progress is being made?

Page 18: An interactive method for inferring demographic attributes

Cognitive Walkthrough (trial)

• We recruited two domain experts
• Participants were given a 10-minute presentation of the tool, only to introduce its aim and basic conceptual steps
• We used a “thinking aloud” method to encourage participants to voice their comments and doubts
• Participants were then asked to answer questions regarding the effectiveness, efficiency, information understanding and ease of use of the tool

Page 19: An interactive method for inferring demographic attributes

Cognitive Walkthrough (results)

The cognitive walkthrough highlighted several problems:

1. Both participants suggested including a continuously updated summary of the age and gender composition of the current set of Twitter users, to help decide how many instances to refine;

2. During the two refinement phases, the experts' attention was captured mainly by the images and less by the textual information;

3. They suggested various improvements to the displayed messages and buttons.

Page 20: An interactive method for inferring demographic attributes

Interface Prototype

Page 21: An interactive method for inferring demographic attributes

Summative Evaluation

• Finally, we conducted a summative evaluation of the second interface prototype
• We recruited 22 participants (15 males and 7 females): 12 PhD students, 7 researchers and 3 master's students
• We collected data on completion time, inter-rater agreement and success rate

Page 22: An interactive method for inferring demographic attributes

Summative Evaluation (results)

• Assigning age takes twice as long as assigning gender
• Inter-rater agreement (measured using Fleiss’ kappa [Fleiss, 1981]) was higher for gender than for age (77.34% vs 70.45%)
• The accuracy of the refined instances was generally high: 92% for gender classification and 91% for age classification
• The participants also answered questions about the tool's ease of use, learnability and information understanding; overall satisfaction was very high
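For readers unfamiliar with the agreement measure, Fleiss' kappa can be computed as in the sketch below. It follows the standard definition, kappa = (P_bar - P_e) / (1 - P_e), where P_bar is the mean per-subject agreement and P_e the agreement expected by chance. The input layout (one row per classified user, one column per class) and the toy ratings are assumptions for illustration and are not the study's data.

```python
from __future__ import annotations


def fleiss_kappa(counts: list[list[int]]) -> float:
    """Fleiss' kappa for a table where counts[i][j] is the number of raters
    who assigned subject i to category j. Every row must sum to the same
    number of raters, and there must be at least two raters."""
    n_subjects = len(counts)
    n_raters = sum(counts[0])
    if n_raters < 2 or any(sum(row) != n_raters for row in counts):
        raise ValueError("each subject needs the same number (>= 2) of ratings")
    n_categories = len(counts[0])

    # Share of all assignments that went to each category.
    p_cat = [
        sum(row[j] for row in counts) / (n_subjects * n_raters)
        for j in range(n_categories)
    ]
    # Per-subject agreement: fraction of rater pairs that agree on the subject.
    p_subj = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_bar = sum(p_subj) / n_subjects      # observed agreement
    p_exp = sum(p * p for p in p_cat)     # agreement expected by chance
    if p_exp == 1.0:
        return float("nan")               # undefined when only one category is ever used
    return (p_bar - p_exp) / (1.0 - p_exp)


# Toy example: 4 Twitter users, 3 raters, 2 classes (e.g. under / over 30).
ratings = [[3, 0], [2, 1], [1, 2], [0, 3]]
kappa = fleiss_kappa(ratings)
```

A value of 1 means the raters always agree, while values near 0 mean agreement is no better than chance.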

Page 23: An interactive method for inferring demographic attributes

Conclusions and Future Work

• We introduced TweetClass, a proof-of-concept tool to support social scientists in identifying demographic attributes of Twitter users
• As collecting this data is generally difficult, TweetClass can help increase the quality and/or size of Twitter user samples
• Future work will include:
  • incorporation of other automatic techniques in the tool
  • identification of other demographic attributes
  • expansion to larger datasets

Page 24: An interactive method for inferring demographic attributes

E-mail: [email protected]
Website: mind.disco.unimib.it