An Interactive Method for Inferring Demographic Attributes

Posted on 15-Feb-2017

Valentina Beretta, Daniele Maccagnola, Timothy Cribbin and Enza Messina

University of Milano-Bicocca, Brunel University

Gathering User Data in Social Media

• Social media offer social scientists an unprecedented opportunity to collect data about people and their characteristics

2 TweetClass - Hypertext 2015 - Cyprus

Traditional survey methods: more reliable, more expensive, much slower

VS

Social Media Analytics: less reliable, huge amount of data, users share their views for free

Several characteristics make SM-based research attractive:

• Collecting large datasets is fast and relatively cheap
• SM users tend to comment in a responsive, ad hoc manner, so their opinions are more timely than those gathered through “designed” research
• The perceived anonymity of SM leads to more “honest” responses

Demographic Data in Social Media

• However, a key barrier in SM data collection is the absence of explicit or reliable demographic attribute data (specifically age and gender)
• Without ready demographic data, researchers make subjective judgements by examining qualitative characteristics of the users (their post content or visual profile)

• This is very time-consuming!

• Automatic methods can be used, but they are not always reliable

TweetClass

A semi-automatic framework that combines automatic classification with a user interface for manually refining ambiguous cases

Outline

• Dataset collection
• Insights on TweetClass
  • Automatic classification
  • Manual refinement
• Cognitive walkthrough and summative evaluation
  • First trial (performed by experts)
  • Results and improvements
  • Second trial (performed by general users)
  • Quantitative and qualitative results

Dataset Collection

• Considered classes:
  • Gender: male / female
  • Age: less than 30 years old / more than 30 years old

• Dataset collected from Twitter
• User labeling:
  • Gender: determined manually
  • Age: automatic search + manual refinement
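
The slides do not say how the automatic age search works. One plausible form, assumed here purely for illustration, is scanning profile descriptions for self-reported ages and mapping them to the two age classes; the patterns, the 13 to 99 plausibility bounds and the placement of exactly 30 in the older class are all hypothetical choices, not TweetClass details. Matches would still go through manual refinement.

```python
import re
from typing import Optional

# Hypothetical patterns for self-reported ages ("24 years old", "age 42", "I'm 19").
# Illustrative assumptions only, not the search actually used by the authors.
AGE_PATTERNS = [
    re.compile(r"\b(\d{1,2})\s*(?:years?\s*old|y/?o|yrs?\s*old)\b", re.IGNORECASE),
    re.compile(r"\bage[:\s]+(\d{1,2})\b", re.IGNORECASE),
    re.compile(r"\bI(?:'m| am)\s+(\d{1,2})\b", re.IGNORECASE),
]


def find_self_reported_age(description: str) -> Optional[int]:
    """Return the first plausible age found in a profile description, or None."""
    for pattern in AGE_PATTERNS:
        match = pattern.search(description)
        if match:
            age = int(match.group(1))
            if 13 <= age <= 99:  # discard implausible values
                return age
    return None


def age_class(age: int) -> str:
    """Map an age to one of the two dataset classes.

    The slides leave exactly 30 unassigned; this sketch puts it in the older class.
    """
    return "<30" if age < 30 else ">=30"
```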

Automatic Classification

• Machine learning methods are not the main focus of this work

• Gender classification was based on the 40N database [Michael 2007]

• Three possible classes: F (female), M (male) and U (unknown)
• Users classified as unknown are refined afterwards
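
The gender step is a dictionary lookup on the user's first name with an explicit "unknown" outcome. The sketch below assumes the first token of the display name is the first name and uses a tiny hypothetical name table in place of the real 40N data; ambiguous or unlisted names fall into U and are left for manual refinement.

```python
from typing import Dict

# Toy stand-in for a first-name gender dictionary such as the 40N database.
# Entries are illustrative only; "?" marks names used by both genders.
NAME_GENDER: Dict[str, str] = {
    "maria": "F",
    "giulia": "F",
    "marco": "M",
    "luca": "M",
    "andrea": "?",  # male in Italy, mostly female elsewhere
}


def classify_gender(full_name: str) -> str:
    """Return 'F', 'M' or 'U' (unknown) from the first token of a display name."""
    tokens = full_name.strip().split()
    if not tokens:
        return "U"
    label = NAME_GENDER.get(tokens[0].lower(), "U")
    # Ambiguous or unlisted names become 'U' and go to the manual refinement phase.
    return label if label in ("F", "M") else "U"
```

For example, "Marco Bianchi" is labeled M, while "Andrea Verdi" and a brand account such as "@tech_news" both end up as U.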

• For age classification, we tried several classifiers and chose the one with the highest accuracy
• Two models trained on separate datasets (one only on male users, one only on female users) performed better than a single model

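
Using gender-specific age models means each user is routed to the model trained on their gender. The sketch below shows that routing with a minimal nearest-centroid classifier written out by hand; the classifier, the two-dimensional feature vectors and the function names are hypothetical stand-ins, since the slides name neither the chosen model nor its features.

```python
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


class NearestCentroid:
    """Minimal nearest-centroid classifier (a stand-in for whichever model was chosen)."""

    def fit(self, X: List[Vector], y: List[str]) -> "NearestCentroid":
        sums: Dict[str, List[float]] = {}
        counts: Dict[str, int] = {}
        for x, label in zip(X, y):
            acc = sums.setdefault(label, [0.0] * len(x))
            for i, v in enumerate(x):
                acc[i] += v
            counts[label] = counts.get(label, 0) + 1
        # Mean feature vector per age class.
        self.centroids = {c: [s / counts[c] for s in acc] for c, acc in sums.items()}
        return self

    def predict(self, x: Vector) -> str:
        def sq_dist(c: List[float]) -> float:
            return sum((a - b) ** 2 for a, b in zip(x, c))
        return min(self.centroids, key=lambda label: sq_dist(self.centroids[label]))


def train_gender_specific(data: List[Tuple[str, Vector, str]]) -> Dict[str, NearestCentroid]:
    """Train one age model per gender from (gender, features, age_class) triples."""
    models = {}
    for g in ("M", "F"):
        X = [x for gender, x, _ in data if gender == g]
        y = [age for gender, _, age in data if gender == g]
        models[g] = NearestCentroid().fit(X, y)
    return models


def predict_age(models: Dict[str, NearestCentroid], gender: str, x: Vector) -> str:
    """Route the user to the age model matching their gender label."""
    return models[gender].predict(x)
```

The split pays off when the same features point to different age classes for men and women: in a toy dataset where a feature pattern marks younger men but older women, a single pooled model would average the two signals away, whereas the per-gender models keep them apart.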

Refinement Phase

• Gender refinement is performed manually on users classified as “unknown”

• The end user is shown several pieces of information about the Twitter user to be classified:
  • The user name, screen name and description
  • The user photo and background image
  • A subset of the tweets posted by the user

• The end user then selects the most appropriate class
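
The gender refinement step can be pictured as a queue of U-labeled accounts, each rendered as a card with the fields listed above. The sketch assumes Twitter API v1.1-style user objects (`name`, `screen_name`, `description`, `profile_image_url_https`, `profile_background_image_url_https`); the `RefinementCard` type, the helper names and the five-tweet limit are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class RefinementCard:
    """Information shown to the end user for one Twitter account (hypothetical layout)."""
    name: str
    screen_name: str
    description: str
    photo_url: Optional[str]
    background_url: Optional[str]
    sample_tweets: List[str] = field(default_factory=list)


def build_card(user: dict, tweets: List[str], max_tweets: int = 5) -> RefinementCard:
    """Assemble a card from a v1.1-style user object; max_tweets is an assumed limit."""
    return RefinementCard(
        name=user.get("name", ""),
        screen_name=user.get("screen_name", ""),
        description=user.get("description") or "",  # description may be null
        photo_url=user.get("profile_image_url_https"),
        background_url=user.get("profile_background_image_url_https"),
        sample_tweets=tweets[:max_tweets],  # only a subset of tweets is shown
    )


def gender_refinement_queue(labels: dict) -> List[str]:
    """Return the screen names whose automatic gender label is 'U'."""
    return [sn for sn, label in labels.items() if label == "U"]
```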

Refinement Phase (cont.)

• For age refinement, we introduce a confidence level that indicates how “confident” the automatic classifier is in its prediction

• The confidence level ranges from 0.5 (no confidence at all) to 1 (complete confidence)
• End users refine only the age class of users whose confidence level is below a given threshold
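
A range of 0.5 to 1 is what one gets if, as assumed here, the confidence level is the probability the classifier assigns to its predicted class: with two classes the winning class always has probability of at least 0.5. The sketch computes that confidence from a hypothetical per-user probability of being 30 or older and selects users below a threshold; the 0.7 default is an illustrative value, not one reported in the slides.

```python
from typing import Dict, List


def confidence(p_over_30: float) -> float:
    """Confidence of a binary age prediction, assumed to be the larger class probability."""
    if not 0.0 <= p_over_30 <= 1.0:
        raise ValueError("probability must lie in [0, 1]")
    # The predicted class always has probability >= 0.5, hence the 0.5 to 1 range.
    return max(p_over_30, 1.0 - p_over_30)


def users_to_refine(probabilities: Dict[str, float], threshold: float = 0.7) -> List[str]:
    """Users whose age prediction confidence falls below the (assumed) threshold."""
    return [user for user, p in probabilities.items() if confidence(p) < threshold]
```

Raising the threshold sends more users to manual refinement, trading end-user effort for label quality.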

Cognitive Walkthrough

• To understand whether the interface we designed is intuitive and easy to use, we performed a formal evaluation using a method called cognitive walkthrough
• The aim is to determine how easily naïve users can employ the UI to achieve their objectives at each step of the task
• Special attention is paid to how well the interface supports “exploratory learning”, i.e. first-time use without formal training

Cognitive Walkthrough

• The usability analyst studied how the end-user progressed through the steps of TweetClass, and asked the following questions:

• Will the user try to achieve the right effect?
• Will the user notice that the correct action is available?
• Will the user associate the correct action with the effect to be achieved?
• If the correct action is performed, will the user see that progress is being made?
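
An analyst applying these four questions ends up with a yes/no answer per question per task step, and every "no" is a candidate usability problem. The record format below is a hypothetical way to log and filter those answers; the step names are made up and are not the actual TweetClass steps.

```python
from dataclasses import dataclass
from typing import List

# The four cognitive walkthrough questions, in order.
QUESTIONS = (
    "Will the user try to achieve the right effect?",
    "Will the user notice that the correct action is available?",
    "Will the user associate the correct action with the effect to be achieved?",
    "If the correct action is performed, will the user see that progress is being made?",
)


@dataclass
class StepRecord:
    step: str            # hypothetical task-step name
    answers: List[bool]  # one yes/no answer per question, in order
    notes: str = ""


def failures(records: List[StepRecord]) -> List[tuple]:
    """Return (step, question) pairs answered 'no', i.e. candidate usability problems."""
    problems = []
    for rec in records:
        if len(rec.answers) != len(QUESTIONS):
            raise ValueError(f"step {rec.step!r} needs {len(QUESTIONS)} answers")
        for question, ok in zip(QUESTIONS, rec.answers):
            if not ok:
                problems.append((rec.step, question))
    return problems
```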

Cognitive Walkthrough (trial)

• We recruited two domain experts
• Participants were given a 10-minute presentation of the tool, only to introduce its aim and basic conceptual steps
• We used a “thinking aloud” method to encourage participants to voice their comments and doubts
• Participants were then asked to answer questions regarding the effectiveness, efficiency, information understanding and ease of use of the tool

Cognitive Walkthrough (results)

The cognitive walkthrough highlighted several problems:

1. Both participants suggested including a continuously updated summary of the age and gender composition of the current set of Twitter users, to better decide how many instances to refine;

2. During the two refinement phases, the experts' attention was captured mainly by the images and less by the textual information;

3. They suggested various improvements to the displayed messages and buttons.
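
The composition summary the experts asked for amounts to the share of users in each gender and age cell, recomputed whenever a label changes. A minimal sketch, assuming labels are available as (gender, age class) pairs with hypothetical class names:

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def composition(labels: Iterable[Tuple[str, str]]) -> Dict[Tuple[str, str], float]:
    """Share of users in each (gender, age class) cell, e.g. ('F', '<30').

    Recomputing this after every manual label yields a running summary.
    """
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {cell: n / total for cell, n in sorted(counts.items())}
```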

Interface Prototype

Summative Evaluation

• Finally, we conducted a summative evaluation of the second interface prototype
• We recruited 22 participants (15 male and 7 female): 12 PhD students, 7 researchers and 3 master's students
• We collected data on completion time, inter-rater agreement and success rate

Summative Evaluation (results)

• Assigning age took twice as long as assigning gender
• Inter-rater agreement, measured using Fleiss’ kappa [Fleiss, 1981], was higher for gender than for age (77.34% vs 70.45%)
• The accuracy of the refined instances was generally high: 92% for gender classification and 91% for age classification
• Participants also answered questions about the tool's ease of use, ease of learning and information understanding; overall satisfaction was very high
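
Fleiss' kappa compares the observed agreement among a fixed number of raters with the agreement expected by chance. The sketch below is a plain implementation of the standard formula; it assumes a subjects-by-categories count matrix (here, presumably, one row per refined Twitter user and one column per class), which the slides do not provide, so the usage relies on the classic textbook example of 10 subjects, 14 raters and 5 categories, for which kappa is about 0.21.

```python
from typing import Sequence


def fleiss_kappa(counts: Sequence[Sequence[int]]) -> float:
    """Fleiss' kappa for N subjects, each rated by the same number n of raters into k categories.

    counts[i][j] is the number of raters who assigned subject i to category j.
    """
    N = len(counts)
    if N == 0:
        raise ValueError("need at least one subject")
    n = sum(counts[0])
    if n < 2 or any(sum(row) != n for row in counts):
        raise ValueError("every subject needs the same number (>= 2) of ratings")
    k = len(counts[0])
    # Proportion of all assignments that fall into each category.
    p = [sum(row[j] for row in counts) / (N * n) for j in range(k)]
    # Observed agreement per subject, then averaged over subjects.
    P_i = [(sum(c * c for c in row) - n) / (n * (n - 1)) for row in counts]
    P_bar = sum(P_i) / N
    # Agreement expected by chance.
    P_e = sum(pj * pj for pj in p)
    if P_e == 1.0:
        raise ValueError("kappa is undefined when all ratings fall in one category")
    return (P_bar - P_e) / (1 - P_e)


ratings = [
    [0, 0, 0, 0, 14], [0, 2, 6, 4, 2], [0, 0, 3, 5, 6], [0, 3, 9, 2, 0],
    [2, 2, 8, 1, 1], [7, 7, 0, 0, 0], [3, 2, 6, 3, 0], [2, 5, 3, 2, 2],
    [6, 5, 2, 1, 0], [0, 2, 2, 3, 7],
]
print(round(fleiss_kappa(ratings), 3))
```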

Conclusions and Future Work

• We introduced TweetClass, a proof-of-concept tool that supports social scientists in identifying demographic attributes of Twitter users
• Since collecting this data is generally difficult, TweetClass can help increase the quality and/or size of Twitter user samples
• Future work will include:

  • incorporating other automatic techniques into the tool
  • identifying other demographic attributes
  • expanding to larger datasets

E-mail: daniele.maccagnola@disco.unimib.it Website: mind.disco.unimib.it
