Post on 20-Jan-2016


WHAT AND HOW CHILDREN SEARCH ON THE WEB

Sergio Duarte Torres, Ingmar Weber

WHAT IS LOVE?

Motivation

Goals of this work

• Identify and quantify search struggle of young users

• Retrace stages of child development through their web searches

What data was used?

• US Yahoo! search logs from May to August of 2010

• Cleaning steps:
  • User-wise: logs from users without Yahoo! accounts were removed
  • Query-wise, the following were removed:
    • Queries issued by a single user
    • Queries with personally identifiable information
    • Non-alphanumerical single-token queries

Why the cleaning? What could be advantages/disadvantages?

An aside about the data

• Users under 13 years old required the consent of a responsible adult to register at Yahoo! (costs $0.50)

• Some people may lie about their age…

• General trends are expected to be robust to noise

• People may lie about their age, but they usually tend to make themselves appear older

Where do you think millions of children lie about their age?

http://www.uic.edu/htbin/cgiwrap/bin/ojs/index.php/fm/article/view/3850/3075

Data segmentation

• Users grouped based on their reported birth year

• Age estimated as: 2010 - birth year

• The following age buckets were created:
  • 6-7: early elementary
  • 8-9: readers
  • 10-12: advanced readers
  • 13-15: teenagers
  • 16-18: mature teenagers
  • >18: grown ups
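
The segmentation rule above can be sketched in a few lines of Python. This is only an illustration of the bucketing described on the slide; the names `age_bucket`, `AGE_BUCKETS` and `LOG_YEAR` are hypothetical, and returning `None` for users younger than 6 is an assumption, since the slides do not say how such users were handled.

```python
from typing import Optional

LOG_YEAR = 2010  # year of the search logs, as stated on the slide

# (lowest age, highest age, label) for each bucket listed on the slide
AGE_BUCKETS = [
    (6, 7, "early elementary"),
    (8, 9, "readers"),
    (10, 12, "advanced readers"),
    (13, 15, "teenagers"),
    (16, 18, "mature teenagers"),
]


def age_bucket(birth_year: int, log_year: int = LOG_YEAR) -> Optional[str]:
    """Map a reported birth year to one of the age buckets."""
    age = log_year - birth_year  # age estimated as 2010 - birth year
    for low, high, label in AGE_BUCKETS:
        if low <= age <= high:
            return label
    if age > 18:
        return "grown ups"
    return None  # assumption: ages below 6 fall outside every bucket
```

For example, a reported birth year of 1999 gives an estimated age of 11 and lands in the "advanced readers" bucket.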

Data characteristics

• Data set size

                     Below 10 years old   Above 10 years old
Volume of queries    >100K                >1M
Number of users      >10K                 >100K

Methodology: Micro- vs. Macro-Averages

• User A: 100x cooking, 10x science

• User B: 1x cooking, 5x science

• User C: 2x cooking, 10x science

• Micro avg.: cooking = (100+1+2)/(100+10+1+5+2+10) = 0.80

• Macro avg.: cooking = (100/110 + 1/6 + 2/12) / 3 = 0.41

People search mostly for cooking. True? False?
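
The two averages can be reproduced with a short Python sketch. The per-user counts are the ones from the slide; the function names `micro_average` and `macro_average` are illustrative choices, not taken from the original work.

```python
from collections import Counter

# Query counts per user, taken from the slide example
user_queries = {
    "A": Counter({"cooking": 100, "science": 10}),
    "B": Counter({"cooking": 1, "science": 5}),
    "C": Counter({"cooking": 2, "science": 10}),
}


def micro_average(logs, topic):
    """Pool all queries first, then take the topic's share of the pool."""
    topic_total = sum(counts[topic] for counts in logs.values())
    grand_total = sum(sum(counts.values()) for counts in logs.values())
    return topic_total / grand_total


def macro_average(logs, topic):
    """Compute the topic's share per user, then average over users."""
    shares = [counts[topic] / sum(counts.values()) for counts in logs.values()]
    return sum(shares) / len(shares)


micro = micro_average(user_queries, "cooking")
macro = macro_average(user_queries, "cooking")
print(f"micro={micro:.2f} macro={macro:.2f}")  # micro=0.80 macro=0.41
```

The micro-average is dominated by the single heavy user A, while the macro-average weights every user equally, which is why the two numbers disagree about whether people search mostly for cooking.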

Methodology: Detecting Navigational Queries

facebook, yahoo mail, google, ...

How would you do it?

• Editorial judgments
  • Ask human judges to mark queries as navigational
  • Drawbacks?

• Click entropy
  • Look at the diversity of the results clicked in response
  • Drawbacks?

• String similarity heuristics
  • Try to find the query as a substring of the clicked domain
  • Drawbacks?
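
The two automatic options can be sketched as follows. This is a minimal illustration rather than the method used in the study: `click_entropy` and `query_in_domain` are hypothetical names, entropy is measured in bits over clicked URLs, and the substring test simply removes spaces from the query before matching it against the host name.

```python
import math
from collections import Counter
from urllib.parse import urlparse


def click_entropy(clicked_urls):
    """Shannon entropy (in bits) of the URLs clicked for one query.

    Low entropy means almost everyone clicks the same result,
    which hints at a navigational query.
    """
    counts = Counter(clicked_urls)
    total = sum(counts.values())
    return sum((n / total) * math.log2(total / n) for n in counts.values())


def query_in_domain(query, clicked_url):
    """True if the query, with spaces removed, occurs in the clicked host name."""
    host = urlparse(clicked_url).netloc.lower()
    return query.lower().replace(" ", "") in host
```

A drawback of the substring heuristic shows up immediately: `query_in_domain("facebook", "https://www.facebook.com/login")` is true, but `query_in_domain("yahoo mail", "https://mail.yahoo.com/")` is false because the word order differs from the host name.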

Search Difficulty Outline

1. Query length

2. Natural language usage

3. Click position bias

4. Other signs of click position bias

5. Children exposed to adult content

6. Time spent on web results

7. Sessions characteristics

Query length

• Query length increases through the age groups

• Slightly bigger gap for non-navigational queries

• Greater ambiguity in children's queries

[Figure: average query length in tokens (y-axis, 2.5 to 3.2) by age group (6-7, 10-12, 13-15, Adults); series: All]

Natural language usage (I)

• Questions instead of queries
  • what is the only immortal animal?

• Modal queries
  • I don’t want to go to school

• Factual queries
  • describe the parts of a cell

• Superlative queries
  • the fastest dog

• Targeted queries for kids
  • car photos for kids

Natural language usage (II)

• Greater NL usage at younger ages

• Teenagers' behavior is closer to children's than to adults'

[Figure: fraction of queries (y-axis, 0.0% to 8.0%) by age group (6-7, 10-12, 13-15, Adults); series: NL, Question, Targeted]

Click position bias

[Figure: ratio relative to adults (y-axis, 0.5 to 2.5) over x-axis values 0 to 4; series: 6-7, 10-12, 13-15, Adults]

Other explanations?

Clicks on ads

• Children aged 6-9 are more likely to click on ads!

• Evidence of disorientation during the search process

[Figure: ad clicks as a ratio relative to adults (y-axis, 0.7 to 1.4) by age group (6-7, 10-12, 13-15, Adults)]

How to evaluate search success using click data?

• How would you do it?

Time spent on web results

• Click duration as a signal of search success (Hassan et al., 2010, WSDM ‘10)

• Short click (0-10 secs): unsuccessful click

• Long click (≥ 100 secs): successful click
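
The thresholds above translate directly into code. In the sketch below the names `classify_click` and `short_click_ratio` are hypothetical, the label "other" for clicks between 10 and 100 seconds is an assumption, and "ratio relative to adults" is interpreted as the share of short clicks in a group divided by the same share for adults. The dwell times in the example are made up for illustration and are not results from the study.

```python
def classify_click(dwell_seconds: float) -> str:
    """Label a click by its duration, using the thresholds from the slide."""
    if dwell_seconds <= 10:
        return "short"   # treated as an unsuccessful click
    if dwell_seconds >= 100:
        return "long"    # treated as a successful click
    return "other"       # assumption: in-between clicks are left unlabeled


def short_click_ratio(group_dwells, adult_dwells):
    """Share of short clicks in a group, relative to the adult share."""
    def share(dwells):
        return sum(classify_click(d) == "short" for d in dwells) / len(dwells)
    return share(group_dwells) / share(adult_dwells)


# Made-up dwell times in seconds, for illustration only
children = [3, 5, 150, 40]
adults = [4, 120, 200, 60]
ratio = short_click_ratio(children, adults)  # 0.5 / 0.25
```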

[Figure: ratio relative to adults (y-axis, 0.0 to 3.0) by age group (6-7, 10-12, 13-15, 19-25, Adults); series: Short, Long]

Children exposed to adult content

• Likelihood of an accidental click on adult content:

• The click on adult content is short and the action is immediately reverted by a click on non-adult content
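
One way to operationalize this definition is sketched below. It is an assumption-laden illustration: the `Click` record, the `is_adult` flag (which would come from some content classifier) and the reuse of a 10-second threshold for "short" are not specified in the slides, and the URLs are placeholders.

```python
from dataclasses import dataclass
from typing import List

SHORT_CLICK_SECONDS = 10  # assumption: "short" means at most 10 seconds


@dataclass
class Click:
    url: str
    dwell_seconds: float
    is_adult: bool  # hypothetical flag from a content classifier


def count_accidental_adult_clicks(session: List[Click]) -> int:
    """Count short adult-content clicks directly followed by a non-adult click."""
    count = 0
    for current, following in zip(session, session[1:]):
        if (current.is_adult
                and current.dwell_seconds <= SHORT_CLICK_SECONDS
                and not following.is_adult):
            count += 1
    return count


# Placeholder session: only the first click matches the definition
session = [
    Click("adult.example", 3, True),
    Click("kids.example", 60, False),
    Click("adult.example", 200, True),
    Click("kids.example", 20, False),
]
accidental = count_accidental_adult_clicks(session)
```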

[Figure: accidental clicks on adult content as a ratio relative to adults (y-axis, 1 to 1.9) by age group (6 to 7, 8 to 9, 10 to 12, Adults)]

Sessions characteristics (I)

• Shorter sessions for young users

• Jump to adulthood also occurs in the group of users from 19 to 25

[Figure: average number of actions (y-axis, 3.5 to 5.5) by age group (6-7, 10-12, 13-15, 19-25, adults)]

Sessions characteristics (II)

• Query refinding

[Diagram: session with a click c and the queries q, q’, q, where the earlier query q is issued again]

What do refinding queries indicate?

Sessions characteristics (III)

• Click refinding

[Diagram: session with a query q and the clicks c, c’, c, where the earlier result c is clicked again]
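
Both kinds of refinding can be counted with the same routine, applied either to the sequence of queries or to the sequence of clicked results in a session. The slides only show the patterns as diagrams, so the definition used here is an assumption: an item counts as refound when it was seen earlier in the session and at least one different item came in between. The name `count_refinding` is hypothetical.

```python
def count_refinding(items):
    """Count items that repeat an earlier item after something different.

    `items` is the ordered list of queries (or clicked URLs) in one session.
    Immediate repetitions such as q, q are not counted.
    """
    seen = set()
    refinds = 0
    previous = None
    for item in items:
        if item in seen and item != previous:
            refinds += 1
        seen.add(item)
        previous = item
    return refinds
```

With this definition, the query sequence `["q", "q'", "q"]` and the click sequence `["c", "c'", "c"]` each contain one refinding event.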

Sessions characteristics (IV)

[Figure: average refinding events per session (y-axis, 0.1 to 0.35) by age group (6-7, 10-12, 13-15, 19-25, Adults); series: Query ref., Click ref.]

Shorter sessions?

Tracing child development on the web: Outline

1. What do children search for?

2. What entities are children interested in?

3. Does the reading level of the clicks vary across ages and education?

Classifying queries into topics

“sigir 2011”?

computers_and_internet/programming_and_development


What do children search for?

[Figure: fraction of queries (y-axis, 0.0 to 0.2) per topic by age group (6 to 7, 10 to 12, 13 to 15, 19 to 25, adults); topics: Games, Entertainment/music, Adult content, Computers & Internet, News, Entertainment/tv, Education]

• Children and teenager groups have few dominant topics

• Adults have more diverse query topics

• Also due to the smaller vocabulary of younger users

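Gender differences (I)

• Topic distribution for each age group and gender

• 1-norm to quantify gender differences

• Example for age group 10-12

• ||P_male − P_female||_1, i.e. the sum over topics of the absolute difference between the two topic fractions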

Which topic is most responsible for gender differences?
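
A small sketch shows how the 1-norm and the per-topic contributions could be computed. The function names `l1_distance` and `top_contributor` are hypothetical, and the two topic distributions are invented purely to make the example runnable; they are not figures from the study.

```python
def l1_distance(p, q):
    """1-norm between two topic distributions given as {topic: fraction}."""
    topics = set(p) | set(q)
    return sum(abs(p.get(t, 0.0) - q.get(t, 0.0)) for t in topics)


def top_contributor(p, q):
    """Topic with the largest absolute difference between the distributions."""
    topics = set(p) | set(q)
    return max(topics, key=lambda t: abs(p.get(t, 0.0) - q.get(t, 0.0)))


# Invented distributions for one age group, for illustration only
boys = {"games": 0.5, "music": 0.25, "education": 0.25}
girls = {"games": 0.25, "music": 0.625, "education": 0.125}

difference = l1_distance(boys, girls)   # 0.25 + 0.375 + 0.125
biggest = top_contributor(boys, girls)  # topic with the largest gap
```

Looking at the individual terms of the sum, rather than only the total, is what answers the question of which topic drives the difference.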

Gender differences (II)

[Figure: average gender difference (y-axis, 0.2 to 0.45) by age group (6-7, 10-12, 13-15, Adults); series: Avg gender difference, Avg gender difference (without adult content)]

What entities are children interested in?

• Queries mapped to Wikipedia entities using site search on wikipedia.org/wiki

Query                                           Entity
facebook, facebook login                        en.wikipedia.org/wiki/Facebook
back to school clothes, london schol uniforms   en.wikipedia.org/wiki/School_uniform
Hummus recipe, ideal protein                    en.wikipedia.org/wiki/Hummus

How to map web queries to Wikipedia pages?
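
One possible reading of "site search on wikipedia.org/wiki" is to restrict the query to Wikipedia and take the entity from the top result's URL. The sketch below covers only the string handling on both ends and issues no search request. The helper names are hypothetical, and the `site:` restriction syntax is an assumption about how the search was issued.

```python
from typing import Optional
from urllib.parse import unquote, urlparse

WIKI_PREFIX = "/wiki/"


def site_search_query(query: str) -> str:
    """Build a query restricted to Wikipedia article pages (assumed syntax)."""
    return f"{query} site:wikipedia.org/wiki"


def entity_from_url(url: str) -> Optional[str]:
    """Extract the entity name from a Wikipedia article URL, else None."""
    parsed = urlparse(url if "://" in url else "https://" + url)
    if not parsed.netloc.endswith("wikipedia.org"):
        return None
    if not parsed.path.startswith(WIKI_PREFIX):
        return None
    return unquote(parsed.path[len(WIKI_PREFIX):]).replace("_", " ")
```

For instance, `entity_from_url("en.wikipedia.org/wiki/School_uniform")` yields `"School uniform"`, matching the second row of the table.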

What entities are children interested in? (10-12)

What entities are adults interested in? (40+)

What entities are children interested in?

• Greater use of child-oriented entities at young ages

[Figure: fraction of child-oriented entities (y-axis, 0.0% to 7.0%) by age group (6-7, 8-9, 10-12, 19-25, Adults)]

Does the reading level of the clicks vary across ages?

• Based on Google reading level classification

• 70% (kids) vs 50% (adults) of clicks classified as basic

Does the reading level of the clicks vary across ages? (II)

• Reading level also varies according to education level

• Education level of adults according to US census

[Figure: fraction of clicks classified as basic (y-axis, 0% to 80%) by age group (8-9, 10-12, 13-15, Adults); series: Basic (low-edu), Basic (high-edu)]

CIKM 2011. Glasgow, 26 of October

Gender: Male

Birth year: 1978

ZIP code: 95054

cheap holidays

Expected income: $ 31k

Expected education: 45% BA

Race distribution: 38% w, 47% A

Label (Q,D) with $31k, 45%BA, ...

Q

D

US Census Data

factfinder.census.gov

Getting demographics from US census

Conclusions

• Clear behavioral differences between children and adults

• The distinction between teenagers and children is less clear-cut

• Sudden jump to adulthood in the group from 19 to 25 years old

• Stronger click position bias for children, including clicks on ads

• Assistance of question queries

• Understanding concerns expressed in their queries

THANK YOU FOR YOUR ATTENTION
