WHAT AND HOW CHILDREN SEARCH ON THE WEB
Sergio Duarte Torres, Ingmar Weber
WHAT IS LOVE?
Motivation
Goals of this work
• Identify and quantify search struggle of young users
• Retrace stages of child development through their web searches
What data was used?
• US Yahoo! search logs from May to August of 2010
• Cleaning steps:
  • User-wise:
    • Logs from users without Yahoo! accounts were removed
  • Query-wise:
    • Queries issued by a single user were removed
    • Queries with personally identifiable information were removed
    • Non-alphanumeric single-token queries were removed
Why the cleaning? What could be advantages/disadvantages?
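The query-wise cleaning steps can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: the `(user_id, query)` log format and the function name `clean_queries` are assumptions, "non-alphanumeric single-token query" is read here as a one-token query containing no letter or digit, and the removal of personally identifiable information is not covered.

```python
import re
from collections import defaultdict

def clean_queries(log):
    """log: list of (user_id, query) pairs (hypothetical format).

    Returns the pairs that survive two of the cleaning steps on the slide.
    """
    # Collect the set of distinct users who issued each query string.
    users_per_query = defaultdict(set)
    for user, query in log:
        users_per_query[query].add(user)

    def keep(query):
        # Step 1: drop queries issued by a single user only.
        if len(users_per_query[query]) < 2:
            return False
        # Step 2 (assumed reading): drop single-token queries that contain
        # no alphanumeric character at all, e.g. "???".
        tokens = query.split()
        if len(tokens) == 1 and not re.search(r"[A-Za-z0-9]", tokens[0]):
            return False
        return True

    return [(user, query) for user, query in log if keep(query)]
```

Dropping single-user queries also removes most rare, potentially identifying queries, at the cost of losing the long tail of the query distribution.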
An aside about the data
• Users under 13 years old required the consent of a responsible adult to register at Yahoo! (costs $.50)
• Some people may lie about their age…
• General trends are expected to be robust to noise
• People may lie about their age, but they usually make themselves appear older
Where do you think millions of children lie about their age?
http://www.uic.edu/htbin/cgiwrap/bin/ojs/index.php/fm/article/view/3850/3075
Data segmentation
• Users grouped based on their reported birth year
• Age estimated as: 2010 - birth year
• The following age buckets were created:
  • 6-7: early elementary
  • 8-9: readers
  • 10-12: advanced readers
  • 13-15: teenagers
  • 16-18: mature teenagers
  • >18: grown-ups
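The segmentation rule above maps directly to a small lookup. The sketch below assumes only what the slide states (age = 2010 minus birth year, and the six buckets); the function name and the handling of users younger than 6 are illustrative choices.

```python
from typing import Optional

LOG_YEAR = 2010  # year of the search logs, per the slides

# (lowest age, highest age, label) for the bounded buckets on the slide
AGE_BUCKETS = [
    (6, 7, "6-7"),
    (8, 9, "8-9"),
    (10, 12, "10-12"),
    (13, 15, "13-15"),
    (16, 18, "16-18"),
]

def age_bucket(birth_year: int) -> Optional[str]:
    """Return the age bucket label for a reported birth year."""
    age = LOG_YEAR - birth_year
    for low, high, label in AGE_BUCKETS:
        if low <= age <= high:
            return label
    if age > 18:
        return ">18"
    # Assumption: users younger than 6 fall outside every bucket.
    return None
```

Because only the birth year is known, the estimated age can be off by one year for users whose birthday falls after the log period.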
Data characteristics
• Data set size
                     Below 10 years old   Above 10 years old
Volume of queries    >100K                >1M
Number of users      >10K                 >100K
Methodology: Micro- vs. Macro-Averages
• User A:
  • 100x cooking
  • 10x science
• User B:
  • 1x cooking
  • 5x science
• User C:
  • 2x cooking
  • 10x science
• Micro avg.: cooking = (100+1+2)/(100+10+1+5+2+10) = 0.80
• Macro avg.: cooking = (100/110 + 1/6 + 2/12) / 3 = 0.41
People search mostly for cooking. True? False?
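The two averages can be reproduced with the numbers from the slide. The dictionary layout and function names below are illustrative only.

```python
# Query counts per user and topic, taken from the slide.
counts = {
    "A": {"cooking": 100, "science": 10},
    "B": {"cooking": 1, "science": 5},
    "C": {"cooking": 2, "science": 10},
}

def micro_average(counts, topic):
    """Pool all queries: every query counts equally, so heavy users dominate."""
    topic_total = sum(user.get(topic, 0) for user in counts.values())
    grand_total = sum(sum(user.values()) for user in counts.values())
    return topic_total / grand_total

def macro_average(counts, topic):
    """Average the per-user shares: every user counts equally."""
    shares = [user.get(topic, 0) / sum(user.values()) for user in counts.values()]
    return sum(shares) / len(shares)

print(round(micro_average(counts, "cooking"), 2))  # 0.8
print(round(macro_average(counts, "cooking"), 2))  # 0.41
```

User A alone accounts for most of the pooled queries, which is why the micro average suggests a cooking majority that two of the three users do not share.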
Methodology: Detecting Navigational Queries
facebook, yahoo mail, google, ...
How would you do it?
• Editorial judgments
  • Ask human judges to mark queries as navigational
  • Drawbacks?
• Click entropy
  • Look at the diversity of the results clicked in response
  • Drawbacks?
• String similarity heuristics
  • Try to find query as substring in clicked domain
  • Drawbacks?
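Two of the options above lend themselves to a short sketch. The function names are hypothetical, and the slides do not say which variant or thresholds the authors used: `click_entropy` computes the Shannon entropy of the clicked results for one query (low entropy suggests a navigational query), and `query_in_domain` checks whether the query, stripped of non-alphanumeric characters, appears in the clicked host name.

```python
import math
import re
from collections import Counter
from urllib.parse import urlparse

def click_entropy(clicked_urls):
    """Shannon entropy (in bits) of the clicked-URL distribution for one query."""
    counts = Counter(clicked_urls)
    total = sum(counts.values())
    return sum((n / total) * math.log2(total / n) for n in counts.values())

def query_in_domain(query, clicked_url):
    """True if the query, reduced to letters and digits, is a substring of the host."""
    host = urlparse(clicked_url).netloc.lower()
    token = re.sub(r"[^a-z0-9]", "", query.lower())
    return bool(token) and token in host

print(query_in_domain("facebook", "http://www.facebook.com/login"))  # True
print(query_in_domain("yahoo mail", "http://mail.yahoo.com/"))       # False
```

The second call illustrates one drawback of the substring heuristic: word order in the query need not match the order in the domain. Click entropy, in turn, needs enough clicks per query to be reliable.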
Search Difficulty Outline
1. Query length
2. Natural language usage
3. Click position bias
4. Other signs of click position bias
5. Children exposed to adult content
6. Time spent on web results
7. Sessions characteristics
Query length
• Increasing query length through the age groups
• Slightly bigger gap for non-navigational queries
• Greater ambiguity in children's queries
[Figure: average query length in tokens (y-axis, 2.5 to 3.2) by age group (6-7, 10-12, 13-15, Adults); series: All]
Natural language usage (I)
• Questions instead of queries
  • what is the only immortal animal?
• Modal queries
  • I don’t want to go to school
• Factual queries
  • describe the parts of a cell
• Superlative queries
  • the fastest dog
• Targeted queries for kids
  • car photos for kids
Natural language usage (II)
• Greater NL usage at younger ages
• Teenagers' behavior is closer to children's than to adults'
[Figure: fraction of queries (y-axis, 0% to 8%) by age group (6-7, 10-12, 13-15, Adults); series: NL, Question, Targeted]
Click position bias
[Figure: ratio relative to adults (y-axis, 0.5 to 2.5) by click position (x-axis, 0 to 4); series: 6-7, 10-12, 13-15, Adults]
Other explanations?
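One plausible way to compute a "ratio relative to adults" per click position is sketched below. This is an assumed reading of the chart, not a description of the authors' exact computation, and the click positions in the example are made up for illustration.

```python
from collections import Counter

def click_share_by_position(positions):
    """Fraction of a group's clicks that fall on each result position."""
    counts = Counter(positions)
    total = sum(counts.values())
    return {pos: n / total for pos, n in counts.items()}

def ratio_to_adults(group_positions, adult_positions):
    """Per-position click share of a group divided by the adults' share."""
    group = click_share_by_position(group_positions)
    adults = click_share_by_position(adult_positions)
    return {pos: group.get(pos, 0.0) / share for pos, share in sorted(adults.items())}

# Illustrative data only: positions of clicked results (0 = top result).
children = [0, 0, 0, 1]
adults = [0, 0, 1, 1]
print(ratio_to_adults(children, adults))  # {0: 1.5, 1: 0.5}
```

A ratio above 1 at position 0 means the group concentrates more of its clicks on the top result than adults do.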
Clicks on ads
• Children aged 6-9 more likely to click on ads!
• Evidence of disorientation during the search process
[Figure: ad clicks, ratio relative to adults (y-axis, 0.7 to 1.4) by age group (6-7, 10-12, 13-15, Adults)]
How to evaluate search success using click data?
• How would you do it?
Time spent on web results
• Click duration as a signal of search success. Hassan et al. (2010), WSDM '10
• Short click (0-10 secs): unsuccessful click
• Long click (≥ 100 secs): successful click
[Figure: ratio relative to adults (y-axis, 0.0 to 3.0) by age group (6-7, 10-12, 13-15, 19-25, Adults); series: Short, Long]
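The short/long click definition can be written down directly from the two thresholds on the slide. The function names and the treatment of dwell times between 10 and 100 seconds (left unlabeled here) are assumptions.

```python
from typing import Optional

SHORT_MAX_SECS = 10   # slide: short click is 0-10 secs (unsuccessful)
LONG_MIN_SECS = 100   # slide: long click is >= 100 secs (successful)

def classify_click(dwell_secs: float) -> Optional[str]:
    """Label a click by its dwell time; None if it is neither short nor long."""
    if 0 <= dwell_secs <= SHORT_MAX_SECS:
        return "short"
    if dwell_secs >= LONG_MIN_SECS:
        return "long"
    return None  # assumption: the slides leave the middle range unlabeled

def click_fractions(dwell_times):
    """Fraction of short and of long clicks among all clicks of a group."""
    labels = [classify_click(t) for t in dwell_times]
    return {kind: labels.count(kind) / len(labels) for kind in ("short", "long")}

print(click_fractions([5, 50, 120, 300]))  # {'short': 0.25, 'long': 0.5}
```

Dividing each group's fractions by the adults' fractions would give a ratio relative to adults for each click type.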
Children exposed to adult content
• Likelihood of accidental click on adult content:
  • The click on adult content is short and is immediately followed by a click on non-adult content
[Figure: accidental clicks on adult content, ratio relative to adults (y-axis, 1 to 1.9) by age group (6 to 7, 8 to 9, 10 to 12, Adults)]
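The accidental-click rule described above can be sketched over a session's click sequence. The `Click` record, the source of the `is_adult` flag, and the reuse of the 10-second short-click threshold are all assumptions made for illustration.

```python
from typing import List, NamedTuple

class Click(NamedTuple):
    is_adult: bool      # assumed to come from some adult-content classifier
    dwell_secs: float   # time spent on the clicked page

SHORT_MAX_SECS = 10  # assumption: same threshold as the short-click definition

def accidental_adult_clicks(clicks: List[Click]) -> int:
    """Count short adult-content clicks immediately followed by a non-adult click."""
    count = 0
    for current, following in zip(clicks, clicks[1:]):
        if (current.is_adult
                and current.dwell_secs <= SHORT_MAX_SECS
                and not following.is_adult):
            count += 1
    return count

session = [Click(True, 3), Click(False, 60), Click(True, 200), Click(False, 20)]
print(accidental_adult_clicks(session))  # 1
```

Only the first adult click counts in the example: the second one lasts 200 seconds, so it is not treated as accidental.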
Sessions characteristics (I)
• Shorter sessions for young users
• Jump to adulthood also occurs in the group of users from 19 to 25
[Figure: average number of actions per session (y-axis, 3.5 to 5.5) by age group (6-7, 10-12, 13-15, 19-25, adults)]
Sessions characteristics (II)
• Query refinding
[Diagram: session in which a query q is issued again after an intervening query q'; nodes: c, q, q', q]
What do refinding queries indicate?
Sessions characteristics (III)
• Click refinding
[Diagram: session in which a result c is clicked again after an intervening click c'; nodes: q, c, c', c]
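Both kinds of refinding follow the same pattern: an item (a query or a clicked result) reappears after a different item has intervened. A minimal sketch under that reading is shown below; the session format and function name are hypothetical.

```python
def count_refinding(items):
    """Count items that reappear after at least one different item intervened."""
    seen = set()
    previous = None
    refinds = 0
    for item in items:
        # An immediate repeat (same as the previous item) is not counted.
        if item in seen and item != previous:
            refinds += 1
        seen.add(item)
        previous = item
    return refinds

# Hypothetical session: a list of (action type, value) pairs.
session = [
    ("query", "dinosaurs"), ("click", "a.example/dino"),
    ("query", "t rex"), ("click", "b.example/trex"),
    ("query", "dinosaurs"), ("click", "a.example/dino"),
]
queries = [value for kind, value in session if kind == "query"]
clicks = [value for kind, value in session if kind == "click"]
print(count_refinding(queries), count_refinding(clicks))  # 1 1
```

Averaging these counts over all sessions of an age group gives a per-session refinding rate for queries and for clicks.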
Sessions characteristics (IV)
[Figure: average refinding per session (y-axis, 0.1 to 0.35) by age group (6-7, 10-12, 13-15, 19-25, Adults); series: Query ref., Click ref.]
Shorter sessions?
Tracing children's development on the web: Outline
1. What do children search for?
2. What entities are children interested in?
3. Does the reading level of the clicks vary across ages and education?
Classifying queries into topics
“sigir 2011”?
computers_and_internet/programming_and_development
What do children search for?
[Figure: fraction of queries per topic (y-axis, 0.0 to 0.2) by age group (6 to 7, 10 to 12, 13 to 15, 19 to 25, adults); series: Games, Entertainment/music, Adult content, Computers & Internet, News, Entertainment/tv, Education]
• Children and teenager groups have few dominant topics
• Adults have more diverse query topics
• Also due to smaller vocabulary
Gender differences (I)
• Topic distribution for each age group and gender
• 1-Norm to quantify gender differences
• Example for age group 10-12
• ||
Which topic is most responsible for gender differences?
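The 1-norm between two topic distributions is the sum of absolute per-topic differences, and the per-topic terms also show which topic contributes most. The distributions below are invented for illustration and are not taken from the slides.

```python
def l1_difference(male, female):
    """Return the 1-norm between two topic distributions and the per-topic terms."""
    topics = set(male) | set(female)
    per_topic = {t: abs(male.get(t, 0.0) - female.get(t, 0.0)) for t in topics}
    return sum(per_topic.values()), per_topic

# Illustrative topic distributions (each sums to 1); not real data.
male = {"games": 0.625, "music": 0.125, "education": 0.25}
female = {"games": 0.25, "music": 0.375, "education": 0.375}

total, per_topic = l1_difference(male, female)
print(total)                             # 0.75
print(max(per_topic, key=per_topic.get)) # games
```

The 1-norm ranges from 0 (identical distributions) to 2 (no topic in common), so values are comparable across age groups.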
Gender differences (II)
[Figure: average gender difference (y-axis, 0.2 to 0.45) by age group (6-7, 10-12, 13-15, Adults); series: Avg gender difference, Avg gender difference (without adult content)]
What entities are children interested in?
• Queries mapped to Wikipedia entities using site search on wikipedia.org/wiki
Query                                            Entity
facebook, facebook login                         en.wikipedia.org/wiki/Facebook
back to school clothes, london schol uniforms    en.wikipedia.org/wiki/School_uniform
Hummus recipe, ideal protein                     en.wikipedia.org/wiki/Hummus
How to map web queries to Wikipedia pages?
What entities are children interested in? (10-12)
What entities are adults interested in? (40+)
What entities are children interested in?
• Greater use of child-oriented entities at young ages
[Figure: fraction of child-oriented entities (y-axis, 0% to 7%) by age group (6-7, 8-9, 10-12, 19-25, Adults)]
Does the reading level of the clicks vary across ages?
• Based on Google reading level classification
• 70% (kids) vs 50% (adults) of clicks classified as basic
Does the reading level of the clicks vary across ages? (II)
• Reading level also varies according to education level
• Education level of adults according to US census
[Figure: fraction of clicks classified as basic (y-axis, 0% to 80%) by age group (8-9, 10-12, 13-15, Adults); series: Basic (low-edu), Basic (high-edu)]
CIKM 2011. Glasgow, 26 of October
Gender: Male
Birth year: 1978
ZIP code: 95054
cheap holidays
Expected income: $ 31k
Expected education: 45% BA
Race distribution: 38% w, 47% A
Label (Q,D) with $31k, 45%BA, ...
Q
D
US Census Data
factfinder.census.gov
Getting demographics from US census
Conclusions
• Clear behavioral differences between children and adults
• Although not as clear between teenagers and children
• Sudden jump to adulthood from 19 to 25 years old
• Stronger click position bias for children, including ads
• Assistance of question queries
• Understanding concerns expressed in their queries
THANK YOU FOR YOUR ATTENTION