The Power of Social Media
Presentation by Ricardo Baeza-Yates (Yahoo! Research Barcelona, Spain & Santiago, Chile) at the workshop "La Memoria al Tempo di Internet" (Memory in the Age of the Internet).
The Power of Social Media
Ricardo Baeza-Yates
VP, Yahoo! Research
Barcelona, Spain & Santiago, Chile
Today is the Memory of Tomorrow
Remember!
Yahoo! Research
Agenda
• The Internet and the Web today
• Web 2.0 and Social Media
• Example: Social Search
• Yahoo! Research
• The Wisdom of the Crowds
• The Future
Internet and the Web
Internet and the Web Today
• Between 1 and 2.5 billion people connected
– 5 billion estimated for 2015
• 1.8 billion mobile phones today
– 500 million expected to have mobile broadband in 2010
• Internet traffic has increased 20 times in the last 5 years
• Today there are more than 185 million Web servers
– 50% Apache, 34% Windows
• The Web is in practice unbounded
– Dynamic pages are unbounded
– Static pages are over 12 billion?
Trends
• Web 2.0, social networks
– Fragmentation of content ownership
– Fragmentation of the access (age, topic, etc.)
– Fragmentation of the right to access
• Growth of the Semantic Web
– RDF, microformats, metadata in general
• Growth of Internet advertising associated with search and content
Advertising
[Chart: advertising in the USA, 2007 to 2012]
Advertising and the Web 2.0
• The power of word of mouth
• The power of influential bloggers
• Viral marketing
– Positive (Dove)
– Negative (HSBC)
• Presence in virtual(?) worlds (Second Life)
Yahoo! Scale (2007)
24 languages, 20 countries
• > 4 billion page views per day (largest in the world)
• > 500 million unique users each month (half of all Internet users!)
• > 250 million Mail users (1 million new accounts a day)
• 95 million Groups members
• 7 million moderators
• 4 billion music videos streamed in 2005
• 20 PB of storage (20M GB); a US Library of Congress every day (28M books, 20 TB)
• 15 TB of data processed per day
• 7 billion song ratings
• 2 billion photos stored
• 2 billion Mail and Messenger messages sent per day
Social Media
New Trends
The Web: A Play in Three Acts
“The” Web: Public
“My” Web: Personal
“Our” Web: Social
Web 2.0: Ingredients
Reviews, RSS, Photos, Video, Blogs, Bookmarks, Playlists, Audio, Podcasts, IM, Tags, VoIP, APIs, Groups
Some Social Networks
• Blogs
– Directed collaborative topical discussions
• Instant messenger
– Buddy list
• Yahoo! Groups
– Topically focused communities
• MySpace, Facebook, Friendster, Orkut
– Friendship network
• Del.icio.us
– Collaborative bookmarking
• Flickr, YouTube
– Photo/video sharing and tagging
• Yahoo! Answers
– People answering people
Web 2.0 in Yahoo!
• Yahoo! Groups: 8 million, 1 of every 10 members
• Del.icio.us: 2 million users
• Flickr: 1 million pictures per day
• Yahoo! Respuestas: 100M users, 150M answers
• Messenger: 85M unique users
Social sites had 115M unique visitors, 56M of them “under 35”.
(2007 data)
Why do people come online?
• To communicate
• To be informed
• To be entertained
• Increasingly… to be part of new forms of participation, belonging and sharing
• To be part of social media
– also referred to as social networks
“One-way” content: Film Clips, Competition, Critics, Picture Gallery
Community content: User’s photos, User’s reviews, User knowledge
Social Networks
Mainly young people (13-25)
Mobile use
Who are they?
[Table: Age, %, Representative interests]
What makes Flickr special?
1. User Generated Content
Content not licensed from providers such as Corbis or Getty, but rather
contributed by users.
2. User Organized Content
Content is tagged, described, organized, discovered, etc. not by “editors” but
by the users themselves.
3. User Distributed Content
Flickr achieved distribution across the internet, not through “business deals”
per se, but rather through the Flickr community which distributed Flickr
content on 3rd-party blogs.
4. User Developed Functionality
Flickr exposed APIs (PHP, Perl, etc.) that allowed the community of
developers to build against the Flickr platform.
Entire ecosystem created by fewer than ten employees…
aided by millions in the Flickr community.
Visualizing Tags: Tag Cloud from Flickr
A Digression: Computer Vision is hard
Internet UGC (User Generated Content)
Have you experienced UGC? No / Yes, as a publisher / Yes, as a consumer
Types of content (multiple choice): Photos/Images, Text, Videos, Music, Animation/Flash, Others
Source: National Internet Development Agency Report in June, 2006 (South Korea)
Using a system of user-assigned ratings, LAUNCHcast builds up a profile of preferences for each individual.
The more ratings users make, the more intelligent the radio becomes.
We have over 6 billion ratings.
LAUNCHcast = music that listens to you
Users can then share their custom radio station with friends through Yahoo! Messenger, taking all the hassle out of discovering new music.
Simple acts create value and opportunity
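The slides do not describe LAUNCHcast's actual recommendation algorithm. The following is a minimal sketch that assumes ratings on a hypothetical 0-100 scale and treats a user's profile as their average rating per artist. It shows how "the more ratings users make, the more intelligent the radio becomes". `build_profile` and `rank_songs` are illustrative names, not a Yahoo! API.

```python
from collections import defaultdict


def build_profile(ratings):
    """Average one user's (artist, score) ratings into a per-artist preference profile."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for artist, score in ratings:
        totals[artist] += score
        counts[artist] += 1
    return {artist: totals[artist] / counts[artist] for artist in totals}


def rank_songs(profile, candidates, default=50.0):
    """Order (song, artist) candidates by the user's affinity for the artist.

    Artists the user never rated get a neutral `default` score (hypothetical).
    """
    return sorted(candidates, key=lambda song: profile.get(song[1], default), reverse=True)


# Hypothetical ratings on a 0-100 scale.
ratings = [("Coldplay", 90), ("Coldplay", 70), ("Radiohead", 60), ("Nickelback", 10)]
profile = build_profile(ratings)
playlist = rank_songs(profile, [("Song A", "Nickelback"), ("Song B", "Coldplay"), ("Song C", "Radiohead")])
```

Each new rating either refines an existing average or adds an artist the station can score. Unrated artists fall back to a neutral default, so a sparse profile produces a less discriminating station.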
Community Dynamics
1 creator
10 synthesizers
100 consumers
Next generation products will blur distinctions between
Creators, Synthesizers, and Consumers
Example: LAUNCHcast
Every act of consumption is an implicit act of production
that requires no incremental effort…
Listening itself implicitly creates a radio station…
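The following sketch illustrates how listening alone could "create a radio station". It converts play events into implicit ratings, with no extra effort from the listener. The thresholds are hypothetical: playing at least 80% of a track counts as a like, and abandoning it before 20% counts as a skip. The function names are also assumptions; this is not how LAUNCHcast actually worked.

```python
from collections import Counter


def implicit_scores(events, liked=0.8, skipped=0.2):
    """Turn (track, fraction_played) listening events into implicit ratings.

    Playing at least `liked` of a track counts +1; abandoning it before `skipped`
    counts -1; anything in between is neutral. Both thresholds are hypothetical.
    """
    scores = Counter()
    for track, fraction in events:
        if fraction >= liked:
            scores[track] += 1
        elif fraction < skipped:
            scores[track] -= 1
    return scores


def station(scores, size=10):
    """The implicit 'radio station': positively scored tracks, best first."""
    return [track for track, score in scores.most_common() if score > 0][:size]


# Hypothetical listening log: nobody rated anything explicitly.
events = [("t1", 1.0), ("t2", 0.1), ("t1", 0.95), ("t3", 0.5), ("t4", 0.9), ("t2", 0.05)]
scores = implicit_scores(events)
playlist = station(scores)
```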
Social Process
• Millions of users of Flickr share and tag each other’s photographs (why?)
• Fernando Flores: Blogs
– Look into the future
– Warning
– Commotion
– Institution
• Individual or collaborative
– Community newspaper: www.elmorrocotudo.cl
• Power law distribution
Social Search
The Knowledge Challenge
Challenge: enabling users to share knowledge with their community to create a better search experience
Example:
Query | Number of results
Vacation Chile | 26,800,000
“Everything Ricardo knows about Chile” | 0
Subjective Queries
The kinds of queries that rely on domain expertise…
• “Do you know a reputable plumber in Southampton?”
• “Where is the cool nightlife in Trento?”
• “What political blogs do you think I’d enjoy reading?”
• “Where can I buy a cool pair of shoes?”
These kinds of queries are ill-served by today’s search engines, but are ironically the most valuable (i.e., transactional queries).
How do we capture people’s experience?
Social Powered Search: Yahoo! Answers
• Democratize the process of “voting” (whether explicit or implicit)
• Move it out of the purview of webmasters and hand control back to users
• Allow dynamic assignment of trust to various authorities: a new degree of freedom
“Better Search Through People”
Challenges in Social Search
• How do we use UGC for better search?
• What is the ratings and reputation system?
• How do you cope with (social) spam?
• What are the incentive mechanisms?
• The bigger challenge: where else can you leverage the power of the people?
Agenda
• European search vision
• Knowledge - the next challenge
• People power
• Making knowledge pay
Leader board
Poorly formed questions
P. Jurczyk, E. Agichtein: “Discovering authorities in Q.A. communities by using link
analysis” CIKM'07
Askers
Answerers
No definitive answer
Unverifiable answer
Community consensus
What are the Problems?
• Which questions are legitimate?
• What is the incentive system?
• How do we validate answers?
• What is the role of the community?
• What is the reputation system?
What are the challenges?
• Community of users
– Social system
• Incentives and reputations
– Economic system
• Poorly phrased, “grammatically” limited queries
– Language analysis
• Improving user experience from past data
– Data mining
What are the sciences?
• Information retrieval & language processing
• Microeconomics
• Data Mining
• Sociology and human-computer interaction
• Community networks
Duncan Watts
Six Degrees of Separation
The Wisdom of the Crowds
• The Wisdom of Crowds, James Surowiecki, 2004
– “Under the right circumstances, groups are remarkably intelligent”
• Importance of diversity, independence and decentralization
– “large groups of people are smarter than an elite
few, no matter how brilliant—they are better at
solving problems, fostering innovation, coming to
wise decisions, even predicting the future”.
• How to deploy this in the next generation of social search and media services?
– SEMEDIA video retrieval EU Project (with BBC, Glasgow U., Smoke & Mirrors, Joanneum & UPF)
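The role of independence can be checked numerically. This sketch assumes a hypothetical jelly-bean-jar guessing game with Gaussian noise. It compares the error of the crowd's mean estimate with the average individual error, first for independent guesses and then for guesses that all share one bias. The bias size, noise levels and crowd size are illustrative assumptions.

```python
import random
import statistics


def crowd_vs_individuals(truth, estimates):
    """Return (error of the crowd's mean estimate, average individual error)."""
    crowd_error = abs(statistics.mean(estimates) - truth)
    individual_error = statistics.mean(abs(e - truth) for e in estimates)
    return crowd_error, individual_error


rng = random.Random(42)  # fixed seed so the run is reproducible
truth = 850  # hypothetical number of jelly beans in the jar

# Diverse, independent guesses: each person's error is drawn separately.
independent = [truth + rng.gauss(0, 200) for _ in range(800)]
crowd, individual = crowd_vs_individuals(truth, independent)

# Herding: everyone anchors on the same misleading cue (hypothetical +300 bias).
herd = [truth + 300 + rng.gauss(0, 50) for _ in range(800)]
herd_crowd, herd_individual = crowd_vs_individuals(truth, herd)
```

The crowd's error can never exceed the average individual error; this follows from the triangle inequality. However, only independent errors cancel out. A shared bias survives averaging, which is why diversity and independence matter.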
The Rationale behind Web Mining
Anchor Text
• The wisdom of the crowds can be used to search
• The principle is not new: anchor text is used in “standard” search; when indexing a document D, include anchor text from links pointing to D
Example: pages linking to www.ibm.com
– “Armonk, NY-based computer giant IBM announced today”
– “Joe’s computer hardware links: Compaq HP IBM”
– “Big Blue today announced record profits for the quarter”
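Here is a minimal sketch of the principle, assuming whitespace tokenization and a plain dictionary-of-sets inverted index. The page texts and the `joe.example`/`news.example` URLs are hypothetical. Anchor text is indexed under the page the link points to. As a result, the IBM home page becomes findable by "IBM" or "Big Blue" even if its own text never uses those words.

```python
from collections import defaultdict


def build_index(pages, links):
    """Build an inverted index from page text plus incoming anchor text.

    pages: {url: body_text}
    links: [(source_url, target_url, anchor_text)]
    Tokenization is a naive lowercase whitespace split (an assumption for brevity).
    """
    index = defaultdict(set)
    for url, text in pages.items():
        for term in text.lower().split():
            index[term].add(url)
    for _source, target, anchor in links:
        # Anchor text describes the page it points to, so credit the target D.
        for term in anchor.lower().split():
            index[term].add(target)
    return index


# Hypothetical pages: the IBM home page itself never says "IBM" or "Big Blue".
pages = {
    "www.ibm.com": "welcome to our home page",
    "joe.example/links": "joe's computer hardware links",
}
links = [
    ("joe.example/links", "www.ibm.com", "IBM"),
    ("news.example/story", "www.ibm.com", "Big Blue"),
]
index = build_index(pages, links)
```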
Chris Anderson: “The Long Tail”. Hyperion, 2006.
Quality and Frequency
[Figure: Frequency vs. Quality for traditional publishing and user-generated content]
Chris Anderson: “The Long Tail”. Hyperion, 2006.
Quality and Quantity
[Figure: Quantity vs. Quality for user-generated content and traditional publishing]
Chris Martin of Coldplay in Rolling Stone, 40th Anniversary issue, July 2007.
[Figure: Quantity vs. Quality]
“We think it's all about quality over quantity now, because there's so much noise everywhere, there's no point in putting anything out unless it's fucking amazing.”
The Push for Quality
[Figure: Quantity vs. Quality for user-generated content and traditional publishing, with an open question mark]
¼ of questions want an opinion: informal polls
¾ of questions seek information or advice
Q. Su, D. Pavlov, J.-H. Chow, W. C. Baker: “Internet-scale collection of human-reviewed data”. WWW'07.
• 17%-45% of answers were correct
• 65%-90% of questions had at least one correct answer
There are top contributors ...
... but they don't have all the answers
Answer quality by question quality (each column sums to 100%):
Answer quality | Question: High | Question: Medium | Question: Low
High | 41% | 15% | 8%
Medium | 53% | 76% | 74%
Low | 6% | 9% | 18%
Total | 100% | 100% | 100%
Question quality and answer quality are not independent and can be predicted reasonably well (Castillo et al., 2008)
What about real quality?
Influence Leadership (Goyal et al., 2008)
• Influence of the social graph on particular actions
– Social graph: Yahoo! Instant Messenger
– Actions log: Yahoo! Movies
• Action = user u rated movie m at time t
– joined through common user identifiers
• Started from the Yahoo! Instant Messenger subgraph of “most active” users (110M nodes) and 21M ratings from Yahoo! Movies
– Ended with 217.5K nodes, 221.4K edges and 1.8M ratings
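The join of the messenger graph with the ratings log can be sketched as follows. This is a simplified illustration in the spirit of the leadership study, not its actual algorithm. It assumes a friend "follows" a user when the friend rates the same movie later, within a hypothetical time `window`. A user counts as a leader above a hypothetical `min_followers` threshold, and tribe leaders are not modeled. The users and movies are made up.

```python
from collections import defaultdict


def follow_counts(edges, actions, window):
    """Count, per user u, the times a friend rated the same movie after u within `window`.

    edges: undirected messenger friendships [(u, v)].
    actions: ratings log [(user, movie, time)], joined to the graph on user id.
    Assumes at most one rating per (user, movie) pair.
    """
    friends = defaultdict(set)
    for u, v in edges:
        friends[u].add(v)
        friends[v].add(u)
    rated_at = {(user, movie): t for user, movie, t in actions}
    counts = defaultdict(int)
    for user, movie, t in actions:
        for friend in friends[user]:
            t_friend = rated_at.get((friend, movie))
            if t_friend is not None and 0 < t_friend - t <= window:
                counts[user] += 1
    return dict(counts)


def leaders(counts, min_followers):
    """Users followed at least `min_followers` times (hypothetical threshold)."""
    return sorted(u for u, c in counts.items() if c >= min_followers)


edges = [("ana", "bob"), ("ana", "carl"), ("bob", "dan")]
actions = [
    ("ana", "m1", 1), ("bob", "m1", 3), ("carl", "m1", 4),
    ("bob", "m2", 2), ("dan", "m2", 20),
]
counts = follow_counts(edges, actions, window=5)
```

Here ana is followed twice on m1. Dan's rating of m2 arrives too late (18 time units after bob) to count as following.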
Leaders vs. Tribe leaders
The Wisdom of Crowds
• Crucial for Search Ranking
• Text content: Web Writers
– not only for the Web!
• Links: Web Publishers
• Annotations: Web 2.0 Users
– Tags, bookmarks, comments, ratings, etc.
• Queries: All Web Users!
– Queries and actions
Query Intention (Broder, 2000)
• ~40% Informational
• ~35% Transactional
• ~25% Navigational
Mining Queries for ...
• Improved Web Search
• Ranking
• Query recommendations
• User Driven Design
– Information Scent
– The Web Site that the Users Want
– The Web Site that You should Have
– Improve content & structure
• Bootstrap of pseudo-semantic resources
Query Mining: Relating Similar Queries
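The slide only names the task. One common way to relate queries is to represent each query by the URLs users clicked for it and compare queries by cosine similarity. This sketch shows that idea; it is not necessarily the exact method behind these slides. The click log and URLs are hypothetical.

```python
import math
from collections import Counter, defaultdict


def query_vectors(click_log):
    """Represent each query by the number of clicks each URL received for it."""
    vectors = defaultdict(Counter)
    for query, url in click_log:
        vectors[query][url] += 1
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse click-count vectors."""
    dot = sum(count * b[url] for url, count in a.items() if url in b)
    norm = math.sqrt(sum(c * c for c in a.values())) * math.sqrt(sum(c * c for c in b.values()))
    return dot / norm if norm else 0.0


def related(query, vectors, top=3):
    """Other queries whose clicked URLs overlap, most similar first."""
    target = vectors.get(query, Counter())
    scored = [(cosine(target, vec), other) for other, vec in vectors.items() if other != query]
    return [other for score, other in sorted(scored, reverse=True) if score > 0][:top]


click_log = [
    ("vacation chile", "guide.example/chile"),
    ("vacation chile", "tourism.example/cl"),
    ("chile travel", "guide.example/chile"),
    ("chile travel", "tourism.example/cl"),
    ("chile travel", "airline.example"),
    ("cheap flights", "airline.example"),
    ("pisco sour recipe", "recipes.example"),
]
vectors = query_vectors(click_log)
```

"vacation chile" and "chile travel" share no words, yet they are related through the pages users chose. Clicks act here as implicit votes, in the same spirit as anchor text.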
Implicit Folksonomy
Implicit Knowledge (Baeza-Yates et al, 2007)
Experimental Evaluation
Some Open Issues
• Implicit social network
– Any fundamental similarities?
• How to evaluate with partial knowledge?
– Data volume amplifies the problem
• User aggregation vs. personalization
– Optimize common tasks: help more people
– Move away from privacy issues
Epilogue
The Future
• The Web is scientifically young
• It is intellectually diverse
– The human element
– The social element
• The technology mirrors the economic, legal and sociological reality
Mirror of the Society
Exports/Imports vs. Domain Links
Baeza-Yates & Castillo, WWW2006
Web Spam Challenge:
• UK Web Collection
• Training set with thousands of judged sites
What’s next? Fourth generation: From Information Retrieval to Information Supply
Explicit demand for information driven by a user query
Increased use of context
Active information supply driven by user activity and context
Web 3.0?
• We are at Web 2.0 beta
• People want to get tasks done
– Where do I go for an original holiday with 1,000 euros?
• Take into account the context of the task
I want to book a vacation in Tuscany. [Figure: task flow from Start to Finish]
Yahoo! Experience