RawSugar
What is Web 2.0
Tim O’Reilly: “Web 2.0 is the network as platform, spanning all connected devices; Web 2.0 applications are those that make the most of the intrinsic advantages of that platform: delivering software as a continually-updated service that gets better the more people use it, consuming and remixing data from multiple sources, including individual users, while providing their own data and services in a form that allows remixing by others, creating network effects through an "architecture of participation," and going beyond the page metaphor of Web 1.0 to deliver rich user experiences.”
http://www.oreillynet.com/pub/a/oreilly/tim/news/2005/09/30/what-is-web-20.html
What is Web 2.0?
Social Web – “Wisdom of Crowds”
– Users are publishers
– Network effect
– SHARE
– E.g.: blogger.com, flickr, youtube, del.icio.us, tadalist.com, i4giveu.com
Technology:
– Software delivery: Hours, Users are testers
– AJAX (more later)
– E.g.: 30Boxes, Writely, Google Calendar
Business model:
– Free for users, Paid Advertisements
– Share revenues with users
– E.g.: Google AdSense, simpy, RawSugar
– Pageviews => $$$$
Social Web – Wisdom of Crowds
(1) diversity of opinion
(2) independence of members from one another
(3) decentralization and
(4) a good method for aggregating opinions
Show: Digg, amazon.com, Yahoo! Movies
Before Tagging: Classification
• Too hard to classify
• Too expensive
• Not scalable

• Yahoo! directory
• Dmoz
• Semantic Web
Categorization is hard!!
(1) Object worth remembering (article, image…)
(2) Multiple concepts activated
(3) Choose ONE of the activated concepts.
(4) Categorize it!
Analysis-Paralysis!
From Rashmi Sinha
Tagging is simpler
(1) Object worth remembering (article, image…)
(2) Multiple concepts are activated
(3) Note all concepts
(4) Tag it!
From Rashmi Sinha
Tagging is a reality
• Bookmarkers tag:
  – Delicious, RawSugar, Shadows, Simpy, Blinklist, …
• Bloggers tag:
  – 27 million blogs, doubling every 6 months
  – 1/3rd of blog posts now use tags (or categories)
• Many more:
  – BBC – news site
  – Digg – news
  – YouTube – video
  – Flickr – photo publishing and tagging
  – Enterprise? Museums? Cell phones?
Most user-generated content is tagged!
What Tagging is NOT
– NOT: Generous and altruistic people classifying the Web for the sake of the community
– NOT: Smart software automatically classifying Web pages and tagging them
– NOT: A collaborative way to classify the web into a growing giant ontology (folksonomy)
So why do People Tag?
– Recovery/sharing of personal information:
  • Bookmarks
  • Photos
  • Videos, etc.
– Increased traffic and findability
  • Bloggers
– Social reward
– Advertisement $
Tagging brings value to the tagger
Why is Tagging successful?
                        Semantic Web               Tagging
Who classifies          Publishers or Librarians   Everybody, consumers
Controlled vocabulary   Yes                        No
Imposed structure       Yes                        No
Classification cost     High                       Free
Recovery                NA                         Yes
Searchability           Low                        Medium
Navigation              High                       Medium
• Tagging is free
• Tagging is easy
• Tagging brings value
[Marlow, Naaman, Boyd & Davis 2006]
RawSugar
• Covers the last mile of search
• Provides Guided Search on tagged pages
• Publish guided search
  – Provide guided search to your site, Blog
  – Get more traffic
  – Receive advertising revenues!
Search and Explore
  – Navigate by topics, people, directories
  – Find Experts
What’s Great, What’s not Great?
• Great:
  – You know what you’re looking for:
    • “Zibibbo restaurant”
• Not so great:
  – You’re hungry!
  – You want to browse: discover information, explore.
  – You want to know what is popular (“restaurants, digital camera, Java Tutorial, Free Games, etc.”)
State of the art: The Last Mile of Search
• 83% unhappy with search results (WSJ survey)
  – Most searches point to a list of content websites and directories
  – Navigation of these sites is cumbersome and tedious
• Google 2-step approach:
  – Search “restaurants”
  – While (true) { explore guide; }
  – Change the query and Repeat
“The last mile of search” Examples: Digital Camera, Palo Alto bike, Daily Kos, Sprol dot Com
Where is the last mile?
Google stops here:
Human Knowledge:
• Small and mid-size websites and blogs
• Content is organized manually by humans:
  – Categorization
  – Recommendations
• Poor search and navigation
• Each directory is an island of information and does not connect to related directories
What’s Missing? Browsing with Facets
“Easy to discover information without prior knowledge of collection contents”
Faceted Search Paradigm
Not new:
• Library systems: “American history”, “Shakespeare”, etc.
• Search Engines: Endeca, Shopping.com, Yahoo! Directories, Dmoz, etc.
• Google/MSN/Yahoo! Local Search: Browse by Location
• Current uses: E-Commerce
Problems:
• Maintained by humans – Expensive
• Rely on a world order – Brittle
• Facets use a controlled vocabulary – Not easy to define.
=> Not Scalable
Problem 1: Searching the TagSpace
Tags: Ikura, Uni, Ebi, Sushi, Nigiri, Japanese food, lunch in Tokyo, Ezobafun-uni, Kitamurashiuni, Murasakiuni, Akazaebi, Tenagaebi, etc.
How would you tag this?
How would you search for it?
RawSugar Tag Hierarchy
• Key idea: Some users (4%) define tag hierarchies – (food>sushi, european>spanish, …)
• We mine this tag space to learn simple tag-relations (ISA relations and RELATED) using statistics.
• At search time: We apply this learned knowledge to group tags from results.
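A minimal sketch of the mining step, under stated assumptions: the slides only say the relations are learned "using statistics", so the rule used here (keep a parent>child pair once enough distinct users have declared it) is an illustrative stand-in, not RawSugar's actual method. The function name, the `min_users` threshold, and the sample data are hypothetical.

```python
from collections import defaultdict


def mine_isa_relations(user_paths, min_users=2):
    """Count how many distinct users declared each parent>child pair.

    user_paths maps a (hypothetical) user id to paths such as "food>sushi".
    A pair is kept only if at least `min_users` users declared it; this
    threshold is an assumed statistic, not one taken from the slides.
    """
    supporters = defaultdict(set)
    for user, paths in user_paths.items():
        for path in paths:
            tags = [t.strip().lower() for t in path.split(">") if t.strip()]
            # Each adjacent pair in a path is one vote for parent -> child.
            for parent, child in zip(tags, tags[1:]):
                supporters[(parent, child)].add(user)
    return {pair: len(users) for pair, users in supporters.items()
            if len(users) >= min_users}


# Made-up sample input for illustration only.
paths = {
    "user1": ["food>sushi", "european>spanish"],
    "user2": ["food>sushi"],
    "user3": ["food>sushi>nigiri"],
}
relations = mine_isa_relations(paths)
```

With this sample, only food>sushi has enough support to be kept; pairs declared by a single user are treated as noise until more users agree.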
RawSugar – Guided Search: Combining Hierarchy Fragments
[Figure: tag hierarchy fragments from User 1 through User 5. Tags shown: europe, UK, Scotland, Edinburgh, Spain, Italy; food, vegetarian, Sushi; food, cooking, recipes; Asian, Chinese, Thai; Southwest, California, Bay Area, San Francisco, Texas.]
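One way to picture combining fragments is to merge each user's parent>child path into a single tree. The sketch below assumes every fragment is a path starting from a top-level tag; the exact edges of the original figure are not recoverable, so the sample paths are illustrative guesses built from the tags on the slide, and `merge_fragments` is a hypothetical helper.

```python
def merge_fragments(fragments):
    """Merge parent>child paths into one nested dict keyed by tag."""
    tree = {}
    for path in fragments:
        node = tree
        for tag in path:
            # Reuse the existing branch if another user already created it.
            node = node.setdefault(tag, {})
    return tree


# Assumed fragments, one list per path (not the figure's exact structure).
fragments = [
    ["europe", "UK", "Scotland", "Edinburgh"],
    ["europe", "Spain"],
    ["europe", "Italy"],
    ["food", "Sushi"],
    ["food", "cooking", "recipes"],
]
tree = merge_fragments(fragments)
```

This simple merge only joins fragments that share a root. A fragment that starts lower down (for example at "UK") would need the learned ISA relations to find where it attaches.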
RawSugar: Mining and Clustering
• Related tags: Tags that are related – (collocations, synonymy, antonymy, ISA, HASA, …)
• Related pages: Pages tagged similarly
• Related people: People with similar interests
[Figure: the RawSugar TagSpace, linking Tags, Pages, and People. Example labels: sailing, Cycling group.]
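As a rough illustration of "pages tagged similarly" and "tags that are related", the sketch below scores overlap with Jaccard similarity. The slides do not name a similarity measure, so Jaccard, the `min_score` cutoff, and the sample data are all assumptions.

```python
def jaccard(a, b):
    """Size of the intersection divided by size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def invert(item_sets):
    """Turn page -> tags into tag -> pages (or the reverse)."""
    inverted = {}
    for key, values in item_sets.items():
        for value in values:
            inverted.setdefault(value, set()).add(key)
    return inverted


def related(target, item_sets, min_score=0.3):
    """Other keys ranked by overlap with the target's set, best first."""
    base = item_sets[target]
    scored = [(jaccard(base, s), key)
              for key, s in item_sets.items() if key != target]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [key for score, key in scored if score >= min_score]


# Made-up sample data.
page_tags = {
    "page1": {"sailing", "boats"},
    "page2": {"sailing", "boats", "travel"},
    "page3": {"cycling", "travel"},
}
related_pages = related("page1", page_tags)
related_tags = related("sailing", invert(page_tags))
```

The same `related` function would serve for people by passing a mapping from each person to the set of tags they use.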
Related work
Rashmi Sinha: “Tag Sorting: Another tool in an information architect's toolbox” http://www.rashmisinha.com/archives/05_02/tag-sorting.html
Emanuele Quintarelli: “Hierarchical taxonomies from flat tag spaces” http://www.infospaces.it/wordpress/topics/information-architecture/91
Paul Heymann (Stanford): “Tag Hierarchies” http://i.stanford.edu/~heymann/taghierarchy.html
Brooks, Montanez, University of San Francisco: “Improved Annotation of the Blogosphere via Autotagging and Hierarchical Clustering” http://www.cs.usfca.edu/~brooks/papers/brooks-montanez-www06.pdf
Siderean fac.etio.us: “Faceted search on delicious tags” http://www.siderean.com/delicious/facetious.jsp
Marti Hearst: “Clustering vs. Faceted Search” http://bailando.sims.berkeley.edu/papers/cacm06.pdf
And more …
What should we do? Smart Backend – Easy Tagging
“Tag Relations improve searchability and exploration.”
Similar tags:
• Spelling and morphology: macos <-> mac_os <-> mac os; tagging <-> tags <-> tagged
• Synonyms: macos <-> tiger; films <-> movies; new york <-> nyc
• Related: cooking <-> recipes; software development <-> programming
Tag groups or subtags:
• Location -> san francisco, london, new york, etc.
• Food -> sushi, sashimi, pizza, etc.
• Programming -> html, java, css, etc.
Goal: Discover them by mining the tag space
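The spelling variants are the easiest relation to detect. A minimal sketch, assuming that lowercasing and dropping separators is enough to collapse them (`normalize` and `group_variants` are hypothetical names, not part of RawSugar):

```python
import re
from collections import defaultdict


def normalize(tag):
    """Lowercase the tag and drop spaces, underscores, and hyphens."""
    return re.sub(r"[\s_\-]+", "", tag.lower())


def group_variants(tags):
    """Group raw tags that share the same normalized form."""
    groups = defaultdict(list)
    for tag in tags:
        groups[normalize(tag)].append(tag)
    return dict(groups)


tags = ["macos", "mac_os", "mac os",
        "Search Engine", "search_engine", "searchengine", "nyc"]
groups = group_variants(tags)
```

This covers spelling only. Morphology (tagging, tags, tagged) would need stemming, and synonyms (films, movies) cannot be found from the strings alone; those have to come from mining how the tags are used.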
What should we do? Smart Backend – Friendly Frontend
• Backend should not dictate Frontend (Patrick Schmitz, Berkeley/Yahoo!)
• Smart processing is done by the backend under the hood.
• Tagging should be as effortless as possible, assisted but not automatic. Fight Analysis-Paralysis (Rashmi Sinha)
• Systems should be built to encourage people to tag. Bring value to the tagger
Faceted Search on TagSpace: Challenges
• Faceted search paradigm on the TagSpace:
  – Not a controlled environment
  – Large scale (1 facet for every 5 documents)
  – Lots of noise: search, search engine, google, search_engines, searchengine, searchengines, search_engine, engine, web, internet, tools, reference, news, information, portal, engines, searching, tech, buscadores, tool …
Faceted Search on TagSpace: Challenges
How to rank facets? What facets should be displayed? How to show them?
• Performance: Reduce the search space
• Refining facets: Tags that allow the user to refine (reduce) the search (depth)
• Related facets: Tags that allow the user to explore (breadth)
• Group facets: Cluster tags that are related
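To make "refining facets" concrete: a tag can only narrow a result set if it appears on some results but not all of them. The sketch below ranks such tags by how many results carry them. The ranking rule, the function name, and the sample pages are assumptions; the slides pose facet ranking as an open question.

```python
from collections import Counter


def refining_facets(results, query_tag, top_n=3):
    """Tags that narrow the results: present on some, but not all, pages.

    results maps a page id to its set of tags. Ranking by frequency is an
    assumed heuristic, with ties broken alphabetically.
    """
    counts = Counter(tag for tags in results.values()
                     for tag in tags if tag != query_tag)
    total = len(results)
    # A tag found on every result cannot reduce the result set.
    candidates = [(tag, n) for tag, n in counts.items() if n < total]
    candidates.sort(key=lambda pair: (-pair[1], pair[0]))
    return candidates[:top_n]


# Made-up results for the query tag "restaurants".
results = {
    "p1": {"restaurants", "food", "sushi", "palo alto"},
    "p2": {"restaurants", "food", "sushi", "san francisco"},
    "p3": {"restaurants", "food", "pizza", "palo alto"},
}
facets = refining_facets(results, "restaurants")
```

Here "food" is dropped because it sits on every result, while "sushi" and "palo alto" each cut the set down and so rank first.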
Searching the TagSpace with RawSugar: Suggestion Engine
Goals:
- Ease of tagging
- Cohesiveness of our tagspace: get our users to re-use the same tags instead of creating infinite variations (search engines, searchengine, search, search tools, search sites, etc.)
Key Ideas:
- Always suggest the most popular tags first
- Use tag hierarchy and tag context to find the most relevant tags.
- Use information on the user and the other users to refine the suggestions.
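A minimal sketch of the first and third key ideas, assuming a prefix-completion interface: matching tags are ordered by global popularity, and the user's own tags only break ties. The function, its parameters, and the counts are hypothetical; tag hierarchy and context are left out.

```python
from collections import Counter


def suggest_tags(prefix, global_counts, user_tags=(), limit=3):
    """Existing tags starting with `prefix`, most popular first.

    Among equally popular tags, ones the user has already used come first
    (an assumed way to "use information on the user").
    """
    prefix = prefix.lower()
    own = set(user_tags)
    matches = [tag for tag in global_counts if tag.startswith(prefix)]
    matches.sort(key=lambda tag: (-global_counts[tag], tag not in own, tag))
    return matches[:limit]


# Made-up usage counts across all users.
global_counts = Counter({
    "search": 120,
    "search engines": 80,
    "searchengine": 15,
    "search tools": 15,
    "sushi": 40,
})
default_suggestions = suggest_tags("sea", global_counts)
personal_suggestions = suggest_tags("sea", global_counts,
                                    user_tags=["searchengine"])
```

Because only tags that already exist are offered, a user who starts typing "sea" is steered toward "search" or "search engines" rather than coining another variation.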
What’s Missing? Human Meta Knowledge
Is it good or not? What is it about? Is it popular?
Not new:
• Guides: paloaltoonline.com, expedia.com, etc.
• Review Sites: Zagat.com, dpreview.com, etc.
• Shopping sites: shopping.com, Amazon
Problems:
• Limited to small environments or verticals (digital camera, restaurants, etc.)
• Not real search across sites
• Manpower: hiring, training, etc.
=> Not Scalable