cpg iitm mar_29_2012_final
Cloud Platform Group (CPG)
Presentation at IIT Chennai
March 29, 2012
Agenda
Yahoo! Presentation, Confidential 2
CPG Mission and Value Proposition
Fit within the Yahoo Stack
Drill-down: User Generated Content (UGC)
Drill-down: User Location
Drill-down: Web Extractions
Drill-down: Trending
Q & A
Create a global, scalable platform built on science that enables rapid innovation and
delivery of personalized, monetizable experiences across devices.
Cloud Platform Group Mission
CPG Value Proposition
1. Agility with Stability
LEGO powered by Content Agility
2. Science at Scale
CPG powers all of Yahoo! today
DISPLAY ADS, powered by Hadoop: 3x improvement in accuracy of ad placements and our ability to forecast supply over legacy systems
MAIL, powered by Edge, Storage, Ranking, & Hadoop: 40% faster download time, 300K+ spam mails blocked/sec
LEGO (YPP), powered by Content Agility: Reduce time to launch new sites from quarters to weeks
LIVESTAND, powered by Mobile & Cocktails Presentation Services: Seamlessly distribute content across devices in an experience that is elegant and personalized
SOCIAL CHROME, powered by Social Platform: Over 22M net cumulative installs since launch; integrated into News, Games, Movies, OMG, TV
FRONT PAGE, powered by CORE: Increased CTR by 263% for Today Module by serving the right content to the right user (over pre-CORE)
ILLUSTRATIVE SAMPLE
RESULTS
UGC platforms are used by over 200 Yahoo! properties with over 650M UGC actions per year
SOLUTION
UGC Cloud is a scalable, real-time platform that lets users express themselves, resulting in increased user engagement and a vibrant Yahoo! community
USE CASE
Increase content stickiness and user retention; drive repeat usage across the Yahoo! network
Comments: 6M comments per month
Polls: 1.2M poll votes per month
Message Boards: 1/3 of US Finance traffic from MB
Ratings & Reviews: 40M user ratings per month
Unified, scalable platform that enables self expression and gets users to connect over content
User Generated Content
User Generated Content – Applications
Improving Comment Quality
3-pronged approach – Machine, Human and Community Moderation
300M analyzed, 70M blocked with machine moderation
Reactive Volume (cost of reacting to abuse) avoided
Sentiment Slider
http://news.yahoo.com/open-business-free-agency-set-begin-211828913--spt.html
User Generated Content – Social Poll
User Generated Content – In the Works
Topical Organization of Comments
Social Conversations
RESULTS
Properties can launch location-aware services with faster time to market on a single platform
237M users with 550M locations
User Location
Store, manage & share user locations and locations of interest to create deeply personal digital experiences
USE CASE
User location information was siloed, inconsistent, and not shareable across properties and users
SOLUTION
Create a single data store of user locations, shareable across Yahoo! properties and advertising systems
Management, Authorization, and Control
LOCDROP
Normalized, Geo-Aware User Locations
Centralized, Consistent, and Contextual
Accurate, Relevant, Valuable Experiences
Increase Content, Targeting and Revenues
Read locations to drive local news, events and deals
Contextual Locations for Yahoo News
User cannot find a place and decides to create a new location to check-in
User is asked for permission to detect current location from device
The user's location is pinned on a map. This is used to get the lat/long of the created place
User enters a location “Russian Tea Room”
A new location is stored in the UGP platform and the user is checked in to this location
User has an option to curate the locations created by other users
UGP platform enables algorithmic curation
User Generated Places: Enable users to submit (and curate) a location if one does not exist
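The flow above (look for the place, create it if it does not exist, then check the user in) can be sketched as follows. This is a minimal illustration under stated assumptions, not the UGP platform's API: `PlaceStore`, `Place`, `check_in`, the 100 m match radius, and the sample coordinates are all hypothetical names and values introduced here.

```python
import math
from dataclasses import dataclass, field

EARTH_RADIUS_M = 6_371_000.0


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two lat/long points."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


@dataclass
class Place:
    place_id: int
    name: str
    lat: float
    lon: float
    checkins: list = field(default_factory=list)


class PlaceStore:
    """Hypothetical in-memory stand-in for a user-generated places store."""

    def __init__(self, match_radius_m=100.0):
        self.match_radius_m = match_radius_m  # assumed duplicate-detection radius
        self.places = []

    def find(self, name, lat, lon):
        """Return an existing place with the same name near (lat, lon), if any."""
        wanted = name.strip().lower()
        for place in self.places:
            same_name = place.name.strip().lower() == wanted
            if same_name and haversine_m(lat, lon, place.lat, place.lon) <= self.match_radius_m:
                return place
        return None

    def check_in(self, user, name, lat, lon):
        """Check the user in, creating the place first if it does not exist."""
        place = self.find(name, lat, lon)
        created = place is None
        if created:
            place = Place(len(self.places) + 1, name.strip(), lat, lon)
            self.places.append(place)
        place.checkins.append(user)
        return place, created


# Illustrative coordinates only; the device would supply the real lat/long.
store = PlaceStore()
first, created_first = store.check_in("alice", "Russian Tea Room", 40.7651, -73.9794)
second, created_second = store.check_in("bob", "russian tea room", 40.7652, -73.9795)
```

Matching on a normalized name within a small radius is only one possible way to avoid duplicate places; the curation steps in the slide (by other users and algorithmically) would sit on top of a store like this.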
Android Messenger Use Case
KAFE: Technologies*
Manual SDE Rules: large aggregator websites (e.g. Amazon)
Dapper: small websites (e.g. community sites); behind-the-form sites (Deep Web)
PSOX (Y! Labs): unsupervised extractions from a large number of websites (Goldrush, Dish-a-wish, Restaurant Photos)
[Axis labels: Editorial Effort; Precision]
[Diagram labels: Web Content; Bing WCC YST HVC; KAFE; S.D.E, Dapper, PSOX; W.O.O, Properties, Legacy Backend; Live Pages (LLFS)]
* Supports Multiple Sources of Data and Multiple Technologies
Answers Not Links – Dappfactory
Dappfactory used by DD Builder to create over 3,000 DD experiences!
Answers Not Links – S-DE
KAFE XSL Rules
Creating Vertical Search Experiences for Recipes
Answers Not Links – PSOX (Unsupervised Extractions)
Y! Dish-a-Wish: Craving hummus in Sunnyvale?
Y! Goldrush: Looking for where to buy Amana dishwashers?
Enhanced Listings – Dappfactory
Before: After:
• Taken from Roadmap deck for Y! Local by Erin Johns
• Data being provided to Y! Local, Front End revamp on Local Roadmap
Local Events for N.I.L.E – Dappfactory
As of Feb ‘12, over 22,000 events for 250 US cities have been extracted using Dappfactory
Extracted using Dappfactory
Data Extraction – Challenges
Technology whitespace
• Head – Fully manual scales fine. Gives high precision.
• Torso – Mostly use human-assisted learning. Drop in recall and precision, but acceptable for production use.
• Tail content – Only option is ML/no-human-in-loop models. Recall and precision need a lot of improvement.
Semantic Web initiatives – Web of Objects
• Linked Open Data Format (RDFa, OWL, SPARQL)
• LOD Cloud – a few thousand data sets, tens of billions of interlinked facts.
• Confhopper – Sample/Demo application
Unstructured Corpus – NLP Extraction
• Systems/Engineering Challenges – Low-latency processing, tokenization/parsing – Intl support
• Sciences Challenges – polysemy, synonymy, aboutness/concepts, sentiment analysis.
• CAP – Contextual analysis platform
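The head/torso/tail trade-off above is stated in terms of precision and recall. As a reminder of how those two figures are computed for an extraction run, here is a minimal sketch; the function name `precision_recall` and the toy record sets are assumptions made for illustration, not part of KAFE or any Yahoo! system.

```python
def precision_recall(extracted, gold):
    """Score extracted records against a hand-labelled gold set.

    precision = correct extractions / all extractions
    recall    = correct extractions / all gold records
    """
    extracted, gold = set(extracted), set(gold)
    true_positives = len(extracted & gold)
    precision = true_positives / len(extracted) if extracted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Toy data: the extractor found three records, two of which are in the gold set of four.
gold_records = {"a", "b", "c", "d"}
extracted_records = {"a", "b", "x"}
precision, recall = precision_recall(extracted_records, gold_records)
```

In this toy run precision is 2/3 and recall is 1/2: a fully manual head extractor would aim to push both towards 1, while the slide notes that unsupervised tail extraction still falls short on both.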
TimeSense – Use cases/business value proposition
Plumbing, Monetization, & Games
US FP Trending Now local pool for a given DMA powered by TimeSense – 6% CTR lift attributed to local terms
Search Suggestions in SD box – TimeSense-powered suggestions triggered for 6% of all gossip requests
Trending searches in Left Rail on Yahoo US SRP – triggered for ~6% of all user queries
TW FP Trending Now automated by TimeSense API
TimeSense
In Bucket
AUTOMATED trending module on shopping.yahoo.com: first module with no editorial intervention, vertically categorized trends, fast refresh and rotating terms
Soon to Launch
HK, TW and KR automated trends modules on FP, Mail, OMG, News, etc.
Editorial Power users of TimeSense
• Search Forecasting Editorial Team – updates sent twice a day to 500+ subscribers
• FP Trending Now team
• Regional Content programming, search editorial and SEO teams: US, UK, HK, TW, IN [Q1 launch – all INTLs]
Upcoming
• Trending Now Syndication for Yahoo Hosted Search partners – via BOSS
• Trending Image experience
• Trending Now 2.0 automation expansion
Trending topic detection – Challenges
Systems Challenges
• Low latency requirement
• GBs of data analyzed from multiple data sources every 5 minutes
• Scalability – different verticals, segmented models.
• High Availability requirement
Sciences Challenges
• Algorithmic improvements for near real time detection without precision loss
• Short Phrase Categorization
• Deduping/Clustering – intent detection
• Segmentation/Smoothing – Age/gender/Behavioral Tracking Categories/Geography – signal sparsity with fine-grained segmentation.
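To make the windowed-detection and signal-sparsity points above concrete, here is a small sketch of one generic approach: count terms per window (for example, every 5 minutes), keep an exponentially weighted baseline per term, and flag terms whose smoothed lift over the baseline crosses a threshold. This is not the TimeSense algorithm, which the slides do not describe; `TrendDetector` and all of its parameters and default values are assumptions chosen for illustration.

```python
from collections import Counter


class TrendDetector:
    """Illustrative windowed trend detector; not the TimeSense algorithm."""

    def __init__(self, alpha=0.3, smoothing=5.0, threshold=3.0, warmup_windows=1):
        self.alpha = alpha                    # weight of the newest window in the baseline
        self.smoothing = smoothing            # additive smoothing to damp sparse terms
        self.threshold = threshold            # minimum lift over baseline to report
        self.warmup_windows = warmup_windows  # windows observed before reporting anything
        self.baseline = {}                    # term -> exponentially weighted average count
        self.windows_seen = 0

    def update(self, queries):
        """Consume one window of queries; return (term, lift) pairs, highest lift first."""
        counts = Counter(q.strip().lower() for q in queries if q.strip())
        trending = []
        if self.windows_seen >= self.warmup_windows:
            for term, count in counts.items():
                expected = self.baseline.get(term, 0.0)
                lift = (count + self.smoothing) / (expected + self.smoothing)
                if lift >= self.threshold:
                    trending.append((term, lift))
        # Fold this window into the baseline, decaying terms that were absent.
        for term in set(self.baseline) | set(counts):
            previous = self.baseline.get(term, 0.0)
            self.baseline[term] = (1 - self.alpha) * previous + self.alpha * counts.get(term, 0)
        self.windows_seen += 1
        trending.sort(key=lambda item: item[1], reverse=True)
        return trending


detector = TrendDetector()
warmup = detector.update(["weather"] * 20 + ["news"] * 20)
spike = detector.update(["weather"] * 20 + ["news"] * 20 + ["eclipse"] * 40 + ["rare"] * 3)
```

The additive smoothing term is what keeps a handful of queries for a previously unseen term from registering as a spike. Finer segmentation (by geography or demographic) shrinks the counts in each segment, so a detector of this shape would need more smoothing or a longer baseline there, which is one way to read the sparsity challenge listed above.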