© 2006 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice
Overview of research at HP Labs India
HP Labs around the world
• Bristol
• Palo Alto
• St. Petersburg
• Haifa
• Beijing
• Bangalore
• Tokyo
7 locations; 600 researchers in 23 labs
20-30 large projects in 8 high-impact areas
High-Impact Research Areas
The next technology challenges and opportunities
• Digital Commercial Print
• Content Transformation
• Immersive Interaction
• Information Management
• Analytics
• Cloud
• Intelligent Infrastructure
• Sustainability
Digital Commercial Print
End State: Flexible, customized, on-demand printing that replaces the traditional distribution of mass-produced materials
HP Labs’ research contribution: Breakthrough technology to accelerate the transformation to digital commercial printing
Printing Process: Commercial-grade throughput, cost and quality
Color: Self-calibration, intuitive rendering
Data Path: Efficient processing of massive data streams
Job Creation: Automated content generation
Content Transformation
End State: Complete convergence of physical and digital information
HP Labs’ research contribution: Technologies to transfer content seamlessly from paper to digital and access digital content wherever paper is used today
Content Management: Intuitive, personalized organization; Intelligent content extraction; Live, interactive documents
Displays/Materials: Unbreakable, conformable, ultra-thin and lightweight; Digital with the look and feel of paper
Immersive Interaction
End state: Intuitive human interaction through and with technology
HP Labs’ research contribution: Radically simplify the user experience to make technology more useful, intuitive and pervasive
Intuitive Interfaces: Natural, multi-modal, computer-human interactions
Seamless Collaboration: Immersive multimedia communication – anytime, anywhere – with no physical barriers
Contextual Services: Delivering “the right thing at the right time”; Personal paradigms to simplify Web interaction
Information Management
End State: The vast universe of enterprise information transformed into immediate, business-relevant insight
HP Labs’ research contribution: Redefine the twin tasks of taming and exploiting information to revolutionize enterprise decision making
Management: Superior analysis, extraction and delivery of massive enterprise content
Intelligence: Capabilities to transform massive-scale, real-time data into transactional, operational business intelligence
Analytics
End state: Application of mathematical and scientific methodologies to create better-run businesses
HP Labs’ research contribution: Drive secure, informed, highly effective decision making
Software: Enhance automation and business processes
Services: Analytics that address and transform operational efficiency and security
Solutions: Predictive customer behavior; Individual profile learning
Cloud
End state: Everything-as-a-Service: Billions of users, millions of services, thousands of service providers, millions of servers, exabytes of data, terabytes of traffic
HP Labs’ research contribution: Develop an integrated cloud stack, from infrastructure to services
Infrastructure: Enterprise-grade security, capacity and management
Services: Disrupt traditional industries and offer rich, dynamic experiences
Intelligent Infrastructure
End state: Capture more value via dramatic computing performance and cost improvements
HP Labs’ research contribution: Radical, new approaches for collecting, storing and transmitting data to feed the exascale data center
Data Center: Cost and power efficient; Manageable, reliable; Easily programmable
Intelligent Storage: Cloud-scale, dynamic enterprise-grade
Networks: Programmable, scalable, energy-efficient
Nanotechnology: Memristors, Sensors, Photonic Interconnect
Sustainability
End state: An IT industry with a light carbon footprint that drives the reduction of carbon emissions throughout the global economy
HP Labs’ research contribution: Displace conventional supply chains with sustainable IT ecosystems
Data Centers: Integrated, end-to-end management of compute, power & cooling resources from cradle to cradle
Tools & Methodologies: Reengineer existing value chains using IT to lower environmental footprint
2008 HP Labs Innovation Research Awards
41 awards, 34 universities, 14 countries
Americas
• Stanford University
• University of California, Berkeley
• University of California, Davis
• University of California, San Diego
• University of California, Santa Barbara
• University of Southern California
• University of Toronto
• Carnegie Mellon University
• Massachusetts Institute of Technology
• State University of New York at Buffalo
• Rochester Institute of Technology
• University of Illinois at Urbana-Champaign
• University of Michigan
• University of Wisconsin-Madison
• Purdue University
• Georgia Institute of Technology
Europe, Middle East & Africa (EMEA)
• University of Edinburgh, Scotland
• University of Bath, England
• University of Leeds, England
• University of Bristol, England
• Konstanz University, Germany
• Technische Universitaet Muenchen, Germany
• Vrije Universiteit Amsterdam, Netherlands
• Universidade do Minho, Portugal
• Russian Academy of Sciences, Russia
• University of Saint-Petersburg, Russia
• Bilkent University, Turkey
• Technion, Israel Institute of Technology, Israel
Asia-Pacific & Japan (APJ)
• Indian Institute of Technology, Madras, India
• Indian Institute of Technology, Bombay, India
• National Institute of Informatics, Japan
• Peking University, China
• Tsinghua University, China
• Nanyang Technological University, Singapore
Open cloud computing research test bed
• A loose federation of “Centers of Excellence” (CoE) around the globe
−UIUC, Singapore IDA, KIT: 3 initial CoE
−HP, Intel, Yahoo: 3 initial sponsors with CoE
• Research objectives
−Multi-datacenter, multi-geography, multi-tenancy, secure, massive scale, open test bed
• Each center: 1000-4000 cores and up to PB storage
−Base service: PRS (physical resource set)
−Required services: Open EC2-like, S3, and Hadoop-on-demand
−Plus additional local extensions/variants/service types
HP Labs India
Gesture-based keyboard (GKB)
PrintCast
[Figure: PrintCast block diagram. Uplink side labels: Data from PC, AV Signal, Inserter, Encoder, Modulator, Up converter, Solid State Power Amplifier, Uplink Dish. Downlink side labels: Receiver Dish & LNBC, Set Top Box, PrintCast Decoder, Television, Printer.]
Paper & IT convergence
Secure AiO
HP Labs India
• Three ongoing projects
−Simplifying web consumption for the next billion (SWAN) – Remainder of this talk
−Intuitive multimodal and gestural interaction (IMAGIN)
−Paper in the digital enterprise (PRIDE)
SWAN project - Motivation
Simplifying web consumption for all
Web is useful but complex to use for non-tech-savvy people
Web has to be useful in the mobile context as well
Why is web consumption complex?
• Each web site forces its own cognitive model on the user
− The website decides the interaction model; the user has to learn it and remember it
− Different websites of the same genre each impose their own model
• Web requires very “low” level instructions
− Information access is through a query and manual filtering approach
− Content adaptation, e.g. translation, requires a lot of technical skill
• Mobile web consumption is challenging
− User’s frame of mind is different (limited attention span, distracted)
− Devices are resource challenged
• Broken web experience across different access methods
− No experience continuity across broadband, mobile & disconnected connectivity
State of the art
[Figure: landscape of existing approaches. Labels: Web Widgets, Alerts, Personalized web pages, Mobile environments, Passive consumption, Web Simplification, Personalized Web Content, Browser Scripting, Mashups, chumby, Pipes.]
The Gap: Need to simplify personal web interactions, especially for mobile environments
Technical Goals
• Users to set their own preferred interaction pattern
− Enabling users to easily express their own web interaction patterns
− Providing a familiar interface to all personal actions on the web
• Higher level intent while interacting with services
− Implicit web content consumption based on higher user intent expression, user feedback and user profile
− Understanding and translating user intent to web actions
• Always responsive interactions
− Providing continual interaction across multiple devices & connectivity situations
− Providing ‘Responsive-Behavior’ despite disconnections
Approach
Create simple interactions for long term and exploratory information needs
End user value: Simplify the “Intent -> Query -> Goal” cycle
[Figure: Intent, Query, Goal cycle. Labels: User Profiles; Query expansion; Aggregation, ranking; Summarization; Google, Youtube, Digg/Delicious.]
Using user profiles to personalize services
[Figure: data flow diagram. Labels: User; Data Collection; Explicit and Implicit info; Profile Constructor; User Profile; Application; Personalized services (Search, news, video, shopping).]
Aren’t online portals already doing this?
• Online portals and search engines build user profiles using cookies and other stored data (search keywords, web pages accessed)
−However, they don’t see all the user data
−No way for users to aggregate and reuse the profiles that different websites (Google, Yahoo, ..) build using their data
−Privacy is a big problem
Implicit profile construction - Prior approaches and their limitations
• Word based approach
−Use words in user documents to represent user interests
−Problems
  • Words appear independent of page content (“Home”, “page”)
  • Polysemy and synonymy
  • Large profile sizes
• DMOZ approach
−Use an existing ontology maintained for free
−Problems
  • Too large (about 6 lakh, i.e. 600,000, DMOZ nodes); the ontology has to be drastically pruned for use
  • Need to build classifiers for each DMOZ node
Our approach
• Use Wikipedia as the language of profile representation, map user documents to Wikipedia concepts
−Has bias lower than DMOZ and variance lower than words
• Build a hierarchical profile based on Wikipedia
• Tag the profile concepts (as transactional or recreational)
• Compute recency of user interests in a particular topic
Mapping documents (web pages) to Wikipedia concepts
Item: “Sony to slash PlayStation3 price”
Term vector representation: <sony:1>, <slash:1>, <playstation3:1>, <price:1>
Item: “Jittery Sony Knocks $100 Off PS3 Price Tag”
Term vector representation: <jittery:1>, <sony:1>, <knocks:1>, <ps3:1>, <price:1>, <tag:1>
Query against index of Wikipedia dump: Sony to slash PlayStation3 price
Additional features: titles of the retrieved articles
1. PlayStation Network Platform
2. PlayStation 2
3. Ducks demo
4. PlayStation 3
5. PlayStation
6. Ken Kutaragi
7. PlayStation Portable
8. Console manufacturer
9. Sony Group
10. Crystal Dynamics
11. PlayStation 3 accessories
12. …
13. …
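The mapping step (use the item text as a query against an index of Wikipedia articles, then keep the titles of the retrieved articles as features) can be sketched roughly as below. This is only an illustration: the four-article `ARTICLES` corpus, the TF-IDF style scoring and the function names are assumptions, not the index or ranking function actually used in the project.

```python
import math
import re
from collections import Counter

# Toy stand-in for an index over a Wikipedia dump (titles and text are illustrative).
ARTICLES = {
    "PlayStation 3": "sony playstation3 ps3 video game console price",
    "Sony Group": "sony japanese electronics company playstation",
    "Ken Kutaragi": "ken kutaragi sony engineer father of playstation",
    "Web crawler": "web crawler program that browses web pages for indexing",
}

def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(articles):
    """Return per-article term counts and a smoothed IDF weight per term."""
    docs = {title: Counter(tokenize(body)) for title, body in articles.items()}
    df = Counter(term for tf in docs.values() for term in tf)
    n = len(docs)
    idf = {term: math.log(1 + n / count) for term, count in df.items()}
    return docs, idf

def map_to_concepts(item, docs, idf, top_k=3):
    """Titles of the best-matching articles, used as concept features for the item."""
    query = Counter(tokenize(item))
    scores = {}
    for title, tf in docs.items():
        score = sum(query[t] * tf[t] * idf[t] ** 2 for t in query if t in tf)
        if score > 0:
            scores[title] = score
    return sorted(scores, key=lambda t: (-scores[t], t))[:top_k]

docs, idf = build_index(ARTICLES)
first = map_to_concepts("Sony to slash PlayStation3 price", docs, idf)
second = map_to_concepts("Jittery Sony Knocks $100 Off PS3 Price Tag", docs, idf)
```

In this toy setting both headlines map to the "PlayStation 3" concept first, although their term vectors share only "sony" and "price".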
Term Vector vs Wikipedia profiles
Words in TF * IDF based user profile: Search, Home, Help, News, Privacy, Terms, New, Page, Use, Web, View, Results, Information, Account
Concepts in Wikipedia based user profile: Text Retrieval Conference, HTML element, Bank of America, Google search, ICICI Bank, IDBI Bank, Bank fraud, Artificial neural network, Web crawler, Web design, Debit card, Extensible Markup Language, Hewlett-Packard, Microsoft, XHTML, Demand account
Constructing the hierarchical profile
Algorithm of Xu et al. [WWW 2007]
Photography (15)
  Nature photography (10)
    Wild life photography (5)
Number in parentheses: support (# pages mapped to this concept)
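One way to read the support numbers in the photography example is that a page mapped to a concept also counts towards every ancestor of that concept. The sketch below illustrates only that reading; it is not the algorithm of Xu et al., and the `PARENT` links and the split of 5 pages per concept are assumptions chosen to reproduce the numbers on the slide.

```python
from collections import Counter

# Hypothetical parent links for the concepts in the example; None marks the root.
PARENT = {
    "Wild life photography": "Nature photography",
    "Nature photography": "Photography",
    "Photography": None,
}

def rolled_up_support(page_concepts, parent):
    """Count pages per concept, crediting every ancestor of the mapped concept."""
    support = Counter()
    for concept in page_concepts:
        node = concept
        while node is not None:
            support[node] += 1
            node = parent.get(node)
    return support

# Assumed page-to-concept mapping: 5 pages directly on each concept.
pages = (
    ["Wild life photography"] * 5
    + ["Nature photography"] * 5
    + ["Photography"] * 5
)
support = rolled_up_support(pages, PARENT)
```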
Tagging concepts in user profiles
• Two types of tags
−Whether the concept is of commercial or recreational interest
−Recency of interest
• Tagging commercial interest
−Crawl shopping site pages, map pages to concepts and label these concepts as commercial interests
• Tagging recreational interest
−Use topics in Wikipedia recreational/hobby categories
• Recency of interest: $\sum_p 1/e^{(\text{today} - t_p)}$, where $t_p$ is the time page $p$ supporting the topic was last accessed
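The recency score can be computed directly from the last-access dates of the pages supporting a topic. A minimal sketch, assuming the time difference is measured in whole days (the slide does not state the unit) and using a hypothetical function name:

```python
import math
from datetime import date

def recency(last_access_dates, today):
    """Sum of e^-(age in days) over the pages supporting a topic."""
    return sum(math.exp(-(today - d).days) for d in last_access_dates)

today = date(2009, 1, 10)
# One page read today and one read yesterday: 1 + 1/e.
recent = recency([date(2009, 1, 10), date(2009, 1, 9)], today)
# Two pages read a month ago contribute almost nothing.
stale = recency([date(2008, 12, 10), date(2008, 12, 11)], today)
```

With this form, a topic supported by many old pages scores lower than a topic supported by a single page read today.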
Wikipedia based profile
Evaluation results
[Figure 1: stability (Stability_alpha, Stability_date) vs. number of web pages in cache]
[Figure 2: precision by support (Support > 5, 3 < Support < 5, Support < 3)]
[Figure 3: percentage in profile and precision by hierarchy level (Level 1 to Level 6)]
• Profiles are stable (fig 1)
• Profile elements with high support have high precision (fig 2)
• Profile elements at all levels of the hierarchy have similar precision (fig 3)
• Bookmarks are not a good data source for profiles
Query expansion – Personalized video
• Approach
− Create three additional queries (based on terms with high TF in title, tags and description)
− Evaluate which expansion is better
• Example: Query on Youtube for “trains”
• Expansion using
−Title: train+osbourne+midnight+bullet+rollin+mystery+maglev
−Description: train+runaway+record+version+video+http+track
−Tags: train+railroad+guitar+osbourne+railway+bullet
• Cross-lingual expansions
−Baba Ramdev
−Baba+ramdev+yoga+swami+pranayam+liye+ram+disease+dev+india+dhyan
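Building one expanded query per field from high-TF terms can be sketched as follows. The stopword list, the sample titles and the `expand_query` helper are illustrative assumptions; no stemming is done here, so the sketch starts from "train" rather than "trains".

```python
from collections import Counter

STOPWORDS = {"the", "a", "of", "in", "on", "and", "to"}

def expand_query(query, field_texts, extra_terms=6):
    """Append the highest-TF terms of one result field (e.g. titles) to the query."""
    counts = Counter(
        word
        for text in field_texts
        for word in text.lower().split()
        if word not in STOPWORDS and word != query
    )
    # Highest count first; ties broken alphabetically to keep the output stable.
    ranked = sorted(counts, key=lambda w: (-counts[w], w))
    return "+".join([query] + ranked[:extra_terms])

# Hypothetical titles of the first results returned for the original query.
titles = [
    "Crazy Train Ozzy Osbourne",
    "Bullet train Japan",
    "Midnight train",
    "Osbourne crazy train live",
    "Bullet train maglev",
]
expanded = expand_query("train", titles, extra_terms=3)
```

The same function would be called once per field (title, description, tags) to obtain the three candidate expansions that are then compared.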
Query expansion - “Find similar”
Problem – Can we construct queries to make getting “similar content” easier?
Approach - Identify key phrases for a text document, query a standard search engine, rank results
• Retrieving the original document: the query capture restart+capture random+random walk+page rank+capture random walk+restart retrieves Hopcroft’s talk at rank 1 in Google
Query - Ed Lazowska’s talk
Result – Hopcroft’s talk
Query expansion – “Find similar”
Key phrases:
economic growth
global development
economic history
economic governance
adam smith
good governance
economic growth process
modern technology
Query: economic+growth+global+development+history+governance+adam+smith+process+rich+good+new+knowledge+cgd+brief+world+property+rights+productivity+labor+human+capital+getting+use+modern+technology+trade+barriers+public+goods+poor+countries+machine+natural+resources+research+intellectua
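The sample query looks like the key phrases flattened into unique terms and joined with "+". A minimal sketch under that assumption; the `phrases_to_query` helper is hypothetical, and the ordering on the slide suggests the real system also ranks phrases, which is not modelled here.

```python
def phrases_to_query(key_phrases):
    """Flatten key phrases into a '+'-joined query, keeping each word once."""
    seen = set()
    terms = []
    for phrase in key_phrases:
        for word in phrase.lower().split():
            if word not in seen:
                seen.add(word)
                terms.append(word)
    return "+".join(terms)

query = phrases_to_query([
    "economic growth",
    "global development",
    "economic history",
    "economic governance",
    "adam smith",
    "good governance",
])
```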
Aggregating search results
• Current search interfaces are geared to immediate gratification; no way to trade off search latency for more relevant results
• Different search engines have different coverage; no way to benefit from this
• Navigation of results requires clicking back and forth on search results
−Search result snippets are often misleading
Our solution
• To create an aggregated and personalized Information Retrieval (IR) system that
−compiles and consolidates the most relevant information on particular topic(s) from the web
−automatically creates a PDF document on the topic
Ranking results
• Content Based Ranking (CBR), based on TF, IDF, Document Boost, Field Boost
• Delicious Vector Cosine Similarity (DVCS)
Rank(URL) = d * CBR + (1 - d) * DVCS
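The combined score is a linear blend of the two signals. The sketch below assumes that CBR is already normalised to [0, 1] and that DVCS is the cosine similarity between a URL's Delicious tag vector and a tag vector describing the user's interest; the slide does not say which vectors are compared, and the function names are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def rank_score(cbr, url_tags, user_tags, d=0.5):
    """Rank(URL) = d * CBR + (1 - d) * DVCS."""
    return d * cbr + (1 - d) * cosine(url_tags, user_tags)

same_topic = rank_score(0.4, {"photo": 1}, {"photo": 3}, d=0.5)
off_topic = rank_score(0.4, {"cars": 1}, {"photo": 3}, d=0.5)
```

Setting d closer to 1 favours the content-based score; setting it closer to 0 favours the tag similarity.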
User Interface
User study results
Document summarization using Wikipedia
Query against index of Wikipedia content: Sony to slash PlayStation3 price
Additional features: titles of the retrieved articles
1. PlayStation Network Platform
2. PlayStation 2
3. Ducks demo
4. PlayStation 3
5. PlayStation
6. Ken Kutaragi
7. PlayStation Portable
8. Console manufacturer
9. Sony Group
10. Crystal Dynamics
11. PlayStation 3 accessories
12. …
13. …
Sentence (S) to concept (C) mapping:
      C1  C2  C3  C4
S1     1   0   1   0
S2     0   1   1   0
S3     0   0   0   1
In degree of C3 = 2
Algorithm 1
Document sentences mapped to Wikipedia concepts
Uses in degree of concept-sentence bipartite graph for sentence selection
Tested on DUC 2002 data from NIST
Would have come in 3rd in the NIST challenge
Limitations
- Controlling size of the summary
- General concepts (e.g. Sports) may win over specific concepts (e.g. Soccer)
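Algorithm 1 can be illustrated on the sentence-to-concept matrix from the slide (S1 maps to C1 and C3, S2 to C2 and C3, S3 to C4). The slide only says that in degree is used for sentence selection; scoring a sentence by the summed in degree of its concepts, as below, is an assumed reading, and the function names are hypothetical.

```python
# Sentence -> concepts mapping taken from the example matrix.
SENTENCE_CONCEPTS = {
    "S1": {"C1", "C3"},
    "S2": {"C2", "C3"},
    "S3": {"C4"},
}

def concept_in_degree(sentence_concepts):
    """Number of sentences pointing at each concept in the bipartite graph."""
    degree = {}
    for concepts in sentence_concepts.values():
        for c in concepts:
            degree[c] = degree.get(c, 0) + 1
    return degree

def select_sentences(sentence_concepts, k=2):
    """Pick the k sentences whose concepts have the highest total in degree."""
    degree = concept_in_degree(sentence_concepts)
    score = {s: sum(degree[c] for c in cs) for s, cs in sentence_concepts.items()}
    return sorted(score, key=lambda s: (-score[s], s))[:k]

degrees = concept_in_degree(SENTENCE_CONCEPTS)
summary = select_sentences(SENTENCE_CONCEPTS, k=2)
```

The sketch also shows the second limitation listed above: a broad concept shared by many sentences dominates the score regardless of how specific the other concepts are.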
Document summarization - Algorithm 2
Intuition: Important sentences in the document map to important concepts and vice versa
Propagate sentence importance to concepts and concept importance to sentences over multiple iterations
$x_n^{t+1} = f(x_n^t, G)$
Accumulate step: $y_m^t = \sum_{n=1}^{N} w_{mn}\, x_n^t$
Broadcast step: $x_n^{t+1} = \sum_{m=1}^{M} w_{nm}\, y_m^t$
Future work – Size of summary, multi-document summaries, Indian language summaries
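The accumulate and broadcast steps can be sketched as an iterative propagation over the sentence-concept graph. Assumptions in this sketch: x holds sentence scores and y holds concept scores, edge weights are binary, and scores are renormalised after each round (the normalisation is an addition to keep values bounded, not something stated on the slide).

```python
def propagate(weights, iterations=20):
    """weights[n][m] = 1 if sentence n maps to concept m, else 0."""
    n_sent, n_conc = len(weights), len(weights[0])
    x = [1.0 / n_sent] * n_sent
    y = [0.0] * n_conc
    for _ in range(iterations):
        # Accumulate: each concept sums the importance of the sentences mapped to it.
        y = [sum(weights[n][m] * x[n] for n in range(n_sent)) for m in range(n_conc)]
        # Broadcast: each sentence sums the importance of its concepts.
        x = [sum(weights[n][m] * y[m] for m in range(n_conc)) for n in range(n_sent)]
        total = sum(x)
        x = [v / total for v in x]  # normalisation is an added assumption
    return x, y

# Rows S1..S3, columns C1..C4, as in the Algorithm 1 example matrix.
W = [
    [1, 0, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 1],
]
sentence_scores, concept_scores = propagate(W)
```

On this matrix the two sentences that share concept C3 reinforce each other and end up with nearly all of the score, while S3 fades towards zero.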
Challenge 1
• Better intent expression
• Multi-lingual query reformulation
−Baba Ramdev
−Baba+ramdev+yoga+swami+pranayam+liye+ram+disease+dev+india+dhyan
• Interfaces to simplify feedback for query reformulation
Challenge 2
• Long standing queries
• Queries spread over time
−Learning photography
−Information delivery needs to be incremental and non-repetitive
−Video retrieval
  • Channels
  • Create initial stickiness
  • Ensure ongoing interest
−Caching – Utility models
• What are good evaluation measures for such systems?
Challenge 3
• Document summarization
−Extracting leads
−Compression versus missed information
−Cross lingual summarization