Amazon CloudSearch Session with Elsevier: re:Invent 2013
DESCRIPTION
Session SVC302 from re:Invent 2013. Today's applications work across many different data assets: documents stored in Amazon S3, metadata stored in NoSQL data stores, catalogs and orders stored in relational database systems, raw files in filesystems, etc. Building a great search experience across all these disparate datasets and contexts can be daunting. Amazon CloudSearch provides simple, low-cost search, enabling your users to find the information they are looking for. In this session, we will show you how to integrate search with your application, including key areas such as data preparation, domain creation and configuration, data upload, integration of search UI, search performance and relevance tuning. We will cover search applications that are deployed for both desktop and mobile devices. Peter Simpkin from Elsevier provides a summary of their use of CloudSearch.
TRANSCRIPT
© 2013 Amazon.com, Inc. and its affiliates. All rights reserved. May not be copied, modified, or distributed in whole or in part without the express consent of Amazon.com, Inc.
Enrich Search User Experience For Different Parts of Your Application Using Amazon CloudSearch
Jon Handler, CloudSearch Solution Architect
November 15, 2013
Agenda • Sourcing your documents • Retrieval and ranking • Search user interface • Performance and Scale
• Developer example: Peter Simpkin, Solution Architect, Elsevier
Architecting with CloudSearch
Hands-Off Operation
[Diagram: a grid of search instances, one per index partition (1, 2, ... n) and copy (1, 2, ... n). The number of index partitions grows with Document Quantity and Size; the number of copies grows with Search Request Volume and Complexity.]
MovieMate Application
Multiple Sources Multiple Functions
When wealthy industrialist Tony Stark is forced to build an armored suit after a life-threatening incident, he ultimately decides to use its technology to fight against evil.
Iron Man (2008)
Tony Stark has declared himself Iron Man and installed world peace... or so he thinks. He soon realizes that not only is there a mad man...
Iron Man 2 (2010)
When Tony Stark's world is torn apart by a formidable terrorist called the Mandarin, he starts an odyssey of rebuilding and retribution.
Iron Man 3 (2013)
On the hunt for a fabled treasure of gold, a band of warriors, assassins, and a rogue British soldier descend upon a village in feudal China, where a humble blacksmith...
The Man With The Iron Fists (2012)
Cancel Iron Man
Movies Search Social Account Nearby
Done Iron Man
Movies Search Social Account Nearby
Mobile Experience
Agenda • Sourcing your documents • Retrieval and ranking • Search user interface • Performance and Scale
• Developer example: Peter Simpkin, Elsevier Oxford
Sourcing your documents Retrieval and ranking Search user interface Performance and Scale Developer example
CloudSearch Documents • Unique identifier • Version • Fields
– Indexed according to configuration – Source of matches
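A minimal sketch of how the three parts of a document (identifier, version, fields) might be assembled into an add operation. The `make_add_operation` helper and the field values are illustrative assumptions, not taken from the talk; the dictionary layout follows the JSON document batch format of the 2011-02-01 API as best understood here.

```python
import json


def make_add_operation(doc_id, version, fields, lang="en"):
    """Build one 'add' operation for a document batch (hypothetical helper)."""
    if not isinstance(version, int) or version < 0:
        raise ValueError("version must be a non-negative integer")
    return {
        "type": "add",
        "id": doc_id,        # unique identifier
        "version": version,  # used to order successive updates to the same id
        "lang": lang,
        "fields": fields,    # indexed according to the domain configuration
    }


doc = make_add_operation(
    "movie_1",
    1,
    {"title": "Iron Man", "description": "Tony Stark builds an armored suit."},
)
batch = json.dumps([doc])  # a batch is a JSON array of operations
```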
Application Content
• Amazon RDS: movie data, theater data, user reviews, lists, etc.
• DynamoDB: user actions
• Amazon S3: help files, media (clips, images), articles
Bootstrap Strategy
[Diagram: Source System, Processing Script, Queuing, Batching. Components shown: Amazon EC2 (twice), Amazon SQS, Amazon CloudSearch.]
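The batching step can be sketched as follows. This is an illustration only: the `batch_documents` name is hypothetical, and the 5 MB default is an assumed per-batch upload limit that should be checked against the service documentation.

```python
import json

MAX_BATCH_BYTES = 5 * 1024 * 1024  # assumed per-batch upload limit


def batch_documents(operations, max_bytes=MAX_BATCH_BYTES):
    """Yield JSON array strings that stay within max_bytes each.

    A single operation larger than max_bytes is still emitted on its own,
    so callers should validate document size separately.
    """
    current, size = [], 2  # 2 bytes for the surrounding "[" and "]"
    for op in operations:
        encoded = json.dumps(op)  # ASCII output, so len() equals byte length
        op_size = len(encoded) + 1  # +1 for a separating comma
        if current and size + op_size > max_bytes:
            yield "[" + ",".join(current) + "]"
            current, size = [], 2
        current.append(encoded)
        size += op_size
    if current:
        yield "[" + ",".join(current) + "]"


ops = [{"type": "add", "id": str(i)} for i in range(10)]
batches = list(batch_documents(ops, max_bytes=60))
```

Each yielded string could then be placed on a queue or posted to the document endpoint by a separate worker.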
Document Construction
• One source will be the master
for each record
    determine doc id and version
    create fields
    for each auxiliary source
        gather additional data
    send or queue the document
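The construction loop above might look like this in Python. The record layout and the auxiliary lookups are hypothetical stand-ins for queries against the real data stores.

```python
def build_documents(master_records, auxiliary_sources):
    """Yield one add operation per master record.

    master_records: iterable of dicts with 'id', 'version', and field values.
    auxiliary_sources: callables mapping a record id to a dict of extra
    fields (assumed stand-ins for lookups in the other data stores).
    """
    for record in master_records:
        doc_id, version = record["id"], record["version"]
        # Fields from the master source.
        fields = {k: v for k, v in record.items() if k not in ("id", "version")}
        # Additional data gathered from each auxiliary source.
        for lookup in auxiliary_sources:
            fields.update(lookup(doc_id))
        yield {"type": "add", "id": doc_id, "version": version, "fields": fields}


movies = [{"id": "m1", "version": 3, "title": "Iron Man"}]
likes = {"m1": {"likes": 42}}
docs = list(build_documents(movies, [lambda doc_id: likes.get(doc_id, {})]))
```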
Relational DB
Movie: Title, Description, TheaterID
Theater: Name, AddressesID, ShowtimesID
Addresses: Street, City, State
Showtimes: Date, Time, State
S3 • Clips, images, reviews • Apache Tika to extract content • S3 Metadata for additional fields
DynamoDB
DynamoDB     CloudSearch
Table        Domain
Item         Document
Attribute    Field
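To illustrate the item-to-document mapping, here is a small sketch that converts a DynamoDB item in its low-level attribute-value form into plain field values. The `item_to_fields` helper and the sample attribute names are hypothetical; only string, number, and string-set attributes are handled, and field names are lowercased on the assumption that the domain uses lowercase field names.

```python
def item_to_fields(item):
    """Map each DynamoDB attribute to one field (attribute -> field)."""
    fields = {}
    for name, typed in item.items():
        (type_code, value), = typed.items()  # e.g. {"S": "u1"}
        if type_code == "S":
            fields[name.lower()] = value
        elif type_code == "N":
            # DynamoDB transmits numbers as strings.
            is_int = value.lstrip("-").isdigit()
            fields[name.lower()] = int(value) if is_int else float(value)
        elif type_code == "SS":
            fields[name.lower()] = sorted(value)
        # Other attribute types are skipped in this sketch.
    return fields


item = {
    "UserId": {"S": "u1"},
    "Likes": {"N": "12"},
    "Genres": {"SS": ["Drama", "Action"]},
}
fields = item_to_fields(item)
```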
Searching Show Times
id  title     description  t_name  t_street  date   time
1   Iron Man  ...          Galaxy  Main      11/11  12:30pm
2   Iron Man  ...          Galaxy  Main      11/11  1:15pm
3   Iron Man  ...          Galaxy  Main      11/11  2:45pm
4   Iron Man  ...          Galaxy  Main      11/11  6:00pm
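The table above is a denormalized view: one document per show time, with movie and theater data repeated on each row. A sketch of producing such rows, assuming simple in-memory dictionaries for the movie and theater records (the input layout and the `showtime_documents` name are hypothetical):

```python
def showtime_documents(movie, theaters):
    """Return one flat document per (theater, show time) pair."""
    docs = []
    for theater in theaters:
        for date, time in theater["showtimes"]:
            docs.append({
                "id": str(len(docs) + 1),
                "title": movie["title"],
                "description": movie["description"],
                "t_name": theater["name"],
                "t_street": theater["street"],
                "date": date,
                "time": time,
            })
    return docs


movie = {"title": "Iron Man", "description": "..."}
galaxy = {
    "name": "Galaxy",
    "street": "Main",
    "showtimes": [("11/11", "12:30pm"), ("11/11", "1:15pm"),
                  ("11/11", "2:45pm"), ("11/11", "6:00pm")],
}
rows = showtime_documents(movie, [galaxy])
```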
Heterogeneous Data
Multi Domain
Updating CloudSearch
[Diagram: Users, Web Server, Update Processor. Components shown: Amazon EC2 (twice), Amazon SQS, Amazon S3, DynamoDB, Amazon RDS, Amazon CloudSearch.]
Section Summary • Multiple sources • Bootstrap / Update • Heterogeneous data
Agenda • Sourcing your documents • Retrieval and ranking • Search user interface • Performance and Scale
• Developer example: Peter Simpkin, Elsevier Oxford
Good Matches
[Screenshot: MovieMate search results for "Iron Man"]
The Search Algorithm
• Locate documents that satisfy Boolean constraints
  – Usually intersection
• Relevance rank those documents
  – Differentiates from databases by relevance
Document Structure
Movie: title, description, user_rating, likes, release_date, latitude, longitude
Configuring for Search • Text fields for individual word search
– User-generated and external text – titles, descriptions
• Literal fields for exact matches – Application-generated text like facets
• Integer fields for range searching and ranking
Searching Text
http(s)://<endpoint>/2011-02-01/search?
• Simple searches
  – q=<text>
• Filtering
  – bq=(or title:'iron' (and description:'iron' description:'man'))
• Filtering with integer ranges
  – bq=(and 'iron man' year:..2010)
• Geo filtering
  – bq=(and 'iron man' latitude:12700..12900 longitude:5700..5800)
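A sketch of assembling these request URLs with the standard library, so that quotes, spaces, and parentheses in `bq` are percent-encoded. The `search_url` helper and the `search-example.invalid` endpoint are placeholders, not real names.

```python
from urllib.parse import urlencode

API_VERSION = "2011-02-01"


def search_url(endpoint, **params):
    """Return a search URL for the given endpoint and query parameters."""
    return f"https://{endpoint}/{API_VERSION}/search?{urlencode(params)}"


simple = search_url("search-example.invalid", q="iron man")
filtered = search_url(
    "search-example.invalid",
    bq="(and 'iron man' year:..2010)",
)
```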
Search Results
{
  "rank": "-text_relevance",
  "match-expr": "(label 'iron man')",
  "hits": {
    "found": 204,
    "start": 0,
    "hit": [
      { "id": "sontsst12cf5f88b42" },
      { "id": "sopvopr12ab017f082" },
      { "id": "sorzrpw12ac468a13b" }
    ]
  },
  ...
}
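A response of this shape can be unpacked with the standard `json` module. The sample text below is abbreviated from the slide; `hit_ids` is a hypothetical helper name.

```python
import json

response_text = (
    '{"rank": "-text_relevance", "hits": {"found": 204, "start": 0, '
    '"hit": [{"id": "sontsst12cf5f88b42"}, {"id": "sopvopr12ab017f082"}]}}'
)


def hit_ids(text):
    """Return (total matches found, ids of the hits on this page)."""
    hits = json.loads(text)["hits"]
    return hits["found"], [hit["id"] for hit in hits["hit"]]


found, ids = hit_ids(response_text)
```

Note that `found` is the size of the whole match set, while `hit` holds only the current page of results.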
Relevant Results
[Screenshot: MovieMate search results for "Iron Man"]
Customizing Ranking • text_relevance and cs.text_relevance • Rank expressions
– Compute a score for each document – &rank=<function>
• Defined in the console • Defined at query-time
– &q='iron-man'&rank-recency=text_relevance + year &rank=recency
Field Weighting
• Adjust relative importance of fields
• &rank-title=cs.text_relevance({"weights":{"title":4.0}, "default_weight":1})
Popularity
• Convert floating point to integer
• Weight by the number of ranks
• rank-pop=text_relevance + log10(user-rating * number-user-ranks) * 10 + metascore * 3
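To see how the terms of this expression trade off, the same arithmetic can be evaluated offline. This is only a local illustration of the formula, not the service's evaluator; the input values are made up, and the rating and count are assumed to be positive so that the logarithm is defined.

```python
import math


def popularity_rank(text_relevance, user_rating, number_user_ranks, metascore):
    """Evaluate text_relevance + log10(rating * ranks) * 10 + metascore * 3."""
    return (
        text_relevance
        + math.log10(user_rating * number_user_ranks) * 10
        + metascore * 3
    )


# A rating of 8 from 125 users gives log10(1000) = 3, contributing 30 points.
score = popularity_rank(text_relevance=100, user_rating=8,
                        number_user_ranks=125, metascore=7)
```

Because of the logarithm, multiplying the number of ratings by ten adds only 10 points, which keeps very popular titles from swamping text relevance.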
Freshness • Exponential decay function
• &rank-decay=text_relevance + 200*Math.exp(-0.1*days_ago)
$r = c\,e^{-\lambda t}$
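With c = 200 and λ = 0.1 as in the rank expression, the decay term can be checked offline. A small sketch (the function name is hypothetical):

```python
import math


def freshness_rank(text_relevance, days_ago, scale=200.0, decay=0.1):
    """Evaluate text_relevance + scale * exp(-decay * days_ago)."""
    return text_relevance + scale * math.exp(-decay * days_ago)


boost_today = freshness_rank(0, 0)
half_life_days = math.log(2) / 0.1  # about 6.9 days
boost_at_half_life = freshness_rank(0, half_life_days)
```

With these constants the freshness boost halves roughly every seven days, so a document released today starts with 200 extra points and has about 100 after a week.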
Location Sort
• Latitude and longitude expressed as integers
• Denormalized for particular theaters with locations
Movie: title, description, user_rating, likes, release_date, latitude, longitude
Location Sort
• Cartesian distance function
• &rank-geo=sqrt(pow(latitude - lat, 2) + pow(longitude - lon, 2))
• &rank=-geo
$\sqrt{(lat - lat_{user})^2 + (lon - lon_{user})^2}$
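The Cartesian distance can be sanity-checked offline. In this sketch `latitude` and `longitude` are the document's integer-scaled coordinates and `lat`, `lon` are the user's, matching the rank expression; the sample values are invented.

```python
import math


def geo_rank(latitude, longitude, lat, lon):
    """Straight-line distance in the integer-scaled coordinate space."""
    return math.sqrt((latitude - lat) ** 2 + (longitude - lon) ** 2)


distance = geo_rank(latitude=12703, longitude=5704, lat=12700, lon=5700)
```

This treats latitude and longitude as a flat plane, which is a reasonable approximation only over short distances such as nearby theaters.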
Rank Expressions: Combined
• &rank-combined=text_relevance + 2.0 * geo + 0.5 * popularity + 0.3 * freshness
• &rank=combined
Section Summary • Search API basics • Customizing ranking
– Field weighting, popularity, freshness, GEO, combined
• Rank expression comparison tool
Agenda • Sourcing your documents • Retrieval and ranking • Search user interface • Performance and Scale
• Developer example: Peter Simpkin, Elsevier Oxford
Facets
Simple Faceting: Document
Movie: title, description, genre
Simple Faceting: Configuration
Simple Faceting: Query
q=iron+man&facet=genre
{
  "rank": "-text_relevance",
  "match-expr": "(label 'star wars')",
  "hits": { "found": 7, "start": 0, "hit": [] },
  "facets": {
    "genre": {
      "constraints": [
        { "value": "Family", "count": 62 },
        { "value": "Action/Adventure", "count": 21 },
        { "value": "Drama", "count": 5 },
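Once the response has been parsed, the facet values and counts can be pulled out as pairs ready for display. A sketch using an already-decoded response dictionary (`facet_counts` is a hypothetical helper):

```python
def facet_counts(response, facet):
    """Return [(value, count), ...] for one facet, or [] if it is absent."""
    constraints = (
        response.get("facets", {}).get(facet, {}).get("constraints", [])
    )
    return [(c["value"], c["count"]) for c in constraints]


response = {
    "hits": {"found": 7, "start": 0, "hit": []},
    "facets": {
        "genre": {
            "constraints": [
                {"value": "Family", "count": 62},
                {"value": "Action/Adventure", "count": 21},
                {"value": "Drama", "count": 5},
            ]
        }
    },
}
genres = facet_counts(response, "genre")
```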
Simple Faceting: UI
<div class='facet'>
  <ul class='facet_list'>
    <?php
    // Each constraint holds one facet value and its document count.
    $genres = $resultsObj->facets->genre->constraints;
    for ($i = 0; $i < count($genres); $i++) {
      $curGenre = $genres[$i];
      $curCount = $curGenre->count;
    ?>
    <li class='facet_item'>
      <div class='facet_name'><?=$curGenre->value?></div>
      <div class='facet_count'><?=$curCount?></div>
    </li>
    <?php } ?>
  </ul>
</div>
Facets
Document
• title: Lincoln
• description: ...
• oscar1: Awards
• oscar2: Awards/Best Actor
• oscar3: Awards/Best Actor/Daniel Day Lewis
Movie: title, description, oscar1, oscar2, oscar3
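The oscar1 to oscar3 fields each hold a progressively longer prefix of one hierarchy path. A sketch of deriving them from a single path string (the `hierarchy_fields` helper is hypothetical):

```python
def hierarchy_fields(path, prefix="oscar", separator="/"):
    """Return one field per level, each holding the path up to that level."""
    parts = path.split(separator)
    return {
        f"{prefix}{depth}": separator.join(parts[:depth])
        for depth in range(1, len(parts) + 1)
    }


fields = hierarchy_fields("Awards/Best Actor/Daniel Day Lewis")
```

Storing the full prefix at every level keeps values such as "Best Actor" distinct under "Awards" and "Nominations".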
Query
&q=lincoln&facet=oscar1,oscar2,oscar3
{
  "rank": "-text_relevance",
  "hits": {...},
  "facets": {
    "oscar1": {
      "constraints": [
        { "value": "Awards", "count": 23 },
        { "value": "Nominations", "count": 124 }]},
    "oscar2": {
      "constraints": [
        { "value": "Awards/Best Actor", "count": 6 },
        { "value": "Awards/Best Actress", "count": 3 }...]},
    "oscar3": {
      "constraints": [
        { "value": "Awards/Best Actor/Daniel Day Lewis", "count": 1 },
        { "value": "Awards/Best Actor/Denzel Washington", "count": 2 }...]},
Drilldown • bq=oscar1:'Awards' • bq=oscar2:'Awards/Best Actor' • bq=oscar3:'Awards/Best Actor/Daniel Day Lewis' • bq=(and 'star' oscar2:'Awards/Best Actor')
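The drilldown constraints above follow a pattern: the field number equals the depth of the selected path. A sketch of generating them, assuming the facet values contain no single quotes (no escaping is attempted; `drilldown_bq` is a hypothetical helper):

```python
def drilldown_bq(path, text=None, prefix="oscar", separator="/"):
    """Build the bq value for a selected facet path, optionally with text."""
    depth = path.count(separator) + 1
    constraint = f"{prefix}{depth}:'{path}'"
    if text is None:
        return constraint
    return f"(and '{text}' {constraint})"


top_level = drilldown_bq("Awards")
with_text = drilldown_bq("Awards/Best Actor", text="star")
```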
Section Summary • Simple faceting • Hierarchical faceting • Hierarchical data handling
Agenda • Sourcing your documents • Retrieval and ranking • Search user interface • Performance and Scale
• Developer example: Peter Simpkin, Elsevier Oxford
The Search Algorithm
• Locate documents that satisfy Boolean constraints
  – Usually intersection
• Relevance rank those documents
  – Differentiates from databases by relevance
Performance Best Practices • Match set size • Text queries perform better than integer queries • Complex relevance functions
Optimizing Index Size • Trade off literal and uint for cost/performance • Result fields matter most • Enabling faceting increases size
Wrap Up • Sourcing documents from various locations • Building queries and ranking • UI Components for faceting • Getting the most out of your index
Agenda • Sourcing your documents • Retrieval and ranking • Search user interface • Performance and Scale
• Developer example: Peter Simpkin, Elsevier Oxford
Agenda
• Elsevier Intro • Search Problem Statement • Enterprise Content Search • Hints and Tips • CloudSearch Observations
• 7,000+ employees in 26 countries
• 2,200 journals / article market share 25%
• $3B revenue
• Scientific, Technical & Medical
Customers Products Academic Research Institutions
Government & Health
Corporate Research Labs
Individual Researchers
Content Systems
Content Challenges:
• No central place for consumers to discover content
• It is not currently possible to search and retrieve atomic assets
• Assets are not reusable across products
Consumer Platforms
Enterprise Content Search Engine
Search Opportunities:
• Create a comprehensive inventory to easily discover content Elsevier owns
• Provide access to the granular / modular content they want, at will
• Assets must be uniquely addressable
Empower our product development partners
Enterprise Content Search eco-system
Federated Content Warehouse Product Platform Data center
E.U Corporate Data center
U.S Corporate Data center
Amazon S3 DynamoDB
Amazon SWF Amazon CloudSearch
SDF metadata
Simple Search UI
Elsevier Technical Drivers & Approach
• Fully managed, full-featured search service in the cloud
• Automatically scales for data & traffic
• Easy to set up and use
• PoC created in days
• Search Engine as a Service
• Pay-as-you-go pricing model
Hints & Tips
(and issn:'0022-1694'
(and type:'1.2'
(and (not action:'D')
(or (and pubstartdate:..2013176 pubenddate:2005002..)
(or (and pubstartdate:2005001
(and pubstarttime:0.. pubstarttime:..235959))
(or (and pubstartdate:2013177 pubstarttime:..235959)
(or (and pubenddate:2005001 pubendtime:0..)
(and pubenddate:2013177
(and pubendtime:..235959 pubendtime:0..)))))))))
• Query Response Time = 5 seconds
Optimising Nested Queries
(and issn:'0022-1694' type:'1.2'
(not action:'D')
(or (and pubstartdate:..2013176 pubenddate:2005002..)
(and pubstartdate:2005001 pubstarttime:0..235959)
(and pubstartdate:2013177 pubstarttime:0..235959)
(and pubenddate:2005001 pubendtime:0..)
(and pubenddate:2013177 pubendtime:0..235959)))
• Response Time = 2.5 seconds
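The first optimisation step, collapsing nested and/or clauses that share an operator, can be automated when queries are generated in code. The sketch below represents a query as nested tuples, which is an assumed in-memory form and not anything from the talk.

```python
def flatten(node):
    """Collapse nested nodes that share the same 'and'/'or' operator."""
    if isinstance(node, str):
        return node  # a leaf term such as "type:'1.2'"
    op, *children = node
    flat = []
    for child in map(flatten, children):
        if not isinstance(child, str) and child[0] == op and op in ("and", "or"):
            flat.extend(child[1:])  # hoist grandchildren into this node
        else:
            flat.append(child)
    return (op, *flat)


def render(node):
    """Render the tuple form as a prefix-notation query string."""
    if isinstance(node, str):
        return node
    op, *children = node
    return "(" + " ".join([op, *map(render, children)]) + ")"


nested = ("and", "issn:'0022-1694'",
          ("and", "type:'1.2'",
           ("and", ("not", "action:'D'"),
            ("or", "pubstartdate:2005001",
             ("or", "pubstartdate:2013177", "pubenddate:2005001")))))
query = render(flatten(nested))
```

Flattening changes only the shape of the expression, not which documents match, since and/or are associative.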
Optimised Nested Query
(and (not action:'D')
(or (and issn:'0022-1694' type:'1.2'
pubstartdate:..2013176 pubenddate:2005002..)
(and issn:'0022-1694' type:'1.2'
pubstartdate:2005001 pubstarttime:0..235959)
(and issn:'0022-1694' type:'1.2'
pubstartdate:2013177 pubstarttime:0..235959)
(and issn:'0022-1694' type:'1.2'
pubenddate:2005001 pubendtime:0..)
(and issn:'0022-1694' type:'1.2'
pubenddate:2013177 pubendtime:0..235959)))
• Response Time = 0.17ms
CloudSearch Observations
• Facilitates knowledge sharing on content matters across Elsevier's product platforms
• Ability to leverage content infrastructure and capabilities across Elsevier's divisions
• Easy to integrate with existing on-premise Content Systems
• Speed to market; allows developers to focus on building other core Content Strategy components
• Need to spend time optimising queries to maximise performance
Thank You
SVC302