SVC101 Building Search into Your App - AWS re:Invent 2012
DESCRIPTION
Amazon CloudSearch is a fully managed search service in the cloud that allows customers to easily integrate fast and highly scalable search functionality into their applications. In this session, we cover the basics of search and search engines. We take an introductory look at CloudSearch along with a deep dive showing how to build a CloudSearch-based web application.
TRANSCRIPT
Search experience = user retention and revenue
[Architecture diagram]
Entry points: DNS / Load Balancing, AWS Query
Interfaces: Search API, Doc Svc API, Config API, Console, Command Line Tools
SEARCH SERVICE: Search Documents
DOCUMENT SERVICE: Add Documents, Update Documents, Delete Documents
CONFIG SERVICE: Create Domains, Configure Domains, Delete Domains
ACCESS CONTROL on each of the three services
[Diagram: Search Domain]
A grid of SEARCH INSTANCEs, each holding one Index Partition (1, 2, ... n) and one Copy (1, 2, ... n) of it
DATA: Document Quantity and Size
TRAFFIC: Search Request Volume and Complexity
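One way to read the Search Domain diagram is as a grid of instances indexed by partition and copy. The sketch below models only that grid. It assumes, from the diagram's DATA and TRAFFIC labels, that data volume determines the number of partitions and traffic determines the number of copies; the names `SearchInstance`, `domain_layout`, and `instances_for_query` are hypothetical, and routing a query to a single copy is a simplification rather than a description of how the service works internally.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SearchInstance:
    partition: int  # which slice of the index this instance holds
    copy: int       # which replica of that slice it is


def domain_layout(partitions: int, copies: int) -> list:
    """Return every search instance in a domain with the given grid size."""
    return [
        SearchInstance(p, c)
        for c in range(1, copies + 1)
        for p in range(1, partitions + 1)
    ]


def instances_for_query(layout: list, copy: int) -> list:
    """A full search touches one instance per partition; here all come from one copy."""
    return [inst for inst in layout if inst.copy == copy]


# Hypothetical domain: 2 index partitions (data), 3 copies of each (traffic).
layout = domain_layout(partitions=2, copies=3)
print(len(layout))
print(instances_for_query(layout, copy=2))
```

In this model, adding documents grows the grid along the partition axis and adding search traffic grows it along the copy axis.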
• The Challenge
• The Data: The Million Song Data Set
http://labrosa.ee.columbia.edu/millionsong/
• The Application
Field name        Description
artist_mbid       The musicbrainz.org ID
artist_name       Name of the artist
audio_md5         Hash code of the audio
danceability      According to The Echo Nest
duration          In seconds
loudness          General loudness of the track
song_hottnesss    According to The Echo Nest
title             Song title
year              Song year
• Create an Amazon CloudSearch domain
• Identify use case and supporting data
• Upload data
• Configure the domain
• Improve document ranking
• Integrate with the front end
• Keep documents up-to-date
Artist Name
Song Title
Familiarity
Year
Genre
Artist
Year
Title
Artist Name
Genre
Artist Familiarity
Year
• Create an Amazon CloudSearch domain
• Identify use case and supporting data
• Prepare and upload data
• Configure the domain
• Improve document ranking
• Integrate with the front end
• Keep documents up-to-date
Million Song Data Set -> SDF Batches -> Amazon CloudSearch
SDF Batches
[
  {"type": "add",
   "id": "soaczam12ab0181559",
   "version": 5,
   "lang": "en",
   "fields": {
     "title": "Ruby Tuesday",
     "artist_name": "The Rolling Stones",
     "year": "1967",
     "artist_familiarity": 864830,
     "genre": ["alternative", "ambient", "dance",
               "electronic", "pop", "r&b", "reggae"]
   }
  },
  …
]
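A minimal sketch of how song records might be turned into an SDF batch shaped like the one on the slide. The input keys (`song_id` and so on) and the helper names `to_sdf_add` and `to_sdf_batch` are hypothetical. Lowercasing the ID and sending `year` as a string are assumptions taken from the slide's example values, not from a specification.

```python
import json


def to_sdf_add(song: dict, version: int) -> dict:
    """Convert one song record into an SDF "add" operation."""
    return {
        "type": "add",
        # The slide's example ID is lowercase, so IDs are lowercased here (assumption).
        "id": song["song_id"].lower(),
        "version": version,
        "lang": "en",
        "fields": {
            "title": song["title"],
            "artist_name": song["artist_name"],
            "year": str(song["year"]),  # the slide sends year as a string
            "artist_familiarity": song["artist_familiarity"],
            "genre": list(song["genre"]),  # multi-valued field
        },
    }


def to_sdf_batch(songs: list, version: int) -> str:
    """Serialize a list of song records as one JSON SDF batch."""
    return json.dumps([to_sdf_add(s, version) for s in songs])


# Hypothetical input record using the values shown on the slide.
song = {
    "song_id": "SOACZAM12AB0181559",
    "title": "Ruby Tuesday",
    "artist_name": "The Rolling Stones",
    "year": 1967,
    "artist_familiarity": 864830,
    "genre": ["alternative", "ambient", "dance"],
}
batch = to_sdf_batch([song], version=5)
print(batch)
```

Keeping the conversion in one small function makes it straightforward to split a large data set into several batches before upload.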
Text fields for matching user terms
Result enabled to retrieve source data
Literal fields for faceting
Facet enabled to retrieve facet counts
Search enabled for narrowing
Integer fields for ranking, narrowing
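PHP Integration
// Query the domain's search endpoint and decode the JSON response.
$results = file_get_contents(
    "http://search-mn-songs-5bbplyghbb5tk257rsb7iamlsy." .
    "us-east-1.cloudsearch.amazonaws.com" .
    "/2011-02-01/search?q=" . urlencode($keyword) .  // escape spaces and special characters
    "&return-fields=title,artist_name,year&" .
    "facet=artist_name,year_facet,genre&" .
    "rank=-" . $rank);  // leading "-" sorts in descending order
$resultsObj = json_decode($results);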
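The same request can be sketched in Python, building the query string with the standard library so the keyword is always escaped. The endpoint and parameter names are copied from the PHP slide; `build_search_url` is a hypothetical helper, and the sketch only builds the URL without sending the request.

```python
from urllib.parse import urlencode

# Endpoint copied from the slide; treat it as a placeholder for your own domain.
SEARCH_ENDPOINT = (
    "http://search-mn-songs-5bbplyghbb5tk257rsb7iamlsy."
    "us-east-1.cloudsearch.amazonaws.com/2011-02-01/search"
)


def build_search_url(keyword: str, rank: str = "text_relevance") -> str:
    """Build a search URL with the same parameters as the PHP example."""
    params = {
        "q": keyword,
        "return-fields": "title,artist_name,year",
        "facet": "artist_name,year_facet,genre",
        "rank": "-" + rank,  # leading "-" requests descending order
    }
    # safe="," keeps the comma-separated field lists readable in the URL.
    return SEARCH_ENDPOINT + "?" + urlencode(params, safe=",")


url = build_search_url("rolling stone")
print(url)
```

The resulting URL could then be fetched with `urllib.request.urlopen` and decoded with `json.loads`, mirroring `file_get_contents` and `json_decode` in the PHP version.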
Simple Search Result
{ "rank": "-text_relevance",
  "match-expr": "(label 'rolling stone')",
  "hits": { "found": 204, "start": 0,
    "hit": [ { "id": "sontsst12cf5f88b42" },
             { "id": "sopvopr12ab017f082" },
             { "id": "sorzrpw12ac468a13b" }, ...
    ] },
  ...
}
Search Results With Return Values
"hit":
  [ { "id": "sontsst12cf5f88b42",
      "data": {
        "artist_familiarity": [ "925048" ],
        "artist_name": [ "The Rolling Stones" ],
        "text_relevance": [ "326" ],
        "title": [ "Heart Of Stone" ],
        "year": [ "1964" ]
      }
    },
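Because each returned field arrives as a list of strings, front-end code usually needs to unwrap the values before display. The sketch below shows one way to do that. The sample response combines the envelope and the hit shown on the slides, and `flatten_hits` is a hypothetical helper name.

```python
import json

# Sample response assembled from the slide examples (trimmed to one hit).
SAMPLE = """
{"rank": "-text_relevance",
 "match-expr": "(label 'rolling stone')",
 "hits": {"found": 204, "start": 0,
          "hit": [{"id": "sontsst12cf5f88b42",
                   "data": {"artist_name": ["The Rolling Stones"],
                            "title": ["Heart Of Stone"],
                            "year": ["1964"]}}]}}
"""


def flatten_hits(response: dict) -> list:
    """Turn each hit into a flat dict, unwrapping single-value lists."""
    rows = []
    for hit in response["hits"]["hit"]:
        row = {"id": hit["id"]}
        # "data" is absent when no return fields were requested.
        for name, values in hit.get("data", {}).items():
            row[name] = values[0] if len(values) == 1 else values
        rows.append(row)
    return rows


rows = flatten_hits(json.loads(SAMPLE))
for row in rows:
    print(row["title"], "by", row["artist_name"], row["year"])
```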
Facets In Search Results
{ … "hits": { … },
  "facets": {
    "genre": {
      "constraints": [
        { "value": "pop", "count": 126 },
        { "value": "rock", "count": 125 },
        { "value": "alternative", "count": 109 },
        { "value": "electronic", "count": 106 },
        { "value": "jazz", "count": 58 }, ...
      ] } } }
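A small sketch of how the facet section of a response might be read to build a "narrow by genre" list in the UI. The sample dictionary copies the first few constraint values from the slide, and `facet_counts` is a hypothetical helper name.

```python
def facet_counts(response: dict, field: str) -> list:
    """Return (value, count) pairs for one facet field, or [] if it is absent."""
    constraints = (
        response.get("facets", {}).get(field, {}).get("constraints", [])
    )
    return [(c["value"], c["count"]) for c in constraints]


# Sample data copied from the slide (first three constraints only).
response = {
    "facets": {
        "genre": {
            "constraints": [
                {"value": "pop", "count": 126},
                {"value": "rock", "count": 125},
                {"value": "alternative", "count": 109},
            ]
        }
    }
}

for value, count in facet_counts(response, "genre"):
    print(f"{value} ({count})")
```

Returning an empty list for a missing facet lets the page render without a special case when a query has no facet data.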
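For the "Keep documents up-to-date" step, here is a minimal sketch of update and delete operations in the same SDF shape as the batch example in this deck. It assumes that an operation only takes effect when its version is higher than the one already stored for that ID, which is the understood behavior of the versioned 2011-02-01 document format. The helper names `sdf_update` and `sdf_delete` and the version numbers in the usage lines are hypothetical.

```python
import json


def sdf_update(doc_id: str, current_version: int, fields: dict) -> dict:
    """Re-send the document as an "add" with a higher version to replace it."""
    return {
        "type": "add",
        "id": doc_id,
        "version": current_version + 1,
        "lang": "en",
        "fields": fields,
    }


def sdf_delete(doc_id: str, current_version: int) -> dict:
    """Build a "delete" operation; it carries no fields, only a newer version."""
    return {"type": "delete", "id": doc_id, "version": current_version + 1}


# Illustrative batch: one replacement and one removal.
batch = json.dumps([
    sdf_update(
        "soaczam12ab0181559",
        5,
        {"title": "Ruby Tuesday", "artist_name": "The Rolling Stones", "year": "1967"},
    ),
    sdf_delete("sopvopr12ab017f082", 1),
])
print(batch)
```

Tracking the last version sent per document (or deriving it from a timestamp) is left to the application.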
26ms
https://github.com/pbs/haystack-cloudsearch
Get Started Now, Free Trial
We are sincerely eager to hear your feedback on this presentation and on re:Invent. Please fill out an evaluation form when you have a chance.