Craig Rees - Search, APIs, Solr and the Sensis Journey

Post on 24-Mar-2016

Search, APIs, Capability Management

and the Sensis Journey

Craig Rees

•  Project background

•  Platform selection

•  Search capability

•  Relevance

•  Architecture

•  Quality management

•  Hurdles

•  What’s next

Today’s menu

•  Sensis helps Australians find, buy and sell

•  From print directories to a cross-platform lead generator

•  Sensis publishes over 1.8 Million business listings

•  Two of the top 10 visited online sites in Australia (WhitePages.com.au and YellowPages.com.au)

Sensis

Business objectives

•  Drive presence in the local search market place

•  Open up the largest database of business listings in Australia

•  Reduce the effort required from local search developers

•  Free to use, we are after the reporting

Technology objectives

•  Develop a total search platform

•  Relevancy testing as part of the development lifecycle

•  A framework to identify problem spaces

•  Manageable platform

•  Continuous deployments

Project background

Developer portal

Platform selection

•  Support for the search capability team

•  Structured vs non-structured data

•  Deterministic vs black box

•  Non-proprietary code base

•  Community backing

The Sensis Search capability maturity model (courtesy of Pete Crawford & Craig Lonsdale)

Lvl 1, Unmanaged: no resources, no reporting, out of the box features

Lvl 2, Ad hoc: ad hoc processes, part time team, static dictionaries, individual led innovation

Lvl 3, Monitored: defined team, regular monitoring, static autosuggest, basic linguistics

Lvl 4, Managed: online dashboards, test environments, dynamic search refinements, targets and metrics

Lvl 5, Optimized: A/B testing, machine learning, external collaboration, multiple contexts

Context is key

Intent

•  Name

•  Type

•  Product

•  Spatial

Location

Chronology

Social Graph

Individual

Device

[Architecture diagram. Labelled components: Historical search Data, MongoDB Business Data, Geo Service, Index, Name Query Handler, Type Query Handler, Business Data, Search Service, Reporting Service, Reporting Events, Publisher, Solr, API, Ontologies, Mashery]

Our architecture

Data staging

Search

API

API proxy

•  Moved from a black box solution to a manageable platform

•  Deliver search improvements without major code changes

•  Understand how results were calculated

•  Identify problems scientifically

•  Continuously tune and test relevance

Evolution of search management

Yesterday Today Tomorrow

Problem spaces, quality management & tuning

Path analysis used to identify problem spaces

Problem spaces, quality management & tuning

“Gold Sets” used to define overall quality score (TREC)

Features signed off only when they have a positive impact on the quality score

Specific gold sets for each problem space:

•  Intent

•  Spelling & stemming

•  Location

•  Phrase parsing
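
The gold-set sign-off rule above can be sketched roughly as follows. The slides name TREC but not a specific metric, so mean average precision is an assumption here, as is the unweighted mean across problem spaces. The function names, queries and listing ids are all hypothetical.

```python
from statistics import mean


def average_precision(ranked_ids, relevant_ids):
    """Average precision of one ranked result list against its judged-relevant set."""
    if not relevant_ids:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant_ids:
            hits += 1
            precision_sum += hits / rank  # precision at each relevant hit
    return precision_sum / len(relevant_ids)


def quality_scores(gold_sets, search):
    """Score each problem space, plus an unweighted overall mean (an assumption)."""
    scores = {
        space: mean(average_precision(search(query), relevant)
                    for query, relevant in judged.items())
        for space, judged in gold_sets.items()
    }
    scores["overall"] = mean(scores.values())
    return scores


def signed_off(gold_sets, baseline_search, candidate_search):
    """A feature passes only if it raises the overall quality score."""
    before = quality_scores(gold_sets, baseline_search)["overall"]
    after = quality_scores(gold_sets, candidate_search)["overall"]
    return after > before


# Hypothetical judged queries and listing ids, purely for illustration.
GOLD_SETS = {
    "spelling & stemming": {"plumer": {"p1", "p2"}},
    "location": {"florist richmond": {"f7"}},
}


def baseline(query):
    # Stand-in for the current search build.
    return {"plumer": ["x9", "p1", "p2"], "florist richmond": ["f7"]}[query]


def candidate(query):
    # Stand-in for the build that contains the new feature.
    return {"plumer": ["p1", "p2", "x9"], "florist richmond": ["f7"]}[query]


print(quality_scores(GOLD_SETS, baseline))
print(signed_off(GOLD_SETS, baseline, candidate))
```

Keeping a score per problem space alongside the overall figure makes it easier to see which space a change helped or hurt, even though only the overall score gates the sign-off in this sketch.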

Search quality analysis and testing

Results examiner

Score analysis

Tuning

Lather, rinse, repeat

Hurdles along the way

•  Data redundancy and homogeneity

•  Solr ranking of rare terms

•  Intent differentiation

•  Contextual synonyms
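
One plausible reading of the "Solr ranking of rare terms" hurdle, which the slides do not spell out: under Lucene's classic TF-IDF similarity, a term's weight grows as fewer documents contain it, so a rare token such as a misspelling or an unusual business name can dominate the score. A minimal sketch, assuming that similarity was in use; the terms and document frequencies below are hypothetical.

```python
import math


def classic_idf(doc_freq, num_docs):
    """Inverse document frequency as defined by Lucene's classic TF-IDF similarity."""
    return 1.0 + math.log(num_docs / (doc_freq + 1))


NUM_LISTINGS = 1_800_000  # "over 1.8 million business listings", from the talk

# Hypothetical document frequencies, for illustration only.
doc_freqs = {"plumber": 50_000, "zyx": 3}

for term, df in doc_freqs.items():
    print(f"{term}: df={df}, idf={classic_idf(df, NUM_LISTINGS):.2f}")
```

The rarer term receives the larger weight, which is the behaviour that would need tuning if rare terms were over-ranking.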

Where next?

•  Query engine

•  Facets / autosuggest

•  Real time tuning

•  Machine learning

•  Multi term queries

•  Scoring thresholds

•  Content Value

Questions?

Email: craig.rees@sensis.com.au

www: developers.sensis.com.au

Twitter: @SensisAPI

@ablebagel
