Search Analytics at Enterprise Search Summit Fall 2011

Search Analytics What? Why? How? Otis Gospodnetić Sematext International @otisg @sematext sematext.com sematext.com/search-analytics

Upload: sematext-group-inc

Post on 27-Jan-2015

Category:

Technology


DESCRIPTION

This presentation describes what Search Analytics is, what value it brings, how it can be used, and what additional functionality and value can be built on search data.

TRANSCRIPT

Page 1: Search Analytics at Enterprise Search Summit Fall 2011

Search Analytics

What? Why? How?

Otis Gospodnetić – Sematext International
@otisg ◦ @sematext ◦ sematext.com

sematext.com/search-analytics

Page 2: Search Analytics at Enterprise Search Summit Fall 2011

Copyright 2011 Sematext Int'l. All rights reserved.

About Otis Gospodnetić

• ASF Member: Lucene, Solr, Nutch, Mahout

• Author: Lucene in Action 1 & 2

• Entrepreneur: Sematext, Simpy

Page 3: Search Analytics at Enterprise Search Summit Fall 2011

Sematext Metrics

• 100% organic: no GMO, no VC
• 4 years old
• < 10 people
• 7 countries
• 3 timezones
• 2 continents
• > 100 customers

Page 4: Search Analytics at Enterprise Search Summit Fall 2011

About Sematext

Products & Services

Consulting, Development, Tech Support:

• Search (Lucene, Solr, ElasticSearch...)
• Big Data (Hadoop, HBase, Voldemort...)
• Web Crawling (Nutch, Droids)
• Machine Learning (Mahout)

Page 5: Search Analytics at Enterprise Search Summit Fall 2011

Agenda

• What is Search Analytics and why it matters
• Example reports and their value
• Optional: Search Analytics in the Cloud

Page 6: Search Analytics at Enterprise Search Summit Fall 2011

Communication

• twitter.com/sematext
• twitter.com/otisg
• Hash tags: #stsa or #stanalytics
• http://sematext.com/search-analytics/index.html
• Raise your hand!
• [email protected]

Page 7: Search Analytics at Enterprise Search Summit Fall 2011

Why

search users

search providers

search experience

Page 8: Search Analytics at Enterprise Search Summit Fall 2011

Why Oh Why

search providers

search experience

This search sucks! It takes 17 tries to find anything here!

F!?@#$%^&?!?

search users

Cool, the latest search tweaks made our site really sticky!

Awesome!

Page 9: Search Analytics at Enterprise Search Summit Fall 2011

Fill in the Missing Piece

Search Analytics

Performance Monitoring

Quality Assurance

Tuning UI

Page 10: Search Analytics at Enterprise Search Summit Fall 2011

Blind Leading the Blind

Page 11: Search Analytics at Enterprise Search Summit Fall 2011

Analytics as Compass

Search logs are your Map

Search Analytics is your Compass

Page 12: Search Analytics at Enterprise Search Summit Fall 2011

The Bottom Line Why

• Measure and monitor everything
• Supports (re)design, navigation choices
• Helps with content acquisition & enhancement
• Improve search experience
• Mula

Page 13: Search Analytics at Enterprise Search Summit Fall 2011

The Moment of Truth

Question for the audience #1

What do you use for Search Analytics?

a) Home grown stuff
b) Google Analytics
c) Omniture
d) Webtrends
e) Other
f) Nothing

Page 14: Search Analytics at Enterprise Search Summit Fall 2011

Search Analytics Basics

• Collect: queries & clicks & interactions & ...
• Analyze: actions / xactions / conversions
• Output: reports – over time
• Output++: feedback loop

• The means, not the goal
• Ongoing, not one-off

remember this
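The collect, analyze, output steps can be sketched in a few lines. This is only an illustration, not the Sematext implementation: the `SearchEvent` record, its fields, and `report_over_time` are hypothetical names chosen for the example.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date
from typing import Dict, Iterable, Optional


@dataclass(frozen=True)
class SearchEvent:
    """One collected search interaction (hypothetical schema)."""
    day: date
    query: str
    clicked_rank: Optional[int] = None  # 1-based rank of the clicked hit, None if no click


def report_over_time(events: Iterable[SearchEvent]) -> Dict[date, Dict[str, int]]:
    """Analyze collected events into a per-day report, ordered by day."""
    per_day = defaultdict(lambda: {"queries": 0, "clicks": 0})
    for event in events:
        per_day[event.day]["queries"] += 1
        if event.clicked_rank is not None:
            per_day[event.day]["clicks"] += 1
    return dict(sorted(per_day.items()))


# Collect (made-up sample data), then analyze and output.
events = [
    SearchEvent(date(2011, 11, 1), "solr faceting", clicked_rank=1),
    SearchEvent(date(2011, 11, 1), "hbase scan"),
    SearchEvent(date(2011, 11, 2), "flume sink", clicked_rank=3),
]
report = report_over_time(events)
for day, row in report.items():
    print(day.isoformat(), row)
```

Running the same report on every new batch of events is what makes it ongoing rather than one-off.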

Page 15: Search Analytics at Enterprise Search Summit Fall 2011

Search vs. Web Analytics

• User intent and information needs vs. inferring
• Hand in hand
• Ideally you can relate data from both or even unify it

Page 16: Search Analytics at Enterprise Search Summit Fall 2011

Report Types

Failures vs. non-failures

Actionable vs. non-actionable

Trends vs. summaries

Page 17: Search Analytics at Enterprise Search Summit Fall 2011

Failures vs. Non-Failures

Failures:
• Zero hits
• Low CTR
• Low MRR
• High bounce rate
• Low conversion rate
• Deep paging
• Deep clicking
• High latency

Non-failures:
• Query rate
• Query volume
• Top seen & clicked docs
• Top queries
• Terms per query
• Search sessions
• Search users
• Distinct queries
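Three of the failure signals (zero hits, CTR, MRR) are simple to compute from a query log. The sketch below assumes a hypothetical `QueryLogEntry` record and one common convention for MRR, where a query without a click contributes 0. The sample log is made up.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class QueryLogEntry:
    """One logged query (hypothetical schema)."""
    query: str
    hits: int                           # number of results returned
    clicked_rank: Optional[int] = None  # 1-based rank of the first click, if any


def zero_hit_rate(log: Sequence[QueryLogEntry]) -> float:
    """Fraction of queries that returned no results."""
    return sum(1 for e in log if e.hits == 0) / len(log) if log else 0.0


def click_through_rate(log: Sequence[QueryLogEntry]) -> float:
    """Fraction of queries followed by at least one click."""
    return sum(1 for e in log if e.clicked_rank is not None) / len(log) if log else 0.0


def mean_reciprocal_rank(log: Sequence[QueryLogEntry]) -> float:
    """Mean of 1/rank of the clicked result; queries without a click count as 0."""
    if not log:
        return 0.0
    return sum(1.0 / e.clicked_rank for e in log if e.clicked_rank is not None) / len(log)


sample_log = [
    QueryLogEntry("lucene", hits=120, clicked_rank=1),
    QueryLogEntry("solr", hits=45, clicked_rank=4),
    QueryLogEntry("lucen", hits=0),
    QueryLogEntry("hbase", hits=10),
]
print(zero_hit_rate(sample_log))         # 0.25
print(click_through_rate(sample_log))    # 0.5
print(mean_reciprocal_rank(sample_log))  # 0.3125
```

What counts as "low" for each metric depends on the site, so these numbers are most useful when tracked as trends.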

Page 18: Search Analytics at Enterprise Search Summit Fall 2011

Value of Failure Fixes

• Zero hits
• Low CTR
• Low MRR
• High bounce rate
• Low conversion rate
• Deep paging
• Deep clicking
• High latency

Re-search

Findability

Relevance Tuning

Performance Tuning

Page 19: Search Analytics at Enterprise Search Summit Fall 2011

Measure, then Fix

If you can't measure it, you can't fix it!

Page 20: Search Analytics at Enterprise Search Summit Fall 2011

Relevance A/B Testing

Page 21: Search Analytics at Enterprise Search Summit Fall 2011

Tracking Zero Hits

Page 22: Search Analytics at Enterprise Search Summit Fall 2011

Watching Latency

Page 23: Search Analytics at Enterprise Search Summit Fall 2011

Search Analytics & Measuring

If you can't measure it, you can't fix it!

You can't measure it if you don't have Analytics

Page 24: Search Analytics at Enterprise Search Summit Fall 2011

Actionable vs. Non-Actionable

• Zero hits
• Low CTR
• Low MRR
• High bounce rate
• Low conversion rate
• Deep paging
• Deep clicking
• High latency

• Query rate
• Query volume
• Top seen & clicked docs
• Top queries
• Terms per query
• Search sessions
• Search users
• Distinct queries

Page 25: Search Analytics at Enterprise Search Summit Fall 2011

More Fixin'

• Query rate
• Query volume
• Search sessions
• Search users
• Top seen & clicked docs
• Top queries
• Terms per query
• Distinct queries

Navigation & Design

Results Shuffling Diversification

Recommendations

AutoComplete

Search box size

Page 26: Search Analytics at Enterprise Search Summit Fall 2011

Output++: Data is Power

• AutoComplete - $MM improvement
• Better DYM Spellchecker
• Related Searches
• Recommendations
• Relevance Feedback
• ...
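As one illustration of feeding search data back into the product, here is a minimal sketch of query-log-driven AutoComplete: suggestions are past queries matching the typed prefix, most frequent first. The function names and the sample log are hypothetical, and a production suggester would use a prefix index rather than a linear scan.

```python
from collections import Counter
from typing import Callable, Iterable, List


def build_suggester(queries: Iterable[str]) -> Callable[..., List[str]]:
    """Build a prefix suggester from logged queries (normalized to lowercase)."""
    counts = Counter(q.strip().lower() for q in queries if q.strip())

    def suggest(prefix: str, limit: int = 5) -> List[str]:
        prefix = prefix.strip().lower()
        matches = [(q, n) for q, n in counts.items() if q.startswith(prefix)]
        # Most frequent first; ties broken alphabetically for stable output.
        matches.sort(key=lambda item: (-item[1], item[0]))
        return [q for q, _ in matches[:limit]]

    return suggest


query_log = ["solr", "Solr faceting", "solr", "solr faceting", "solr replication", "hbase"]
suggest = build_suggester(query_log)
print(suggest("so"))           # ['solr', 'solr faceting', 'solr replication']
print(suggest("so", limit=2))  # ['solr', 'solr faceting']
```

The same frequency table is a plausible starting point for related searches and spelling suggestions, which is why the raw query data is worth keeping.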

Page 27: Search Analytics at Enterprise Search Summit Fall 2011

Closing the Loop

search users

search providers

search experience

Page 28: Search Analytics at Enterprise Search Summit Fall 2011

Resources

http://rosenfeldmedia.com/books/searchanalytics/

Search Analytics for Your Site (Louis Rosenfeld)

Search Analytics What? Why? How?

Search Analytics with Flume and HBase

Search Analytics Business Value & NoSQL Backend

http://blog.sematext.com/tag/analytics/

Page 29: Search Analytics at Enterprise Search Summit Fall 2011

Key Take-aways

Without Analytics you are blind

If you can't measure it, you can't fix it

Use Search Analytics to understand, measure and improve search

Using Search Analytics means having a competitive advantage

Page 30: Search Analytics at Enterprise Search Summit Fall 2011

Time permitting:

Behind the scenes of Sematext Search Analytics

Behind the Scenes

Page 31: Search Analytics at Enterprise Search Summit Fall 2011

sematext.com blog.sematext.com @sematext @otisg [email protected]

Want SA? Grab me or go to: sematext.com/search-analytics

Hash tags: #stsa or #stanalytics

Contact

Page 32: Search Analytics at Enterprise Search Summit Fall 2011

What We've Built

• Search Analytics SaaS
• Numerous reports (e.g. query volume, rate, latency, term frequencies / comparisons, hit buckets, search origins, etc.)
• Trending over time
• Comparisons of time periods
• Top N reports
• Filter, slice and dice

Page 33: Search Analytics at Enterprise Search Summit Fall 2011

Sematext Search Analytics

Page 34: Search Analytics at Enterprise Search Summit Fall 2011

Big Dreams

• SaaS
• Multitenant
• Large Scale – Massive Data
• Cloud

Page 35: Search Analytics at Enterprise Search Summit Fall 2011

Storage Choices

• RDBMS: MySQL, PostgreSQL
• HDFS
• Hive
• HBase
• Cassandra

Page 36: Search Analytics at Enterprise Search Summit Fall 2011

SaaS vs. In-House

Question for the audience #2

SaaS vs in-house Search Analytics?

a) SaaS
b) in-house

Page 37: Search Analytics at Enterprise Search Summit Fall 2011

Sematext Search Analytics

Page 38: Search Analytics at Enterprise Search Summit Fall 2011

Sematext Search Analytics

Page 39: Search Analytics at Enterprise Search Summit Fall 2011

Sematext Search Analytics

Page 40: Search Analytics at Enterprise Search Summit Fall 2011

Sematext Search Analytics

Page 41: Search Analytics at Enterprise Search Summit Fall 2011

Data Flow

See: Search Analytics with Flume and HBase

http://blog.sematext.com/2010/10/16/search-analytics-hadoop-world-flume-hbase/

Page 42: Search Analytics at Enterprise Search Summit Fall 2011

Data Collection

See: Search Analytics with Flume and HBase

http://blog.sematext.com/2010/10/16/search-analytics-hadoop-world-flume-hbase/

Page 43: Search Analytics at Enterprise Search Summit Fall 2011

Core Tech

• JavaScript Beacons
• Metric Capture Web App aka Receiver
• Flume Agents, Collectors, Sinks
• HBase
• MapReduce Aggregations
• Search Analytics Reporting Web App

Page 44: Search Analytics at Enterprise Search Summit Fall 2011

What is Flume

• Distributed data/log collection service
• Scalable, configurable, extensible
• Centrally manageable, open source
• Agents get data from app, Collectors save it
• Abstractions: Source → Decorator(s) → Sink
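The Source → Decorator(s) → Sink abstraction can be pictured as a chain of generators. The sketch below is a conceptual illustration in plain Python, not Flume's API or configuration syntax: `source`, `add_host`, `drop_empty`, `ListSink`, and `run` are all hypothetical names.

```python
from typing import Callable, Dict, Iterable, Iterator, List

Event = Dict[str, str]
Decorator = Callable[[Iterator[Event]], Iterator[Event]]


def source(lines: Iterable[str]) -> Iterator[Event]:
    """Source: turn raw log lines into events."""
    for line in lines:
        yield {"body": line.rstrip("\n")}


def drop_empty(events: Iterator[Event]) -> Iterator[Event]:
    """Decorator: filter out events with an empty body."""
    for event in events:
        if event["body"]:
            yield event


def add_host(host: str) -> Decorator:
    """Decorator factory: annotate each event with the originating host."""
    def decorate(events: Iterator[Event]) -> Iterator[Event]:
        for event in events:
            yield {**event, "host": host}
    return decorate


class ListSink:
    """Sink: collect events in memory (a stand-in for a real store)."""
    def __init__(self) -> None:
        self.received: List[Event] = []

    def append(self, event: Event) -> None:
        self.received.append(event)


def run(src: Iterator[Event], decorators: List[Decorator], sink: ListSink) -> None:
    """Wire source -> decorators (in order) -> sink and drain the stream."""
    stream = src
    for decorate in decorators:
        stream = decorate(stream)
    for event in stream:
        sink.append(event)


sink = ListSink()
run(source(["q=solr\n", "\n", "q=hbase\n"]), [drop_empty, add_host("web-1")], sink)
print(sink.received)
```

Because each stage only consumes and yields events, stages can be reordered or swapped without touching the others, which is the point of the abstraction.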

Page 45: Search Analytics at Enterprise Search Summit Fall 2011

What is HBase

• Scalable, reliable, distributed, column-oriented DB
• On top of HDFS
• MapReducable

Page 46: Search Analytics at Enterprise Search Summit Fall 2011

Data Flow, Detailed

Page 47: Search Analytics at Enterprise Search Summit Fall 2011

Why Flume

• Reliable delivery, e.g. queue msgs locally if destination unreachable
• Easy, centralized management via Web UI or console
• Good community, good progress, now @ASF
• But: more complex, more moving parts
• On Flume: slideshare.net/cloudera/inside-flume
• Alternatives: Kafka, Scribe...

Page 48: Search Analytics at Enterprise Search Summit Fall 2011

Why HBase

• Scalable raw & aggregate data storage
• MapReduce data input
• Fast scans for time ranges, fast key lookups
• Easy storage and compute power expansion
• Good looking roadmap, community, progress
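Fast time-range scans follow from HBase keeping rows sorted by row key. The sketch below imitates that with an in-memory sorted list so it runs without a cluster. The key layout (tenant, separator, zero-padded epoch seconds) and the `SortedStore` class are assumptions for illustration, not the schema Sematext used.

```python
import bisect
from typing import Dict, List, Tuple


def row_key(tenant: str, epoch_seconds: int) -> bytes:
    """Hypothetical key layout: zero-padding makes byte order match time order."""
    return f"{tenant}|{epoch_seconds:010d}".encode("ascii")


class SortedStore:
    """Toy stand-in for a table whose rows are kept sorted by key."""

    def __init__(self) -> None:
        self._keys: List[bytes] = []
        self._values: Dict[bytes, str] = {}

    def put(self, key: bytes, value: str) -> None:
        if key not in self._values:
            bisect.insort(self._keys, key)
        self._values[key] = value

    def scan(self, start: bytes, stop: bytes) -> List[Tuple[bytes, str]]:
        """Return rows with start <= key < stop (start inclusive, stop exclusive)."""
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, stop)
        return [(k, self._values[k]) for k in self._keys[lo:hi]]


store = SortedStore()
store.put(row_key("acme", 1320105600), "solr")
store.put(row_key("acme", 1320109200), "hbase")
store.put(row_key("acme", 1320192000), "flume")
store.put(row_key("other", 1320105600), "nutch")

# One tenant, one time window: a single contiguous slice of the sorted keys.
hits = store.scan(row_key("acme", 1320105600), row_key("acme", 1320192000))
print([value for _, value in hits])  # ['solr', 'hbase']
```

The scan touches only the rows inside the window, regardless of how much other data the table holds. A real design would also need to handle tenant names that contain the separator.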

Page 49: Search Analytics at Enterprise Search Summit Fall 2011

Open Sourcing

2 open-source projects:
• github.com/sematext/HBaseWD
• github.com/sematext/HBaseHUT

See sematext.com/open-source/index.html

Patches for Flume and HBase: blog.sematext.com/tag/flume/

Page 50: Search Analytics at Enterprise Search Summit Fall 2011

Challenges

• Data size. Solutions:
  • Compression (4-5x smaller with lzo)
  • Data pruning (variable levels)
• Query string distribution: very long-tail
• Lots of data to process, update, aggregate
• Young tools: Flume, HBase
• Poor IO on EC2
• Hadoop distributions
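To see how long-tailed a query distribution is, two quick measurements help: how much traffic the top N queries cover, and what share of distinct queries occur only once. The sketch below uses a small made-up log; `head_coverage` and `singleton_share` are hypothetical helper names.

```python
from collections import Counter
from typing import Sequence


def head_coverage(queries: Sequence[str], top_n: int) -> float:
    """Fraction of all query traffic covered by the top_n most frequent queries."""
    counts = Counter(queries)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return sum(n for _, n in counts.most_common(top_n)) / total


def singleton_share(queries: Sequence[str]) -> float:
    """Fraction of distinct queries that were issued exactly once."""
    counts = Counter(queries)
    if not counts:
        return 0.0
    return sum(1 for n in counts.values() if n == 1) / len(counts)


query_log = (
    ["solr"] * 5
    + ["hbase"] * 3
    + ["flume sink", "lzo codec", "nutch crawl", "mahout svd"]
)
print(round(head_coverage(query_log, 2), 3))  # 0.667
print(round(singleton_share(query_log), 3))   # 0.667
```

A high singleton share means most distinct queries carry little traffic each, which is where variable-level data pruning can reduce storage without losing much signal.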

Page 51: Search Analytics at Enterprise Search Summit Fall 2011

sematext.com blog.sematext.com @sematext @otisg [email protected]

Want SA? Grab me or go to: sematext.com/search-analytics

Hash tags: #stsa or #stanalytics

Contact