Originally presented at SXSW on March 13, 2011, on a panel with Fred Beecher and Austin Govella. Modified and updated for the Web 2.0 Expo talk, October 12, 2011; UX Web Summit, September 26, 2012; and Webdagene, September 10, 2013.

Site Search Analytics in a Nutshell
Louis Rosenfeld
[email protected] • @louisrosenfeld
Webdagene • 10 September 2013

Hello, my name is Lou
www.louisrosenfeld.com | www.rosenfeldmedia.com

Let’s look at the data

No, let’s look at the real data
Critical elements in bold: IP address, time/date stamp, query, and # of results:
XXX.XXX.X.104 - - [10/Jul/2006:10:25:46 -0800] "GET /search?access=p&entqr=0&output=xml_no_dtd&sort=date%3AD%3AL%3Ad1&ud=1&site=AllSites&ie=UTF-8&client=www&oe=UTF-8&proxystylesheet=www&q=lincense+plate&ip=XXX.XXX.X.104 HTTP/1.1" 200 971 0 0.02
XXX.XXX.X.104 - - [10/Jul/2006:10:25:48 -0800] "GET /search?access=p&entqr=0&output=xml_no_dtd&sort=date%3AD%3AL%3Ad1&ie=UTF-8&client=www&q=license+plate&ud=1&site=AllSites&spell=1&oe=UTF-8&proxystylesheet=www&ip=XXX.XXX.X.104 HTTP/1.1" 200 8283 146 0.16
What are users searching?
How often are users failing?
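A minimal sketch of pulling the four critical elements out of log lines like the ones above. It assumes the log layout shown on the slide, and it assumes that the three numbers after the HTTP status are bytes sent, number of results, and response time, in that order. The `parse_line` and `zero_result_rate` helpers are illustrative names, not part of any search engine's API.

```python
import re
from urllib.parse import unquote_plus

# Assumed layout: IP, two dashes, [timestamp], "GET <url> HTTP/x", status,
# bytes, number of results, response time in seconds.
LINE_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<timestamp>[^\]]+)\] '
    r'"GET (?P<url>\S+) [^"]*" '
    r'(?P<status>\d{3}) (?P<bytes>\d+) (?P<results>\d+) (?P<seconds>[\d.]+)$'
)
# Match the q= parameter only (not e.g. entqr=).
QUERY_RE = re.compile(r'[?&]q=([^&\s]*)')


def parse_line(line):
    """Return IP, timestamp, query, and result count, or None if no match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    q = QUERY_RE.search(m.group("url"))
    if q is None:
        return None
    return {
        "ip": m.group("ip"),
        "timestamp": m.group("timestamp"),
        "query": unquote_plus(q.group(1)),
        "results": int(m.group("results")),
    }


def zero_result_rate(records):
    """Share of searches that returned no results at all."""
    records = list(records)
    if not records:
        return 0.0
    return sum(1 for r in records if r["results"] == 0) / len(records)


# Shortened versions of the two log lines on the slide.
sample = [
    'XXX.XXX.X.104 - - [10/Jul/2006:10:25:46 -0800] '
    '"GET /search?access=p&site=AllSites&q=lincense+plate&ip=XXX.XXX.X.104 HTTP/1.1" '
    '200 971 0 0.02',
    'XXX.XXX.X.104 - - [10/Jul/2006:10:25:48 -0800] '
    '"GET /search?access=p&q=license+plate&spell=1&ip=XXX.XXX.X.104 HTTP/1.1" '
    '200 8283 146 0.16',
]
records = [parse_line(line) for line in sample]
for r in records:
    print(r["timestamp"], r["query"], r["results"])
print(zero_result_rate(records))
```

Under that reading of the fields, the misspelled query fails with no results and the corrected one, issued two seconds later from the same IP address, succeeds.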

SSA is semantically rich data, and...
Queries sorted by frequency

...what users want--in their own words

A little goes a long way
A handful of queries/tasks/ways to navigate/features/documents meet the needs of your most important audiences
Not all queries are distributed equally
Nor do they diminish gradually
80/20 rule isn’t quite accurate

(and the tail is quite long)
The Long Tail is much longer than you’d suspect

The Zipf Distribution, textually
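For readers who want the idea in numbers: a Zipf distribution says the query at rank r occurs with frequency roughly proportional to 1/r^s. The sketch below assumes an idealized exponent s = 1 and a hypothetical site with 10,000 unique queries (neither figure comes from the talk) and computes what share of all searches the top k queries would cover.

```python
def zipf_share(top_k, unique_queries, s=1.0):
    """Share of total search volume covered by the top_k ranks under Zipf."""
    weights = [1.0 / rank ** s for rank in range(1, unique_queries + 1)]
    return sum(weights[:top_k]) / sum(weights)


# Hypothetical site with 10,000 unique queries and exponent s = 1.
for k in (10, 100, 1000):
    print(k, round(zipf_share(k, 10_000), 3))
```

Real query logs will not follow the ideal curve exactly, so treat this as a way to build intuition for the short head and long tail rather than as a prediction.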

Some things you can do with SSA
1. Make it harder to get lost in deep content
2. Make search smarter
3. Reduce jargon
4. Learn how your audiences differ
5. Know when to publish what
6. Own and enjoy your failures
7. Avoid disaster
8. Predict the future

#1 Make it harder to get lost

Start with basic SSA data: queries and query frequency
Percent: volume of search activity for a unique query during a particular time period
Cumulative Percent: running sum of percentages
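A short sketch of how such a report can be built from a flat list of queries. The `frequency_report` name and the sample queries are made up for illustration; normalizing by trimming and lower-casing is an assumption about how queries should be grouped.

```python
from collections import Counter


def frequency_report(queries):
    """Return rows of (rank, query, count, percent, cumulative percent)."""
    counts = Counter(q.strip().lower() for q in queries if q.strip())
    total = sum(counts.values())
    rows, running = [], 0.0
    for rank, (query, count) in enumerate(counts.most_common(), start=1):
        percent = 100.0 * count / total
        running += percent
        rows.append((rank, query, count, round(percent, 2), round(running, 2)))
    return rows


# Made-up queries for illustration only.
log = ["campus map"] * 5 + ["Library hours"] * 3 + ["parking"] * 2
for row in frequency_report(log):
    print(row)
```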

Tease out common content types
Took an hour to...
• Analyze top 50 queries (20% of all search activity)
• Ask and iterate: “what kind of content would users be looking for when they searched these terms?”
• Add cumulative percentages
Result: prioritized list of potential content types
#1) application: 11.77%
#2) reference: 10.5%
#3) instructions: 8.6%
#4) main/navigation pages: 5.91%
#5) contact info: 5.79%
#6) news/announcements: 4.27%
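The aggregation step can be sketched as follows. The queries, percentages, and labels here are invented for illustration; the labelling itself is the manual "ask and iterate" work described above, and `rank_content_types` is a hypothetical helper name.

```python
from collections import defaultdict


def rank_content_types(labelled_queries):
    """Sum query percentages per content type, largest first.

    labelled_queries: iterable of (query, percent_of_searches, content_type).
    """
    totals = defaultdict(float)
    for _query, percent, content_type in labelled_queries:
        totals[content_type] += percent
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# Hypothetical top queries, hand-labelled with a likely content type.
top_queries = [
    ("apply online", 1.5, "application"),
    ("admission form", 1.0, "application"),
    ("course catalog", 1.25, "reference"),
    ("how to register", 0.75, "instructions"),
]
for content_type, percent in rank_content_types(top_queries):
    print(f"{content_type}: {percent:.2f}%")
```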

Clear content types lead to better contextual navigation
artist descriptions
album reviews
album pages
artist bios
discography
TV listings

#2 Make search smarter

Clear content types improve search performance
Content objects related to products
Raw search results

Contextualizing “advanced” features

Session data suggest progression and context
search session patterns
1. solar energy
2. how solar energy works
search session patterns
1. solar energy
2. energy
search session patterns
1. solar energy
2. solar energy charts
search session patterns
1. solar energy
2. explain solar energy
search session patterns
1. solar energy
2. solar energy news
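A rough sketch of how such patterns can be pulled from the log: group searches by visitor, split them into sessions, and count which query follows which. It assumes the IP address stands in for the visitor and that a gap of more than 30 minutes starts a new session; both are simplifying assumptions, and the sample records are invented.

```python
from collections import Counter, defaultdict
from datetime import datetime, timedelta

SESSION_GAP = timedelta(minutes=30)  # assumed cutoff, not from the talk


def follow_up_queries(records):
    """Count (query, next_query) pairs within each visitor's sessions.

    records: iterable of (ip, datetime, query) tuples.
    """
    by_visitor = defaultdict(list)
    for ip, when, query in records:
        by_visitor[ip].append((when, query.strip().lower()))

    pairs = Counter()
    for searches in by_visitor.values():
        searches.sort()  # chronological order per visitor
        for (t1, q1), (t2, q2) in zip(searches, searches[1:]):
            if t2 - t1 <= SESSION_GAP and q1 != q2:
                pairs[(q1, q2)] += 1
    return pairs


# Invented records for illustration.
t0 = datetime(2013, 9, 10, 9, 0)
records = [
    ("10.0.0.1", t0, "solar energy"),
    ("10.0.0.1", t0 + timedelta(minutes=1), "how solar energy works"),
    ("10.0.0.2", t0, "solar energy"),
    ("10.0.0.2", t0 + timedelta(minutes=2), "solar energy charts"),
    ("10.0.0.2", t0 + timedelta(hours=3), "energy"),
]
for (first, second), n in follow_up_queries(records).most_common():
    print(f"{first} -> {second}: {n}")
```

Whether the second query narrows, broadens, or rephrases the first is still a judgment call for a human reader; the code only surfaces the pairs worth looking at.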

Recognizing proper nouns, dates, and unique ID#s
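One way to sketch this kind of recognition is a handful of regular expressions run before the query reaches the search engine. The patterns below are illustrative guesses (a numeric date, a SKU-like ID, a course-code-like ID); real patterns would come from the site's own SSA data, and proper-noun recognition is assumed here to rely on a list of known names rather than a regex.

```python
import re

# Illustrative patterns only; tune them against real query logs.
PATTERNS = [
    ("date", re.compile(r"^\d{1,2}[/-]\d{1,2}[/-]\d{2,4}$")),
    ("sku", re.compile(r"^#?\s*\d{5}-[0-9A-Z]{4}$", re.IGNORECASE)),
    ("course_code", re.compile(r"^[A-Z]{1,4}\s?\d{3,4}$", re.IGNORECASE)),
]


def classify_query(query, known_names=()):
    """Label a query as a date, an ID, a known proper noun, or plain text."""
    text = query.strip()
    for label, pattern in PATTERNS:
        if pattern.match(text):
            return label
    if text.lower() in {name.lower() for name in known_names}:
        return "proper_noun"
    return "text"


for q in ["10/07/2006", "39072-2AH1", "B201", "Louis Rosenfeld", "license plate"]:
    print(q, "->", classify_query(q, known_names=["Louis Rosenfeld"]))
```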

#3 Reduce jargon

Saving the brand by killing jargon at a community college
Jargon related to online education: FlexEd, COD, College on Demand
Marketing’s solution: expensive campaign to educate public (via posters, brochures)
The Numbers (from SSA):
query rank    query
#22           online*
#101          COD
#259          College on Demand
#389          FlexTrack
* “online” part of 213 queries
Result: content relabeled, money saved
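Figures like these come from two simple lookups: where a term ranks among all queries, and how many distinct queries contain it. A sketch, with made-up counts and a hypothetical `term_report` helper:

```python
import re
from collections import Counter


def term_report(query_counts, term):
    """Return (rank of the exact query, number of unique queries containing it).

    query_counts: Counter mapping a lower-cased query string to its frequency.
    Rank is None if the term was never searched on its own.
    """
    term = term.lower()
    ranked = [q for q, _ in query_counts.most_common()]
    rank = ranked.index(term) + 1 if term in ranked else None
    # Whole-word match, so "cod" does not count queries such as "code".
    pattern = re.compile(r"\b" + re.escape(term) + r"\b")
    containing = sum(1 for q in ranked if pattern.search(q))
    return rank, containing


# Invented counts for illustration; keys are already lower-cased.
counts = Counter({
    "financial aid": 90,
    "online": 40,
    "online classes": 25,
    "online registration": 10,
    "cod": 3,
})
print(term_report(counts, "online"))
print(term_report(counts, "COD"))
```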

#4 Learn how your audiences differ

Who cares about what?

Why analyze queries by audience?
Fortify your personas with data
Learn about differences between audiences
• Open University “Enquirers”: 16 of 25 queries are for subjects not taught at OU
• Open University Students: search for course codes, topics dealing with completing program
Determine what’s commonly important to all audiences (these queries had better work well)
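A sketch of the comparison, assuming the queries have already been segmented by audience upstream (for example by login status); that segmentation, the helper names, and the sample queries are all assumptions made for illustration.

```python
from collections import Counter


def top_queries(queries, n):
    """Return the set of the n most frequent queries."""
    counts = Counter(q.strip().lower() for q in queries)
    return {q for q, _ in counts.most_common(n)}


def compare_audiences(queries_by_audience, n=25):
    """Split each audience's top-n queries into shared and audience-specific."""
    tops = {aud: top_queries(qs, n) for aud, qs in queries_by_audience.items()}
    shared = set.intersection(*tops.values()) if tops else set()
    specific = {aud: top - shared for aud, top in tops.items()}
    return shared, specific


# Invented queries for two audiences.
logs = {
    "enquirers": ["fees", "law", "medicine", "fees", "law"],
    "students": ["fees", "b201", "exam dates", "fees", "b201"],
}
shared, specific = compare_audiences(logs, n=2)
print("shared:", shared)
print("specific:", specific)
```

The shared set is the list of queries that must work well for everyone; the audience-specific sets are raw material for personas.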

#5 Know when to publish what


Interest in the football team:
going...
...going...
gone
Time to study!


Before Tax Day


After Tax Day
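Seasonal shifts like these show up when query counts are bucketed over time. A minimal sketch that counts searches containing a term per ISO week; the `weekly_counts` helper and the records are invented for illustration.

```python
from collections import Counter
from datetime import date


def weekly_counts(records, term):
    """Count searches containing `term` per ISO (year, week).

    records: iterable of (date, query) tuples.
    """
    term = term.lower()
    weeks = Counter()
    for day, query in records:
        if term in query.lower():
            iso = day.isocalendar()
            weeks[(iso[0], iso[1])] += 1
    return sorted(weeks.items())


# Invented records for illustration.
records = [
    (date(2013, 9, 2), "football tickets"),
    (date(2013, 9, 3), "football schedule"),
    (date(2013, 9, 10), "football"),
    (date(2013, 12, 9), "exam schedule"),
]
for (year, week), n in weekly_counts(records, "football"):
    print(year, week, n)
```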

#6 Own and enjoy your failures

Failed navigation?
Examining unexpected searching
Look for places searches happen beyond main page
What’s going on?
• Navigational failure?
• Content failure?
• Something else?
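To find where searches start, group queries by the page they were issued from. The sketch below assumes the referring page is captured for each search, which not every search log does; the URLs and queries are placeholders.

```python
from collections import Counter, defaultdict
from urllib.parse import urlsplit


def searches_by_origin(records, home_path="/"):
    """Group queries by the page they were issued from, skipping the home page.

    records: iterable of (referrer_url, query) tuples.
    """
    by_page = defaultdict(Counter)
    for referrer, query in records:
        path = urlsplit(referrer).path or "/"
        if path != home_path:
            by_page[path][query.strip().lower()] += 1
    # Pages with the most searches first.
    return sorted(by_page.items(), key=lambda kv: -sum(kv[1].values()))


# Invented records; the URLs are placeholders.
records = [
    ("https://example.org/", "jobs"),
    ("https://example.org/professional-resources", "salary survey"),
    ("https://example.org/professional-resources", "contracts"),
    ("https://example.org/professional-resources", "salary survey"),
    ("https://example.org/events", "conference"),
]
for page, queries in searches_by_origin(records):
    print(page, queries.most_common(3))
```

A page that generates many searches is a candidate for closer inspection: its top queries hint at what visitors expected to find there.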

Where navigation is failing (“Professional Resources” page)
Do users and AIGA mean different things by “Professional Resources”?

Comparing what users find and what they want

Failed business goals?
Developing custom metrics
Netflix asks
1. Which movies most frequently searched? (query count)
2. Which of them most frequently clicked through? (MDP views)
3. Which of them least frequently added to queue? (queue adds)
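The three questions chain into a simple funnel. This is a sketch under assumptions: the numbers are invented, the field names are hypothetical, and "detail_views" stands in for the MDP views named on the slide.

```python
def leaky_titles(stats, top_n=3):
    """Among the most-searched titles, rank by fewest queue adds per detail view.

    stats: dict of title -> {"searches": int, "detail_views": int, "queue_adds": int}
    """
    most_searched = sorted(stats, key=lambda t: stats[t]["searches"], reverse=True)[:top_n]
    rows = []
    for title in most_searched:
        s = stats[title]
        add_rate = s["queue_adds"] / s["detail_views"] if s["detail_views"] else 0.0
        rows.append((title, s["searches"], s["detail_views"], round(add_rate, 2)))
    # Lowest add rate first: searched and viewed, but rarely queued.
    return sorted(rows, key=lambda row: row[3])


# Invented numbers for illustration.
stats = {
    "Title A": {"searches": 900, "detail_views": 800, "queue_adds": 80},
    "Title B": {"searches": 700, "detail_views": 500, "queue_adds": 400},
    "Title C": {"searches": 50, "detail_views": 40, "queue_adds": 30},
}
for row in leaky_titles(stats, top_n=2):
    print(row)
```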

#7 Avoid disasters

The new and improved search engine that wasn’t
Vanguard used SSA to help benchmark existing search engine’s performance and help select new engine
New search engine “performed” poorly
But IT needed convincing to delay launch
Information Architect & Dev Team Meeting
“Search seems to have a few problems…”
“Nah.”
“Where’s the proof?”
“You can’t tell for sure.”

What to do? Test performance of common queries
“Before and after” testing using two sets of metrics
1. Relevance: how reliably the search engine returns the best matches first
2. Precision: proportion of relevant results clustered at the top of the list
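The slides do not spell out the formulas. One plausible reading, used here as an assumption: relevance is the rank position at which a hand-picked best match appears (so lower is better), and precision is the share of the top few results a human judged relevant (so higher is better). The document IDs and result lists are invented.

```python
def relevance_rank(results, best_match):
    """Position (1-based) of the known best match; None if it is missing."""
    return results.index(best_match) + 1 if best_match in results else None


def precision_at(results, relevant, k=5):
    """Share of the top-k results that a human judged relevant."""
    top = results[:k]
    return sum(1 for doc in top if doc in relevant) / len(top) if top else 0.0


# Invented result lists for one common query on each engine.
best = "doc-ira"
relevant = {"doc-ira", "doc-roth", "doc-401k"}
old_engine = ["doc-ira", "doc-roth", "doc-news", "doc-401k", "doc-jobs"]
new_engine = ["doc-news", "doc-jobs", "doc-ira", "doc-press", "doc-roth"]

for name, results in (("old", old_engine), ("new", new_engine)):
    print(name, relevance_rank(results, best), precision_at(results, relevant))
```

Running the same common queries against both engines and averaging the two scores gives a before-and-after comparison that is harder to wave away than "search seems to have a few problems".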

Old engine (target) and new compared
Note: low relevance and high precision scores are optimal
More on Vanguard case study: http://bit.ly/D3B8c
uh-oh better

#8 Predict the future

Shaping the Financial Times’ editorial agenda
FT compares these
• Spiking queries for proper nouns (i.e., people and companies)
• Recent editorial coverage of people and companies
Discrepancy?
• Breaking story?!
• Let the editors know!
Seed your
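The comparison can be sketched as a simple filter: flag queries whose volume today is well above their usual level and that do not match anything covered recently. The thresholds, the helper name, and the data are assumptions for illustration, and filtering queries down to proper nouns is assumed to happen upstream.

```python
def spiking_uncovered(today, baseline, covered, ratio=3.0, min_count=20):
    """Return queries that spike today and lack recent editorial coverage.

    today, baseline: dicts of query -> count (baseline = typical daily count).
    covered: set of names with recent coverage.
    ratio and min_count are arbitrary thresholds chosen for this sketch.
    """
    covered = {name.lower() for name in covered}
    flagged = []
    for query, count in today.items():
        usual = baseline.get(query, 0)
        if count >= min_count and count >= ratio * max(usual, 1):
            if query.lower() not in covered:
                flagged.append((query, count, usual))
    return sorted(flagged, key=lambda row: row[1], reverse=True)


# Invented counts; company names are placeholders.
today = {"acme corp": 120, "globex": 90, "initech": 15}
baseline = {"acme corp": 10, "globex": 8, "initech": 12}
covered = {"Globex"}
print(spiking_uncovered(today, baseline, covered))
```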

Can SSA bring us together?

Lou’s TABLE OF OVERGENERALIZED DICHOTOMIES
Web Analytics vs. User Experience
What they analyze
• Web Analytics: Users' behaviors (what's happening)
• User Experience: Users' intentions and motives (why those things happen)
What methods they employ
• Web Analytics: Quantitative methods to determine what's happening
• User Experience: Qualitative methods for explaining why things happen
What they're trying to achieve
• Web Analytics: Helps the organization meet goals (expressed as KPI)
• User Experience: Helps users achieve goals (expressed as tasks or topics of interest)
How they use data
• Web Analytics: Measure performance (goal-driven analysis)
• User Experience: Uncover patterns and surprises (emergent analysis)
What kind of data they use
• Web Analytics: Statistical data ("real" data in large volumes, full of errors)
• User Experience: Descriptive data (in small volumes, generated in lab environment, full of errors)



Lands End and SKUs
SKU: # 39072-2AH1

Use SSA to start work on a site report card
SSA helps determine common information needs

Read this
Search Analytics for Your Site: Conversations with Your Customers by Louis Rosenfeld (Rosenfeld Media, 2011)
www.rosenfeldmedia.com
Use code WEBDAGENE2013 for 20% off all Rosenfeld Media books

Louis Rosenfeld [email protected]
www.louisrosenfeld.com
www.rosenfeldmedia.com
www.slideshare.net/lrosenfeld
Say hello