How just a little data analysis can improve your content
DESCRIPTION
Slides from a webinar given on February 5th, 2013, organized by Comtech Services (http://comtech-serv.com/). Abstract as follows: In the past, it was often difficult for information development teams to obtain quantitative data on how their content was used. In recent years, with the spread of online content delivery, it has become easier to obtain such data. Now, the challenge is how to interpret it in order to make content more effective. In this webinar, Joe Pairman from HTC's User Education team will show how content usage data, ratings by users, and search query records can:
• Indicate appropriate vocabulary
• Contribute to taxonomy development
• Suggest areas of focus for content improvements
• Help to answer specific questions about designing effective content
TRANSCRIPT
How just a little data analysis can improve your content
Joe Pairman
Listening for movement in the mine. National Institute for Occupational Safety and Health (NIOSH). www.flickr.com/photos/25069384@N03/2492849690/
Introduction How just a little data analysis can improve your content — Joe Pairman
Background
• DITA XML implementation at HTC: effective web content a primary driver
• From “How do we design for the web?” to “What can we learn from the web?”
• Co-ordinated analytics and user feedback plan
• Main focus is improving content
• This presentation covers methods, tips, and lessons learned from that
• Exploration of ideas rather than a technical guide
Slide types
Ideas and overviews
Cautionary notes
How to
Tips and insights
Examples in this presentation
Online knowledge base of support articles for a fictitious e-reader device
http://commons.wikimedia.org/wiki/File%3AEbook_reader_icon.png By netalloy (Open Clip Art Library image's page) [see page for license], via Wikimedia Commons
The predominant flavor of web analytics
“This is a fast-growing category that's generated tremendous interest in recent years due to the advertising and marketing value derived from tracking and understanding user behavior.”
Morville & Rosenfeld, Information Architecture for the World Wide Web, 3rd Edition (emphasis added)
• Much web analytics aims to directly improve sales
• In contrast, content-based sites focus on delivering effective information
• Of course (for a commercial site), the goal is still sales, but indirectly
What can web data tell us about content?
• What people are searching for, and the language they use to search for it
• What they’re viewing and how long they’re staying there
• (With a ratings system) How much they like what they’re seeing
• (With a combination of metrics) What we can focus on for improvement
• What effect particular qualities have (graphics, word count, links, etc.)
You need…
• Access to analytics data
• A significant body of homogeneous content, such as a knowledge base or an established blog
• A significant number of views of that content
• Data such as searches, page views, ratings
What can’t web data tell us?
• How to design our content (it can suggest which things work better but in the end we still need a coherent design)
• Why the patterns exist (interpretation is up to us)
• What the full context is
Ultimately
The data provides focus and pointers, not answers
Search terms
What can search query data tell us?
• Top searches (and so, crucial content)
• The vocabulary that customers use
• The way that customers classify things
• And much more
External search vs. site search 1
Site search: pros
• Users more likely to know what they’re looking for?
• A much wider range of data available
Site search: cons
• Increasingly, Google is where people search first
• Poorer range and quality of results may drive people away from your site search
External search: pros
• Potentially many more queries
• A much wider range of search terms
External search: cons
• Still only those who made it to your site
• Google encrypted search: now up to a third of queries may not have associated terms
External search vs. site search 2
• It’s possible that site search is used more for technical or specialized info
• But some argue against this
• The best way would be to compare external (referral) with site (local) terms directly
• External is probably still the best way to get started
Rosenfeld, Louis. 2011. Search Analytics for Your Site. New York: Rosenfeld Media. www.rosenfeldmedia.com/books/searchanalytics/
Processing search terms 1: Data collection
• Even if your content is only one section of the site, it’s best to get the whole site’s search queries
• If there are a lot of queries, try filtering on a phrase such as "how to". Also filter out obviously irrelevant terms
• But if you do this, compare with other sources to make sure the results are not too skewed
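Where the query export is too large to read through, this filtering can be scripted. The following is a minimal Python sketch, assuming the queries are already a list of strings; the sample queries, the IRRELEVANT set, and filter_queries are hypothetical. Comparing the filtered count with the full count is only a rough first check on skew, not a substitute for comparing with other sources.

```python
# Hypothetical sample of whole-site search queries (illustrative, not real data).
queries = [
    "how to back up notes",
    "How to change screen brightness",
    "store locator",
    "battery life",
    "how to sync notes",
    "careers",
]

# Hand-made list of obviously irrelevant queries (an assumption for this example).
IRRELEVANT = {"store locator", "careers"}


def filter_queries(queries, irrelevant, phrase=None):
    """Drop irrelevant queries; if `phrase` is given, keep only queries containing it."""
    kept = []
    for q in queries:
        normalized = " ".join(q.lower().split())  # lowercase and collapse whitespace
        if normalized in irrelevant:
            continue
        if phrase is not None and phrase.lower() not in normalized:
            continue
        kept.append(normalized)
    return kept


filtered = filter_queries(queries, IRRELEVANT, phrase="how to")
print(filtered)
# Share of the original queries that survive the phrase filter, as a rough skew check.
print(len(filtered) / len(queries))
```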
Processing search terms 2: Common phrases
• Filter out small words: and, the, a
• Consider getting 2- and 3-word phrases too:
back up ≠ back + up
• Even at this stage the results may be very interesting
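A rough Python sketch of the phrase counting, using only the standard library. The stopword list beyond "and", "the", and "a", the sample queries, and the helper names are assumptions. It counts single words plus 2- and 3-word phrases, so "back up" is tallied as a phrase in its own right rather than only as "back" and "up".

```python
from collections import Counter

# Small words to ignore; the slide names "and", "the", "a"; the rest are assumptions.
STOPWORDS = {"and", "the", "a", "to", "how", "is"}


def ngrams(tokens, n):
    """Return the n-word phrases in a list of tokens, in order."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def phrase_counts(queries, max_n=3, stopwords=STOPWORDS):
    """Count 1- to max_n-word phrases across queries after removing stopwords."""
    counts = Counter()
    for q in queries:
        tokens = [t for t in q.lower().split() if t not in stopwords]
        for n in range(1, max_n + 1):
            counts.update(ngrams(tokens, n))
    return counts


queries = ["back up the notes", "how to back up notes", "back up and restore",
           "screen is dim", "dim screen"]
counts = phrase_counts(queries)
print(counts["back up"], counts["back up notes"], counts["screen dim"])
```

Removing stopwords before building phrases is a simplification: "back up and restore" yields the phrase "back up restore". Dropping only phrases that begin or end with a stopword is an alternative if that matters for your queries.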
Processing search terms 3: Categorizing
• Based on the frequent keywords, draft out categories. Not too granular; the idea is to make big baskets to categorize quickly.
• Categorize the original search terms, based on these categories (automate this!) Anything uncategorized goes in “Other”.
• Spot check your categorized terms so far.
• Look at “Other”, and think up new categories.
• Iterate a couple of times. Probably some manual categorization at the end.
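The automated part of this categorizing loop might look like the following sketch. The category names and keywords are hypothetical placeholders for baskets drafted from your own frequent keywords, and matching is a plain case-insensitive substring test where the first matching category wins.

```python
from collections import Counter

# Hypothetical "big baskets" drafted from frequent keywords; order matters,
# because the first matching category wins.
CATEGORIES = {
    "Backup and sync": ["back up", "backup", "sync", "restore"],
    "Display": ["screen", "display", "brightness"],
    "Storage": ["storage", "memory", "space"],
}


def categorize(query, categories=CATEGORIES):
    """Return the first category whose keyword appears in the query, else "Other"."""
    q = query.lower()
    for name, keywords in categories.items():
        if any(keyword in q for keyword in keywords):
            return name
    return "Other"


queries = ["How to back up notes", "dim screen", "free up space",
           "recommend a book", "sync notes"]
by_category = {q: categorize(q) for q in queries}
print(Counter(by_category.values()))
# Queries left in "Other" are the ones to read when thinking up new categories.
print([q for q, c in by_category.items() if c == "Other"])
```

Substring matching is deliberately crude (for example, "space" would also match "whitespace"), which is one reason to spot check the results and finish with some manual categorization.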
Using search data 1: Prioritization
• Do you have gaps? Are you putting energy into the right places?
Using search data 2: Language
• Based on your categories, look into the language that people actually search for most:
display or screen?
storage, memory, or just space?
• Best place for frequent terms is page title; next is intro paragraph
• After that, try to get terms into body of the page.
• Last resort is index or other non-visible keywords (but that’s mostly for internal site search, not external searches)
• Strike a balance between using a range of terms and keyword “stuffing”
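To choose between competing terms such as display and screen, counting how many queries use each is often enough. A minimal sketch, assuming single-word variants and hypothetical sample queries that have already been narrowed to the relevant categories; term_preference is an illustrative name.

```python
from collections import Counter


def term_preference(queries, variants):
    """Count the queries that contain each single-word variant as a whole word."""
    counts = Counter({v: 0 for v in variants})  # keep unused variants visible as 0
    for q in queries:
        words = set(q.lower().split())
        for v in variants:
            if v in words:
                counts[v] += 1
    return counts


# Hypothetical queries, already filtered to the Display and Storage categories.
queries = ["screen too dim", "display settings", "screen brightness",
           "free up space", "storage full", "out of memory", "more space"]
print(term_preference(queries, ["display", "screen"]))
print(term_preference(queries, ["storage", "memory", "space"]))
```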
Using search data 3: Classification
• How do your site users classify subject areas?
For example, a UI-driven category of “Sharing” might not match users’ distinct searches for “recommend a book” and “sync notes”
• If you are designing from scratch (or doing a big revamp), this work should probably come first
• Search terms seem particularly amenable to a flat, “tagging” approach, but can be informative no matter the approach
Other avenues for exploration
• Segmentation by screen size / geography / language
• Social media monitoring
• Further site search data such as audience and searches with no results
Page views and time on page
Food for thought
(Chart: page views per page, simulated data)
High (unique) page views
• Some indication of what's popular
• Compare with search keyword categories, to identify gaps
• Doesn’t identify whether the pages are doing a good job, or even if they’re actually the things users were looking for
Low (unique) page views
• Could generally indicate a candidate for removal, but...
• Could be effective information on a “niche” topic
• Could be useful but not findable
Time on page
• Seems appealing at first — longer means better (up to a point)?
• But people can just leave a page open
• Some pages might be harder to read than others, so take longer?
• Some topics are just deeper than others
• However, low time on page could be useful...
Time on page correlates with related keywords
• When people land on a page that wasn’t what they wanted, they don’t tend to stay long.
• Pages with an average time of less than a minute could be flagged.
• However, tip-style pages may have a short time on page and still be popular.
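Flagging short visits is a simple filter once per-page averages are exported. In this sketch the 60-second threshold comes from the slide; the page records, the "type" field used to exempt tip-style pages, and the function name are assumptions about how your data is labelled.

```python
# Hypothetical per-page averages exported from an analytics tool; the "type"
# field marking tip-style pages is an assumption about your content model.
pages = [
    {"title": "Charging the battery", "avg_time_sec": 95, "type": "task"},
    {"title": "Syncing notes", "avg_time_sec": 38, "type": "task"},
    {"title": "Quick tip: night mode", "avg_time_sec": 25, "type": "tip"},
]


def flag_short_visits(pages, threshold_sec=60, exempt_types=("tip",)):
    """Return titles whose average time on page is under the threshold,
    skipping page types that are expected to be short."""
    return [
        p["title"]
        for p in pages
        if p["avg_time_sec"] < threshold_sec and p["type"] not in exempt_types
    ]


print(flag_short_visits(pages))
```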
Page ratings
What can ratings tell us?
• Do people like the page or not? (For whatever reason.)
• Can be a good metric, when combined with other data. A simple example:
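• High page views, high positive ratings ratio: Good
• Low page views, high positive ratings ratio: Could be helpful info on a niche subject, or perhaps is hard to find
• High page views, low positive ratings ratio: Needs improvement
• Low page views, low positive ratings ratio: Possible candidate for removal?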
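This grid can be applied to a whole page list. The sketch assumes "high" and "low" mean at or above versus below the median of each metric, which is one possible cut-off rather than anything the slide prescribes; the page data and the quadrant helper are hypothetical.

```python
from statistics import median

# Hypothetical pages: unique page views and positive-ratings ratio.
pages = {
    "Charging the battery": {"views": 5000, "ratio": 3.3},
    "Exporting highlights": {"views": 300, "ratio": 4.0},
    "Syncing notes": {"views": 4200, "ratio": 0.8},
    "Legacy file formats": {"views": 150, "ratio": 0.6},
}


def quadrant(page, views_cut, ratio_cut):
    """Place a page in one cell of the page views x ratings ratio grid."""
    high_views = page["views"] >= views_cut
    high_ratio = page["ratio"] >= ratio_cut
    if high_ratio and high_views:
        return "Good"
    if high_ratio:
        return "Niche subject, or hard to find"
    if high_views:
        return "Needs improvement"
    return "Possible candidate for removal?"


# Splitting at the median is an assumption; any cut-off you trust will do.
views_cut = median(p["views"] for p in pages.values())
ratio_cut = median(p["ratio"] for p in pages.values())
for title, page in pages.items():
    print(title, "->", quadrant(page, views_cut, ratio_cut))
```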
Cautions about ratings
• Avoid assumptions. “Not helpful” doesn’t always mean the page content is unsuitable for its purpose.
• Don’t use in isolation.
• Combine with qualitative data if at all possible. Comments, usability studies, social media monitoring, etc.
What you need…
• A rating per page
• Should at least allow rating both positively and negatively (not just "like", which is dubious: people don't even remember what they liked and why)
• Not really about lengthy surveys — they are a separate thing and require a lot more preparation
Getting a better response rate
• Keep the ratings system as simple as possible
• If users can leave a comment, show the comment box only after they have selected a rating
Kohavi, R; Henne, R; and Sommerfield, D: Practical Guide to Controlled Experiments on the Web: Listen to Your Customers not to the HiPPO (Slides from talk on Controlled Experiments). www.exp-platform.com/Documents/controlledExperimentsHippoEbay.pdf
How to prepare ratings data
• Make sure it's comparable; for example, don't compare the product section with the support section
• If it's binary — helpful or not — divide positive by negative:
756 helpful divided by 230 not helpful gives a helpfulness ratio of 3.29
• If you have multiple negative options, sum them and do the same, but keep the source data, which could be useful
• You end up with a list of pages, ranked by their helpfulness ratio
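The ratio and the ranking can be computed as follows. This is a minimal sketch: helpfulness_ratio and the sample ratings are hypothetical, and how to treat pages with no negative ratings is a judgment call (here they rank first if they have any positive ratings, and score 0 if they have none at all).

```python
def helpfulness_ratio(helpful, *not_helpful):
    """Positive ratings divided by the sum of all negative ratings.

    Pass one negative count for a binary widget, or one count per
    negative option (keep those per-option counts elsewhere too).
    """
    negative = sum(not_helpful)
    if negative == 0:
        # No negative ratings: rank first if there are positives, else score 0.
        return float("inf") if helpful > 0 else 0.0
    return helpful / negative


print(round(helpfulness_ratio(756, 230), 2))          # 3.29
print(round(helpfulness_ratio(756, 120, 80, 30), 2))  # same total of negatives

# Hypothetical ratings per page: (helpful, [negative counts per option]).
ratings = {
    "Charging the battery": (756, [230]),
    "Syncing notes": (120, [60, 40]),
    "Night mode": (45, [5]),
}
ranked = sorted(
    ratings,
    key=lambda title: helpfulness_ratio(ratings[title][0], *ratings[title][1]),
    reverse=True,
)
print(ranked)
```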
If few rate a page, do the ratings count?
• Response rate may correlate with helpfulness ratio (so don’t ignore pages with low response rate)
• Response rate is a useful metric in itself
(Chart: response rate plotted against helpfulness, simulated data)
Making a dashboard with synthetic metrics
A combination of metrics could indicate …
• Which pages need to tackle their subject more effectively
• Which pages need to be more findable (similar to above but not the same)
• Which pages need to discourage wrong searches (different again)
• Which pages are candidates for removal
• Which pages work well (so are examples to follow)
Dashboard — overview
Relative measures are fine
• What’s a good helpfulness ratio? How many page views do we need?
• Very hard to answer these kinds of questions (especially at first)
• Rather, focus on relative measures: which pages are comparatively weak or strong
Calculating low, medium, & high rankings
• For each metric, create a column to show whether the page is in the bottom third, middle third, or top third
• In Excel, use something like this:
=IF(RANK(AC2,AC:AC)>ROUND(COUNT(AC:AC)*2/3,0),"Low",IF(RANK(AC2,AC:AC)>ROUND(COUNT(AC:AC)*1/3,0),"Medium","High"))
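For the CSV-and-script route, the same banding can be reproduced in Python. This sketch mirrors the Excel formula: RANK with no order argument ranks the largest value as 1 and gives tied values the same rank, and ROUND rounds halves away from zero, unlike Python's built-in round. The helper names and sample page views are hypothetical.

```python
import math


def excel_round(x):
    """Round half away from zero, like Excel's ROUND(x, 0), for non-negative x."""
    return math.floor(x + 0.5)


def excel_rank(value, values):
    """Like Excel's RANK(value, range): descending order, ties share the best rank."""
    return 1 + sum(1 for other in values if other > value)


def tercile_labels(values):
    """Label each value High, Medium, or Low by which third of the ranking it falls in."""
    n = len(values)
    low_cut = excel_round(n * 2 / 3)
    mid_cut = excel_round(n * 1 / 3)
    labels = []
    for v in values:
        rank = excel_rank(v, values)
        if rank > low_cut:
            labels.append("Low")
        elif rank > mid_cut:
            labels.append("Medium")
        else:
            labels.append("High")
    return labels


# Hypothetical unique page views for six pages.
views = [900, 150, 400, 1200, 50, 600]
print(tercile_labels(views))
```

The ranking compares every page with every other page, which is fine for a knowledge base of a few thousand pages.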
Synthesizing metrics
• Indicators for “Improve searchability”: high helpfulness ratio, low page views, and response rate at least medium.
Ratings with other metrics
• Improve content? — Low helpfulness, and page views are at least medium
• Improve searchability? Low page views, high helpfulness, and response rate at least medium
Ratings with other metrics
• Unrelated searches: may be indicated by low time on page, so check the keywords for these pages (remember that tip-style pages may have low time on page too)
• Consider getting rid of — low page views, and low or medium helpfulness
Ratings with other metrics
• Good topics — high helpfulness, and at least medium response rate and page views
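The indicator rules on these slides can be expressed as plain conditions over the Low/Medium/High bands. This is a sketch, assuming each page already has a band for helpfulness ratio, page views, response rate, and time on page; the dictionary keys, flag names, and sample pages are hypothetical. A page can carry more than one flag, and a low time-on-page flag still needs the tip-style check.

```python
AT_LEAST_MEDIUM = {"Medium", "High"}


def indicators(page):
    """Return the dashboard flags for one page, given its Low/Medium/High bands
    for helpfulness ratio, page views, response rate, and time on page."""
    h = page["helpfulness"]
    v = page["views"]
    r = page["response"]
    t = page["time"]
    flags = []
    if h == "Low" and v in AT_LEAST_MEDIUM:
        flags.append("Improve content")
    if v == "Low" and h == "High" and r in AT_LEAST_MEDIUM:
        flags.append("Improve searchability")
    if t == "Low":
        flags.append("Check for unrelated searches")  # or it may be a short tip page
    if v == "Low" and h in {"Low", "Medium"}:
        flags.append("Consider getting rid of")
    if h == "High" and r in AT_LEAST_MEDIUM and v in AT_LEAST_MEDIUM:
        flags.append("Good topic")
    return flags


# Hypothetical bands, as produced by the low/medium/high ranking step.
pages = {
    "Syncing notes": {"helpfulness": "Low", "views": "High", "response": "Medium", "time": "Medium"},
    "Exporting highlights": {"helpfulness": "High", "views": "Low", "response": "Medium", "time": "Low"},
    "Legacy file formats": {"helpfulness": "Medium", "views": "Low", "response": "Low", "time": "High"},
    "Charging the battery": {"helpfulness": "High", "views": "High", "response": "High", "time": "High"},
}
for title, bands in pages.items():
    print(title, indicators(bands))
```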
Further research into a (potential) problem page
• Does it really have a problem? For example the time on page may be low, but ratings very good. Is it a short, tip-style page?
• How do people get there? Where do they go when they leave? (Search terms, navpaths, exits.)
• Is there anything the good pages have in common that the problem ones don’t? (See next section … )
Investigating specific attributes
Ratings ratios for answering specific questions
• Are pages with graphics more helpful?
• Is it better to have more subtopics on a page?
• Does the number of links on a page affect bounce rate?
Looking at relationships
• Excel CORREL function (0.3 or above is respectable)
• Scatter chart, with optional trend line
• But remember that correlation is not causation!
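Excel's CORREL returns the Pearson correlation coefficient, and a script can compute the same value directly (on Python 3.10 or later, statistics.correlation gives the same result). The page data below is hypothetical, and a result above the 0.3 rule of thumb still says nothing about cause.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient, the statistic Excel's CORREL returns."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least two values")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)  # fails if either list has no variation


# Hypothetical pages: number of graphics and helpfulness ratio.
graphics = [0, 1, 1, 2, 3, 4]
helpfulness = [1.1, 1.6, 1.2, 2.0, 2.4, 2.2]
r = pearson(graphics, helpfulness)
print(f"r = {r:.2f}", "(respectable)" if r >= 0.3 else "(weak)")
```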
Correlating with XHTML / XML structure
• For example, pages with more graphics:
<img> or perhaps <fig>
• More subtopics on a page:
<h2> or perhaps use information from DITA maps
• There are several ways to automate this; Python with the lxml library is powerful and not too intimidating
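A sketch of the counting step. It uses the standard library's ElementTree so it runs without extra installs; the same loop should work with lxml.etree, whose more forgiving parsers also help with real-world XHTML that contains HTML entities such as &nbsp;. The element names and the sample page are assumptions; swap in whatever elements your DITA output uses.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def count_elements(xml_text, names=("img", "fig", "h2")):
    """Count elements by local name, ignoring any XML namespace prefix."""
    root = ET.fromstring(xml_text)
    counts = Counter({name: 0 for name in names})
    for el in root.iter():
        if not isinstance(el.tag, str):  # skip comments/processing instructions if present
            continue
        local = el.tag.rsplit("}", 1)[-1]  # "{http://...}img" -> "img"
        if local in counts:
            counts[local] += 1
    return counts


# Hypothetical XHTML output for one knowledge base page.
page = """<html xmlns="http://www.w3.org/1999/xhtml"><body>
<h2>Charging</h2><p><img src="plug.png" alt="Plug"/></p>
<h2>Battery tips</h2><img src="battery.png" alt="Battery"/>
</body></html>"""
print(count_elements(page))
```

Run over every output file (for example with pathlib's glob), the per-page counts become one more column to correlate against helpfulness ratio.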
Bounce rate
• Why should we try to keep people on the site? Don't we want to give them the answer and then have them leave satisfied?
• However, bounce rate can indicate things like whether links are being used (correlate the number of links on a page with bounce rate)
Combining ratings with non-web data
• Assign human-judged ratings and see how they match up. (Is a particular word usage important? Friendly style?)
• (For support content) Matching to support call issues: what types of pages are used more on the web vs. called about?
Next steps
Web data in the whole organisation
• Content teams should have access to the data
• Can not only improve content but also provide valuable feedback for other groups in the organization
• Resourcing may require persuasion
• Potential legal issues may need to be addressed
• Once we have the data, we need to treat it responsibly
Schedule
• Search terms — every six months
• Synthetic metrics dashboard — every month or two
• Specific questions — as necessary
General principles
• Always present data in terms of the question it’s aiming to answer (though it’s good to explore the data first)
• Surprises are good. They indicate that you're not just confirming your prejudices.
• Don't assume that your data answers the question. Be very suspicious. Use all other sources possible. And use common sense.
• Watch your resources.
• Analytics is not going to write your content or guarantee its success. And it's reactive — only measures what's there, not what could be there.
Further information
Useful resources
• Search Analytics for Your Site, by Louis Rosenfeld. A thorough and thought-provoking investigation of applications for internal site search data (also see the slide deck with some key points at the same link). www.rosenfeldmedia.com/books/searchanalytics/
• Best Practices for “Was this helpful?”. A discussion about the design of page ratings systems. www.ixda.org/node/24101
• For “Was this page helpful” data, should I take response rate into account? A question with some useful comments and answers. stats.stackexchange.com/questions/46428/for-was-this-page-helpful-data-should-i-take-response-rate-into-account
A simple synthetic metrics dashboard — steps
In Excel:
1. Get data from each source such as your analytics tool and your ratings database. Get the data in any format that Excel can open.
2. Combine the data from different sources. Use VLOOKUP formula if the value you’re matching on is to the left of other values; INDEX and MATCH if not. If matching on page title, remember to allow for any underscores / percent encoded characters / garbled characters.
3. Calculate rankings for key metrics. See the “Calculating low, medium, & high rankings” slide. An example formula: =IF(RANK(AC2,AC:AC)>ROUND(COUNT(AC:AC)*2/3,0),"Low",IF(RANK(AC2,AC:AC)>ROUND(COUNT(AC:AC)*1/3,0),"Medium","High"))
4. Set synthetic metric indicators. See the “Synthesizing metrics” slide. An example formula: =IF(AND(AC2="Low", OR(L2="High", L2="Medium"), OR(N2="High", N2="Medium")), "1","")
Or, get your data as CSV/TSV, do steps 2-4 with a Python script, write to a CSV file, and then open the result in any spreadsheet package.
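A sketch of step 2 for the Python route: joining two exports on page title after normalizing underscores and percent-encoded characters. The column names (page, views, title, helpful, not_helpful), the sample data, and the function names are hypothetical; rows that still fail to match are returned so they can be checked by hand for garbled characters.

```python
import csv
import io
from urllib.parse import unquote


def normalize_title(raw):
    """Decode percent-encoding, turn underscores into spaces, collapse whitespace, lowercase."""
    return " ".join(unquote(raw).replace("_", " ").split()).lower()


def load(csv_text, key_column):
    """Index CSV rows by their normalized page title."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {normalize_title(row[key_column]): row for row in reader}


def combine(analytics_csv, ratings_csv):
    """Join analytics rows to ratings rows on normalized title (the VLOOKUP step)."""
    analytics = load(analytics_csv, "page")
    ratings = load(ratings_csv, "title")
    combined, unmatched = [], []
    for key, a in analytics.items():
        r = ratings.get(key)
        if r is None:
            unmatched.append(a["page"])  # check these by hand
            continue
        combined.append({
            "title": r["title"],
            "views": int(a["views"]),
            "helpful": int(r["helpful"]),
            "not_helpful": int(r["not_helpful"]),
        })
    return combined, unmatched


def to_csv(rows):
    """Serialize combined rows as CSV text for any spreadsheet package."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=["title", "views", "helpful", "not_helpful"])
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


# Hypothetical exports: the analytics tool reports URL-style titles.
analytics_csv = "page,views\nCharging_the_battery,5000\nSyncing%20notes,300\nNight_mode,80\n"
ratings_csv = "title,helpful,not_helpful\nCharging the battery,756,230\nSyncing notes,45,30\n"

rows, unmatched = combine(analytics_csv, ratings_csv)
print(unmatched)
print(to_csv(rows))
```

Steps 3 and 4 then become extra columns computed from these rows before writing the CSV file.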