SEO split tests you should run - Will Critchlow

Post on 07-Jan-2017








SEO split tests you should be running

Will Critchlow / @willcritchlow


Robert Liston famously carried out an operation with a 300% mortality rate

Via: reddit

He amputated:
1. The patient’s leg
2. His assistant’s fingers

Via: reddit

Before germ theory, 25-50% of patients died from infections

(Speed also used to be a prized surgical skill pre-anaesthetic)

It wasn’t always confidence-inspiring

Liston could amputate a leg in 2 ½ minutes (but in his enthusiasm he once cut off the patient’s testicles too)

“Welcome. I’ll be your doctor today.”

Confidence inspiring stuff

The “Liston” of site migrations

Step 1: fail to put redirects in place

Step 2: rel=canonical every page to the homepage

[Chart: treatments placed on a scale from "good for the patient" to "bad for the patient": mercury for syphilis, not washing hands, garlic + onion]

Of course a lot of deliberate things were neither harmful nor beneficial

Cargo cult: During WW2, Pacific islanders who had never seen manufactured equipment saw modern military planes bring cargo to their remote islands.

Cargo cult: After the war, cults developed that tried to recreate the conditions that “brought” the planes (runways, control towers, military uniforms) without understanding what had really happened.

Read Richard Feynman’s speech

Do we have our own cargo cults?

Do you recommend changing h2 to h1?

Do you have a good reason why?

Even if it does help, does it help enough to be worth it?

Let’s start washing our hands

The scientific method

Step 1: Generate hypotheses

In medicine, old wives’ tales are a great place to start. Looking at things that appear to work, even though we have no idea why, is a good source of hypotheses.

A great example is this 1,000-year-old “spell” that included garlic, onion and cow’s stomach, and turned out to kill MRSA.

I guess the SEO equivalent is to ask the old timers.

In all areas of science, you are at an advantage if you can reason from first principles. Richard Feynman famously used to draw what became known as “Feynman diagrams” to understand sub-atomic interactions through thought experiments alone.

The SEO equivalent is to stay abreast of information retrieval and ML papers and formulate hypotheses based on an understanding of how the algorithm likely works.

Finally, you can go mining the data.

The obvious SEO equivalent is the various correlation studies into ranking factors.

In both medicine and SEO, you obviously have to be wary of spurious correlations. Blindly mining data can get arbitrarily high correlations (the example above has a correlation of 0.993!).
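To make the data-mining risk concrete, here is a minimal sketch (not from the talk) that generates one short "target" series and then searches a pile of completely unrelated random walks for the best match. The series length, the number of candidates and the function names are all assumptions chosen for illustration.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def random_walk(rng, length):
    """A random walk: the running total of independent Gaussian steps."""
    level, walk = 0.0, []
    for _ in range(length):
        level += rng.gauss(0, 1)
        walk.append(level)
    return walk


def best_spurious_match(n_candidates=2000, length=10, seed=0):
    """Return the strongest correlation found among unrelated series."""
    rng = random.Random(seed)
    target = random_walk(rng, length)  # stands in for "rankings"
    best = 0.0
    for _ in range(n_candidates):
        candidate = random_walk(rng, length)  # unrelated by construction
        r = pearson(target, candidate)
        if abs(r) > abs(best):
            best = r
    return best


if __name__ == "__main__":
    print(f"strongest correlation with pure noise: {best_spurious_match():+.3f}")
```

Every candidate is noise, so whatever correlation comes out on top says nothing about cause. The more series you mine, the more impressive the winner looks.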

The scientific method

Step 2: Try things in the lab

Problem: results may not hold, or may come with new side effects

In the SEO space, this is work like that done by IMEC labs.

It involves attempting to run controlled experiments on test domains and/or with volunteer participants. The outcomes are normally not improved rankings or traffic that participants care about.

Potential pitfalls

What works in tests may not work in the real world

Source: National Institutes of Health

Do you recommend http → https migrations?

All else being equal, secure is better. All else is never equal.

Side effects may include ranking fluctuations, traffic drops, difficult conversations with your boss.

Side effects may include headache, nausea, vomiting, death, dizziness, dysentery, cardiac arrhythmia, mild heart explosions, varicose veins, darkened stool, darkened soul, lycanthropy, trucanthropy, more vomiting, arteriosclerosis, hemorrhoids, mild discomfort, vampirism, spontaneous dental hydroplosion, sugar high, even more vomiting, and mild rash.

The scientific method

Step 3: Gold standard scientific trials

TL;DR scurvy bad, science hard

You should read the story of one of the first controlled scientific experiments, which proved lemons could cure scurvy (in 1747!). It is the incredible story of how the discovery supported British naval supremacy, and then how compounding errors involving the colonial supply-chain, faster steam-powered ships, and polar bear offal led to the loss of the knowledge, the death of polar explorers, and the eventual rediscovery of vitamin C.

Source: idlewords

How SEO split tests work

You might have seen @TomAnthonySEO tweeting about the platform we’ve built to make this easy

Excuse a brief diversion into geeky details

Instead of comparing the performance of the control pages directly with the variant pages, we build a forecast of what’s called the counterfactual, which is an estimate of what would have happened if we hadn’t made the change. We use the control group to make a counterfactual forecast that takes into account seasonality and site-wide changes.

The black line on the chart above is the actual organic traffic to the variant pages. The blue line is the counterfactual forecast.

More: Distilled blog post and free forecasting tool
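The talk doesn’t spell out the forecasting model, so treat the following as a minimal sketch of the idea rather than the platform’s actual method. It assumes a plain least-squares line relating variant traffic to control traffic before the change, then applies that line to the control traffic observed after the change. The function names and the sample numbers are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of ys = intercept + slope * xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def counterfactual(control_pre, variant_pre, control_post):
    """Forecast variant traffic as if no change had been made.

    The relationship is learned on the pre-test period only, then driven
    by the control pages' post-test traffic, so seasonality and site-wide
    changes that hit both groups are carried into the forecast.
    """
    intercept, slope = fit_line(control_pre, variant_pre)
    return [intercept + slope * c for c in control_post]


if __name__ == "__main__":
    # Hypothetical daily organic sessions.
    control_pre = [100, 120, 90, 110]
    variant_pre = [50, 60, 45, 55]
    control_post = [130, 80]
    print(counterfactual(control_pre, variant_pre, control_post))
```

Because the forecast follows the control group day by day, a site-wide dip shows up in both the forecast and the actuals and cancels out of the comparison.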

It’s easiest to analyse the results by looking at the cumulative difference over time between the actual organic traffic and the counterfactual.

The pale blue area is the 95% confidence interval.

We can see a (statistically) zero effect for an initial time while Google crawls and indexes the test, followed by steady growth. A couple of weeks in, the confidence interval goes above zero and we have a winning test.

More: Distilled blog
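As a rough sketch of that analysis, the code below accumulates the daily gap between actual traffic and a forecast and wraps it in a 95% band. The band is my assumption, not the talk’s method: it treats daily forecast errors as independent with a constant standard deviation (`residual_sd`, estimated from the pre-test period), so the interval widens with the square root of the number of days.

```python
import math


def cumulative_uplift(actual, forecast, residual_sd, z=1.96):
    """Return one (cumulative difference, lower, upper) tuple per test day."""
    rows, total = [], 0.0
    for day, (a, f) in enumerate(zip(actual, forecast), start=1):
        total += a - f
        # Sum of `day` independent errors has sd = residual_sd * sqrt(day).
        half_width = z * residual_sd * math.sqrt(day)
        rows.append((total, total - half_width, total + half_width))
    return rows


def first_significant_day(rows):
    """First day (1-based) whose lower bound is above zero, else None."""
    for day, (_, lower, _) in enumerate(rows, start=1):
        if lower > 0:
            return day
    return None


if __name__ == "__main__":
    # Hypothetical numbers: flat at first, then the variant pulls ahead.
    actual = [100, 100, 110, 112, 115, 118]
    forecast = [100] * 6
    rows = cumulative_uplift(actual, forecast, residual_sd=2.0)
    print(first_significant_day(rows))
```

With these made-up numbers the lower bound first clears zero on day 3. A test where actual and forecast stay equal never does, which is the behaviour you want.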

Hashtag winning

What should you be testing?

Add structured data

One of the easiest tests to run is the addition of structured data - we recommend adding it via JSON-LD.

We got one of the fastest and clearest uplifts we have seen so far with the addition of structured data to detail pages. This chart shows the uplift from adding location-based data to individual property pages.
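For a sense of what this kind of markup looks like, here is a minimal sketch that builds a JSON-LD block for a property page. The talk doesn’t say which schema.org types were used in the test; `Place`, `PostalAddress` and `GeoCoordinates` are my assumptions, and the function name and sample values are hypothetical.

```python
import json


def property_jsonld(name, street, locality, postal_code, country, lat, lng):
    """Return a <script> tag containing location data as JSON-LD."""
    data = {
        "@context": "https://schema.org",
        "@type": "Place",  # assumed type; pick whatever fits your pages
        "name": name,
        "address": {
            "@type": "PostalAddress",
            "streetAddress": street,
            "addressLocality": locality,
            "postalCode": postal_code,
            "addressCountry": country,
        },
        "geo": {"@type": "GeoCoordinates", "latitude": lat, "longitude": lng},
    }
    # Escape "</" so a value can never close the script tag early;
    # "\/" is a valid JSON escape for "/".
    body = json.dumps(data, indent=2).replace("</", "<\\/")
    return '<script type="application/ld+json">\n' + body + "\n</script>"


if __name__ == "__main__":
    # Hypothetical listing used purely for illustration.
    print(property_jsonld("Example Cottage", "1 Example Street", "Exampleton",
                          "EX1 1EX", "GB", 51.5, -0.12))
```

Because the markup lives in one self-contained tag, it can be added to the variant pages only, without touching the visible template.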

Improve your organic “adverts”

Advert testing plays a huge part in PPC. Looking at typical meta descriptions, it appears it’s rarely a priority in organic.

More: Distilled blog

This is the chart I showed you earlier when I was describing the statistics.

It’s actually an uplift from improved clickthrough rate. We didn’t detect an accompanying ranking improvement during this experiment.

Make your site mobile friendly

I’ve spent a lot of time trying to persuade people to do this without data to back me up. Now I’m going to carry on with data.

More: @TomAnthonySEO

This chart shows the uplift from making a bunch of category pages mobile-friendly (with some simple responsiveness) on a holiday site.

Just to help prove that these are real uplifts, we ran a “null” test designed to have no impact.

...and there are tons of tests where we don’t have pretty charts we can share yet

Tabbed versus flat

We know Google in particular is paying more attention to CSS and JS. How much difference does it make if content is visible initially on page load?

Additional content

You might want to test both adding and removing additional content on category pages. This would test the benefit of additional text vs. increased focus and possibly-improved usage metrics.

Breadcrumbs

How much difference does it make if you add breadcrumbs to product pages?

Note: this introduces the complexity of testing internal linking. I’ll come back to this.

Canonicals vs. noindex

We’ve often argued about the best ways of keeping certain pages and page-types out of the index.

Argue with data.

We have all kinds of keyword-targeting test ideas

● Simpler messaging
  ○ (what happens if you have less keyword targeting?)
● Timely keywords
  ○ (what happens if you add "2016" in appropriate places?)

Argue with data

We’re running tests like these right now

Follow @distilled to hear the results first

If you’re going to implement split-testing, there are some things you should know

You can’t assume traffic equality between “buckets” of pages

This is why we build a counterfactual comparison using control pages.
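Here is a minimal sketch of why the buckets differ. It assumes pages are split by hashing the URL path (the talk doesn’t say how pages are assigned, so the hashing scheme, the salt and the function names are all hypothetical) and then totals pre-test traffic per bucket.

```python
import hashlib


def assign_bucket(url_path, salt="hypothetical-test-001"):
    """Deterministically assign a page to 'control' or 'variant'."""
    digest = hashlib.sha256((salt + ":" + url_path).encode("utf-8")).hexdigest()
    return "variant" if int(digest, 16) % 2 == 0 else "control"


def bucket_totals(sessions_by_path, salt="hypothetical-test-001"):
    """Sum pre-test organic sessions per bucket."""
    totals = {"control": 0, "variant": 0}
    for path, sessions in sessions_by_path.items():
        totals[assign_bucket(path, salt)] += sessions
    return totals


if __name__ == "__main__":
    # Made-up pre-test traffic for 200 property pages.
    pages = {f"/property/{i}": (i % 7 + 1) * 10 for i in range(200)}
    print(bucket_totals(pages))
```

The split is stable between requests because it depends only on the path and the salt, but nothing forces the two totals to match. Comparing raw bucket traffic would bake that starting gap into the result.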

Different pages can have different seasonality

For example, “roses” pages on Valentine’s Day. You need to cut outliers.
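One possible way to cut outliers, sketched under my own assumptions rather than taken from the talk: flag days whose traffic sits far from the page’s median, measured in units of the median absolute deviation (MAD). The 3.5 threshold and 0.6745 scaling are the conventional choices for this "modified z-score"; the function name and sample data are hypothetical.

```python
from statistics import median


def drop_outlier_days(daily_sessions, threshold=3.5):
    """Return (day_index, sessions) pairs that are not extreme outliers."""
    med = median(daily_sessions)
    mad = median(abs(x - med) for x in daily_sessions)
    if mad == 0:
        # No spread to measure against, so keep everything.
        return list(enumerate(daily_sessions))
    kept = []
    for day, sessions in enumerate(daily_sessions):
        score = 0.6745 * (sessions - med) / mad  # modified z-score
        if abs(score) <= threshold:
            kept.append((day, sessions))
    return kept


if __name__ == "__main__":
    # A hypothetical "roses" page with a Valentine's Day spike on day 4.
    roses = [40, 42, 38, 41, 300, 39, 43]
    print(drop_outlier_days(roses))
```

Median-based measures are used here because a single huge spike would inflate a mean and standard deviation enough to hide itself.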

One site I looked at had 72 <html> tags on a single page

You’ll find some of your work more sensitive to amusingly broken


We’re not quite sure how to model cross-section impacts

This will be needed for testing internal linking structures, for example.

You may detect unexplained phenomena

In medicine, this would be things like the placebo effect with no known


We may find that things that “shouldn’t” work, in fact do drive uplifts.

We can speculate that the continuing benefit of changing 302s to 301s (despite Google’s insistence that 302s don’t lose PageRank) is to do with them losing other link signals, but we don’t really know.

I’m not sure this matters.

It’s changing the way we make recommendations

The big one: Business cases

I wrote more about this in my better business documents post

But I’m also seeing more subtle impacts on my recommendations:

● You can recommend small tweaks and see the benefits compound

● You can test wild hypotheses with unknown upsides

● You can try things that might have a downside (more focused targeting, less copy, etc.)

And that’s even before you get the benefits of testing clickthrough rate, and the benefits of pretty charts to show the boss highlighting the impact of your work!

More: blog post

Our work is so much easier than theirs

But still, let’s move past “cut the leg off as fast as you can”


PS - We’re hiring:

Image credits

● Surgeons - Phalinn Ooi
● Operating chair - Peter Pelisek
● Potions - Sam Simpson
● Old operating theatre - Uglix
● Searchmetrics rankings drop - img_eisy
● Air drop - Wikipedia
● Test tubes - ironpoison
● Syringes - ad-vantage
● Pills - ashleyrosex
● Stethoscope - proimos
● Old wives’ tale - Jon Bunting, pgillard and John
● Richard Feynman - Juana la loca, dullhunk and
● Spurious correlation - Tyler Vigen
● Lemon, lime, polar bear - abhijittembhekar, libraryman, ucumari
● Blackboard - arenamontanus
● Facepalm - brandongrasley
● Buckets - mamarazzi
● Rose - alicelingching
● Girders - JFB119
● Ghost - daveallday
● Bezos - jurvetson
