Escape the Groundhog Day of Testing (part 4)

Craig Sullivan
4 min read · Jul 26, 2015


The Hypothesis Toolkit

Because we observed data [A] and feedback [B]

This is all about causes — something happening, someone finding something:

We saw something.

We measured something in the analytics.

We saw a huge dropoff at step 3 in the checkout.

In the usability test, people couldn’t use the picture controls.

People sign up but they don’t click on the activation email.

In essence, there are one or more sources of evidence, and the more they triangulate on and confirm the problem, the better. Your hypothesis gets far more solid as the quality of the inputs and evidence you feed into the process improves.

Having this part of the hypothesis statement (“Because we observed”) means you are actually offering (gasp) some evidence, reason or hint that there would be value in trying this thing out!

In other words, it’s not a total guess but an informed, data-driven one, not sourced from the inherent randomness of the universe.

And don’t get me wrong here — I’m not telling you that every test needs a hypothesis.

I WILL test stuff like buttons without a hypothesis, but that’s because I know from prior testing experience that the word ‘Continue’ or ‘Submit’ is so sub-optimal it can be improved. It’s an easy and lazy test to run, but I don’t really learn much from it.

I’m not saying all your tests have to have a hypothesis, just that the change has to be a real ‘no-brainer’ to override my inclination to insist on one.

If you frequently get ‘reason-free’ product changes or tests being created, add this part of the statement to your hypothesis kit. Ask people to fill in the blanks in the sentence.

For [A] you need to supply some quantitative data: a measurement of how big the problem is and where it sits, in other words the sizing.

For [B] you need some additional feedback, preferably qualitative data. For example, [A] might be the dropoff rate in your checkout steps and [B] might be the usability tests or session replay recordings that highlight what may be driving the quantitative data you’re seeing.
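To make this concrete, here is a minimal sketch in Python (purely illustrative; the metric names, numbers and sources are invented placeholders, not taken from any real test) of what the evidence behind the “Because we observed” part might look like once you write it down:

    # Illustrative only: writing down the evidence behind a hypothesis.
    # The numbers and sources are invented placeholders.
    evidence = {
        "quantitative_A": {
            "metric": "checkout step 3 exit rate",
            "value": 0.42,  # e.g. 42% of sessions abandon at step 3
            "source": "web analytics, last 30 days",
        },
        "qualitative_B": {
            "finding": "people can't find the delivery cost until step 3",
            "source": "5 of 8 usability test participants; session replays",
        },
    }

    # The more [A] and [B] triangulate on the same problem, the stronger the hypothesis.
    print(f"Because we observed a {evidence['quantitative_A']['metric']} of "
          f"{evidence['quantitative_A']['value']:.0%} [A] and feedback that "
          f"{evidence['qualitative_B']['finding']} [B]...")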

“In choosing a hypothesis there is no virtue in being timid. I clearly would have been burned at the stake in another age.”

Thomas Gold

We believe that making change [C] for people [D] will cause outcome [E]

This is it — the main bit! And this is also where it goes wrong, because people arrive at this part before reviewing all the evidence properly.

If you’re optimising a page, process, journey, website, product or service — you need the right inputs to form this. I’ll cover this in more detail in the hypothesis sketching article.

For [C] you need to supply the change you are making. This could be something like “We believe that [upping the font size on our product descriptions]” or “We believe that [offering a free trial of our product]”, and so on.

For [D] you need to specify the population: the target group that will be included in the test. If the test is sitewide, this will be “All visitors” or “People going through checkout”, but it could also be a targeted group or a personalisation triggered by a rule.

For [E] you need to specify an outcome that can be measured. You need to set the thing you will measure, and how you will measure it, BEFORE you do the test. Never ever look AFTER a test and try to ‘find stuff’ that vaguely makes your test look like a winner. Metric first, test second.
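If it helps, here is a small Python sketch of the “metric first, test second” discipline (an illustration, not a prescription; the field names and example values are my own): nothing launches until [C], [D], a measurable [E] and the metric behind it are all filled in.

    # Illustrative sketch: force [C], [D] and a measurable [E] to exist before launch.
    from dataclasses import dataclass

    @dataclass
    class TestPlan:
        change_C: str        # what we are changing
        population_D: str    # who will see it
        outcome_E: str       # the measurable outcome we expect
        primary_metric: str  # how we will measure it, agreed BEFORE the test

    def launch(plan: TestPlan) -> None:
        missing = [name for name, value in vars(plan).items() if not value.strip()]
        if missing:
            raise ValueError(f"Not launching: fill in {missing} first")
        print(f"We believe that {plan.change_C} for {plan.population_D} "
              f"will cause {plan.outcome_E}, measured by {plan.primary_metric}.")

    launch(TestPlan(
        change_C="showing delivery costs on the product page",
        population_D="all visitors reaching a product page",
        outcome_E="a higher checkout completion rate",
        primary_metric="checkout completion rate (orders / checkout starts)",
    ))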

“If the facts are contrary to any predictions, then the hypothesis is wrong no matter how appealing. It is absolutely essential that one should be neutral and not fall in love with the hypothesis.”

David Douglass

We’ll know this when we observe data [F] and obtain feedback [G]

And this is the missing bit for a lot of companies. You need to be able to measure a positive impact of the ‘thing you changed or tested’ and you must also be able to measure a negative impact.

If you can’t disprove your hypothesis and only ‘consume’ data that supports it, you’re biased. You need to be able to prove the reverse of your assumption or hoped-for test result, so you don’t draw a false conclusion.

[F] is your defined metric, agreed before the test or change, instrumented in your analytics (if needed to support the test), and double- and triple-checked for how it is collected and calculated.

[G] is really important. If you’re doing optimisation and data-driven work properly, you have a qualitative feedback loop in place. You’re effectively exposing yourself and the team to as MUCH feedback as possible on the change, from surveys, polls, post-purchase questions, usability tests, session replay or other sources. There is no point in measuring improved revenue if the change is pissing people off, so make sure you measure qual as well as quant metrics in your testing approach.
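As a final sketch (again illustrative: the numbers, the 5% significance threshold and the survey check are assumptions made up for the example, not a recommended analysis), here is what judging a test on both the pre-agreed metric [F] and the qualitative feedback [G] might look like, using a one-sided two-proportion z-test from the Python standard library:

    # Illustrative sketch: judge a test on the pre-agreed metric [F] AND the feedback [G].
    from statistics import NormalDist

    def two_proportion_p_value(conv_a: int, n_a: int, conv_b: int, n_b: int) -> float:
        """One-sided p-value for variant B beating control A on conversion rate."""
        p_pool = (conv_a + conv_b) / (n_a + n_b)
        se = (p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b)) ** 0.5
        z = (conv_b / n_b - conv_a / n_a) / se
        return 1 - NormalDist().cdf(z)

    # Invented numbers: control vs variant checkout completions.
    p = two_proportion_p_value(conv_a=480, n_a=10_000, conv_b=540, n_b=10_000)
    metric_F_improved = p < 0.05

    # [G]: does the qualitative feedback hold up? Placeholder for survey / usability findings.
    feedback_G_ok = True

    if metric_F_improved and feedback_G_ok:
        print("Hypothesis supported: the metric moved AND people aren't worse off.")
    else:
        print("Hypothesis not supported, or the change is annoying people.")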

In the next article, I explain how to use this to deflect stupid work requests and present a tweaked version of the Hypothesis Kit.

Read part 5 — Deflecting Stupid Change Requests
