jim.shamlin.com

11 - Test Your Hypotheses

This chapter will focus on the test itself - how to approach conducting a test from start to finish, avoiding common mistakes that can corrupt or invalidate test results.

Set Test Goals

The first decision in any test plan is determining what you want to learn - it is not sufficient to make a change and see what happens, as there is a universe of possible outcomes that can be observed, most of which are not at all meaningful.

Essentially, your goal is to influence customer behavior and increase the number of people who do something very specific. There is generally an ultimate goal and steps along the way. The author presents rather a long list, so to cite only a few, you might wish to motivate the customer to request a quote, complete a contract form, schedule a demo, download a brochure, sign up for a newsletter, and other goals prior to making a purchase. Also, a single purchase is generally not the end of the road, so you may wish to test to determine how often a person returns to you, whether the average order value increases, whether the cost to serve them decreases, whether they advocate to others, etc.

Goals are not unique to the commercial industry: a nonprofit organization has some of the same goals (visit our site, request a brochure, etc.), and generally seeks to motivate the user to support the organization in some way, not only by donating money but by participating in some way.

There is a warning about being too myopic. If you set a goal of "clicking an ad to visit the site," this may be too shallow. It is fairly easy to drive people to a site, but this does not guarantee that these new visitors will do anything meaningful when they arrive. That is, when visitors are ushered to your site by an advertisement, the proportion of them who purchase will be different from that of your existing site visitors - or if the advertisement succeeds in promoting a sale, it may be a one-time sale from a visitor who will not return afterward.

With this in mind, the author asserts that "measuring CTR has done more harm than good" in that it has motivated firms to pursue visitors rather than sales.

Goal Triggers

In order to be measurable, a goal must be expressed in terms of user interactions. For example, we can measure how many users loaded a page, but not how many actually read the content of the page. In practice, measurable events are captured as clicks. (EN: if your tracking is granular enough, you can also witness mouse movements, scrolling events, window resizing, keystrokes, and other non-click activity.)

The author suggests that an e-commerce purchase is triggered by the loading of the post-purchase page that thanks the user for the order and tells them what to expect in terms of delivery. (EN: which is generally reliable - but again, more granular metrics would consider the click of the button and the loading of the page to be two separate events to identify malfunctions or impatient users who do not wait for the following screen to fully load.)

The author does eventually mention tracking click events rather than load events: when a visitor clicks a link on your site that leads to a different website, you cannot track page-load events there directly, but you can capture the departure by routing the link through a redirect or an interstitial page.
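The redirect approach can be sketched in a few lines. This is a minimal illustration, not the author's implementation - the `/out` endpoint, function names, and in-memory log are all hypothetical stand-ins for a real server route and analytics store:

```python
from urllib.parse import urlencode, urlparse, parse_qs

click_log = []  # hypothetical stand-in for a real analytics event store

def wrap_outbound(dest_url, campaign=None):
    """Rewrite an outbound link to pass through a local tracking redirect."""
    params = {"dest": dest_url}
    if campaign:
        params["campaign"] = campaign
    return "/out?" + urlencode(params)

def handle_redirect(request_url, user_id):
    """Log the click, then return the destination the server would 302 to."""
    qs = parse_qs(urlparse(request_url).query)
    dest = qs["dest"][0]
    click_log.append({"user": user_id, "dest": dest,
                      "campaign": qs.get("campaign", [None])[0]})
    return dest

link = wrap_outbound("https://partner.example.com/offer", campaign="spring")
dest = handle_redirect(link, user_id="u42")
```

The same pattern underlies interstitial pages; the only difference is that the intermediate page is shown to the user rather than issuing an immediate redirect.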

(EN: I find myself continually returning to the notion of tracking - the author's approach is fairly primitive by modern standards - in spite of the fact the book was published this year. He seems focused on third-party tools like Google Analytics, which collect very superficial data. The ability to do robust analytics requires reporting mechanisms to be coded into the site to capture data that the standard web log does not record.)

One or Many Goals

The author returns to the notion that for any transaction there is one goal, which generally occurs at or near the very end (the customer makes a purchase) and that intermediate steps are meaningful, but do not substitute for the one true goal of a site.

There is also the problem that occurs when multiple "goals" are assessed - to measure when a user schedules a sales call, downloads a brochure, subscribes to a mailing list, or makes an on-site purchase. This level of strategic ambivalence is characteristic of a firm that has no clear outcome and is happy for any form of engagement - and in casting a broad net neglects its most important objective.

This is slightly different from fragmentation. Tracking a click from an ad to the site, from the landing page to a product, and from product to checkout represents steps to reach a single destination, rather than divergent paths to multiple destinations. This pattern arises in organizations where different people or departments have responsibility for different steps - the danger being the potential for lack of coordination or conflicting interests (the marketing department feels it has done a great job by getting a huge number of non-buyers to visit the site and bounce out).

It is also noted that the conversion path may be frayed on the front end: visitors come to your site as a result of an email campaign, but they arrive alongside those referred from a search engine, a blog, a banner ad, a billboard, a direct mail piece, etc. Segmenting the audience by source will prevent ambiguity and misdirection.
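Source segmentation is usually done by inspecting campaign tags and the referrer. A minimal sketch, with the bucket names and classification rules as assumptions rather than a standard:

```python
from urllib.parse import urlparse, parse_qs

def classify_source(landing_url, referrer=""):
    """Bucket a visit by acquisition source so conversion rates can be
    compared per segment rather than blended together."""
    qs = parse_qs(urlparse(landing_url).query)
    if "utm_source" in qs:          # tagged traffic: email, banner ads, etc.
        return qs["utm_source"][0]
    host = urlparse(referrer).netloc
    if "google." in host or "bing." in host:
        return "search"
    if host:
        return "referral"           # blogs and other linking sites
    return "direct"                 # typed-in URL, billboard, direct mail

classify_source("https://example.com/?utm_source=newsletter")              # "newsletter"
classify_source("https://example.com/", "https://www.google.com/search")   # "search"
```

With each visit labeled this way, a test report can show conversion per segment instead of a single blended number that hides the differences.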

Phone Call Conversions

Conversion metrics are not unique to the digital channels. Firms can (and should) track the performance of other channels, such as phone calls. It is entirely possible to associate the phone number of an inbound or outbound call to interactions of the operator: how long the conversation lasted, what screens the operator viewed on internal systems, whether there was a sale, etc.

There is also the issue of multichannel conversion - knowing that the person who is calling had previously visited the Web site, and what pages were viewed. If users enter a phone number as part of the online process, or mention their account number in the conversation, the two can be correlated - but often there is no way to make the connection for prospects who do not provide identification or complete a purchase.
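The correlation described above amounts to a simple join on whatever identifier spans both channels. A toy sketch (the session structure and field names are invented for illustration), which also shows why the link fails when no identifier was captured:

```python
def match_call_to_session(caller_number, web_sessions):
    """Try to link an inbound call to a prior web session via a phone
    number the visitor entered online; returns None when no link exists."""
    for session in web_sessions:
        if session.get("phone") == caller_number:
            return session
    return None

sessions = [{"id": "s1", "phone": "555-0100", "pages": ["/pricing", "/contact"]}]
match_call_to_session("555-0100", sessions)  # finds the matching session
match_call_to_session("555-0199", sessions)  # None - anonymous prospect, no link
```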

The author is unaware of a shrink-wrapped solution for doing this - call center management vendors and ecommerce vendors are generally separate breeds, and while some claim multichannel support, it's superficial and wholly insufficient.

Keep It Simple

For organizations that are new to conversion channel metrics, the author's advice is to start simple and small rather than attempting to leap into the tall grass and take on sophisticated approaches.

He also suggests that the scientific approach can be overdone, to the point where statistical reports "would stand up to a doctoral thesis committee," with outcomes determined to the fifth decimal point and six-sigma accuracy. That is far more than is needed to be productive. Moreover, because all tests have inaccuracies, reporting an outcome to 0.1% precision is pointless when the margin of error is "plus or minus two percent" - the precision is overwhelmed by the variance.

And so, start simple with large goals and don't get fixated on the mathematical certainty.

(EN: It's worth noting that the level of certainty for medical experimentation is 99.9% and most of the hard sciences seek a 95% level of confidence. Psychology and soft science have often settled for less - as I recall a one-sigma spread, which represents a 68% level of confidence, is often considered sufficient to support a claim of causality.)
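The confidence levels discussed above come from an ordinary significance test on the two conversion rates. As a concrete sketch, a two-proportion z-test (the standard textbook approach; the figures below are invented for illustration) returns the probability that an observed difference arose by chance:

```python
from math import sqrt, erf

def conversion_significance(conv_a, n_a, conv_b, n_b):
    """Two-proportion z-test: p-value for the chance that the observed
    difference in conversion rate is mere noise (smaller = more confident)."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # two-tailed p-value from the normal CDF, Phi(z) = 0.5*(1 + erf(z/sqrt(2)))
    return 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))

# hypothetical: 200 conversions of 10,000 on control vs. 260 of 10,000 on variant
p = conversion_significance(200, 10000, 260, 10000)
significant = p < 0.05  # the 95% threshold discussed above
```

The point of the chapter stands: chasing a 99.9% threshold just multiplies the required sample size, while 95% (or even less, for low-stakes decisions) is usually enough to act on.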

Choose the Test Area

The scope of the test includes not only the pages that will be changed to influence user behavior, but the full path from beginning to end: you will wish to know whether a change on the home page results in an increased number of orders, and the behavior in between may be affected as well.

A change may be made to a single page, a site-wide template (every product page in a catalog), a site-wide design element (the navigation bar on every page), or within a section or microsite.

A change to a template or common element aggregates data from all flows that involve any page containing the template, but the author asserts it is still useful to do a broad test for a number of reasons.

For individual pages, static content that is the same for every user is easier to test than dynamic content that varies per user. These differences must be accounted for in your test results, and will cause the test to require more time if only a fraction of the audience sees a given bit of content. This is particularly important to consider because dynamic content may be the cause of different outcomes, undermining the validity of the test factors.

The author also refers to a "stylesheet test" that can be used to effect changes site-wide, such as changing the font of every page or the color of links. He mentions CSS for the layman, suggesting that a single file can be changed to alter the appearance of elements across the site. (EN: which was the idea behind CSS, but in many instances stylesheets have not been well executed, or the site's coding techniques are inconsistent, such that changing the appearance of a button requires coding to place containers in each page and ensure that they are all coded in the same manner.)

Choose the Test Type

(EN: The author gives an outstandingly bad description of test types and an even worse set of suggestions on choosing which type to conduct. Essentially, you must choose the type that tests what you want to learn. Specifically, if you're changing only one thing, it's an A:B test. If you're changing multiple elements and want to know about their effect in various combinations, it's a multivariate test. If you want to test a new design against an old one, where multiple things are changed and you don't care about testing different combinations, it's champion-challenger. If you arbitrarily choose an inappropriate test type, you won't get the results you would like, but that's a sacrifice that must be made if traffic volume is too low to support granular investigation in a reasonable amount of time.)
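Whichever type is chosen, the mechanics of running it reduce to splitting traffic among variants. A common sketch (not from the book) is deterministic hash-based bucketing, which ensures a returning visitor always sees the same variant; the experiment names and variant lists are illustrative:

```python
import hashlib

def assign_variant(user_id, experiment, variants):
    """Deterministic bucketing: hashing user + experiment means the same
    user always lands in the same bucket, so repeat visits don't
    contaminate the test."""
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    return variants[int(digest, 16) % len(variants)]

# A:B test - one element changed, two buckets
assign_variant("u42", "headline-test", ["control", "new-headline"])

# Multivariate - every combination of two elements becomes its own bucket
combos = [(h, b) for h in ("headline-A", "headline-B")
                 for b in ("btn-red", "btn-green")]
assign_variant("u42", "mvt-1", combos)
```

Note how the multivariate case multiplies the bucket count (here 2 × 2 = 4), which is exactly why it needs more traffic than a simple A:B or champion-challenger split.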

Isolate for Insights

The author suggests three different purposes for optimization testing:

  1. To increase revenue, either by selling to more prospects or getting buyers to purchase more of your product. This is highly attractive because a 20% increase in conversion means 20% more revenue (EN: In the short run. Beware the long-term consequences.)
  2. To increase competence, in that the results of a test inform employees of what works so that they can make better decisions in future, and the knowledge gained in one area can be leveraged by others.
  3. To gain market insight, enabling you to learn about your customers and prospects to further refine your testing program and to leverage information for similar interactions.

(EN: This is likely a bit stilted, as competence and insight are both significant only if the knowledge gained is used to increase revenue. It might be a better approach to tie tests to specific ways of increasing revenue: more purchases, more revenue per purchase, more purchases over time, greater longevity of the customer relationship.)

Random Tips

The author provides tips for getting valuable results from testing:

  1. Test Boldly. The use of a test should give you the option to experiment with changes that people are too squeamish to suggest under normal circumstances.
  2. Be Patient. Testing takes time, yet eyes are on the results report an hour after launch, attempting to draw conclusions (EN: and generally, conclusions that support their own opinions). Get commitment to run the full course of a test before beginning it.
  3. Be Selective. Just about anything can be tested, but not everything should. Make smart decisions to whittle a test down and test a manageable number of changes that seem logical and likely, rather than testing every idea indiscriminately.
  4. Avoid Committees. Getting together a group of people to decide on a test results in very bloated test plans and a long process to decide what to test, and committees generally gravitate toward the status quo.
  5. Be Practical. Testing to the 99.9% level of confidence is not really necessary for most purposes. A 95% level should be a common target, but even an 86% level of confidence may be meaningful.
  6. Hold Still. You also need commitment to keep the control page as it is for the duration of the test. Changing the control will change the results. There may be some exceptions if the element to be changed is assumed to be irrelevant, or if the change to control can also be replicated in test versions of the page - but even then it's a bad idea.
  7. Focus on Outcomes. The first step in testing is determining what you want to learn. Devoting yourself to a given strategy or a specific tool may leave you in a position to be unable to gather the information you need. So consider outcomes first, methods second, and tools third.

While the author works for a consulting firm that conducts optimization programs, he warns about putting yourself into the hands of an agency and letting them decide what testing to do, as that is also a good way to get useless results - or worse, results that point you in the wrong direction.

But then he sells the benefits of contracting out: to get more experienced hands on the work, to get as-needed help rather than paying for dedicated staff who sit idle, to leverage their efficiency, to get inside details about other clients they have served, to have an objective and unbiased voice, etc.