
11: Moving from Analysis to Site Optimization

Designing the analytics tools and gathering the information is necessary, and it's important to do it well, but ultimately, the value of analytics is to use the information to identify opportunities for improvement that will impact the firm's bottom line.

Testing Methodologies and Tools

The most fundamental optimization test is the A/B test, in which two versions of a page are placed online to compare their performance against one another. Typically, the existing page, with no changes, is used as a control, and an altered page is used as a test. Ideally, only one element has been changed (the verbiage of a headline, for example) in an attempt to isolate the effect of that change.

An A/B/C, A/B/C/D, etc. test, in which more than two versions are compared (three different options for the wording of a headline, for example), can be more expedient than running a series of separate A/B tests. Note that on a smaller site, or on a page that doesn't draw much traffic, it may take a significant amount of time to gather enough data for the results to be statistically significant.
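(EN: As a rough illustration of why low-traffic pages take a long time to test, here is a minimal sample-size sketch in Python. The baseline and target conversion rates, the confidence and power levels, and the daily traffic figure are all hypothetical, and the normal-approximation formula is only one common convention.)

  # Rough sample-size estimate for an A/B test on conversion rate.
  # Hypothetical inputs: baseline conversion of 2%, hoping to detect a lift to 2.5%,
  # at 95% confidence and 80% power, using the normal approximation.
  from math import sqrt, ceil

  p1, p2 = 0.020, 0.025          # baseline and hoped-for conversion rates (assumptions)
  z_alpha, z_beta = 1.96, 0.84   # two-sided 95% confidence, 80% power
  p_bar = (p1 + p2) / 2

  n = ((z_alpha * sqrt(2 * p_bar * (1 - p_bar)) +
        z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2) / (p1 - p2) ** 2
  n = ceil(n)                    # visitors needed in EACH version

  print(f"About {n:,} visitors per version are needed.")
  print(f"At 200 visitors a day to this page, that is roughly {ceil(2 * n / 200)} days.")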

A "multivariate" test is used when multiple changes are made (you change the wording of a headline, the typeface, and the color - or multiple elements such as the headline, a photo, and the placement of a button), which is more practical when it comes to site designs that involve multiple changes. The drawback is covariance: you will not be able to isolate the precise effect of any single element, whether they amplify or counteract one another, so you may accept the results but not have a clear indication of which exact change was the most influential.

(EN: In my experience, the lab types are very interested in isolating each element and studying it in minute detail to determine with painstaking precision what exact changes have what exact effects, and are driven by the desire to have results that can be re-used. Meanwhile, the business types merely want to know if something is better or worse, for their immediate benefit, and don't much care for the granular details. Mediating such arguments is a thankless endeavor, and the decision is usually made by who's footing the bill.)

The author lists more than half a dozen brand names of testing software. In the end, it doesn't matter much: your testing plan is far more important. However, if you're looking to buy a solution, consider multiple vendors. Some of these tools are expensive and require a considerable commitment, and may not do what you need them to do.

(EN: It's also worth noting that most tools tend to be myopic. They will report the immediate results - how many people clicked a link, and how many made a purchase afterward - but miss the long-term, such as how many of the people become regular customers, as well as the broader vision, such as how many people were already customers.)

Test Design

There is no limit to the things that can be tested: you can see if any page element leads to any behavioral reaction on the part of site visitors. However, "pure" science is not an effective use of your firm's resources, so your tests will be driven by the goals of the company. As in any applied research, it must begin with a reasonable question (hypothesis) that seeks to yield a result that is meaningful and actionable.

In practice, testing generally falls into one of six major areas of interest:

  1. Pricing - Determining the effect of a change in price on the quantity demanded (simple economic elasticity)
  2. Promotion - Determining whether a given promotional message elicited the desired response (to click from ad to site, to purchase a product, etc.)
  3. Design - Testing a change to the design of a page to determine if it had a positive impact (generally in terms of completing an action on the page, or clicking to the next page in a flow)
  4. Content - Testing a change to the wording on a page to determine if it had a positive impact. It's noted that the notion that "nobody reads on the Web" has been disproven by tests of this nature, showing that page content can make a considerable difference.
  5. Site Navigation - Testing to determine if changing the information architecture of the site (the placement and prominence of links, how pages are linked together) has a positive impact.
  6. New Functionality - Any time a change is made to add functionality, it should be tested before being rolled out to the full audience to ensure that it does not have a negative impact.

The author returns to the notion of prioritizing. Once you "sell" the benefits of testing to an organization, it will want to test everything, but will not provide sufficient budget or resources to do so. The author goes on a while, but it largely comes down to two factors: impact and accuracy. "Impact" refers to a realistic estimation of the potential of the test - that is, the potential profit of identifying and fixing a problem, compared to the cost of the testing. In effect, it is an ROI calculation for testing. "Accuracy" is an assessment of whether the test in question will be able to yield accurate results - there may be a serious problem with great profit potential, but the methods available to you for conducting a test may not be adequate to produce reliable and actionable results.
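(EN: A toy scoring sketch, just to make the impact/accuracy trade-off concrete. The candidate tests, every figure, and the weighting scheme are hypothetical illustrations, not the author's method.)

  # Toy prioritization of candidate tests by "impact" and "accuracy".
  candidates = [
      # (name, estimated annual profit if the fix works, cost to run the test,
      #  confidence that the test will yield a reliable answer)
      ("Checkout button wording", 120_000,  8_000, 0.9),
      ("Home page redesign",      400_000, 60_000, 0.5),
      ("Shipping price display",   90_000,  5_000, 0.8),
  ]

  def score(potential_profit, test_cost, accuracy):
      # Expected payoff of running the test, discounted by how reliable the answer will be.
      return accuracy * potential_profit - test_cost

  for name, profit, cost, accuracy in sorted(candidates, key=lambda c: -score(*c[1:])):
      print(f"{name:28s} score = {score(profit, cost, accuracy):>10,.0f}")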

There is also a process for designing a test plan, once you have decided what to test:

  1. Identify opportunities. This is done when there is no one knocking at your door to test something specific. Looking at site traffic patterns should help identify areas where testing could yield benefits.
  2. Develop a hypothesis. Once you know the problem, consider a possible solution and develop a reasonable supposition about what can be corrected or implemented to lead to a desirable improvement.
  3. Determine methodology. Decide what kind of test can be done to answer the question posed by the hypothesis. (EN: This would also be a good place to do some secondary research and check previous tests to determine if the question has already been answered.)
  4. Define success metrics. Consider what metrics constitute success. This may seem obvious, but it merits a pause to consider if you're looking at the real goal. (Testing click-through rate on an ad is less important than purchases made by those who saw the ad.)
  5. Design test versions. Whether it's an A/B test or multivariate, you must design and develop the Web site interfaces that will be used.
  6. Conduct the test. Seek an opportune time and launch the test. Resist the urge to react to the data you see come in until the test period is complete.
  7. Analyze the results. Take the test page(s) out of the production environment and analyze the results of the test to determine if there is a statistically significant improvement. (A minimal sketch of such a check follows this list.)
  8. Implement a solution. Unless the test was inconclusive, or the test versions performed worse than the existing interface, roll the winning version out to the production site.
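(EN: For step 7, a minimal sketch of the sort of significance check involved - a two-proportion z-test on conversion counts, in Python. The visitor and conversion counts are hypothetical, and commercial testing tools perform this calculation for you.)

  # Minimal significance check: did the test version convert better than the control?
  from math import sqrt
  from statistics import NormalDist

  visitors_a, conversions_a = 14_000, 286   # control (hypothetical counts)
  visitors_b, conversions_b = 14_000, 341   # test version

  p_a = conversions_a / visitors_a
  p_b = conversions_b / visitors_b
  p_pool = (conversions_a + conversions_b) / (visitors_a + visitors_b)
  se = sqrt(p_pool * (1 - p_pool) * (1 / visitors_a + 1 / visitors_b))
  z = (p_b - p_a) / se
  p_value = 2 * (1 - NormalDist().cdf(abs(z)))  # two-sided

  print(f"Control {p_a:.2%} vs. test {p_b:.2%}, z = {z:.2f}, p = {p_value:.3f}")
  print("Statistically significant at 95%" if p_value < 0.05 else "Not significant - do not act on it")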

The author notes that test results often are not repeated when the changes are put into the production environment - though the "general direction" is usually the same, the degree of impact may be different. The author doesn't provide much in the way of an explanation for this discrepancy, but it generally has to do with flaws in test methodology (even the best-designed test is imperfect), or it may have to do with changes in other factors that have nothing to do with the test itself.

It is also suggested that the change, itself, may skew the results. Customers who are familiar with the way your site used to be are going to be a bit unsettled because it will be new and unfamiliar to them. So it's not unusual, even in the test period, for the altered versions of a page to get slightly less successful results at first, and become more successful over time as customers get used to them. (EN: This is only applicable when you are testing something the customer has used in the past. It also depends on familiarity, which is often exaggerated from the internal perspective: the Web team and the product manager have seen the same page many times and know how it functions. A customer who uses it once a month, or once a week, or even once a day, may not actually remember it, and a slight change would not be noticed unless you pointed it out to them.)

Optimizing Segment Performance

One danger in site optimization is the assumption that site visitors are homogeneous, or that the differences among individuals are completely irrelevant. The acceptance of this fallacy drives site operators to a "one size fits all" approach that often results in a "one size fits none" compromise.

While it is generally not feasible to identify each individual who visits a site, any more than a company can realistically expect to know every customer and prospect as an individual, it should at least be able to segment audiences into groups with similar interests.

One example segments site users by behaviors, contrasting individuals who have taken a specific action in the past (purchased a product, clicked a given link on the home page, etc.) from those who have not taken that action to determine the difference in behavior on a later page.

Another example segments users by time. It's entirely possible (and probable) that your site receives different "kinds" of visitors at different times of day (during business hours versus later in the evening), different days of the week (weekend versus weekday), etc.

The results may indicate that the behavior of one segment is different from another, and even that a change that has a positive result for one may have a negative result for another. In either case, a solution may be implemented to target a specific segment (one version for the weekend and another for weekdays, a different version for people who have ordered in the past versus new customers, etc.).
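(EN: A sketch of what such a breakout looks like in practice - the same test results tallied separately for a time-of-day segment, in Python. The visit records and the segment rule are hypothetical placeholders for data pulled from an analytics tool.)

  # Break test results out by segment before judging the winner.
  from collections import defaultdict

  visits = [
      # (hour_of_day, version_shown, converted)
      (10, "A", False), (11, "B", True),  (14, "A", True),
      (20, "B", False), (21, "A", False), (22, "B", True),
      # ... in practice, thousands of rows pulled from the analytics tool
  ]

  def segment(hour):
      return "business hours" if 9 <= hour < 17 else "evening"

  tally = defaultdict(lambda: [0, 0])   # (segment, version) -> [visits, conversions]
  for hour, version, converted in visits:
      key = (segment(hour), version)
      tally[key][0] += 1
      tally[key][1] += int(converted)

  for (seg, version), (n, conv) in sorted(tally.items()):
      print(f"{seg:15s} version {version}: {conv}/{n} converted ({conv / n:.0%})")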

Planning for Optimization

The next challenge you face, which is a significant one, is getting the resources to actually perform optimization. Most organizations budget for specific projects, which have a tidy conclusion, and funds disappear afterward. Getting an improvement made means launching a new project, with the lengthy planning and budgeting processes required.

One approach is to build optimization into maintenance costs. Typically, there is a set budget to cover ongoing costs for the Web site (hosting, advertising, SEO and marketing, etc.) to which a line item can be added. The author suggests setting it at 5% of the total budget. Questions may be raised, but it should be fairly simple to convince others of the potential value and to get them to agree that it should be done as an ongoing effort. The negotiation is even easier if you can show the results of past efforts to suggest that, on average, every dollar spent on these small improvements yields $X in additional profit.

In terms of human resources, there are no time-tested job descriptions - but the skills needed are fairly straightforward: your optimization team must have individuals with strong analytical skills (specifically, in statistical analysis); the ability to identify problems and opportunities; the ability to design and conduct experiments; the ability to understand and present financial impacts of the decisions; and the ability to communicate and negotiate within a business organization. Some of these skills are rare, and an individual who has all of them is even more rare, but the skills themselves can be adapted and learned from existing roles. As the practice matures, the talent pool will grow. Until it does, you will need to plan to do a great deal of training.

Overcoming Doubts

Because analytics is a relatively new field, those whose assistance and support you will need will have reservations about it. The author points to some common objections:

The IT department's default answer to anything is "no." They are buried in other work and see testing and analytics as just another headache they don't need, and will use their usual resistance tactics: demanding a lot of budget, insisting it has no value, claiming it jeopardizes system performance or security. All of these are fairly easy to overcome if you gain some basic knowledge and can speak to these specific concerns.

Executive resistance is a more severe problem, as analytics is something they haven't done before and may not see a reason to start. If you can provide a good case that illustrates the return on investment, "you'll be amazed how fast barriers come down."

The author then provides an anecdote about dealing with an IT department that threw up a lot of blockades, which they overcame by getting executive backing: demonstrating the potential cost-benefit and the difference in cost between using the internal IT resources and hiring an outside firm to do the testing.

(EN: This seems a bit too straightforward - a facile solution to overcome straw-man objections. The real motivation for shooting things down is often hidden: you are competing for budget dollars with other issues in which people are more vested, it may be seen as a power-play to gain influence, etc. To blithely assume that the reasons for objection are limited to the obvious and that it's a simple matter of presenting a financial analysis risks getting blind-sided by factors you fail to consider.)

Learning from Your Successes and Mistakes

In most instances, a business defines metrics with the intention of proving that it succeeded (and if it doesn't get this proof, it will want to question the methods or cook the numbers). But the goal of testing is merely to observe what happens, with no other objective than to provide an accurate measurement. The results may not be positive, or they may not be meaningful, but the information provided by testing is a driver of future action, which is ultimately more important than assessing the "success" of past action.

It's also worth noting that mistakes can be made in the conduct of a test: one may identify bad metrics, or fail to collect the right data, or fail to capture it for technical reasons. Or the test may be executed perfectly, but the results may not be particularly useful, or the difference may not be statistically significant.

With this in mind, setting expectations is important. The notion of a "test" implies clinical perfection and mathematical certainty. A new testing program must seek to obtain the latitude to make a few mistakes during the learning period, and even an established program must avoid setting the expectation that every one of its tests will be perfect and yield highly relevant and actionable information.

The author suggests that hiring experienced testing consultants to help with a new program can be a quick way up the learning curve, as their experience will help bypass the trial-and-error process of setting up a new program. (EN: Remember that the author works for such a firm, so this advice, which seems entirely reasonable, may also be driven by commercial motives.)

An important caution: do not oversell. A common mistake for new programs is to set out to prove their value quickly, which may lead to designing or interpreting tests in order to provide strong positive conclusions that are actionable. This is next of kin to dishonesty, and if a company invests significant resources in a plan based on a stretched or distorted version of the truth, the test will likely be blamed and, on closer scrutiny, the deceit will be revealed. Not only will the testing department be dismantled, but heads will roll.

Test Examples

The author provides examples of tests on six functional areas: pricing, promotion, messaging, page layout, new functionality, and navigation:

Price

Price tests are fairly simple: upon visiting a page, some users see a different price than others, and you can measure the difference in the percentage of users who add the product to their cart and, eventually, buy the item. The math is also fairly simple: if you drop the price by 10%, you must sell roughly 11% more units (1/0.9 ≈ 1.11) just to take in the same revenue - any more than that, and the price change can be predicted to be successful.

(EN: Two points on this: first, you can also test the effects of a price increase as well as a decrease. Second, you should also consider the impact of increased sales volume on associated costs - e.g., to pick, pack, and ship more items takes more labor, and costs more if the seller offers "Free" shipping to buyers. These are accounting rather than testing concerns, but worth mentioning.)
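(EN: A short worked sketch of that break-even arithmetic, with a per-unit cost folded in as the note above suggests. All of the figures are hypothetical.)

  # Break-even check for a price test: revenue parity versus profit parity.
  current_price, test_price = 10.00, 9.00   # a 10% price cut
  unit_cost = 6.00                           # cost to source, pick, pack, and ship one item
  current_units = 1_000                      # units sold per period at the current price

  current_profit = current_units * (current_price - unit_cost)
  breakeven_units = current_profit / (test_price - unit_cost)
  required_lift = breakeven_units / current_units - 1

  print(f"Revenue break-even needs {current_price / test_price - 1:.1%} more units;")
  print(f"profit break-even (with a ${unit_cost:.2f} unit cost) needs {required_lift:.1%} more units.")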

Promotional

Promotional testing measures the effectiveness of a message on a given audience. For example, do more people respond to an ad that offers an item at 20% off of a $10 price, or to one that merely offers a low price of $8, or to one that claims to offer $2 off (which is mathematically the same)?

It's worth noting that the design and copy of the ad should be otherwise identical, or else your test results may be influenced by covariance. For example, if you're comparing a promotion of two different items, any part of the promotion may have caused the difference in behavior: the item itself, the image, the verbiage, the prices, etc.

Message

A message test changes the wording, but no other element on the screen. For example, whether a button indicates "get a price quote" or "get a quote on a new car" or "get a quote on a new vehicle" or whatnot will make a difference in the number of site visitors who click through.

It's noted that a difference in imagery is also a difference in "message": a car dealer may find that individuals click the "locate a dealer" button more or less frequently if the image shown above it is of a pickup truck, a convertible, or a minivan.

Page Layout

Page-layout tests consider the placement of images on a page: whether placing an advertisement on the right or left, top or bottom of a page makes a difference in the click-through rates.

Note that layout and design are often closely related, so it's important to recognize where a layout test is actually a multivariate test - comparing a wide banner at the top of the page with a narrow one on the right side tests both of those factors, not just layout.

New Site Launches or New Functionality

The test of a "new" page against an "old" one is often by its very nature a multivariate test, but a necessary one. You may be able to isolate each element that changes and test it individually, but when the full page with all changes is launched, it performs differently than the individual tests would have led you to expect. Such is the nature of covariance.

(EN: I would also submit that a significant change requires casting a broader net, especially if a change is made with a specific intent. A new site design might succeed in getting more purchases of a given category of products, but also fewer in another category; or it may result in more sales to new customers, but fewer to existing ones. The net result must be both positive and desirable.)

Navigation and Taxonomy

The way that the information is organized on the site may be a help or hindrance to its users. A good example is a clothing site: whether to enable the user to navigate to casual-shirts-men or men-shirts-casual or shirts-casual-men (etc.) will make it more or less likely the user can succeed in finding what they are looking for and ultimately purchase it on the site.

(EN: It's worth noting that site navigation becomes habitual - a person who has shopped the site a few times in the past will be flummoxed by any change in navigation simply because he is accustomed to the existing path. The test may need to be conducted over an extended period of time to get a more accurate sense of the reaction of more experienced users.)