8: Getting the Right Data

This chapter considers the various "types" of data that are available and, by examining their nature, suggests how the data can be properly interpreted to yield more reliable conclusions.

Before getting into details, the author cautions about "data smog" - he will cover a lot of ground in discussing various kinds of data that could be considered, but by no means should you feel the need to collect and analyze all these forms of data as this will create a cloud of information that is largely useless. Refer to the earlier advice on starting small, deciding what's meaningful, measuring only what you intend to act on, etc.

Primary Data Types

The author slices data into four basic categories: behavioral, attitudinal, competitive, and secondary.

(EN: This is truly horrible. The difference between the first two "types" is clear, but the third can include any of the others and the fourth is a catch-all. A total wreck of a taxonomy, but there it is.)

The author expounds a bit more on behavioral data: it is readily available, in great quantity, and there are a lot of tools to gather and analyze it. It tells you what people are doing, but not why they are doing it, and the latter is far more important to determining what changes will be effective in getting the results you want. Even so, behavioral data is the basis of further analysis - it indicates where problems exist, and suggests questions you can ask to find the cause.

To get to the attitudinal data, the "why" that explains what is demonstrated by behavioral data, you have to ask questions. Typically, this can be done by surveys or focus groups, which provide information that is immediately relevant in and of itself, as well as a basis for comparison against past results. Most commonly, comparing data from before and after a change is made is (assumed to be) an indication of the impact of the change.

(EN: The author doesn't go into much detail, which elides the "how to" and common pitfalls of market research - turn to other sources for that. But most significantly, he omits rational inference, which is often seen as arguable, but which also removes posturing from the equation - what people say in focus groups or in response to surveys is often not a realistic depiction of their genuine motivation.)

The author suggests there's some conflict in the analytics profession over whether behavioral or attitudinal data should be preferred, but it should be clear that they are both useful, and that they serve very different functions: behavioral data indicates historical actions, attitudinal the motivation behind those actions. Behavioral is likely useless without attitudinal and attitudinal is likely inaccurate without behavioral.

Competitive Data

Obtaining competitive data for its own sake (just to know how the competition is doing) is an exercise in trivia. It should be sought for comparative purposes, largely to set the benchmarks against which your site should be expected to perform, because they represent the "standard" for the industry. The main weaknesses of competitive data are completeness and accuracy: your competitors aren't likely to share all of their information with you, and may misrepresent what they state to others to misdirect or create a false impression of their success.

The author provides a warning before going further: your competitors are not perfect. Seeking to copy what others are doing is not a valid substitute for strategy.

(EN: Copycat strategy is bad for much more valid reasons - but most important of all is that a firm doesn't succeed by imitating competitors, but by satisfying customers. If imitation results in success, that's coincidental - what you are copying happens to be something customers value about your competition - but if you place your focus on the customer, as it should be, you are more directly and precisely addressing the problem. Perhaps the only valid use of competitive posturing is defensive: you are seeking to negate competitive advantage by ensuring your competitor's value proposition is not unique, even if you don't understand what it is based upon.)

There are various sources for competitive information on the Internet, which "rank" Web sites by popularity, and even provide deeper insight about the way in which users interact with the sites in question. Many companies, even those that would seem to be large enough to know better, utilize "free" services that provide usage statistics, hit counters, and other information - part of the terms of service for which is that the information will be made available to others.

Even without their participation, there are user-centric methods of collecting statistics for sample audiences (a browser plug-in that tracks and reports browsing habits) in which a sufficient number of people participate to derive meaningful statistics (Nielsen uses this method, and has a base of "millions" of users). And referral services often disclose detailed statistics to show the value of their own services (a search engine reports, directly or indirectly, the number of visitors it sends to other sites).

The author suggests three ways to leverage competitive data to derive insight, but notes that these are examples, and certainly not a comprehensive list of potential applications. First, you can compare specific metrics between your site and others (to tell if your conversion rate is better or worse than a competitor). Second, you can determine what is working well or poorly for your competitors (by analyzing what is done on multiple sites and finding correlation to meaningful statistics). Finally, you can compare demographics of your site visitors to those who use competitors' sites (very important if you serve a different market segment).

But before you even begin to gather competitive data, you should have a plan. Again, "fog" is an issue, and collecting details without any sense of what they mean or what you plan to do with them results in a cloud of meaningless data that may misdirect you. So the first step is to know what question you want to answer, then go out and find the data you need to answer it, and then set up a monitoring process to see how this changes over time.

Secondary Data Types

(EN: As a reminder, "secondary data type" is a contrivance of the author and does not correlate to "secondary research" - it's just a catch-all name for anything that the author considers to be of minor importance.)

It's important to see the customer as a person who interacts with the firm through various channels, so Web analytics shouldn't be done in a vacuum. You should consider the customer data available from other interactions and consider data from transactions in other channels (e.g., a Web user who later calls to place an order). Without this information, Web analytics is a partial picture that may be misleading.
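
To illustrate how a Web-only view can mislead, here is a minimal sketch that compares a conversion rate computed from online orders alone with one that also counts later phone orders from the same visitors. The customer IDs, the function name, and the assumption that both channels share a common customer identifier are all hypothetical and are not taken from the book.

```python
def conversion_rates(web_visitors, online_buyers, phone_buyers):
    """Return (web_only_rate, cross_channel_rate) for the given web visitors.

    Assumes (hypothetically) that every channel records the same customer ID.
    """
    visitors = set(web_visitors)
    if not visitors:
        raise ValueError("need at least one web visitor")
    online = visitors & set(online_buyers)
    # Count a visitor as converted if they bought through either channel.
    any_channel = visitors & (set(online_buyers) | set(phone_buyers))
    return len(online) / len(visitors), len(any_channel) / len(visitors)


# Made-up data: four site visitors, one online order, two phone orders.
web_visitors = ["c1", "c2", "c3", "c4"]
online_buyers = ["c1"]
phone_buyers = ["c3", "c9"]  # c9 never visited the site, so is ignored

web_only, cross_channel = conversion_rates(web_visitors, online_buyers, phone_buyers)
print(f"web-only: {web_only:.0%}, cross-channel: {cross_channel:.0%}")
```

With this made-up data the Web-only figure understates conversion by half, which is the kind of partial picture the paragraph above warns about.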

There is also a plethora of third-party research - available for free or for-fee - that can save you some effort. For example, many studies have been done on usability, and would be applicable to your own site (no need to replicate the studies unless there's a compelling reason to believe your audience is different). The drawback is that this information is available to all, so it does not give you a sustainable competitive advantage.

The author also cautions about over-reliance on third-party research. One example is in the automotive industry, where multiple firms relied upon J.D. Power to provide market research and, over the past several years, the industry has seen a homogenization of car manufacturers' Web sites - they are all taking their cues from the same source. And while standardization is often a good thing, it creates a bland user experience that gives no one a distinct advantage, which can undermine the uniqueness of their brands.

(EN: This is a slippery argument. If you're "too different," then you may be excluded by comparison shoppers who have a hard time figuring out your site. If you're "too similar," you lose your distinctiveness and provide a humdrum and disappointing brand experience. So where is the balance to be struck?)

The author mentions usability as an area of interest. Usability studies focus on low-level concerns (is it possible for a person to complete a task, given the design of the Web site), but this is keenly important - the harder it is to complete a task, the more people will not complete it. Ease of completion is only one factor in motivation, but it's a very important one to consider, and areas where users are encountering difficulty, for usability reasons, are easy targets for improvement once they are identified.

The author refers to "heuristic analysis" - which is one of many approaches to getting an "expert" review of your site. This entails asking an expert outside your organization for their opinion about your site. It's faster and cheaper than getting feedback from users, but the danger is that the "expert" may not have an accurate sense of the perspective of a real user. (EN: many such experts don't put themselves in the user's shoes, and their opinions therefore don't pertain to the actual users of the site. It's much like Web awards: the opinions of experts aren't necessarily what customers care about.)

The author also suggests mining the social web. People are vocal about what they like and dislike, but more often share this information with one another than provide direct feedback to the company. So looking at review sites, blogs, discussion boards, online communities, and other sources will give you a wealth of information about what people think. Be prepared to hear some harsh criticism, understand that what you get may be the opinion of one person, and do what you can to verify what you see rather than taking it at face value.


The act of analysis yields objective information about past performance - which represents the status quo - but without some standard of comparison, it's not possible to assert with confidence that the "numbers" you see are good or bad. For example, a 1.4% conversion rate seems fairly low, but if similar sites have a 1.2% conversion rate, your performance is actually quite good (and does not merit a panic or the devotion of a lot of resources to "fix"). In effect, knowing if a metric is indicative of a problem or opportunity requires a sense of what that metric ought to be.

As such, you will need to look outside your organization for benchmarks - but be reasonable in what you use as a standard of comparison. Firms that are outside your industry, serve a different market segment, etc. will all have performance that varies by factors other than their Web site and its performance. So choose your competition wisely, and avoid apples-to-oranges comparisons.
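
The 1.4% versus 1.2% conversion-rate example above can be made concrete with a small sketch. The function name and the choice to express the gap as a fraction of the benchmark are illustrative assumptions, not the author's method; the two rates are the ones quoted in the text.

```python
def relative_to_benchmark(own_rate: float, benchmark_rate: float) -> float:
    """Return the gap between own_rate and the benchmark, as a fraction of the benchmark.

    Positive means the site outperforms the benchmark; negative means it lags.
    """
    if benchmark_rate <= 0:
        raise ValueError("benchmark_rate must be positive")
    return (own_rate - benchmark_rate) / benchmark_rate


own_rate = 0.014   # the site's conversion rate from the example (1.4%)
peer_rate = 0.012  # conversion rate of similar sites from the example (1.2%)

lift = relative_to_benchmark(own_rate, peer_rate)
print(f"{lift:+.1%} relative to benchmark")
```

Read this way, a rate that looks low in isolation is roughly one-sixth better than that of comparable sites, provided the benchmark really is drawn from comparable sites.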

Customer Engagement

The notion of "engagement" is central to commercial Web site operations. Specifically, companies seek to engage individuals to commit to undertaking an action that creates profit. People who visit a site are more "engaged" than those who do not - but if they are merely browsing and not doing what the business wants them to do (buy stuff), they are not sufficiently engaged, and business success relies on increasing the level of engagement.

Web analytics should be focused on this notion, and should consider behavior that is relevant to engagement. Increasing the usage of a site search engine, for example, is meaningless unless the customers who do so tend to become (or are already) more deeply engaged with the firm. As such, you should seek to monitor the behaviors that indicate an increasing level of engagement.

It's also suggested that engagement is a long-term process rather than a short-term goal. It is not that a customer responds to a sales promotion immediately, but that they become increasingly more involved (non-buyers are not discarded, but treated as being in the early stages of engagement; and a person who makes a purchase is seen as someone who will continue to engage after the purchase).

As an example, the author looks at the blog of a professional speaker, who has clear goals for customer engagement. He targets the following behaviors: a person should visit the site once a month, view a "relatively large" number of pages per session, bookmark the site, view a variety of content, and purchase one or more items through the site. Each of these requirements is measurable by web analytics.
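
As a rough sketch of how the five target behaviors above might be checked against per-visitor data, the following counts how many of them a visitor meets. The field names and both thresholds are hypothetical: the summary gives no numbers for a "relatively large" page count or for a "variety" of content, so the values below are placeholders only.

```python
from dataclasses import dataclass


@dataclass
class VisitorSummary:
    """Hypothetical per-visitor rollup exported from an analytics tool."""
    visits_last_30_days: int
    avg_pages_per_session: float
    bookmarked: bool
    content_categories_viewed: int
    purchases: int


# Placeholder thresholds (assumptions, not values from the book).
MIN_PAGES_PER_SESSION = 4.0
MIN_CONTENT_CATEGORIES = 2


def engagement_checks(v: VisitorSummary) -> dict:
    """Map each of the five target behaviors to whether the visitor meets it."""
    return {
        "visits_monthly": v.visits_last_30_days >= 1,
        "views_many_pages": v.avg_pages_per_session >= MIN_PAGES_PER_SESSION,
        "bookmarked_site": v.bookmarked,
        "views_varied_content": v.content_categories_viewed >= MIN_CONTENT_CATEGORIES,
        "has_purchased": v.purchases >= 1,
    }


def engagement_score(v: VisitorSummary) -> int:
    """Count how many of the five behaviors the visitor exhibits (0 to 5)."""
    return sum(engagement_checks(v).values())


visitor = VisitorSummary(
    visits_last_30_days=2,
    avg_pages_per_session=5.5,
    bookmarked=False,
    content_categories_viewed=3,
    purchases=0,
)
score = engagement_score(visitor)
print(f"{score} of 5 engagement behaviors met")
```

Tracking the score over time, rather than only whether a purchase occurred, fits the view of engagement as a long-term process.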

(EN: An extended example is provided, followed by a case study. This adds incidental details to illustrate the principles already discussed.)