5 - Generating Metrics

The author asserts that there is no "big red metrics button" that spits out a report meaningful to every Web site. There are various tools and services that do different things - but in each instance, you have to consider what they are measuring and whether it is at all important to you.

Gartner: Levels of Ambition

Gartner divides metrics into four categories or "levels":

  1. Monitoring - Investigating user behavior, with an emphasis on the number of people who visit and how long they stick around, with a goal of getting more attention from more people.
  2. Feedback - Considering user behavior, with an emphasis on tasks performed on the site, with a goal of "improving" the value users get from the site (and the business gets from its users).
  3. Leverage - Measuring the profit generated per visitor and methods of improving it (up-selling, cross-selling, repeat visits, customer satisfaction), with the ultimate goal of getting more profit from the customer.
  4. Strategic - Long-term measurements of customer behavior, with the goal of tailoring your site (and other aspects of the business) to serve the most profitable segment (and fobbing off less-profitable customers on the competition).

These are general statements about the goals of a site - which metrics matter depends on your company's level of ambition.

Server Log Analysis

The author gives some basic information about server log analysis: what information is contained in each "log line" and what the parts of the line are. I'm skipping that.

He also points to some of the problems with log files: they don't tell you when a user views the site from a cache (on their computer or a proxy server), nonhuman visitors (robots and crawlers) generate log data, multiple people may share a computer or a gateway, etc.
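
To make this concrete, here is a minimal sketch in Python, assuming the common Apache "combined" log format (the exact field layout varies by server and configuration) with a crude user-agent filter for the nonhuman visitors mentioned above:

    import re

    # A minimal sketch, assuming the Apache "combined" log format;
    # field layout varies by server and configuration.
    LOG_PATTERN = re.compile(
        r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
        r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<bytes>\S+) '
        r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
    )

    # Crude filter for robots and crawlers; real detection needs a
    # maintained list, and many bots do not identify themselves.
    BOT_HINTS = ("bot", "crawler", "spider", "slurp")

    def parse_line(line):
        match = LOG_PATTERN.match(line)
        if match is None:
            return None  # malformed line
        entry = match.groupdict()
        if any(hint in entry["agent"].lower() for hint in BOT_HINTS):
            return None  # drop obvious nonhuman visitors
        return entry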

Using "sessions" or "cookies" can address some of these problems, but not completely. It solves some problems, creates others. And taking more extreme measures can improve your metrics, but harm your performance (tricking the browser into loading a fresh copy rather than using cache creates an unnecessary load on the server) or the customer satisfaction (forcing users to log in so you can track them drives away users).

A point worth preserving is that log-crunching requires significant horsepower. A log line runs about 120 bytes (omitting referrer and user agent data). A site that gets a million visitors a day, each loading ten files, generates ten million log lines - about 1.2 gigabytes of log data. Every day. Over a year, that approaches half a terabyte.
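
The arithmetic, for the skeptical (the per-line and traffic figures are the author's; scaling them out is mine):

    BYTES_PER_LINE = 120               # the author's figure, sans referrer/agent
    visitors, files_each = 1_000_000, 10

    daily = visitors * files_each * BYTES_PER_LINE
    print(f"{daily / 1e9:.1f} GB per day")          # 1.2 GB per day
    print(f"{daily * 365 / 1e12:.2f} TB per year")  # 0.44 TB per year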

To economize, some IT departments strip out essential data, such as referrer. However, without this data, you lose valuable insight: you cannot track click-trails through your site. You cannot tell what other sites visitors are coming from. You cannot tell what search keywords are being used to find your site. It is NOT a worthwhile trade-off.
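
For instance, a referrer like http://www.google.com/search?q=web+metrics carries the visitor's search phrase. A sketch of extracting it (the engine-to-parameter map here is illustrative; real engines vary and change over time):

    from urllib.parse import parse_qs, urlsplit

    # Illustrative map of search engines to the query parameter that
    # carries the keywords; a production version needs a maintained table.
    SEARCH_PARAMS = {"www.google.com": "q", "search.yahoo.com": "p"}

    def search_keywords(referrer):
        """Return the search phrase from a referrer URL, or None."""
        parts = urlsplit(referrer)
        param = SEARCH_PARAMS.get(parts.netloc.lower())
        if param is None:
            return None  # not a recognized search engine
        terms = parse_qs(parts.query).get(param)
        return terms[0] if terms else None

    # Strip the referrer field from the logs and none of this is possible.
    print(search_keywords("http://www.google.com/search?q=web+metrics"))  # web metrics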

In addition to the access log, there is also an error log. Too often, this is ignored in metrics, but it is critical data for ensuring Web site performance, and meaningful information is lost by ignoring it.
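
Even a crude tally of the error log surfaces problems the access log never shows (broken links, failing scripts, permission errors). A sketch, assuming the Apache-style error log format (exact layout varies by server and version):

    import re
    from collections import Counter

    # Matches Apache-style error log lines such as:
    #   [Wed Oct 11 14:32:52 2000] [error] [client 127.0.0.1] File does not exist: ...
    ERROR_LINE = re.compile(r'^\[[^\]]+\] \[(?P<level>\w+)\] (?P<message>.*)')

    def summarize_errors(path):
        levels, messages = Counter(), Counter()
        with open(path) as log:
            for line in log:
                m = ERROR_LINE.match(line)
                if m:
                    levels[m.group("level")] += 1
                    messages[m.group("message")[:80]] += 1  # truncate for grouping
        return levels, messages.most_common(10)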

Choosing a Tool

When selecting a solution, some key considerations are:

  1. Who is it designed for? The metrics provided by different tools are geared to different purposes. Data that helps the IT team monitor server performance is significantly different from the data a marketing manager needs to improve customer satisfaction.
  2. Is it flexible? Does the tool provide standard reports, or does it allow you to customize the reporting to your specific needs?
  3. Does it archive and perform trend analysis? Most solutions provide good "snapshot" statistics, but few measure the most important thing: how metrics vary over time. In some cases, monitoring trends takes a lot of manual work. In others, it simply isn't possible.
  4. How are reports formatted? Is the report a mass of numerical data, or does it provide information graphics? Are the information graphics meaningful?
  5. What data is hidden? In selecting what information to display, some solutions hide data (a good example: if a report shows the "top ten" pages, you cannot tell how any other page on the site is performing)
  6. Is it scalable? Because of the massive amount of data involved, many solutions cap out at a certain level. When you outgrow a solution and switch to another, this creates a barrier, where old metrics are not comparable to new metrics because the new solution works differently.
  7. Is it auditable? If it's important for others to verify your numbers (e.g., for ad sales), the output must satisfy their requirements.
  8. Is it speedy? In some instances, having up-to-the-minute data is essential. If it takes six hours to run the reports, you're dealing with six-hour-old data.

A handful of anecdotes are provided from major Web sites that were using the wrong tools because they failed to consider one or more of the factors above.

EN: I think the author overlooks a few things: using more than one tool, or developing custom tools to generate the metrics you need.