jim.shamlin.com

Appendix: Web Analytics Definitions

The author asserts that there are three main metrics that most companies seek to define: visitors, sessions, and pages. As important as they are, none of these can be measured accurately. However, if the same criteria are used to define a visitor, session, or page when comparing two sections of a site (or one section over time), the error should affect both sides of the comparison about equally, and the comparison should be reasonably reliable.

Term: Unique Visitors

The count of unique visitors is an attempt to provide a count of people in a channel in which people are invisible. The goal is to get an estimate of actual persons who visit the site over a certain period of time, and to count each person once and only once during that period.

It is strictly impossible to determine the actual number of people who visit a public Web site with any degree of accuracy, though various methods have been attempted (most notably, the "cookie"). Private Web sites often use an account login to "tag" and track an individual user, though this still miscounts individuals who share accounts.
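
As a rough sketch of the counting rule described above (each person once and only once per period), the following Python snippet counts distinct visitor identifiers per calendar month. The record layout, the `visitor_id` field (standing in for a cookie value or an account login), and the sample data are all assumptions made for illustration; the result is only as good as the identifier, which marks a browser or an account rather than a person.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical request records: (timestamp, visitor_id).
# visitor_id stands in for a cookie value or account login; it identifies
# a browser or an account, not necessarily one person.
requests = [
    (datetime(2024, 1, 3, 9, 0), "cookie-a"),
    (datetime(2024, 1, 3, 9, 5), "cookie-a"),
    (datetime(2024, 1, 20, 14, 0), "cookie-b"),
    (datetime(2024, 2, 1, 8, 30), "cookie-a"),
]

def unique_visitors_by_month(records):
    """Count each visitor_id once per calendar month."""
    seen = defaultdict(set)
    for timestamp, visitor_id in records:
        seen[(timestamp.year, timestamp.month)].add(visitor_id)
    return {period: len(ids) for period, ids in seen.items()}

counts = unique_visitors_by_month(requests)
```

Note that "cookie-a" is counted once in January despite two requests, and again in February, because the period resets the count.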

And given the way in which "Web 2.0" presents and uses content, there's further debate on whether a person whose computer requested content from a site should be counted as a visitor to that site at all - for example, when a "reader" application pulls some content to be viewed on a separate site or in a separate application.

Term: Visits/Sessions

This metric attempts to expand upon "visitors" by determining the number of times they visit the site. The "visitor" count will indicate that one person visited the site in a given month, but not how many times (maybe just once, maybe on ten different occasions).

In addition to the problems of identifying the visitor, counting "visits" is complicated by the need to estimate a reasonable amount of time a person will spend on a site. For example, most sites will assume that a person who was active, then inactive for thirty minutes, then became active again has made two separate visits to the site. Yet this same pattern may be a single session, interrupted by something else (the phone rang).

For rich-media sites, the interval may be insufficient: a person who watches three 45-minute videos in an afternoon has not necessarily made three visits to the site to do so.
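
The inactivity rule can be sketched in a few lines of Python. This is a minimal illustration, not any particular analytics product's method: the function name `count_visits`, the `timeout` parameter, and the sample timestamps are assumptions. It shows how the same activity yields a different visit count depending on the interval chosen.

```python
from datetime import datetime, timedelta

def count_visits(timestamps, timeout=timedelta(minutes=30)):
    """Count visits for one visitor: a new visit starts whenever the gap
    since the previous request exceeds the inactivity timeout."""
    visits = 0
    previous = None
    for current in sorted(timestamps):
        if previous is None or current - previous > timeout:
            visits += 1
        previous = current
    return visits

# Hypothetical activity for one visitor: requests at 13:00 and 13:10,
# then nothing until 13:55 (a 45-minute gap, e.g. while a video plays).
activity = [
    datetime(2024, 1, 3, 13, 0),
    datetime(2024, 1, 3, 13, 10),
    datetime(2024, 1, 3, 13, 55),
]

default_visits = count_visits(activity)  # thirty-minute timeout
lenient_visits = count_visits(activity, timeout=timedelta(minutes=60))
```

With the thirty-minute interval this activity is counted as two visits; with a sixty-minute interval it is counted as one, which may be the more sensible reading for a rich-media site.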

Term: Page Views

A "page view" counts the aggregate number of times that a given Web page was viewed, and can be cross-tabulated to indicate how many pages a given visitor viewed, what pages were viewed during a session, etc.

The foible of page views is the arbitrary definition of a "page" on the Internet. Traditionally, it was any HTML document (as opposed to graphics) - but not all HTML documents contain content that would constitute a "page" (some contain no visible content at all), and other types of files that would normally be considered graphics might be counted as pages (a JPEG that is a fact sheet might be a "page", but another that is a picture of the product might not be).
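
Because the definition is arbitrary, any page-view count rests on an explicit rule that someone has to write down. The sketch below shows one hypothetical rule: HTML documents count by default, with hand-maintained exception lists in both directions. The paths, the two sets, and the function name are invented for illustration.

```python
# Hypothetical exception lists, maintained by whoever defines a "page".
PAGE_PATHS = {"/factsheet.jpg"}          # non-HTML files treated as pages
NON_PAGE_PATHS = {"/frames/blank.html"}  # HTML files with no visible content

def is_page_view(path):
    """Apply the assumed rule: exceptions first, then HTML by default."""
    if path in PAGE_PATHS:
        return True
    if path in NON_PAGE_PATHS:
        return False
    return path.endswith((".html", ".htm"))

# Hypothetical sequence of requested files.
requested = [
    "/index.html",
    "/factsheet.jpg",
    "/product.jpg",
    "/frames/blank.html",
    "/index.html",
]

page_views = sum(1 for path in requested if is_page_view(path))
```

Changing either exception list changes the count, which is why page views are only comparable when the same rule is applied throughout.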

Term: Hits

A "hit" is any request for any file. It is the most objective, but least meaningful, metric that can be considered, and is the basis from which the "big three" above are derived.

Typically, Web server responses that return error codes are recorded in the server log as "hits," but it is generally acceptable to throw these away for purposes of analyzing the behavior of site visitors (though they may be meaningful for maintenance and troubleshooting).

It's also common practice to sort out and discard any hit generated by a robot or spider. These, too, have nothing to do with the behavior of human users - they are merely search engines crawling pages to index a site.

Another common discard is hits that deliver only partial content - the number of bytes transferred does not match the actual size of the file requested. This decision is less straightforward because software may render content progressively - for example, a person may "stop" the load of a PDF file because he found what he needed on the second page and did not see the need to download the rest. His needs were served, though the file was not completely downloaded.
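
The three discards described in this section can be sketched as a simple filter over log records. Everything here is an assumption made for illustration: the `Hit` record and its fields, the short list of user-agent markers (a real list would be far longer), and the choice to treat any status of 400 or above as an error. Because discarding partial content is the debatable step, it is left as an option rather than applied by default.

```python
from dataclasses import dataclass

@dataclass
class Hit:
    path: str
    status: int      # HTTP status code returned by the server
    user_agent: str
    bytes_sent: int  # bytes actually transferred
    file_size: int   # full size of the requested file

# Assumed substrings that mark a crawler's user agent.
BOT_MARKERS = ("bot", "spider", "crawler")

def is_error(hit):
    return hit.status >= 400

def is_robot(hit):
    agent = hit.user_agent.lower()
    return any(marker in agent for marker in BOT_MARKERS)

def is_partial(hit):
    return hit.bytes_sent < hit.file_size

def filter_hits(hits, drop_partial=False):
    """Keep hits that plausibly reflect human behavior."""
    kept = []
    for hit in hits:
        if is_error(hit) or is_robot(hit):
            continue
        if drop_partial and is_partial(hit):
            continue
        kept.append(hit)
    return kept

# Hypothetical log records.
hits = [
    Hit("/index.html", 200, "Mozilla/5.0", 5000, 5000),
    Hit("/missing.html", 404, "Mozilla/5.0", 300, 300),
    Hit("/index.html", 200, "Googlebot/2.1", 5000, 5000),
    Hit("/manual.pdf", 200, "Mozilla/5.0", 40000, 900000),
]

kept_default = filter_hits(hits)
kept_strict = filter_hits(hits, drop_partial=True)
```

In this sample the default filter keeps the interrupted PDF download, on the reasoning that the reader's needs may have been served; the strict filter drops it.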