The Math of Tuning

This chapter covers the statistics involved. The material can be arduous, but it's important, so the author encourages the reader to grin and bear it.

Throwing Away Part of the Data

There is no absolute answer in statistics. Outside of the hard sciences, most studies aim for a confidence level of around 95%, and even that depends on cleaning the data before analyzing it. Importantly, data cleaning does not mean throwing out data that fails to support a foregone conclusion, but in the "real world" there's dirt and grit that needs to be set aside.
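To make the 95% figure concrete, here is a minimal sketch of a confidence interval around a conversion rate. It assumes the simple normal-approximation interval, which is my choice for illustration and not something the book specifies; the function name and the sample numbers are hypothetical.

```python
import math


def conversion_interval(conversions, visitors, z=1.96):
    """Approximate confidence interval for a conversion rate.

    Assumption: the normal approximation to the binomial, which is only
    reasonable for fairly large samples. z=1.96 corresponds to roughly
    95% confidence.
    """
    if visitors <= 0:
        raise ValueError("visitors must be positive")
    if not 0 <= conversions <= visitors:
        raise ValueError("conversions must be between 0 and visitors")
    rate = conversions / visitors
    margin = z * math.sqrt(rate * (1 - rate) / visitors)
    # Clamp to the valid range for a proportion.
    return max(0.0, rate - margin), min(1.0, rate + margin)


# Hypothetical numbers: 50 conversions out of 1,000 visitors.
low, high = conversion_interval(50, 1000)
print(f"observed 5.0%, plausible range {low:.1%} to {high:.1%}")
```

The point of the sketch is that a measured 5% is a range, not a single fixed number, and the range narrows only as the sample grows.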

One example of data cleaning is dealing with a biased sample. You may want to get rid of all data pertaining to traffic from a specific ad in order to get a better idea about the general audience - or you might want to throw away all data from the general audience to concentrate on the impact of a specific ad. Or if your company is mentioned in a news program one day, you may want to remove that data as being unusual and unrepresentative.
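The three kinds of cleaning described above can be sketched as simple filters. This is an illustration only: the `Visit` record, the source labels, and the dates are all hypothetical, and a real analytics export will look different.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Visit:
    day: date
    source: str       # hypothetical label, e.g. "ad_123" or "organic"
    converted: bool


def conversion_rate(visits):
    """Share of visits that converted, or None if there are no visits."""
    visits = list(visits)
    if not visits:
        return None
    return sum(v.converted for v in visits) / len(visits)


def exclude_source(visits, source):
    """Drop traffic from one ad to look at the general audience."""
    return [v for v in visits if v.source != source]


def only_source(visits, source):
    """Keep traffic from one ad to look at that ad's impact."""
    return [v for v in visits if v.source == source]


def exclude_days(visits, days):
    """Drop unusual days, such as the day of a news mention."""
    days = set(days)
    return [v for v in visits if v.day not in days]


# Made-up sample data.
visits = [
    Visit(date(2024, 1, 1), "organic", True),
    Visit(date(2024, 1, 1), "organic", False),
    Visit(date(2024, 1, 1), "ad_123", False),
    Visit(date(2024, 1, 2), "ad_123", False),
    Visit(date(2024, 1, 2), "organic", False),
    Visit(date(2024, 1, 2), "organic", False),
]

print(conversion_rate(visits))
print(conversion_rate(exclude_source(visits, "ad_123")))
print(conversion_rate(only_source(visits, "ad_123")))
print(conversion_rate(exclude_days(visits, [date(2024, 1, 2)])))
```

Each filter answers a different question, so the decision about what to set aside should be made before looking at which result is more flattering.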

Some data collection also excludes certain responses due to technology or behavior. One example is using JavaScript to determine monitor sizes. Users who turn JavaScript off (about 4% of the universe) will not be counted. This is an inherent bias. Likewise, an analysis of search terms shows only topics of interest to visitors who prefer to use search engines as their primary mode of navigation.
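One way to see how much the uncounted users could matter is to bound the true figure. The sketch below takes the 4% figure from the text; the 30% observed share is a made-up number, and the function name is hypothetical.

```python
def true_share_bounds(observed_share, unmeasured_fraction):
    """Bounds on the true share of all users with some property.

    observed_share: share measured among users who could be counted.
    unmeasured_fraction: fraction of all users who could not be counted
    (for example, those with JavaScript turned off).
    The low bound assumes none of the uncounted users have the property;
    the high bound assumes all of them do.
    """
    if not 0 <= observed_share <= 1 or not 0 <= unmeasured_fraction <= 1:
        raise ValueError("both arguments must be between 0 and 1")
    low = observed_share * (1 - unmeasured_fraction)
    high = low + unmeasured_fraction
    return low, high


# Hypothetical: 30% of counted users have a large monitor, 4% are uncounted.
low, high = true_share_bounds(0.30, 0.04)
print(f"true share is between {low:.1%} and {high:.1%}")
```

With a small uncounted group the bounds stay tight, but nothing in the measured data says where within them the truth lies.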

Basics of Statistical Analysis

The author provides some basics about statistical analysis. I don't think he quite understands the topics he's meaning to teach others, because he's getting a lot of things wrong here, and probably misinforming the reader.

However, some key concepts are worth preserving:

Of key importance to marketing is probability theory: the creation of a model that results in a number of predictable outcomes ("events") and assigns a probability to each outcome based on statistical inputs.
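As a rough illustration of assigning a probability to each outcome, the sketch below estimates probabilities from observed frequencies. Using raw frequencies is my simplifying assumption, not the book's method, and the outcome names and session data are made up.

```python
from collections import Counter


def outcome_probabilities(observed):
    """Estimate the probability of each outcome from observed frequencies."""
    counts = Counter(observed)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no observations")
    return {outcome: n / total for outcome, n in counts.items()}


# Hypothetical outcomes for ten visitor sessions.
sessions = [
    "bounce", "bounce", "browse", "purchase", "bounce",
    "browse", "bounce", "browse", "bounce", "bounce",
]

for outcome, p in outcome_probabilities(sessions).items():
    print(f"{outcome}: {p:.0%}")
```

A real model would condition these probabilities on the inputs (who the visitor is, what they did before), but the output has the same shape: a set of events, each with a probability, summing to one.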

Inputs to the equation are quantifiable measurements that are (more-or-less) objective. Some statistics are based on qualities (a person's level of income) and others are based on preceding events (a person entered a store).

Statistical analysis consists of:

In the context of conversion, the most important goal of statistical analysis is to discover causality - such that altering some of the factors of the equation can be expected to produce a desirable outcome.


Reliability is the goal of statistical modeling, but there are a number of common problems that can undermine that reliability:

Variable Interactions

Generally, the outcome of a statistical study is a conclusion, or at least the implication, that by doing X, we will influence outcomes in a favorable way, and that the outcome is mathematically predictable.

The problem is that this rests on an assumption of stasis: that nothing else changes. This is seldom true. When you change multiple factors at once, or when there are external influences, the factors can interact, so the combined result may differ from what each change would produce on its own.
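A small worked example may help here. The sketch below compares two hypothetical page changes, a new headline and a new button, tested alone and together. All conversion rates are invented for illustration.

```python
# Hypothetical conversion rates, keyed by (headline, button).
rates = {
    ("old", "old"): 0.020,   # baseline page
    ("new", "old"): 0.030,   # new headline only
    ("old", "new"): 0.025,   # new button only
    ("new", "new"): 0.028,   # both changes together
}


def interaction(rates):
    """Return each change's lift over baseline, the combined lift,
    and how far the combined lift departs from the sum of the two."""
    base = rates[("old", "old")]
    headline_lift = rates[("new", "old")] - base
    button_lift = rates[("old", "new")] - base
    combined_lift = rates[("new", "new")] - base
    gap = combined_lift - (headline_lift + button_lift)
    return headline_lift, button_lift, combined_lift, gap


headline_lift, button_lift, combined_lift, gap = interaction(rates)
print(f"headline alone: {headline_lift:+.3f}")
print(f"button alone:   {button_lift:+.3f}")
print(f"both together:  {combined_lift:+.3f}")
print(f"interaction:    {gap:+.3f}")
```

In this made-up case each change helps on its own, yet together they deliver less than the new headline alone. A study that tested the two changes separately would not have predicted that.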

Application to the Web

Some random notes about applying statistical methods to the Web, important because he will use these terms later, I suppose: