Tuning Methods
As a prologue: every method of tuning has its own advantages, limitations, and hidden assumptions. And, what's more, every method of tuning presumes that tuning will work at all.
Common Tuning Issues
One problem is framing the research question: you need to be very specific in defining what you are trying to optimize, and very specific about the factors you will adjust to achieve your goal. You also have to keep an eye out for unintended consequences.
Bad measurements are another issue: if you count a "sale" as a true/false condition, you're missing some important details (value of the sale, profit margin on items sold), and could be gearing your site toward the behavior of those who make small and less profitable purchases.
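For illustration, a minimal Python sketch (field names are hypothetical) of the difference between the naive true/false measure and a metric that carries sale value and margin:

```python
from dataclasses import dataclass

# Hypothetical event record: instead of logging a sale as a bare
# true/false flag, capture the details that matter for tuning.
@dataclass
class SaleEvent:
    visitor_id: str
    converted: bool        # the naive true/false measure
    order_value: float     # revenue of the sale
    profit_margin: float   # margin on the items sold

def conversion_rate(events):
    """Naive metric: treats a $5 sale and a $500 sale identically."""
    return sum(e.converted for e in events) / len(events)

def profit_per_visitor(events):
    """Richer metric: weights each sale by its actual profit."""
    return sum(e.order_value * e.profit_margin for e in events) / len(events)

events = [
    SaleEvent("v1", True, 5.00, 0.10),
    SaleEvent("v2", True, 500.00, 0.35),
    SaleEvent("v3", False, 0.0, 0.0),
]
print(conversion_rate(events))     # 0.67 -- looks great
print(profit_per_visitor(events))  # 58.5 -- driven almost entirely by v2
```

Tuning against the first number alone could reward changes that attract many small, low-margin purchases.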
There can also be conflicts among multiple goals: if your goals are to increase purchases and get people to sign up for mailing lists, trying to do both at once could be detrimental.
There can be conflicts among audiences. If you "tune" based on the behavior of new visitors, it could negatively affect the experience for established users.
Constantly tuning the site may also lead to a fluctuating benchmark, and it undermines consistency of experience.
Also, realize that your sample is a convenience sample: the people who happen to visit your site during the test period. There may be instances in which this audience is not representative of your "normal" audience (for example, a flood of college graduates looking for jobs in May could skew any tests performed at that time).
Tuning Methods
The two key activities are deciding what to change and determining the impact of these changes on audience behavior.
Worth noting: you should not test and tune on your public site, in view of all users. Instead, divert some small percentage of the traffic to the test pages, and implement upgrades in a more scheduled and orderly manner.
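One common way to do this diversion is deterministic bucketing on a visitor id, so each visitor consistently sees the same version for the life of the test. A minimal sketch, with hypothetical page names and test identifiers:

```python
import hashlib

def in_test_group(visitor_id: str, test_name: str, percent: float = 5.0) -> bool:
    """Deterministically divert a small, stable slice of traffic to the
    test pages. Hashing the visitor id together with the test name means
    each visitor always gets the same answer, and different tests draw
    independent slices of the audience."""
    digest = hashlib.sha256(f"{test_name}:{visitor_id}".encode()).hexdigest()
    bucket = int(digest[:8], 16) % 10000   # bucket in [0, 9999]
    return bucket < percent * 100          # e.g. 5.0% -> buckets 0..499

# Serve the test page to roughly 5% of visitors (names are illustrative)
if in_test_group("visitor-12345", "checkout-redesign"):
    page = "checkout_test.html"
else:
    page = "checkout.html"
```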
A:B Testing
The author describes this form of testing as putting up two versions of a page and measuring behavior, with the assumption that any difference in behavior is because of the differences in the page.
Usually, version "A" is the current page, used as a baseline, and the "B" page is an altered page. It's possible to have multiple versions: three or four instead of just two.
This test is simple to orchestrate and the results fairly simple to analyze, though it limits you to a small number of "recipes" to test.
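To determine whether the difference in behavior is real rather than noise, a standard approach (not specific to the author) is a two-proportion z-test on the conversion counts for the two versions. A minimal sketch with illustrative numbers:

```python
from math import sqrt, erf

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Compare conversion counts for pages A and B.
    Returns (z statistic, two-sided p-value)."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # two-sided p-value from the standard normal CDF
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    return z, p_value

# A: baseline page, B: altered page (numbers are made up for illustration)
z, p = two_proportion_z_test(conv_a=120, n_a=5000, conv_b=155, n_b=5000)
print(f"z = {z:.2f}, p = {p:.3f}")  # p < 0.05 suggests a real difference
```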
EN: The author seems to take a liberal view here. In some instances, A:B testing is restricted to a single element variation - e.g., only the color of the button is changed - with an eye toward granularity. If you change the color, placement, font, and wording on a button, you don't really know which of those changes affected the results. But then, you may not care which precise factor was the root cause, merely in the net result of the change.
Multivariate Testing
Compares the performance of pages in which multiple things are changed, or in which the differences between two versions are not reducible to a single factor (if you completely redesign an order flow to one-page ordering, it is too different from a three-page order process to conduct an A:B test).
The benefit of multivariate testing is that you can test broader changes than with A:B testing, and you can test multiple factors in a single go rather than having a long series of individual tests. The drawback is that you get only a net result of all changes, not a clear indication of which particular change had the greatest effect.
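To see why multivariate tests cover more ground (and why traffic gets spread thin), a small sketch with hypothetical factor names, enumerating every combination ("recipe") in a full-factorial test:

```python
from itertools import product

# Hypothetical factors under test: every combination is a candidate
# page, so the recipe count multiplies with each factor added.
factors = {
    "button_color": ["green", "orange"],
    "headline":     ["Save today", "Free shipping"],
    "order_flow":   ["one_page", "three_page"],
}

recipes = [dict(zip(factors, combo)) for combo in product(*factors.values())]
print(len(recipes))   # 2 * 2 * 2 = 8 recipes to split traffic across
for r in recipes[:2]:
    print(r)
```

Each recipe reports only its overall performance, which matches the drawback noted above: the net result of all changes, without a clear indication of which individual change mattered most.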