Usability Testing

The author's perspective is that usability testing should collect data on the users' success at specific tasks, speed of performance, and general satisfaction. He also recommends an iterative approach to test the effectiveness of changes made in the site.

(EN: He seems to be discussing usability as applied to a production site rather than part of the design process, which overemphasizes testing as a method of keeping score after the fact.)

Of note, the author indicates that methods such as heuristic evaluations and expert reviews tend to identify a "large number" of issues or problems that do not bear out in testing.

1. Use an iterative design approach (4:5)

The author defines the "iterative approach" as creating prototypes, testing them, and making changes based on the results before implementing them. The present site then serves as a benchmark against which future improvements can be measured.

EN: He cites various studies that support the value and success of this approach, but I am a little dubious. If a site does not have an iterative approach to design, how is it to be evaluated?

2. Solicit comments from test participants (3:4)

The author recommends having test participants verbalize their cognitive processes, and suggests there is no difference in efficacy whether this is done while the task is being performed (the "think aloud" approach) or after the task is completed (the "reflective" approach).

However, he notes that while the "think aloud" approach may gather more data, it slows users down (foiling performance measures) and has the net effect of causing them to consider factors they would overlook outside a laboratory environment (generating false feedback on inconsequential aspects of the site).

EN: The author does not indicate the reason for soliciting comments, just compares the methods. From other research, I surmise that the reason for soliciting comments is to capture qualitative data that explains the cause of phenomena measured by more objective and less intrusive means.

3. Evaluate a site before and after making changes (3:3)

Measurements before and after a change provide a direct comparison and a measure of the effect of the change.
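A minimal sketch of such a before/after comparison, assuming paired per-task measurements of a single metric (the function name and sample figures are illustrative, not from the source):

```python
def mean_change(before, after):
    """Mean per-task change in a metric (e.g. completion time in
    seconds) measured before and after a design change.
    Negative values mean the metric decreased. Data are illustrative."""
    assert len(before) == len(after), "measurements must be paired"
    return sum(a - b for a, b in zip(after, before)) / len(before)

# Completion times for three tasks, before and after a redesign:
# a negative mean change means the tasks got faster on average.
change = mean_change([42, 61, 38], [35, 50, 30])
```

In practice the same tasks and participant profiles should be used in both rounds, so the difference can be attributed to the change rather than to the test setup.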

4. Prioritize tasks (3:2)

Citing a 2004 study, the author suggests asking users to rate the difficulty of tasks before performing them, then again after performing them. The key category where improvement is needed is tasks that users assumed would be easy but turned out to be difficult.

EN: There are three other categories that aren't examined in detail: tasks perceived as easy that are easy (least important), tasks perceived as difficult that are easy (the task is fine, need to work on perception), and tasks perceived as difficult that are difficult (second category of importance).
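The four categories above can be sketched as a simple classification, assuming difficulty is rated on a 1-5 scale with a cutoff for "difficult" (both the scale and the threshold are illustrative assumptions, not from the study):

```python
def categorize_task(perceived, actual, threshold=3):
    """Classify a task by perceived vs. actual difficulty.

    Ratings are assumed to be on a 1-5 scale; values above
    `threshold` count as "difficult" (scale and threshold are
    illustrative, not from the cited study).
    """
    perceived_hard = perceived > threshold
    actual_hard = actual > threshold
    if not perceived_hard and actual_hard:
        return "critical: assumed easy, turned out difficult"
    if perceived_hard and actual_hard:
        return "important: known to be difficult"
    if perceived_hard and not actual_hard:
        return "perception problem: feared but actually easy"
    return "fine: easy and perceived as easy"
```

Running each task's pre- and post-test ratings through such a function yields a rough priority ordering for redesign effort.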

5. Consider frequency and severity (3:3)

In identifying problems, consider both frequency and severity: an issue that a small number of users consider to be a major obstacle is more critical than one that a large number of users consider to be a mere nuisance.
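One simple way to combine the two factors into a single priority score, weighting severity more heavily so that a blocker for few users outranks a nuisance for many (the formula and scales are an illustrative choice, not the author's method):

```python
def priority(frequency, severity):
    """Combine frequency (fraction of users affected, 0-1) and
    severity (1 = nuisance ... 4 = blocker) into a single score.

    Squaring severity is an illustrative weighting that reflects
    the point above: severity should dominate frequency.
    """
    return frequency * severity ** 2

# A major obstacle for 10% of users...
blocker = priority(0.10, 4)   # 1.6
# ...outranks a mere nuisance for 80% of users.
nuisance = priority(0.80, 1)  # 0.8
```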

6. Select the right number of test participants (3:4)

Using too few participants reduces the accuracy of results; using too many wastes resources.

The author suggests using a smaller sample at the outset, then increasing to a larger sample as the project nears completion.

EN: No specific suggestion is given as to a precise number. Consulting statistics, a test set of 30 is considered a good balance.
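The trade-off behind the EN note's rule of thumb can be illustrated with a standard normal-approximation confidence interval for an observed task success rate (textbook statistics, not from the source; the function name and figures are illustrative):

```python
import math

def ci_halfwidth(p, n, z=1.96):
    """Half-width of a 95% normal-approximation confidence
    interval for an observed success rate p with n participants.

    Illustrative only: the normal approximation is rough for
    small n or extreme p.
    """
    return z * math.sqrt(p * (1 - p) / n)

# With 5 participants, an observed 80% success rate carries an
# uncertainty of roughly +/- 35 percentage points; with 30
# participants it narrows to roughly +/- 14 points.
small = ci_halfwidth(0.8, 5)
large = ci_halfwidth(0.8, 30)
```

This is why a small early sample is adequate for surfacing problems, while firmer performance claims late in a project call for a larger one.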

7. Use the appropriate prototyping technology (2:3)

Paper-based prototyping appears to be as effective as computer-based (at a fraction of the cost and time) when trying to identify most usability issues, though it cannot be used to accurately measure performance (using an actual prototype is necessary for these measures).

The author recommends using paper testing early in the process, while major decisions are being hammered out, and advancing to working prototypes after the major decisions have been made and it's time for fine-tuning (EN: this should not make designers reluctant to go back to the drawing board if the prototype shows more than minor flaws).

8. Use inspection evaluation results cautiously (2:4)

This class of testing includes heuristic evaluations, expert reviews, and cognitive walkthroughs to generate information.

Studies show that these kinds of evaluations are highly unreliable. When measured against test results, 56.5% of the "problems" identified by these methods turned out to be false positives (identifying issues that aren't really issues causes a lot of wasted and possibly counterproductive effort), and 33% of the problems encountered by actual users were entirely overlooked.

Evaluations were more accurate in identifying errors in a single screen, viewed out of context, than they were at identifying errors in a process that requires the user to click through a sequence of screens.

9. Recognize the "evaluator effect" (2:4)

This effect describes the tendency of individual evaluators to focus on specific problems (area of expertise or experience) to the exclusion of others, and to evaluate based on opinion alone.

EN: An "expert" evaluator seems, in the end, to be little better than a user with no expertise, strong opinions, and an attitude problem.

10. Apply automatic evaluation methods (1:3)

There are software programs that can be used to identify common problems, such as the load speed of pages, broken links, reading comprehension level, proper use of code to promote accessibility, etc. These should be used to save wetware testers the time, frustration, and distraction of catching errors that could have been identified by other means.
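A minimal sketch of one such automated check, a broken-link scan over a page's HTML, using only the Python standard library (the page content and URLs are hypothetical; real tools also check load speed, readability, and accessibility markup):

```python
from html.parser import HTMLParser
from urllib.request import urlopen
from urllib.error import URLError, HTTPError

class LinkExtractor(HTMLParser):
    """Collect href targets from <a> tags."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def find_broken_links(html):
    """Return links in `html` that fail to load
    (malformed URLs, network errors, or HTTP status >= 400)."""
    parser = LinkExtractor()
    parser.feed(html)
    broken = []
    for url in parser.links:
        try:
            with urlopen(url, timeout=10) as resp:
                if resp.status >= 400:
                    broken.append(url)
        except (URLError, HTTPError, ValueError):
            broken.append(url)
    return broken
```

Running such a scan before any human testing keeps sessions focused on genuine usability questions rather than mechanical defects.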

11. Use cognitive walkthroughs cautiously (1:4)

A cognitive walkthrough consists of asking users to describe the approach they would take to solving a problem or accomplishing a task. Studies have shown this method to be only 25% effective; it resembles other evaluative methods (see 8, above) in inventing false problems and missing real ones.

EN: In my own experience, I've noticed subjects who are far more deliberate in their approach to solving problems in theory than they are in practice. That is, there is a major disconnect between what a person says they would do and what they actually end up doing - it shouldn't take a lot of research to "prove" this.

12. Choose a testing venue (1:4)

Studies demonstrate that testing in a laboratory environment and testing in a remote environment do not yield substantially different results in terms of task completion ratio, performance measures, or satisfaction scores.

EN: It's not clear what is meant by a "remote environment" - whether this pertains to testing in a natural environment (users in their office, home, or other location of their own choosing) or a remote testing environment (setting up a computer in a booth) - further investigation is necessary when making a decision pertaining to testing venue.

13. Use severity ratings cautiously (1:4)

Severity ratings by usability specialists (expert opinions as to which problems are most severe and merit the most immediate attention) are specious: there appears to be little to no consensus among experts as to which kinds of problems are the most severe, and they have little correlation to the assessments made by users as to the actual difficulty imposed by a problem.

Neither experts nor users seem to be particularly good at assessing severity when their assessments are weighed against actual performance measures. Experts misjudged the level of severity (rating a problem as more or less severe than its actual impact on performance) 72% of the time, while completely unqualified users misjudged severity 78% of the time.