12. Evaluating Web Usability
Traditional usability testing seeks to measure various factors, including efficiency, accuracy, ease of learning, and repeatability. Web usability also considers the quality of experience. (EN: I wonder if that shouldn't be applied to other product development testing as well.)
Quality of experience is difficult to gauge, and the importance of user testing is paramount to verify research applied from other media, as well as to overcome assumptions and speculation based on opinion or the personal experience of a few individuals.
12.1 Traditional Usability Testing
Methods for testing include heuristic techniques (expert reviews and cognitive walkthroughs) as well as formal user testing (usability lab tests, human factors experiments, and prototype testing). In addition, usability evaluators have used ethnography, field evaluation methods, and data collection using questionnaires and interviews for task and audience analyses.
The goal of this testing is to determine where users may encounter difficulty, and to test improvements to determine their impact on usability factors (most notably, speed and precision).
12.2 Usability Testing for the Web
The author avers that traditional usability testing methods "apply equally well to the Web" and that, by those standards, the Web is "not very usable." His conclusion is that Web developers are ignorant of usability factors.
(EN: No support is provided to back the applicability of the methods to the medium; anecdotal evidence is presented regarding Web usability; and his conclusion is an assumption. I don't dispute that this is likely to be true, but he doesn't present his case very well at all.)
12.2.1 Web-Focused Issues and Testing
Some of the issues that differentiate the Web from traditional GUI testing are enumerated here:
Device diversity: There are significant differences in the user's experience depending on the computer (or other device) and browser software they are using to access a site. Test plans should include a variety of devices, systems, browsers, monitors, and connections.
User-controlled navigation: Because the user experiences the site inside the context of a browser application, there are browser-based controls (back/forward buttons, URL window, etc.) that enable the user to circumvent, to some degree, the navigation options presented in the context of the Web site itself.
Mental models of Web usage: Users perceive a Web site as a whole, rather than considering each page as a separate interface (Nielsen, 1997). This is distinctly different from software applications, where each window is considered to be a finite conceptual set.
Lowered switching costs: The user is able to leave a Web site and go to another with the click of a mouse (much easier than switching Word processing programs or computer operating systems).
Point of entry: A Web site is an open structure that can be entered at any point (via links/bookmarks), as compared to an application in which the programmer can control the user's initial entry when the application is launched.
Personalization: This is more of a practice than a quality, but personalization is much more extensively utilized on the Web than with GUI applications. It can be a blessing or a curse.
Primacy of content: GUI applications focus on functionality (enabling the user to accomplish tasks) whereas Web sites focus on content (enabling the user to access information). (EN: This is a dated comment, as many Web sites are transactional.)
Context of use: This is a bit vague, but I think the author means to convey that the purposes for which visitors enter a Web site tend to be more focused on a specific need than those of a person launching a software application. (My example: a visitor to a finance site is focused specifically on investment, whereas a person may launch Excel for a wide variety of purposes.)
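The device-diversity point above can be operationalized as a simple test matrix: cross the configuration axes and treat each combination as one test setup. A minimal sketch (the specific device, browser, and connection values are illustrative assumptions, not taken from the text):

```python
from itertools import product

# Illustrative configuration axes; the specific values are assumptions,
# not drawn from the text.
devices = ["desktop", "laptop", "tablet", "phone"]
browsers = ["Chrome", "Firefox", "Safari"]
connections = ["broadband", "3G"]

# Every combination of axes becomes one test configuration.
test_matrix = list(product(devices, browsers, connections))

print(len(test_matrix))  # 4 devices x 3 browsers x 2 connections = 24
for device, browser, connection in test_matrix[:3]:
    print(device, browser, connection)
```

In practice the full cross-product grows quickly, so a test plan might sample from this matrix rather than run every cell.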
12.2.2 Web-Specific Test Plan Issues
Enumerates some of the fundamental considerations for formulating a test plan:
Types of users to test: The use of a convenience sample (e.g., snagging passers-by in a shopping mall) is less productive than structuring the sample according to the target market demographics. It may even be counterproductive.
Number of users to test: One study (Nielsen, 2000) suggests using five representative users, and the author extends this to mean five users from each market segment.
EN: The margin of error in a five-person group is up to 40%, which is far too much variance for my taste. A standard sample size of 30 reduces the variance to 6%, which seems a better approach. Also, where multiple segments are tested, it might be good to provide weighted aggregate results, such that a segment representing 5% of the target audience doesn't receive the same weight as a segment representing 70%.
Location of test: While the preference is for formal laboratory testing, the author suggests that more accurate results can be obtained if the user can be observed in their "normal" context.
Tasks to use in testing: A Web site can be used to perform a myriad of tasks. The author suggests enumerating them and focusing on the "top" tasks first.
EN: There is no mention of what criteria are used to assess which tasks are more important. Do we survey the users to determine this? Does the business set the priorities?
Simulating the conditions of use: The author suggests basing tests on a real task rather than a simulated procedure when possible. For example, give the user a gift certificate to make an actual purchase (and keep the merchandise).
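The weighted-aggregation idea in the note on sample sizes above can be sketched in a few lines: each segment's result contributes in proportion to its share of the target audience. The segment names, shares, and success rates below are invented for illustration:

```python
# Hypothetical per-segment task-success rates and audience shares
# (all numbers invented for illustration).
segments = {
    "casual visitors": {"share": 0.70, "success_rate": 0.80},
    "power users":     {"share": 0.25, "success_rate": 0.95},
    "administrators":  {"share": 0.05, "success_rate": 0.60},
}

# Weighted aggregate: each segment counts in proportion to its share
# of the target audience, rather than counting equally.
weighted = sum(s["share"] * s["success_rate"] for s in segments.values())

# Naive (unweighted) average for comparison.
unweighted = sum(s["success_rate"] for s in segments.values()) / len(segments)

print(f"weighted: {weighted:.3f}, unweighted: {unweighted:.3f}")
```

Note how the small administrator segment drags the unweighted average down far more than its 5% audience share would justify.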
12.2.3 Web-Specific Evaluation Issues
The author provides a random collection of questions to ask. I don't think this is worth preserving, as there are better resources that do a more comprehensive job. The only thing worth noting is that these questions are geared toward very granular aspects of very specific interfaces.
12.3 The Process of Web Evaluation
Defines a four-stage process of evaluating a site: Usability evaluation and goal setting, early paper testing, storyboard testing, and interactive prototype testing, each of which is discussed in further detail below.
12.3.1 Usability Evaluation and Goal Setting
Two types of goals are defined: those that are absolute (find a given page within five minutes) and those that are relative (find the information in less time than on a similar site), the latter being critical for comparing a site with competitors or comparing a planned improvement to the current site.
Also stresses that the goals should be both concrete and measurable in order to be meaningful ("the process should be easy to use" is abstract, "75% of users should be able to complete the process without referring to contextual help" is measurable).
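A concrete goal like the 75% example lends itself to an automated check once test results are in hand. A minimal sketch, with invented per-user outcomes:

```python
# Hypothetical outcomes: did each tested user complete the process
# without referring to contextual help? (one True/False per user)
results = [True, True, False, True, True, True, False, True]

TARGET = 0.75  # the measurable goal from the text: 75% complete without help

completion_rate = sum(results) / len(results)
goal_met = completion_rate >= TARGET

print(f"completion rate: {completion_rate:.0%}, goal met: {goal_met}")
```

The abstract goal ("easy to use") offers no such test; the measurable one reduces to a single comparison.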
12.3.2 Early Paper Testing
Paper testing provides an image of a page (can be a sketch on paper) to ascertain basic facts: whether the user knows what kind of site it belongs to, or what tasks they feel they can perform based on the design. Of note, this looks at each interface in isolation.
The author suggests providing multiple sketches to the same person and asking comparative questions ("Which makes it easier to find X?"). This seems vague to me - better to show sketches individually and measure accuracy or speed of responses, then compare those mathematically.
EN: It occurs to me that if you test multiple pages on a single site, the user might have some carry-over knowledge that would skew later answers.
12.3.3 Storyboard Testing
Storyboard testing looks at pages in the context of other pages. It can be done on paper or electronically, but should present pages with "complete design details" - that is, including all the elements that will be visible in the final page.
The testing involves giving the user a specific task, usually related to navigating the site, and asking them to verbalize what they are thinking as they perform that task. Ideally, this captures the user's evaluation of each page and the decision-making process they follow when taking each action.
12.3.4 Interactive Prototype Testing
Interactive prototypes (probably not a paper process, by definition) provide users with a working model on which they may be tested to identify navigation problems and process bottlenecks.
The author suggests using expert reviews, with "experts" in two areas: subject matter and usability. My sense is that this will be of some utility, and will identify issues from a theoretical perspective - but an expert is not a good representative of the user (unless the site is being built for experts) and their feedback will be of a general nature, so it will lack both accuracy and specificity.
Another (better) method of testing is conducted using "potential users" (presumably, individuals who match the characteristics of the target audience) who perform specific tasks. There are two approaches to this testing: (1) Observation and measurement, which uses objective measurement (time, number of clicks, errors, etc.) and evaluates them statistically, and (2) "Think aloud," which asks users to verbalize their thought process to capture more qualitative data that cannot be identified by observational methods.
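The observation-and-measurement approach above reduces each test session to a few objective numbers (time, clicks, errors) that can be summarized with simple descriptive statistics. A sketch over invented session logs:

```python
from statistics import mean, stdev

# Hypothetical per-user measurements for one task on the prototype:
# (task time in seconds, number of clicks, number of errors).
sessions = [
    (42.0, 9, 1),
    (55.5, 12, 2),
    (38.2, 8, 0),
    (61.0, 14, 3),
    (47.3, 10, 1),
]

times = [t for t, _, _ in sessions]
clicks = [c for _, c, _ in sessions]
errors = [e for _, _, e in sessions]

print(f"time:   mean {mean(times):.1f}s, sd {stdev(times):.1f}s")
print(f"clicks: mean {mean(clicks):.1f}")
print(f"error rate: {sum(e > 0 for e in errors) / len(errors):.0%} of users")
```

The "think aloud" data has no such reduction - it must be transcribed and coded qualitatively, which is why the two approaches complement rather than replace each other.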
12.4 Frequently Asked Questions about Usability Evaluation
EN: Skipping this - the "FAQ" is a jumbled rehash of the information that's already been presented.