jim.shamlin.com

5 - Human Error or Bad Design?

Reports indicate that between 75% and 95% of industrial accidents are caused by "human error." The author feels this is terribly wrong - he could accept a figure closer to 5% - because to suggest that the vast majority of people are inattentive, clumsy, and inept is itself evidence of something flawed by design. And he means this quite literally.

It seems the only time engineers are willing to accept blame for the failure of their devices is when an incident occurs while the item is untouched by human hands. When a bridge collapses with no traffic upon it, it's clear there was a flaw in the design - but if even one person is walking across the bridge, engineers are quick to suggest that something that person did caused the catastrophe.

It is also because the human user is the least-considered part of the equation: anything the device cannot do on its own becomes a burden placed on the human operator - and it is ascribed to "human error" when the operator fails to do, in precisely the right way, whatever the engineer demanded of him.

In fairness, engineers began to consider the physical capabilities (and limitations) of human beings decades ago. Frederick Taylor's principles of scientific management are largely based on observing the strength, speed, and stamina of human workers in industrial jobs and designing tasks to suit their capacities. The mental capabilities of human beings, however, remain greatly misunderstood and largely ignored. (EN: Hugo Munsterberg raised the very same point shortly after Taylor, but his work garnered little attention.)

Physical limitations are easy to understand because they can be physically demonstrated: you can tell that the handles on an item are too far apart simply by asking a person (or people of different heights) to grasp both of them at once and witnessing that they are unable. It is much more difficult to assess mental capacities in a convincing way: you can conduct usability tests showing that people are unable to figure out which switch operates which system, but observers can simply suggest that the test participants are too stupid to figure out what is obvious to them.

Testing alertness is even more difficult, because a "test" situation lasts a few hours at most, the participants know they are being observed at all times, and they are generally isolated in a lab environment. This does not match the situation of a worker who performs the task unsupervised, during a shift lasting 8 or 12 hours, while being distracted by other tasks and things in his environment, and under pressure to perform.

(EN: It's become even more obvious to me when testing cell phone applications. In the lab, users are in a quiet environment, no distractions, no time pressures, holding a cell phone in both hands while sitting comfortably. In the real world, the application is used by a person while standing, holding the device in one hand, in a noisy environment, and distracted/interrupted because they were waiting for something else to happen. It's not at all a realistic test.)

He mentions discussions with people who, in a relaxed mood, plainly admit that they make stupid mistakes - forgetting to turn the stove off, nodding off at their desks, not paying attention - yet these same people are highly critical of others who commit "human errors" by failing to be fully alert and attentive at all times. It is simply an unreasonable expectation.

Understanding Why There Is Error

The most common reason for an error is a poorly-designed task. We expect people to behave in ways that accommodate our designs, rather than designing to accommodate the way in which people behave. We expect too much of them (intense concentration and alertness over long periods of time) or make mistakes about their environment (that it will be as quiet and serene as a testing lab).

Another serious problem is the way people approach errors: their first instinct is to attempt to avoid being blamed. They are afraid that if they accept the blame for an error, they may be fired or sued or punished in some other way. So they look to place the blame on someone else, or make someone else responsible for making sure it never happens again. And in being so interested in the blame, we totally miss the real cause of the problem, and fail to find a real solution.

ROOT CAUSE ANALYSIS

As the name implies, "root cause" analysis seeks to identify a single cause for an accident. The root cause is something that sets in motion a chain of events leading to a disaster, such that we believe that if that one thing had not happened, the disaster would not have occurred.

In engineering, analysis takes as an assumption that the machine works a certain way - and if the mechanical operation of the device matches the way in which it was intended to work, and if no defects are found in the equipment, then the problem must have been the human operator.

One significant problem is that engineers are simply looking to scapegoat human beings for flaws in their design. As soon as we find a human being to blame it on, the analysis stops. If the user failed to press a button to release excess pressure, then it's the person's fault - even if the machine provided no gauge or indicator to tell him that the pressure was dangerously high. And where a machine fails because of a broken part (clearly a mechanical defect), we then say it is someone's fault for not inspecting and maintaining the machine properly.

The author offers up the example of the F-22, a fighter aircraft that had been involved in a number of crashes. There was no apparent mechanical cause for the problem, and it was ascribed to "pilot error" in that the pilots were shown to have made a variety of mistakes in different incidents. It wasn't discovered until years later that the pilots were suffering hypoxia - they were not getting enough oxygen through the plane's ventilation system and were becoming incapacitated.

Another problem with root-cause analysis is that it presumes only one thing is to blame for an accident, when in reality many things may have gone wrong. Many accidents are caused because multiple things went wrong at the same time - and while the designers may have taken care to provide a remedy if one or another thing happened, they did not consider how a person might be able to react if both happened at the same time.

THE FIVE WHYS

The author refers to an annoying investigative technique called the "five whys" - which encourages investigators, once a reason is found, to keep asking "why" of each answer rather than stopping at the first explanation.

The author acknowledges that five is an arbitrary number - in some instances it may be too many, in others too few. But it is nonetheless an effective technique for getting closer to the root cause of a problem - in spite of the fact that it falters when there are multiple contributing factors.

BLAMING THE USER

The author has observed that most people stop investigating once they have found someone (a person) to blame - and that in most instances, it is fear of punishment that causes them to seek someone else to lay the blame upon.

He also mentions the egomania of engineers in general. Their job is to fix the problems of the world, and many feel that they can do no wrong. Their inventions are perfect and do not need to be changed; the users must change their behavior to accommodate their inventions.

This also tends to correlate with the readiness of individuals to accept blame when dealing with technologies they don't understand. When a person who is unsure of themselves on a computer does something that generates an error, they are quick to say "it's my fault." When a user who is willing to accept blame is in conflict with an engineer who is unwilling to accept it, the result is obvious.

It often takes an objective outsider to assess fault. Did the user fail to notice a problem, or did the device fail to alert them to it? Did the user do something wrong or did the device misguide him?

A good system should provide constraints to prevent the user from making errors at all. A bad system enables and even encourages users to do things that create errors - or provides no clear indication of what is expected of the user, or places too much burden on the user to know or learn.

UNREPORTED ERRORS

We cannot fix problems unless we are aware they exist - and we cannot be aware that they exist until people report them. Unfortunately, the standard practice of blaming the user has led people to attempt to cover up their errors, deny them, or at least not admit them and hope no one else notices.

Another reason for unreported errors is that people don't understand how things work. When asked what went wrong, they are at a loss to describe it exactly.

TINKERING AND EXPERIMENTATION

Engineers want to control the people who use their devices, directing their every action as if they were part of the machine. But this is not human nature. People are creative and exploratory beings, who enjoy figuring things out (so long as they are successful at doing so) and dislike being told what to do.

As a result, people don't read instructions. They attempt to figure things out, and will not look to the documentation until something has already gone wrong. And so, the devices we provide to people should take into account their exploratory nature, providing indicators and constraints that can be figured out along the way, rather than expecting them to train in advance for most tasks.

People also tend to want to discover innovative ways to do things - finding new or more efficient ways to use a device by ignoring the intentions of the designer. (EN: While this is a fact, I would dispute that the designer should, or even can, develop a device to accommodate this - as it's likely not possible to predict the unusual ways in which someone might try to hack a device.)

DISTRACTIONS AND PRESSURES

The user's natural habitat - which is to say, the situations in which a device is actually going to be used - is not a quiet testing lab in which there are no distractions and no stress.

The world is full of things that will distract a person from a task, or cause them to turn away "just for a moment" from a running machine without switching it off. As has been stated, it is not at all reasonable for an engineer to expect that the user will give complete and constant vigilance to his device.

Time stress is another factor: users may be under time-pressure to complete a task - either because the task itself has to be done quickly, or because they need to complete it and then jump on something else. Particularly in commercial scenarios, people are pressured to do as much as possible as fast as possible, as well as to press forward and get things done even when it is clear that doing so will compromise the quality of the outcome and even endanger the people involved.

Deliberate Deviations

Errors are not always the result of neglecting to do things according to the instructions, but can also arise when people deliberately ignore the instructions. They believe that they can adjust the efficiency or effectiveness of the outcome by doing things differently - whether this means they want to do things better, or are willing to accept a less-than-perfect outcome for the sake of efficiency and expediency.

Consider the case of speeding: people will drive faster than the posted limit when they do not believe that the limit is reasonable. In certain situations, such as a speed limit for a curve, they may recognize that the engineer is making a suggestion for the sake of safety, but believe that they or their vehicle is capable of safely making the curve at a faster speed.

(EN: It doesn't help that limits are set too low. The fact that they were able to take a curve rated for 30 mph at 35 emboldens them to ignore warnings and try going even faster next time, as well as to ignore warning signs on other curves given that the posted warning was wrong. The idiotic practice of "let's say 30 knowing that people will drive 40 anyway" is self-defeating.)

In the workplace, rules and procedures often become far more convoluted than necessary. In many instances workers are slowed down to accommodate those who wish to monitor and control their work - such that paperwork takes longer than the task itself. Workers soon learn that if they didn't bend or break the rules, they would get very little done.

And worse, the pressure to meet quotas and goals compels them to ignore the rules - they will be punished either way. Worse still, they may be rewarded or see someone else rewarded for behavior that accomplished results by breaking rules.

In other instances, rules are ignored when the user perceives there to be a special situation: rolling through a red light because it's 2 am and there are no other cars on the road, or speeding because they are late for an appointment. Accidents happen when they are not sufficiently attentive (they merely assumed there were no other cars on the road) or when the "exception" behavior becomes the norm.

In general, violations are a valid form of human error - but it must always be questioned whether the "rules" are communicated clearly and are sensible and reasonable.

Slips and Mistakes

In attempting to classify things that can validly be ascribed to human error, the author found there to be two categories that cover the vast majority: slips and mistakes.

This doesn't necessarily mean the user is at fault - merely that the user did not do what the device designer wanted them to do. Whether the designer's desires were valid, reasonable, and clearly communicated is still subject to debate.

A slip occurs when the user intended to do the right thing but ended up doing something else; the author further categorizes slips (as capture, description-similarity, memory-lapse, and mode-error slips) in the sections that follow.

A mistake occurs when a person fails to take the proper action, for which the author provides three bases: rule-based, knowledge-based, and memory-lapse mistakes, each discussed later in the chapter.

Where it is unclear whether an error is to be considered a slip or a mistake, it helps to consider that mistakes occur when planning to do something, whereas slips occur during the process of taking action.

The Classification of Slips

Most everyday errors are slips, in which a person means to do one thing but finds themselves doing another.

He mentions verbal slips - which are routinely experienced when people are attempting to speak too rapidly, or utter a phrase that requires a bit of lingual acrobatics, and end up saying the wrong word or even garbling a phrase. He mentions that Sigmund Freud made much ado about such verbal slips - probably too much ado.

An interesting observation is that experienced people tend to slip more often than novices. The reason for this is that novices tend to pay closer attention to the task they are doing, whereas skilled people often perform tasks in a more automatic/unconscious manner and may fail to pay sufficient attention to what they are doing.

CAPTURE SLIPS

A capture slip occurs when a person has done one repetitive activity and then switches to another that has a nearly-identical action sequence. Typically, a more familiar or recently-performed sequence will replace a new or less familiar one.

For example, if taking an inventory of items that are packaged in pairs (counting 2-4-6-8-10) and then switching to items packaged in sets of three (counting 3-6-9-12-15), it would not be entirely unpredictable for a person to blend the two (3-6-8-10-12) and bungle the count. Simply stated, similarity of actions should be considered when designing a task or device.

Another common example is driving a rental car whose headlight and windshield-wiper controls are in the opposite positions from those in your normal car - such that when it begins to get dark, you reach over and turn on the windshield wipers.

DESCRIPTION-SIMILARITY SLIPS

Description-similarity slips occur when affordances look alike to a person - when two switches that do entirely different things look the same, people will often mistake one for another, particularly when acting out of motor memory.

In such instances, consistency works against the user, and a stronger visual distinction must be made where there is potential for error. Consider that near many sinks, the light switch and garbage disposal switch look the same and are placed side by side - resulting in disaster when someone reaching to switch on the light, to better see something that accidentally slipped into the drain, flips on the disposal instead.

The items don't necessarily need to be identical to be mistaken for one another at a glance. The author mentions a former student who had the problem of accidentally tossing his clothing into the toilet because his laundry basket was round and white. Even though the two were in different rooms, there was enough similarity that when the student removed a garment near something round and white, he reflexively tossed it in.

Consider proximity in more dimensions than physical space when designing to avoid similarity slips: proximity in time (one thing is done shortly after another), proximity in function (they do things that are similar), etc.

The author suggests that this is part of the reason airplane cockpits are so difficult to learn - every control is made to look significantly different from the others around it, even those with similar functions. In this instance it is deliberate: it could be disastrous if two things looked the same (if the switch to raise the landing gear looked like the switch to lower the wing flaps, pilots might mistake one for the other).

MEMORY-LAPSE SLIPS

Memory lapses are very common errors, and the author lists a few of them:

Some of the various causes of memory lapse are listed:

In general, memory lapses reflect a lapse in short-term memory: there are a very limited number of slots and the data held in them is highly perishable.

There are various methods for people to avoid memory lapses (writing things down, having a checklist, setting an alarm, etc.) but there are also instances in which a machine can be designed to remind a person of something they neglected to do (the chime that sounds if a car is starting and the seat belt is not fastened) or to avoid common lapses (an ATM that allows the user to swipe his card rather than insert it).

MODE-ERROR SLIPS

A mode error occurs when a device has different states of operation and the user does not observe (or makes a wrong assumption about) the mode in which the device is presently set. Common examples include stepping on the gas when the car is in reverse rather than drive, attempting to use a "universal remote" to turn on the television and turning on the stereo instead, trying to pop corn in a microwave that's set on "thaw," and the like.

The author finds this to be an increasing problem as electronics proliferate and designers want to provide the "convenience" of a single controller for many things, to imbue one device with multiple functions, or to create a wider array of settings for a single device.

From an engineering perspective, it seems very simple and elegant to create a control panel with a dial and a button (set the dial to a number and then press a single button) rather than a panel with ten different buttons, and it certainly saves on the cost of the hardware and wiring. But it can become very complicated for the user to figure out how to use the result of this shortcut.

In general, a device with "modes" is unnecessarily difficult to use and increases the number of errors. It's particularly problematic when there is no indication of which mode the device is in, or when the indicator is so subtle it can be easily missed. It can be made worse when the user doesn't perceive a difference between the modes (for example, equalizers with presets of "jazz" or "rock" music - few people can tell the difference, and demanding audiophiles insist on controlling the settings themselves). The problem can be made worse, still, if the device automatically changes modes sometimes.

The clock is a very old example of a device whose modality is poorly considered, because the same 12-hour dial is used for both AM and PM - so the user must know (from some other source) whether it is before or after noon to know which "mode" the clock is in. This flaw was carried over into digital clocks, which also show 12-hour cycles, with a small LED to indicate AM or PM (with some clocks, "on" means PM and "off" means AM, and with others it is the opposite). One of the most common problems with these clocks is being unable to set them correctly - you meant to set an alarm for seven in the morning and it goes off at seven at night instead.
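
As a rough illustration of why this mode is so treacherous (the class and field names below are invented for the sketch, not taken from the author), the same displayed "7:00" corresponds to two different real times depending on a flag the user may never notice - whereas a 24-hour setting has no hidden mode at all:

    # Sketch: a 12-hour alarm clock stores a hidden AM/PM mode.
    class TwelveHourAlarm:
        def __init__(self, hour, minute, is_pm):
            self.hour = hour        # 1-12, as shown on the display
            self.minute = minute
            self.is_pm = is_pm      # the "mode" - often just a tiny LED

        def fires_at(self):
            # Convert to 24-hour time to see what the alarm will actually do.
            return (self.hour % 12 + (12 if self.is_pm else 0), self.minute)

    wanted = TwelveHourAlarm(7, 0, is_pm=False)   # seven in the morning
    slipped = TwelveHourAlarm(7, 0, is_pm=True)   # what the user accidentally set
    print(wanted.fires_at(), slipped.fires_at())  # (7, 0) versus (19, 0)

    # Both displays read "7:00"; the alarms differ by twelve hours. A 24-hour
    # display ("07:00" vs "19:00") shows the entire state, so nothing is hidden.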

And while he's on a tear, he gripes about watches. A watch may function as a clock, a stopwatch, and a timer; it may tell the time in multiple zones; it may enable the user to set multiple alarms. And it attempts to do all of this with four small buttons that must be pressed in various combinations in order to access these functions. People who own such watches either don't use all of those functions, or commit far too much information to long-term memory in order to learn how to use them.

He mentions an instance in which mode-error slippage became deadly: one of the models of Airbus airplane had an instrument that assisted pilots in landing the plane (or in this instance, assisted them in crashing it) - a single display enabled the operator to enter the degree of descent or the decrease in speed, in two different modes. Naturally, it was only a matter of time before a pilot made a mode-error slip and crashed a plane.

(EN: I looked into this and didn't find an account that matches this description, but it may exist buried in all the clutter. "The clutter" is a lot of accounts of Airbus crashes due to problems with instrumentation and controls - particularly of the 300-model plane that introduced a lot of technical sophistication to the cockpit.)

The problems with modes of operation are such that the author declares that any mode error "is really a design error." The fundamental error is having modes - at all. Various attempts by designers to overcome the inherent problems of modes have not been successful in making a bad idea work.

The Classification of Mistakes

Whereas slips occur in execution, mistakes occur in planning - or at least in the moment in which a decision to take an action is made. These are often classified as "human error" because a person made the decision to do the wrong thing ... but what led him to make that decision? Or more to the point, what failed to guide and enable him to make a good decision?

All decisions are made based on human judgment, and judgment has been divided into three modes: skill-based, rule-based, and knowledge-based.

In all three forms of decisions, the most common and serious mistakes are the result of misdiagnosis of a situation. Each relies on the assumption that taking action X will have result Y - based on experience (skill), procedure (rules), or extemporaneous reasoning (knowledge) - but an action will produce a result only under certain conditions, and all three approaches anticipate that the conditions will be "normal," hence failure occurs when the conditions are not as they are assumed.

RULE-BASED MISTAKES

"Rules" consist of procedures and instructions that guide people to take action without thinking. It is presumed that the user is a person of little experience or intelligence and must be told what to do - and that the individual or group that defines rules is capable of clearly communicating the correct action in a given situation. Both assumptions are often wrong.

As a result, there may be a number of rules-based mistakes:

Another mistake commonly made by rules-makers is assuming that everything will be normal. The author speaks of a nightclub fire that killed over 200 people in Brazil. Safety procedures for the pyrotechnics were followed to the letter (the procedures did not account for the low ceiling and acoustic tiles), exits were locked (standard procedure to keep people from leaving without paying their tabs), and the staff followed procedures that did not account for an emergency situation.

People generally follow patterns of behavior that have worked for them in the past - while this is not a documented procedure, their adherence makes it a self-imposed one. This becomes problematic when the situation is not adequately analyzed to recognize a difference. Hence, people still get into accidents because they do not brake properly - they use the gentle, pumping motion they were taught for standard brakes in vehicles with anti-lock brakes, where the vehicle manages the on-off motion and the driver should merely press the pedal firmly. Since hard braking is unusual, many drivers even misinterpret the vibration of the anti-lock system as indicating a malfunction.

Vagueness is a particular problem for rules-based procedures. Those who formulate rules will include phrases such as "when necessary" and assume that users will know when it is necessary to take the prescribed action. (EN: I have a sense this is done to avoid culpability - when things go wrong the rule-maker escapes blame by placing responsibility on the rule-follower.)

Complexity is another problem with rules-based procedures. Where the rule is written in convoluted language, often to provide specificity, it becomes so complicated that users cannot possibly understand what it actually means.

Memory is another common problem. Where rules are long or elaborate, the user cannot recall the right rule to follow when it becomes necessary - whether they forget the rule or the situation in which it should be applied. This is particularly problematic for emergency procedures, as emergencies very seldom happen and, when they do, the rules have been forgotten.

When rules have been committed to memory, it becomes very difficult to change them: people will remember the old rule-set and the changes will not register in their memory.

From a design perspective, advice is ...

Hindsight is always more accurate than foresight. There is much to be learned in the wake of an accident, where things did not go as expected - knowing what went wrong is necessary to preventing similar accidents in future. In many organizations, the troubleshooting team has no ability to inform the design team - which ensures flaws are perpetuated.

(EN: This is a very significant problem in our litigious society - in which the legal team often gets involved in writing product instructions in the wake of a lawsuit. This is never a good thing, as it results in manuals the size of phone books, counterproductive procedures, warning labels galore, and other foolishness that cause people to disregard rules, but which succeeds in giving firms the ability to blame users for any accidents that may occur.)

However, when procedures are augmented in the wake of accidents, they tend to become extremely convoluted and highly burdensome. A simple task becomes very complex when the user is made to behave in an unusual or seemingly unnecessary manner to avoid an accident that occurs only in rare situations. When this happens, the rules may be incomprehensible, or they will be forgotten, or they will simply be ignored.

KNOWLEDGE-BASED MISTAKES

Knowledge-based behavior takes place in novel situations, when an individual must discover a solution to a problem he has never before encountered by applying his knowledge of similar situations and general principles of action.

This kind of behavior tends to be slow and methodical, with a great deal of conscious thinking and contemplation. Where a user acts quickly and without thought, he is generally applying skills or following rules ... and not really thinking about what he is doing.

(EN: It's worth noting that knowledge is the basis of all action. Skills are developed over time, and a person applies knowledge or follows rules when doing an unfamiliar task. Rules are written by people who have knowledge, to convey it to those who do not.)

Knowledge-based behavior tends to follow a trial-and-error pattern - and as such any mistakes an individual makes are simply "errors" that seemed to be plausible actions, but which turned out to be unproductive in accomplishing his desired goal.

The best way to decrease knowledge-based mistakes is to either grant the user knowledge (documentation) or to rely on the knowledge the user has of similar tasks (make the steering mechanism of a forklift similar to that of a car).

The author speaks a bit wistfully of artificial intelligence and expert systems - as it has long been a goal to provide assistance to novices (or more aptly, to take away the task of thinking of a solution from them), but there are no particularly good examples of how this has actually been accomplished for complex tasks.

MEMORY-LAPSE MISTAKES

A memory lapse occurs when a person forgets to do something - they may forget to attend to a task, or they may remember the task but forget a step in the procedure.

The author suggests that a memory slip occurs in action, whereas a memory lapse occurs prior to taking action. (EN: The distinction seems a bit dodgy to me. Does it matter whether I forgot five minutes or five seconds before I should have done something?) He does concede that both mistakes and slips of memory can be addressed by the same design "cures" - instructing the user (in advance or in the moment) or constraining his actions.

SIDEBAR: CHECKLISTS

The author muses on the topic of checklists, which he feels to be powerful tools in guiding users: the presence of a checklist enables them to remember the steps in a task, and the items in the list offer the opportunity to include details and instructions.

He mentions that having a two-person team work a checklist is often more effective: one person reads the instructions and another performs the actions. A single person is more likely to miss items or decide to do things differently from the instructions as he proceeds. The person reading the checklist gives the list his entire focus and will question the logic of deviating from the instructions, such that the other person must consider their behavior more carefully. This is more effective when it occurs in real time rather than running through the checklist after the action has been taken.

However, adding more than two people to a task may make it less likely to be done right. The author suggests that the more people there are to check and inspect work after it is done, the less attentive a worker will be, because he does not bear responsibility for the outcome. The outcome is tragic when everyone thinks the same thing - assuming that someone else will catch their errors - and nobody is being particularly attentive to their work.

The author mentions that checklists have been highly successful in aviation - such that all flights use a "pre-flight checklist" that pilot and copilot run through before taking off. An interesting observation is that the standard procedure is for the junior person to read the list while the senior person performs the task - which is highly unusual, and can feel disrespectful to the senior person whose status seems diminished by having a junior person check his work - but in practice this proves to be more effective.

He also notes that there is a sense of personal pride in professional work, and being made to conform to a checklist is seen as demeaning - that they are for "other people ... but not me." For that reason, senior people are often fond of checklists, as it enhances their sense of authority and control over others to compel their peers and subordinates to follow a checklist they provided, and which they often ignore in their own work.

Checklist design, meanwhile, is extremely difficult to do well: the designer of a checklist must be able to envision a task (one which he may never have performed) and account for all possibilities and contingencies in creating a comprehensive list that is at the same time not burdensome or inefficient.

The checklist also casts every task into a sequence with a single thread of execution. When tasks can be done in any order, or even at the same time, the checklist forces them into an unnecessary order. It also makes optional steps seem required, and has very little fault tolerance: when a person is unable to complete a step, the task fails.

Users may work around these limitations, but this gives rise to memory-lapse errors. For example, if a plane is being fueled, the pilot must either pause and wait before checking off the "fully fueled" item on the preflight checklist, or check it off presuming it will be done (which violates the entire point), or skip it at the risk of not remembering to come back later.
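
One way to soften these limitations, sketched below with invented names and states rather than anything the author prescribes, is to let checklist items be optional or explicitly deferred - so a step that cannot be completed yet (such as fueling still in progress) is parked rather than checked off falsely or silently skipped, and the list refuses to call itself complete until deferred items are revisited:

    # Sketch: a checklist whose items can be optional or deferred, not just "done".
    class Item:
        def __init__(self, name, optional=False):
            self.name, self.optional, self.state = name, optional, "open"

    class Checklist:
        def __init__(self, items):
            self.items = {i.name: i for i in items}

        def mark_done(self, name):
            self.items[name].state = "done"

        def defer(self, name):
            # e.g. "fully fueled" while fueling is still underway
            self.items[name].state = "deferred"

        def complete(self):
            # Deferred items must be revisited; optional items may stay open.
            return all(i.state == "done" or (i.optional and i.state == "open")
                       for i in self.items.values())

        def outstanding(self):
            return [i.name for i in self.items.values()
                    if i.state != "done" and not i.optional]

    preflight = Checklist([Item("flight controls checked"),
                           Item("fully fueled"),
                           Item("cabin notes reviewed", optional=True)])
    preflight.defer("fully fueled")
    preflight.mark_done("flight controls checked")
    print(preflight.complete())     # False - the deferred item is not forgotten
    print(preflight.outstanding())  # ['fully fueled']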

Social and Institutional Pressures

Social pressure is an important factor that has a very strong influence on everyday behavior. Particularly when the user is an employee of a company, he is likely under tremendous pressure to meet performance goals, and if he succeeds at meeting those goals this year they will be set more aggressively in the following year, to the point at which catastrophes become inevitable.

Consider the logistics industry and truck drivers in particular. They falsify their logs so that they can drive more hours per day than the law allows (a law intended to ensure they get enough rest to be alert on the road), and they disable or circumvent devices that limit their speed (so they can make a delivery on time when schedules are inflexible).

There are many instances in which equipment is running for longer and faster than is safe, or when employees cut corners because safety regulations make it impossible to work as quickly as management demands. And very often, when the front line workers point out the danger to their superiors, they are threatened into following orders.

This is not entirely a workplace phenomenon; it happens in other situations as well. Consider that students often cheat on school examinations because of pressure to make good grades. It may be for fear of disappointing parents, or because they have been told that grades are critical to their future, or merely because they wish to avoid ridicule or ostracism from peers for making poor grades.

The author also mentions laziness and procrastination as the cause of problems. People run out of gas because they do not feel they have the time to stop when the gauge is getting low, or notice it but think they can put it off. Teens and even adults may shun the use of goggles or a safety helmet for fear of looking silly while wearing them - or because it boosts their self-esteem to be able to do without them.

He briefly mentions the self-reinforcing nature of stupidity. A person who leaves his house key under the welcome mat may think it is a perfectly safe and sensible thing to do because he hasn't (yet) been burglarized.

Economics is another factor. The author mentions SCUBA divers, specifically, who wear a combination of weights and air bladders to maintain their buoyancy in deep water. Divers are supposed to drop their weights before they emerge from the water, but many do not because the weights are costly to replace.

There is often little that design can do to overcome social pressures, and neither can training or awareness. It requires people to change their attitudes - because they will alter their devices and ignore their training if they feel pressured to do, or refrain from doing, something.

Reporting Error

When errors can be recognized and diagnosed, devices and processes can be improved by implementing changes that will decrease errors in future. Observing and collecting information about the occurrence of errors is critical to improving the design.

Unfortunately, not all errors are easy to detect. But even when they are, many people are anxious about making errors (or reporting the errors of others) because they fear being punished or ridiculed. Organizations are just as bad as people in this regard: they fear litigation or a diminishment of their esteem if they admit to making an error. Additionally, there is pressure in many workplaces to avoid or diminish errors, which increases the incentive to deny or cover up.

In all, the author believes that before we can worry about detecting and analyzing errors, the cultural attitude toward them needs to change to one of greater openness and acceptance - rather than seeking to hide or to punish, seek to learn.

The author mentions the attitude called "jidoka" promoted within the Toyota Motor Corporation, which encourages workers to pull a cord to stop the assembly line when they notice something is wrong. A team of experts is dispatched when this occurs, not merely to do what is necessary to fix the problem, but to investigate its cause and consider how it can be avoided. In this culture, people are punished for failing to report errors rather than for making them.

There is a Japanese term "pokayoke" which refers to measures taken to prevent errors from occurring, and also to the devices used to do so. A doorstop that prevents a door from being opened too wide (causing the knob to damage the wall) is a pokayoke, as is a label reminding the user that a given lock must be turned counter-clockwise to open the door, as is the cover that must be flipped open to press a button that cuts the power to a piece of equipment. The author notes that this blends the various techniques he discusses in this book, as some pokayoke are labels, others constraints, and still others reminders.

Detecting Error

If an error can be discovered quickly, it can be rectified to avoid or minimize harm. So how can we facilitate error detection?

First, errors can be detected only if there is feedback, and the more immediate the feedback, the more quickly the error can be detected. Consider an assembly line operation in which a batch of screws is not properly threaded: if the bad screws are noticed only when assembled products begin to fail, the feedback arrives far too late to be useful.

This gives good reason, and cost-justification, for the screws to have been inspected when they were received in the factory, before putting them into supply inventory in the first place. To proceed as if there never will be a batch of bad parts is plainly ignorant.

If the result of an action is not visible, it is much more difficult to detect an error. For example, a memory lapse error occurs when something that should have happened did not happen. In the previous example, we can see that bad screws were loaded into a hopper - but are not able to see when someone forgot to check the parts before sending them to the production line.

Other mistakes are difficult to observe because they are not systematic. Many mistakes happen only once and are not repeated - so by the time they are discovered one cannot observe how they came to be, and everything appears to be running smoothly (because it is running smoothly, and was even running smoothly when the mistake occurred).

Faulty diagnoses compound the problem of unobserved mistakes. In addition to missing the real cause of the problem, they lead to changes in a part of the system that was running fine to begin with. The error is not solved, and more errors may now be made as a result of acting on a bad diagnosis.

IGNORING MISTAKES

It's a psychological tendency for people to ignore mistakes when they see them - much as "selective hearing" causes a person to hear what they expect, so does "selective observation." Because they expect everything to be fine, they may not notice when something is going wrong, may be puzzled by it, or may assume there is something wrong with their senses rather than with what they observe.

Consider that when people hear a sound like a pistol shot, the first thing they think is that it is likely a car backfiring. Never mind that backfiring cars are extremely rare - it seems like a more plausible explanation for a loud sound. They do not wish to consider an option that makes them feel unsafe.

There is also the irrational belief that things will work themselves out. The worker who sees a single malformed product come off an assembly line assumes that there must have been something unusual, or the machine went off-kilter for a second and is probably still OK. He has to see several malfunctions before he will accept that there is something wrong with the equipment and what he witnessed was not merely a temporary glitch in the system.

THE CASE OF THE WRONG TURN ON A HIGHWAY

The author tells an anecdote of a family trip in which they took a wrong turn and ended up on a highway that went to Las Vegas rather than to their destination. They noticed a number of billboards advertising casinos and found it curious that they would be advertising so far away to people driving in the opposite direction - and it was nearly two hours later when they stopped for gas that they recognized they had taken a wrong turn over 100 miles before.

The moral to the story is that people generally assume that they have done the right thing and are proceeding on track to the right destination - and when confronted with signs that they are headed in the wrong direction, they assume the signs are wrong and continue on their merry way. It takes a dramatic indication for them to accept, or even to consider, that they have done something wrong.

IN HINDSIGHT, EVENTS SEEM LOGICAL

The author refers to a psychological study (Fischoff) which contrasted two groups of participants: one group was given only the information known before an event and asked to judge what would happen, while the other was told the outcome and asked whether it could have been foreseen - and those judging in hindsight found the outcome obvious, while those judging in foresight did not.

This is a natural response, as we count on experience of the past to guide us in predicting what we should expect of the future, but it has a number of drawbacks - chiefly, that we expect the future to be like the past, even if the past was abnormal.

This is the basis of phobias: a person who was once bitten by a dog believes that all dogs are going to bite him, or at least assess the risk that any given dog will bite to be greater. A person who was struck by a car that ran a red light will have heightened anxiety about crossing that street. This also underscores the tendency toward pessimism - as a person who has crossed the same street a thousand times without incident develops anxiety because of what happened once. He may even admit that it is highly improbable, but he fears it.

Insofar as design is concerned, this points to a weakness in planning contingencies. Our first assumption is that the user will do everything correctly and get the outcome he intended. If we try to imagine what might happen in future, our predictions are no better than random chance. If we base our analysis on what happened in the past, we will give undue weight to improbable scenarios that rarely occur.

The author's only advice is to be careful and invest time in doing proper analysis. He specifically mentions news reports and speeches - journalists, politicians, and executives make hasty analyses and make definitive statements based on very little evidence and superficial analysis. As designers, we should recognize that panic is the cause of bad decisions, and exercise patience.

Designing for Error

It takes no skill to "design" a process when you can assume that the user will do exactly what you want them to do in interacting with your device. A better approach is to consider what people might actually do, and to design not only for the actions that reach the goal directly, but also to accommodate choices a user might make that do not lead to success.

Consider the flow of conversation between people - it is seldom like the contrived script of a play in which every statement flows naturally and predictably from the one before and people efficiently progress through a series of exchanges to a desired outcome. Much of conversation is error-handling: we misinterpret things and have to be brought back on track, or need to correct what we've said to keep the other person on track, or find something that they say to be questionable and debate - and this is all still necessary even when we have made no grammatical errors (which are often irrelevant because we listen for meaning and often do not notice minor mistakes in speech).

While machines and computers are very good at responding consistently to commands, they often require the human user to learn to speak their language or to perform an action that feels unnatural or awkward in order to get the device to do what was wanted.

Because machines are not intelligent enough to understand people, people are expected to adapt to machines. However, this is not entirely true: a machine was built by a human being to be used by other human beings. So this is tantamount to stating that the people who build machines are not intelligent enough to understand the people who use them - and so users must adapt to compensate for the incompetence of the builder. In that sense, the purpose of design is to bring a little intelligence to the way in which products are built.

Critical to making devices more intelligent is making them conversational - giving them the ability to interpret what users want, and to communicate and react to them with some consideration of the way in which people are naturally inclined to act, including the possibility that they may be inclined to do the wrong thing when they wish to achieve that goal.

There is some indication that those who build machines are being made to take responsibility, largely through consumer lawsuits. When a medical technician overdoses a patient with an infusion pump, the manufacturer of the pump does not escape liability simply because the technician could not make sense of the awkward and haphazard design of their device. Manufacturers are expected to make their devices understandable and to provide warnings and failsafe measures should someone attempt to do the wrong thing.

WARNINGS AND ERROR MESSAGES

Error messages are common in computer systems, and the degree to which users have become familiar with error messages is a testament to how often errors occur as a result of bad design. However, even non-computerized devices have the ability to indicate an error occurred: it is almost universally recognized that a flashing red light means something has happened - generally something bad.

The problem is that "something bad happened" is completely insufficient. It doesn't tell the user what went wrong, how they can continue with what they wanted to do, or how to avoid making the same mistake in future. A good error message does all of these things - because the only thing that matters to the user is getting the task done.
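
A minimal sketch of what that implies in software (the structure and field names below are my own, offered only as an illustration): the error is reported as something that carries all three pieces of help, not as a bare code or a blinking light:

    # Sketch: an error report that carries all three pieces of help.
    from dataclasses import dataclass

    @dataclass
    class HelpfulError:
        what_happened: str      # what went wrong, in the user's terms
        how_to_continue: str    # how to still get the task done
        how_to_avoid: str       # how to keep it from happening again

    err = HelpfulError(
        what_happened="The document could not be printed: the printer is out of paper.",
        how_to_continue="Add paper to tray 1, then choose 'Resume printing'.",
        how_to_avoid="A low-paper warning can be turned on in the printer settings.",
    )
    print(err.what_happened)
    # Contrast with the typical equivalent of a flashing red light - "Error 0x3E7" -
    # which tells the user nothing about how to finish their task.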

A flashing yellow light is even worse. It means something bad is about to happen - but gives no indication of what it is or what the user might do to avoid it. All that a flashing yellow light does is cause the user to experience anxiety and feel helpless.

The author encourages designers to regard these situations as contingencies, not as errors, and help users to do what is necessary to achieve their goals. With luck, the user will come to recognize it as the same thing - an additional task to accomplish their goal, not a barrier to doing so.

Another common problem with warnings, particularly audio ones, is that they are not sufficiently unique. He mentions the control board of a nuclear power plant, the cockpit of an airplane, or a surgical theater where there are many devices. If everything makes the same (or similar) sound to issue a warning, there is a period of uncertainty as to what needs attention. And if two alarms go off at the same time, there is no sense of which needs to be attended first - it's merely a competition that distracts and confuses the user.

Unnecessary alarms are another design flaw: when a machine "cries wolf" the user ignores it when there is a real problem. In many instances, the author has seen that users disable warnings - disconnect the alarm, unscrew the bulbs, pad the bells, and otherwise attempt to disable warnings that are more often an unnecessary distraction than a real warning.

As such, the design of warning signals is a very complex and delicate matter. You must decide when the user's attention should be called (and not too often), create a stimulus that is adequately intense to attract attention (but not too intense), enable the user to recognize what has gone wrong, and inform the user of what must be done to continue.

There's a brief mention of "machine speech" in which alerts are delivered in a simulated (or recorded) human voice. That seems like a good idea, because "the printer is out of toner" is certainly more informative than a flashing yellow light. But the same concerns apply: a machine should not talk too often, say things that can't be understood, speak too softly to be heard in a noisy environment, etc.

THE CAUSE OF ERRORS

The author regurgitates some of the previous material on mistakes and slips: whether the user chose to do the wrong thing or tried to do the right thing but failed in the process. While many engineers prefer to stop with "the user did something wrong" the truth is that the machine generally encouraged him to do something wrong, or failed to provide sufficient indication of what he should have done that would have been right.

(EN: I've used a conversational model to illustrate this, and have had some success. However, much of the conversation is nonverbal. "I have three buttons. Guess what they do," is a line a machine "tells" the user simply by having three buttons. "I think the first button does what I want" is what the user "tells" the machine by pressing the button.)

ADDRESSING INTERRUPTIONS

A major source of errors is interruption - a person who was in the process of doing something is distracted for a time by something else and forgets where he left off. He probably remembers what he was trying to do (though he may forget it entirely) but does not know how far along he was in the task and where he left off.

A task that is designed to require a user's constant attention is virtually guaranteed to fail if there is an interruption. The user is required to remember exactly where they were in the process and what they needed to do next. Aside from the limitations of short-term memory, this is an unreasonable expectation because most interruptions are sudden and unexpected, and the user doesn't have time to make a mental note of their progress.

A machine that only provides information about the present status does not do enough - particularly because the indication of status tends to be rather vague: "running" or "waiting" does not indicate what has been done and what remains to do. There's no reminder of what the goal was, what was already done, and what the next step happens to be.

One suggested approach is to consider what might happen if a person were to walk away from the task, and ask someone else to "finish this for me" - can the person who stepped in to help out see what needs to be done? Do they have an indication of the goal? Do they have an indication of the status?
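
Read as a design requirement, this might look something like the sketch below (class and step names invented for illustration): the device keeps the goal, the steps already completed, and the next step, so that after an interruption either the original user or a helper can see exactly where things stand:

    # Sketch: task state that survives an interruption.
    class ResumableTask:
        def __init__(self, goal, steps):
            self.goal = goal
            self.steps = steps
            self.completed = []

        def do_next(self):
            if len(self.completed) == len(self.steps):
                return                          # nothing left to do
            step = self.steps[len(self.completed)]
            # ... perform the step here ...
            self.completed.append(step)

        def status(self):
            remaining = self.steps[len(self.completed):]
            return {"goal": self.goal,
                    "done": list(self.completed),
                    "next": remaining[0] if remaining else None}

    task = ResumableTask("replace printer toner",
                         ["open cover", "remove old cartridge",
                          "insert new cartridge", "close cover"])
    task.do_next(); task.do_next()
    # After an interruption, anyone can ask the device where things stand:
    print(task.status())
    # {'goal': 'replace printer toner',
    #  'done': ['open cover', 'remove old cartridge'],
    #  'next': 'insert new cartridge'}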

The author refers to multitasking, in which people deliberately try to do several things at the same time, which is erroneously believed to be a more efficient way of getting things done. The problem is that multi-tasking isn't actually doing two things at once, but switching attention from one thing to another: you start to do task A, then switch to task B, then go back to A, then go back to B, etc. Each time you switch, you must remember where you were in the task you are returning to - and more, you have to keep the steps of the two tasks separated, so you don't perform a step in task A when you return to task B.

(EN: There have been a number of psychological experiments in this area which come to the same conclusion - that it is more efficient and effective to do A from start to finish and then to address task B rather than an ABABABAAB kind of pattern. Still, some people are not convinced and believe that they are capable of multitasking well.)

(EN: Another thing that the author fails to mention is that processing time encourages people to multitask. If they have to wait five minutes for the device to do something, that's five minutes they can spend doing something else. It seems sensible, though it's touchy. For example, you can safely vacuum the living room floor while waiting for the washing machine to run - but you should not try to vacuum the floor while waiting for the stove to heat oil to frying temperature. I'd agree that this is humans being stupid and arrogant, but designers should consider whether a device's lag time encourages this behavior and what might be done to prevent disaster if the person fails to return soon enough.)

Interruptions and multitasking can be disastrous in critical moments. The FAA identified that many of the accidents that occurred during takeoff and landing were because the pilot was distracted or attempted to do too many things at one time. Creating a process that was linear and insisting that the pilot be undisturbed during takeoff and landing (except in case of emergency) significantly reduced risk.

This can also be practiced in real life. The author mentions that for family trips, there is a "rule" that conversation in the car must stop when the driver is attempting to merge onto a busy freeway or driving in traffic - and can resume when the period of risk has passed. (EN: It's for the same reason many people will switch off the radio when driving in dangerous conditions - to prevent being distracted.)

ADDING CONSTRAINTS TO BLOCK ERRORS

The author previously mentioned constraints that prevent the user from doing the wrong thing - such as making one prong of an electric plug (and socket) slightly larger so that it cannot be plugged in backwards.

Some constraints merely make it more difficult to do something incorrect. For example, it would be an easy matter for automobile manufacturers to provide a single row of ports for adding fluids (brake fluid, washer fluid, radiator coolant, transmission oil, etc.), but they have not done so because this would increase the risk of putting the wrong fluid into a given reservoir, which could be disastrous.

It's suggested that control panels often keep controls for a given operation grouped together, but also separated from other operations. Also, computer programs will often hide or disable controls that are not necessary to the present step of a task.
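
As a small sketch of that last point (the step and control names are invented, not from the author): controls that are not valid for the current step are simply disabled, so the erroneous action cannot be taken at all:

    # Sketch: only the controls valid for the current step are enabled.
    CONTROLS_BY_STEP = {
        "enter_address":  {"next", "cancel"},
        "choose_payment": {"back", "next", "cancel"},
        "review_order":   {"back", "place_order", "cancel"},
    }

    def enabled(control, current_step):
        return control in CONTROLS_BY_STEP[current_step]

    # "place_order" is a constraint-protected action: it stays greyed out until
    # the user has actually reached the review step.
    print(enabled("place_order", "choose_payment"))   # False
    print(enabled("place_order", "review_order"))     # True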

UNDO

An "undo" function is very powerful and helpful in enabling users to back out of errors (or even confusing situations) and restore a device to its previous or original setting. It's most common in computer software, but even consumer electronics such as television sets have a "back" button in case the user accidentally changed the channel.

In his opinion, the best design is to have multiple levels of undoing along with redoing so that the user can back up, one step at a time, as far as he wants, then go forward if he has gone too far back.

"Undoing" is not always possible - particularly with devices that take actions in the physical world (you cannot "undo" burnt toast) - and there are situations in which you may not wish the user to undo their action (if it was right), but it's a good idea in principle.

(EN: Particularly in digital systems, "undo" should always be possible. It may be difficult to accommodate because it requires the previous states to be held in digital memory, which consumes resources. Or it is not in the interest of the provider - like letting a user easily "undo" ordering a product. It's likely useful to point out that "it is possible, but we don't want them to do it" to clarify requirements.)
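
A minimal sketch of multi-level undo with redo, under the assumption that each action knows how to apply and reverse itself (the class names are mine): undone actions move to a redo stack, and performing a new action clears the redo stack, which is the usual convention:

    # Sketch: multi-level undo/redo built from two stacks.
    class History:
        def __init__(self):
            self.undo_stack = []
            self.redo_stack = []

        def do(self, apply_fn, reverse_fn):
            apply_fn()
            self.undo_stack.append((apply_fn, reverse_fn))
            self.redo_stack.clear()          # a new action invalidates old redos

        def undo(self):
            if self.undo_stack:
                apply_fn, reverse_fn = self.undo_stack.pop()
                reverse_fn()
                self.redo_stack.append((apply_fn, reverse_fn))

        def redo(self):
            if self.redo_stack:
                apply_fn, reverse_fn = self.redo_stack.pop()
                apply_fn()
                self.undo_stack.append((apply_fn, reverse_fn))

    # Example: editing a list of text lines, one reversible step at a time.
    doc, history = [], History()
    history.do(lambda: doc.append("hello"), lambda: doc.pop())
    history.do(lambda: doc.append("world"), lambda: doc.pop())
    history.undo(); history.undo(); history.redo()
    print(doc)    # ['hello'] - backed up two steps, then went forward one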

CONFIRMATION

Another constraint put in place by some systems is the requirement to confirm an action before it is undertaken, particularly when it will result in an action that is not easily undone (deleting a file, placing an order, etc.)

There is some debate over whether confirmations are essential safety checks or needless nuisances - particularly when a user is alert and attentive, it seems that the system is demanding that they repeat the command they just intended to give. But remove the confirmation, and users will be upset that you didn't warn them that their action would have consequences.

Confirmations are valuable when there is the possibility of making a mistake. Consider working on a computer and closing a window without saving first - all work done is lost. And given that users may have multiple windows open, that they are closed by an action that can be accidental, etc. it makes sense to warn on closing a window that has unsaved changes.

However, if that were taken to an extreme of warning whenever a window is closed, the user would constantly be "confirming" the close of every window, and would become trained to simply click "OK" without paying attention. Confirming too often essentially undermines the value of a confirmation.
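
A small sketch of the distinction, assuming a simple editor window (the names are invented): the close action asks for confirmation only when unsaved work would actually be lost, so the dialog stays meaningful instead of becoming a reflexive "OK":

    # Sketch: confirm closing a window only if work would be lost.
    class EditorWindow:
        def __init__(self):
            self.unsaved_changes = False

        def edit(self, text):
            self.unsaved_changes = True

        def save(self):
            self.unsaved_changes = False

        def close(self, confirm):
            # 'confirm' is a callable that asks the user and returns True/False.
            if self.unsaved_changes and not confirm("Discard unsaved changes?"):
                return False          # user cancelled; nothing is lost
            return True               # closed; no pointless dialog if already saved

    w = EditorWindow()
    w.edit("draft")
    print(w.close(confirm=lambda msg: False))   # False - close blocked, work kept
    w.save()
    print(w.close(confirm=lambda msg: False))   # True - nothing unsaved, no prompt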

The author suggests that few actions should require confirmation - or better still, that more actions be made reversible. Particularly on computers, many actions are made impossible to reverse by programmers who are simply avoiding the effort. It should be no trouble at all to save a file when a window is closed, or to keep a list of recently deleted items that can be restored. Likewise, a user should be able to cancel an order after placing it without having to call (especially when the item hasn't left the building and the "order" is just computer data). In the physical world, "undo" is often not possible - but in the digital world, it is always possible, provided the system has not been rigged to prevent it.

SENSIBILITY CHECKS

A "sensibility check" is a confirmation that is based on conditions: when something seems a bit unusual, the system raises a concern and requires the user to confirm.

For example, if a person keying a financial transaction were to miss a decimal point, a request to transfer $1,000.00 from checking to savings could be read as a request to transfer $100,000 - which is highly unlikely for most people. An ATM receiving such a request would do well to raise an eyebrow, digitally speaking, and ask if the amount is correct.

In the case of a bank account, there is likely a failsafe that would prevent the transaction from going through because there are insufficient funds. But even in that case, a warning before doing something foolish is better (psychologically) than an error message afterward.
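(EN: A rough Python sketch of a sensibility check along these lines - the threshold is invented; a real system would presumably base it on the customer's history.)

```python
def needs_confirmation(amount: float, typical_max: float = 5_000.00) -> bool:
    """Flag amounts far outside the usual range - e.g. a missed decimal point
    turning $1,000.00 into $100,000 - so the system can ask if it's correct."""
    return amount > typical_max

def submit_transfer(amount: float, user_confirmed: bool = False) -> str:
    if needs_confirmation(amount) and not user_confirmed:
        return f"Please confirm: transfer ${amount:,.2f}?"
    return f"Transferring ${amount:,.2f}"

print(submit_transfer(1_000.00))                         # goes straight through
print(submit_transfer(100_000.00))                       # pauses to ask first
print(submit_transfer(100_000.00, user_confirmed=True))  # proceeds once confirmed
```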

MINIMIZING SLIPS

It is generally assumed that people will always pay close and constant attention to the task for which a device is designed - and that the user is to blame if he fails to do so. Unfortunately, people do not, and in some instances cannot, give their undivided attention to a device.

People who perform a task often become "skilled" at it - they can do the task without conscious effort, which brings a high level of efficiency and ease. It also leads to overconfidence and disregard: consider the things people are seen doing while driving a car, because they feel so accomplished at the task of driving that they need not give it much attention.

(EN: The example of driving is rather fortunate here, because it brings to mind that every other thing in a vehicle must be designed to require little attention. The GPS system, stereo, climate control, or any other feature cannot be designed to distract the driver from driving.)

Some mention is made of feedback - particularly audio feedback that indicates something has been done correctly. Consider the scanners at grocery stores that "beep" as each item is properly scanned - were it not for that sound, cashiers would have to lock their gaze on the display to make sure items are scanned properly.

Cross-checks can also be helpful. The example here is hospital nursing staff dispensing medication: the nurse must scan both the prescription and the patient's armband, and the system verifies that the right medicine is being dispensed to the right patient.
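(EN: A minimal Python sketch of that kind of cross-check - the field names and IDs are invented for illustration.)

```python
def may_dispense(prescription: dict, armband: dict) -> bool:
    """Dispensing is allowed only when both scans agree on the patient."""
    return prescription["patient_id"] == armband["patient_id"]

rx = {"patient_id": "P-1042", "drug": "amoxicillin", "dose_mg": 500}
right_band = {"patient_id": "P-1042", "name": "J. Doe"}
wrong_band = {"patient_id": "P-2311", "name": "A. Smith"}

assert may_dispense(rx, right_band)      # match: dispense
assert not may_dispense(rx, wrong_band)  # mismatch: block and alert the nurse
```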

THE SWISS CHEESE MODEL OF HOW ERRORS LEAD TO ACCIDENTS

The author mentions the "Swiss cheese" model of accident causation, which suggests that each layer of security or prevention has gaps, like the holes in a slice of cheese - and where the holes in every slice align, there is an opportunity for an accident to occur.

This often leads to problems in troubleshooting, particularly when there is different ownership of the "slices." Those who own slice A suggest it was the responsibility of slice B to prevent an error that got past them, and those who own slice B blame slice A for letting it get through to their layer in the first place.

The notion that there is only one possible cause of an accident (or the tendency to stop investigating when one cause is found) gives rise to erroneous thinking ... "if only" one thing had happened differently, the accident would not have occurred. There may have been several things that could have been done differently to avoid the accident - and there may be several things that need to be corrected to keep it from happening again.
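(EN: A back-of-the-envelope illustration of why the layers help, with invented figures: if each layer independently misses a problem 10% of the time, four layers together miss it only about 1 time in 10,000. Real layers are rarely fully independent, which is exactly why diverse defenses matter.)

```python
def chance_all_layers_fail(miss_rates):
    """Probability that a problem slips through every layer, assuming the
    layers fail independently of one another (a simplification)."""
    p = 1.0
    for rate in miss_rates:
        p *= rate
    return p

# Four layers that each miss 10% of problems:
print(chance_all_layers_fail([0.1, 0.1, 0.1, 0.1]))  # about 0.0001, i.e. roughly 1 in 10,000
```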

When Good Design Isn't Enough

For the sake of thorough design, the author advocates making the assumption that people are never at fault for accidents and failures, and to accept that the design is responsible for facilitating success and preventing failure. But the truth is, sometimes people really are at fault.

His long and rambling consideration of this boils down to a few key points:

In general, a person who has the knowledge and skill to perform a task, who takes adequate time and gives adequate attention, and who follows established procedures for working safely and efficiently, is highly unlikely to be the cause of an error.

Resilience Engineering

The concept of "resilience engineering" involves designing to accommodate unusual circumstances. A complex system such as an oil refinery, chemical plant, electric power grid, hospital, or nuclear power plant cannot be built merely to work under ideal conditions but must also continue to function under extreme situations, including disaster scenarios.

In some instances, resiliency requires the product itself to be altered - a roof is built to withstand hurricane winds up to a certain force and has that property even when there is no storm. In other instances, special procedures are used in emergency situations - the same home has shutters the owner must close when a storm is approaching.

The same is true of services. Airport security personnel practice certain procedures daily which are meant to prevent hijackings, but have enhanced procedures when they believe the risk to be higher than usual.

The author speaks of disaster planning, which simulates an emergency situation to test whether people remember to follow emergency procedures. It's a good idea that can be helpful in many instances, but a test is never as complex, stressful, or unpredictable as a real disaster - so no recovery plan should be regarded as perfect or foolproof.

The Paradox of Automation

Automation is often embraced for its efficiency and consistency: a machine, unlike a person, does not get tired or become inattentive - it does what it was designed to do without deviation, and it works well whether or not it is being watched and monitored. Or so it is believed.

Unfortunately, this is not true. Machines do what they are designed to do without thinking. This presumes that the design was correct, and that conditions will not change. They will do the wrong thing just as efficiently as they will do the right thing. And they will keep doing what they are designed to do even if conditions make their actions counterproductive or dangerous.

Another drawback to automation is that people place such faith in its perfection that there are often no plans for what to do in the case of failure. The machines go haywire and no-one knows what to do. This is the reason that failures are so difficult to prepare for in automated systems, and why recovery takes a significant amount of time. There are fewer failures, but those that occur tend to be rather huge.

The author also mentions the age-old concern that automation makes people dumber. Schoolteachers warned that using calculators would result in people who could not perform simple calculations in their heads - and this has largely become true, such that cashiers cannot make change when there is a jam in the automated change-dispenser. The potential is a great deal more concerning when there's much at stake: if landing a plane or performing a surgery is automated, such that pilots and doctors forget how to do these tasks, what happens when the machine breaks down?

Fortunately, automation is very poor at replacing knowledge workers for complex tasks, and is generally leveraged for doing menial work - in much the same way as "cruise control" in an automobile relieves the driver of a need to keep pressure on the gas pedal, but does not take over steering the car, and a human driver can easily disable it and take control.

When considering automation, keep in mind the unique capabilities of man as well as machine. People are flexible, versatile, and creative - and machines cannot replace these qualities. However, machines are better than people in terms of their ability to do the exact same thing repeatedly, quickly, and accurately over long periods of time.

Design Principles for Dealing with Error

When people use devices to perform tasks, the net result is a complex man-machine system - and the most common mistake is simply ignoring the human part of that system: assuming it will operate as expected and is exactly as predictable and controllable as the machine part, and failing to recognize the biological capacities of the human component.

What we call "human error" is most often the result of bad design, in that the engineer failed to consider the qualities and capacities of the human component. And in that sense, the human who made the error was not the operator of a device, but its designer.

The author reiterates some of what was said in the chapter: you must understand the capabilities and capacities of human beings and design in a way that accounts for them. You must use constraints to prevent mistakes. You must plan for contingencies. And so on.

Ultimately, it means that the designer must take responsibility for designing a device that will be used by a person - and ensure that the man-machine system has the best chances of success.