Chapter 4: Perception

Perception is the set of processes by which we recognize and make sense of the sensations we receive from environmental stimuli. It encompasses many psychological phenomena.

The author presents a few optical "illusions" as a means of suggesting that perception is based on the interpretation of stimuli - what at first seems like an amorphous blot becomes recognizable as a familiar shape representing an object we have seen.

From Sensation to Representation

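The author relates James Gibson's work in the area of perception, which introduced a number of key concepts:

  1. The distal (external) object: the thing in the world, such as a tree
  2. The informational medium: the light, sound waves, molecules, or pressure that carry information about the object
  3. The proximal stimulation: the contact of that information with our sensory receptors
  4. The perceptual object: the mental representation we form of the distal object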

While seemingly tedious, Gibson's approach draws a clear distinction between reality and our perception of reality.

Consider the hackneyed question of "if a tree falls in a remote forest does it make a sound?" Gibson's model clearly distinguishes the tree, the emission of sound waves, the reception of sound waves, and a perceptual object, enabling us to better consider what we mean by "a sound" - or for that matter "a tree."

In more practical terms, it also enables us to consider the experience of communication media such as radio or television, in which the viewer experiences proximal stimulation and conceives a perceptual object without the actual presence of a distal object, because the informational medium (sound or light) is reproduced artificially rather than emitted by the object itself.

This approach also speaks to the problem of defining where perception ends and cognition begins: perception is the experience of proximal stimulation, and cognition is the formation of a perceptual object.

Perception itself is ephemeral and individual. A person can never again experience the same sight, sound, taste, scent, or feel of an item. They can sample it again, but it is a different instance of perception. We may find it to be a close reproduction of the original sensation, but it is not the same sensation, only a similar one. Nor can it be said that two people experience the same sensation - even if they are receiving the same information at the same time, each has an individual sensation, similar to the other's in most regards but still essentially distinct.

Perceptual Constancies

The notion that objects grow larger as we approach them reflects raw perception without interpretation. By means of perceptual constancy, however, we are aware that the object itself has remained the same size; it has merely come to occupy more of our field of vision as we moved toward it.

Laymen seem to grasp this more readily with the visual sense than with the other senses. People say that a sound became louder or a smell became stronger as they approached its source, but more readily grasp that the object itself remained unchanged when referring to what they perceive through sight.

Size constancy is often played upon by optical illusions that place objects of the same width between converging diagonal lines: to the eyes, one object looks larger than the other, but when measured, the two turn out to be exactly the same size.

In addition to size constancy, there is also shape constancy: recognizing that as we move around a three-dimensional object, it is our perception of the object that changes rather than the shape of the object itself. Shape constancy is less developed than size constancy, as evidenced by experiments that ask subjects to match a shape against three choices, identifying the one that is the same shape as the original object, only rotated.

Depth Perception

Because we move within an environment, our brains are accustomed to assessing the distance of an object from the body by various means. One of these draws on size constancy: knowing how large an object actually is, we perceive it as more distant when it appears smaller. This is particularly true when two objects of the same size are in our field of vision, and we perceive the larger-appearing one to be closer to us.

A common misconception is that binocular vision is what gives us the ability to perceive distance. Even when looking at an environment with one eye closed, subjects are able to correctly assess the relative distance of each object from the body. Monocular depth cues include relative size, fineness of texture, clarity, location in the visual field, and motion parallax (as we move, nearer objects appear to shift across our view faster than distant ones). There are also binocular depth cues, such as binocular disparity (the slight difference between the images seen by each eye) and convergence (the degree to which the eyes turn inward to fix on a near object), which depend on the two eyes working together, but they are by no means our sole method for assessing distance.

Object and Form Perception

The perception of forms can be considered from two fundamental perspectives: viewer-centered representation considers the appearance of an object relative to the viewer, and object-centered representation considers the appearance of the object itself, regardless of the distance and angle from which it is viewed. Both of these feed into the mental representation of the object itself.

Descriptions of objects from a viewer-centered perspective detail their appearance and position from the perspective of the speaker: an object is "three feet away" and "horizontal" or "tilted at about a twenty-degree angle downward to the left." Meanwhile, descriptions from an object-centered perspective consider the object itself (the desk is rectangular and rounded on one end), in relation to other objects (the pencil is in the middle of the desk), or in relation to its own parts (on each side of the desk are two drawers, a shallow one on top of a deeper one).

People may combine these two approaches in describing the shape and form of an object, though they tend to become more object-centered when describing an imaginary object and more viewer-centered when describing a real one.

In addition to considering the size and shape of an object, our perception tends to organize objects into visual groups. That is to say, when we look at a car, we see the tires, fenders, lights, glass, and other parts that make up the car, but group all of them into a single "car" gestalt and regard them as one thing. The "Law of Prägnanz" suggests that we tend to organize what we perceive into the simplest, most coherent forms, grouping items into larger wholes so as to perceive fewer objects with more properties. This requires some cognitive function, as the grouping of objects is neither arbitrary nor random, but based on patterns that the mind has stored and against which it compares what we perceive at any given moment.

Our perception also distinguishes figures from the background. Upon entering a room, we notice that there is a chair, a window, a table, a lamp, and curtains. We perceive the chair, table, and lamp to be figures within the room, but consider the window and curtains to be part of the background.

This plays into the optical illusion that shows a vase, with the silhouettes of two people facing one another on either side. Subjects often fail to notice the silhouettes because they are perceived as part of the background, which is often ignored when attention is given to the figure of the vase.

Gestalts are commonly applied to familiar environments, such as when we regard the space in which we work as "my desk" rather than a collection of the various items that rest upon it. People also apply gestalts in unfamiliar environments to simplify the task of becoming oriented. Walking down an unfamiliar street, we see cars (not the individual parts) along the side of the road, people (not distinguishing bodies from clothing from objects they may be carrying), and the sidewalk (not distinguishing each square of cement or the ventilation grates and the like).

While applying gestalts to the environment is second nature and hardly bears consideration for practical purposes, it represents a great complexity in our perception of our environment and the objects within it, one in which cognition is a factor: the mind must be active to recognize a group of shapes as a single object.

Pattern recognition is another factor of perception that bears consideration. We identify the stamens, pistil, and petals of a flower and recognize it as a daisy, and we identify the eyes, nose, mouth, hair, and shape of the face to recognize someone we know. This requires not only organizing shapes in the environment into a gestalt, but also matching that gestalt against a pattern that resides in memory.

An experiment is mentioned in which subjects associate faces with names and are then asked to recall the names when shown different images. Some images show only part of the face (a close-up of an eye); others show the entire face. Not surprisingly, people easily recall the name when shown the whole face but struggle when shown only part of it. When the experiment was repeated using houses instead of faces, people were far better at naming a house from one of its parts than they had been at naming a face from one of its parts, and even did slightly better at recalling names from a part of a house than from the whole image.

Much of our ability to recognize patterns is based on experience - which is little wonder, given that recognition requires previous exposure - but what is noted, particularly in the task of reading, is that experience teaches us to recognize shapes of entire words. While we originally learn to pronounce a word phonetically, letter by letter, we eventually learn the shapes of entire words through experience in seeing them often: we do not need to "sound out" a word or read it letter by letter, but recognize it at a glance.

(EN: I recall there was some experimentation in elementary education that attempted to skip the phonetic approach and instead teach students entire words based on shape - which failed horribly to the detriment of public literacy.)

There is a brief mention of prosopagnosia, a condition in which subjects are unable to recognize faces, which has been associated with damage in the lower temporal lobes. There is some debate as to whether the condition is more closely related to perception or memory.

Theoretical Approaches

Theories of perception fall into two general categories: bottom-up theories, which begin with individual perceptual stimuli that are combined into higher-order perceptions, and top-down theories, which consider perception as a whole and then reduce it to its component stimuli.

Bottom-Up Approaches

Perception is a process of pattern recognition: when we receive stimuli from the environment, we first attempt to match them against an existing mental model based on their properties and context. Mental models are loose enough that if something does not perfectly match a pattern, it may still be close enough to our model that we can identify it.

For example, consider seeing an engineer's prototype of a concept car - it may not exactly match any car we have ever seen, but it has enough semblance to our understanding of what a car is that we can tell very quickly that it is a car of some kind. This holds true even if there are obvious mismatches, such as lacking headlamps or a windshield.

By one theory, the mind attempts to be efficient by quickly identifying the general nature of things, effectively excluding them from the things we need to think about, so that we can better focus on the things that do require thought.

Laboratory experiments are heavily contrived to elicit a desired response - for example, an image is carefully designed to suggest a shape by means of negative space. There is the valid concern that the natural environment is not as structured or contrived - that no context is intentionally superimposed by reality to guide us to a predetermined perceptual goal.

One set of theories holds that we keep a mental store of templates, or patterns, built from experience, that enable us to group the stimuli related to a given phenomenon. Where there is a matching template in storage, we can recognize a group of stimuli at once rather than having to analyze each sensory stimulus we receive. That is, we recognize a given letter such as "A" because we have seen it before, in varying styles and positions, and can remember what it is when we see it again.

There are also prototype theories, which maintain that what we store is not a template but a set of general principles by which we define a thing, integrating the most frequently observed features of a class. For example, we can immediately recognize the letter "A" in a variety of fonts and scripts because each instance matches the basic features (two diagonal strokes meeting at the top, joined by a crossbar) even when the details change (serifs are added) or the figure is slightly distorted (the letter is italicized).
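As a rough illustration of the prototype idea, here is a minimal Python sketch, not the author's formulation. It assumes, purely for illustration, that a letter can be reduced to a set of named features (the labels such as "apex" and "crossbar" are hypothetical); the prototype for a class keeps only the features seen in most observed instances, and a new instance is assigned to the class whose prototype it overlaps most.

```python
from collections import Counter


def build_prototype(instances: list[set[str]], threshold: float = 0.5) -> set[str]:
    """Keep the features seen in more than `threshold` of the observed instances."""
    counts = Counter(feature for inst in instances for feature in inst)
    return {f for f, n in counts.items() if n / len(instances) > threshold}


def similarity(features: set[str], prototype: set[str]) -> float:
    """Jaccard overlap between a stimulus and a prototype (0.0 to 1.0)."""
    if not features and not prototype:
        return 1.0
    return len(features & prototype) / len(features | prototype)


def classify(features: set[str], prototypes: dict[str, set[str]]) -> str:
    """Return the label whose prototype the stimulus most closely resembles."""
    return max(prototypes, key=lambda label: similarity(features, prototypes[label]))


# Hypothetical feature labels for a few observed instances of each letter.
observed = {
    "A": [
        {"left_diagonal", "right_diagonal", "apex", "crossbar"},
        {"left_diagonal", "right_diagonal", "apex", "crossbar", "serifs"},
        {"left_diagonal", "right_diagonal", "apex", "crossbar", "slant"},
    ],
    "H": [
        {"left_vertical", "right_vertical", "crossbar"},
        {"left_vertical", "right_vertical", "crossbar", "serifs"},
    ],
}
# Occasional details (serifs, slant) fall below the threshold and are left out.
prototypes = {label: build_prototype(insts) for label, insts in observed.items()}

# An italic, serifed "A" never seen in exactly this form still matches the "A" prototype.
print(classify({"left_diagonal", "right_diagonal", "apex", "crossbar", "serifs", "slant"}, prototypes))
```

The Jaccard overlap is just one convenient similarity measure chosen for the sketch; the point is that recognition tolerates extra details (serifs, slant) because only the frequently observed features define the class.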

It's also noted that a prototype can be entirely theoretical - that a person who has not actually seen an example of an object can still recognize it if they have conceptualized it and understand the prototype. (EN: Though I do believe that this depends on the quality of the description. Consider the experience of a child who has been told what a giraffe looks like but has never actually seen one, or an adult who visits a place they have been told about but have never been. There does seem to be a moment of shock and uncertainty when confronted with the genuine article, and the comment that things are not as expected likely reflects the accuracy and granularity of the information that was used to build their mental prototype.)

Yet another approach to perception is found in feature-matching theories, in which we attempt to match specific aspects of a perceived phenomenon rather than the entire form. By this theory, raw perception notices features, formulates a suggestion of what the object might be, then gives greater attention to additional details to confirm it. In the first pass, the features that define an object are considered in assessing whether the stimulus might be that object; in the second pass, features that would exclude it from that class are checked in order to disqualify it.

For example, two upright strokes joined by a crossbar lead us to suspect we might be looking at the letter "A", but this conclusion is discarded when we recognize that the strokes are parallel rather than meeting at the top, and that we are actually looking at the letter "H."
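The two-pass matching described above can be sketched in a few lines of Python. The feature names and letter definitions are hypothetical simplifications chosen for illustration: in the first pass, a letter becomes a candidate when the stimulus has all of its defining features; in the second pass, a candidate is discarded if the stimulus also shows a feature that excludes it.

```python
# Hypothetical, highly simplified feature sets for two letters.
LETTERS = {
    "A": {"defining": {"two_strokes", "crossbar"},
          "excluding": {"parallel_strokes"}},
    "H": {"defining": {"two_strokes", "crossbar"},
          "excluding": {"strokes_meet_at_top"}},
}


def suggest(stimulus: set[str]) -> list[str]:
    """First pass: any letter whose defining features are all present is a candidate."""
    return [name for name, spec in LETTERS.items() if spec["defining"] <= stimulus]


def confirm(stimulus: set[str], candidates: list[str]) -> list[str]:
    """Second pass: drop any candidate for which an excluding feature is present."""
    return [name for name in candidates if not (LETTERS[name]["excluding"] & stimulus)]


stimulus = {"two_strokes", "crossbar", "parallel_strokes"}
candidates = suggest(stimulus)  # both "A" and "H" are suspected at first
print(candidates, confirm(stimulus, candidates))  # ['A', 'H'] ['H']
```

Splitting the check into two functions mirrors the theory's claim that suggestion and disqualification are separate steps: the crossbar alone keeps "A" in play until the parallel strokes rule it out.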

The recognition-by-components theory suggests that we recognize objects in three-dimensional space by reducing them to geons (geometric shapes such as spheres, cubes, cones, cylinders, and the like) and recognizing the way in which they are interconnected. (EN: Much in the way that an "Art 101" student learns to draw.) It's suggested that this is very handy in explaining how we recognize generic objects (a human face) but not very good at explaining how we differentiate within broad categories (knowing one person's face from another's).

Top-Down Approaches

In contrast to the bottom-up approaches, there is a separate school of theory that suggests perception begins with higher-order thinking: that our perception is not merely the assembling of sensory data into patterns, but draws on other sources of information - conceptual constructs that exist within the mind. In that way, perception is not a perfect reflection of reality, but a compromise between what we perceive and what we know, with our mental fiction filling in the gaps in sensory facts. That is to say that what we perceive is the result of a negotiation between what we sense and what we think.

This approach explains the way in which we form perceptions based on incomplete data. The example given is a red octagonal placard bearing the letters "ST", then a gap where a vine has overgrown the placard, then the letter "P." We recognize this as a stop sign because our conceptual model fills in the gap in the sensory data.

Another example is seeing at night, when the lighting is so low that we cannot discern the color of things but see only their physical forms. While we often describe things in terms of their color, we can overcome the lack of sensory data with the conceptual construct: we know a given object is a banana because of its shape, even though we cannot see that it is yellow.

By this theory, the observer quickly forms and tests various hypotheses about the information he receives from the environment, based on three factors:

  1. The sensory data the observer receives from the environment
  2. The knowledge stored in the observer's memory
  3. Inferences that fill the gaps in the data
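The interplay of these three factors can be illustrated with the stop-sign example. In this minimal Python sketch, the sensory data is a partly obscured word (with `?` marking letters hidden by the vine) plus the sign's shape, the stored knowledge is a small table of familiar signs, and inference fills the gap by keeping only the stored hypotheses consistent with what was sensed. The table and function names are hypothetical, invented for illustration.

```python
# Hypothetical "memory": familiar signs, keyed by word, with their shape.
KNOWN_SIGNS = {
    "STOP": "octagon",
    "STEP": "rectangle",
    "YIELD": "triangle",
}


def consistent(word: str, seen: str) -> bool:
    """True if every visible letter in `seen` matches `word` ('?' = obscured)."""
    return len(word) == len(seen) and all(s in ("?", w) for w, s in zip(word, seen))


def perceive(seen: str, shape: str) -> list[str]:
    """Keep the stored hypotheses that agree with both the letters and the shape."""
    return [word for word, sign_shape in KNOWN_SIGNS.items()
            if sign_shape == shape and consistent(word, seen)]


# "ST", an overgrown gap, then "P" on an octagonal placard.
print(perceive("ST?P", "octagon"))  # ['STOP']
```

The letters alone fit both "STOP" and "STEP"; it is the contextual knowledge about the placard's shape that settles the hypothesis, which is the kind of filling-in the top-down approach describes.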

Perception is therefore a cognitive process, though it occurs at the unconscious level. Were it based on external data alone, without the application of experience and intelligence, we would be dumbfounded by most of what we see in the natural environment.

The bottom-up approaches are rightly criticized for failing to consider the context of perception, when in reality the context in which an object is perceived (both the physical environment surrounding the object and the mental state of the observer) makes an obvious and significant difference to how it is perceived.

Even more striking is a phenomenon known as the configural-superiority effect, in which objects presented in certain configurations are more readily identifiable than if they were viewed in isolation, even though the configuration adds complexity.

By the constructive approach, intelligence is a critical part of perception: we do not perceive what is "out there in the world" with an uninformed eye, but instead apply our intelligence to understand the things we perceive.

Synthesizing the Approaches

Both the top-down and bottom-up theories have generated empirical support through experimentation. But instead of asking whether one or the other is correct, consider how they work together:

Taken to extremes, the top-down position would underestimate the value of sensory data. It maintains that perception relies upon memory, but the relevant knowledge must already be in memory for this to occur. As such, the top-down approach explains how we recognize an object, but not how we come by the knowledge that enables us to recognize it.

That is to say that when we encounter a wholly unfamiliar object, we revert to a bottom-up approach in order to create the construct or memory to which future perceptions will be compared. The first time a person sees an elephant, they observe the details they sense in order to understand what it is, and each subsequent encounter with an elephant or similar creature can then reflect on previous experience in a top-down manner.

In a similar vein, the bottom-up approach may also be useful in refining the construct. In essence, the top-level model may include a construct that indicates all fish have scales, until we observe fish-like creatures that do not have scales. Our choice is either to create a separate class of object, or to allow our bottom-up perceptions to modify the top-down ones - that is, to adjust the model to reflect that many species of fish, but not all, have scales.

Computational Theory of Perception

Marr's computational theory of perception is another compromise between the top-down and bottom-up approaches. It leans toward the bottom-up approach without entirely dismissing the importance of prior knowledge in interpreting sensory data.

Working with visual data, Marr considered shapes to be discerned by their edges (the borders of the shape), contours (nuances of their visual texture), and regions (areas within a shape that are undifferentiated). In his reckoning, the eyes perceive a two-dimensional shape that the brain translates into three dimensions.

The eyes perceive an object in two dimensions: its edges and the area they enclose. The brain then recognizes levels of shading and distortion within that area, organized into regions and then contours, which imply that it is a three-dimensional form.

Deficits in Perception

Additional understanding of perception can be gleaned from individuals whose perceptual processes differ from the norm.


Agnosia is a severe deficit in the ability to perceive sensory information. There are various kinds pertaining to different senses, each attributed to lesions in specific parts of the brain.

People with visual agnosia are believed to have normal sensations of what exists in their field of vision, but cannot recognize what they see. They can describe the shapes before them, and even seem to discern entire objects, but cannot recall the name of the object.

One subject was presented with a pair of spectacles. He was able to recognize that there were two round objects joined together by a bar, but struggled to name the object; with some deliberation, he guessed it was a bicycle. The response is interesting in that it demonstrates the bottom-up practice of observing individual shapes and how they are joined, and in a sense it was close: a bicycle also consists of two round shapes joined by a bar.

Other agnosias render subjects unable to recognize more than one object at a time, unable to recognize familiar environments, unable to recognize faces, etc. There are even selective agnosias in which a subject recognizes the faces of other people but does not recognize a specific few, including his own.

Auditory agnosia impairs the ability to recognize sounds: for example, the inability to distinguish between the voices of different people, or the inability to remember or even to perceive a specific piece of music as anything but random noise.

In some instances, agnosia has an extreme level of specificity: an individual with otherwise normal skills may routinely be unable to recognize a specific thing (they may recognize most people but blank on a specific one) or a specific class of things (recognizing differences among the faces of animals but not among the faces of humans), which lends support to the notion that memories are stored in very specific parts of the brain. But while this would explain an inability to recognize a face that was familiar before the damage (the place where that face was recorded was damaged), it does not explain the failure to recognize it on later encounters (a new recording should, in theory, have been made outside the damaged area).

While damage to certain parts of the brain seems to explain why some people have agnosia, it fails to explain why most people do not. That is, we know that people are able to recognize specific faces, but have little sense of why this should be so.

Color Perception

Various deficiencies in the perception of color, including "color blindness," are more common in men than in women. The author details a number of conditions, from red-green and blue-yellow color blindness all the way to complete color blindness. He does not, however, connect these conditions to the brain (as opposed to the eye), or expound upon the matter beyond acknowledging it.


A final condition is a selective loss of motion perception; it is reckoned that individuals with this condition perceive motion as if it were a series of snapshots. This condition is associated with severe bilateral damage to the temporoparietal cortices.