jim.shamlin.com

9: How Scientific Is It?

Many authors and consultants claim scientific validity, using "a patina of scientific language in order to create a false credibility," so the authors feel it necessary to disclose the science on which they base the system they are selling.

What Makes Assessments Scientific

In order to be considered scientific, a methodology must follow the scientific method of investigation: it must begin with a theory, or an open question about a topic of interest, and take a logical and objective approach to investigation.

Science also must not fear inspection: it should be open and public, disclosing not only its conclusions but also its methods and study data, so that those to whom it is presented can make an informed judgment about its value, validity, and applicability.

With specific regard to the scientific validity of behavioral assessments, the authors suggest four critical qualities:

  1. Objectivity. The assessment must use a method that will not produce biased results. The tools used to gather information should not be biased (e.g., a survey consisting of leading questions) and the sample should be representative of the population (a survey of magazine readers is skewed to the demographics of its readership, and if it is voluntary, it is skewed to people who have strong opinions and wish to exert influence).
  2. Reliability. A scientific experiment should have results that are consistent, at least insofar as the phenomena being studied are assumed to be consistent. That is, the results of a study of one group of subjects should be statistically similar to those of a study of another group. However, repeated study of the same group over time may legitimately reflect changes in the factors being investigated.
  3. Validity. A scientific experiment should measure the factors that it purports to measure, which can be demonstrated by the statistical correlation of cause and effect, to the same degree that other methods used to study similar phenomena are statistically valid.
  4. Demographics. Particular to the social sciences and psychology, the authors suggest that a scientific study should not generate different results based on gender, race, age, or other demographic qualities. Though demographic factors may produce some differences in results, using an assessment method that is biased toward a specific group carries regulatory and ethical dangers, particularly when the assessment is used to evaluate employee performance.

The authors suggest that the system they sell satisfies these four criteria.

The Predictive Index (PI)

The PI is intended to measure personality, though personality itself can be approached in more than one way. Personality is considered to be a matter of reputation (the way that others perceive a person to be), identity (the way that a person perceives himself), and aspiration (the way a person wishes to be perceived).

The PI measures four primary qualities of personality: dominance, extraversion, patience, and formality (the book repeats the definitions from chapter 7 here).

The PI considers two behavioral elements that are derived from each of the four primary qualities:

The PI scoring system considers behavior to be derivative of three factors:

Since its inception in 1955, the PI has been used by over 7,800 organizations of varying sizes, industries, and locations, and the tool has been refined over time. Over one million subjects have taken a PI assessment, and it has been used for a variety of management purposes: recruiting, development, succession planning, team building, organizational culture change, and others.

Objectivity

The PI method is "generally considered to be scientifically objective." It presents the subject with two lists of descriptive adjectives and asks them to choose those they feel describe themselves (self domain) and those that describe how others expect them to behave (self-concept). Comparing these two yields the third domain (synthesis).

Reliability

The PI has often been administered to the same test subjects repeatedly and has demonstrated "acceptable levels" of test-retest reliability.
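The notes do not give the PI's actual test-retest coefficient. As a minimal sketch of what such a figure usually is, the snippet below computes test-retest reliability as the Pearson correlation between the same subjects' scores on two administrations. The function names and the scores are hypothetical and illustrative only; they are not PI data.

```python
from math import sqrt


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length score lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least two scores")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Covariance and spread terms (the 1/n factors cancel in the ratio).
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def retest_reliability(first: list[float], second: list[float]) -> float:
    """Correlate each subject's first score with the same subject's retest score."""
    return pearson_r(first, second)


# Hypothetical scores for five subjects who took the same assessment twice.
first_sitting = [12, 18, 25, 30, 22]
second_sitting = [14, 17, 24, 31, 21]
print(round(retest_reliability(first_sitting, second_sitting), 3))
```

Values near 1 mean subjects keep roughly the same relative standing from one sitting to the next; the function raises a division error if every subject has the same score, since correlation is undefined there.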

Statistical measurements of internal consistency have rated the PI at 0.85, which falls within the generally accepted range of 0.7 to 0.9. The authors concede that scores at the upper end of the range often indicate some degree of redundancy among the items.
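The notes do not name the internal-consistency statistic behind the 0.85 figure; Cronbach's alpha is the most common one, so the sketch below assumes it. It computes alpha from a respondents-by-items score matrix and places the result relative to the 0.7 to 0.9 range cited above. The helper names and sample responses are hypothetical.

```python
from statistics import variance


def cronbach_alpha(responses: list[list[float]]) -> float:
    """Cronbach's alpha for a respondents x items matrix of scores.

    alpha = k / (k - 1) * (1 - sum of item variances / variance of total scores)
    """
    k = len(responses[0])  # number of items
    if k < 2 or any(len(row) != k for row in responses):
        raise ValueError("every respondent needs the same number (>= 2) of items")
    items = list(zip(*responses))  # one tuple of scores per item
    item_variance_sum = sum(variance(item) for item in items)
    total_variance = variance([sum(row) for row in responses])
    return k / (k - 1) * (1 - item_variance_sum / total_variance)


def interpret_alpha(alpha: float, low: float = 0.7, high: float = 0.9) -> str:
    """Place alpha relative to the conventional 0.7-0.9 range."""
    if alpha < low:
        return "below the accepted range"
    if alpha > high:
        return "above the range (items may be redundant)"
    return "within the accepted range"


# Hypothetical answers from four respondents to a three-item scale.
sample = [[3, 4, 3], [2, 2, 3], [5, 4, 5], [1, 2, 1]]
alpha = cronbach_alpha(sample)
print(round(alpha, 2), interpret_alpha(alpha))
print(interpret_alpha(0.85))  # within the accepted range
```

Four respondents is far too few to mean anything; the point is the arithmetic. The sample lands above 0.9, which is the situation the authors concede can signal redundant items.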

Validity

The PI has been benchmarked against other personality assessments, particularly the Cattell 16PF assessment, and the results have been found to be statistically correlated.

The results of the PI have been compared to quantifiable real-world phenomena (sales performance) and likewise found to be statistically correlated.

Demographics

The PI has been administered to a large number of individuals of different races, genders, and age groups, as well as being used across various countries and cultures. Demographic analysis of test scores indicates that there is no pronounced correlation to high/low scores for any group, and the authors consider their system to be neutral to age, gender, and race. As such it is safe to use in decisions regarding employment and promotions.
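The notes do not say how the authors tested for group differences. One widely used screen for adverse impact in U.S. employment selection is the "four-fifths rule" from the federal Uniform Guidelines on Employee Selection Procedures: a group whose selection rate is less than 80% of the highest group's rate is generally regarded as evidence of adverse impact. The sketch below applies that rule of thumb; the group labels and counts are hypothetical, and passing the check is not by itself a legal determination.

```python
def impact_ratios(selected: dict[str, int], applicants: dict[str, int]) -> dict[str, float]:
    """Each group's selection rate divided by the highest group's selection rate."""
    rates = {group: selected[group] / applicants[group] for group in applicants}
    top_rate = max(rates.values())
    if top_rate == 0:
        raise ValueError("no group had anyone selected")
    return {group: rate / top_rate for group, rate in rates.items()}


def four_fifths_flags(
    selected: dict[str, int], applicants: dict[str, int], threshold: float = 0.8
) -> list[str]:
    """Groups whose impact ratio falls below the four-fifths threshold."""
    ratios = impact_ratios(selected, applicants)
    return sorted(group for group, ratio in ratios.items() if ratio < threshold)


# Hypothetical hiring outcomes after screening candidates with an assessment.
applicants = {"group_a": 100, "group_b": 80}
selected = {"group_a": 50, "group_b": 30}
print(impact_ratios(selected, applicants))
print(four_fifths_flags(selected, applicants))  # ['group_b'] (0.375 / 0.5 = 0.75)
```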

The Selling Skills Assessment Tool (SSAT)

The SSAT is an assessment designed to measure skills related to sales through the five stages of customer-focused selling: opening, investigating, presenting, confirming, and positioning (described in chapter 8). The assessment itself consists of 25 items in which the subject is presented with a sales scenario and asked to indicate which course of action they would actually take (not necessarily what they feel to be the right answer).

Objectivity

The authors do not use the same standards of objectivity they used to assess the PI. Instead, they merely speak to the number of respondents who have taken the assessment (4,216 salesmen at 216 different firms).

Reliability

The test-retest reliability estimates for the SSAT "typically exceed 0.8," which again falls within the generally accepted range of 0.7 to 0.9. However, the authors point out that the instrument is meant to change behavior: once salesmen learn and apply the skills, their SSAT scores change.

The authors again avoid using the same criteria for internal consistency that were used to assess the PI. Instead, they cite the bivariate correlation of the test, 0.29 (with no indication of an acceptable range), and suggest that the test's items, of which there are only 25, are all strongly correlated with the overall score.

Validity

There is no comparison to similar instruments to validate the SSAT in the way the PI was validated. Instead, the authors assert that the test was developed by a graduate student, pursuing a master's degree in education, who had experience developing similar tests for other professions. They further assert that the firm that sells the SSAT to clients "has conducted several studies that prove validity," but they provide no details about those studies or their comparisons.

Demographics

"A study" has shown the SSAT to be free from bias according to the criteria used by the EEOC. The test has been administered to mixed groups in terms of gender, ethnicity, age, and education and the authors assert that there is no more than "a trivial amount of variance" among the groups: statistically, there is only a 2.2% correlation between scores and any of these factors.

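The notes do not say which statistic the "2.2%" figure is. Given the phrase "a trivial amount of variance," one plausible reading, assumed here, is that group membership explains about 2.2% of the variance in scores (an eta-squared of 0.022, which corresponds to a correlation ratio of roughly 0.15). The sketch below computes eta-squared for scores split by group; the group labels and scores are hypothetical.

```python
from statistics import fmean


def eta_squared(scores_by_group: dict[str, list[float]]) -> float:
    """Share of total score variance explained by group membership.

    eta^2 = SS_between / SS_total
    """
    all_scores = [s for scores in scores_by_group.values() for s in scores]
    grand_mean = fmean(all_scores)
    ss_total = sum((s - grand_mean) ** 2 for s in all_scores)
    ss_between = sum(
        len(scores) * (fmean(scores) - grand_mean) ** 2
        for scores in scores_by_group.values()
    )
    return ss_between / ss_total


# Hypothetical assessment scores for two demographic groups.
scores = {
    "group_a": [15, 18, 20, 22, 17],
    "group_b": [16, 19, 21, 21, 18],
}
share = eta_squared(scores)
print(f"{share:.1%} of score variance explained by group")
```

By construction, 0 means the groups have identical average scores and 1 means group membership accounts for all of the differences in scores.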
(EN: In all, the facts presented even by a proponent of the SSAT suggest that it does not meet these criteria as well as the PI does, but my sense is that it merits a bit of leeway. It may seem that the SSAT is less scientific than the PI, but this is likely to be a byproduct of its being smaller, less established, and highly idiosyncratic.)