By T.R.
Introduction
What is meant by intelligence? This has been perhaps the most debated question about intelligence over the last century or so. In ordinary English, intelligence is defined as "the ability to learn, understand, and deal with novel situations" (Kline; Intelligence: The Psychometric View; 1991; pg. 1). Intelligent people are seen as keen, sharp, bright, witty, or astute, while people considered unintelligent are seen as stupid, dull, or dim-witted. These views of intelligence are based on the capacity of the individual's intellect. But what about someone's ability to learn, or to apply intelligence, outside their field of specialization?
Psychologists define intelligence slightly differently than everyday English does. Psychologists such as Lewis Terman felt that intelligence should be defined more broadly, to include such things as abstract thinking, self-criticism, and adaptability. Only then could one tell whether an individual was truly intelligent or merely a scholar in a single field.
Many psychologists have developed methods for determining an individual's degree of intelligence, but two families of intelligence tests stand out: the Stanford-Binet Intelligence Scale and the Wechsler Scales of Intelligence. The Stanford-Binet test originated when Alfred Binet and Theophile Simon developed the Binet-Simon test in 1905. Then, in 1916, Lewis Terman and his colleagues at Stanford extended the test to incorporate Stern's notion of the intelligence quotient: IQ = (MA / CA) * 100, that is, the ratio of mental age to chronological age, multiplied by 100. The result was the Stanford-Binet Intelligence Scale. It has undergone several revisions, the most recent being in 1986, resulting in the SB IV. There are many differences between the original Stanford-Binet tests and the current ones, but the biggest is the use of the deviation I.Q., which will be examined later on. (Sternberg; The Encyclopedia of Human Intelligence, vol. II; 1994; pgs. 1033-1034).
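The ratio I.Q. described above can be illustrated with a short Python sketch. The function name and the example ages are hypothetical and chosen only for illustration; they are not taken from any test manual.

```python
def ratio_iq(mental_age: float, chronological_age: float) -> float:
    """Stern's ratio I.Q.: mental age divided by chronological age, times 100."""
    if chronological_age <= 0:
        raise ValueError("chronological age must be positive")
    return mental_age / chronological_age * 100


# A hypothetical 8-year-old who performs like a typical 10-year-old.
print(ratio_iq(10, 8))  # 125.0

# A child performing exactly at age level scores 100.
print(ratio_iq(8, 8))   # 100.0
```

Because the score is a ratio of two ages, it only makes sense while mental age keeps rising with chronological age, which is one reason the deviation I.Q. later replaced it.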
The Wechsler Scales of Intelligence currently comprise three separate scales: one for adults (the Wechsler Adult Intelligence Scale-Revised, or WAIS-R), one for children (the Wechsler Intelligence Scale for Children, third ed., or WISC-III), and one for preschoolers (the Wechsler Preschool and Primary Scale of Intelligence-Revised, or WPPSI-R). The Wechsler Scales originated in 1939 and were used by the U.S. military in World War II. Since then they have been split up, adapted, and revised into the three scales just mentioned.
Another important aspect of intelligence testing is the psychometric properties of the tests, such as validity and reliability. Psychologists have spent most of the last century trying to make tests as valid and reliable as possible. If tests were not valid or reliable there would not be much point in administering them.
1) Measuring Intelligence
A) Stanford-Binet Intelligence Scale
The Stanford-Binet tests originated in 1916. Revised from Alfred Binet's and Theophile Simon's tests, the Stanford-Binet largely displaced other tests that had been devised by Goddard, Kuhlmann, Wallin, and Yerkes. The focus of Terman's work was mostly on children. For example, the tests were administered to more than 1000 children below the age of 14, a fairly representative sample. By comparison, the adult sample consisted of only "30 businessmen, 50 high school students, 150 adolescent delinquents, and 150 migrating unemployed men" (Kaufman; Assessing Adolescent and Adult Intelligence; 1990; pg. 6).
The Stanford-Binet tests were revised in 1937, when two new forms were created: Form L and Form M. Another revision took place in 1960, when the best items from Form L and Form M were combined to create Form L-M. This revision introduced the deviation I.Q. for ages 2-18. In 1972, Form L-M was moderately revised, but there were no changes of significance. Then, in 1986, the SB IV was introduced.
The SB IV was fundamentally different from its predecessors in that it was based on a point scale rather than an age scale. Points are awarded according to how correct an individual's answer is. The subtests are selected to measure specific functions rather than to find out how developed a child is for his or her age, as is the case with age scales.
Administering and scoring the SB IV requires the examiner to be very familiar with the guidelines in the technical manual, because the children's responses must be properly interpreted by the examiner for the test to be valid. The SB IV uses an adaptive testing design, which means that after each subtest the individual's score is used to route the exam in the appropriate direction for the next subtest. The first subtest is Vocabulary, so the entire test is dependent on the individual's vocabulary. This works well for subtests that correlate closely with vocabulary, but for subtests such as Copying, where the correlation is low, the routing will not be as accurate. Each subtest ends when the individual fails three of the last four questions. This is called the ceiling level. (Sternberg; The Encyclopedia of Human Intelligence, vol. II; 1994; pg. 1035)
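The stopping rule can be sketched in Python as follows. This is only a simplified reading of the rule as stated above (stop once three of the last four items are failed); the function name, parameters, and sample responses are hypothetical, and the actual procedure in the SB IV manual may differ in its details.

```python
def items_administered(responses, window=4, max_failures=3):
    """Return how many items are given before the ceiling rule ends the subtest.

    responses: list of booleans in order of administration (True = passed).
    The subtest stops as soon as `max_failures` of the most recent
    `window` items were failed; otherwise every item is administered.
    """
    for n in range(window, len(responses) + 1):
        recent = responses[n - window:n]
        if recent.count(False) >= max_failures:
            return n
    return len(responses)


# Hypothetical examinee: the ceiling is reached at the sixth item,
# so the seventh item would never be presented.
answers = [True, True, False, True, False, False, True]
print(items_administered(answers))  # 6
```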
There are fifteen subtests in the SB IV, but only six of these are administered at all ages: Vocabulary, Comprehension, Pattern Analysis, Quantitative, Bead Memory, and Memory for Sentences. The other nine subtests are administered to certain age groups and not others. For example, Copying is only administered to children aged 2-13, while Equation Building is given to examinees aged twelve and older.
The SB IV has its strengths as well as its weaknesses. The tests are rated highly for reliability and validity, yet they are difficult to use for monitoring changes in individuals over time, because each age group takes different subtests. The tests are also time-consuming to administer. On the other hand, scoring is simpler than on other tests such as the Wechsler tests. (Sternberg; The Encyclopedia of Human Intelligence, vol. II; 1994; pg. 1038)
B) The Wechsler Scales of Intelligence
As noted in the introduction, the Wechsler scales have been divided into three separate tests: the WAIS-R, the WISC-III, and the WPPSI-R. Although all three are in use today and all are valid and reliable, it is the WAIS-R that stands out. The Wechsler Adult Intelligence Scale-Revised was devised by Wechsler himself (whereas the other two revised Wechsler scales were devised by The Psychological Corporation after his death), and it is the most prominent intelligence test designed for adults.
The WAIS-R is divided into eleven subtests, six of which are verbal and five of which are performance tests. The verbal tests are Information, Digit Span, Vocabulary, Arithmetic, Comprehension, and Similarities. The performance tests are Picture Completion, Picture Arrangement, Block Design, Object Assembly, and Digit Symbol. These eleven subtests were selected from numerous tests which were "available in the 1930's, many of which were developed to meet the needs of World War I." (Kaufman; Assessing Adolescent and Adult Intelligence; 1990; pg. 65). The results of the WAIS-R are inevitably influenced by educational opportunities and experiences, and thus, ultimately, by social class. This may appear to bias the results, but in adult life the development of knowledge often does depend on such factors. (Kline; Intelligence: The Psychometric View; 1991; pg. 53).
The weaker points of the WAIS-R include the fact that there are few real-life problem-solving situations, that there is little relationship between the test and vocational and career interests, and that it does not measure the mature decision-making capacities of the adult.
Among the most positive features of Wechsler's tests for adults are that the content is relevant and interesting to adults, is developed on sound theory, and is representative of the standard sample. The tests also offer guidance to the examiner on how to interpret the results. The most important feature for us today is the method of scoring. Rather than simply comparing mental age to a standard for the chronological age, Wechsler introduced the deviation I.Q. to the scoring of his tests.
C) The Deviation I.Q.
The deviation I.Q. was introduced with the WISC in 1949. The mean, or average, of these scores is 100, with a standard deviation of 15. Approximately 68 percent of the population falls within one standard deviation of the mean (I.Q.'s between 85 and 115); about 95 percent falls within two standard deviations (I.Q.'s between 70 and 130); and about 99.7 percent falls within three standard deviations (I.Q.'s between 55 and 145). The implication is that if you are rated as having an I.Q. over 130 you are in roughly the top 2.3 percent of the population, and if your I.Q. is over 145 you are in the top 0.13 percent. (Kline; Intelligence: The Psychometric View; 1991; pgs. 47-49). The deviation I.Q. allows for comparison between individuals of the same age. Therefore, it is a measure of relative ability rather than absolute ability.
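The arithmetic behind the deviation I.Q. can be checked with a small Python sketch using the standard library's `statistics.NormalDist`. It assumes that scores follow an exact normal distribution with mean 100 and standard deviation 15; the function names, the raw score, and the age-group norms in the example are hypothetical.

```python
from statistics import NormalDist

# Assumed model: I.Q. scores are normally distributed, mean 100, SD 15.
IQ_SCALE = NormalDist(mu=100, sigma=15)


def deviation_iq(raw_score, age_group_mean, age_group_sd):
    """Convert a raw score to a deviation I.Q. relative to same-age peers."""
    z = (raw_score - age_group_mean) / age_group_sd
    return 100 + 15 * z


def share_within(k):
    """Proportion of the population within k standard deviations of the mean."""
    return IQ_SCALE.cdf(100 + 15 * k) - IQ_SCALE.cdf(100 - 15 * k)


def share_above(iq):
    """Proportion of the population scoring above the given I.Q."""
    return 1 - IQ_SCALE.cdf(iq)


# Hypothetical examinee scoring one SD above the mean of their age group.
print(deviation_iq(60, age_group_mean=50, age_group_sd=10))  # 115.0

print(round(share_within(1), 4))   # 0.6827
print(round(share_within(2), 4))   # 0.9545
print(round(share_within(3), 4))   # 0.9973
print(round(share_above(130), 4))  # 0.0228
print(round(share_above(145), 4))  # 0.0013
```

Because the score depends only on where the examinee stands relative to the mean and spread of the same age group, the same raw score can yield different I.Q.'s at different ages.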
2) Psychometric Properties: Reliability, Validity, and Specificity
A) Reliability
Reliability has two different meanings in intelligence testing. The first is reliability over time, and the second is the correlation of the items within the test with each other. Reliability over time refers to the correlation between results when the same individual takes the same test twice. A related question arises because there is an effectively unlimited pool of questions to draw from: if a different set of questions were asked, how close would the scores be? If reliability were perfect the correlation would be exactly 1; in practice a correlation of at least 0.7 is considered necessary. The second meaning, the internal consistency of the test material, is important because if the items do not measure the same thing there must be a flaw in the test. The drawback of very high reliability in this sense is that it can come at the expense of validity: if the items are highly intercorrelated, the material is highly specific and the test does not achieve a broad scope. It is therefore necessary to keep an eye on the internal reliability of a test to ensure good validity. (Kline; Intelligence: The Psychometric View; 1991; pgs. 44-45).
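Reliability over time is usually expressed as a correlation coefficient between two administrations. The Python sketch below computes a Pearson correlation for two sets of scores and compares it with the 0.7 benchmark mentioned above; the scores are invented for illustration and the function name is hypothetical.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equally long lists of scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two score lists of equal length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sum((a - mean_x) ** 2 for a in x)
    spread_y = sum((b - mean_y) ** 2 for b in y)
    return covariance / sqrt(spread_x * spread_y)


# Invented I.Q. scores for six people tested on two occasions.
first_testing = [95, 110, 102, 88, 120, 105]
second_testing = [98, 108, 100, 90, 117, 109]

r = pearson_r(first_testing, second_testing)
print(f"test-retest correlation: {r:.2f}")
print("meets the 0.7 benchmark:", r >= 0.7)
```

The same function could be applied to scores from two different sets of questions to address the second question raised above.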
B) Validity
Validity is the extent to which the outcome of a test corresponds to what the test was intended to show. It is as much a qualitative judgment as a quantitative one. In other words: Does the test measure what you want it to measure?
According to Sternberg (1994), the description of validity within intelligence testing is currently being revised, so I will offer two different views of it: one from 1985, and one that is currently emerging and under review.
1985 Standards
The description of validity in the 1985 Standards for Educational and Psychological Testing refers to "the appropriateness, the meaningfulness, and usefulness of the specific inferences made from test scores. Test validation is the process of accumulating evidence to support such inferences" (Sternberg; The Encyclopedia of Human Intelligence, vol. II; 1994; pgs. 1101-1102). There are three important approaches to the assessment of validity: concurrent validity, predictive validity, and construct validity. Concurrent validity is indicated by a test's agreement with another test taken at the same time (for example, if the test correlates well with an already established test it is considered concurrently valid). Predictive validity is shown when test scores correlate with some future criterion. This is a potentially powerful way of demonstrating validity. For example, if testing a child can predict that child's future performance in university, the test is valid. Construct validity considers a variety of hypotheses about what the test should show and tests them all. The more hypotheses that are supported, the more valid the test. (Kline; Intelligence: The Psychometric View; 1991; pgs. 45-46).
Emerging Consensus
The main idea about validity that is emerging today focuses on construct validity. Many researchers now treat validation as the building of an argument for a particular interpretation of test scores, rather than merely as a means of supporting one.
C) Specificity
Specificity, or specific variance, is the proportion of a subtest's variance that is both reliable and unique to that subtest. The specificity should also exceed the amount of error variance in the subtest. How much specificity is sufficient is hard to determine, but according to Cohen (1959) "25% was the amount of specific variance (as long as it exceeds error variance) to warrant subtest-specific interpretation." (Kaufman; Assessing Adolescent and Adult Intelligence; 1990; pg. 255).
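Cohen's rule of thumb can be expressed as a small Python check. The sketch assumes a simple partition that is not spelled out in the text: a subtest's reliable variance is its shared variance plus its specific variance, and error variance is one minus the reliability. The function name and the example figures are hypothetical.

```python
def specificity_check(reliability, shared_variance, threshold=0.25):
    """Apply the 25% rule to one subtest.

    Assumed partition (an illustration, not taken from the source):
      specific variance = reliability - shared variance
      error variance    = 1 - reliability
    All quantities are proportions of total subtest variance.
    """
    specific = reliability - shared_variance
    error = 1.0 - reliability
    return {
        "specific": round(specific, 3),
        "error": round(error, 3),
        # Interpretable only if specificity reaches the threshold
        # and also exceeds the error variance.
        "interpretable": specific >= threshold and specific > error,
    }


# Hypothetical subtest: reliable, and not too strongly tied to the others.
print(specificity_check(reliability=0.90, shared_variance=0.55))

# Hypothetical subtest with too little unique, reliable variance.
print(specificity_check(reliability=0.70, shared_variance=0.55))
```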
Conclusion
In conclusion, it is evident that intelligence testing involves many different issues, which are addressed in different ways by different tests. Reliability and validity play an integral role in determining whether or not a test will be used to determine I.Q. Judging by how widely the Stanford-Binet Intelligence Scale is used for children and the Wechsler Adult Intelligence Scale is used for adults, these two tests must be considered the most reliable and valid tests available today. Hopefully, in the future there will be tests that measure intelligence with more reliability and validity than those currently in use. Psychologists who are interested in intelligence are, and will always be, striving to achieve better testing procedures.
This page last updated on March 25, 1997