Chapter 12, "Population Differences in Intelligence: Causal Hypotheses," from Arthur Jensen's book The g Factor: The Science of Mental Ability (1998).

The relationship of the g factor to a number of biological variables and its relationship to the size of the white-black differences on various cognitive tests (i.e., Spearman's hypothesis) suggests that the average white-black difference in g has a biological component. Human races are viewed not as discrete, or Platonic, categories, but rather as breeding populations that, as a result of natural selection, have come to differ statistically in the relative frequencies of many polymorphic genes. The "genetic distances" between various populations form a continuous variable that can be measured in terms of differences in gene frequencies. Racial populations differ in many genetic characteristics, some of which, such as brain size, have behavioral and psychometric correlates, particularly g. What I term the default hypothesis states that the causes of the phenotypic differences between contemporary populations of recent African and European descent arise from the same genetic and environmental factors, and in approximately the same magnitudes, that account for individual differences within each population. Thus genetic and environmental variances between groups and within groups are viewed as essentially the same for both populations. The default hypothesis is able to account for the present evidence on the mean white-black difference in g. There is no need to invoke any ad hoc hypothesis, or a Factor X, that is unique to either the black or the white population. The environmental component of the average g difference between groups is primarily attributable to a host of microenvironmental factors that have biological effects. They result from non-genetic variation in prenatal, perinatal, and neonatal conditions and specific nutritional factors.

The many studies of Spearman's hypothesis using the method of correlated vectors show a strong relationship between the g loadings of a great variety of cognitive tests and the mean black-white differences on those tests. The fact that the same g vectors that are correlated with W-B differences are also correlated (and to about the same degree) with vectors composed of various cognitive tests' correlations with a number of genetic, anatomical, and physiological variables suggests that certain biological factors may be related to the average black-white population difference in the level of g.

The degree to which each of many different psychometric tests is correlated with all of the other tests is directly related to the magnitude of the test's g loading. What may seem surprising, however, is the fact that the degree to which a given test is correlated with any one of the following variables is a positive function of that test's g loading:

* Heritability of test scores.
* Amount of inbreeding depression of test scores.
* Heterosis (hybrid vigor, that is, raised test scores due to outbreeding).
* Head size (also, by inference, brain size).
* Average evoked potential (AEP) habituation and complexity.
* Glucose metabolic rate as measured by PET scan.
* Average reaction time to elementary cognitive tasks.
* Size of the mean W-B difference on various cognitive tests.

The one (and probably the only) common factor that links all of these non-psychometric variables to psychometric test scores and also links psychometric test scores to the magnitude of the mean W-B difference is the g factor. The critical role of g in these relationships is shown by the fact that the magnitude of a given test's correlation with any one of the above-listed variables is correlated with the magnitude of the W-B difference on that test. For example, Rushton reported a correlation (r = +.48) between the magnitudes of the mean W-B differences (in the American standardization sample) on eleven subtests of the WISC-R and the effect of inbreeding depression on the eleven subtest scores of the Japanese version of the WISC. Further, the subtests' g loadings in the Japanese data predicted the American W-B differences on the WISC-R subtests with r = .69, striking evidence of the g factor's robustness across different cultures. Similarly, the magnitude of the mean W-B difference on each of seventeen diverse psychometric tests was predicted (with r = .71, p < .01) by the tests' correlations with head size (a composite measure of length, width, and circumference).
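Computationally, the method of correlated vectors reduces to a single Pearson correlation between two columns of numbers: one column of subtest g loadings and one column of standardized mean group differences. The Python sketch below illustrates the calculation for a hypothetical eleven-subtest battery; the numbers are invented for illustration and are not the actual WISC-R values discussed above.

```python
import numpy as np

# Method of correlated vectors: a minimal sketch with hypothetical data.
# Vector 1: each subtest's g loading (from a factor analysis of the battery).
# Vector 2: the standardized mean W-B difference on each subtest.
g_loadings = np.array([0.83, 0.71, 0.65, 0.58, 0.52, 0.74, 0.61, 0.69, 0.55, 0.77, 0.66])
wb_diffs   = np.array([1.05, 0.80, 0.70, 0.55, 0.50, 0.90, 0.65, 0.75, 0.60, 0.95, 0.72])

# Spearman's hypothesis predicts a substantial positive correlation
# between the two vectors (a Pearson r computed over the 11 subtests).
r = np.corrcoef(g_loadings, wb_diffs)[0, 1]
print(f"correlation of g loadings with W-B differences: r = {r:.2f}")
```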

This association of psychometric tests' g loadings, the tests' correlations with genetic and other biological variables, and the mean W-B differences in test scores cannot be dismissed as happenstance. The failure of theories of group differences in IQ that are based exclusively on attitudinal, cultural, and experiential factors to predict or explain such findings argues strongly that biological factors, whether genetic or environmental in origin, must be investigated. Before examining possible biological factors in racial differences in mental abilities, however, we should be conceptually clear about the biological meaning of the term "race."

THE MEANING OF RACE

Nowadays one often reads in the popular press (and in some anthropology textbooks) that the concept of human races is a fiction (or, as one well-known anthropologist termed it, a "dangerous myth"), that races do not exist in reality, but are social constructions of politically and economically dominant groups for the purpose of maintaining their own status and power in a society. It naturally follows from this premise that, since races do not exist in any real, or biological, sense, it is meaningless even to inquire about the biological basis of any racial differences. I believe this line of argument has five main sources, none of them scientific:

o Heaping scorn on the concept of race is deemed an effective way of combating racism, here defined as the belief that individuals who visibly differ in certain characteristics deemed "racial" can be ordered on a dimension of "human worth" from inferior to superior, and that therefore various civil and political rights, as well as social privileges, should be granted or denied according to a person's supposed racial origin.

o Neo-Marxist philosophy (which still has exponents in the social sciences and the popular media) demands that individual and group differences in psychologically and socially significant traits be wholly the result of economic inequality, class status, or the oppression of the working classes in a capitalist society. It therefore excludes consideration of genetic or biological factors (except those that are purely exogenous) from any part in explaining behavioral differences among humans. It views the concept of race as a social invention by those holding economic and political power to justify the division and oppression of unprivileged classes.

o The claim that the concept of race itself (not just popular misconceptions about it) is scientifically discredited is seen as a way to advance more harmonious relations among the groups in our society that are commonly perceived as "racially" different.

o The universal revulsion to the Holocaust, which grew out of the racist doctrines of Hitler's Nazi regime, produced a reluctance on the part of democratic societies to sanction any inquiry into biological aspects of race in relation to any behavioral variables, least of all socially important ones.

o Frustration with age-old, wrong-headed popular conceptions of race has led some experts in population genetics to abandon the concept rather than attempt candidly to make the public aware of how most present-day scientists view race.

Wrong Conceptions of Race. The root of most wrong conceptions of race is the Platonic view of human races as distinct types, that is, discrete, mutually exclusive categories. According to this view, any observed variation among the members of a particular racial category merely represents individual deviations from the archetype, or ideal type, for that "race." Since, according to this Platonic view of race, every person can be assigned to one or another racial category, it naturally follows that there is some definite number of races, each with its unique set of distinctive physical characteristics, such as skin color, hair texture, and facial features. The traditional number has been three: Caucasoid, Mongoloid, and Negroid, in part derived from the pre-Darwinian creationist view that "the races of mankind" could be traced back to the three sons of Noah: Shem, Ham, and Japheth.

The Cause of Biological Variation. All that is known today about the worldwide geographic distribution of differences in human physical characteristics can be understood in terms of the synthesis of Darwinian evolution and population genetics developed by R. A. Fisher, Sewall Wright, Theodosius Dobzhansky, and Ernst Mayr. Races are defined in this context as breeding populations that differ from one another in gene frequencies and that vary in a number of intercorrelated visible features that are highly heritable.

Racial differences are a product of the evolutionary process working on the human genome, which consists of about 100,000 polymorphic genes (that is, genes that contribute to genetic variation among members of a species) located in the twenty-three pairs of chromosomes that exist in every cell of the human body. The genes, each with its own locus (position) on a particular chromosome, contain all of the chemical information needed to create an organism. In addition to the polymorphic genes, there are also a great many other genes that are not polymorphic (that is, are the same in all individuals in the species) and hence do not contribute to the normal range of human variation. Those genes that do produce variation are called polymorphic genes, as they have two or more different forms called alleles, whose codes differ in their genetic information. Different alleles, therefore, produce different effects on the phenotypic characteristic determined by the gene at a particular chromosomal locus. Genes that do not have different alleles (and thus do not have variable phenotypic effects) are said to have gone to fixation; that is, alternative alleles, if any, have long since been eliminated by natural selection in the course of human or mammalian evolution. The physiological functions served by most basic "housekeeping" genes are so crucial for the organism's development and viability that almost any mutation of them proves lethal to the individual who harbors it; hence only one form of the gene is possessed by all members of a species. A great many such essential genes are in fact shared by closely related species; the number of genes that are common to different species is inversely related to the evolutionary distance between them. For instance, the two living species closest to Homo sapiens in evolutionary distance, chimpanzees and gorillas, have at least 97 percent of their genes (or total genetic code) in common with present-day humans, scarcely less than chimps and gorillas have in common with each other. This means that even the very small percentage of genes (<3 percent) that differ between humans and the great apes is responsible for all the conspicuous and profound phenotypic differences observed between apes and humans. The genetic difference appears small only if viewed on the scale of differences among all animal species.

A particular gene's genetic code is determined by the unique sequence of four chemical bases of the DNA, arranged in the familiar double-helix structure of the gene. A change in a gene's code, however slight (even a single base pair), can produce a new or different allele that manifests a different phenotypic effect. (Many such mutations, however, have no phenotypic effect because of redundancy in the DNA.) Such changes in the DNA result from spontaneous mutation. Though mutations occur at random, some gene loci have much higher mutation rates than others, ranging for different loci from less than one per million to perhaps more than 500 per million sex cells, not a trivial number considering that each male ejaculation contains from 200 to 500 million sperm. While natural or spontaneous mutations have largely unknown causes, aptly referred to as biological "noise," it has been shown experimentally that mutations can result from radiation (X-rays, gamma rays, cosmic rays, and ultraviolet radiation). Certain chemical substances are also mutagenic.

The creation of new alleles by spontaneous mutation along with the recombination of alleles in gametogenesis are essential conditions for the evolution of all forms of life. A new allele with phenotypic effects that decrease an individual's fitness in a given environment, compared to the nonmutated allele that would normally occupy the same chromosomal locus, will be passed on to fewer descendants and will eventually go to extinction. The gene is driven out of existence, so to speak, by losing in the competition with other alleles that afford greater fitness. Biological fitness (also known as Darwinian fitness), as a technical term in evolutionary genetics, refers only to an individual's reproductive success, often defined operationally as the number of surviving fertile progeny of that individual. (A horse mated with a donkey, for example, might produce many surviving offspring, but because they are all sterile, the horse and donkey in this mating have a fitness of zero.) The frequency of a particular gene in all of an individual's relatives is termed the inclusive fitness of that gene. The inclusive fitness of a gene is a measure of its effect on the survival and reproductive success of both the individual bearing the gene and all of the individual's relatives bearing the identical gene. Technically speaking, an individual's biological fitness denotes nothing more than that individual's genetic contribution to the next generation's gene pool relative to the average for the population. The term does not necessarily imply any traits one may deem personally desirable, such as vigor, physical strength, or a beautiful body, although some such traits, to the extent that they are heritable, were undoubtedly genetically selected in the course of evolution only because, we know in retrospect, they enhanced individuals' reproductive success in succeeding generations. The survival of any new allele and its rate of spreading through subsequent generations is wholly a function of the degree to which its phenotypic expression enhances the inclusive fitness of those who inherit the allele. An allele with any advantageous phenotypic effect, in this respect, spreads to an ever-larger part of the breeding population in each successive generation.

New alleles created by mutation are subject to natural selection according to the degree of fitness they confer in a particular environment. Changed environmental conditions can alter the selection pressure for a certain allele, depending on the nature of its phenotypic expression, thereby either increasing or decreasing its frequency in a breeding population. Depending on its fitness in a given environment, it may go to extinction in the population or it may go to fixation (with every member of the population eventually possessing the allele). Many polymorphic gene loci harbor one or another allele of a balanced polymorphism, wherein two or more alleles with comparable fitness values (in a particular environment) are maintained at equilibrium in the population. Thus spontaneous genetic mutation and recombination, along with differential selection of new alleles according to how their phenotypic expression affects inclusive fitness, are crucial mechanisms of the whole evolutionary process. The variation in all inherited human characteristics has resulted from this process, in combination with random changes caused by genetic drift and gene frequency changes caused by migration and intermarriage patterns.

Races as Breeding Populations with Fuzzy Boundaries. Most anthropologists and population geneticists today believe that the preponderance of evidence, from both the dating of fossils and the analysis of the geographic distribution of many polymorphic genes in present-day indigenous populations, argues that the genus Homo originated in Africa. Estimates are that our distant hominid precursor split off from the great apes some four to six million years ago. The consensus among human paleontologists (as of 1997) accepts the following basic scenario of human evolution.

Australopithecus afarensis was a small (about 3'6"), rather ape-like hominid that appears to have been ancestral to all later hominids. It was bipedal, walking more or less upright, and had a cranial capacity of 380 to 520 cm3 (about the same as that of the chimpanzee, but relatively larger for its overall body size). Branching from this species were at least two lineages, one of which led to a new genus, Homo.

Homo also had several branches (species). Those that were precursors of modern humans include Homo habilis, which lived about 2.5 to 1.5 million years ago. It used tools and even made tools, and had a cranial capacity of 510 to 750 cm3 (about half the size of modern humans). Homo erectus lived about 1.5 to 0.3 million years ago and had a cranial capacity of 850 to 1100 cm3 (about three-fourths the size of modern humans). The first hominid whose fossil remains have been found outside Africa, Homo erectus, migrated as far as the Middle East, Europe, and Western and Southeastern Asia. No Homo erectus remains have been found in Northern Asia, whose cold climate probably was too severe for their survival skills.

Homo sapiens branched off the Homo erectus line in Africa at least 100 thousand years ago. During a period from about seventy to ten thousand years ago they spread from Africa to the Middle East, Europe, all of Asia, Australia, and North and South America. To distinguish certain archaic subspecies of Homo sapiens (e.g., Neanderthal man) that became extinct during this period from their contemporaries who were anatomically modern humans, the latter are now referred to as Homo sapiens sapiens (or Homo s. sapiens); it is this line that branched off Homo erectus in Africa and spread to every continent during the last 70,000 years. These prehistoric humans survived as foragers living in small groups that frequently migrated in search of food.

GENETIC DISTANCE

As small populations of Homo s. sapiens separated and migrated further away from Africa, genetic mutations kept occurring at a constant rate, as occurs in all living creatures. Geographic separation and climatic differences, with their different challenges to survival, provided an increasingly wider basis for populations to become genetically differentiated through natural selection. Genetic mutations that occurred after each geographic separation of a population had taken place were differentially selected in each subpopulation according to the fitness the mutant gene conferred in the respective environments. A great many mutations and a lot of natural selection and genetic drift occurred over the course of the five or six thousand generations that humans were gradually spreading over the globe.

The extent of genetic difference, termed genetic distance, between separated populations provides an approximate measure of the amount of time since their separation and of the geographic distance between them. In addition to time and distance, natural geographic hindrances, such as mountain ranges, rivers, seas, and deserts, also restrict gene flow (i.e., the interchange of genes) between populations. Such relatively isolated groups are termed breeding populations, because a much higher frequency of mating occurs between individuals who belong to the same population than between individuals from different populations. (The ratio of the frequencies of within/between population matings for two breeding populations determines the degree of their genetic isolation from one another.) Hence the combined effects of geographic separation [or cultural separation], genetic mutation, genetic drift, and natural selection for fitness in different environments result in population differences in the frequencies of different alleles at many gene loci.

There are also other causes of relative genetic isolation resulting from language differences as well as from certain social, cultural, or religious sanctions against persons mating outside their own group. These restrictions of gene flow may occur even among populations that occupy the same territory. Over many generations these social forms of genetic isolation produce breeding populations (including certain ethnic groups) that evince relatively slight differences in allele frequencies from other groups living in the same locality.

When two or more populations differ markedly in allele frequencies at a great many gene loci whose phenotypic effects visibly distinguish them by a particular configuration of physical features, these populations are called subspecies. Virtually every living species on earth has two or more subspecies. The human species is no exception, but in this case subspecies are called races. Like all other subspecies, human races are interfertile breeding populations whose individuals differ on average in distinguishable physical characteristics.

Because all the distinguishable breeding populations of modern humans were derived from the same evolutionary branch of the genus Homo, namely, Homo s. sapiens, and because breeding populations have relatively permeable (non-biological) boundaries that allow gene flow between them, human races can be considered as genetic "fuzzy sets." That is to say, a race is one of a number of statistically distinguishable groups in which individual membership is not mutually exclusive by any single criterion, and individuals in a given group differ only statistically from one another and from the group's central tendency on each of the many imperfectly correlated genetic characteristics that distinguish between groups as such. The important point is that the average difference on all of these characteristics that differ among individuals within the group is less than the average difference between the groups on these genetic characteristics.

What is termed a cline results where groups overlap at their fuzzy boundaries in some characteristic, with intermediate gradations of the phenotypic characteristic, often making the classification of many individuals ambiguous or even impossible, unless they are classified by some arbitrary rule that ignores biology. The fact that there are intermediate gradations or blends between racial groups, however, does not contradict the genetic and statistical concept of race. The different colors of a rainbow do not consist of discrete bands but are a perfect continuum, yet we readily distinguish different regions of this continuum as blue, green, yellow, and red, and we effectively classify many things according to these colors. The validity of such distinctions and of the categories based on them obviously need not require that they form perfectly discrete Platonic categories.

It must be emphasized that the biological breeding populations called races can only be defined statistically, as populations that differ in the central tendency (or mean) on a large number of different characteristics that are under some degree of genetic control and that are correlated with each other through descent from common ancestors who are relatively recent in the time scale of evolution (i.e., those who lived about ten thousand years ago, at which time all of the continents and most of the major islands of the world were inhabited by relatively isolated breeding populations of Homo s. sapiens).

Of course, any rule concerning the number of gene loci that must show differences in allele frequencies (or any rule concerning the average size of differences in frequency) between different breeding populations for them to be considered races is necessarily arbitrary, because the distribution of average absolute differences in allele frequencies in the world's total population is a perfectly continuous variable. Therefore, the number of different categories, or races, into which this continuum can be divided is, in principle, wholly arbitrary, depending on the degree of genetic difference a particular investigator chooses as the criterion for classification or the degree of confidence one is willing to accept with respect to correctly identifying the area of origin of one's ancestors.

Some scientists have embraced all of Homo sapiens in as few as two racial categories, while others have claimed as many as seventy. These probably represent the most extreme positions in the "lumper" and "splitter" spectrum. Logically, we could go on splitting up groups of individuals on the basis of their genetic differences until we reach each pair of monozygotic twins, which are genetically identical. But as any pair of MZ twins are always of the same sex, they of course cannot constitute a breeding population. (If hypothetically they could, the average genetic correlation between all of the offspring of any pair of MZ twins would be 2/3; the average genetic correlation between the offspring of individuals paired at random in the total population is 1/2; the offspring of various forms of genetic relatedness, such as cousins [a preferred match in some parts of the world], falls somewhere between 2/3 and 1/2.) However, as I will explain shortly, certain multivariate statistical methods can provide objective criteria for deciding on the number and composition of different racial groups that can be reliably determined by the given genetic data or that may be useful for a particular scientific purpose. But one other source of genetic variation between populations must first be explained.

Genetic Drift. In addition to mutation, natural selection, and migration, another means by which breeding populations may come to differ in allele frequencies is a purely stochastic (that is, random) process termed genetic drift. Drift is most consequential during the formation of new populations, when their numbers are still quite small. Although drift occurs for all gene loci, Mendelian characters (i.e., phenotypic traits controlled by a single gene locus) are more noticeably affected by drift than are polygenic traits (i.e., those caused by many genes). The reason is purely statistical.

Changes in a population's allele frequencies attributable to genetic drift can be distinguished from changes due to natural selection for two reasons: (1) Many genes are neutral in the sense that their allele frequencies have remained unaffected by natural selection, because they neither increase nor decrease fitness; over time they move across the permeable boundaries of different breeding populations. (2) When a small band of individuals emigrates from the breeding population of origin to found a new breeding population, it carries with it only a random sample of all of the alleles, including neutral alleles, that existed in the entire original population. That is, the allele frequencies at all gene loci in the migrating band will not exactly match the allele frequencies in the original population. The band of emigrants, and of course all its descendants (who may eventually form a large and stable breeding population), therefore differs genetically from its parent population as the result of a purely random process. This random process is called founder effect. It applies to all gene loci. All during the time that genetic drift was occurring, gene mutations steadily continued, and natural selection continued to produce changes in allele frequencies at many loci. Thus the combined effects of genetic drift, mutation, and natural selection ensure that a good many alleles are maintained at different frequencies in various relatively isolated breeding populations. This process did not happen all at once and then cease. It is still going on, but it takes place too slowly to be perceived in the short time span of a few generations.

It should be noted that the phenotypic differences between populations that were due to genetic drift are considerably smaller than the differences in those phenotypic characteristics that were strongly subject to natural selection, especially those traits that reflect adaptations to markedly different climatic conditions, such as darker skin color (thought to have evolved as protection from the tropical sun's rays that can cause skin cancer and to protect against folate decomposition by sunlight), light skin color (to admit more of the ultraviolet rays needed for the skin's formation of vitamin D in northern regions; also because clothing in northern latitudes made dark skin irrelevant selectively and it was lost through random mutation and drift), and globular versus elongated body shape and head shape (better to conserve or dissipate body heat in cold or hot climates, respectively).

Since the genetic drift of neutral genes is a purely random process, and given a fairly constant rate of drift, the differing allele frequencies of many neutral genes in various contemporary populations can be used as a genetic clock to determine the approximate time of their divergence. The same method has been used to estimate the extent of genetic separation, termed genetic distance, between populations.

Measurement and Analysis of Genetic Distance Between Groups. Modern genetic technology makes it possible to measure the genetic distance between different populations objectively and with considerable precision, or statistical reliability. This measurement is based on a large number of genetic polymorphisms for what are thought to be relatively neutral genes, that is, genes whose allele frequencies differ across populations more because of mutations and genetic drift than because of natural selection. Population allele frequencies can be as low as zero or as high as 1.0 (certain alleles have large frequencies in some populations but are not found at all in others). Neutral genes are preferred in this work because they provide a more stable and accurate evolutionary "clock" than do genes whose phenotypic characters have been subjected to the kinds of diverse external conditions that are the basis for natural selection. Although neutral genes provide a more accurate estimate of populations' divergence times, it should be noted that, by definition, they do not fully reflect the magnitude of genetic differences between populations that are mainly attributable to natural selection.

The technical rationale and formulas for calculating genetic distance are fully explicated elsewhere. For present purposes, the genetic distance, D, between two groups can be thought of here simply as the average difference in allele frequencies between the two populations, with D scaled to range from zero (i.e., no allele differences) to one (i.e., differences in all alleles). One can also think of D as the complement of the correlation coefficient r (i.e., D = 1 - r, and r = 1 - D). This conversion of D to r is especially useful, because many of the same objective multivariate statistical methods that were originally devised to analyze large correlation matrices (e.g., principal components analysis, factor analysis, hierarchical cluster analysis, multidimensional scaling) can also be used to analyze the total matrix of genetic distances (after they are converted to correlations) between a large number of populations with known allele frequencies based on some large number of genes.
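As a minimal illustration of this simplified definition, the Python sketch below computes D as the average absolute allele-frequency difference between two populations and converts it to r = 1 - D. The frequencies are invented; real genetic-distance statistics (e.g., Nei's standard distance or FST) are more elaborate than this simple average.

```python
import numpy as np

# Simplified genetic distance, as defined in the text: the average
# absolute difference in allele frequencies across loci, scaled to [0, 1].
# The two frequency vectors below are hypothetical, one entry per allele.
pop_a = np.array([0.10, 0.45, 0.80, 0.33, 0.02])
pop_b = np.array([0.15, 0.50, 0.60, 0.40, 0.09])

D = np.mean(np.abs(pop_a - pop_b))  # distance between the two populations
r = 1.0 - D                         # the complementary correlation, r = 1 - D
print(f"D = {D:.3f}, r = {r:.3f}")
```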

The most comprehensive study of population differences in allele frequencies to date is that of the Stanford University geneticist Luigi Luca Cavalli-Sforza and his coworkers. Their recent 1,046-page book reporting the detailed results of their study is a major contribution to the science of population genetics. The main analysis was based on blood and tissue specimens obtained from representative samples of forty-two populations, from every continent (and the Pacific islands) in the world. All the individuals in these samples were aboriginal or indigenous to the areas in which they were sampled; their ancestors had lived in the same geographic area since no later than 1492, a familiar date that generally marks the beginning of extensive worldwide European explorations and the consequent major population movements. In each of the Stanford study's population samples, the allele frequencies of 120 alleles at forty-nine gene loci were determined. Most of these genes determine various blood groups, enzymes, and proteins involved in the immune system, such as human lymphocyte antigens (HLA) and immunoglobulins. These data were then used to calculate the genetic distance (D) between each group and every other group. (DNA sequencing was also used in separate analyses of some groups; it yields finer genetic discrimination between certain groups than do the genetic polymorphisms used in the main analysis.) From the total matrix of (42 X 41)/2 = 861 D values, Cavalli-Sforza et al. constructed a genetic linkage tree. The D value between any two groups is represented graphically by the total length of the line that connects the groups in the branching tree. (See Figure 12.1.)
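Such a linkage tree can be produced by standard hierarchical clustering of the distance matrix. The sketch below uses an invented 5 X 5 matrix in place of the Stanford study's 42 X 42 matrix; the population labels are placeholders.

```python
import numpy as np
from scipy.cluster.hierarchy import dendrogram, linkage
from scipy.spatial.distance import squareform

# Hypothetical symmetric matrix of pairwise genetic distances D,
# standing in for the study's (42 x 41)/2 = 861 unique values.
labels = ["Pop_A", "Pop_B", "Pop_C", "Pop_D", "Pop_E"]
D = np.array([
    [0.00, 0.05, 0.20, 0.22, 0.30],
    [0.05, 0.00, 0.18, 0.21, 0.29],
    [0.20, 0.18, 0.00, 0.08, 0.25],
    [0.22, 0.21, 0.08, 0.00, 0.26],
    [0.30, 0.29, 0.25, 0.26, 0.00],
])

# squareform() collapses the symmetric matrix into the condensed vector
# of unique pairwise distances that linkage() expects.
tree = linkage(squareform(D), method="average")  # UPGMA-style clustering
print(tree)  # each row: cluster i, cluster j, join distance, new cluster size

# dendrogram() lays out the branching tree (set no_plot=False in a
# matplotlib session to draw it, as in Figure 12.1).
dendrogram(tree, labels=labels, no_plot=True)
```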

The greatest genetic distance, that is, the largest D, is between the five African groups (listed at the top of Figure 12.1) and all the other groups. The next largest D is between the Australian + New Guinean groups and the remaining other groups; the next largest split is between the South Asians + Pacific Islanders and all the remaining groups, and so on. The clusters at the lowest level (i.e., at far right in Figure 12.1) can also be clustered to show the D values between larger groupings, as in Figure 12.2. Note that these clusters produce much the same picture as the traditional racial classifications that were based on skeletal characteristics and the many visible physical features by which non-specialists distinguish "races."

It is noteworthy, but perhaps not too surprising, that the grouping of various human populations in terms of invisible genetic polymorphisms for many relatively neutral genes yields results that are highly similar to the classic methods of racial classification based on directly observable anatomical features.

Another notable feature of the Stanford study is that the geographic distances between the locations of the groups that are less than 5,000 miles apart are highly correlated (r ~.95) with the respective genetic distances between these groups. This argues that genetic distance provides a fairly good measure of the rate of gene flow between populations that were in place before A.D. 1492.

None of the 120 alleles used in this study has equal frequencies across all of the forty-two populations. This attests to the ubiquity of genetic variation among the world's populations and subpopulations.

All of the modern human population studies based on genetic analysis (including analyses based on DNA markers and sequences) are in close agreement in showing that the earliest, and by far the greatest, genetic divergence within the human species is that between Africans and non-Africans (see Figures 12.1 and 12.2).

Cavalli-Sforza et al. transformed the distance matrix to a correlation matrix consisting of 861 correlation coefficients among the forty-two populations, so they could apply principal components (PC) analysis to their genetic data. (PC analysis is similar to factor analysis; the essential distinction between them is explained in Chapter 3, Note 13.) PC analysis is a wholly objective mathematical procedure. It requires no decisions or judgments on anyone's part and yields identical results for everyone who does the calculations correctly. (Nowadays the calculations are performed by a computer program specifically designed for PC analysis.) The important point is that if the various populations were fairly homogeneous in genetic composition, differing no more genetically than could be attributable only to random variation, a PC analysis would not be able to cluster the populations into a number of groups according to their genetic propinquity. In fact, a PC analysis shows that most of the forty-two populations fall very distinctly into the quadrants formed by using the first and second principal components as axes (see Figure 12.3). They form quite widely separated clusters of the various populations that resemble the "classic" major racial groups: Caucasians in the upper right, Negroids in the lower right, Northeast Asians in the upper left, and Southeast Asians (including South Chinese) and Pacific Islanders in the lower left. The first component (which accounts for 27 percent of the total genetic variation) corresponds roughly to the geographic migration distances (and therefore the time since divergence) from sub-Saharan Africa, reflecting to some extent the differences in allele frequencies that are due to genetic drift. The second component (which accounts for 16 percent of the variation) appears to separate the groups climatically, as the groups' positions on PC2 are quite highly correlated with the degrees of latitude of their geographic locations. This suggests that not all of the genes used to determine genetic distances are entirely neutral, but that at least some of them differ in allele frequencies to some extent because of natural selection under different climatic conditions. I have tried other objective methods of clustering on the same data (varimax rotation of the principal components, common factor analysis, and hierarchical cluster analysis). All of these analyses yield essentially the same picture and identify the same major racial groupings.
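A minimal sketch of such a PC analysis: convert the distance matrix to a correlation matrix via r = 1 - D, eigendecompose it, and read off each population's coordinates on the first two components (the quantities plotted in Figure 12.3 for the real data). The matrix below is the same hypothetical one used in the clustering sketch above.

```python
import numpy as np

# Hypothetical genetic-distance matrix (see the clustering sketch above).
D = np.array([
    [0.00, 0.05, 0.20, 0.22, 0.30],
    [0.05, 0.00, 0.18, 0.21, 0.29],
    [0.20, 0.18, 0.00, 0.08, 0.25],
    [0.22, 0.21, 0.08, 0.00, 0.26],
    [0.30, 0.29, 0.25, 0.26, 0.00],
])
R = 1.0 - D  # distances converted to correlations, r = 1 - D

# PC analysis of a symmetric correlation matrix is an eigendecomposition.
eigvals, eigvecs = np.linalg.eigh(R)
order = np.argsort(eigvals)[::-1]            # sort by variance accounted for
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

explained = eigvals / eigvals.sum()          # share of variation per component
coords = eigvecs[:, :2] * np.sqrt(np.clip(eigvals[:2], 0.0, None))

print("variance shares of PC1 and PC2:", np.round(explained[:2], 2))
print("population coordinates on PC1 x PC2:\n", np.round(coords, 3))
```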

African-Americans. The first Africans arrived in North America in 1619 and for more than two centuries thereafter, mostly between 1700 and 1800, the majority of Africans were brought to America as slaves. The end to this involuntary migration came between 1863 and 1865, with the Emancipation Proclamation. Nearly all of the Africans who were enslaved came from sub-Saharan West Africa, specifically the coastal region from Senegal to Angola. The populations in this area are often called West African or North West and Central West Bantu.

Steadily over time, the real, but relatively low frequency of cross-mating between blacks and whites produced an infusion of Caucasoid genes into the black gene pool. As a result, the present-day population of black Americans is genetically different from the African populations from whom they descended. Virtually 100 percent of contemporary black Americans have some Caucasian ancestry. Most of the Caucasian genes in the present-day gene pool of black Americans entered the black gene pool during the period of slavery.

Estimates of the proportion of Caucasoid genes in American blacks are based on a number of genetic polymorphisms that have fairly high allele frequencies in the European population but zero or near-zero frequencies in the West African population, or vice versa. For any given allele, the estimated proportion (M) of white European ancestry in American blacks is obtained by the formula M = (qB - qAf)/(qW - qAf), where qB is the given allele's frequency in the black American population, qAf is its frequency in the African population, and qW is its frequency in the white European population. The value of M is then averaged over each of twenty or so genes with alleles that are unique either to Africans or to Europeans. The largest studies, which yield estimates with the greatest precision, give mean values of M close to 25 percent, with a standard error of about 3 percent. This is probably the best estimate for the African-American population overall. However, M varies across different regions of the United States, being as low as 4 percent to 10 percent in some southeastern states and spreading out in a fan-shaped gradient toward the north and the west to reach over 40 percent in some northeastern and northwestern states. Among the most typical and precise estimates of M are those for Oakland, California (22.0 percent), and Pittsburgh, Pennsylvania (25.2 percent). This regional variation in M reflects the pattern of selective migration of blacks from the Deep South since the mid-nineteenth century. Gene flow, of course, goes in both directions. In every generation there has been a small percentage of persons who have some African ancestry but whose ancestry is predominantly Caucasian and who permanently "pass as white." The white American gene pool therefore contains some genes that can be traced to Africans who were brought over as slaves (estimated by analyses of genetic polymorphisms to be less than 1 percent).
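A worked illustration of the admixture formula, averaged over a few loci; the allele frequencies below are hypothetical, chosen only to show how per-locus estimates are combined and that the arithmetic accommodates alleles that are common in either source population.

```python
# M = (qB - qAf) / (qW - qAf): per-allele estimate of the proportion of
# European ancestry in the black American population, as in the text.
def admixture_m(q_black: float, q_african: float, q_white: float) -> float:
    return (q_black - q_african) / (q_white - q_african)

# Hypothetical frequencies per locus: (black American, West African, European).
loci = [
    (0.12, 0.00, 0.43),
    (0.09, 0.01, 0.38),
    (0.70, 0.78, 0.45),  # an allele more common in the African population
]

estimates = [admixture_m(*q) for q in loci]
mean_m = sum(estimates) / len(estimates)
print("per-locus M:", [round(m, 3) for m in estimates])
print(f"mean M = {mean_m:.3f}")  # these invented inputs land near .25
```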

Genetic Distance and Population Differences in g. The preceding discourse on the genetics of populations is germane to any discussion of population differences in g. The differences in gene frequencies that originally created different breeding populations largely explain the physical phenotypic differences observed between populations called races. Most of these differences in visible phenotypic characteristics are the result of natural selection working over the course of human evolution. Selection changes gene frequencies in a population by acting directly on any genetically based phenotypic variation that affects Darwinian fitness for a given environment. This applies not only to physical characteristics, but also to behavioral capacities, which are necessarily to some degree a function of underlying physical structures. Structure and function are intimately related, as their evolutionary origins are inseparable.

The behavioral capacities or traits that demonstrate genetic variation can also be viewed from an evolutionary perspective. Given the variation in allele frequencies between populations for virtually every known polymorphic gene, it is exceedingly improbable that populations do not differ in the alleles that affect the structural and functional basis of heritable behavioral traits. The empirical generalization that every polygenic physical characteristic that shows differences between individuals also shows mean differences between populations applies to behavioral as well as physical characteristics. Given the relative genetic distances between the major racial populations, one might expect some behavioral differences between Asians and Europeans to be of lesser magnitude than those between these groups and sub-Saharan Africans.

The behavioral, psychological, or mental characteristics that show the highest g loadings are the most heritable and have the most biological correlates (see Chapter 6) and are therefore the most likely to show genetic population differences. Because of the relative genetic distances, they are also the most likely to show such differences between Africans (including predominantly African descendants) and Caucasians or Asians.

Of the approximately 100,000 human polymorphic genes, about 50,000 are functional in the brain and about 30,000 are unique to brain functions. The brain is by far the structurally and functionally most complex organ in the human body and the greater part of this complexity resides in the neural structures of the cerebral hemispheres, which, in humans, are much larger relative to total brain size than in any other species. A general principle of neural organization states that, within a given species, the size and complexity of a structure reflect the behavioral importance of that structure. The reason, again, is that structure and function have evolved conjointly as an integrated adaptive mechanism. But as there are only some 50,000 genes involved in the brain's development and there are at least 200 billion neurons and trillions of synaptic connections in the brain, it is clear that any single gene must influence some huge number of neurons-not just any neurons selected at random, but complex systems of neurons organized to serve special functions related to behavioral capacities.

It is extremely improbable that the evolution of racial differences since the advent of Homo sapiens produced allelic changes throughout the genome yet left unaffected only those 50,000 genes that are involved with the brain.

Brain size has increased almost threefold during the course of human evolution, from about 500 cm3 in the australopithecines to about 1,350 cm3 (the present estimated worldwide average) in Homo sapiens. Nearly all of this increase in brain volume has occurred in connection with those parts of the cerebral hemispheres associated with cognitive processes, particularly the prefrontal lobes and the posterior association areas, which control foresight, planning, goal-directed behavior, and the integration of sensory information required for higher levels of information processing. The parts of the brain involved in vegetative and sensorimotor functions per se differ much less in size, relative to total brain size, even between humans and chimpanzees, than do the parts of the brain that subserve cognitive functions. Moreover, most of the evolutionary increase in brain volume has resulted not from a uniform increase in the total number of cortical neurons per se, but from a much greater increase in the number and complexity of the interconnections between neurons, making possible a higher level of interneuronal communication on which complex information processing depends. Although the human brain is three times larger than the chimpanzee brain, it has only 1.25 times as many neurons; the much greater difference is in their degree of arborization, that is, their number of synapses and interconnecting branches.

No other organ system has evolved as rapidly as the brain of Homo sapiens; the species is unprecedented in this respect. Although in hominid evolution there was also an increase in general body size, it was not nearly as great as the increase in brain size. In humans, the correlation between individual differences in brain size and in stature is only about +.20. One minus the square of this relatively small correlation (1 - .20^2 = .96) gives the proportion of the total variance in brain size that cannot be accounted for by variation in overall body size. Much of this residual variance in brain size presumably involves cognitive functions.

Bear in mind that, from the standpoint of natural selection, a larger brain size (and its corresponding larger head size) is in many ways decidedly disadvantageous. A large brain is metabolically very expensive, requiring a high-calorie diet. Though the human brain is less than 2 percent of total body weight, it accounts for some 20 percent of the body's basal metabolic rate (BMR). In other primates, the brain accounts for about 10 percent of the BMR, and for most carnivores, less than 5 percent. A larger head also greatly increases the difficulty of giving birth and incurs much greater risk of perinatal trauma or even fetal death, which are much more frequent in humans than in any other animal species. A larger head also puts a greater strain on the skeletal and muscular support. Further, it increases the chances of being fatally hit by an enemy's club or missile. Despite such disadvantages of larger head size, the human brain, in fact, evolved markedly in size, with its cortical layer accommodating to a relatively lesser increase in head size by becoming highly convoluted in the endocranial vault. In the evolution of the brain, the effects of natural selection had to have reflected the net selective pressures that made an increase in brain size disadvantageous versus those that were advantageous. The advantages obviously outweighed the disadvantages to some degree or the increase in hominid brain size would not have occurred.

The only conceivable advantage to an increase in the size and complexity of the brain is the greater behavioral capacity this would confer. This would include: the integration of sensory information, fine hand-eye coordination, quickness of responding or voluntary response inhibition and delayed reaction depending on the circumstances, perceiving functional relationships between two things when only one or neither is physically present, connecting past and future events, learning from experience, generalization, far transfer of learning, imagery, intentionality and planning, short-term and long-term memory capacity, mentally manipulating objects without need to handle them physically, foresight, problem solving, and the use of denotative language in vocal communication, as well as all of the information processes that are inferred from performance on what were referred to in Chapter 8 as "elementary cognitive tasks." These basic information processes are involved in coping with the natural exigencies and contingencies of the human environment. An increase in these capabilities and in their functional efficiency is, in fact, associated with allometric differences in brain size between various species of animals: those with greater brain volume in relation to their overall body size generally display more of the kinds of capabilities listed above. The functional efficiency of the various behavioral capabilities that are common to all members of a given species can be enhanced differentially by natural selection, in the same way (though probably not to the same degree) that artificial selection has made dogs of various breeds differ in propensities and trainability for specific types of behavior.

What kinds of environmental pressures encountered by Homo erectus and early Homo sapiens would have selected for increased size and complexity of the brain? Evolutionists have proposed several plausible scenarios. Generally, a more complex brain would be advantageous in hunting skill, cooperative social interaction, and the development of tool use, followed by the higher-order skill of using tools to make other tools, a capacity possessed by no contemporary species other than Homo sapiens.

The environmental forces that contributed to the differentiation of major populations and their gene pools through natural selection were mainly climatic, but parasite avoidance and resistance were also instrumental. Homo sapiens evolved in Africa from earlier species of Homo that originated there. In migrating from Africa into Europe and Asia, they encountered highly diverse climates. These migrants, like the parent population that remained in sub-Saharan Africa, were foragers, but they had to forage for sustenance under the highly different conditions of their climatically diverse habitats. Foraging was possible all year round in the tropical and subtropical climates of equatorial regions, while in the more northern climates of Eurasia the abundance of food that could be obtained by hunting and gathering fluctuated greatly with the seasons. This necessitated the development of more sophisticated techniques for hunting large game, requiring vocal communication and cooperative efforts (e.g., ambushing, trapping, or corralling), along with the foresight to plan ahead for the preservation, storage, and rationing of food in order to survive the severe winter months when foraging is practically impossible. Extreme seasonal changes and the cold climate of the northern regions (now inhabited by Mongoloids and Caucasians) also demanded the ingenuity and skill to construct more permanent and sturdy dwellings and to design substantial clothing as protection against the elements. Whatever bodily and behavioral adaptive differences between populations were wrought by the contrasting conditions of the hot climate of sub-Saharan Africa and the cold seasons of northern Europe and northeast Asia would have been markedly intensified by the last glaciation, which occurred approximately 30,000 to 10,000 years ago, after Homo sapiens had inhabited most of the globe. During this long period, large regions of the Northern Hemisphere were covered by ice, and the north Eurasian winters were far more severe than any that have occurred during the past 10,000 years.

It seems most plausible, therefore, that behavioral adaptations of a kind that could be described as complex mental abilities were more crucial for survival of the populations that migrated to the northern Eurasian regions, and were therefore under greater selection pressure as fitness characters, than in the populations that remained in tropical or subtropical regions.

Climate has also influenced the evolution of brain size, apparently indirectly, through its direct effect on head size, particularly the shape of the skull. Head size and shape are more closely related to climate than is the body as a whole. Because the human brain metabolizes 20 percent of the body's total energy supply, it generates more heat in relation to its size than any other organ. The resting energy output of the average European adult male's brain is equal to about three-fourths that of a 100-watt light bulb. Because temperature changes in the brain of only four to five degrees Celsius are seriously adverse to its normal functioning, the brain must conserve heat (in a cold environment) or dissipate heat (in a hot environment). Simply in terms of solid geometry, a sphere contains a larger volume for a given surface area than does any other shape; conversely, a given volume can be enclosed in a sphere with a smaller surface area than is possible with any non-spherical shape (an elongated oval shape, for instance). Since heat radiation takes place at the surface, a more spherical shape will radiate less heat, and thus conserve more, for a given volume than a less spherical shape. Applying these geometric principles to head size and shape, one would predict that natural selection would favor a smaller head with a less spherical (dolichocephalic) shape, because of its better heat dissipation, in hot climates, and a more spherical (brachycephalic) head, which accommodates a larger volume of brain matter with a smaller surface area and thus better conserves heat, in cold climates. (The dolichocephalic-brachycephalic dimension is related to the head's width:length ratio, known as the cephalic index.) In brief, a smaller, dolichocephalic cranium is advantageous for thermoregulation of the brain in a hot climate, whereas a larger, brachycephalic cranium is advantageous in a cold climate. In the world's populations, head breadth is correlated about +.8 with cranial capacity; head length about +.4.
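The geometric claim can be checked directly. The Python sketch below compares the surface area of a sphere with that of a prolate (elongated) spheroid of equal volume; the 1,400 cm3 volume and the 1.5:1 elongation are arbitrary illustrative choices, not anthropometric data.

```python
import math

# For a fixed volume, the sphere has the least surface area of any shape,
# so it radiates the least heat; an elongated (prolate) spheroid of the
# same volume exposes more surface and so dissipates more heat.
volume = 1400.0  # cm^3, roughly on the order of a cranial capacity

# Sphere of the given volume: V = (4/3) * pi * r^3
r = (3.0 * volume / (4.0 * math.pi)) ** (1.0 / 3.0)
sphere_area = 4.0 * math.pi * r**2

# Prolate spheroid of the same volume, semi-axes (a, b, b) with a = 1.5b:
# V = (4/3) * pi * a * b^2
b = (3.0 * volume / (4.0 * math.pi * 1.5)) ** (1.0 / 3.0)
a = 1.5 * b
e = math.sqrt(1.0 - (b / a) ** 2)  # eccentricity of the generating ellipse
spheroid_area = 2.0 * math.pi * b**2 * (1.0 + (a / b) * math.asin(e) / e)

print(f"sphere:   surface area = {sphere_area:.0f} cm^2")
print(f"spheroid: surface area = {spheroid_area:.0f} cm^2 "
      f"({100.0 * (spheroid_area / sphere_area - 1.0):.1f}% more surface)")
```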

Evidence that the average endocranial volume of various populations is related to cranial shape and that both phenomena are, in some part, adaptations to climatic conditions in different regions has been presented by the physical anthropologist Kenneth Beals and his co-workers. They amassed measurements of endocranial volume in modern humans from some 20,000 individual crania collected from every continent, representing 122 ethnically distinguishable populations. They found that the global mean cranial capacity for populations in hot climates is 1,297 ± 10.5 cm3, while for populations in cold and temperate climates it is 1,386 ± 6.7 cm3, a highly significant (p < 10^-4) difference of 89 cm3. Beals also plotted a correlation scatter diagram of the mean cranial capacity (in cm3) of each of the 122 populations as a function of its distance from the equator (in absolute degrees north or south latitude). The Pearson correlation between absolute distance from the equator and cranial capacity was r = +.62 (p < 10^-5). (The regression equation is: cranial capacity = 2.5 cm3 X degrees of absolute latitude + 1,257.3 cm3; that is, an average increase of 2.5 cm3 in cranial capacity for every 1 degree increase in latitude.) The same analysis applied to populations of the African-Eurasian landmass showed a cranial capacity X latitude correlation of +.76 (p < 10^-4) and a regression slope of 3.1 cm3 per degree of absolute latitude. The indigenous populations of the North and South American continents show a correlation of +.44 and a regression slope of 1.5; the relationship of cranial capacity to latitude is less pronounced in the New World than in the Old World, probably because Homo sapiens inhabited the New World much more recently, having migrated from Asia to North America only about 15,000 years ago, whereas Homo sapiens have inhabited the African and Eurasian continents for a much longer period.
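As a quick illustration of what the quoted regression implies, the sketch below applies the equation to a few absolute latitudes; the place names and latitude values are approximate and merely illustrative.

```python
# Beals et al.'s worldwide regression, as quoted in the text:
# cranial capacity (cm^3) = 2.5 * |degrees latitude| + 1,257.3
def predicted_capacity_cm3(abs_latitude_deg: float) -> float:
    return 2.5 * abs_latitude_deg + 1257.3

for place, lat in [("equator", 0.0), ("Cairo", 30.0),
                   ("London", 51.5), ("Helsinki", 60.2)]:
    print(f"{place:>8}: ~{predicted_capacity_cm3(lat):.0f} cm^3")
```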

RACIAL DIFFERENCES IN HEAD/BRAIN SIZE

Are the climatic factors associated with population differences in cranial capacity, as summarized in the preceding section, reflected in the average cranial or brain-size measurements of the three broadest contemporary population groups, generally termed Caucasoid (Europeans and their descendants), Negroid (Africans and descendants), and Mongoloid (Northeast Asians and descendants)? A recent comprehensive review summarized the worldwide literature on brain volume in cm3 as determined from four kinds of measurements: (a) direct measurement of the brain obtained by autopsy, (b) direct measurement of endocranial volume of the skull, (c) cranial capacity estimated from external head measurements, and (d) cranial capacity estimated from head measurements and corrected for body size. The aggregation of data obtained by different methods, based on large samples, from a number of studies tends to average out sampling error and method effects, and provides the best overall estimates of the racial group means in head/brain size measurements. The results of this aggregation are shown in Table 12.1.

Probably the technically most precise data on brain size for American whites and blacks were obtained from a study of autopsied brains by a team of experts at Case Western Reserve University's medical school in Cleveland, Ohio. The team measured the autopsied brains of 811 whites and 450 blacks matched for mean age (sixty years). Subjects with any brain pathology were excluded from the study. The same methods were used to remove, preserve, and weigh the brains for all subjects. The results for each race X sex group are shown in Table 12.2. As the total sample (N = 1,261) ranged in age from 25 to 80 years, with a mean of 60 years in both racial groups, it was possible to estimate (by regression) the mean brain weight for each race X sex group at age 25 based on all of the data for each group (shown in the last column of Table 12.2). For the mean height-adjusted brain weight, the W-B difference in standard deviation units is 0.76s for males, 0.78s for females. (The actual height-adjusted W-B differences are 102 g for males and 95 g for females.) Neurologically, a difference of 100 g in brain weight corresponds to approximately 550 million cortical neurons. But this average estimate ignores any sex differences in brain size and density of cortical neurons.

Note that for each racial group the sexes differ in brain weight by about 130 g, which is about 30 g more than the average racial difference. This presents a paradox, because while brain size is correlated with IQ, there is little or no sex difference in IQ (even the largest IQ differences that have been claimed by anyone are much smaller than would be predicted by the sex difference in brain size). Attempts to explain this paradox amount to plausible speculations. One thing seems certain: Because of the small correlation (about .20) between brain size and body size, the sex difference in brain volume and weight can be only partially accounted for by the regression of brain size on body size. The resolution of this paradox may come from the evidence that females have a higher density of neurons in the posterior temporal cortex, which is the major association area and is involved in higher thought processes. Females have 11 percent more neurons per unit volume than do males, which, if true for the brain as a whole, would more than offset the 10 percent male-female difference in overall brain volume. This sex difference in neuronal packing density is considered a true sexual dimorphism, as are the sex differences in overall body size, skeletal form, the proportion and distribution of body fat, and other secondary sexual characteristics. Sexual dimorphism is seen throughout the animal kingdom and in many species is far more extreme than in Homo sapiens. I have not found any investigation of racial differences in neuron density that, as in the case of sex differences, would offset the racial difference in brain weight or volume. Until doubts on this point are empirically resolved, however, interpretations of the behavioral significance of the racial difference in brain size remain tentative. One indication that the race difference in brain weight is not of the same nature as the sex difference is that the allometric ratio of brain weight (in g) to body weight (in kg) is less similar between the racial groups than between the sexes within each racial group.

Also, we must take into account the fact that, on average, about 30 percent of total adult female body weight is fat, as compared to 15 percent for males. Because body fat is much less innervated than muscle tissue, brain size is more highly correlated with fat-free body weight than with total body weight. Statistically controlling for fat-free body weight (instead of total body weight) has been found to reduce the sex difference in head circumference by about 77 percent, or about three times as much as controlling for total body weight. Because head circumference is an imperfect proxy for brain size, the percentage reduction of the sex difference in directly measured brain volume (or weight) that would be achieved by controlling for fat-free weight will be uncertain until such studies are performed. Measuring fat-free body weight should become routine in the conduct of brain-size studies based on autopsied brains or on in vivo brain measurements obtained by imaging techniques.

The white-black difference in head/brain size is significant in neonates (about 0.4s difference in head circumference), and within each racial group head size at birth is correlated (about +.13) with IQ at age seven years, at which age the average within-groups correlation between head size and IQ is +.21. A retrospective study of seven-year-old children found that those with IQ < 80 and those with IQ > 120 had differed by 0.5s in head circumference measured at one year of age. Also, small head size measured at eight months has been found to interact most unfavorably with low birth weight; infants with very low birth weight who had subnormal head size at eight months had an average IQ at school age about nine points (0.6s) lower than that of infants of comparable birth weight but with normal head size (corrected for prematurity).

I have not found an estimate of the heritability of directly measured brain size. However, the heritability, h2, of cranial capacity (estimated by formula from head length, width, and circumference), based on Falconer's formula [h2 = 2(rMZ - rDZ)] applied to 107 MZ twin pairs and 129 DZ twin pairs, ranged widely for different race X sex subgroups, with a within-subgroup average of .19. When the estimates of cranial capacity were adjusted for age, stature, and weight, the h2 values averaged .53. The narrow h2 (i.e., the proportion of the total variance attributable only to additive genetic effects) of various head measurements, determined in a Caucasoid sample (Bulgarians) by the midparent X offspring correlation (all offspring over fifteen years of age), was: length .37, height .33, breadth .46, circumference .52. All of these estimates of the heritability of cranial size indicate a considerable amount of nongenetic (or environmental) variance, at least as much as for IQ. Moreover, much more of the nongenetic variance is within-families (i.e., unshared among siblings reared together) than between-families (shared) variance. This implies that shared environmental effects, such as those associated with parents' education, occupation, and general socioeconomic level, are not the major source of variance in cranial capacity as estimated from head measurements. Also, what little evidence we have suggests that the total environmental variance in head measurements is greater for blacks than for whites. (The nature of these environmental influences is discussed later in this chapter.)
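As a concrete illustration of the twin estimate cited above, the following sketch applies Falconer's formula; the MZ and DZ correlations used here are hypothetical, chosen only to show the arithmetic, and are not the study's actual data:

    # Falconer's estimate of broad heritability from twin intraclass correlations:
    # h2 = 2 * (r_MZ - r_DZ). The input correlations below are hypothetical.
    def falconer_h2(r_mz, r_dz):
        """Broad heritability estimated from MZ and DZ intraclass correlations."""
        return 2.0 * (r_mz - r_dz)

    print(falconer_h2(r_mz=0.80, r_dz=0.54))  # 0.52, near the adjusted average of .53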

Implications of Brain Size for IQ Differences. Chapter 6 reviewed the major evidence showing that head measurements and brain size itself are significantly correlated with IQ. The only available correlations for blacks are based on head length, width, and circumference (and cranial capacity estimated by formula from these measurements); as yet there are no reported correlations between IQ and directly measured brain size for blacks. However, the head measurements are significantly correlated with IQ for age-matched whites and blacks, both on raw measurements and on measurements corrected for height and weight, although the correlations are somewhat lower in blacks. Longitudinal data show that the head circumference X IQ correlation significantly increases between ages 4 and 7, and cross-sectional data indicate that the correlation gradually increases up to 15 years of age, by which time the average growth curves for head size and brain size have reached asymptote.

It is especially important to note that for both racial groups the head size X IQ correlation exists within-families as well as between-families, indicating an intrinsic, or functional, relationship, as explained in Chapter 6. Equally important is the fact that within each sex, whites and blacks share precisely one and the same regression line for the regression of head size on IQ. When blacks and whites are perfectly matched for true-score IQ (i.e., IQ corrected for measurement error), either at the black mean or at the white mean, the overall average W-B difference in head circumference is virtually nil, as shown in Table 12.3.

Taken together, these findings suggest that head size is related to IQ in the same way for both blacks and whites. Although matching blacks and whites for IQ virtually eliminates the average difference in head size, matching the groups on head size does not equalize their IQs. This is just what we should expect if brain size is only one of a number of brain factors involved in IQ. When the groups are matched on IQ, they are thereby also equalized on at least one of these brain factors, in this case size. But when black and white groups are matched on head or brain size, they still differ in IQ, though to a lesser degree than in unmatched or representative samples of each population.

The black-white difference in head/brain size is also related to Spearman's hypothesis. A study in which head measurements were correlated (within racial groups) with each of seventeen diverse psychometric tests showed that the column vector of seventeen correlations was rank-order correlated + .64 (p < .01) with the corresponding vector composed of each test's g loading (within groups). In other words, a test's g loading significantly predicts the degree to which that test is correlated with head/brain size. We would also predict from Spearman's hypothesis that the degree to which each test was correlated with the head measurements should correlate with the magnitude of the W-B difference on each test. In fact, the column vector of test X head-size correlations and the vector of standardized mean W-B differences on each of the tests correlate + .51 (p < .05).

From the available empirical evidence, we can roughly estimate the fraction of the mean IQ difference between the black and white populations that could be attributed to the average difference in brain size. As noted in Chapter 6, direct measurements of in vivo brain size obtained by magnetic resonance imaging (MRI) show an average correlation with IQ of about +.40 in several studies based on white samples. Given the reasonable assumption that this correlation is the same for blacks, statistical regression would predict that an IQ difference equivalent to 1s would be reduced by 0.4s, leaving a difference of only 0.6s for black and white groups matched on brain size. This is a sizable effect. As the best estimate of the W-B mean IQ difference in the population is equivalent to 1.1s, or 16 IQ points, about 0.40 X 16 ≈ 6 IQ points of the black-white IQ difference would be accounted for by differences in brain size. (Slightly more than 0.4s would predictably be accounted for if a hypothetically pure measure of g could be used.) Only MRI studies of brain size in representative samples of each population will allow us to improve this estimate.
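The arithmetic of this regression estimate can be made explicit with a small sketch; the correlation and gap are the values assumed in the text, not new estimates:

    # Regression estimate of the portion of the mean IQ gap statistically
    # accounted for by brain size, using the values assumed in the text.
    r_brain_iq = 0.40   # assumed average MRI brain-size x IQ correlation
    gap_points = 16     # the W-B mean difference of 1.1s expressed in IQ points

    print(r_brain_iq * 1.0)          # 0.4s: predicted reduction of a 1s gap
    print(r_brain_iq * gap_points)   # 6.4, which the text rounds to 6 IQ points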

Other evidence of a systematic relationship between racial differences in cranial capacity and IQ comes from an "ecological" correlation, which is commonly used in epidemiological research. It is simply the Pearson r between the means of three or more defined groups, which disregards individual variation within the groups. Referring back to Table 12.1, I have plotted the median IQ of each of the three populations as a function of the overall mean cranial capacity of each population. The median IQ is the median value of all of the mean values of IQ reported in the world literature for Mongoloid, Caucasoid, and Negroid populations. (The source of the cranial capacity means for each group was explained in connection with Table 12.1.) The result of this plot is shown in Figure 12.4. The regression of median IQ on mean cranial capacity is almost perfectly linear, with a Pearson r = +.998. Unless the data points in Figure 12.4 are themselves highly questionable, the near-perfect linearity of the regression indicates that IQ can be regarded as a true interval scale. No mathematical transformation of the IQ scale would have yielded a higher correlation. Thus it appears that the central tendency of IQ for different populations is quite accurately predicted by the central tendency of each population's cranial capacity.

POPULATION DIFFERENCES IN g: THE DEFAULT HYPOTHESIS

Consider the following items of evidence: the many biological correlates of g; the fact that, among all of the psychometric factors in the domain of cognitive abilities, the g factor accounts for the largest part of the mean difference between blacks and whites; the evolutionary history of Homo sapiens and the quantitative differentiation of human populations in allele frequencies for many characteristics, including brain size, largely through adaptive selection for fitness in highly varied climates and habitats; the fact that the brain evolved more rapidly than any other organ; the fact that half of humans' polymorphic genes affect brain development; the fact that the primary evolutionary differentiation and largest genetic distance between human populations is that between the African populations and all others; the intrinsic positive correlation between brain size and measures of g; the positive mean white-black difference in brain size; and the positive correlation between the variable heritability of individual differences in various measures of cognitive abilities and the variable magnitudes of their g loadings. All these phenomena, viewed together, provide the basis for what I shall call the default hypothesis concerning the nature of population or racial differences in g.

Although we are concerned here with variation between populations, it is also important to keep in mind that, from an evolutionary perspective, it is most unlikely that there are intraspecies differences in the basic structural design and operating principles of the brain. The main structural and functional units of the brain found in any one normal human being should be validly generalizable to all other normal humans. That is to say, the processes by which the brain perceives, learns, reasons, remembers, and the like are the same for everyone, as are the essential structures and functions of every organ system in the entire body. Individual differences and population differences in normal brain processes exist at a different level, superimposed, as it were, over and above the brain's common structures and operating principles.

The default hypothesis states that human individual differences and population differences in heritable behavioral capacities, as products of the evolutionary process in the distant past, are essentially composed of the same stuff, so to speak, controlled by differences in allele frequencies, and that differences in allele frequencies between populations exist for all heritable characteristics, physical or behavioral, in which we find individual differences within populations.

With respect to the brain and its heritable behavioral correlates, the default hypothesis holds that individual differences and population differences do not result from differences in the brain's basic structural operating mechanisms per se, but result entirely from other aspects of cerebral physiology that modify the sensitivity, efficiency, and effectiveness of the basic information processes that mediate the individual's responses to certain aspects of the environment. A crude analogy would be differences in the operating efficiency (e.g., miles per gallon, horsepower, maximum speed) of different makes of automobiles, all powered by internal combustion engines (hence the same operating mechanisms) but differing in, say, the number of cylinders, their cubic capacity, and the octane rating of the gasoline they are using. Electric motor cars and steam-engine cars (analogous to different species or genera) would have such distinctively different operating mechanisms that their differences in performance would call for quite different explanations.

In brief, the default hypothesis states that the proximal causes of both individual differences and population differences in heritable psychological traits are essentially the same, and are continuous variables. The population differences reflect differences in allele frequencies of the same genes that cause individual differences. Population differences also reflect environmental effects, as do individual differences, and these may differ in frequency between populations, as do allele frequencies.

In research on population differences in mean levels of g, I think that the default hypothesis should be viewed as the true "null" hypothesis, that is, the initial hypothesis that must be disproved. The conventional null hypothesis of inferential statistics (i.e., no differences between populations) is so improbable in light of evolutionary knowledge as to be scientifically inappropriate for the study of population differences in any traits that show individual differences. The real question is not whether population differences exist for a given polygenic trait, but rather the direction and magnitude of the difference.

The question of direction of a difference brings up another aspect of the default hypothesis, namely, that it is rare in nature for genotypes and phenotypes of adaptive traits to be negatively correlated. It is exceedingly improbable that racial populations, which are known to differ, on average, in a host of genetically conditioned physical characteristics, would not differ in any of the brain characteristics associated with cognitive abilities, when half of all segregating genes in the human genome are involved with the brain. It is equally improbable that heritable variation among individuals in polygenic adaptive traits, such as g, would not show nontrivial differences between populations, which are aggregations of individuals. Again, from a scientific standpoint, the only real questions about population differences concern their direction, their magnitude, and their causal mechanism(s). One may also be interested in the social significance of the phenotypic differences. Research will be most productively focused not on whether or not genes are involved in population differences, but on discovering the relative effects of genetic and environmental causes of differences and the nature of these causes, so they can be better understood and perhaps influenced.

The rest of this chapter deals only with the scientific aspect of the default hypothesis. (For a discussion of its social significance, see Chapter 14.) Since far more empirical research relevant to the examination of the default hypothesis with respect to g has been done on the black-white difference, particularly within the United States, than on any other populations, I will focus exclusively on the causal basis of the mean black-white difference in the level of g.

HERITABILITY OF IQ WITHIN GROUPS AND BETWEEN GROUPS

One of the aims of science is to comprehend as wide a range of phenomena as possible within a single framework, using the fewest possible mechanisms with the fewest assumptions and ad hoc hypotheses. With respect to IQ, the default hypothesis relating individual differences and population differences is consistent with this aim, as it encompasses the explanation of both within-group (WG) and between-group (BG) differences as having the same causal sources of variance. The default hypothesis that the BG and WG differences are homogeneous in their causal factors implies that a phenotypic difference of PD between two population groups in mean level of IQ results from the same causal effects as does any difference of PD between individuals within either of the two populations. In either case, PD is the joint result of both genetic (G) and environmental (E) effects. In terms of the default hypothesis, the effects of genotype X environment covariance are the same between populations as within populations. The same is hypothesized for genotype X environment interaction, although studies have found that it contributes negligibly to within-population variance in g.

It is possible for a particular allele to be present in one population but absent in another, or for alleles at certain loci to be turned on in some environments and turned off in others, or to be regulated differently in different environments. These conditions would constitute exceptions to the default hypothesis. But without empirical evidence of these conditions with respect to population differences in g, which is a highly polygenic trait in which most of the variance within (and probably between) populations is attributable to quantitative differences in allele frequencies at many loci, initial investigation is best directed at testing the default hypothesis.

In terms of the black-white IQ difference, the default hypothesis means that the question of why (on average) two whites differ by amount PD in IQ, or two blacks differ by amount PD, or a black and a white differ by amount PD, can in all cases be answered in the same terms. There is no need to invoke any special "racial" factor, either genetic or cultural.

The countervailing dual hypothesis contends that: (1) within-group individual differences (WG), on the one hand, and between-group mean differences (BG), on the other, have different, independent causes; and (2) there is no relationship between the sources of WG differences and of BG differences. In this view, the high heritability of individual differences in g within groups tells us nothing about the heritability (if any) of g between groups.

The empirical fact that there is a large genetic component in WG individual differences in g is so well established by now (see Chapter 7) that, with rare exceptions, it is no longer challenged by advocates for the dual hypothesis. The defining tenet of the dual hypothesis, at least as it applies to the phenotypic black-white IQ difference, is that there is no genetic component in the mean BG difference; that is, the causes of the observed BG difference in IQ are entirely environmental. These environmental sources may include nutrition and other biological conditions, as well as socioeconomic, attitudinal, or cultural group differences, to name the most frequently hypothesized causal factors. (Psychometric test bias, as such, has been largely ruled out; see Chapter 11, pp. 360-67.)

Within-Group Heritability of IQ in Black and in White Groups. Before contrasting the dual and the default hypotheses in terms of their formal implications and their consistency with empirical findings, we need to understand what is, and is not, known about the heritability of individual differences in IQ within each population.

The many studies of IQ heritability based on white samples are summarized in Chapter 7. They give estimates that range mostly between .40 and .60 for children and adolescents, and between .60 and .80 for adults.

The few studies of IQ heritability in black samples have all been performed in conjunction with age-matched white samples, so that group comparisons would be based on the same tests administered under the same conditions. Only two such studies based on large samples (total Ns of about 300 and 700) of black and white twins of school age have been reported. The data of these studies do not support rejection of the null hypothesis of no black-white difference in the heritability coefficients for IQ. Nor do these studies show any evidence of a statistically significant racial difference between the magnitudes of the correlations for either MZ or DZ twins. But the sample sizes in these studies, though large, are not large enough to yield statistical significance for real, though small, group differences. The small differences between the black and white twin correlations observed in these studies are, however, consistent with the black-white differences in the correlations between full siblings found in a study of all of the school-age sibling pairs in the total black and white populations of the seventeen elementary schools of Berkeley, California. The average sibling correlations for IQ in that study were +.38 for blacks and +.40 for whites. (For height, the respective age-corrected correlations were .45 and .42.) Because the samples totaled more than 1,500 sibling pairs, even differences as small as .02 are statistically significant. If the heritability of IQ, calculated from twin data, were very different in the black and white populations, we would expect the difference to show up in the sibling correlations as well. The fact that sibling correlations based on such large samples differ so little between blacks and whites suggests that the black-white difference in IQ heritability is so small that rejection of the null hypothesis of no W-B difference in IQ heritability would require enormous samples of black and white MZ and DZ twins, far more than any study has yet attempted or is ever likely to attempt. Such a small difference, even if it were statistically reliable, would be of no theoretical or practical importance. On the basis of the existing evidence, therefore, it is reasonable to conclude that the difference between the U.S. black and white populations in the proportion of within-group variance in IQ attributable to genetic factors (that is, the heritability of IQ) is probably too small to be detectable.

The Relationship of Between-Group to Within-Group Heritability.

The mantra invoked to ward off any unpalatable implications of the fact that IQ has substantially equal heritability in both the black and the white populations is that "heritability within groups does not imply (or prove, or generalize to) heritability between groups." Arguing that the fact that there is genetic variance in individual differences within groups gives no warrant to generalize to differences between groups is, of course, formally equivalent to saying exactly the same thing about environmental variance, which is the complement of the within-groups heritability (i.e., 1-h2). But a little analysis is required to understand the peculiar nature of the relationship between within-group heritability (WGH) and between-group heritability (BGH).

To say there is no relationship of any kind between WGH and BGH is wrong. They are mathematically related according to the following equation:

BGH = WGH X [rg(1 - rp)] / [rp(1 - rg)],

where BGH is the between-group heritability, WGH is the within-group heritability, rg is the genetic intraclass correlation within groups (i.e., rg = [genetic variance between groups] / [genetic variance between groups + genetic variance within groups]), and rp is the phenotypic intraclass correlation within groups, defined analogously; rp is equal to the squared point-biserial correlation between individuals' nominal group membership (e.g., black or white, quantized as 0 or 1) and the quantitative variable of interest (e.g., IQ).

This is termed the formal relationship between WGH and BGH. Although there is no argument about the mathematical correctness of this formulation, it is not empirically applicable, because a single equation containing two unknowns (i.e., BGH and rg) cannot be solved. (It is also clear mathematically that the formula assumes that WGH is greater than zero and that rg is less than unity.) The value of rp can easily be obtained empirically. (For example, if two groups each have the same standard deviation on a given variable and the group means differ by one such standard deviation, then rp = .20.) If we knew the value of rg, we could solve the equation for BGH (or vice versa). (If the between-groups difference were entirely nongenetic, as strict environmentalists maintain, then of course rg would be zero.) But we know neither rg nor BGH, so the formula is empirically useless.

However, this formula does indicate that, for a hypothesized value of rg greater than zero, BGH is a linearly increasing function of WGH. As I will point out, the hypothesized relationship between WGH and BGH can suggest some useful conjectures and empirical analyses. The formal relationship between WGH and BGH makes no assumptions about the sources of either the genetic or the environmental variance in BGH and WGH, or about whether BGH and WGH are qualitatively the same or different in this respect. The default hypothesis, however, posits that the genetic and the environmental factors that cause the between-groups difference exist within each group (though not necessarily in equal degrees). The opposing dual hypothesis is that the environmental factors that cause variance between groups differ not just in degree, but in kind, from the environmental factors that cause individual differences within a group. This conjecture raises problems that I will examine shortly.
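As a numerical illustration of this mapping, here is a minimal sketch; the values of WGH and rp follow the worked example above, while the values of rg are purely hypothetical:

    # The formal WGH-BGH relation given in the text. Since r_g is unknown,
    # the loop simply maps hypothetical r_g values onto the implied BGH.
    def implied_bgh(wgh, r_g, r_p):
        """BGH implied by within-group heritability wgh, a hypothesized genetic
        intraclass correlation r_g, and phenotypic intraclass correlation r_p."""
        return wgh * (r_g * (1 - r_p)) / (r_p * (1 - r_g))

    # Two equal-SD groups whose means differ by 1 SD give r_p = .20 (see text).
    for r_g in (0.05, 0.10, 0.20):
        print(r_g, round(implied_bgh(wgh=0.70, r_g=r_g, r_p=0.20), 2))
    # prints: 0.05 0.15 / 0.1 0.31 / 0.2 0.7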

The between-groups (BG) versus within-groups (WG) problem can be visualized as shown in Figure 12.5. Assume a population is composed of two equal-sized subpopulations, A and B, and assume that on some characteristic (e.g., IQ) the phenotypic means of these two subpopulations differ, that is, A - B = PD. (Sampling error and measurement error are assumed to be zero in this didactic diagram.) The measurement of the phenotypic characteristic (P) is standardized in the total population, so its population standard deviation is 1s and the total variance is the square of the standard deviation, 1s2. Any variance can be visualized as the area of a square. The square in Figure 12.5 represents the total phenotypic variance (1s2) of the whole population, and its square root is the standard deviation (1s) of the phenotypic measurements. The total variance (the area of the square) is partitioned horizontally into the variance between groups (BG) and the variance within groups (WG). The total variance is partitioned vertically into the genetic (G) variance, i.e., heritability (h2), and the environmental (E) variance, i.e., environmentality (e2). At present, the only variables we are able to determine empirically are the total phenotypic variance and the within-group genetic and environmental variances, h2WG and e2WG. The between-group variables, h2BG and e2BG, are undetermined (and so are shown in parentheses). As the genetic and environmental proportions of the BG variance have not been empirically determined, they are shown separated by a dotted line in Figure 12.5. This dotted line could move either to the left or to the right as new empirical evidence dictates. Its approximate position is the bone of contention between the advocates of the default hypothesis and those of the conventional null hypothesis.

Extreme "environmentalists" argue that both h2WG=0 and h2BG=0, leaving environmental agents as the source of all observed phenotypic variance. (Hardly anyone now holds this position with respect to IQ.) A much more common position nowadays is to accept the empirically established WG values, but maintain that the BG variance is all environmental. "Agnostics" would say (correctly) that h2BG is not empirically known, and some might add that, though unknown, it is plausibly greater than zero.

The strong form of the default hypothesis is represented in Figure 12.5 by the dotted-line extension of the solid vertical line, thus partitioning both the WG and BG variances into the same proportions of genetic and environmental variance. A "relaxed" form of the default hypothesis still posits h2BG > 0, but allows h2BG to differ from h2WG. In general, this is closer to reality than is the strong form of the default hypothesis. In both forms of the default hypothesis, WG variance and BG variance are attributable to the same causal factors, although they may differ in degree. The purpose of hypothesizing some fairly precise value of h2BG is not that one necessarily thinks it is true, or wants to "sell" it to someone, but rather that scientific knowledge advances by the process Karl Popper described as "conjectures and refutations": a strong hypothesis (or conjecture) permits certain testable deductions or inferences, and can be decisively refuted only if it is formulated precisely, and preferably quantitatively. Any hypothesis is merely temporary scaffolding that assists in discovering new facts about nature. It helps us to formulate questions precisely and focuses investigative efforts on research that will yield diacritical results. Beyond this purpose, a hypothesis has no other use. It is not a subject for advocacy.

A clear quantitative statement of the default hypothesis depends on understanding some important technical points about variance and its relation to linear measurement. The large square in Figure 12.6 represents the total variance (1s2) of a standardized phenotypic variable (P) with a standard deviation sP = 1. The area of the large square (the total phenotypic variance) is partitioned into its genetic and environmental components, corresponding to a heritability of .75 (which makes it easy to visualize). The genetic variance in Figure 12.6 (the unshaded area) is thus .75, leaving the environmental component (the shaded area) equal to .25. Since the variance of each effect is shown in the diagram as an area, the square root of that area represents the standard deviation of the effect. Linear distances, or differences between points on a scaled variable, are expressed in standard deviation units, not in variance units. Thus the sides of the shaded square in the lower right of Figure 12.6 are each equal to .25^0.5 = .5 (in standard deviation units). The linear distance represented by the environmental variance is therefore 0.5, and the linear distance represented by the genetic variance is .75^0.5 = 0.866. Notice that these two linear measurements do not add up to the length of the side of the total square, which is 1; that is, standard deviation units are not additive. To obtain the standard deviation of a sum of independent components, one must take the square root of the sum of their squared standard deviations (i.e., of their variances).
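The non-additivity of standard deviations is easy to verify numerically; a trivial check in Python, using the .75/.25 split from Figure 12.6:

    import math

    var_g, var_e = 0.75, 0.25                          # variance components from Figure 12.6
    sd_g, sd_e = math.sqrt(var_g), math.sqrt(var_e)    # 0.866 and 0.5

    print(var_g + var_e)                    # 1.0: variances are additive
    print(sd_g + sd_e)                      # 1.366: standard deviations are not
    print(math.sqrt(sd_g**2 + sd_e**2))     # 1.0: root-sum-of-squares recovers sP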

[From this point on -- I have eliminated many hard to edit formulas. See the original text for the complete derivations of expressions]

We can now ask, "How many units of environmental variance are needed to add up to the total phenotypic variance? The answer is 4. This ratio is in variance units. To express it in linear terms, it has to be converted into standard deviation units, that is, 2.

Suppose we obtain IQ scores for all members of two equal-size groups called A and B. Further assume that within each group the IQs have a normal distribution, and the mean of group A is greater than the mean of group B. To keep the math simple, let the IQ scores have perfect reliability, let the standard deviation of the scores be the same in both groups, and let the mean phenotypic difference be equal to the average within-group phenotypic standard deviation.

Now consider the hypothesis that the between-group heritability (BGH) is zero and that therefore the cause of the A-B difference is purely environmental. Assume that the within-group heritability (WGH) is the same in each group, say, WGHA = WGHB = .75. Now, if we remove the variance attributable to genetic factors (WGH) from the total variance of each group's scores, the remainder gives us the proportion of within-group variance attributable to purely environmental factors. If both the genetic and environmental effects on test scores are normally distributed within each group, the resulting curves after the genetic variance has been removed from each represent the distribution of environmental effects on test scores. Note that this does not refer to variation in the environment per se, but rather to the effects of environmental variation on the phenotypes (i.e., IQ scores, in this case.) The standard deviation of this distribution of environmental effects provides a unit of measurement for environmental effects.

The distribution of just the total environmental effects (assuming WGH = .75) is shown in the two curves in the bottom half of Figure 12.7. The phenotypic difference between the group means is kept constant, but on the scale of environmental effects (measured in environmental standard deviation units), the mean environmental effects for groups A and B differ by 2sE, as shown in the lower half of Figure 12.7. What this means is that for two groups to differ phenotypically by 1sP when WGH = .75 and BGH = 0, the two groups would have to differ by 2sE on the scale of environmental effects. This is analogous to two groups in which each member of one group has a monozygotic twin in the other group, thus making the distribution of genotypes exactly the same in both groups. For the test score distributions of these two genotypically matched groups to differ by 1sP, the groups would have to differ by 2sE on the scale of environmental effects (assuming WGH = .75).
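The 2sE value follows directly from re-expressing the phenotypic gap on the environmental-effect scale. A compact restatement of the step (writing s as \sigma, under the WGH = .75, BGH = 0 assumptions above):

    \sigma_E = \sqrt{1 - \mathrm{WGH}}\;\sigma_P, \qquad
    \Delta_E = \frac{1\,\sigma_P}{\sigma_E} = \frac{1}{\sqrt{1 - .75}} = \frac{1}{.5} = 2\,\sigma_E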

The simplest model of the hypothetical decomposition of a mean phenotypic difference between two groups holds that the phenotypic difference is completely determined by the groups' genetic difference and their environmental difference. These variables are related quantitatively by the simple path model shown in Figure 12.8. The arrows represent the direction of causation; each arrow is labeled with its regression coefficient (also called a path coefficient), h or e, which in this standardized model is mathematically equivalent to the corresponding correlation coefficient and to the standard deviation of the genetic or environmental effect. In reality, of course, there could also be a causal path between G and E (i.e., genotype-environment covariance), but this would not alter the essential point of the present argument. We see that the phenotypic difference can be represented as a weighted sum of the genetic and the environmental effects on PD, the weights being h and e. Since these weights are equivalent to standard deviations, they are not themselves additive: h + e does not equal 1; rather, h2 + e2 = 1.

A phenotypic difference between the means of two groups can be expressed in units of the standard deviation of the average within-groups environmental effect, as a function of the between-groups heritability (BGH) and the within-groups heritability (WGH). Thus the phenotypic difference between the means of the two curves in the lower half of Figure 12.7 is 2sE; that is, the means of the two environmental-effect curves differ by two standard deviations. The body of empirical evidence shows that an environmental effect on IQ this large would occur only rarely even in pairs of monozygotic twins reared apart (whose IQs are correlated .75), apart from random errors of measurement. The difference in IQ attributable solely to nongenetic differences between random pairs of individuals in a population in which h2 is .75 is about the same as for MZ twins reared apart. On an IQ scale with s = 15, a difference of 2sE is approximately equal to 30 IQ points (i.e., 2 X 15). But the largest IQ difference between MZ twins reared apart reported in the literature is 1.5s, or 23 IQ points. Further, the average absolute difference in IQ (assuming a perfectly normal distribution of IQ) between all random pairs of persons in the population (who differ both in G and in E) would be 1.1284s, or approximately 17 IQ points.
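The figure of 1.1284s is the expected absolute difference between two independent draws from the same normal distribution; in outline (writing s as \sigma):

    D = X_1 - X_2 \sim N(0,\ 2\sigma^2), \qquad
    E|D| = \sqrt{2\sigma^2}\,\sqrt{2/\pi} = \frac{2\sigma}{\sqrt{\pi}} \approx 1.1284\,\sigma

so on an IQ scale with \sigma = 15 this is about 17 points, as stated.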

Now consider again the two groups in the upper half of Figure 12.7, called A and B. They differ in their mean test scores by a phenotypic difference of A - B = 1sP. If we hypothesize that the difference between the phenotypic means is entirely nongenetic (i.e., environmental), then, with WGH = .75, the phenotypic difference of 1sP must correspond to a difference of 2sE on the scale of environmental effects.

By the same reasoning, we can determine the size of the environmental effect required to produce a phenotypic difference of 1sP given any values of the within-groups heritability (WGH) and the between-groups heritability (BGH). For a phenotypic difference of 1sP, the required difference on the scale of environmental effects, in sE units, is (1 - BGH)/(1 - WGH)^0.5. The strong default hypothesis is defined in terms of BGH = WGH; the relaxed default hypothesis allows independent values of BGH and WGH.

For example, in the first column inside Table 12.4(A), BGH = .00. This represents the hypothesis that the cause of the mean group difference in test scores is purely environmental. When WGH is also equal to .00, an environmental difference of 1sE between the groups accounts for all of the phenotypic difference of 1sP, and thus accords perfectly with the environmental hypothesis that 1sP = 1sE. Table 12.4(A) shows that when WGH = BGH = .00, the value of sE = 1.00.

Maintaining the same purely environmental hypothesis that BGH = 0, but with WGH = .10, two groups that differ phenotypically by 1sP must differ by 1.05sE in environmental effect, which deviates .05 from the hypothesized value of 1sE. The critical point of this analysis is that if BGH = 0, values of WGH greater than 0 require that the environmental difference be greater than 1sE. We can see in Table 12.4(A) that as WGH increases, the required environmental difference increasingly deviates from the hypothesized value of 1sE, thereby becoming increasingly problematic for empirical explanation. Since the empirical value of WGH for the IQ of adults lies within the range of .60 to .80, with a mean close to .70, it is particularly instructive to examine the required values of sE for this range of WGH. When WGH = .70 and BGH = 0, for example, the 1sP difference between the groups is entirely due to environmental causes and amounts to 1.83sE. Table 12.4(A) also indicates that as we hypothesize levels of BGH that approach the empirically established levels of WGH, the size of the environmental effect required to account for the phenotypic difference of 1sP in group means becomes correspondingly smaller.
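The pattern of Table 12.4 can be reproduced from the two relations used above. A sketch follows, with the caveat that the formulas are reconstructed here from the worked values in the text (1.05sE, 1.83sE, 2sE) rather than quoted from it:

    import math

    def delta_e(wgh, bgh):
        """Implied group difference on the environmental-effect scale (sE units)."""
        return (1 - bgh) / math.sqrt(1 - wgh)

    def delta_g(wgh, bgh):
        """Implied group difference on the genetic-effect scale (sG units)."""
        return bgh / math.sqrt(wgh) if wgh > 0 else 0.0

    for wgh, bgh in [(0.10, 0.00), (0.70, 0.00), (0.75, 0.00), (0.70, 0.70)]:
        de, dg = delta_e(wgh, bgh), delta_g(wgh, bgh)
        # weighting by h = sqrt(WGH) and e = sqrt(1 - WGH) recovers the 1sP gap
        total = math.sqrt(wgh) * dg + math.sqrt(1 - wgh) * de
        print(f"WGH={wgh:.2f} BGH={bgh:.2f}: {de:.2f}sE, {dg:.2f}sG, sum={total:.2f}sP")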

Factor X. Recall that the strong form of the default hypothesis states that the average difference in test scores observed between groups A and B results from the same kinds of genetic (G) and environmental (E) influences acting to the same degree to produce individual differences within each group. The groups may differ, however, in the mean values of either G, or E, or both. Stated in terms of the demonstration in Table 12.4(A), this means that if WGH is the same for both groups, A and B, then, given any empirically obtained value of WGH, the limits of BGH are constrained, as shown. The hypothesis that BGH = 0 therefore appears improbable, given the typical range of empirical values of WGH.

To accept the preponderance of evidence that WGH > 0 and still insist that BGH = 0 regardless of the magnitude of the WGH, we must attribute the cause of the group difference to either of two sources: (1) the same kinds of environmental factors that influence the level of g but that do so at much greater magnitude between groups than within either group, or (2) as yet empirically unidentified environmental factors that create variance between groups but do not do so within groups. The "relaxed" default hypothesis allows both of these possibilities. The dual hypothesis, on the other hand, requires either much larger environmental effects between groups than are empirically found, on average, within either group, or the existence of some additional, empirically unidentified source of nongenetic variance that causes the difference between groups but does not contribute to individual differences within either group. If the two groups are hypothesized not to differ in WGH or in total phenotypic variance, this hypothesized additional source of nongenetic variance between groups must either have equal but opposite effects within each group, or it must exist only within one group but without producing any additional variance within that group. In 1973, I dubbed this hypothesized additional nongenetic effect Factor X. When groups of blacks and whites who are matched on virtually all of the environmental variables known to be correlated with IQ within either racial population still show a substantial mean difference in IQ, Factor X is the favored explanation in lieu of the hypothesis that genetic factors, though constituting the largest source of variance within groups, are at all involved in the IQ difference between groups. Thus Factor X is an ad hoc hypothesis that violates Occam's razor, the well-known maxim in science which states that if a phenomenon can be explained without assuming some hypothetical entity, there is no ground for assuming it.

The default hypothesis also constrains the magnitude of the genetic difference between groups, as shown in Table 12.4(B). (The explanations given for interpreting Table 12.4(A) apply here as well.) For two groups, A and B, whose phenotypic means differ by A - B = 1sP, the groups differ on the scale of genetic effect by BGH/(WGH)^0.5, in sG units; under the strong default hypothesis (i.e., BGH = WGH), this genetic difference equals (WGH)^0.5 sG.

The values of sG in Table 12.4(B) show that the strong default hypothesis is not the same as a purely genetic hypothesis of the group difference. For example, for WGH = .70 and BGH = .70, the groups differ by .84sG (Table 12.4B) and also by .55sE (Table 12.4A). For the relaxed default hypothesis, the genetic and environmental differences associated with each and every intersection of WGH and BGH in Tables 12.4(A) and 12.4(B), when weighted by h = (WGH)^0.5 and e = (1 - WGH)^0.5 respectively, add up to 1sP.

The foregoing analysis is relevant to the often repeated "thought experiment" proposed by those who argue for the plausibility of the dual hypothesis, as in the following example from an article by Carol Tavris: "Suppose that you have a bag of tomato seeds that vary genetically; all things being equal, some seeds will produce tomatoes that are puny and tasteless, and some will produce tomatoes that are plump and delicious. You take a random bunch of seeds in your left hand and random bunch in your right. Though one seed differs genetically from another, there is no average difference between the seeds in your left hand and those in your right. Now you plant the left hand's seeds in Pot A. You have doctored the soil in Pot A with nitrogen and other nutrients. You feed the pot every day, sing arias to it from La Traviata, and make sure it gets lots of sun. You protect it from pests, and you put in a trellis, so even the weakest little tomatoes have some support. Then you plant the seeds in your right hand in Pot B, which contains sandy soil lacking nutrients. You don't feed these tomatoes, or water them; you don't give them enough sun; you let pests munch on them. When the tomatoes mature, they will vary in size within each pot, purely because of genetic differences. But there will also be an average difference between the tomatoes of enriched Pot A and those of depleted Pot B. This difference between pots is due entirely to their different soils and tomato-rearing experiences."

Statistically stated, the argument is that WGH = 1 and BGH = 0. What is the expected magnitude of the environmental effect implied by these conditions? In terms of within-group standard deviation units, it is sE = (1 - 0)/(1 - 1)^0.5 = 1/0. But of course a fraction with zero in the denominator is undefined, so no inference about the magnitude is possible at all under these conditions. However, if we make the WGH slightly less than perfect, say, .99, the expected difference in environmental effect becomes 10sE. This is an incredibly large, but in this case probably not unrealistic, effect, given Tavris's descriptions of the contrasting environments of Pot A and Pot B.

The story of tomatoes-in-two-pots doesn't contradict the default hypothesis. Rather, it makes the very point of the default hypothesis by stating that Pots A and B each contain random samples of the same batch of seeds, so an equally massive result would have been observed if the left-hand and right-hand seeds had been planted in opposite pots. Factor X is not needed to explain the enriched and deprived tomatoes; the immense difference in the environmental conditions is quite sufficient to produce a difference in tomato size ten times greater than the average differences produced by environmental variation within each pot.

Extending the tomato analogy to humans, Tavris goes on to argue, "Blacks and whites do not grow up, on the average, in the same kind of pot". The question, then, is whether the average environmental difference between blacks and whites is sufficient to cause a lsP difference in IQ if BGH = 0 and WGH is far from zero. The default hypothesis, positing values of BGH near those of the empirical values of WGH, is more plausible than the hypothesis that BGH = 0. (A third hypothesis, which can be ruled out of serious consideration on evolutionary grounds, given the observed genetic similarity between all human groups, is that the basic organization of the brain and the processes involved in mental development are qualitatively so different for blacks and whites that any phenotypic difference between the groups cannot, even in principle, be analyzed in terms of quantitative variation on the same scale of the genetic or of the environmental factors that influence individual development of mental ability within one racial group.)

The Default Hypothesis in Terms of Multiple Regression. The behavioral geneticist Eric Turkheimer has proposed an approach for relating the quantitative genetic analysis of individual and of group differences. Phenotypic variance can be conceptually partitioned into its genetic and its environmental components in terms of a multiple regression equation. Turkheimer's method allows us to visualize the relationship of within-group and between-group genetic effects and environmental effects in terms of a regression plane located in a three-dimensional space in which the orthogonal dimensions are phenotype (P), genotype (G), and environment (E). Both individual and group mean phenotypic values (e.g., IQ) can then be represented on the surface of this plane. This amounts to a graphic statement of the strong default hypothesis, where the phenotypic difference between two individuals (or two group means), A and B, can be represented by the multiple regression of the phenotypic difference on the genetic and environmental differences (GD and ED).

According to the default hypothesis, mental development is affected by the genetic mechanisms of inheritance and by environmental factors in the same way for all biologically normal individuals in either group. (Rejection of this hypothesis would mean that evolution has caused some fundamental intraspecies differences in brain organization and mental development, a possibility which, though seemingly unlikely, has not yet been ruled out.) Thus the default hypothesis implies that a unit increase in genetic value G for individuals in group A is equal to the same unit increase in G for individuals in group B, and likewise for the environmental value E. Within these constraints posited by the default hypothesis, however, the groups may differ, on average, in the mean values of G, or E, or both. Accordingly, individuals of either group will fall at various points (depending on their own genotype and environment) on the same regression lines (i.e., for the regression of P on G and of P on E). This can be visualized graphically as a regression plane inside a square box (Figure 12.9). The G and E values for individuals (or for group means) A and B are projected onto the tilted plane; the projections are shown as a dot and a square. Their positions on the plane are then projected onto the phenotype dimension of the box.

The important point here is that the default hypothesis states that, for any value of WGH, the predicted scores of all individuals (and consequently the predicted group means) will lie on one and the same regression plane. Assuming the default hypothesis, this clearly shows the relationship between the heritability of individual differences within groups (WGH) and the heritability of group differences (BGH). This formulation makes the default hypothesis quantitatively explicit and therefore highly liable to empirical refutation. If there were some environmental factor(s) that is unique to one group and that contributes appreciably to the mean difference between the two groups, their means would not lie on the same plane. This would result, for example, if there were a between-groups G X E interaction. The existence of such an interaction would be inconsistent with the default hypothesis, because it would mean that the groups differ phenotypically due to some nonadditive effects of genes and environment so that, say, two individuals, one from each group, even if they had identical levels of IQ, would have had to attain that level by different developmental processes and environmental influences. The fact that significant G X E interactions with respect to IQ (or g) have not been found within racial groups renders such an interaction between groups an unlikely hypothesis.
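A minimal numerical sketch of this regression-plane formulation, assuming WGH = .70 and purely illustrative group means on the G and E scales (the specific values are not estimates from any data):

    import math

    WGH = 0.70
    h, e = math.sqrt(WGH), math.sqrt(1 - WGH)   # path coefficients, as in Figure 12.8

    def phenotype(g_val, e_val):
        """Standardized phenotype predicted from standardized G and E values."""
        return h * g_val + e * e_val            # every individual lies on this plane

    # Hypothetical group means on the genetic and environmental scales:
    mean_a = phenotype(0.42, 0.27)
    mean_b = phenotype(-0.42, -0.27)
    print(round(mean_a - mean_b, 2))            # ~1.0: a 1sP gap, on the same plane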

It should be noted that the total nongenetic variance has been represented here as e2. As explained in Chapter 7, the true-score nongenetic variance can be partitioned into two components: between-families environment (BFE, also termed shared environment because it is common to siblings or to any children reared together) and within-family environment (WFE, or unshared environment, the part of the total environmental effect that differs between persons reared together).

The WFE results largely from an accumulation of more or less random microenvironmental factors. We know from studies of adult MZ twins reared apart, and from studies of genetically unrelated adults who were reared together from infancy in adoptive homes, that the BFE has little effect on phenotypic mental ability, such as IQ scores, even over a quite wide range of environments (see Chapter 7 for details). The BF environment certainly has large effects on mental development at the lowest extreme of the physical and social environment: conditions such as chronic malnutrition, diseases that affect brain development, and prolonged social isolation, particularly in infancy and early childhood. These conditions occur only rarely in First World populations. But some would argue that American inner cities are Third World environments, and they certainly resemble them in some ways. On a scale of environmental quality with respect to mental development, these adverse environmental conditions probably fall more than 2s below the average environment experienced by the majority of whites and very many blacks in America. The hypothetical function relating phenotypic mental ability (e.g., IQ) to the total range of BFE effects (termed the reaction range or reaction norm for the total environmental effect) is shown in Figure 12.10.

Pseudo-race Groups and the Default Hypothesis. In my studies of test bias, I used what I termed pseudo-race groups to test the hypothesis that many features of test performance are simply a result of group differences in the mean and distribution of IQ per se rather than a result of any cultural differences between groups. Pseudo-race groups are made up entirely of white subjects. The standard group is composed of individuals selected on the basis of estimated true-scores so as to be normally distributed, with the mean and standard deviation of the IQ distribution of whites in the general population. The pseudo-race group is composed of white individuals from the same population as the standard group, but selected on the basis of their estimated true-scores so as to be normally distributed with the mean and standard deviation of the IQ distribution of blacks in the general population. The two groups, with age controlled, are intentionally matched with the white and black populations they are intended to represent only on the single variable of interest, in this case IQ (or preferably g factor scores). Therefore, the groups should not differ systematically on any other characteristics, except for whatever characteristics may be correlated with IQ. Estimated true-scores must be used to minimize the regression effect (i.e., regression toward the white mean of 100) that would otherwise result from selecting white subjects on IQ so as to form a group with a lower mean IQ than that of the population from which they were selected.
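The selection procedure turns on the estimated true scores just mentioned. A minimal sketch of the classical estimate, with an illustrative (assumed) reliability:

    # Classical estimated true score: T = M + r_xx * (X - M), which regresses
    # each obtained score toward the population mean (here the white mean of 100)
    # in proportion to the test's reliability. The reliability is an assumption.
    def estimated_true_score(x, mean=100.0, reliability=0.95):
        return mean + reliability * (x - mean)

    for score in (70, 85, 100):
        print(score, estimated_true_score(score))
    # Selecting subjects on these estimates, rather than on obtained scores,
    # minimizes regression toward the white mean in the selected group.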

The creation of two groups that, in this manner, are made to differ on a single trait can be viewed as another model of the strong default hypothesis. This method is especially useful in empirically examining various nonpsychometric correlates of the standard group versus pseudo-race group difference. These differences can then be compared against any such differences found between representative samples of the actual white and black populations. The critical question is: in the circumstances of daily life, how closely does the behavior of the pseudo-race group resemble that of a comparable sample of actual blacks? The extent of the pseudo-race versus actual race difference in nonpsychometric or "real-life" behavior would delimit the g factor's power to account for the observed racial differences in many educationally, occupationally, and socially significant variables.

Notice that the standard and pseudo-race groups would perfectly simulate the conditions of the strong default hypothesis. Both genetic and environmental sources of variance exist in nearly equal degrees within each group, and the mean difference between the groups necessarily comprises comparable genetic and environmental sources of variance. If this particular set of genetic and environmental sources of IQ variance within and between the standard and pseudo-race groups simulates actual white-black differences in many forms of behavior that have some cognitive aspect but are typically attributed solely to cultural differences, it constitutes strong support for the default hypothesis. Experiments of this type could tell us a lot and should be performed.

EMPIRICAL EVIDENCE ON THE DEFAULT HYPOTHESIS

Thus far the quantitative implications of the default hypothesis have been considered only in theoretical or formal terms, which by themselves prove nothing, but are intended only to lend some precision to the statement of the hypothesis and its predicted empirical implications. It should be clear that the hypothesis cannot feasibly be tested directly in terms of applying first-order statistical analyses (e.g., the t test or analysis of variance applied to phenotypic measures) to determine the BGH of a trait, as is possible in the field of experimental genetics with plants or animals. In the latter field, true breeding experiments with cross-fostering in controlled environments across different subspecies and subsequent measurement of the phenotypic characteristics of the progeny of the cross-bred strains for comparison with the same phenotypes in the parent strains are possible and, in fact, common. In theory, such experiments could be performed with different human subspecies, or racial groups, and the results (after replications of the experiment to statistically reduce uncertainty) would constitute a nearly definitive test of the default hypothesis. An even more rigorous test of the hypothesis than is provided by a controlled breeding and cross-fostering experiment would involve in vitro fertilization to control for possible differences in the prenatal environment of the cross-fostered progeny. Such methods have been used in livestock breeding for years without any question as to the validity of the results. But, of course, for ethical reasons the methods of experimental genetics cannot be used for research in human genetics. Therefore, indirect methods, which are analytically and statistically more complex, have been developed by researchers in human genetics.

The seemingly intractable problem with regard to phenotypic group differences has been the empirical estimation of the BGH. To estimate the genetic variance within groups one needs to know the genetic kinship correlations based on the theoretically derived proportions of alleles common to relatives of different degrees (e.g., MZ twins = 1.00; DZ twins, full siblings, and parent-child = 0.50 [or more with assortative mating]; half-siblings = 0.25; first cousins = .125; etc.). These unobserved but theoretically known genetic kinship correlations are needed as parameters in the structural equations used to estimate the proportion of genetic variance (heritability) from the phenotypic correlations between relatives of different degrees of kinship. But we generally do not have phenotypic correlations between relatives that bridge different racial groups. Since few members of one racial group have a near relative (by common descent) in a different racial group, we don't have the parameters needed to estimate between-group heritability. Although interracial matings can produce half-siblings and cousins who are members of different racial groups, the offspring of interracial matings are far from ideal for estimating BGH because, at least for blacks and whites, the parents of the interracial offspring are known to be unrepresentative of these populations. Thus such a study would have doubtful generality.
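
For orientation, the simplest special case of this estimation logic is Falconer's classic formula, which contrasts the phenotypic correlations of kinships whose genetic correlations differ by a known amount; the structural-equation methods discussed here generalize this idea (the twin correlations in the sketch are illustrative):

    # Falconer's formula: h^2 = 2 * (rMZ - rDZ), since the MZ and DZ genetic
    # correlations (1.00 and 0.50) differ by exactly one-half.
    genetic_r = {"MZ twins": 1.00, "DZ twins / full siblings": 0.50,
                 "half-siblings": 0.25, "first cousins": 0.125}

    def falconer_h2(r_mz, r_dz):
        """Heritability estimated from observed MZ and DZ phenotypic correlations."""
        return 2.0 * (r_mz - r_dz)

    print(falconer_h2(0.86, 0.60))   # illustrative correlations -> h^2 = 0.52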

An example of cross-racial kinships that could be used would be a female of group A who had two offspring by a male of group A and later had two offspring by a male of group B, resulting finally in two pairs of full-siblings (one pair whose members are both AA, one pair whose members are both AB) and four half-sibling pairings (each comprising one AA and one AB individual). A biometric genetic analysis of phenotypic measurements obtained on large samples of such full-siblings and half-siblings would theoretically afford a way of estimating both WGH and BGH. Again, however, unless such groups arose from a controlled breeding experiment, the resulting estimate of BGH would probably not be generalizable to the population groups of interest but would apply only to the specific groups used for this determination of BGH (and other groups obtained in the same way). There are two reasons: First, the degree of assortative mating for IQ is most likely the same, on average, for interracial and intraracial matings; that is, the A and B mates of the hypothetical female in our example would probably be phenotypically close in IQ, so at least one of them would be phenotypically (hence also probably genetically) unrepresentative of his own racial population. Therefore, the mixed offspring AB are not likely to differ genetically much, if at all, on average, from the unmixed offspring AA. Second, aside from assortative mating, it is unlikely that interracial half-siblings are derived from parents who are random or representative samples of their respective racial populations. It is known, for example, that present-day blacks and whites in interracial marriages in the United States are not typical of their respective populations in IQ related variables, such as levels of education and occupation.

How then can the default hypothesis be tested empirically? It is tested exactly as is any other scientific hypothesis; no hypothesis is regarded as scientific unless predictions derived from it are capable of risking refutation by an empirical test. Certain predictions can be made from the default hypothesis that are capable of empirical test. If the observed result differs significantly from the prediction, the hypothesis is considered disproved, unless it can be shown that the tested prediction was an incorrect deduction from the hypothesis, or that there are artifacts in the data or methodological flaws in their analysis that could account for the observed result. If the observed result does in fact accord with the prediction, the hypothesis survives, although it cannot be said to be proven. This is because it is logically impossible to prove the null hypothesis, which states that there is no difference between the predicted and the observed result. If there is an alternative hypothesis, it can also be tested against the same observed result.

For example, if we hypothesize that no tiger is living in the Sherwood Forest and a hundred people searching the forest fail to find a tiger, we have not proved the null hypothesis, because the searchers might have failed to look in the right places. If someone actually found a tiger in the forest, however, the hypothesis is absolutely disproved. The alternative hypothesis is that a tiger does live in the forest; finding a tiger clearly proves the hypothesis. The failure of searchers to find the tiger decreases the probability of its existence, and the more searching, the lower is the probability, but it can never prove the tiger's nonexistence.

Similarly, the default hypothesis predicts certain outcomes under specified conditions. If the observed outcome does not differ significantly from the predicted outcomes, the default hypothesis is upheld but not proved. If the prediction differs significantly from the observed result, the hypothesis must be rejected. Typically, it is modified to accord better with the existing evidence, and then its modified predictions are empirically tested with new data. If it survives numerous tests, it conventionally becomes a "fact." In this sense, for example, it is a "fact" that the earth revolves around the sun, and it is a "fact" that all present-day organisms have evolved from primitive forms.

Structural Equation Modeling. Probably the most rigorous methodology presently available to test the default hypothesis is the application of structural equation modeling to what is termed the biometric decomposition of a phenotypic mean difference into its genetic and environmental components. This methodology is an extraordinarily complex set of mathematical and statistical procedures, an adequate explanation of which is beyond the scope of this book, but for which detailed explanations are available. It is essentially a multiple regression technique that can be used to statistically test the differences in "goodness-of-fit" between alternative models, such as whether (1) a phenotypic mean difference between groups consists of a linear combination of the same genetic (G) and environmental (E) factors that contribute to individual differences within the groups, or (2) the group difference is attributable to some additional factor (an unknown Factor X) that contributes to variance between groups but not to variance within groups.

Biometric decomposition by this method requires quite modern and specialized computer programs (LISREL VII) and exacting conditions on the data to which it is applied -- above all, large and representative samples of the groups whose phenotypic means are to be decomposed into their genetic and environmental components. All subjects in each group must be measured with at least three different tests that are highly loaded on a common factor, such as g, and this factor must have high congruence between the two groups. Also, of course, each group must comprise at least two different degrees of kinship (e.g., MZ and DZ twins, or full-siblings and half-siblings) to permit reliable estimates of WGH for each of the tests. Further, in order to meet the assumption that WGH is the same in both groups, the estimates of WGH obtained for each of the tests should not differ significantly between the groups.
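
The "high congruence" condition is conventionally checked with Tucker's congruence coefficient between the two groups' factor loadings; a minimal sketch (the loadings below are invented for illustration):

    # Tucker's congruence coefficient between two groups' g loadings.
    import numpy as np

    def congruence(a, b):
        a, b = np.asarray(a), np.asarray(b)
        return float(a @ b / np.sqrt((a @ a) * (b @ b)))

    g_group1 = [0.75, 0.68, 0.62, 0.71, 0.55]
    g_group2 = [0.73, 0.70, 0.60, 0.69, 0.58]
    # Values above roughly .95 are conventionally taken to indicate that
    # the same factor is present in both groups.
    print(congruence(g_group1, g_group2))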

Given these stringent conditions, one can test whether the mean group difference in the general factor common to the various tests is consistent with the default model, which posits that the between-groups mean difference comprises the same genetic and environmental factors as do individual differences within each group. The goodness-of-fit of the data to the default model (i.e., group phenotypic difference = G + E) is then compared against the three alternative models, which posit only genetic (G) factors, or only environment (E), or neither G nor E, respectively, as the cause of the group difference. The method has been applied to estimate the genetic and environmental contributions to the observed sex difference in average blood pressure.

This methodology was applied to a data set that included scores on thirteen mental tests (average g loading = .67) given to samples of black and white adolescent MZ and DZ twins totaling 190 pairs. Age and a measure of socioeconomic status were regressed out of the test scores. The data showed by far the best fit to the default model, which therefore could not be rejected, while the fit of the data to the alternative models, by comparison with the default model, could be rejected at high levels of confidence (p < .005 to p < .001). That is, the observed W-B group difference is probably best explained in terms of both G and E factors, while either G or E alone is inadequate, given the assumption that G and E are the same within both groups. This result, however, does not warrant as much confidence as the above p values would indicate, as these particular data are less than ideal for one of the conditions of the model. The data set shows rather large and unsystematic (though nonsignificant) differences in the WGHs of blacks and whites on the various tests. Therefore, the estimate of BGH, though similar to the overall WGH of the thirteen tests (about .60), is questionable. Even though the WGHs of the general factor do not differ significantly between the races, the difference is large enough to leave doubt as to whether it is merely due to sampling error or is in fact real but cannot be detected given the sample size. If the latter is true, then the model used in this particular method of analysis (termed the psychometric factor model) cannot rigorously be applied to these particular data.

A highly similar methodology (using a less restrictive model termed the biometric factor model) was applied to a much larger data set by behavioral geneticists David Rowe and co-workers. But Rowe's large-scale preliminary studies should first be described. He began by studying the correlations between objective tests of scholastic achievement (which are substantially loaded on g as well as on specific achievement factors) and assessment of the quality of the child's home environment based on environmental variables that previous research had established as correlates of IQ and scholastic achievement and which, overall, are intended to indicate the amount of intellectual stimulation afforded by the child's environment outside of school. Measures of the achievement and home environment variables were obtained on large samples of biologically full-sibling pairs, each tested twice (at ages 6.6 and 9.0 years). The total sample comprised three groups: white, black, and Hispanic, and represented the full range of socioeconomic levels in the United States, with intentional oversampling of blacks and Hispanics.

The data on each population group were treated separately, yielding three matrices (white, black, and Hispanic), each comprising the correlations between (1) the achievement and the environmental variables within and between age groups, (2) the full-sibling correlations on each variable at each age, and (3) the cross-sibling correlations on each variable at each age -- yielding twenty-eight correlation coefficients for each of the three ethnic groups.

Now if, in addition to the environmental factors measured in this study, there were some unidentified Factor X that is unique to a certain group and is responsible for most of the difference in achievement levels between the ethnic groups, one would expect that the existence of Factor X in one (or two), but not all three, of the groups should be detectable by an observed difference between groups in the matrix of correlations among all of the variables. That is, a Factor X hypothesized to represent a unique causal process responsible for lower achievement in one group but not in the others should cause the pattern of correlations between environment and achievement, or between siblings, or between different ages, to be distinct for that group. However, since the correlation matrices were statistically equal, there was not the slightest evidence of a Factor X operating in any group. The correlation matrices of the different ethnic groups were as similar to one another as were correlation matrices derived from randomly selected half-samples within each ethnic group.
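
The logic of this comparison can be sketched as follows on simulated data (the actual analysis tested the equality of the matrices with structural modeling, not with this shortcut): if the same developmental process generates both groups' data, the between-group similarity of the correlation matrices should be no lower than the similarity of matrices from random half-samples of a single group.

    # Rough stand-in for the matrix-comparison logic, on simulated data.
    import numpy as np

    rng = np.random.default_rng(2)

    def make_group(n, loadings):
        # Variables sharing one common factor -- the "same process" situation.
        factor = rng.normal(size=(n, 1))
        return factor * loadings + rng.normal(size=(n, len(loadings)))

    def matrix_similarity(x, y):
        """Correlate corresponding off-diagonal elements of two correlation matrices."""
        rx = np.corrcoef(x, rowvar=False)
        ry = np.corrcoef(y, rowvar=False)
        tri = np.tril_indices_from(rx, k=-1)
        return float(np.corrcoef(rx[tri], ry[tri])[0, 1])

    loadings = np.array([0.8, 0.7, 0.6, 0.5, 0.4])
    group_a = make_group(5_000, loadings)
    group_b = make_group(5_000, loadings)
    half1, half2 = np.array_split(make_group(5_000, loadings), 2)

    print(matrix_similarity(group_a, group_b))  # between-group similarity
    print(matrix_similarity(half1, half2))      # within-group half-sample baseline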

Further analyses by Rowe et al. that included other variables yielded the same results. Altogether the six data sets used in their studies included 8,582 whites, 3,392 blacks, 1,766 Hispanics, and 906 Asians. None of the analyses required a minority-unique developmental process or a cultural-environmental Factor X to explain the correlations between the achievement variables and the environmental variables in any of the minority groups. The results are consistent with the default hypothesis, as explained by Rowe et al.: "Our explanation for the similarity of developmental processes is that (a) different racial and ethnic groups possess a common gene pool, which can create behavioral similarities, and that (b) among second-generation ethnic and racial groups in the United States, cultural differences are smaller than commonly believed because of the omnipresent force of our mass-media culture, from television to fast-food restaurants. Certainly, a burden of proof must shift to those scholars arguing a cultural difference position. They need to explain how matrices representing developmental processes can be so similar across ethnic and racial groups if major developmental processes exert a minority-specific influence on school achievement."

The dual hypothesis, which attributes the within-group variance to both genetic and environmental factors but excludes genetic factors from the mean differences between groups, would, in the light of these results, have to invoke a Factor X which, on the one hand, is so subtle and ghostly as to be perfectly undetectable in the whole matrix of correlations among test scores, environmental measures, full-siblings, and ages, yet sufficiently powerful to depress the minority group scores, on average, by as much as one-half a standard deviation.

To test the hypothesis that genetic as well as environmental factors are implicated in the group differences, Rowe and Cleveland designed a study that used the kind of structural equation modeling methodology (with the biometric factor model) mentioned previously. The study used full-siblings and half-siblings to estimate the WGH for large samples of blacks and whites (total N = 1,220) on three Peabody basic achievement tests (Reading Recognition, Reading Comprehension, and Mathematics). A previous study had found that the heritability (WGH) of these tests averaged about .50 and their average correlation with verbal IQ was .65. The achievement tests were correlated among themselves about .75, indicating that they all share a large common factor, with minor specificities for each subtest.

The default hypothesis that the difference between the black and white group means on the single general achievement factor has the same genetic and non-genetic causes that contribute to individual differences within each group could not be rejected. The data fit the default model extremely well, with a goodness-of-fit index of .98 (which, like a correlation coefficient, is scaled from zero to one). The authors concluded that the genetic and environmental sources of individual differences and of differences between racial means appear to be identical. Compared to the white siblings, the black siblings had lower means on both the genetic and the environmental components. To demonstrate the sensitivity of their methodology, the authors substituted a fake mean value for the real mean for whites on the Reading Recognition test and did the same for blacks on the Math test. The fake white mean approximately equaled the true black mean and vice versa. When the same analysis was applied to the data set with the fake means, it led to a clear-cut rejection of the default hypothesis. For the actual data set, however, the BGH did not differ significantly from the WGH. The values of the BGH were .66 to .74 for the verbal tests and .36 for the math test. On the side of caution, the authors state, "These estimates, of course, are imprecise because of sampling variation; they suggest that a part of the Black versus White mean difference is caused by racial genetic differences, but that it would take a larger study, especially one with more genetically informative half-sibling pairs, to make such estimates quantitatively precise".

Regression to the Population Mean. In the 1860s, Sir Francis Galton discovered a phenomenon that he first called reversion to the mean and later gave it the more grandiloquent title the law of filial regression to mediocrity. The phenomenon so described refers to the fact that, on every quantitative hereditary trait that Galton examined, from the size of peas to the size of persons, the measurement of the trait in the mature offspring of a given parent (or both parents) was, on average, closer to the population mean (for their own sex) than was that of the parent(s). An exceptionally tall father, for example, had sons who were shorter than he; and an exceptionally short father had sons who were taller than he. (The same for mothers and daughters.)

This "regression to the mean" is probably better called regression toward the mean, the mean being that of the subpopulation from which the parent and offspring were selected. In quantitative terms, Galton's "law" predicts that the more that variation in a trait is determined by genetic factors, the closer the degree of regression (from one parent to one child), on average, approximates one-half. This is because an offspring receives exactly one-half of its genes from each parent, and therefore the parent-offspring genetic correlation equals .50. The corresponding phenolypic correlation, of course, is subject to environmental influences, which may cause the phenotypic sibling correlation to be greater than or (more usually) less than the genetic correlation of .50. The more that the trait is influenced by nongenetic factors, the greater is the departure of the parent-offspring correlation from .50. The average of the parent-child correlations for IQ reported in thirty-two studies is +.42. Traits in which variation is almost completely genetic, such as the number of fingerprint ridges, show a parent-offspring correlation very near .50. Mature height is also quite near this figure, but lower in childhood, because children attain their adult height at different rates. (Differences in both physical and mental growth curves are also largely genetic.)

Regression occurs for all degrees of kinship, its degree depending on the genetic correlation for the given kinship. Suppose we measure individuals (termed probands) selected at random from a given population and then measure their relatives (all of the same degree of kinship to the probands). Then, according to Galton's "law," and to the extent that the trait of interest is genetically determined, the expected value (i.e., best prediction) of the measurement of a proband's relative, in standardized units, is zR = rG · zP, where rG is the theoretical genetic correlation between relatives of a given degree of kinship, zP is the standardized phenotypic measurement of the proband, and zR is the predicted or expected measurement of the proband's relative. The expected amount of regression toward the mean is therefore (1 - rG) · zP. It should be emphasized that this prediction is statistical and therefore achieves a high degree of accuracy only when averaged over a large number of pairs of relatives. The standard deviation of the errors of prediction for individual cases is quite large.
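
In code, Galton's prediction is a one-line computation (the sibling genetic correlation of .50 is the value discussed in the text; the proband value is arbitrary):

    # Best prediction of a relative's standardized score: zR = rG * zP.
    def expected_relative_score(z_proband, r_kinship):
        return r_kinship * z_proband

    def expected_regression(z_proband, r_kinship):
        """Expected fall-back toward the mean: (1 - rG) * zP."""
        return (1.0 - r_kinship) * z_proband

    # A proband 2 SD above the mean, with a full sibling (rG about .50):
    print(expected_relative_score(2.0, 0.50))  # sibling expected 1 SD above the mean
    print(expected_regression(2.0, 0.50))      # expected regression of 1 SD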

A common misconception is that regression to the mean implies that the total variance in the population shrinks from one generation to the next, until eventually everyone in the population would be located at the mean on a given trait. In fact, the population variance does not change at all as a result of the phenomenon of regression. Regression toward the mean works in both directions. That is, offspring with phenotypes extremely above (or below) the mean have parents whose phenotypes are less extreme, but are, on average, above (or below) the population mean. Regression toward the mean is a statistical result of the imperfect correlation between relatives, whatever the causes of the imperfect correlation, of which there may be many.

Genetic theory establishes the genetic correlations between various kinships and thereby indicates how much of the regression for any given degree of kinship is attributable to genetic factors. Without the genetic prediction, any particular kinship regression (or correlation) is causally not interpretable. Resemblance between relatives could be attributed to any combination of genetic and nongenetic factors.

Empirical determination of whether regression to the mean accords with the expectation of genetic theory, therefore, provides another means of testing the default hypothesis. Since regression can result from environmental as well as from genetic factors (and always does to some extent, unless the trait variation has perfect heritability [i.e., h2 = 1] and the phenotype is without measurement error), the usefulness of the regression phenomenon based on only one degree of kinship to test a causal hypothesis is problematic, regardless of its purely statistical significance. However, it would be remarkable (and improbable) if environmental factors consistently simulated the degree of regression predicted by genetic theory across a number of degrees of kinship.

A theory that completely excludes any involvement of genetic factors in producing an observed group difference offers no quantitative prediction as to the amount of regression for a given kinship and is unable to explain certain phenomena that are both predictable and explainable in terms of genetic regression. For example, consider Figure 11.2 (p. 358) in the previous chapter. It shows a phenomenon that has been observed in many studies and which many people not familiar with Galton's "law" find wholly surprising. One would expect, on purely environmental grounds, that the mean IQ difference between black and white children should decrease at each successively higher level of the parental socioeconomic status (i.e., education, occupational level, income, cultural advantages, and the like). It could hardly be argued that environmental advantages are not greater at higher levels of SES, in both the black and the white populations. Yet, as seen in Figure 11.2, the black and white group means actually diverge with increasing SES, although IQ increases with SES for both blacks and whites. The specific form of this increasing divergence of the white and black groups is also of some theoretical interest: the black means show a significantly lower rate of increase in IQ as a function of SES than do the white means. These two related phenomena, black-white divergence and rate of increase in mean IQ as a function of SES, are predictable and explainable in terms of regression, and would occur even if there were no difference between the mean IQs of the black and the white parents within each level of SES. These results are expected on purely genetic grounds, although environmental factors also are most likely involved in the regression. For a given parental IQ, the offspring IQs (regardless of race) regress about halfway to their population mean. As noted previously, this is also true for height and other heritable physical traits.
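
A worked example of this regression prediction, using halfway regression and the conventional population means of 100 and 85 (the matched parental IQ values are illustrative): even for parents matched on IQ at every SES level, the expected offspring gap does not vanish, as a purely environmental account would predict.

    # Offspring regress about halfway toward their own population mean.
    def offspring_iq(parent_iq, pop_mean, regression=0.5):
        return pop_mean + regression * (parent_iq - pop_mean)

    for parent_iq in (90, 105, 120):            # IQ-matched parent pairs
        w = offspring_iq(parent_iq, 100.0)      # expected white offspring IQ
        b = offspring_iq(parent_iq, 85.0)       # expected black offspring IQ
        print(parent_iq, w, b, w - b)           # the expected gap stays at 7.5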

Probably the single most useful kinship for testing the default hypothesis is full siblings reared together, because they are plentiful, they have developed in generally more similar environments than have parents and their own children, and they have a genetic correlation of about .50. I say "about .50" because there are two genetic factors that tend slightly to alter this correlation. As they work in opposite directions, their effects tend to cancel each other. When the total genetic variance includes nonadditive genetic effects (particularly genetic dominance), it slightly decreases the genetic correlation between full siblings, while assortative mating (i.e., correlation between the parents' genotypes) slightly increases the sibling correlation. Because of nongenetic factors, the phenotypic correlation between siblings is generally below the genetic correlation. Meta-analyses of virtually all of the full-sibling IQ correlations reported in the world literature yield an overall average r only slightly below the predicted +.50.

Some years ago, an official from a large school system came to me with a problem concerning the school system's attempt to find more black children who would qualify for placement in classes for the "high potential" or "academically gifted" pupils (i.e., IQ of 120 or above). Black pupils were markedly underrepresented in these classes relative to whites and Asians attending the same schools. Having noticed that a fair number of the white and Asian children in these classes had a sibling who also qualified, the school system tested the siblings of the black pupils who had already been placed in the high-potential classes. However, exceedingly few of the black siblings in regular classes were especially outstanding students or had IQ scores that qualified them for the high-potential program. The official, who was concerned about bias in the testing program, asked if I had any other idea as to a possible explanation for their finding. His results are in fact fully explainable in terms of regression toward the mean.

I later analyzed the IQ scores of all of the full-sibling pairs in grades one through six who had taken the same nationally normed IQ tests (Lorge-Thorndike) in all fourteen elementary schools of another California school district. As this study has been described more fully elsewhere, I will only summarize here. There were over 900 white sibling pairs and over 500 black sibling pairs. The sibling intraclass correlations for whites and blacks were .40 and .38, respectively. The departure of these correlations from the genetically expected value of .50 indicates that nongenetic factors (i.e., environmental influences and unreliability of measurement) affect the sibling correlation similarly in both groups. In this school district, blacks and whites who were perfectly matched for a true-score IQ of 120 had siblings whose average IQ was 113 for whites and 99 for blacks. In about 33 percent of the white sibling pairs both siblings had an IQ of 120 or above, as compared with only about 12 percent of black siblings.
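
These sibling means are just Galton's formula at work with the reported sibling correlations; inverting the formula shows what district group means the reported figures imply (a back-of-the-envelope check, not values given in the study):

    # expected_sib = group_mean + r_sib * (proband - group_mean);
    # solving for group_mean given the reported sibling means:
    def implied_group_mean(proband, sib_mean, r_sib):
        return (sib_mean - r_sib * proband) / (1.0 - r_sib)

    print(implied_group_mean(120, 113, 0.40))   # implied white district mean, ~108
    print(implied_group_mean(120, 99, 0.38))    # implied black district mean, ~86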

Of more general significance, however, was the finding that Galton's "law" held true for both black and white sibling pairs over the full range of IQs (approximately IQ 50 to IQ 150) in this school district. In other words, the sibling regression lines for each group showed no significant deviation from linearity. (Including nonlinear transformations of the variables in the multiple regression equation produced no significant increment in the simple sibling correlation.) These regression findings can be regarded, not as a proof of the default hypothesis, but as wholly consistent with it. No purely environmental theory would have predicted such results. Of course, ex post facto and ad hoc explanations in strictly environmental terms are always possible if one postulates environmental influences on IQ that perfectly mimic the basic principles of genetics that apply to every quantitative physical characteristic observed in all sexually reproducing plants and animals.

A number of different mental tests besides IQ were also given to the pupils in the school district described above. They included sixteen age-normed measures of scholastic achievement in language and arithmetic skills, short-term memory, and a speeded paper-and-pencil psychomotor test that mainly reflects effort or motivation in the testing situation. Sibling intraclass correlations were obtained on each of the sixteen tests. IQ, being the most g loaded of all the tests, had the largest sibling correlation. All sixteen of the sibling correlations, however, fell below +.50 to varying degrees; the correlations ranged from .10 to .45, averaging .30 for whites and .28 for blacks. (For comparison, the average age-adjusted sibling correlations for height and weight in this sample were .44 and .38, respectively.) Deviations of these sibling correlations from the genetic correlation of .50 are an indication that the test score variances do reflect nongenetic factors to varying degrees. Conversely, the closer the obtained sibling correlation approaches the expected genetic correlation of .50, the larger its genetic component. These data, therefore, allow two predictions, which, if borne out, would be consistent with the default hypothesis: (1) The varying magnitudes of the sibling correlations on the sixteen diverse tests in blacks and whites should be positively correlated. In fact, the correlation between the vector of sixteen black sibling correlations and the corresponding vector of sixteen white sibling correlations was r = +.71, p = .002. (2) For both blacks and whites, there should be a positive correlation between (a) the magnitudes of the sibling correlations on the sixteen tests and (b) the magnitudes of the standardized mean W-B differences on the sixteen tests. The results show that the correlation between the standardized mean W-B differences on the sixteen tests and the sibling correlations is r = +.61, p < .013 for blacks, and r = +.80, p < .001 for whites.

Note that with regard to the second prediction, a purely environmental hypothesis of the mean W-B differences would predict a negative correlation between the magnitudes of the sibling correlations and the magnitudes of the mean W-B differences. The results in fact showing a strong positive correlation contradict this purely nongenetic hypothesis.

CONTROLLING THE ENVIRONMENT: TRANSRACIAL ADOPTION

Theoretically, a transracial adoption study should provide a strong test of the default hypothesis. In reality, however, a real-life adoption study can hardly meet the ideal conditions necessary to make it definitive. Such conditions can be perfectly met only through the cross-fostering methods used in animal behavior genetics, in which probands can be randomly assigned to foster parents. Although adoption in infancy is probably the most comprehensive and powerful environmental intervention possible with humans, under natural conditions the adoption design is unavoidably problematic because the investigator cannot experimentally control the specific selective factors that affect transracial adoptions -- the adopted children themselves, their biological parents, or the adopting parents. Prenatal and perinatal conditions and the preadoption environment are largely uncontrolled. So, too, is the willingness of parents to volunteer their adopted children for such a study, which introduces an ambiguous selection factor into the subject sampling of any adoption study. It is known that individuals who volunteer as subjects in studies that involve the measurement of mental ability generally tend to be somewhat above-average in ability. For these reasons, and given the scarcity of transracial adoptions, few such studies have been reported in the literature. Only one of these, known as the Minnesota Transracial Adoption Study, is based on large enough samples of black and white adoptees to permit statistical analysis. While even the Minnesota Study does not meet the theoretically ideal conditions, it is nevertheless informative with respect to the default hypothesis.

Initiated and conducted by Sandra Scarr and several colleagues, the Minnesota Transracial Adoption Study examined the same groups of children when they were about age 7 and again in a 10-year follow-up when they were about age 17. The follow-up study is especially important, because it has been found in other studies that family environmental influences on IQ decrease from early childhood to late adolescence, while there is a corresponding increase in the phenotypic expression of the genetic component of IQ variance. Therefore, one would have more confidence in the follow-up data (obtained at age 17) as a test of the default hypothesis than in the data obtained at age 7.

Four main groups were compared on IQ and scholastic performance:

1. Biological offspring of the white adoptive parents.

2. Adopted children whose biological father and mother were both white (WW).

3. Adopted interracial children whose biological fathers were black and whose mothers were white (BW).

4. Adopted children whose biological father and mother were both black (BB).

The adoptive parents were all upper-middle class, employed in professional and managerial occupations, with an average educational level of about sixteen years (college graduate) and an average WAIS IQ of about 120. The biological parents of the BB and BW adoptees averaged 11.5 years and 12.5 years of education, respectively. The IQs of the adoptees' biological parents were not known. Few of the adoptees ever lived with their biological parents; some lived briefly in foster homes before they were legally adopted. The average age of adoption was 32 months for the BB adoptees, 9 months for the BW adoptees, and 19 months for the WW adoptees. The adoptees came mostly from the North Central and North Eastern regions of the United States. The Stanford-Binet and the Wechsler Intelligence Scale for Children (WISC) were used in the first study (at age seven); the Wechsler Adult Intelligence Scale (WAIS) was used in the follow-up study (at age seventeen).

The investigators hypothesized that the typical W-B IQ difference results from the lesser relevance of the specific information content of IQ tests to the blacks' typical cultural environment. They therefore suggest that if black children were reared in a middle or upper-middle class white environment they would perform near the white average on IQ tests and in scholastic achievement. This cultural-difference hypothesis therefore posits no genetic effect on the mean W-B IQ difference; rather, it assumes equal black and white means in genotypic g. The default hypothesis, on the other hand, posits both genetic and environmental factors as determinants of the mean W-B IQ difference. It therefore predicts that groups of black and white children reared in highly similar environments typical of the white middle-class culture would still differ in IQ to the extent expected from the heritability of IQ within either population.

The data of the Minnesota Study also allow another prediction based on the default hypothesis, namely, that the interracial children (BW) should score, on average, nearly (but not necessarily exactly) halfway between the means of the WW and BB groups. Because the alleles that enhance IQ are genetically dominant, we would expect the BW group mean to be slightly closer to the mean of the WW group than to the mean of the BB group. That is, the heterosis (outbreeding enhancement of the trait) due to dominance deviation would raise the BW group's mean slightly above the midpoint between the BB and WW groups. (This halfway point would be the expected value if the heritability of IQ reflected only the effects of additive genetic variance.) Testing this predicted heterotic effect is unfortunately compromised by the fact that the IQs of the biological parents of the BB and BW groups were not known. As the BB biological parents had about one year less education than the BW parents, given the correlation between IQ and education, it is likely that the mean IQ of the BB parents was somewhat lower than the mean IQ of the BW parents, and so would produce a result similar to that predicted in terms of heterosis. It is also possible, though less likely, that the later age of adoption (by twenty-one months) of the BB adoptees than of the BW adoptees would produce an effect similar to that predicted in terms of heterosis.

The results based on the subjects who were tested on both occasions are shown in Table 12.5. Because different tests based on different standardization groups were used in the first testing than were used in the follow-up testing, the overall average difference of about eight IQ points (evident for all groups) between the two test periods is of no theoretical importance for the hypothesis of interest. The only important comparisons are those between the WW, BW, and BB adopted groups within each age level. They show that:

* The biological offspring have about the same average IQ as has been reported for children of upper-middle-class parents. Their IQs are lower, on average, than the average IQ of their parents, consistent with the expected genetic regression toward the population mean (mainly because of genetic dominance, which is known to affect IQ -- see Chapter 7, pp. 189-91). The above-average environment of these adoptive families probably counteracts the predicted genetic regression effect to some extent, expectably more at age seven than at age seventeen.

* The BB adoptees' mean IQ is close to the mean IQ of ninety for blacks in the same North Central area (from which the BB adoptees came) reared by their own parents. At age seventeen the BB group's IQ is virtually identical to the mean IQ of blacks in the North Central part of the United States. Having been reared from two years of age in a white upper-middle-class environment has apparently had little or no effect on their expected IQ, that is, the average IQ of black children reared in the average black environment. This finding specifically contradicts the expectation of the cultural-difference explanation of the W-B IQ difference, but is consistent with the default hypothesis.

* The BB group is more typical of the U.S. black population than is the BW group. The BB group's IQ at age seventeen is sixteen points below that of the white adoptees and thirteen points below the mean IQ of whites in the national standardization sample of the WAIS. Thus the BB adoptees' IQ is not very different from what would be expected if they were reared in the average environment of blacks in general (i.e., IQ eighty-five).

* The mean IQ of the interracial adoptees (BW), both at ages seven and seventeen, is nearly intermediate between the WW and BB adoptees, but falls slightly closer to the WW mean. This is consistent with, but does not prove, the predicted heterotic effect of outbreeding on IQ. The intermediate IQ at age seven is (WW + BB)/2 = (117.6 + 95.4)/2 = 106.5, or three points below the observed IQ of the BW group; at age seventeen the intermediate IQ is 97.5, or one point below the observed IQ of the BW group. Of course, mean deviations of this magnitude, given the sample sizes in this study, are not significant. Hence no conclusion can be drawn from these data regarding the predicted heterotic effect. But all of the group IQ means do differ significantly from one another, both at age seven and at age seventeen, and the fact that the BW adoptees are so nearly intermediate between the WW and BB groups is hard to explain in purely environmental or cultural terms. But it is fully consistent with the genetic prediction. An ad hoc explanation would have to argue for the existence of some cultural effects that quantitatively simulate the prediction of the default hypothesis, which is derived by simple arithmetic from accepted genetic theory.

* Results similar to those for IQ were also found for scholastic achievement measured at age seventeen, except that the groups differed slightly less on the scholastic achievement measures than on IQ. This is probably because the level of scholastic achievement is generally more susceptible to family influences than is the IQ. The mean scores based on the average of five measures of scholastic achievement and aptitude expressed on the same scale as the IQ were: Nonadopted biological offspring = 107.2, WW adoptees =103.1, BW adoptees = 100.1, BB adoptees = 95.1. Again, the BW group's mean is but one point above the midpoint between the means of the WW and BB groups.

In light of what has been learned from many other adoption studies, the results of this transracial adoption study are hardly surprising. As was noted in Chapter 7 (pp. 177-79), adoption studies have shown that the between-family (or shared) environment is the smallest component of true-score IQ variance by late adolescence.

It is instructive to consider another adoption study by Scarr and Weinberg, based on nearly 200 white children who, in their first year of life, were adopted into 104 white families. Although the adoptive families ranged rather widely in socioeconomic status, by the time the adoptees were adolescents there were nonsignificant and near-zero correlations between the adoptees' IQs and the characteristics of their adoptive families, such as the parents' education, IQ, occupation, and income. Scarr and Weinberg concluded that, within the range of "humane environments," variations in family socioeconomic characteristics and in child-rearing practices have little or no effect on IQ measured in adolescence. Most "humane environments," they claimed, are functionally equivalent for the child's mental development.

In the transracial adoption study, therefore, one would not expect that the large differences between the mean IQs of the WW, BW, and BB adoptees would have been mainly caused by differences in the unquestionably humane and well-above-average adoptive family environments in which these children grew up. Viewed in the context of adoption studies in which race is not a factor, the group differences observed in the transracial adoption study would be attributed to genetic factors.

There is simply no good evidence that social environmental factors have a large effect on IQ, particularly in adolescence and beyond, except in cases of extreme environmental deprivation. In the Texas Adoption Study, for example, adoptees whose biological mothers had IQs of ninety-five or below were compared with adoptees whose biological mothers had IQs of 120 or above. Although these children were given up by their mothers in infancy and all were adopted into good homes, the two groups differed by 15.7 IQ points at age 7 years and by 19 IQ points at age 17. These mean differences, which are about one-half of the mean difference between the low-IQ and high-IQ biological mothers of these children, are close to what one would predict from a simple genetic model according to which the standardized regression of offspring on biological parents is .50.

In still another study, Turkheimer used a quite clever adoption design in which each of the adoptee probands was compared against two nonadopted children, one who was reared in the same social class as the adopted proband's biological mother, the other who was reared in the same social class as the proband's adoptive mother. (In all cases, the proband's biological mother was of lower SES than the adoptive mother.) This design would answer the question of whether a child born to a mother of lower SES background and adopted into a family of higher SES background would have an IQ that is closer to children who were born and reared in a lower SES background than to children born and reared in a higher SES background. The result: the proband adoptees' mean IQ was nearly the same as the mean IQ of the nonadopted children of mothers of lower SES background but differed significantly (by more than 7.5 IQ points) from the mean IQ of the nonadopted children of mothers of higher SES background. In other words, the adopted probands, although reared by adoptive mothers of higher SES than that of the probands' biological mothers, turned out about the same with respect to IQ as if they had been reared by their biological mothers, who were of lower SES. Again, it appears that the family social environment has a surprisingly weak influence on IQ. This broad factor therefore would seem to carry little explanatory weight for the IQ differences between the WW, BW, and BB groups in the transracial adoption study.

There is no evidence that the effect of adoption is to lower a child's IQ from what it would have been if the child were reared by its own parents, and some evidence indicates the contrary. Nor is there evidence that transracial adoption per se is disadvantageous for cognitive development. Three independent studies of Asian children (from Cambodia, Korea, Thailand, and Vietnam) adopted into white families in the United States and Belgium have found that, by school age, their IQ (and scholastic achievement), on average, exceeds that of middle-class white American and Belgian children by at least ten IQ points, despite the fact that many of the Asian children had been diagnosed as suffering from malnutrition prior to adoption.

The authors of the Minnesota Study suggest the difference in age of adoption of the BB and BW groups (32 months and 9 months, respectively) as a possible cause of the lower IQ of the BB group (by 12 points at age 7, 9 points at age 17). The children were in foster care prior to adoption, but there is no indication that the foster homes did not provide a humane environment. A large-scale study specifically addressed to the effect of early versus late age of adoption on children's later IQ did find that infants who were adopted before one year of age had significantly higher IQs at age four years than did children adopted after one year of age, but this difference disappeared when the children were retested at school age. The adoptees were compared with nonadopted controls matched on a number of biological, maternal, prenatal, and perinatal variables as well as on SES, education, and race. The authors concluded, "The adopted children studied in this project not only did not have higher IQ than the [matched] controls, but also did not perform at the same intellectual level as the biologic children from the same high socioeconomic environment into which they were adopted. . . . the better socioeconomic environment provided by adoptive parents is favorable for an adopted child's physical growth (height and weight) and academic achievement but has no influence on the child's head measurement and intellectual capacity, both of which require a genetic influence."

In the Minnesota Transracial Adoption Study, multiple regression analyses were performed to compare the effects of ten environmental variables with the effects of two genetic variables in accounting for the IQ variance at age seventeen in the combined black and interracial groups (i.e., BB & BW). The ten environmental variables were those associated with the conditions of adoption and the adoptive family characteristics (e.g., age of placement, time in adoptive home, number of preadoptive placements, quality of preadoptive placements, adoptive mother's and father's education, IQ, occupation, and family income). The two genetic variables were the biological mother's race and education. (The biological father's education, although it was known, was not used in the regression analysis; if it were included, the results might lend slightly more weight to the genetic variance accounted for by this analysis.) The unbiased multiple correlation (R) between the ten environmental variables and IQ was .28. The unbiased R between the two genetic variables and IQ was .39. This is a fairly impressive correlation, considering that mother's race was treated as a dichotomous variable with a 72% (BW mothers)/28% (BB mothers) split. (The greater the departure from the optimal 50%/50% split, the more restricted is the size of the obtained correlation. If the obtained correlation of .39 were corrected to compensate for this suboptimal split, the corrected value would be .43.) Moreover, mother's education (measured in years) is a rather weak surrogate for IQ; it is correlated about +.7 with IQ in the general population. (In the present sample, the biological mothers' years of education in the BB group had a mean of 10.9, SD = 1.9 years, range 6-14 years; the BW group had a mean of 12.4, SD = 1.8, range 7-18.)
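
The parenthetical correction can be reconstructed as the standard adjustment of a point-biserial correlation for a dichotomy that departs from a 50/50 split (a reconstruction, but one that does reproduce the .39-to-.43 figure):

    # A point-biserial r is attenuated by sqrt(p * q) relative to a 50/50
    # split; rescaling by sqrt(.25) / sqrt(p * q) gives the 50/50 equivalent.
    import math

    def correct_for_split(r, p):
        return r * math.sqrt(0.25) / math.sqrt(p * (1.0 - p))

    print(round(correct_for_split(0.39, 0.72), 2))  # -> 0.43, as stated above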

The two critiques, by Levin and by Lynn, of the authors' social-environmental interpretation of the results of their follow-up study are well worth reading, as is the authors' detailed reply, in which they state, "We think that it is exceedingly implausible that these differences are either entirely genetically based or entirely environmentally based."

STUDIES BASED ON RACIAL ADMIXTURE

In the Minnesota Transracial Adoption Study, the interracial adoptees labeled BW (black father, white mother) had a mean IQ approximately intermediate between those of the white (WW) and the black (BB) adoptees. One might expect, therefore, that individual variation in IQ among the population of black Americans would be correlated with individual variation in the percentage of Caucasian admixture. (The mean percentage of European genes in American blacks today is approximately 25 percent, with an undetermined standard deviation for individual variation.) This prediction could be used to test the hypothesis that blacks and whites differ in the frequencies of the alleles whose phenotypic effects are positively correlated with g. The several attempts to do so, unfortunately, are riddled with technical difficulties and so are unable to reduce the uncertainty as to the nature of the mean W-B difference in IQ.

An ideal study would require that the relative proportions of European and African genes in each hybrid individual be known precisely. This, in turn, would demand genealogical records extending back to each individual's earliest ancestors of unmixed European and African origin. In addition, for the results to be generalizable to the present-day populations of interest, one would also need to know how representative the interracial ancestors of the study probands (i.e., the present hybrid individuals whose level of g is measured) were of the white and black populations in each generation. A high degree of assortative mating for g, for example, would mean that these ancestors were not representative and that cross-racial matings transmitted much the same g-related alleles from each racial line. Also, the results would be ambiguous if there were a marked systematic difference in the g levels of the black and white mates (e.g., in half of the matings the black [or hybrid] g > white g and vice versa in the other half). This situation would act to cancel any racial effect in the offspring's level of g.

A large data set that met these ideal conditions would provide a strong test of the genetic hypothesis. Unfortunately, such ideal data do not exist, and are probably impossible to obtain. Investigators have therefore resorted to estimating the degree of European admixture in representative samples of American blacks by means of blood-group analyses, using those blood groups that differ most in frequency between contemporary Europeans and Africans in the regions of origin of the probands' ancestors. Each marker blood group is identified with a particular polymorphic gene. Certain antigens or immunoglobulins in the blood serum, which have different polymorphic gene loci, are also used in the same way. The gene loci for all of the known human blood groups constitute but a very small fraction of the total number of genes in the human genome. To date, only two such loci, the Fy (Duffy) blood group and the immunoglobulin Gm, have been identified that discriminate very markedly between Europeans and Africans, with near-zero frequencies in one population and relatively high frequencies in the other. A number of other blood groups and blood serum antigens also discriminate between Europeans and Africans, but with much less precision. T. E. Reed, an expert on the genetics of blood groups, has calculated that a minimum of eighteen gene loci with perfect discrimination power (i.e., 100 percent frequency in one population and 0 percent in the other) are needed to determine the proportions of European/African admixture with a 5 percent or less error rate for specific individuals. This condition is literally impossible to achieve given the small number of blood groups and serum antigens known to differ in racial frequencies. However, blood group data, particularly those of Fy and Gm, aggregated in reasonably large samples are capable of showing statistically significant mean differences in mental test scores between groups if in fact the mean difference has a genetic component.
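
The estimator underlying such blood-group analyses is the classical single-locus admixture formula; the allele frequencies in the sketch below are illustrative, not measured values from the studies cited.

    # Bernstein's admixture estimator: m = (q_hybrid - q_A) / (q_E - q_A),
    # where q is the allele's frequency in each population.
    def admixture_proportion(q_hybrid, q_african, q_european):
        return (q_hybrid - q_african) / (q_european - q_african)

    # A marker like Fy*A is near zero in West Africans and common in Europeans:
    print(admixture_proportion(q_hybrid=0.10, q_african=0.0, q_european=0.42))
    # -> about 0.24, i.e., roughly one-quarter European admixture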

A critical problem with this methodology is that we know next to nothing about the level of g in either the specific European or African ancestors or of the g-related selective factors that may have influenced mating patterns over the many subsequent generations of the hybrid offspring, from the time of the first African arrivals in America up to the present. Therefore, even if most of the European blood-group genes in present-day American blacks had been randomly sampled from European ancestors, the genes associated with g may not have been as randomly sampled, if systematic selective mating took place between the original ancestral groups or in the many generations of hybrid descendants.

Another problem with the estimation of racial admixture from blood-group frequencies is that most of the European genes in the American black gene pool were introduced generations ago, mostly during the period of slavery. According to genetic principles, the alleles of a particular racial origin become increasingly disassociated from one another in each subsequent generation. The genetic result of this disassociation, which is due to the phenomena known as crossing-over and independent segregation of alleles, is that any allele that shows different frequencies in the ancestral racial groups becomes increasingly less predictive of other such alleles in each subsequent generation of the racially hybridized population. If a given blood group of European origin is not reliably correlated with other blood groups of European origin in a representative sample of hybrid individuals, we could hardly expect it to be correlated with the alleles of European origin that affect g. In psychometric terms, such a blood group would be said to have little or no validity for ranking hybrid individuals according to their degree of genetic admixture, and would therefore be useless in testing the hypothesis that variation in g in a hybrid (black-white) population is positively correlated with variation in amount of European admixture.
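
The rate of this disassociation follows from a standard population-genetics result (assuming random mating), not from a computation in the studies cited here: the linkage disequilibrium D between two loci decays each generation by the factor (1 - c), where c is the recombination fraction, and for loci on different chromosomes c = .5, so the association is halved every generation. A minimal sketch:

```python
# Decay of linkage disequilibrium (D) between two loci across generations:
# D_t = D_0 * (1 - c)**t, with c the recombination fraction per generation.
# For unlinked loci c = 0.5, so a blood-group allele of European origin
# rapidly loses its power to predict other European-origin alleles.
def linkage_disequilibrium(d0: float, c: float, t: int) -> float:
    return d0 * (1.0 - c) ** t

for t in range(0, 11, 2):
    print(f"generation {t:2d}: D = {linkage_disequilibrium(1.0, 0.5, t):.4f}")
```

After ten generations only about a thousandth of the original association between unlinked loci remains, so the near-zero correlations reported below are just what genetic theory predicts.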

This disassociation among various European genes in black Americans was demonstrated in a study based on large samples of blacks and whites in Georgia and Kentucky. The average correlations among the seven blood-group alleles that differed most in racial frequencies (out of sixteen blood groups tested) were not significantly different from zero, averaging -.015 in the white samples (for which the theoretically expected correlation is zero) and -.030 in the black samples. (Although the correlations between blood groups in individuals were nil, the total frequencies of each of the various blood groups were quite consistent [r=.88] across the Georgia and Kentucky samples.) Gm was not included in this correlation analysis but is known to be correlated with Fy. These results, then, imply that virtually all blood groups other than Fy and Gm are practically useless for estimating the proportions of Caucasian admixture in hybrid black individuals. It is little wonder, then, that, in this study, the blood-group data from the hybrid black sample yielded no evidence of being significantly or consistently correlated with g (which was measured as the composite score on nineteen tests).

A similar study, but much more complex in design and analysis, by Sandra Scarr and co-workers, ranked 181 black individuals (in Philadelphia) on a continuous variable, called an "odds" index, estimated from twelve genetic markers. The index indicated the degree to which an individual's genetic markers resembled those of Africans without any Caucasian ancestry versus those of Europeans without any African ancestry. This is probably an even less accurate estimate of ancestral admixture than a direct measure of the percentage of African admixture, which (for reasons not adequately explained by the authors) was not used in this study, although it had been used successfully in another study of the genetic basis of the average white-black difference in diastolic blood pressure. The "odds" index of African ancestry showed no significant correlation with individual IQs. It also failed to discriminate significantly between the mean IQs of the top and bottom thirds of the total distribution on the "ancestral odds" index of Caucasian ancestry. In brief, the null hypothesis (i.e., no relationship between hybrid mental test score and amount of European ancestry) could not be rejected by the data of this study. The first principal component of four cognitive tests yielded a correlation of only -.05 with the ancestral index. Among these tests, the best measure of fluid g, the Raven matrices, had the largest correlation (-.13) with the estimated degree of African ancestry. (In this study, a correlation of -.14 would be significant at p < .05, one-tailed.) But even the correlation between the ancestral odds index based on the three best genetic markers and the ancestral odds index based on the remaining nine genetic markers was a nonsignificant +.10. A measure of skin color (which has a much greater heritability than mental test scores) correlated .27 (p < .01) with the index of African ancestry. When skin color and SES were partialed out of the correlation between ancestry and test scores, all the correlations were reduced (e.g., the Raven correlation dropped from -.13 to -.10). Since both skin color and SES have genetic components that are correlated with the ancestral index and with test scores, partialing out these variables further favors the null hypothesis by removing some of the hypothesized genetic correlation between racial admixture and test scores.

It is likely that the conclusions of this study constitute what statisticians refer to as a Type II error: acceptance of the null hypothesis when it is in fact false. Although these data cannot reject the null hypothesis, it is questionable whether they are in fact capable of rejecting any alternative hypothesis derived from the default theory, because the specific features of this data set severely diminish its statistical power. In a rather complex analysis, I have argued that the limitations of this study (largely the lack of power due to the low validity of the ancestral index when used with an insufficient sample size) would make it incapable of rejecting not only the null hypothesis, but also any reasonable alternative hypothesis. This study therefore cannot reduce the heredity-environment uncertainty regarding the W-B difference in psychometric g. In another instance of Type II error, the study even upholds the null hypothesis regarding the nonexistence of correlations that are in fact well established by large-scale studies. It concludes, for example, that there is no significant correlation between lightness of skin color and SES of American blacks, despite the fact that correlations significant beyond the .01 level are reported in the literature, both for individuals' SES of origin and for attained SES.

Skin Color and IQ. Earlier researchers relied on objective measures of skin color as an index of the amount of African/European admixture. In sixteen out of the eighteen studies of the IQ of American blacks in which skin color was measured, the correlations between lightness of skin color and test scores were positive (ranging from +.12 to +.30).

Although these positive correlations theoretically might well reflect the proportion of Caucasian genes affecting IQ in the hybrid blacks, they are weak evidence, because skin color is confounded with social attitudes that may influence IQ or its educational and occupational correlates. It is more likely that the correlations are the result of cross-assortative mating for skin color and IQ, which would cause these variables to be correlated in the black population. (There is no doubt that assortative mating for skin color has taken place in the black population.) The same is of course true for the other visible racial characteristics that may be correlated with IQ. If, in the black population, lighter skin color (or a generally more Caucasoid appearance) and higher IQ (or its correlates: education, occupation, SES) are both considered desirable in a mate, they will be subject to assortative mating and to cross-assortative mating for the two characteristics, and the offspring would therefore tend to possess both characteristics. But any IQ-enhancing genes are as likely to have come from the African as from the European ancestors of the hybrid descendants.

In general, skin color and the other visible physical aspects of racial differences are unpromising variables for research aimed at reducing the heredity-environment uncertainty of the causal basis of the average W-B difference in g.

Black-White Hybrids in Post-World War II Germany. We saw in the Minnesota Transracial Adoption Study that the interracial (BW) adoptees, whose biological fathers were black and whose biological mothers were white, averaged lower in IQ than the adoptees who had two white parents (WW). This finding appears to be at odds with the study conducted by Eyferth in Germany following World War II, which found no difference between offspring of BW and WW matings who were reared by their biological mothers. All of the fathers (black or white) were members of the U.S. occupation forces stationed in Germany. The mothers were unmarried German women, mostly of low SES. There were about ninety-eight interracial (BW) children and about eighty-three white children (WW). The mothers of the BW and WW children were approximately matched for SES. The children averaged about 10 years of age, ranging between ages 5 and 13 years. They all were tested with the German version of the Wechsler Intelligence Scale for Children (WISC). The results are shown in Table 12.6. The overall WW-BW difference is only one IQ point. As there is no basis for expecting a difference between boys and girls (whose average IQs are equal in the WISC standardization sample), the eight-point difference between the WW boys and WW girls in this study is most likely due to sampling error. But sampling error does not only result in sample differences that are larger than the corresponding population difference; it can also result in sample differences that are smaller than the population difference, and this could be the case for the overall mean WW-BW difference.

This study, although consistent with a purely environmental hypothesis of the racial difference in test scores, is not conclusive, because the IQs of the probands' mothers and fathers were unknown and the white and black fathers were not equally representative of their respective populations: about 30 percent of blacks, as compared with about 3 percent of whites, failed the preinduction mental test and were not admitted into the armed services. Further, nothing was known about the Army rank of the black or white fathers of the illegitimate offspring; they could have been more similar in IQ than the average black or white in the occupation forces because of selective preferences on the part of the German women with whom they had sexual relations. Then, too, nearly all of the children were tested before adolescence, before the genotypic aspect of IQ has become fully manifested. Generally in adoption studies, the correlation of IQ and genotype increases between childhood and late adolescence, while the correlation between IQ and environment decreases markedly. Finally, heterosis (the outbreeding effect; see Chapter 7, p. 196) probably enhanced the IQ level of the interracial children, thereby diminishing the IQ difference between the interracial children and the white children born to German women. A heterotic effect equivalent to about +4 IQ points was reported for European-Asian interracial offspring in Hawaii.

Genetic Implications of IQ and Fertility for Black and White Women.

Fertility is defined as the number of living children a woman (married or unmarried) gives birth to during her lifetime. If, in a breeding population, IQ (and therefore g) is consistently correlated with fertility, it will have a compounded effect on the trend of the population's mean IQ in each generation -- an increasing trend if the correlation is positive, a decreasing trend if it is negative (referred to as positive or negative selection for the trait). This consequence naturally follows from the fact that mothers' and children's IQs are correlated, certainly genetically and usually environmentally.

If IQ were more negatively correlated with fertility in one population than in another (for example, the American black and white populations), over two or more generations the two populations' mean IQs would be expected to diverge increasingly in each successive generation. Since some part of the total IQ variance within each population is genetic (i.e., the heritability), the intergenerational divergence in population means would also have to be partly genetic. It could not be otherwise, unless one assumed that the mother-child correlation for IQ is entirely environmental (an assumption that has been conclusively ruled out by adoption studies). Therefore, in each successive generation, as long as there is a fairly consistent difference in the correlation between IQ and fertility for the black and white populations, some part of the increasing mean group difference in IQ is necessarily genetic. If fertility is negatively correlated with a desirable trait that has a genetic component, IQ for example, the trend is called dysgenic; if positively correlated, eugenic.

The phenomenon of regression toward the population mean (see Chapter 12, pp. 467-72) does not mitigate a dysgenic trend. Regression to the mean does not predict that a population's genotypic mean in one generation regresses toward the genotypic mean of the preceding generation. In large populations, changes in the genotypic mean of a given trait from one generation to the next can come about only through positive (or negative) selection for that trait, that is, by changes in the proportions of the breeding population that fall into different intervals of the total distribution of the trait in question.

It is also possible that a downward genetic trend can be phenotypically masked by a simultaneous upward trend in certain environmental factors that favorably affect IQ, such as advances in prenatal care, obstetrical practices, nutrition, decrease in childhood diseases, and education. But as the positive effect of these environmental factors approaches asymptote, the downward dysgenic trend will continue, and the phenotypic (IQ) difference between the populations will begin to increase.

Is there any evidence for such a trend in the American black and white populations? There is, at least for the last half of this century, the period for which relevant U.S. Census data have been available. A detailed study based on data from the U.S. Census Bureau and affiliated agencies was conducted by Daniel Vining, a demographer at the University of Pennsylvania. His analyses indicate that, if IQ is to some degree heritable (which it is), then throughout most of this century (and particularly since about 1950) there has been an overall downward trend in the genotypic IQ of both the white and the black populations. The trend has been more unfavorable for the black population.

But how could the evidence for a downward trend in the genotypic component of IQ be true, when other studies have shown a gradual rise in phenotypic IQ over the past few decades? (This intergenerational rise in IQ, known as the "Flynn effect," is described in Chapter 10, pp. 318-22). Since the evidence for both of these effects is solid, the only plausible explanation is that the rapid improvement in environmental conditions during this century has offset and even exceeded the dysgenic trend. However, this implies that the effect of the dysgenic trend should become increasingly evident at the phenotypic level as improvements in the environmental factors that enhance mental development approach their effective asymptote for the whole population.

Table 12.7 shows the fertility (F) of white and black women within each one standard deviation interval of the total distribution of IQ in each population. (The average fertility estimates include women who have had children and women who have not had any children by age thirty-four.) Assuming a normal distribution (which is closely approximated for IQ within the range of ± 2s), the table also shows: (a) the estimated proportion (P) of the population within each interval, (b) the product of F X P, and (c) the mean IQ of the women within each interval. The average fertility in each of the IQ intervals and the average IQs in those intervals are negatively correlated (-.86 for whites, -.96 for blacks), indicating a dysgenic trend in both populations, though stronger in the black population.

Now, as a way of understanding the importance of Table 12.7, let us suppose that the mean IQ for whites was 100 and the mean IQ for blacks was 85 in the generation preceding that of the present sample of women represented in Table 12.7. Further, suppose that in that preceding generation the level of fertility was the same within each IQ interval. Then their offspring (that is, the present generation) would have an overall mean IQ equal to the weighted mean of the average IQ within each IQ interval (the weights being the proportion, P, of the population falling within each IQ interval). These means would also be 100 and 85 for the white and black populations, respectively.

But now suppose that in the present generation there is negative selection for IQ, with the fertility of the women in each IQ interval exactly as shown in Table 12.7. (This represents the actual condition in 1978 as best as we can determine.)

What then will be the overall mean IQ of the subsequent generation of offspring? The weights that must be used in the calculation are the products of the average fertility (F) in each interval and the proportion (P) of women in each interval (i.e., the values of F X P shown in Table 12.7). The predicted overall weighted mean IQ, then, turns out to be 98.2 for whites and 82.6 for blacks, a drop of 1.8 IQ points and of 2.4 IQ points, respectively. The effect thus increases the W-B IQ difference from 15 IQ points in the parent generation to 15.6 IQ points in the offspring generation -- an increase in the W-B difference of 0.6 IQ points in a single generation. Provided that IQ has substantial heritability within each population, this difference must be partly genetic. Moreover, if blacks have had a greater relative increase in environmental advantages that enhance IQ across the generations than whites have had, the decline of the genetic component of the black mean would be greater than the decline of the white genetic mean, because of environmental masking, as previously explained. We do not know just how many generations this differential dysgenic trend has been in effect, but extrapolated over three or four generations it would have worsening consequences for the comparative proportions in each population that fall above or below IQ 100. (Of course, fertility rates could change in the positive direction, but so far there is no evidence of this.) In the offspring generation of the population samples of women shown in Table 12.7, the percentage of each population above/below IQ 100 would be: whites 43.6%/56.4%, blacks 12.4%/87.6% (assuming no increase in environmental masking between the generations). The W/B ratio above IQ 100 is about 43.6%/12.4% = 3.5; the B/W ratio below IQ 100 is 87.6%/56.4% = 1.55. These ratios, or any approximations of them, would have considerable consequences if, for example, an IQ of 100 is a critical cutoff score for the better-paid types of employment in an increasingly technological and information-intensive economy (see Chapter 14). Because generation time (measured as mother's age at the birth of her first child) is about two years less in blacks than in whites, the dysgenic trend would compound faster over time in the black population than in the white. Therefore, the figures given above probably underestimate any genetic component of the W-B IQ difference attributable to differential fertility.
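
The arithmetic behind these projections is simply a change of weights in a weighted mean. Because Table 12.7 is not reproduced here, the sketch below uses hypothetical fertility values (the proportions P are the usual normal-curve areas for the one-standard-deviation intervals), purely to illustrate the method: with equal fertility across intervals the offspring mean equals the parent mean, whereas fertility that declines with IQ pulls the weighted mean down.

```python
import numpy as np

# Hypothetical illustration of the Table 12.7 computation; these are NOT
# the actual table values. F = mean fertility, P = proportion of women,
# IQ = mean IQ within each one-SD interval of the distribution.
F  = np.array([2.6, 2.4, 2.1, 1.9, 1.7])          # fertility, declining with IQ
P  = np.array([.067, .242, .382, .242, .067])     # normal-curve proportions
IQ = np.array([70.0, 85.0, 100.0, 115.0, 130.0])  # interval mean IQs

parent_mean    = np.sum(P * IQ)                      # equal fertility: weights = P
offspring_mean = np.sum(F * P * IQ) / np.sum(F * P)  # weights = F X P

print(f"parent generation mean IQ:    {parent_mean:.1f}")     # 100.0
print(f"offspring generation mean IQ: {offspring_mean:.1f}")  # about 98.3 here
```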

This prediction follows from recent statistics on fertility rates. A direct test of this effect would require a comparison of the average IQ of women in one generation with the average IQ of all of their children, who constitute the next generation. Such cross-generational IQ data are available from the National Longitudinal Survey of Youth (NLSY). Large numbers of youths, including whites and blacks, originally selected as part of a nationally representative sample of the U.S. population, were followed to maturity. The mean IQ of the women in this group was compared with the mean IQ of their school-age children. Whereas the mean IQ difference between the white and black mothers in the study was 13.2 IQ points, the difference between the white and black children was 17.5 IQ points. That is, the overall mean W-B IQ difference in this sample had increased by about four IQ points in one generation. As there is no indication that the children had been reared in less advantaged environments than their mothers, this effect is most reasonably attributable to the negative correlation between the mothers' IQs and their fertility, which is more marked in the NLSY sample than in the Census sample represented in Table 12.7. Nor have I found any bona fide data set that disconfirms either the existence of a dysgenic trend for IQ in the population as a whole or the widening disparity in the mean W-B IQ difference.

Racial Differences in Neonate Behavior. Although individual differences in infant psychomotor behavior (i.e., reactivity to sensory stimulation, muscular strength, and coordination) have very little, if any, correlation with mental ability measured from about age three years and up (and therefore are not directly relevant to individual or group differences in g), black and white infants, both in Africa and in America, differ markedly in psychomotor behavior even within the first few days and weeks after birth. Black neonates are more precocious in psychomotor development, on average, than whites, who are more precocious in this respect than Asians. This is true even when the black, white, and Asian babies were born in the same hospital to mothers of similar SES background who gave birth under the same obstetrical conditions. Early precocity in motor behavior among blacks also appears to be positively related to degree of African ancestry and is negatively related to their SES. African blacks are more precocious than American blacks, and, at least in the United States, black infants of lower SES are more precocious in motor development than blacks of middle and upper-middle SES. (The same SES relationship is also observed in whites.) These behavioral differences appear so early (e.g., one or two days after delivery, when the neonates are still in hospital and have had little contact with the mothers) that purely cultural or environmental explanations seem unlikely. Substantiated in at least three dozen studies, these findings constitute strong evidence for innate behavioral differences between groups.

Relationship of Myopia to IQ and Race. In Chapter 6 it was noted that myopia (nearsightedness) is positively correlated with IQ and that the relationship appears to be pleiotropic, that is, a gene affecting one of the traits also has some effect on the other trait. Further, there are significant racial and ethnic differences in the frequency of myopia. Among the major racial groups measured, the highest rates of myopia are found in Asians (particularly Chinese and Japanese); the lowest rates among Africans; and Europeans are intermediate. Among Europeans, Jews have the highest rate of myopia, about twice that of gentiles and about on a par with that of the Asians. The same rank ordering of all these groups is found for the central tendency of scores on highly g-loaded tests, even when these groups have had comparable exposure to education. Cultural and environmental factors, except as they may have had an evolutionary impact in the distant past, cannot adequately explain the differences found among contemporary populations. Among populations of the same ethnic background, no relationship has been found between myopia and literacy. Comparisons of groups of the same ethnicity who learned to read before age twelve with those who learned after age twelve showed no difference in rates of myopia.

Table 12.8 shows the results of preinduction examinations of random samples of 1,000 black and 11,000 white draftees for the U.S. Armed Services who were diagnosed as (a) mildly myopic and accepted for service, and (b) too severely myopic to be accepted. As myopia (measured in diopters) is approximately normally distributed in the population, the percentages of whites and blacks diagnosed as myopic can also be expressed in terms of their deviations from the population mean in standard deviation (s) units. These average deviations are shown on the right side of Table 12.8. They indicate the approximate cutoff points (in s units) for the diagnosis of mild and of severe myopia in the total frequency distribution of refractive error (extending from extreme hyperopia, or farsightedness [+3s], through emmetropia, or normal vision [0s], to extreme myopia [-3s]). The last column in Table 12.8 shows the W-B difference in the cutoff point for the diagnosis of myopia, which is 1s both for the mildly and for the severely myopic groups. Unfortunately, mental test scores on these subjects were not reported, but from other studies one would expect the group diagnosed as myopic to score about 0.5s higher than the nonmyopic. Studies in Europe and in the United States have reported differences of about seven to eight IQ points between myopes and nonmyopes.
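
The conversion from a diagnosed percentage to a cutoff point in s units is a matter of inverting the cumulative normal distribution. A minimal sketch (the percentages below are placeholders, not the Table 12.8 values, which are not reproduced here):

```python
from scipy.stats import norm

def cutoff_in_sd_units(percent_beyond: float) -> float:
    """Return z such that percent_beyond % of a normal distribution
    lies beyond it (here, toward the myopic tail)."""
    return norm.isf(percent_beyond / 100.0)

for pct in (10.0, 5.0, 2.0):   # hypothetical percentages diagnosed myopic
    print(f"{pct:4.1f}% diagnosed -> cutoff at {cutoff_in_sd_units(pct):.2f}s")
```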

Because myopia appears to be pleiotropic with IQ, the black-white difference in myopia is consistent with the hypothesis of a genetic component in the racial IQ difference. Further studies would be needed, however, before this becomes a seriously compelling hypothesis. For one thing, the pleiotropy of myopia and IQ is not yet firmly established. Although one study provides fairly strong evidence for it, confirming studies are needed before one can make any inferences in regard to racial differences. More crucial, it is not known whether myopia and IQ are also pleiotropic in the black population; there are no published studies of the correlation between IQ and myopia in blacks. Failure to find such a relationship would nullify the hypothesis.

Other testable hypotheses could also be based on various highly heritable physical traits that are correlated with g (see Chapter 6), some of which show racial differences (e.g., the ability to taste phenylthiocarbamide, color vision, visual acuity, susceptibility to perceptual illusions). But it is first necessary to establish that the correlation of the physical trait with g is pleiotropic within each racial group.

As each specific gene in the human genome related to g is discovered -- a search that is now getting underway -- a determination of these genes' frequencies in different populations may make it possible to estimate the minimum percentage of the between-race variance in g that has a genetic basis. Assuming that the genetic research on quantitative trait loci already underway continues apace, the uncertainty regarding the existence, and perhaps even the magnitude, of genetic group differences in g could be resolved, should we so desire, within the first decade of the next century.

ENVIRONMENTAL CAUSES OF GROUP DIFFERENCES IN g

From the standpoint of research strategy, it is sensible to ask where one can best look for the environmental variables that are the most likely to cause the nongenetic component of the black-white difference in g. The Factor X hypothesis encourages a search for nongenetic factors that are unique to the black-white difference and absent from individual differences among whites or among blacks. The default hypothesis leads us to look at the same kinds of environmental factors that contribute to g variance within each population as causal factors in the g difference between groups.

Among the environmental factors that have been shown to be important within either group, the between-families environmental variance markedly decreases after childhood, becoming virtually nil by late adolescence (see Chapter 7, pp. 179-81). In contrast, the within-family environmental variance remains fairly constant from early childhood to maturity, when it accounts for nearly all of the nongenetic variance and constitutes about 20 percent of the total true-score variance in psychometric g. The macroenvironmental variables responsible for the transient between-families variance in g would therefore seem to be an unlikely source of the observed population difference in g. A more likely source is the microenvironment that produces the within-family variance. The macroenvironment consists of those aspects of interpersonal behavior, values, customs, preferences, and life-style to which children are exposed at home and which clearly differ between families and ethnic groups in American society. The microenvironment consists of a great many small, often random, events that take place in the course of prenatal and postnatal life. Singly they have small effects on mental development, but in the aggregate they may have a large cumulative effect on the individual. These microenvironmental effects probably account for most of the nongenetic variance in IQ that remains after childhood.

This difference in the potency and persistence of the macro- and microenvironments has been consistently demonstrated in environmental enrichment and intervention programs specifically intended to provide underprivileged black children with the kinds of macroenvironmental advantages typically experienced by white middle-class children. They include use of educational toys and picture books, interaction with nurturing adults, attendance in a preschool or cognitively oriented day-care center, early adoption by well-educated white parents, and even extraordinarily intensive cognitive development programs such as the Milwaukee Project and the Abecedarian Project (Chapter 10, pp. 340-44). The effects of these programs on IQ and scholastic performance have generally been short-lived, and it is still debatable whether these improvements in the macroenvironment have actually raised the level of g at all. This is not surprising if we consider that the same class of environmental variables, largely associated with socioeconomic status (SES), has so little, if any, positive effect on g or on IQ beyond childhood within the white population. Recent research has shown that the kinds of macroenvironmental factors typically used to describe differences between white lower-middle class and white upper-middle class child-rearing environments and long thought to affect children's cognitive development actually have surprisingly little effect on IQ beyond childhood. The macroenvironmental variables associated with SES, therefore, seem unlikely sources of the black-white difference in g.

Hypothesizing environmental factors that are not demonstrably correlated with IQ within one or both populations is useless from the standpoint of scientific explanation. Unless an environmental variable can be shown to correlate with IQ, it has no explanatory value. Many environment-IQ correlations reported in the psychological literature, though real and significant, can be disqualified, however, because the relevant studies completely confound the environmental and the genetic causes of IQ variance. Multiple correlations between a host of environmental assessments and children's IQs ranging from below .50 to over .80 have been found for children reared by their biological parents. But nearly all the correlations found in these studies actually have a genetic basis. This is because children have half of their genes in common with each biological parent, and the parents' IQs are highly correlated (usually about .70) with the very environmental variables that supposedly cause the variance in children's mental development. For children reared by adoptive parents, to whom they bear no genetic relationship, these same environmental assessments show little correlation with the children's IQs, and virtually zero correlation by the time the children have reached adolescence. The kinds of environmental variables that show little or no correlation with the IQs of children who were adopted in infancy are therefore not likely to be able to explain IQ differences between subpopulations all living in the same general culture. This is borne out by the study of transracial adoptions (reviewed previously, pp. 472-78).

We can now review briefly the main classes of environmental variables that have been put forth to explain the black-white IQ difference, and evaluate each one in light of the above methodological criteria and the current empirical evidence.

Socioeconomic Status. Measures of SES are typically a composite of occupation, education, income, location of residence, membership in civic or social organizations, and certain amenities in the home (e.g., telephone, TV, phonograph, records, books, newspapers, magazines). Children's SES is that of their parents. For adults, SES is sometimes divided into "attained SES" and "SES of origin" (i.e., the SES of the parents who reared the individual). All of these variables are highly correlated with each other and they share a large general factor in common. Occupation (rank ordered on a scale from unskilled labor to professional and managerial) has the highest loading on this general SES factor.

The population correlations between SES and IQ for children fall in the range .30 to .40; for adults the correlations are .50 to .70, increasing with age as individuals approach their highest occupational level. There has probably been a higher degree of social mobility in the United States than in any other country. The attained SES of between one-third and one-half of the adult population in each generation ends up either above or below their SES of origin. IQ and the level of educational attainment associated with IQ are the best predictors of SES mobility. SES is an effect of IQ rather than a cause. If SES were the cause of IQ, the correlation between adults' IQ and their attained SES would not be markedly higher than the correlation between children's IQ and their parents' SES. Further, the IQs of adolescents adopted in infancy are not correlated with the SES of their adoptive parents. Adults' attained SES (and hence their SES as parents) itself has a large genetic component, so there is a genetic correlation between SES and IQ, and this is so within both the white and the black populations. Consequently, if black and white groups are specially selected so as to be matched or statistically equated on SES, they are thereby also equated to some degree on the genetic component of IQ. Whatever IQ difference remains between the two SES-equated groups, therefore, does not represent a wholly environmental effect. (Because the contrary is so often declared by sociologists, it has been termed the sociologist's fallacy.)

When representative samples of the white and black populations are matched or statistically equated on SES, the mean IQ difference is reduced by about one-third. Not all of this five- or six-point reduction in the mean W-B difference represents an environmental effect, because, as explained above, whites and blacks who are equated on SES are also more alike in the genetic part of IQ than are blacks and whites in general. In every large-scale study in which black and white children were matched within each level of the parents' SES, the children's mean W-B IQ difference increased going from the lowest to the highest level of SES. A statistical corollary of this phenomenon is the general finding that SES has a somewhat lower correlation (by about .10) with children's IQ in the black than in the white population. Both of these phenomena simply reflect the greater effect of IQ regression toward the population mean for black than for white children matched on above-average SES, as previously explained in this chapter (pp. 467-72). The effect shows up not only for IQ but for all highly g-loaded tests that have been examined in this way. For example, when SAT scores were related to the family income levels of the self-selected students taking the SAT for college admission, Asians from the lowest income level scored higher than blacks from the highest, and black students scored more than one standard deviation below white students from the same income level. It is impossible to explain the overall subpopulation differences in g-loaded test performance in terms of racial group differences in the privileges (or their lack) associated with SES and income.

Additional evidence that W-B differences in cognitive abilities are not the same as SES differences is provided by the comparison of the profile of W-B differences with the profile of SES differences on a variety of psychometric tests that measure somewhat different cognitive abilities (in addition to g).

This is illustrated in the three panels of Figure 12.11. The W-B difference in the national standardization sample on each of the thirteen subtests of the Wechsler Intelligence Scale for Children-Revised (WISC-R) is expressed as a point-biserial correlation between age-controlled scale scores and race (quantized as white = 1, black = 0). The upper (solid-line) profile in each panel shows the full correlations of race (i.e., W or B) with the age-scaled subtest scores. The lower (dashed-line) profile in each panel shows the partial correlations, with the Full Scale IQ partialed out. Virtually all of the g factor is removed in the partial correlations, thus showing the profile of W-B differences free of g. The partial correlations (i.e., W-B differences) fall to around zero and differ significantly from zero on only six of the thirteen subtests (indicated by asterisks). The profile points for subtests on which whites outperform blacks are positive; those on which blacks outperform whites are negative (i.e., below zero).
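
The two statistics used in these profiles are the point-biserial correlation (an ordinary Pearson correlation in which one variable is a dichotomy) and the first-order partial correlation. A minimal sketch with synthetic data; the generating model is illustrative only and has no connection to the actual WISC-R standardization sample:

```python
import numpy as np

def partial_corr(x, y, z):
    """Correlation of x with y after z is partialed out of both."""
    rxy = np.corrcoef(x, y)[0, 1]
    rxz = np.corrcoef(x, z)[0, 1]
    ryz = np.corrcoef(y, z)[0, 1]
    return (rxy - rxz * ryz) / np.sqrt((1 - rxz**2) * (1 - ryz**2))

rng = np.random.default_rng(1)
race = rng.integers(0, 2, size=2000)       # quantized: group 1 vs. group 0
g = rng.normal(loc=race, scale=1.0)        # latent g, shifted by group
subtest = 0.7 * g + rng.normal(scale=0.7, size=2000)
fsiq    = 0.9 * g + rng.normal(scale=0.4, size=2000)

r_full    = np.corrcoef(subtest, race)[0, 1]   # point-biserial correlation
r_partial = partial_corr(subtest, race, fsiq)  # Full Scale IQ partialed out

print(f"full correlation with group:     {r_full:+.2f}")
print(f"partial correlation (FSIQ out):  {r_partial:+.2f}")
```

In this toy model the subtest's group difference works entirely through the latent g, so partialing out the Full Scale IQ drives the correlation toward zero -- the pattern the solid and dashed profiles are designed to reveal.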

Whites perform significantly better than blacks on the subtests called Comprehension, Block Design, Object Assembly, and Mazes. The latter three tests are loaded on the spatial visualization factor of the WISC-R. Blacks perform significantly better than whites on Arithmetic and Digit Span. Both of these tests are loaded on the short-term memory factor of the WISC-R. (As the test of arithmetic reasoning is given orally, the subject must remember the key elements of the problem long enough to solve it.) It is noteworthy that Vocabulary is the one test that shows zero W-B difference when g is removed. Along with Information and Similarities, which even show a slight (but nonsignificant) advantage for blacks, these are the subtests most often claimed to be culturally biased against blacks. The same profile differences on the WISC-R were found in another study based on 270 whites and 270 blacks who were perfectly matched on Full Scale IQ.

Panels B and C in Figure 12.11 show the profiles of the full and the partial correlations of the WISC-R subtests with SES, separately for whites and blacks. SES was measured on a five-point scale, which yields a mean W-B difference of 0.67 in standard deviation units. Comparison of the profile for race in Panel A with the profiles for SES in Panels B and C reveals marked differences. The Pearson correlation between profiles serves as an objective measure of their degree of similarity. The profiles of the partial correlations for race and for SES are negatively correlated: -.45 for whites; -.63 for blacks. The SES profiles for whites and for blacks are positively correlated: +.59. While the profile of race X subtest correlations and the profile of SES X subtest correlations are highly dissimilar, the black and white profiles of SES X subtest correlations are fairly similar. Comparable results were found in another study that included racial and SES profiles based on seventy-five cognitive variables measured in a total sample of 70,000 high school students. The authors concluded, "[C]omparable levels of socioeconomic status tend to move profiles toward somewhat greater degrees of similarity, but there are also powerful causal factors that operate differentially for race [black-white] that are not revealed in these data. Degree of [economic] privilege is an inadequate explanation of the differences" (p. 205).

Race and SES Differences in Educational Achievement. Because the specific knowledge content of educational achievement tests is explicitly taught and learned in school, of course, scores on such tests reflect not only the individual's level of g but also the amount and type of schooling, the quality of teaching, and the degree of motivation for scholastic achievement. Nevertheless, tests of educational achievement are quite g-loaded, especially for groups of high school age with comparable years of schooling.

It is informative, therefore, to look at the black-white difference on achievement tests for the two most basic scholastic subjects, reading/verbal skills and mathematics, when a number of SES-related factors have been controlled. Such data were obtained on over 28,000 high school students in two independent large-scale surveys, the National Longitudinal Survey of Youth (NLSY) and the National Education Longitudinal Survey (NELS). In the two studies, the actual W-B mean differences on three tests (Math, Verbal, Reading) ranged from about 0.75s to 1.25s. Regression analyses of the test scores obtained in each study controlled for a number of SES-related factors: family income, mother's education, father's education, age of mother at the birth of the proband, sex, number of siblings, mother single or married, mother working (or not), and region of the country in which the proband lives.

When the effects of these SES factors on test scores were statistically removed by regression, the mean W-B differences in the NLSY were 0.49s for Math and 0.55s for Verbal; in the NELS, 0.59s for Math and 0.51s for Reading. In a multiple-regression analysis predicting the achievement test scores from twenty-four demographic and personal background variables, no other variable among the twenty-four had a larger predictive weight (independently of all the other variables in the regression equation) than the dichotomous W/B variable. Parents' education was the next most strongly predictive variable (independently of race and all other variables), averaging only about half as much predictive weight as the W/B variable. (That most of the predictive power of parental education in these analyses is genetically mediated is inferred from studies of individuals reared by adoptive parents, whose IQs and educational attainments have a near-zero correlation with those of the adoptees; see Chapter 7.) Thus for measures of educational achievement, as for IQ, demographic and SES variables have been shown to account for only a small part of the W-B difference.
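
The regression logic described here can be sketched in a few lines: the coefficient on a white/black dummy variable, estimated jointly with the SES covariates, is the SES-adjusted mean difference. The data and variable names below are hypothetical placeholders, not the NLSY or NELS variables:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 5000
group       = rng.integers(0, 2, n)              # dummy: 1 = group W, 0 = group B
income      = rng.normal(0, 1, n) + 0.5 * group  # SES covariates, correlated
mother_educ = rng.normal(0, 1, n) + 0.4 * group  # with the group dummy
score = 0.5 * group + 0.2 * income + 0.2 * mother_educ + rng.normal(0, 1, n)

# Ordinary least squares: intercept, group dummy, and the SES covariates.
X = np.column_stack([np.ones(n), group, income, mother_educ])
beta, *_ = np.linalg.lstsq(X, score, rcond=None)

raw_diff = score[group == 1].mean() - score[group == 0].mean()
print(f"raw mean difference:            {raw_diff:.2f}")
print(f"SES-adjusted difference (beta): {beta[1]:.2f}")
```

As in the surveys described above, adjusting for the covariates shrinks the group coefficient but, given the generating model here, does not eliminate it.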

The Cumulative Deficit Theory. Cumulative deficit is an empirical phenomenon that, in the 1960s, became the basis of a general theory of how environmental deprivation progressively decreases the IQ and scholastic performance of black children with increasing age, relative to white age norms. The phenomenon itself is more accurately termed "age-related decrement in IQ and achievement," a label that is neutral as regards its nature and cause. The theory of cumulative deficit, its history, and its empirical literature have been reviewed elsewhere. The theory says that environmental and educational disadvantages that cause a failure to learn something at an early age cause further failure at a later age; the resulting performance deficit, which affects IQ and scholastic achievement alike, increases with age at an accelerating rate, accumulating like compound interest. At each stage of learning, the growing deficit of prerequisite knowledge and skills hinders learning at every later stage. This theory of the cause of the shortfall in IQ and achievement of blacks and other poorly achieving groups was a prominent feature of the rationale for the large-scale federal programs, begun in the 1960s, intended to ameliorate these conditions -- interventions such as Head Start, compensatory education, and a host of experimental preschool programs for disadvantaged children.

The raw scores on all mental tests, including tests of scholastic achievement, show an increasing divergence among individuals as they mature, from early childhood to the late teens. In other words, both the mean and the standard deviation of raw scores increase with age. Similarly, the mean W-B difference in raw scores increases with age. This age-related increase in the mean W-B raw-score difference, however, is not what is meant by the term "cumulative deficit." The cumulative deficit effect can only be measured at each age in terms of the standardized scores (i.e., measures in units of the standard deviation) for each age. A significant increase of the mean W-B difference in standardized scores (i.e., in s units) constitutes evidence for cumulative deficit, although this term does not imply the nature of its cause, which has remained purely hypothetical.

The mental test and scholastic achievement data of large-scale studies, such as those from the famous Coleman Report based on 450,000 pupils in 6,000 schools across the nation, failed to find any sign of the cumulative deficit effect for blacks in the nation as a whole. However, suggestive evidence was found for some school districts in the rural South, where the W-B difference on a test of verbal ability increased from 1.5s to 1.7s to 1.9s in Grades 6, 9, and 12, respectively. These findings were only suggestive because they were based entirely on cross-sectional data (i.e., different samples tested at each grade level) rather than longitudinal data (the same sample tested at different grade levels).

Cross-sectional studies of age effects are, moreover, vulnerable to confounding by migratory and demographic changes in the composition of a local population.

Another method, with fewer disadvantages even than a longitudinal study (which can suffer from nonrandom attrition of the study sample), compares the IQs of younger and older siblings attending the same schools. Cumulative deficit would be revealed by consistent IQ differences in favor of younger (Y) over older (O) siblings. This is measured by the signed difference between younger and older siblings (i.e., Y-O) on age-standardized test scores that constitute an equal-interval scale throughout their full range. Averaged over a large number of sibling pairs, the mean Y-O difference represents only an environmental or nongenetic effect, because there is nothing in genetic theory that relates sibling differences to birth order. The expected mean genotypic value of the signed differences between younger and older full siblings is therefore necessarily zero. A phenotypic Y-O difference would indicate the presence of a cumulative IQ deficit with increasing age.
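
In practice the sibling method reduces to two simple tests: whether the mean signed Y-O difference departs from zero, and whether that difference grows with the age gap between siblings. A minimal sketch on synthetic placeholder data generated under the null hypothesis (so both tests should come out flat):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n_pairs = 400
iq_younger = rng.normal(100, 15, n_pairs)  # age-standardized IQs, younger sibs
iq_older   = rng.normal(100, 15, n_pairs)  # age-standardized IQs, older sibs
age_gap    = rng.uniform(1, 6, n_pairs)    # years between siblings

d = iq_younger - iq_older                  # signed Y-O differences
t, p = stats.ttest_1samp(d, 0.0)           # H0: mean Y-O difference is zero
r, p_r = stats.pearsonr(d, age_gap)        # a real deficit should grow with gap

print(f"mean Y-O difference: {d.mean():+.2f} IQ points (t = {t:.2f}, p = {p:.3f})")
print(f"Y-O difference vs. age gap: r = {r:+.2f} (p = {p_r:.3f})")
```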

This method was applied to IQ data obtained from all of the full siblings from kindergarten through grade six in a total of seventeen schools in California that had about 60 percent white and 40 percent black pupils. In general, there was no evidence of a cumulative deficit effect, either for blacks or for whites, with the exception of blacks in the primary grades, who showed the effect only on the verbal part of the IQ test that required some reading skill; the effect was largely attributable to the black males' greater lag in early reading skills compared to the black females; in the early years of schooling, boys in general tend to advance less rapidly in reading than do girls. Blacks showed no cumulative deficit effect at all in nonverbal IQ, and beyond the elementary grades there was no trace of a cumulative deficit in verbal IQ.

Overall, the cumulative deficit hypothesis was not borne out in this California school district, although the mean W-B IQ difference in this school population was greater than 1s. However, the black population in this California study was socioeconomically more advantaged and socially more integrated with the white population than is true for blacks in many other parts of the country, particularly those in the rural South. It is possible that the California black pupils did not show a cumulative deficit in IQ because the vast majority of them had grown up in a reasonably good environment, and that the cumulative deficit phenomenon is manifested only when the blacks' degree of environmental disadvantage falls below some critical threshold for a normal rate of mental growth.

Exactly the same methodology, based on Y-O sibling differences in IQ, was therefore applied in an entire school system of a county in rural Georgia. It perfectly exemplified a generally poor community, especially its black population, which was well below the national black average in SES. Although the school population (49 percent white and 51 percent black) had long since been racially desegregated when the test data were obtained, the blacks' level of scholastic performance was exceedingly low by national standards. The mean W-B IQ difference for the entire school population was 1.95s (white mean 102, SD 16.7; black mean 71, SD 15.1). If cumulative deficit were a genuine phenomenon and not an artifact of uncontrolled demographic variables in previous cross-sectional studies, the sibling methodology should reveal it in this rural Georgia community. One would be hard put to find a more disadvantaged black community, by all indices, anywhere in the United States. This study, therefore, provides a critical test of the cumulative deficit hypothesis.

The rural Georgia study included all of the full siblings of both racial groups from kindergarten through grade twelve. Appropriate forms of the same standardized IQ test (California Test of Mental Maturity) were used at each grade level. An examination of the test's scale properties in this population showed that it measured IQ as an interval scale throughout the full range of IQ at every age in both the black and white groups, that it had equally high reliability for both groups, and that, despite the nearly two-standard-deviation IQ difference between the groups, IQ had an approximately normal distribution within each group.

No cumulative deficit effect could be detected in the white group. The Y-O sibling differences for whites showed no increase with age and they were uncorrelated with the age difference between siblings.

The result for blacks, however, was markedly different. The cumulative deficit effect was manifested at a high level of significance (p < .001). Blacks showed large decrements in IQ with increasing age that were almost linear from five to sixteen years of age, for both verbal and nonverbal IQ. For total IQ, the blacks had an average rate of IQ decrement of 1.42 points per year during their first ten or eleven years in school -- in all, a total decrement of about sixteen IQ points, or about half the total W-B difference of thirty-one IQ points that existed in this population.

It would be difficult to attribute the cause of this result to anything other than the effect of an exceedingly poor environment. A genetic hypothesis of the cumulative deficit effect seems highly unlikely in view of the fact that the effect was not found in blacks in the California study, although the sample size there was large enough to detect even a very small effect size at a high level of statistical significance. Even if the blacks in California had, on average, a larger amount of Caucasian ancestry than blacks in rural Georgia, the cumulative deficit effect should have been evident, even if to a lesser degree, in the California group if genetic factors were involved. Therefore, the cause of the cumulative deficit, at least as observed in this study, is most probably of environmental origin. But the specific nature of the environmental cause remains unknown. The fact that it did not show up in the California sample suggests that a cumulative deficit does not account for any appreciable part of the overall W-B IQ difference of about 1s in nationally representative samples.

The overall W-B IQ difference of 1.95s in the rural Georgia sample would be reduced to about 1s if the decrement attributable to the cumulative deficit effect were removed. What aspects of the environment could cause that large a decrement? It would be worthwhile to apply the sibling method used in these studies in other parts of the country, and in rural, urban or "inner city," and suburban populations of whites and blacks, to determine just how widespread this cumulative deficit effect is in the black population. It is probably the most promising strategy for discovering the specific environmental factors involved in the W-B IQ difference.

The Interaction of Race X Sex X Ability. In 1970, it came to my attention that the level of scholastic achievement was generally higher for black females than for black males. A greater percentage of black females than of black males graduate from high school, enter and succeed in college, pass high-level civil service examinations, and succeed in skilled and professional occupations. A comparable sex difference is not found in the white population. To investigate whether this phenomenon could be attributed to a sex difference in IQ that favored females relative to males in the black population, I proposed the hypothesis I called the race X sex X ability interaction. It posits a sex difference in g (measured as IQ), which is expressed to some extent in all of the "real life" correlates of g. Because of the normal distribution of g for both sexes, selection on criteria that demand levels of cognitive ability well above the average level in the population will be most apt to reveal the hypothesized sex difference in g and all its correlates. Success in passing high-level civil service examinations, in admission to selective colleges, and in high-level occupations all require levels of ability well above the population average. They should therefore show a large difference in the proportions of each sex that can meet these high selection criteria, even when the average sex difference in the population as a whole is relatively small. This hypothesis is shown graphically in Figure 12.12. For example, if the cutoff score on the criterion for selection is at the white mean IQ of 100 (which is shown as 1s above the black mean IQ of 85), and if the black female-male difference (F-M) in IQ is only 0.2s (i.e., three IQ points), the F/M ratio above the cutoff score would be about 1.4 females to 1 male. If the selection cutoff score (X) is placed 2s above the black mean, the F/M ratio would be 1.6 females to 1 male.
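
The F/M ratios cited for Figure 12.12 follow directly from normal-curve arithmetic. The sketch below assumes, as in the text, a 0.2s female-male difference straddling the group mean symmetrically (an assumption of the illustration, not a reported datum); it reproduces ratios of roughly 1.4 and 1.6 at cutoffs of 1s and 2s above the group mean:

```python
from scipy.stats import norm

def fm_ratio_above_cutoff(cutoff_sd: float, fm_gap_sd: float = 0.2) -> float:
    """Ratio of females to males above a cutoff (in SD units from the group
    mean), when the sexes' means differ by fm_gap_sd and straddle the group
    mean symmetrically."""
    p_female = norm.sf(cutoff_sd - fm_gap_sd / 2)  # female mean at +gap/2
    p_male   = norm.sf(cutoff_sd + fm_gap_sd / 2)  # male mean at -gap/2
    return p_female / p_male

for cutoff in (1.0, 2.0):
    print(f"cutoff {cutoff:.0f}s above group mean: "
          f"F/M = {fm_ratio_above_cutoff(cutoff):.2f}")
```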

This hypothesis seemed highly worthy of empirical investigation, because if the sex difference in IQ for the black population were larger than it is for the white population (in which it is presumed to be virtually zero), the sex difference could help identify specific environmental factors in the W-B IQ difference itself. It is well established that the male of every mammalian species is generally more vulnerable to all kinds of environmental stress than is the female. There are higher rates of spontaneous abortion and of stillbirths for male fetuses and also a greater susceptibility to communicable diseases and a higher rate of infant mortality. Males are also psychologically less well buffered against unfavorable environmental influences than are females. Because a higher proportion of blacks than of whites grow up in poor and stressful environmental conditions that would hinder mental development, a sex difference in IQ disfavoring males would be expected to be greater for blacks than for whites.

I tested this race X sex X ability interaction hypothesis on all of the test data I could find on white and black samples that provided test statistics separately for males and females within each racial group. The analyses were based on a collection of various studies which, in all, included seven highly g-loaded tests and a total of more than 20,000 subjects, all of school age and most below age thirteen. With respect to the race X sex interaction, the predicted effect was inconsistent for different tests and in different samples. The overall effect for the combined data showed a mean female-male (F-M) difference for blacks of +0.2s and for whites of +0.1s. Across various tests and samples, the F-M differences for whites and for blacks correlated +.54 (p < .01), indicating that similar factors accounted for the slight sex difference in both races, but had a stronger effect for blacks. With the large sample sizes, even these small sex differences (equivalent to 3 and 1.5 IQ points for blacks and whites, respectively) are statistically significant. But they are too small to explain the quite large differences in cognitively demanding achievements between male and female blacks. Apparently the sex difference in black achievement must be attributed to factors other than g per se. These may be personality or motivational factors, or sexually differential reward systems for achievement in black society, or differential discrimination by the majority culture. Moreover, because the majority of subjects were of elementary school age and because girls mature more rapidly than boys in this age range, some part of the observed sex difference in test scores might be attributable to differing rates of maturation. Add to this the fact that the test data were not systematically gathered so as to be representative of the whole black and white populations of the United States, or even of any particular region, and it is apparent that while this study allows statistical rejection of the null hypothesis, it does so without lending strong support to the race X sex interaction hypothesis.

The demise of the hypothesized race X sex interaction was probably assured by a subsequent large-scale study that examined the national standardization sample of 2,000 subjects on the WISC-R, the 3,371 ninth-grade students in Project TALENT who were given an IQ test, and a sample of 152,944 pupils in grades 5, 8, and 11 in Pennsylvania, who were given a test measuring verbal and mathematical achievement. The subjects' SES was also obtained in all three data sets. In all these data, the only significant (p < .05 with an N of 50,000) evidence of a race X sex X ability interaction was on the verbal achievement test for eleventh graders, and even it is of questionable significance when one considers the total number of statistical tests used in this study. In any case, it is a trifling effect. Moreover, SES did not enter into any significant interaction with race and sex.

Still another large data set used the Vocabulary and Block Design subtests of the WISC-R administered to a carefully selected national probability sample of 7,119 noninstitutionalized children aged six to eleven years. The Vocabulary + Block Design composite has the highest correlation with the WISC-R Full Scale IQ of any pair of subtests, and both Vocabulary and Block Design are highly g loaded. These data also showed no effects that are consistent with the race X sex X ability interaction hypothesis for either Vocabulary or Block Design. Similarly, the massive data of the National Collaborative Perinatal Project, which measured the IQs of more than 20,000 white and black children at ages four and seven years, yielded such a small interaction effect as to make its statistical significance virtually irrelevant.

Although the race X sex interaction hypothesis must now be discarded, it has nevertheless raised an important question about the environmental factors that have biological consequences for mental development as a possible cause of the W-B difference in g.

NONGENETIC BIOLOGICAL FACTORS IN THE W-B DIFFERENCE

The psychological, educational, and social factors that differ between families within racial groups have been found to have little, if any, effect on individual differences in the level of g after childhood. This class of variables, largely associated with socioeconomic differences between families, has similarly little effect on the differing average levels of g between native-born, English-speaking whites and blacks. By late adolescence, the IQs of black and white infants adopted by middle or upper-middle SES white parents are, on average, closer to the mean IQ of their respective populations than to that of either their adoptive parents or their adoptive parents' biological children. Preschool programs such as Head Start and the much more intensive and long-term educational interventions (e.g., the Milwaukee Project and the Abecedarian Project) have been shown to have little effect on g.

It is reasonable, therefore, to look beyond these strictly social and educational variables and to consider the nongenetic, or environmental, factors of a biological nature that may have adverse effects on mental development. These include prenatal variables such as the mother's age, general health, and life-style during pregnancy (e.g., maternal nutrition, smoking, drinking, drug habits), number of previous pregnancies, spacing of pregnancies, blood-type incompatibility (e.g., kernicterus) between mother and fetus, trauma, and history of X-ray exposure. To these can be added the many obstetrical and perinatal variables, including premature birth, low birth weight, duration of labor, forceps delivery, and anoxia at birth. Postnatal factors shown to have adverse effects include neonatal and childhood diseases, head trauma, and malnutrition during the period of maximum growth of the brain (from birth to five years of age). Although each of these biological factors singly may have only a very small average effect on IQ in the population, the cumulative effect of many such adverse microenvironmental factors on any one individual can produce a decrement in g that has significant consequences for that individual's educability. Also, certain variables, though they may have a large negative effect on later IQ for some individuals, occur with such low frequency in the population as to have a negligible effect on the total variance in IQ, either within or between groups.

The largest study of the relationship between these nongenetic factors and IQ is the National Collaborative Perinatal Project conducted by the National Institutes of Health. The study pooled data gathered from twelve metropolitan hospitals located in different regions of the United States. Some 27,000 mothers and their children were studied over a period of several years, starting early in the mother's pregnancy, through the neonatal period, and at frequent intervals thereafter up to age four years (when all of the children were given the Stanford-Binet IQ test). Most of this sample was also tested at age seven years with the Wechsler Intelligence Scale for Children (WISC). About 45 percent of the sample children were white and 55 percent were black. The white sample was slightly below the national average for whites in SES; the black sample was slightly higher in SES than the national black average. The white mothers and black mothers differed 1.02s on a nonverbal IQ test. The mean W-B IQ difference for the children was 0.86s at age four years and 1.01s at age seven years.

A total of 168 variables (in addition to race) were screened. They measured family characteristics, family history, maternal characteristics, prenatal period, labor and delivery, neonatal period, infancy, and childhood. The first point of interest is that 82 of the 168 variables showed highly significant (p < .001) correlations with IQ at age four in the white or in the black sample (or in both). Among these variables, 59 (or 72 percent) were also correlated with race; and among the 33 variables that correlated .10 or more with IQ, 31 (or 94 percent) were correlated with race.

Many of these 168 variables, of course, are correlated with each other and therefore are not all independently related to IQ. However, a multiple regression analysis applied to the set of sixty-five variables for which there was complete data for all the probands in the study reveals the proportion of the total variance in IQ that can be reliably accounted for by all sixty-five variables. The regression analyses were performed separately within groups, both by sex (male-female) and by race (white-black), yielding four separate analyses. The percentage of IQ variance accounted for by the sixty-five independent variables (averaged over the four sex X race groups) was 22.7 percent. This is over one-fifth of total IQ variance.

However, not all of this variance in these sixty-five variables is necessarily environmental. Some of the IQ variance is attributable to regional differences in the populations surveyed, as the total subject sample was distributed over twelve cities in different parts of the country. And some of the variance is attributable to the mother's education and socioeconomic status. (This information was not obtained for fathers.) Mother's education alone accounts for 13 percent of the children's IQ variance, but this is most likely a genetic effect, since adopted children of this age show about the same degree of relationship to their biological mothers with whom they have had no social contact. The proband's score on the Bayley Scale obtained at eight months of age also should not be counted as an environmental variable. This yields four variables in the regression analysis that should not be counted strictly as environmental factors -- region, mother's education, SES, and child's own test score at eight months. With the effects of these variables removed, the remaining sixty-one environmental variables account for 3.4 percent of the variance in children's IQ, averaged over the four race X sex groups. Rather unexpectedly, the proportion of environmental variance in IQ was somewhat greater in the white sample than in the black (4.2 percent vs. 2.6 percent). The most important variable affecting the probands' IQ independently of mother's education and SES in both racial groups was mother's age, which was positively correlated with child's IQ for mothers in the age range of twelve to thirty-six years.

How can we interpret these percentage figures in terms of IQ points? Assuming that the total variance in the population consisted only of the variance contributed by this large set of environmental variables, virtually all of a biological but nongenetic nature, the standard deviation of true-score IQs in the population would be 2.7 IQ points. The average absolute IQ difference between pairs of individuals picked at random from this population would be three IQ points. This is the average effect that the strictly biological environmental variables measured in the Collaborative Project have on IQ. It amounts to about one-fifth of the mean W-B IQ difference.
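
The arithmetic behind these figures is simple enough to verify. A brief sketch, using the conventional IQ standard deviation of 15 and the standard result that the expected absolute difference between two independent draws from a normal distribution with standard deviation s is 2s divided by the square root of pi:

```python
# Converting a variance share into IQ points. Assumes the conventional
# population SD of 15 and the 3.4 percent figure from the regression analyses.
import math

POP_SD = 15.0
env_share = 0.034                                  # 3.4 percent of IQ variance

sd_env = math.sqrt(env_share) * POP_SD             # ~2.8 (the text's 2.7, within rounding)
mean_abs_diff = 2 * sd_env / math.sqrt(math.pi)    # ~3.1, i.e., about three IQ points

print(f"SD attributable to these factors: {sd_env:.1f} IQ points")
print(f"Average difference between random pairs: {mean_abs_diff:.1f} IQ points")
```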

Unfortunately, the authors of the Collaborative Project performed only within-group regression analyses. They did not enter race as an independent variable into the multiple regression analysis, stating explicitly that the independent effect of race was not assessed. A regression analysis in which race, as an independent variable, was entered after all of the nongenetic environmental variables could have shown the independent effect of race on IQ when the effect of the environmental variables was removed. This would have allowed testing of the strict form of the default hypothesis. It posits that the environmental variance between groups is the same as the environmental variance within groups, in which case about three points of the fifteen-point mean W-B IQ difference would be attributable to nongenetic biological environment, assuming that all of these environmental factors worked in a harmful direction for blacks.

There are three reasons to suspect that this study underrepresents the effects of the nongenetic biological environment on the IQ of blacks in the general population.

1. The black sample is somewhat above average in SES compared to the black population as a whole. What today is termed the underclass, which includes some one-fourth to one-third of the total black population, is underrepresented in the study sample; much of the U.S. black population is at or below the zero point on the scale of SES used in this study, as shown in Figure 12.13. The biological factors that adversely affect IQ almost certainly have a higher incidence in this poorest segment of the population, which was underrepresented in the Collaborative Project.

2. The selection of mothers entering the study excluded all women who had not received care in the prenatal clinic from early in their pregnancies. All of the subjects in the study, both black and white, received prenatal care, while many underclass mothers do not receive prenatal care. The Project mothers also received comparable high-quality obstetrical and perinatal treatment, followed up with comparable neonatal and infant medical care provided by the collaborating hospitals. Pregnancies in the underclass are typically without these medical advantages.

3. Certain environmental factors that in recent years have been studied in relation to IQ, such as nutrition, breast feeding, fetal alcohol syndrome, and drug abuse, were not considered in the Collaborative Project conducted three decades ago. The causal role of these factors should be examined, as should the increasing incidence of premature delivery and low birth weight. The latter variables are in fact the strongest correlates of low IQ.

Low Birth Weight (LBW). Infant mortality can be viewed as the extreme point on a continuum of pathology and reproductive casualty. The rate of neonatal and infant mortality in a particular population, therefore, serves as an indicator of other sublethal but nevertheless damaging health conditions, which negatively affect children's mental development. While the infant mortality rate has steadily declined in the population as a whole over the last several decades, it is still about twice as great in the U.S. black population (17.6 per 1,000 live births) as in the white population (8.5 per 1,000). Other minority populations differ only slightly from whites; among the groups with lower SES than the white average (such as Hispanics, American Indians, and Native Alaskans), the infant mortality rate averages about 8.6 per 1,000. Asians have by far the lowest average, about 4.3 per 1,000.

LBW is defined as a birth weight under 2,500 grams (5.5 pounds). It represents a region on the risk continuum of which infant death is the end point. Therefore, the rates of LBW and of infant mortality are highly correlated across different subpopulations. Although premature birth incurs its own risks for the neonate's development, it is not the same as LBW, because a premature baby may have normal weight for its gestational age. LBW also occurs in full-term babies, who are thereby at increased risk for retarded mental development and for other developmental problems, such as behavioral adjustment, learning disabilities, and poor scholastic performance. Throughout the full range of LBW, all of these developmental risks increase as birth weight decreases. For present purposes, it is important to note that a disproportionate number of the babies born to black women are either premature or of LBW. Although black women have about 17 percent of all the babies born in the United States today, they have about 32 percent of the LBW babies.

The mother's age is the strongest correlate of LBW and is probably its chief causal factor. Teenage mothers account for about one-fourth of LBW babies. Even teenage girls under age eighteen who have had proper health care during pregnancy are twice as likely to have premature or LBW babies as women in their twenties. One suggested explanation is that teenage girls are still in their growing period, which causes some of the nutrients essential for normal development to be diverted from the growing fetus to the growing mother. In addition to teenage pregnancy, other significant correlates of LBW are unmarried status, maternal anemia, substance abuse of various kinds, and low educational levels. SES per se accounts for only about 1 percent of the total variance in birth weight, and race (black/white) has a large effect on birth weight independently of SES. Most of the W-B difference in birth weight remains unaccounted for by such variables as SES, poverty status, maternal age, and education. Prenatal medical care, however, has a small effect.

LBW, independently of SES, is related to low maternal IQ. Controlling for IQ reduces the W-B disparity in the percentage of LBW babies by about one-half. But even college-educated black women have higher rates of LBW babies and therefore also higher rates of infant mortality than occur for white women of similar educational background (10.2 per thousand vs. 5.4 per thousand live births). When black babies and white babies, both born to college-educated parents, are statistically equated for birth weight, they have the same mortality rates in the first year of life. In the general population, however, black infants who are not of LBW have a mortality rate almost twice that of white infants.

The cause of the high rate of LBW (and the consequently higher infant mortality rate) in the black population as compared with other racial or ethnic groups, including those that are less advantaged than blacks, remains a mystery. Researchers have been able to account for only about half of the disparity in terms of the combined obvious factors such as poverty, low levels of SES, education, health and prenatal care, and mother's age. The explanations run the gamut from the largely genetic to the purely environmental. Some researchers regard LBW as an inherent, evolved, genetic racial characteristic. Others have hypothesized that black mothers may have subtle health problems that span generations, and some have suggested subtle but stressful effects of racism as a cause.

Since the specific causes of LBW largely remain unidentified while the survival rate of LBW babies has been increasing over the past 20 years, researchers are now focusing on ways to mitigate its risks for developmental disabilities and to enhance the cognitive and behavioral development of LBW babies. The largest program of this kind, conducted with nearly one thousand LBW infants in eight states, provided an experimental treatment highly similar to that of the Abecedarian Project described in Chapter 10 (pp. 342-44). It showed large Stanford-Binet IQ gains (compared against a control group) for LBW children when they were tested at thirty-six months of age. The heavier LBW probands (BW between 2,001 and 2,500 grams) scored an average of 13.2 IQ points above the untreated control group (98.0 vs. 84.8); the lighter probands (<2,000 grams) scored 6.6 IQ points above the controls (91.0 vs. 84.4). Because IQ measured at thirty-six months is typically unstable, follow-up studies are crucial to determine whether these promising IQ gains in the treated group would persist into the school years. The data obtained in the first follow-up, conducted when the children were five years of age, show that the apparent initial gain in IQ had not been maintained; the intervention group scored no higher than the control group. There was a further follow-up at age eight, but its results have not yet been reported.

A study of forty-six LBW black and forty-six LBW white children matched for gestational age and birth weight (all between 1,000 and 2,500 grams and averaging 1,276 grams for blacks and 1,263 grams for whites) showed that when the degree of LBW and other IQ-related background variables were controlled, the W-B IQ difference, even at three years of age, was nearly the same as that found for the general population. None of the LBW children in these selected samples had any chronic illness or neurological abnormality; all were born to mothers over eighteen years of age and had parents who were married. The black mothers and white mothers were matched for educational level. (Black mothers actually had slightly more education than white mothers, although the difference was statistically insignificant, t < 1.) When the children were tested at thirty-three to thirty-four months, the mean Stanford-Binet IQ of the black and the white groups was 90 and 104, respectively, a difference of 1s. In the same study, groups of middle-class black and white children of normal birth weight and gestational age, matched on maternal education, had mean Stanford-Binet IQs of 97 and 111, respectively (a 1.2s difference).

Nutrition. A most remarkable study conducted at Cambridge University showed that the average IQ of preterm, LBW babies was strongly influenced by whether the babies received mother's milk or formula while in hospital. The probands were 300 babies who weighed under 1,850 grams at birth. While in hospital, 107 of the babies received formula, and 193 received mother's milk. The effects of breast feeding per se were ruled out (at least while the babies were in hospital), as all of the babies were fed by tube. At 7.5 to 8 years of age, WISC-R IQs were obtained for all 300 children. Astonishingly, those who had received maternal milk outscored those who had been formula-fed by 10.2 IQ points (103.0 vs. 92.8). The Verbal and Performance scales showed identical effects. After a regression analysis that adjusted for confounding factors (SES, mother's age and education, birth weight, gestational age, birth rank, sex, and number of days in respirator), the difference between the two groups was still a highly significant 8.3 IQ points. Not all of the group who received mother's milk had it exclusively; some received variable proportions of mother's milk and formula. It was therefore possible to perform a critical test of whether the effect was genuinely attributable to the difference between mother's milk and formula or was attributable to some other factor. There was in fact a significant linear dose-response relationship between the amount of mother's milk the babies received and IQ at age 7.5 to 8 years. Whether the milk was from the baby's own mother or from donors, it had a beneficial effect on IQ compared against the formula. The study did not attempt to determine whether mother's milk has a similarly advantageous effect for babies who are full-term and of normal birth weight.
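
The adjustment reported in this study is an ordinary multiple regression. The sketch below imitates the procedure on simulated data; the sample size matches the study's 300 probands and the 8.3-point "true" milk coefficient is borrowed from the study's reported adjusted difference, but every variable and every other coefficient here is an invented illustration, not the study's actual data:

```python
# Illustrative (simulated) version of the confounder-adjusted milk effect.
# All numbers are assumptions chosen only to mimic the reported pattern.
import numpy as np

rng = np.random.default_rng(0)
n = 300                                 # the study's sample size
milk = rng.uniform(0, 1, n)             # proportion of intake that was mother's milk
ses = rng.normal(0, 1, n)               # stand-in confounder (SES, maternal education)
bw = rng.normal(0, 1, n)                # standardized birth weight
iq = 93 + 8.3 * milk + 3.0 * ses + 2.0 * bw + rng.normal(0, 10, n)

# Ordinary least squares with an intercept: IQ ~ milk + SES + birth weight
X = np.column_stack([np.ones(n), milk, ses, bw])
beta, *_ = np.linalg.lstsq(X, iq, rcond=None)
print(f"adjusted milk coefficient: {beta[1]:.1f} IQ points (simulated true value: 8.3)")
```

The positive coefficient on the milk variable, net of the confounders, is the regression analogue of the dose-response relationship described above.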

The results, however, would seem to be highly relevant to the IQ of black children in the contemporary United States for two reasons: (1) as was already pointed out, black infants are much more frequently of LBW than are those of other racial/ethnic groups, and (2) they are much less frequently breast fed. Surveys of the National Center for Health Statistics show that, as of 1987, 61.1 percent of non-Hispanic white babies and 25.3 percent of non-Hispanic black babies were breast fed. Black women who breast feed also end nursing sooner than do white mothers. These data suggest that some part of the average W-B IQ difference may be attributable to the combined effects of a high rate of LBW and a low frequency of breast feeding. Nationwide in the 1940s and 1950s, breast feeding declined markedly to less than 30 percent, as greater numbers of women entered the work force. But since the late 1950s there has been an overall upward trend in the percentage of babies who are breast fed, now exceeding 60 percent.

The practice of breast feeding itself is positively correlated with SES, maternal age and education, and, interestingly, with birth weight. The frequency of breast feeding for LBW babies (<2,500 grams) is only 38.4 percent as against 56.1 percent for babies of normal birth weight (>2,500 grams). But as regards mental development it is probably the LBW babies that stand to benefit the most from mother's milk. Human milk apparently contains factors that affect nervous system development, probably long-chain lipids, hormones, or other nutrients involved in brain growth, that are not present in formulas.

More generally, Eysenck has hypothesized that nutritional deficiencies may be a major nongenetic cause of the W-B IQ difference and that research should be focused on dietary supplements to determine their effect on children's IQ. He is not referring here to the type of malnutrition resulting from low caloric intake and insufficient protein, which is endemic in parts of the Third World but rare in the United States. Rather, he is referring to more or less idiosyncratic deficiencies associated with the wide range of individual differences in the requirements for certain vitamins and minerals essential for optimal brain development and cognitive functions. These individual differences can occur even among full siblings reared together and having the same diet. The dietary deficiency in these cases is not manifested by the gross outward signs of malnutrition seen in some children of Third World countries, but can only be diagnosed by means of blood tests. Dietary deficiencies, mainly in certain minerals and trace elements, occur even in some middle-class white families that enjoy a normally wholesome diet and show no signs of malnutrition. Blood samples were taken from all of the children in such families before certain minerals were added to their diet, and were analyzed later. The analyses revealed that only those children who showed a significant IQ gain (twice the test's standard error of measurement, or nine IQ points) after receiving the supplements for several months had previously shown deficiencies of one or more of the minerals in their blood. The children for whom the dietary supplement resulted in IQ gains were called "responders." The many children who were nonresponders showed little or no blood evidence of a deficiency in the key nutrients. Most interesting from a theoretical standpoint is that the IQ gains showed up on tests of fluid g (Gf), which measures immediate problem-solving ability, but failed to do so on tests of crystallized g (Gc), such as general information and vocabulary, which measure the past learning that had taken place before dietary supplements were begun. Eysenck believes it likely that a much larger percentage of black children than of white children have a deficiency of the nutritional elements that, when supplemented in the diet, produce the observed gain in Gf, which eventually, of course, would also be reflected in Gc through the child's improved learning ability. This promising hypothesis, which has not yet been researched with respect to raising black children's level of g, is well worth studying.

Drug Abuse during Pregnancy. Many drugs can be more damaging to the developing fetus than to an adult, and drug abuse takes a higher toll on the mental development of newborns in the underclass than it does in the general population. Among all drugs, prenatal exposure to alcohol is the most frequent cause of developmental disorders, including varying degrees of mental retardation. Fetal alcohol syndrome (FAS), a severe form of prenatal damage caused by the mother's alcohol intake, is estimated to affect about three per 1,000 live births. The signs of FAS include stunted physical development and characteristic facial features, besides some degree of behavioral impairment -- at school age about half of such children are diagnosed as mentally retarded or as learning disabled. The adverse effect of prenatal exposure to alcohol on the infant's later mental development appears to be a continuous variable; there is no safe threshold of maternal alcohol intake below which there is zero risk to the fetus. Therefore the U.S. Surgeon General has recommended that women not drink at any time during pregnancy. Just how much of the total population variance in IQ might be attributed to prenatal alcohol is not known, but in the underclass segment of the population its effect, combined with other microenvironmental factors that lower IQ, is apt to be considerable.

After alcohol, use of barbiturates, or similar sedative drugs, by pregnant women is the most prevalent source of adverse effects on their children's IQ. Between 1950 and 1970, an estimated twenty-two million children were born in the United States to women who were taking prescribed barbiturates. Many others, without prescription, abused these drugs. Two major studies were conducted in Denmark to determine the effect of phenobarbital, a commonly used barbiturate, on the adult IQ of men whose mothers had used this drug during pregnancy. The men's IQs were compared with the IQs of controls matched on ten background variables that are correlated with IQ, such as proband's age, family SES when the probands were infants, parents' ages, whether the pregnancy was "wanted" or "not wanted," etc. Further control of background variables was achieved statistically by a multiple regression technique. In the first study, IQ was measured by the Wechsler Adult Intelligence Scale (WAIS), an individually administered test; the second study used the Danish Military Draft Board Intelligence Test, a forty-five-minute group test. In both studies the negative effect of prenatal phenobarbital on adult IQ, after controlling for background variables, was considerable. In the authors' words: "The individuals exposed to phenobarbital are not mentally retarded nor did they have any obvious physical abnormalities. Rather, because of their exposure more than 20 years previously, they ultimately test at approximately 0.5 SD or more lower on measured intelligence than otherwise would have been expected." Analysis of various subclasses of the total sample showed that the negative drug exposure effect was greater among those from lower SES backgrounds, those exposed in the third trimester or earlier, and the offspring of unwanted pregnancies.

AD HOC THEORIES OF THE WHITE-BLACK IQ DIFFERENCE

The totality of environmental factors now known to affect IQ within either the white or the black population cannot, taken together, account for more of the variance between the groups than the default hypothesis already allots to nongenetic causes. The total between-populations variance accounted for by empirically demonstrable environmental factors does not exceed 20 to 30 percent. According to the default hypothesis, the remaining variance is attributable to genetic factors. But one can still eschew genetic factors and instead hypothesize a second class of nongenetic factors to explain the observed differences -- factors other than those already taken into account as sources of nongenetic variance within groups. However, exceptionally powerful effects would have to be attributed to these hypothesized nongenetic factors if they are to explain fully the between-groups variance that the default hypothesis posits as genetic.

The explanations so far proposed to account for so large a part of the IQ variance in strictly nongenetic terms involve subtle factors that seem implausible in light of our knowledge of the nature and magnitude of the environmental effects known to influence IQ. Many researchers in the branches of behavioral science related to this issue, as opposed to journalists and commentators, are of the opinion that the W-B difference in IQ involves genetic factors. A questionnaire survey conducted in 1987 solicited the anonymous opinions of 661 experts, most of them in the fields of differential psychology, psychometrics, and behavioral genetics. Here is how they responded to the question: "Which of the following best characterizes your opinion of the heritability of the black-white difference in IQ?"

15% said: The difference is entirely due to environmental variation.

1% said: The difference is entirely due to genetic variation.

45% said: The difference is a product of both genetic and environmental variation.

24% said: The data are insufficient to support any reasonable opinion.

14% said: They did not feel qualified to answer the question.

Those behavioral scientists who attribute the difference entirely to the environment typically hypothesize factors that are unique to the historical experience of blacks in the United States, such as a past history of slavery, minority status, caste status, white racism, social prejudice and discrimination, a lowered level of aspiration resulting from restricted opportunity, peer pressure against "acting white," and the like. The obvious difficulty with these variables is that we lack independent evidence that they have any effect on g or other mental ability factors, although in some cases one can easily imagine how they might adversely affect motivation for certain kinds of achievement. But as yet no mechanism has been identified that causally links them to g or other psychometric factors. There are several other problems with attributing causality to this class of variables:

1. Some of the variables (e.g., a past history of slavery, minority or caste status) do not explain the W-B 1s to 1.5s mean difference on psychometric tests in places where blacks have never been slaves in a nonblack society, or where they have never been a minority population, or where there has not been a color line.

2. These theories are made questionable by the empirical findings for other racial or ethnic groups that historically have experienced as much discrimination as have blacks, in America and other parts of the world, but do not show any deficit in mean IQ. Asians (Chinese, Japanese, East Indian) and Jews, for example, are minorities (some are physically identifiable) in the United States and in other countries, and have often experienced discrimination and even persecution, yet they perform as well as, or better than, the majority population of any of the countries in which they reside on g-loaded tests and in g-loaded occupations. Social discrimination per se obviously does not cause lower levels of g. One might even conclude the opposite, considering the minority subpopulations in the United States and elsewhere that show high g and high g-related achievements, relative to the majority population.

3. The causal variable posited by these theories is unable to explain the detailed empirical findings, such as the large variability in the size of the W-B difference on various kinds of psychometric tests. As noted in Chapter 11, most of this variability is quite well explained by the modified Spearman hypothesis. It states that the size of the W-B difference on various psychometric tests is mainly related to the tests' g loadings, and that the difference is increased if the test is also loaded on a spatial factor and decreased if the test is also loaded on a short-term memory factor (a computational sketch of this prediction follows this list). It is unlikely that broad social variables could cause blacks and whites, in effect, to rank-order the various tests in a battery by their loadings on g and the spatial and memory factors and then to apportion their effort on each test so as to accord with the prediction of the modified Spearman hypothesis. (Even Ph.D. psychologists cannot do this.) Such a possibility is simply out of the question for three-year-olds, whose performance on a battery of diverse tests has been found to accord with Spearman's hypothesis (see Chapter 11, p. 385). It is hard even to imagine a social variable that could cause systematic variation in the size of the W-B difference across different tests that is unrelated to the specific informational or cultural content of the tests, but is consistently related to the tests' g loadings (which can only be determined by performing a factor analysis).

4. Test scores have the same validity for predicting educational and occupational performance for all American-born, English-speaking subpopulations whatever their race or ethnicity. Blacks, on average, do not perform at a higher level educationally or on the job, relative to other groups, than is predicted by g-loaded tests. An additional ad hoc hypothesis is required, namely, that the social variables that depress blacks' test scores must also depress blacks' performance on a host of nonpsychometric variables to a degree predicted by the regression of the nonpsychometric variables on the psychometric variables within the white population. This seems highly improbable. In general, the social variables hypothesized to explain the lower average IQ of blacks would have to simulate consistently all of the effects predicted by the default hypothesis and Spearman's hypothesis. To date, the environmental theories of the W-B IQ difference put forward have been unable to do this. Moreover, it is difficult or impossible to perform an empirical test of their validity.
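
To make point 3 concrete, the modified Spearman hypothesis amounts to a regression of the group differences on three factor loadings. The toy sketch below uses invented loadings and a stylized deterministic relation; only the structure of the computation, not the numbers, reflects the actual literature:

```python
# Toy version of the modified Spearman hypothesis: the W-B difference on each
# test is predicted from its g loading (+), spatial loading (+), and
# short-term memory loading (-). All values below are invented illustrations.
import numpy as np

g_load  = np.array([0.45, 0.55, 0.60, 0.68, 0.72, 0.80])
spatial = np.array([0.10, 0.40, 0.05, 0.50, 0.10, 0.20])
memory  = np.array([0.50, 0.05, 0.40, 0.05, 0.10, 0.05])
# Stylized group differences (in SD units) built to embody the hypothesis:
gap = 1.2 * g_load + 0.3 * spatial - 0.3 * memory

X = np.column_stack([np.ones(len(gap)), g_load, spatial, memory])
beta, *_ = np.linalg.lstsq(X, gap, rcond=None)
print("recovered weights (intercept, g, spatial, memory):", np.round(beta, 2))
# -> [ 0.   1.2  0.3 -0.3]: positive g and spatial weights, negative memory weight.
```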

A theory that seems to have gained favor among some social anthropologists is the idea of "caste status" put forth by the anthropologist John Ogbu. He states the key point of his theory as follows: "The people who have most difficulty with IQ tests and other forms of cognitive tasks are involuntary or nonimmigrant minorities. This difficulty arises because their cultures are not merely different from that of the dominant group but may be in opposition to the latter. Therefore, the tests acquire symbolic meanings for these minorities, which cause additional but as yet unrecognized problems. It is more difficult for them to cross cognitive boundaries."

Ogbu's answer to criticism number 2 (above) is to argue that cultural factors that depress IQ do so only in the case of involuntary or nonimmigrant minorities and their descendants. In the United States this applies only to blacks (who were brought to America involuntarily to be sold as slaves) and Native Americans (who score, on average, intermediate between blacks and whites on tests of fluid g). This theory does not account for the relatively high test scores and achievements of East Indians in Africa, whose ancestors were brought to Africa as indentured laborers during the nineteenth century, but Ogbu could reply that the indentured Indians were not truly involuntary immigrants. American blacks, in Ogbu's theory, have the status of a caste that is determined by birth and from which there is no mobility. Lower-caste status, it is argued, depresses IQ. Ogbu cites the Harijans (untouchables) of India and the Burakumin of Japan as examples. (The Burakumin constitute a small subpopulation of Asian origin that engages in work the Japanese have traditionally considered undesirable, such as tanning leather.) Although it is true that these "lower-caste" groups generally do have lower test scores and perform less well in school than do higher-status groups in India or Japan, the body of psychometric evidence is much smaller than that for American blacks. We know hardly anything regarding the magnitude or psychometric nature of these groups' deficits, or the degree of genetic selection for g in the origins of these caste-like groups in India and Japan.

Ogbu also argues that conventional IQ tests measure only those types of cognitive behavior that are culturally valued by Western middle-class societies, and IQ tests therefore inevitably discriminate against minorities within such societies. But since such tests have equal predictive validity for blacks and whites, this would have to imply that performance on the many practical criteria predicted by the tests is also lowered by involuntary but not voluntary minority status. According to Ogbu, the "Western intelligence" measured by our psychometric tests represents only a narrow set of specialized cognitive abilities and skills. These have been selected on the basis of Western values from the common species pool of capabilities for adaptation to specific environmental circumstances. It logically follows, then, that the g factor and the spatial factor themselves represent specialized Western cognitive skills. The question that Ogbu neither asks nor answers is why this set of Western-selected abilities has not been acquired to the same degree by a population of African descent that has been exposed to a Western society for many generations, while first-generation immigrants and refugees in America who came from the decidedly non-Western Oriental and East Indian cultures soon perform on a par with the dominant population of European descent.

A similar view of racial and ethnic IQ differences has been expressed by the economist Thomas Sowell. He does not offer a formal or explanatory theory, but rather a broad analogy between American blacks and other ethnic and national groups that have settled in the United States at different times in the past. Sowell points out that many immigrant groups performed poorly on tests at one time (usually soon after their arrival in America) and had relatively low educational standing, which limited their employment to low-paying jobs. The somewhat lower test scores of recent immigrants are usually attributable to unfamiliarity with the English language, as evidenced by their relatively superior performance on nonverbal tests. Within a single generation, most immigrant groups (typically those from Europe or Asia) performed on various intellectual criteria at least on a par with the established majority population. Sowell views the American black population as a part of this same general phenomenon and expects that in due course it, too, will rise to the overall national level. Only one generation, he points out, has grown up since the inception of the Civil Rights movement and the end of de jure segregation.

But Sowell's analogy between blacks and other immigrant groups seems strained when one examines the performance of comparatively recent arrivals from Asia. The W-B difference in IQ (as distinguished from educational and socioeconomic performance) has not decreased significantly since World War I, when mental tests were first used on a nationwide scale. On the other hand, the children of certain recent refugee and immigrant groups from Asia, despite their different language and culture, have scored as high as the native white population on nonverbal IQ tests and they often exceed the white average in scholastic performance. Like Ogbu, Sowell does not deal with the detailed pattern of psychometric differences between blacks and whites. He attributes the lower black performance on tests involving abstract reasoning ability to poor motivation, quoting a statement by observers that black soldiers tested during World War I tended to "lapse into inattention and almost into sleep" during abstract tests. Spearman, to the contrary, concluded on the basis of factor analyzing more than 100 varied tests that "abstractness" is one of the distinguishing characteristics of the most highly g-loaded tests.

Recently, a clearly and specifically formulated hypothesis, termed stereotype threat, has been proposed to explain at least some part of the black shortfall on cognitive tests. It should not be classed as a Factor X theory, because specific predictions can be logically derived from the hypothesis and tested empirically. Its authors have done so, with positive, though somewhat limited, results.

Stereotype threat is defined as the perceived risk of confirming, as self-characteristic, a negative stereotype about one's group. The phenomenon has been demonstrated in four independent experiments. Groups of black and white undergraduates at Stanford University took mentally demanding verbal tests under preliminary instructions that were specifically intended to elicit stereotype threat. This was termed the diagnostic condition, since the instructions emphasized that the students' scores (which they would be given) would be true indicators of their verbal ability and of their limitations. Their test performance was statistically compared with that of a control group, for whom the preliminary instructions were specifically intended to minimize stereotype threat by making no reference to ability and telling the subjects that the results were being used only for research on difficult verbal problems. This was termed the nondiagnostic condition. Under both conditions, subjects were asked to do their best. The theoretically predicted outcome is that the difference in test performance between the diagnostic and the nondiagnostic conditions will be greater for blacks than for whites. With the black and white groups statistically equated for SAT scores, the hypothesis was generally borne out in the four studies, although the predicted interaction (race X condition) in two of the experiments failed to reach the conventional 5 percent level of significance.

Standard deviations were not reported for any of the performance measures, so the effect size of the stereotype threat cannot be precisely determined. From the reported analysis of variance, however, I have estimated the effect size to be about 0.3s, on average. Applied to IQ in the general population, this would be equivalent to about five IQ points. Clearly, the stereotype threat hypothesis should be further studied using samples of blacks and whites that are less highly selected for intellectual ability than are the students at Stanford. One wonders if stereotype threat affects the IQ scores even of preschool-age children (at age three), for whom the W-B difference is about 1s. Do children at this age have much awareness of stereotypes?

In fact, the phenomenon of stereotype threat can be explained in terms of a more general construct, test anxiety, which has been studied since the early days of psychometrics. Test anxiety tends to lower performance levels on tests in proportion to the degree of complexity and the amount of mental effort they require of the subject. The relatively greater effect of test anxiety on the black samples (who had somewhat lower SAT scores) than on the white subjects in the Stanford experiments constitutes an example of the Yerkes-Dodson law. It describes the empirically observed nonlinear relationship between three variables: (1) anxiety (or drive) level, (2) task (or test) complexity and difficulty, and (3) level of test performance. According to the Yerkes-Dodson law, maximal test performance occurs at decreasing levels of anxiety as the perceived complexity or difficulty level of the test increases (see Figure 12.14). If, for example, two groups, A and B, have the same level of test anxiety, but group A is higher than group B in the ability measured by the test (so group B finds the test more complex and difficult than does group A), then group B would perform less well than group A. The results of the Stanford studies, therefore, can be explained in terms of the Yerkes-Dodson law, without any need to postulate a racial group difference in susceptibility to stereotype threat or even a difference in the level of test anxiety. The outcome predicted by the Yerkes-Dodson law has been empirically demonstrated in large groups of college students who were either relatively high or relatively low in measured cognitive ability; increased levels of anxiety adversely affected the intelligence test performance of low-ability students (for whom the test was frustratingly difficult) but improved the level of performance of high-ability students (who experienced less difficulty).
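
This account lends itself to a toy model. In the sketch below, the inverted-U functional form and its parameters are purely illustrative assumptions, not empirical curves; the point is only that two groups with identical anxiety levels can sit on opposite sides of their respective performance optima when the test is subjectively harder for one of them:

```python
# Toy Yerkes-Dodson model: performance is an inverted U in anxiety, and the
# optimal anxiety level falls as subjective task difficulty rises. The
# Gaussian form and the width 0.18 are arbitrary illustrative choices.
import math

def performance(anxiety: float, difficulty: float) -> float:
    optimum = 1.0 - difficulty          # harder task -> lower optimal anxiety
    return math.exp(-((anxiety - optimum) ** 2) / 0.18)

# Same anxiety levels, but the test feels harder for group B than for group A.
for anxiety in (0.3, 0.5, 0.7):
    p_a = performance(anxiety, difficulty=0.3)   # group A: higher ability
    p_b = performance(anxiety, difficulty=0.7)   # group B: lower ability
    print(f"anxiety {anxiety:.1f}: A = {p_a:.2f}, B = {p_b:.2f}")
# Rising anxiety (0.3 -> 0.7) lifts A toward its optimum but pushes B past its
# own, so B's performance falls while A's improves -- no group difference in
# susceptibility to stereotype threat is needed to produce the interaction.
```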

This more general formulation of the stereotype threat hypothesis in terms of the Yerkes-Dodson law suggests other experiments for studying the phenomenon by experimentally manipulating the level of test difficulty and by equating the tests' difficulty levels for the white and black groups by matching items for percent passing the item within each group. Groups of blacks and whites should also be matched on true-scores derived from g-loaded tests, since equating the groups statistically by means of linear covariance analysis (as was used in the Stanford studies) does not adequately take account of the nonlinear relationship between anxiety and test performance as a function of difficulty level.

Strong conclusions regarding the stereotype threat hypothesis are unwarranted at present, as the total evidence for it is based on fairly small samples of high-ability university students, with results of marginal statistical significance. Research should be extended to more representative samples of the black and white populations and using standard mental test batteries under normal testing conditions except, of course, for the preliminary instructions needed to manipulate the experimental variable (that is, the inducement of stereotype threat). Further, by conducting the same type of experiment using exclusively white (or black) subjects, divided into lower- and higher-ability groups, it might be shown that the phenomenon attributed to stereotype threat has nothing to do with race as such, but results from the interaction of ability level with test anxiety as a function of test complexity.

In contrast to these various ad hoc hypotheses intended to explain the average W-B population difference in cognitive ability, particularly g, the default hypothesis has the attributes of simplicity, internal coherence, and parsimony of explanation. Further, it does not violate Occam's razor by treating one particular racial population as a special case that is culturally far more different from any other population. The size of the cultural difference that needs to be hypothesized by a purely environmental theory of the W-B difference is far greater than the relatively small genetic difference implied by our evolution from common human ancestors.

The default hypothesis explains differences in g between populations in terms of quantitative variation in the very same genetic and environmental factors that influence the neural substrate of g and cause individual variation within all human populations. This hypothesis is consistent with a preponderance of psychometric, behavior-genetic, and evolutionary lines of evidence. And like true scientific hypotheses generally, it continually invites empirical refutation. It should ultimately be judged on the same basis, so aptly described by the anthropologist Owen Lovejoy, for judging the Darwinian theory of human evolution: "Evolutionary scenarios must be evaluated much in the same way that jury members must judge a prosecutor's narrative. Ultimately they must make their judgment not on the basis of any single fact or observation, but on the totality of the available evidence. Rarely will any single item of evidence prove pivotal in determining whether a prosecutor's scenario or the defense's alternative is most likely to be correct. Many single details may actually fail to favor one scenario over another. The most probable account, instead, is the one which is the most internally consistent -- the one in which all the facts mesh together most neatly with one another and with the motives in the case. Of paramount importance is the economy of explanation. There are always alternative explanations of any single isolated fact. The greater the number of special explanations required in a narrative, however, the less probable its accuracy. An effective scenario almost always has a compelling facility to explain a chain of facts with a minimum of such special explanations. Instead the pieces of the puzzle should fall into place."

Notes:

4. One often hears it said that the genetic differences within racial groups (defined as statistically different breeding populations) are much greater than the differences between racial groups. This is true, however, only if one is comparing the range of individual differences on a given characteristic (or on a number of characteristics) within each population with the range of the differences that exist between the means of each of the separate populations on the given characteristic. In fact, if the differences between the means of the various populations were not larger than the mean difference between individuals within each population, it would be impossible to distinguish different populations statistically. Thinking statistically in terms of the analysis of variance, if we obtained a very large random sample of the world's population and computed the total variance (i.e., the total sum of squares based on individuals) of a given genetic character, we would find that about 85 percent of the total genetic variance exists within the several major racial populations and 15 percent exists between these populations. But when we then divide the sum of squares (SS) between populations by its degrees of freedom to obtain the mean square (MS) and we do the same for the sum of squares within populations, the ratio of the two mean squares, i.e., Between MS/Within MS (known as the variance ratio, or F ratio, named for its inventor, R. A. Fisher), would be an extremely large value and, of course, would be highly significant statistically, thus confirming the population differences as an objective reality.
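
The variance-ratio argument can be made concrete with illustrative numbers, assuming the 85/15 within/between split, five populations, and a hypothetical sample of 10,000 individuals (with the total sum of squares normalized to 1):

```python
# F ratio for the 85/15 within/between split described in this note.
# The values of k (populations) and n (individuals) are illustrative assumptions.
k, n = 5, 10_000
ss_between, ss_within = 0.15, 0.85      # shares of the total sum of squares

ms_between = ss_between / (k - 1)       # mean square between, df = k - 1
ms_within = ss_within / (n - k)         # mean square within,  df = n - k
print(f"F = {ms_between / ms_within:.0f}")   # about 441 -- enormous, hence p ~ 0
```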

5. Among the genetically conditioned physical differences in central tendency, nearly all attributable to natural selection, that exist between various contemporary breeding populations in the world are: pigmentation of skin, hair, and eyes, body size and proportions, endocranial capacity, brain size, cephalic index (100 X head-width/head-length), number of vertebrae and many other skeletal features, bone density, hair form and distribution, size and shape of genitalia and breasts, testosterone level, various facial features, interpupillary distance, visual and auditory acuity, color blindness, myopia (nearsightedness), number and shape of teeth, fissural patterns on the surfaces of teeth, age at eruption of permanent teeth, consistency of ear wax, blood groups, blood pressure, basal metabolic rate, finger and palm prints, number and distribution of sweat glands, galvanic skin resistance, body odor, body temperature, heat and cold tolerance, length of gestation period, male/female birth ratio, frequency of dizygotic twin births, degree of physical maturity at birth, physical maturation rate, rate of development of alpha (brain) waves in infancy, congenital anomalies, milk intolerance (after childhood), chronic and genetic diseases, resistance to infectious diseases. Modern medicine has recognized the importance of racial differences in many physical characteristics and in susceptibilities to various diseases, chronic disorders, birth defects, and the effective dosage for specific drugs. There are textbooks that deal entirely with the implications of racial differences for medical practice. Forensic pathologists also make extensive use of racial characteristics for identifying skeletal remains, body parts, hair, blood stains, etc.

6. Two of the most recent and important studies of genetic distances and human evolution are: (a) Cavalli-Sforza et al., 1994; (b) Nei & Roychoudhury, 1993. Although these major studies measured genetic distances by slightly different (but highly correlated) quantitative methods based on somewhat different selections of genetic polymorphisms, and they did not include all of the same subpopulations, they are in remarkably close agreement on the genetic distances between the several major clusters that form what are conventionally regarded as the world's major racial groups.