Everything Is Correlated


“Why summaries of research on psychological theories are often uninterpretable”, Meehl (also discussed in Cohen’s paper “The Earth is Round (p < 0.05)”):

Problem 6. Crud factor: In the social sciences and arguably in the biological sciences, “everything correlates to some extent with everything else.” This truism, which I have found no competent psychologist disputes given 5 minutes reflection, does not apply to pure experimental studies in which attributes that the subjects bring with them are not the subject of study (except in so far as they appear as a source of error and hence in the denominator of a significance test).[6] There is nothing mysterious about the fact that in psychology and sociology everything correlates with everything. Any measured trait or attribute is some function of a list of partly known and mostly unknown causal factors in the genes and life history of the individual, and both genetic and environmental factors are known from tons of empirical research to be themselves correlated. To take an extreme case, suppose we construe the null hypothesis literally (objecting that we mean by it “almost null” gets ahead of the story, and destroys the rigor of the Fisherian mathematics!) and ask whether we expect males and females in Minnesota to be precisely equal in some arbitrary trait that has individual differences, say, color naming. In the case of color naming we could think of some obvious differences right off, but even if we didn’t know about them, what is the causal situation? If we write a causal equation (which is not the same as a regression equation for pure predictive purposes but which, if we had it, would serve better than the latter) so that the score of an individual male is some function (presumably nonlinear if we knew enough about it but here supposed linear for simplicity) of a rather long set of causal variables of genetic and environmental type $X_1, X_2, \ldots, X_m$. These values are operated upon by regression coefficients $b_1, b_2, \ldots, b_m$.
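Written out, the linear causal equation described here takes roughly the form below. This is a sketch under the stated simplification of linearity; the group superscript $(g)$ is notation added for illustration, and an intercept and error term are omitted for brevity. Because the coefficients are fixed within a group, the group mean of $Y$ is the coefficient-weighted sum of the group means of the $X_i$, so the difference between two groups' means is a sum of $m$ terms, and the two means are exactly equal only if those terms sum to exactly zero.

$$
Y^{(g)} = b_1^{(g)} X_1 + b_2^{(g)} X_2 + \cdots + b_m^{(g)} X_m
\quad\Longrightarrow\quad
\bar{Y}^{(g)} = \sum_{i=1}^{m} b_i^{(g)} \bar{X}_i^{(g)}
$$

$$
\bar{Y}^{(M)} - \bar{Y}^{(F)} = \sum_{i=1}^{m} \left( b_i^{(M)} \bar{X}_i^{(M)} - b_i^{(F)} \bar{X}_i^{(F)} \right)
$$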

…Now we write a similar equation for the class of females. Can anyone suppose that the beta coefficients for the two sexes will be exactly the same? Can anyone imagine that the mean values of all of the Xs will be exactly the same for males and females, even if the culture were not still considerably sexist in child-rearing practices and the like? If the betas are not exactly the same for the two sexes, and the mean values of the Xs are not exactly the same, what kind of Leibnitzian preestablished harmony would we have to imagine in order for the mean color-naming score to come out exactly equal between males and females? It boggles the mind; it simply would never happen. As Einstein said, “the Lord God is subtle, but He is not malicious.” We cannot imagine that nature is out to fool us by this kind of delicate balancing. Anybody familiar with large scale research data takes it as a matter of course that when the N gets big enough she will not be looking for the statistically-significant correlations but rather looking at their patterns, since almost all of them will be significant. In saying this, I am not going counter to what is stated by mathematical statisticians or psychologists with statistical expertise. For example, the standard psychologist’s textbook, the excellent treatment by Hays (1973, page 415), explicitly states that, taken literally, the null hypothesis is always false.
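A small simulation makes both halves of this argument concrete. It is a sketch under purely hypothetical assumptions: 50 causal variables with randomly drawn coefficients and means, with every female coefficient and mean nudged by a tiny random amount. The implied group means still differ. It then assumes a hypothetical true standardized mean difference of 0.02 (unit SD in both groups) and shows the two-sided z-test p-value shrinking as the per-group sample size grows; the sample sizes are arbitrary illustrations, not figures from the source.

```python
import math
import random


def group_mean(betas, x_means):
    """Group mean implied by a linear causal equation: sum of b_i * mean(X_i)."""
    return sum(b * x for b, x in zip(betas, x_means))


def z_for_mean_difference(d, n_per_group):
    """z statistic for a standardized mean difference d (unit SD, equal group sizes)."""
    return d / math.sqrt(2 / n_per_group)


def two_sided_p(z):
    """Two-sided p-value for a standard-normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2))


rng = random.Random(0)
m = 50  # hypothetical number of causal variables X_1 .. X_m

betas_m = [rng.gauss(0.1, 0.05) for _ in range(m)]
xbar_m = [rng.gauss(0.0, 1.0) for _ in range(m)]
# Females: same causal structure, every coefficient and mean nudged very slightly.
betas_f = [b + rng.gauss(0.0, 0.001) for b in betas_m]
xbar_f = [x + rng.gauss(0.0, 0.001) for x in xbar_m]

gap = group_mean(betas_m, xbar_m) - group_mean(betas_f, xbar_f)
print(f"difference in group means: {gap:.6f}")  # small, but not exactly zero

d = 0.02  # hypothetical tiny true effect, in SD units
for n in (50, 500, 5_000, 28_500):
    z = z_for_mean_difference(d, n)
    print(f"n per group = {n:>6}: z = {z:5.2f}, p = {two_sided_p(z):.3f}")
```

Under these assumptions the same 0.02 SD effect gives p of about 0.92 at 50 per group and about 0.017 at 28,500 per group: the effect has not changed, only the N.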

20 years ago David Lykken and I conducted an exploratory study of the crud factor which we never published but I shall summarize it briefly here. (I offer it not as “empirical proof”—that $H_0$ taken literally is quasi-always false hardly needs proof and is generally admitted—but as a punchy and somewhat amusing example of an insufficiently appreciated truth about soft correlational psychology.) In , the University of Minnesota Student Counseling Bureau’s Statewide Testing Program administered a questionnaire to 57,000 high school seniors, the items dealing with family facts, attitudes toward school, vocational and educational plans, leisure time activities, school organizations, etc. We cross-tabulated a total of 15 (and then 45) variables including the following (the number of categories for each variable given in parentheses): father’s occupation (7), father’s education (9), mother’s education (9), number of siblings (10), birth order (only, oldest, youngest, neither), educational plans after high school (3), family attitudes towards college (3), do you like school (3), sex (2), college choice (7), occupational plan in 10 years (20), and religious preference (20). In addition, there were 22 “leisure time activities” such as “acting”, “model building”, “cooking”, etc., which could be treated either as a single 22-category variable or as 22 dichotomous variables. There were also 10 “high school organizations” such as “school subject clubs”, “farm youth groups”, “political clubs”, etc., which also could be treated either as a single ten-category variable or as 10 dichotomous variables. Considering the latter two variables as multichotomies gives a total of 15 variables producing 105 different cross-tabulations. All values of $\chi^2$ for these 105 cross-tabulations were statistically-significant, and 101 (96%) of them were significant with a probability of less than $10^{-6}$.
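Each of these cross-tabulations is a Pearson $\chi^2$ test of independence on a contingency table. Below is a minimal standard-library sketch of the statistic; the example counts are hypothetical, not Lykken's data, and no Yates continuity correction is applied. Turning the statistic into a p-value requires the $\chi^2$ survival function with `dof` degrees of freedom (for example `scipy.stats.chi2.sf`), which is left out to keep the sketch dependency-free.

```python
def chi_square_independence(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table of counts.

    Under independence the expected count for cell (i, j) is
    row_total[i] * col_total[j] / N. No continuity correction is applied.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, dof


# Hypothetical 2 x 2 table: rows = sex, columns = "likes school" yes / no.
small = [[10, 20], [30, 40]]
large = [[c * 100 for c in row] for row in small]  # same proportions, 100x the N

print(chi_square_independence(small))  # chi2 ~ 0.79, dof 1
print(chi_square_independence(large))  # chi2 ~ 79.4, dof 1
```

Identical proportions give a statistic 100 times larger when N is 100 times larger: well under the 5% critical value of about 3.84 for 1 degree of freedom in the first case, far beyond it in the second.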

…If “leisure activity” and “high school organizations” are considered as separate dichotomies, this gives a total of 45 variables and 990 different cross-tabulations. Of these, 92% were statistically-significant and more than 78% were significant with a probability less than $10^{-6}$. Looked at in another way, the median number of significant relationships between a given variable and all the others was 41 out of a possible 44!
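The table counts follow from choosing pairs of variables, $\binom{n}{2}$, and each variable has $n - 1$ possible partners, which is where the 44 comes from. The sketch below also computes how many tables would be expected to reach $p < 10^{-6}$ by chance alone if every null hypothesis were literally true; this expected-count calculation is added here for illustration and is not part of Lykken's analysis.

```python
from math import comb


def expected_chance_hits(n_tests, alpha):
    """Expected number of tests reaching p < alpha if every null hypothesis were true."""
    return n_tests * alpha


for n_vars in (15, 45):
    n_tables = comb(n_vars, 2)  # one cross-tabulation per unordered pair of variables
    partners = n_vars - 1       # number of other variables each one is crossed with
    chance = expected_chance_hits(n_tables, 1e-6)
    print(f"{n_vars} variables: {n_tables} tables, {partners} partners each, "
          f"{chance:.6f} expected chance hits at p < 1e-6")
```

Against roughly 0.001 expected chance hits among 990 tables, the reported share of more than 78% (over 770 tables) cannot plausibly be a run of Type I errors.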

We also computed MCAT scores by category for the following variables: number of siblings, birth order, sex, occupational plan, and religious preference. Highly significant deviations from chance allocation over categories were found for each of these variables. For example, the females score higher than the males; MCAT score steadily and markedly decreases with increasing numbers of siblings; eldest or only children are statistically-significantly brighter than youngest children; there are marked differences in MCAT scores between those who hope to become nurses and those who hope to become nurses aides, or between those planning to be farmers, engineers, teachers, or physicians; and there are substantial MCAT differences among the various religious groups. We also tabulated the 5 principal Protestant religious denominations (Baptist, Episcopal, Lutheran, Methodist, and Presbyterian) against all the other variables, finding highly significant relationships in most instances. For example, only children are nearly twice as likely to be Presbyterian than Baptist in Minnesota, more than half of the Episcopalians “usually like school” but only 45% of Lutherans do, 55% of Presbyterians feel that their grades reflect their abilities as compared to only 47% of Episcopalians, and Episcopalians are more likely to be male whereas Baptists are more likely to be female. 83% of Baptist children said that they enjoyed dancing as compared to 68% of Lutheran children. More than twice the proportion of Episcopalians plan to attend an out of state college than is true for Baptists, Lutherans, or Methodists. The proportion of Methodists who plan to become conservationists is nearly twice that for Baptists, whereas the proportion of Baptists who plan to become receptionists is nearly twice that for Episcopalians.
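Comparisons such as 83% versus 68% of children enjoying dancing are two-proportion comparisons. The passage does not give the size of each denomination's group, so this sketch assumes a hypothetical 1,000 children per group and uses the standard pooled two-proportion z-test; with the study's actual group sizes the z value would differ.

```python
import math


def two_proportion_z(p1, n1, p2, n2):
    """Pooled two-proportion z statistic for H0: both groups share one proportion."""
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se


def two_sided_p(z):
    """Two-sided p-value for a standard-normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2))


# 83% vs 68% from the text; n = 1,000 per group is a hypothetical assumption.
z = two_proportion_z(0.83, 1000, 0.68, 1000)
print(f"z = {z:.2f}, p = {two_sided_p(z):.1e}")
```

Under that assumption z comes out near 7.8, far past any conventional threshold.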

In addition, we tabulated the 4 principal Lutheran Synods (Missouri, ALC, LCA, and Wisconsin) against the other variables, again finding highly significant relationships in most cases. Thus, 5.9% of Wisconsin Synod children have no siblings as compared to only 3.4% of Missouri Synod children. 58% of ALC Lutherans are involved in playing a musical instrument or singing as compared to 67% of Missouri Synod Lutherans. 80% of Missouri Synod Lutherans belong to school or political clubs as compared to only 71% of LCA Lutherans. 49% of ALC Lutherans belong to debate, dramatics, or musical organizations in high school as compared to only 40% of Missouri Synod Lutherans. 36% of LCA Lutherans belong to organized non-school youth groups as compared to only 21% of Wisconsin Synod Lutherans. [Preceding text courtesy of D. T. Lykken.]
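Some of these gaps are small in absolute terms, for example 5.9% versus 3.4% only children. As a rough sketch, assuming equal group sizes and the unpooled standard error, one can solve for the per-group size at which an observed gap of that size would just reach $|z| = 1.96$ (two-sided $p = 0.05$). This is not a power calculation, and the synods' actual sizes are not given in the passage.

```python
import math


def n_per_group_for_z(p1, p2, z=1.96):
    """Smallest equal group size at which the gap p1 - p2 reaches |z| (unpooled SE)."""
    variance_sum = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil(z ** 2 * variance_sum / (p1 - p2) ** 2)


for label, p1, p2 in [
    ("only children, Wisconsin vs Missouri", 0.059, 0.034),
    ("non-school youth groups, LCA vs Wisconsin", 0.36, 0.21),
]:
    print(f"{label}: about {n_per_group_for_z(p1, p2)} children per synod")
```

The first gap needs roughly 540 children per synod and the second fewer than 70, so differences of this size reach significance with subgroups in the hundreds.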

These relationships are not, I repeat, Type I errors. They are facts about the world, and with N = 57,000 they are pretty stable. Some are theoretically easy to explain, others more difficult, others completely baffling. The “easy” ones have multiple explanations, sometimes competing, usually not. Drawing theories from a pot and associating them whimsically with variable pairs would yield an impressive batch of $H_0$-refuting “confirmations.”
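The stability claim can be checked against the standard error of a sample proportion, $\sqrt{p(1-p)/N}$. This sketch uses the worst case $p = 0.5$ and the normal-approximation 95% interval, and it treats the full 57,000 as one simple random sample, which is a simplifying assumption.

```python
import math


def proportion_ci_halfwidth(p, n, z=1.96):
    """Half-width of the normal-approximation 95% confidence interval for a proportion."""
    return z * math.sqrt(p * (1 - p) / n)


for n in (100, 1_000, 57_000):
    print(f"N = {n:>6}: +/- {100 * proportion_ci_halfwidth(0.5, n):.2f} percentage points")
```

At N = 57,000 the interval is about ±0.4 percentage points, versus nearly ±10 at N = 100.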

Another amusing example is the behavior of the items in the 550 items of the MMPI pool with respect to sex. Only 60 items appear on the Mf scale, about the same number that were put into the pool with the hope that they would discriminate femininity. It turned out that over half the items in the scale were not put in the pool for that purpose, and of those that were, a bare majority did the job. Scale derivation was based on item analysis of a small group of criterion cases of male homosexual invert syndrome, a significant difference on a rather small N of Dr. Starke Hathaway’s private patients being then conjoined with the requirement of discriminating between male normals and female normals. When the N becomes very large as in the data published by Swenson, Pearson, and Osborne (An MMPI Source Book: Basic Item, Scale, and Pattern Data on 50,000 Medical Patients. Minneapolis, MN: University of Minnesota Press), approximately 25,000 of each sex tested at the Mayo Clinic over a period of years, it turns out that 507 of the 550 items discriminate the sexes. Thus in a heterogeneous item pool we find only 8% of items failing to show a significant difference on the sex dichotomy. The following are sex-discriminators, the male/female differences ranging from a few percentage points to over 30%:[7]

Sometimes when I am not feeling well I am cross.
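With roughly 25,000 patients of each sex in the Mayo Clinic data, even a one-point difference in endorsement rate is detectable. This sketch assumes exactly 25,000 per group, the unpooled standard error at the worst case of 50% endorsement, and a two-sided 5% threshold; the 507-of-550 count is the only figure taken from the text.

```python
import math


def smallest_significant_gap(n_per_group, p=0.5, z=1.96):
    """Smallest difference in proportions reaching |z| for two equal groups (unpooled SE)."""
    return z * math.sqrt(2 * p * (1 - p) / n_per_group)


gap = smallest_significant_gap(25_000)
share_failing = (550 - 507) / 550  # items that did not discriminate the sexes
print(f"smallest significant gap: {100 * gap:.2f} percentage points")
print(f"items failing to discriminate: {100 * share_failing:.1f}%")
```

Under these assumptions the threshold is under one percentage point, below the “few percentage points” at the low end of the reported range, and 43 of 550 items (about 7.8%) fail to discriminate, which the text rounds to 8%.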
