


Karl Pearson: Everybody Knows His Correlation Coefficient, but Not How “Close” the Binomial Distribution Is to a Normal Distribution
Bruce Ratner, Ph.D.

Karl Pearson was born 150 years ago on March 27, 1857; he died April 27, 1936. He made many important contributions to statistics, but he is usually remembered for two pathbreaking achievements: his “product-moment” estimate of the correlation coefficient (dating from 1896) and the chi-square test (introduced in 1900). On the 150th anniversary of his birth, however, it is worth recalling a small but striking discovery he made in 1895 that is virtually unknown today.
Everybody knows that the binomial distribution is “like” a normal distribution if the number of independent trials (n) is large and the probability of success (p) on a single trial is not too near 0 or 1. Everybody knows this because Abraham De Moivre proved it to be true in 1733. Accordingly, every student knows that, for most practical purposes, if you want to calculate the probability that a binomial count X will fall between two limits, P[a ≤ X ≤ b], you would be foolish to do anything other than use a normal approximation.
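To make the approximation concrete, here is a minimal sketch that compares the exact binomial probability P[a ≤ X ≤ b] with its normal approximation. The half-unit continuity correction and the example values (n = 100, p = 0.5, a = 45, b = 55) are assumptions chosen for illustration; they are not taken from the article.

```python
import math


def binomial_interval_prob(n, p, a, b):
    """Exact P[a <= X <= b] for X ~ Binomial(n, p), summed term by term."""
    return sum(
        math.comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(a, b + 1)
    )


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def normal_approx_interval_prob(n, p, a, b):
    """Normal approximation to P[a <= X <= b].

    Uses mean n*p and variance n*p*(1-p), with a half-unit continuity
    correction on each limit (an assumption of this sketch).
    """
    mu = n * p
    sigma = math.sqrt(n * p * (1 - p))
    upper = normal_cdf((b + 0.5 - mu) / sigma)
    lower = normal_cdf((a - 0.5 - mu) / sigma)
    return upper - lower


# Illustrative values only.
n, p, a, b = 100, 0.5, 45, 55
exact = binomial_interval_prob(n, p, a, b)
approx = normal_approx_interval_prob(n, p, a, b)
print(f"exact  = {exact:.6f}")
print(f"approx = {approx:.6f}")
print(f"error  = {abs(exact - approx):.6f}")
```

Rerunning the sketch with a small n, or with p close to 0 or 1, shows the error growing, which is the caveat attached to De Moivre's result.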
Because we are always reminded that this is an approximation, there is a nagging doubt as to how close the binomial and normal distributions really are. Pearson discovered something quite remarkable: there is a case where the agreement is much, much closer than anyone would have expected. In fact, if one particular definition of “agreement” is adopted, and if p = ½, the “agreement” is actually exact for all n (even for n = 1), provided one minor fudge factor is allowed (using n + 1 in place of n). Thus, Pearson discovered in 1895 that the normal and symmetric binomial distributions are more similar than De Moivre realized. I suspect most modern statisticians are equally unaware of this surprising identity, but now you know. (This abstract draws from Stigler, S., (2008) “Remembering Karl Pearson After 150 Years,” STATS, Issue 49, 34.)
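The abstract does not spell out which definition of “agreement” is meant. One reading, offered here as an assumption rather than as Pearson's own wording, compares two ratios: for the symmetric binomial, the slope of the line joining the ordinates at k and k + 1 divided by the mean of those two ordinates; for the normal curve, the derivative of the density divided by the density, which equals -(x - mean)/variance. Under that reading, the two ratios coincide at every midpoint x = k + ½ when the normal variance is taken as (n + 1)/4 in place of n/4. The sketch below checks this with exact rational arithmetic; the function names are hypothetical.

```python
from fractions import Fraction
from math import comb


def binomial_slope_ratio(n, k):
    """Slope between the Binomial(n, 1/2) ordinates at k and k + 1,
    divided by the mean of those two ordinates."""
    p_k = Fraction(comb(n, k), 2**n)
    p_k1 = Fraction(comb(n, k + 1), 2**n)
    return (p_k1 - p_k) / ((p_k1 + p_k) / 2)


def normal_slope_ratio(x, mean, variance):
    """f'(x) / f(x) for a normal density, which simplifies to -(x - mean) / variance."""
    return -(x - mean) / variance


def identity_holds(n):
    """Check the assumed identity at every midpoint k + 1/2, k = 0..n-1,
    using variance (n + 1)/4 in place of the usual n/4."""
    mean = Fraction(n, 2)
    variance = Fraction(n + 1, 4)
    return all(
        binomial_slope_ratio(n, k)
        == normal_slope_ratio(Fraction(2 * k + 1, 2), mean, variance)
        for k in range(n)
    )


# Exact equality for every n tried, including n = 1.
print(all(identity_holds(n) for n in range(1, 51)))
```

Because the comparison uses `Fraction`, equality here is exact rather than within floating-point tolerance. Swapping the variance back to n/4 makes the check fail, which is where the “fudge factor” enters.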
Related Articles
1. The Correlation Coefficient: Definition
2. Genetic Data Mining Method for the Proper Use of the Correlation Coefficient
3. Calculating the Average Correlation Coefficient
4. The Correlation Coefficient: Its Values Range Between Plus/Minus 1, or Do They?
5. Variable Selection Methods in Regression: Many Statisticians Know Them, But Few Know They Produce Poorly Performing Models
6. Different Data, Identical Regression Models: Which Model is Better?
7. A Trilogy of “Item” Biographies of Our Favorite Statisticians
8. Statistical Terms: Who Coined Them, and When?

For more information about this article, call Bruce Ratner at 516.791.3544 or 1 800 DM STAT1; or email at br@dmstat1.com. 

