

Confusion Matrix: Perhaps Confusing, but Definitely Biased
Bruce Ratner, Ph.D.

The traditional statistical paradigm for building a binary classification model is this: the data analyst fits the data to the logistic regression model, whose equation is a weighted sum of the predictor variables that are declared statistically significant. The weights (better known as regression coefficients) are the main appeal of the statistical paradigm, as they provide the key to interpreting what the equation means. The information needed to assess the goodness of a classification model resides in the confusion matrix, whose construction is part of the traditional three-step approach: 1) construction of the 2x2 table of actual versus predicted outcomes, which is the confusion matrix itself; 2) calculation of the six standard terms based on the confusion matrix; and 3) rote understanding and insightful interpretation of the six terms. The third step is what attaches the modifier "confusion" to the term "matrix." This article focuses on the database-marketing logistic regression model; accordingly, I use the binary (dichotomous) target variable Response, which assumes the values 0 and 1. (The treatment of this topic can easily be extended to a polychotomous (multinomial) target variable.)
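As a rough illustration of the first two steps, the sketch below cross-tabulates actual Response against predicted Response and derives a few summary rates. It is a minimal sketch under stated assumptions, not the method of the article: the function names, the 0.5 cutoff, the toy scores, and the choice of rates are all hypothetical, and the rates shown are merely common ones that may or may not coincide with the six standard terms referred to above.

```python
from collections import Counter


def confusion_matrix_2x2(actual, predicted_prob, cutoff=0.5):
    """Cross-tabulate actual Response (0/1) against predicted Response.

    The cutoff of 0.5 is an assumption made for illustration only.
    """
    counts = Counter()
    for y, p in zip(actual, predicted_prob):
        y_hat = 1 if p >= cutoff else 0  # predicted Response
        counts[(y, y_hat)] += 1
    return {
        "TN": counts[(0, 0)],  # actual 0, predicted 0
        "FP": counts[(0, 1)],  # actual 0, predicted 1
        "FN": counts[(1, 0)],  # actual 1, predicted 0
        "TP": counts[(1, 1)],  # actual 1, predicted 1
    }


def summary_rates(m):
    """A few commonly used rates derived from the four cells (illustrative)."""
    def ratio(num, den):
        return num / den if den else float("nan")

    total = m["TN"] + m["FP"] + m["FN"] + m["TP"]
    return {
        "accuracy": ratio(m["TP"] + m["TN"], total),
        "sensitivity": ratio(m["TP"], m["TP"] + m["FN"]),
        "specificity": ratio(m["TN"], m["TN"] + m["FP"]),
        "precision": ratio(m["TP"], m["TP"] + m["FP"]),
    }


# Toy data (made up): actual Response and model scores for eight individuals.
actual = [1, 0, 0, 1, 0, 1, 0, 0]
scores = [0.81, 0.10, 0.62, 0.45, 0.30, 0.77, 0.05, 0.55]

matrix = confusion_matrix_2x2(actual, scores)
rates = summary_rates(matrix)
print(matrix)
print(rates)
```

Note that the scores here would typically come from the same freshly built model that the matrix is meant to assess, which is the setting in which the article's concern about bias arises.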

Regarding the first step, in which the matrix entries are taken directly from the freshly built classification model, the literature makes no mention of bias creeping into the formation of the matrix. The purpose of this article is to reveal this (creepy) bias as a serious weakness of the traditional confusion matrix, and to show its effect on the effectiveness of the model’s classification predictive ability. I also present a new first step that creates an enriched, honest, less-biased matrix, which potentially provides a better tabular display for assessing the effectiveness of the model's predictive ability, whether the model is a logistic regression model or any other classification system.

If you would like a prompt-and-concise (not quick-and-dirty) overview of the confusion matrix bias, please email me.

For more information about this article, call me at 516.791.3544, or e-mail, br@dmstat1.com.
My publisher owns the copyright to the article that this abstract describes. The article will appear in my forthcoming book.
My publisher has granted me permission to discuss the article's content orally, but not to provide an outline, a draft, or a proof-ready copy of the article.
