Data defines the model by dint of genetic programming, producing the best decile table.


Binary Logistic Regression: A Model-free Approach
 Bruce Ratner, Ph.D.

The statistical paradigm for binary (two-outcome) response modeling is this: the data analyst fits the data to the presumed true binary logistic regression model (LRM), whose form (equation) is the sum of weighted predictor variables, expressed on the logit (log-odds) scale. The weights, better known as regression coefficients, are the main appeal of the statistical paradigm, as they provide the key to interpreting what the equation means. The inherent weakness of the paradigm is the well-established LRM variable selection methodology, which identifies the predictor variables for the model. That methodology operates apart from the data analyst's will and ability to construct new variables with potential predictive power (data mining).
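The LRM form described above can be sketched in a few lines. This is only an illustration: the intercept, the weights, and the predictor values are hypothetical numbers chosen for the example, not estimates from any data set.

```python
import math

def lrm_probability(intercept, weights, predictors):
    """Return P(response = 1) under a binary logistic regression model."""
    # The model form: a weighted sum of the predictor variables (the logit).
    logit = intercept + sum(w * x for w, x in zip(weights, predictors))
    # The logistic function maps the logit onto a probability in (0, 1).
    return 1.0 / (1.0 + math.exp(-logit))

# Hypothetical regression coefficients and one hypothetical individual.
intercept = -1.0
weights = [0.8, -0.5]
predictors = [2.0, 1.0]

p = lrm_probability(intercept, weights, predictors)
print(round(p, 3))  # logit = -1.0 + 1.6 - 0.5 = 0.1, so p is about 0.525
```

The interpretability the paradigm prizes comes from the weights: each one is the change in the log-odds of response per unit increase in its predictor, so a weight of 0.8 multiplies the odds by exp(0.8), roughly 2.23.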

The antithetical machine learning (ML) paradigm is this: the data suggest the "true" model form (a computer program), as the ML method automatically data mines for new variables, performs variable selection, and then specifies the model equation without being explicitly programmed. The strengths of the ML paradigm are its flexibility within a nonparametric, assumption-free framework that accommodates big data, and its serviceability as a data mining tool. The weakness of the ML paradigm is the difficulty of interpreting the abstruse computer program, which has surely contributed to the limited use of ML methods.

The purpose of this article is to present the Genetic Logistic Regression (GLR) as an assumption-free, nonparametric model based on the machine learning paradigm of genetic programming. The GLR is model-free in the sense that the data define both the predictor variables and the model equation itself. The GLR, known as the GenIQ Model©, determines the best set of predictor variables based on a simultaneous and virtually unbiased assessment of all variables, an achievement not possible with the statistical binary logistic regression.
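To make the idea of genetic programming concrete, here is a minimal sketch of how a population of candidate equations can be evolved from data. It is not the GenIQ Model's algorithm, which this article does not describe. Everything in it is an assumption made for illustration: the three arithmetic operators, the mutation-only search (no crossover), the fitness measure (responders captured in the top 20% of scored records, a crude stand-in for the top of a decile table), and the synthetic data.

```python
import random

OPS = {"add": lambda a, b: a + b,
       "sub": lambda a, b: a - b,
       "mul": lambda a, b: a * b}

def random_tree(rng, n_vars, depth):
    """Grow a random expression tree over the predictor variables."""
    if depth == 0 or rng.random() < 0.3:
        return ("var", rng.randrange(n_vars))
    return (rng.choice(sorted(OPS)),
            random_tree(rng, n_vars, depth - 1),
            random_tree(rng, n_vars, depth - 1))

def evaluate(tree, row):
    """Score one record with the expression tree."""
    if tree[0] == "var":
        return row[tree[1]]
    return OPS[tree[0]](evaluate(tree[1], row), evaluate(tree[2], row))

def mutate(rng, tree, n_vars):
    """Replace one randomly chosen subtree with a freshly grown one."""
    if tree[0] == "var" or rng.random() < 0.3:
        return random_tree(rng, n_vars, 2)
    if rng.random() < 0.5:
        return (tree[0], mutate(rng, tree[1], n_vars), tree[2])
    return (tree[0], tree[1], mutate(rng, tree[2], n_vars))

def fitness(tree, rows, responses, top_fraction=0.2):
    """Count responders among the top-scoring fraction of records."""
    ranked = sorted(zip((evaluate(tree, r) for r in rows), responses),
                    key=lambda pair: pair[0], reverse=True)
    cutoff = max(1, int(len(ranked) * top_fraction))
    return sum(resp for _, resp in ranked[:cutoff])

def evolve(rows, responses, generations=20, pop_size=30, seed=0):
    """Evolve expression trees; the fitter half survives each generation."""
    rng = random.Random(seed)
    n_vars = len(rows[0])
    # Start from every raw variable plus randomly constructed variables.
    population = [("var", i) for i in range(n_vars)]
    population += [random_tree(rng, n_vars, 3)
                   for _ in range(pop_size - n_vars)]
    for _ in range(generations):
        population.sort(key=lambda t: fitness(t, rows, responses),
                        reverse=True)
        survivors = population[: pop_size // 2]
        children = [mutate(rng, rng.choice(survivors), n_vars)
                    for _ in range(pop_size - len(survivors))]
        population = survivors + children
    return max(population, key=lambda t: fitness(t, rows, responses))

# Synthetic data: response depends on an interaction of two of three variables.
data_rng = random.Random(1)
rows = [[data_rng.random() for _ in range(3)] for _ in range(100)]
responses = [1 if r[0] * r[1] > 0.4 else 0 for r in rows]

best = evolve(rows, responses)
print(best, fitness(best, rows, responses))
```

Because the fitter half of each generation always survives, the evolved equation can never score worse than the best single raw variable it started with. The nested-tuple output also hints at the interpretability problem noted earlier: the "model" is a program, not a list of coefficients.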

For more information about this article, call Bruce Ratner at 516.791.3544 or 1 800 DM STAT-1; or e-mail at br@dmstat1.com.