Data defines the model by dint of genetic programming, producing the best decile table.


Predictor Variable Importance:
Multicollinearity is Not a Problem for a Genetic Regression Model

Bruce Ratner, Ph.D.

The inevitable question upon completing a regression model is: Which are the important predictor variables, in rank order? The purpose of this article is to present an unparalleled feature of the nonstatistical, genetic regression GenIQ Model© that answers the question by providing a virtually unbiased ranking of the relationship between each predictor variable and the target variable, accounting for all predictor variables jointly considered. Multicollinearity refers to any linear (perfect or nearly perfect) relationship among two or more predictor variables in a regression model. Because the GenIQ Model is a nonstatistical model, it has no need to measure the interrelationships among the predictor variables. Therefore, multicollinearity is not a problem for the genetic regression model. In stark contrast, the statistical regression model cannot provide such an unbiased ranking because of the effects of multicollinearity. Multicollinearity is inevitably present, even in the best of regression models. Even at its most modest degree, the estimate (regression coefficient) of a predictor variable's importance (effect) on the target variable, controlling for the other predictor variables, tends to be less precise than if the predictor variables were uncorrelated with one another [1]. I use GenIQ in several examples that compare the logistic and ordinary regression coefficients with the genetic variable importance ranking.
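The loss of precision that multicollinearity causes in a statistical regression model can be illustrated with the standard variance inflation factor (VIF). The sketch below is not part of the GenIQ Model; it assumes an ordinary regression with exactly two predictor variables, for which the VIF reduces to 1 / (1 - r^2), where r is the correlation between the two predictors. The function names and the toy data are hypothetical and chosen only for illustration.

```python
import math


def pearson_r(x, y):
    """Sample Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def variance_inflation_factor(r):
    """VIF for a regression with exactly two predictors correlated at r.

    A VIF of 1 means the coefficient variance is not inflated; larger
    values mean the coefficient estimate is correspondingly less precise.
    """
    if abs(r) >= 1.0:
        raise ValueError("perfect collinearity: the VIF is undefined")
    return 1.0 / (1.0 - r * r)


# Hypothetical toy predictors (illustration only, not data from the article).
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x2_uncorrelated = [3.0, 1.0, 2.0, 2.0, 1.0, 3.0]  # zero correlation with x1
x2_collinear = [1.1, 1.9, 3.2, 3.8, 5.1, 5.9]     # nearly a copy of x1

vif_uncorrelated = variance_inflation_factor(pearson_r(x1, x2_uncorrelated))
vif_collinear = variance_inflation_factor(pearson_r(x1, x2_collinear))

print(f"VIF, uncorrelated predictors: {vif_uncorrelated:.2f}")
print(f"VIF, nearly collinear predictors: {vif_collinear:.2f}")
```

With uncorrelated predictors the VIF equals 1, so the coefficient variance is not inflated. As the correlation between the predictors approaches 1 in absolute value, the VIF grows without bound, which is the imprecision in regression coefficients described above.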

For more information about this article, call Bruce Ratner at 516.791.3544 or 1 800 DM STAT-1; or e-mail at br@dmstat1.com.