Data defines the model by dint of genetic programming, producing the best decile table.


What is the GenIQ Model?
Bruce Ratner, Ph.D.

The data analyst typically approaches a problem directly with an "inflexible" procedure designed specifically for that purpose. For example, the statistical problem of predicting a continuous target variable (e.g., sales or profit) is solved by the classical ordinary least squares (OLS) regression model. This is in stark contrast to the newer machine learning approach, a "flexible," nonparametric, assumption-free procedure that lets the data define the form of the model itself. The working assumption that today’s (big) data fit the OLS model – which was formulated within the small-data setting of the day over 200 years ago – is not tenable. A flexible, any-size data model that is self-defining offers the potential for building a reliable, highly predictive model, which was unimaginable two centuries ago. The statistical problem of classifying a binary target variable (e.g., yes-no; 1-0) is solved by the logistic regression model (circa 1944). The commentary on the OLS regression model applies equally to the logistic regression model.

The GenIQ Model© is a machine learning alternative to the statistical ordinary least squares and logistic regression models. GenIQ “lets the data define the model” – automatically 1) data mining for new variables, 2) performing variable selection, and 3) specifying the model – so as to "optimize the decile table," i.e., to fill the upper deciles with as much profit or as many responses as possible. GenIQ’s fitness function is the decile table, which is maximized by the Darwinian-inspired machine learning paradigm of genetic programming (GP). Operationally, optimizing the decile table means creating the best possible descending ranking of the target variable (outcome) values. Thus, GenIQ’s prediction is a ranking of individuals from most likely to least likely to respond (for a binary outcome), or from largest to smallest profit contribution (for a continuous outcome). Put differently, GenIQ seeks to maximize cum lift, a measure of how well a model identifies the top-performing individuals, often displayed in a decile table.
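To make the decile table and cum lift concrete, here is a minimal Python sketch. It is not GenIQ code: the function name `decile_table`, its parameters, and the toy data are assumptions made for illustration. It ranks individuals by descending model score, cuts the ranking into ten equal groups, and reports cum lift as 100 times the cumulative mean outcome divided by the overall mean outcome.

```python
def decile_table(scores, outcomes, n_bins=10):
    """Rank by descending score, cut into n_bins groups, report cum lift.

    Hypothetical helper for illustration only; not part of GenIQ.
    """
    n = len(scores)
    if n != len(outcomes) or n < n_bins:
        raise ValueError("need equal-length inputs with at least n_bins rows")
    overall_mean = sum(outcomes) / n
    if overall_mean == 0:
        raise ValueError("cum lift is undefined when the overall mean is zero")

    # Outcomes reordered so the best-scored individuals come first.
    ranked = [y for _, y in sorted(zip(scores, outcomes),
                                   key=lambda pair: pair[0], reverse=True)]

    rows, cum_n, cum_sum = [], 0, 0.0
    for d in range(n_bins):
        group = ranked[d * n // n_bins:(d + 1) * n // n_bins]
        cum_n += len(group)
        cum_sum += sum(group)
        cum_mean = cum_sum / cum_n
        rows.append({
            "decile": d + 1,
            "count": len(group),
            "total": sum(group),
            "cum_mean": cum_mean,
            # 100 means "no better than picking individuals at random".
            "cum_lift": 100 * cum_mean / overall_mean,
        })
    return rows


# Toy data (made up): 20 individuals, 5 responders, scores already descending.
scores = [0.05 * k for k in range(20, 0, -1)]
outcomes = [1, 1, 1, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
table = decile_table(scores, outcomes)
for row in table:
    print(row["decile"], row["total"], round(row["cum_lift"], 1))
```

In this toy example the top decile holds two responders out of two individuals, against an overall response rate of 25%, so its cum lift is 400. The bottom decile's cum lift is always 100, because by then the whole file has been accumulated. A model that produces a better ranking pushes more of the responses (or profit) into the upper rows.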

This article is a brief introduction to the GenIQ Model, a flexible, any-size data method (with unique scalability) that lets the data, exclusive of anything else, define the model. Specifically, the GenIQ Model automatically and simultaneously performs a trinity of analysis and modeling techniques: 1) selecting important original variables, 2) finding patterns (data mining) within the data by constructing new important variables from the original variables, and 3) formulating a mathematical equation (model) based on the best set of original and constructed variables. GenIQ is based on the machine learning genetic paradigm inspired by Darwin’s Principle of Survival of the Fittest. It offers a clear advantage over current statistical methods, whose performance depends on theoretical assumptions, predefined model formulations, and data-type restrictions. Moreover, GenIQ offers both a time advantage and an intelligence advantage over regression-based methods, as the latter require human intervention to perform the trinity of techniques.
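The following toy sketch illustrates the general idea of genetic programming described above. It is emphatically not the GenIQ algorithm, whose internals are not given here. Everything in it is an assumption chosen for illustration: the function set (add, subtract, multiply), the simplified fitness (share of all responses captured in the top decile, rather than the full decile table), the population settings, and the synthetic data, in which response depends on the product of the first two variables and the third is irrelevant.

```python
import operator
import random

# Function set and depth cap are illustrative assumptions, not GenIQ settings.
SYMBOL = {operator.add: "+", operator.sub: "-", operator.mul: "*"}
OPS = tuple(SYMBOL)
MAX_DEPTH = 5

def random_tree(n_vars, depth, rng):
    """Grow a random expression over the original variables and constants."""
    if depth == 0 or rng.random() < 0.3:
        if rng.random() < 0.8:
            return ("var", rng.randrange(n_vars))
        return ("const", rng.uniform(-1.0, 1.0))
    return ("op", rng.choice(OPS),
            random_tree(n_vars, depth - 1, rng),
            random_tree(n_vars, depth - 1, rng))

def tree_depth(tree):
    if tree[0] != "op":
        return 0
    return 1 + max(tree_depth(tree[2]), tree_depth(tree[3]))

def evaluate(tree, row):
    """Score one individual (a list of variable values) with the tree."""
    if tree[0] == "var":
        return row[tree[1]]
    if tree[0] == "const":
        return tree[1]
    return tree[1](evaluate(tree[2], row), evaluate(tree[3], row))

def to_text(tree):
    """Render the tree as a readable equation."""
    if tree[0] == "var":
        return f"x{tree[1]}"
    if tree[0] == "const":
        return f"{tree[1]:.2f}"
    return f"({to_text(tree[2])} {SYMBOL[tree[1]]} {to_text(tree[3])})"

def fitness(tree, X, y):
    """Share of the total outcome in the top decile (assumes sum(y) > 0)."""
    scores = [evaluate(tree, row) for row in X]
    order = sorted(range(len(y)), key=lambda i: scores[i], reverse=True)
    top = order[: max(1, len(y) // 10)]
    return sum(y[i] for i in top) / sum(y)

def random_subtree(tree, rng):
    while tree[0] == "op" and rng.random() < 0.7:
        tree = rng.choice((tree[2], tree[3]))
    return tree

def replace_random(tree, make_new, rng):
    """Return a copy of tree with one randomly chosen subtree replaced."""
    if tree[0] != "op" or rng.random() < 0.3:
        return make_new()
    if rng.random() < 0.5:
        return ("op", tree[1], replace_random(tree[2], make_new, rng), tree[3])
    return ("op", tree[1], tree[2], replace_random(tree[3], make_new, rng))

def evolve(X, y, pop_size=30, generations=15, seed=0):
    rng = random.Random(seed)
    n_vars = len(X[0])
    pop = [random_tree(n_vars, 3, rng) for _ in range(pop_size)]
    history = []  # best fitness seen at the start of each generation
    for _ in range(generations):
        ranked = sorted(pop, key=lambda t: fitness(t, X, y), reverse=True)
        history.append(fitness(ranked[0], X, y))
        survivors = ranked[: pop_size // 4]  # the fittest quarter survives
        children = []
        while len(survivors) + len(children) < pop_size:
            a, b = rng.choice(survivors), rng.choice(survivors)
            # Crossover: graft a random piece of b into a copy of a.
            child = replace_random(a, lambda: random_subtree(b, rng), rng)
            if rng.random() < 0.3:
                # Mutation: swap in a fresh random subtree.
                child = replace_random(
                    child, lambda: random_tree(n_vars, 2, rng), rng)
            children.append(child if tree_depth(child) <= MAX_DEPTH else a)
        pop = survivors + children
    best = max(pop, key=lambda t: fitness(t, X, y))
    return best, fitness(best, X, y), history

# Synthetic data (made up): response depends on x0 * x1; x2 is noise.
data_rng = random.Random(1)
X = [[data_rng.random() for _ in range(3)] for _ in range(200)]
y = [1 if row[0] * row[1] > 0.4 else 0 for row in X]
best, best_fit, history = evolve(X, y)
print(to_text(best), round(best_fit, 3))
```

The sketch touches each part of the trinity in miniature: a tree that never references a variable has in effect deselected it, subexpressions such as a product of two variables act as constructed variables, and the surviving tree is itself the model equation. Because the fittest quarter of each generation is carried over unchanged, the best fitness never decreases from one generation to the next.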



For more information about this article, call Bruce Ratner at 516.791.3544 or 1 800 DM STAT-1; or e-mail at br@dmstat1.com.