Data defines the model by dint of genetic programming, producing the best decile table.

A Genetic Imputation Method for Database Modeling
Bruce Ratner, Ph.D.

The problem of modeling data with missing values is well known to data analysts. They know that almost all standard statistical modeling techniques perform complete-case analysis, which discards every case with at least one missing value. They make every effort to impute the missing values, but are mindful that, despite those efforts, they can be left with a meager complete-case sample. The implication of complete-case analysis is twofold. First, the results must be treated with caution because of potential prediction bias: the complete-case sample is questionably representative of the population under consideration. Second, a model built on a complete-case sample has limited utility, as it provides estimated target scores only for individuals with complete data. Data analysts must assign scores outside the model to individuals with missing data if they want the entire database scored. This article presents a genetic-based, assumption-free imputation method, the GenIQ Model©, for database modeling based on all-case analysis, which includes all cases regardless of missing values. The method should be a welcome addition to the data analysis arsenal for building a better database model, as it has the distinctive feature of softening the effects of missing data by minimizing likely prediction bias and maximizing the model's utility.
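
To make the cost of complete-case analysis concrete, the following is a minimal sketch in Python. It illustrates only the discarding of incomplete cases described above; it is not the GenIQ method. The records, field names, and the helper `complete_cases` are hypothetical, invented for illustration.

```python
# Hypothetical toy records; None marks a missing value.
records = [
    {"age": 34, "income": 52000, "tenure": 5},
    {"age": None, "income": 61000, "tenure": 2},
    {"age": 45, "income": None, "tenure": 9},
    {"age": 29, "income": 48000, "tenure": None},
    {"age": 51, "income": 75000, "tenure": 12},
]


def complete_cases(rows):
    """Keep only rows with no missing values (complete-case analysis)."""
    return [row for row in rows if all(v is not None for v in row.values())]


kept = complete_cases(records)
share_kept = len(kept) / len(records)

# Rows dropped here would receive no score from a complete-case model.
unscored = len(records) - len(kept)

print(f"complete cases: {len(kept)} of {len(records)} ({share_kept:.0%})")
print(f"cases left unscored by a complete-case model: {unscored}")
```

In this toy data each incomplete record is missing only one of three fields, yet only two of the five records survive. The same two effects named above appear in miniature: a smaller, possibly unrepresentative modeling sample, and three records that a complete-case model cannot score.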

For more information about this article, call Bruce Ratner at 516.791.3544 or 1 800 DM STAT-1, or e-mail br@dmstat1.com.