

How Large a Sample is Required to Build a Database Response Model?
Bruce Ratner, Ph.D.

Statistical consultants are frequently asked: “How large a sample is required to build a database response model?” This simply stated question is not so easily answered, because the required inputs for the sample size calculation are arbitrary and sometimes unknown. The “right” sample size depends on several concepts and conditions, such as the arbitrarily chosen Type I and Type II error rates, the effect size, the number of predictor variables in the model, and the average correlation among the predictor variables. Because the latter two conditions are virtually never known before building a database response model, the calculated sample size is a guesstimate, and one that is too large for most marketing solicitation budgets. Notwithstanding the effects of the input data used, the traditional sample size calculation paradigm is about testing for statistical significance, not practical importance. Practical importance is more in line with building a database response model; namely, how useful are the model’s predictions of rank-ordered likelihood of response? Thus, I raise the more appropriate question: “How large a gain (increase in response) is expected from a database response model built with the sample size at hand, or with the sample size permitted by the budget?” The purpose of this article is to set aside the original question, address the newly posited one, and present a methodology for answering it.
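
To make the contrast concrete, below is a minimal sketch (in Python, and not drawn from the article itself) of the traditional significance-testing calculation referred to above. It uses the standard two-proportion z-test approximation; the response rates, Type I error, and power shown are hypothetical assumptions chosen only for illustration.

    import math
    from statistics import NormalDist

    def sample_size_two_proportions(p_control, p_treatment, alpha=0.05, power=0.80):
        # Per-group sample size for detecting the difference between two
        # response rates with a two-sided two-proportion z-test.
        z_alpha = NormalDist().inv_cdf(1 - alpha / 2)   # critical value for the Type I error
        z_beta = NormalDist().inv_cdf(power)            # critical value for the Type II error
        variance = p_control * (1 - p_control) + p_treatment * (1 - p_treatment)
        effect = abs(p_treatment - p_control)           # the assumed effect size
        return math.ceil((z_alpha + z_beta) ** 2 * variance / effect ** 2)

    # Hypothetical inputs: a 2.0% baseline response rate and a hoped-for 2.4%.
    print(sample_size_two_proportions(0.020, 0.024))    # about 21,000 names per group

Even this guesstimate leaves out the number of predictor variables and their average correlation, which, as noted above, are virtually never known in advance; the reframed question instead asks what gain in response the model can be expected to deliver with the sample the budget actually allows.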

For more information about this article, call me at 516.791.3544, or e-mail br@dmstat1.com.
My publisher owns the copyright of the article that this abstract addresses. The article will appear in my forthcoming book.
My publisher has granted me permission to discuss the article's content orally, but not to provide an outline, draft, or proof copy of the article.

