
Data defines the model by dint of genetic programming, producing the best decile table.


Statistical and Machine-Learning Data Mining: Techniques for Better Predictive Modeling and Analysis of Big Data, Third Edition

June 7, 2017 by Chapman and Hall/CRC/Taylor & Francis  Reference  662 Pages  200 B/W Illustrations  ISBN 9781498797603  CAT# K30454
Features
• One of only two books on big data on Intel's prestigious recommended reading list.
• Provides step-by-step solutions to common problems facing data scientists, modelers, and marketers; other books typically provide outlined solutions.
• Illustrations involve real problems, real data, and better solutions.
• Uniquely introduces two new machine-learning methods specifically tailored to database assessment of optimal model performance.
• Includes many new methodologies for unsolved, real problems along with corresponding SAS programs, easily converted into R and Python scripts.
Summary
The third edition of a bestseller, Statistical and Machine-Learning Data Mining: Techniques for Better Predictive Modeling and Analysis of Big Data (SMDM) is still the only book, to date, to distinguish between statistical data mining and machine-learning data mining. It is a compilation of new and creative data mining techniques that address the scaling-up of the framework of classical and modern statistical methodology for predictive modeling and analysis of big data. SMDM provides proper solutions to common problems facing the newly minted data scientist in the data mining discipline. Its presentation focuses on the needs of data scientists (commonly known as statisticians, data miners, and data analysts), delivering practical yet powerful, simple yet insightful quantitative techniques, most of which use the "old" statistical methodologies improved upon by the new machine-learning influence.
Preface to Third Edition
Predictive analytics of big data has maintained a steady presence in the four years since the publication of the second edition. My decision to write this third edition is not a result of the success (units) of the second edition but is due to the abundant positive feedback (personal correspondence from the readership) I have received. And, importantly, I have the need to share my work on problems that do not have widely accepted, reliable, or known solutions. As in the previous editions, John Tukey’s tenets necessary to advance statistics (flexibility, practicality, innovation, and universality) are the touchstones of each chapter’s new analytic and modeling methodology.
My main objectives in preparing the third edition are to:
1. Extend the content of the core material by including strategies and methods for problems that I have observed to be at the top of the statistical issues on the table, based on my review of predictive analytics conference proceedings and statistical modeling workshop outlines.
2. Re-edit current chapters for improved writing and tighter endings.
3. Provide the statistical subroutines used in the proposed methods of analysis and modeling. I use Base SAS® and SAS/STAT. The subroutines are also available for downloading from my website: http://www.geniq.net. The code is easy to convert for users who prefer other languages.
I have added 13 new chapters (for a total of 44 chapters, over 40% new material), inserted between the chapters of the second edition to yield the greatest flow of continuity of material. The titles of the new chapters are:
Chapter 2, Science Dealing with Data: Statistics and Data Science.
Chapter 8, Market Share Estimation: Data Mining for an Exceptional Case.
Chapter 11, Predicting Share of Wallet without Survey Data.
Chapter 19, Market Segmentation Based on Time-Series Data Using Latent Class Analysis.
Chapter 20, Market Segmentation: An Easy Way to Understand the Segments.
Chapter 21, The Statistical Regression Model: An Easy Way to Understand the Model.
Chapter 23, Model Building with Big Complete and Incomplete Data.
Chapter 24, Art, Science, Numbers, and Poetry, is a high-order blend of artwork, science, numbers, and poetry, all inspired by the Egyptian pyramids, da Vinci, and Einstein. Love it or hate it, this chapter makes you think.
Chapter 27, Decile Analysis: Perspective and Performance.
Chapter 28, Net TC Lift Model: Assessing the Net Effects of Test and Control Campaigns, extends the practice of assessing response models to the proper use of a control group by offering a simple, straightforward, reliable model that is easy to implement and understand.
Chapter 34, Opening the Dataset: A Twelve-Step Program for Dataholics, has valuable content for statisticians as they embark on the first step of any journey with data. In prose, I provide light reading on the expected steps of what to do when cracking open the dataset. Enjoy.
Chapter 43, Text Mining: Primer, Illustration, and TXTDM Software, has three objectives: first, to serve as a primer, readable and brief though detailed, on what text mining encompasses and how to conduct basic text mining; second, to illustrate text mining with a small yet interesting body of text; and third, to make text mining available to interested readers.
Chapter 44, Some of My Favorite Statistical Subroutines, includes subroutines referenced throughout the book and generic subroutines for some 2nd-edition chapters for which I no longer have the data.

For more information about this article, call Bruce Ratner at 516.791.3544 or 1 800 DM STAT1; or email at br@dmstat1.com. 

