The Importance of Straight Data: For Simplicity, Desirable for Good Modeling
Bruce Ratner, Ph.D.
For DMers (namely, statisticians, data analysts, data miners, knowledge discoverers, and the like), exploratory data analysis, better known as EDA, places special importance on straight data, not least for the sake of simplicity itself. The paradigm of life is simplicity (at least for those of us who are older and wiser). In the physical world, Einstein uncovered one of life’s ruling principles using only three letters: E = mc². In the visual world, however, simplicity is undervalued and overlooked. A smiley face is an unsophisticated, simple shape that nevertheless communicates clearly, effectively and efficiently. Why, then, should DMers accept anything less than simplicity in their life’s work? Numbers, too, should communicate clearly, effectively and immediately. In the DMer’s world, two features reflect simplicity: symmetry and straightness in the data. The DMer should insist that the numbers be symmetric and straight.
The straight-line relationship between two continuous variables, say X and Y, is as simple as it gets. If Y increases (decreases) as X increases (decreases), X and Y are said to be positively correlated. If Y decreases (increases) as X increases (decreases), X and Y are said to be negatively correlated. As a further demonstration of simplicity, Einstein’s E and m have a perfectly positively correlated straight-line relationship, with the constant c² as the slope.
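To make the two cases concrete, here is a minimal Python sketch of the Pearson correlation coefficient r, the usual yardstick for how straight a relationship is. The function name `pearson_r` and the tiny data sets are hypothetical, chosen only so that one straight line rises and the other falls; any exact straight line gives r = 1.0 or r = -1.0, and the sign tells the direction.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Co-variation of X and Y, and the spread of each, about their means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

x = [1, 2, 3, 4, 5]
up = [2 * v + 1 for v in x]     # hypothetical: Y rises as X rises
down = [10 - 3 * v for v in x]  # hypothetical: Y falls as X rises

print(pearson_r(x, up))    # 1.0  (perfect positive correlation)
print(pearson_r(x, down))  # -1.0 (perfect negative correlation)
```

Real data rarely sit exactly on a line, so r usually lands strictly between -1 and 1; the closer |r| is to 1, the straighter the relationship.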
The second reason for the importance of straight data is that when straightening is successful, we almost always see more clearly what is going on in the data, which is desirable for good modeling.
The third reason for the importance of straight data is that most database models require it: they are among the innumerable varieties of the linear model. Moreover, it has been shown that nonlinear models, which pride themselves on making better predictions with nonstraight data, in fact do better with straight data.
By the way, I have not ignored the feature of symmetry. Not by accident (there are theoretical reasons), symmetry and straightness go hand in hand: straightening data often makes data symmetric, and vice versa. Recall that symmetric data has values distributed in the same way above and below the middle of the sample. The iconic symmetric data profile in statistics is the bell-shaped curve.
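A minimal sketch of the link between re-expression and symmetry, assuming a made-up, right-skewed sample and using the moment coefficient of skewness as the yardstick (the article names no particular measure, so this choice is an assumption). Taking logs turns values that grow by factors of ten into evenly spaced values that sit symmetrically about their middle.

```python
import math

def skewness(values):
    """Moment coefficient of skewness: 0 for data symmetric about its mean."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    return m3 / m2 ** 1.5

raw = [1, 10, 100, 1000, 10000]        # hypothetical sample with a long right tail
logged = [math.log10(v) for v in raw]  # the same sample re-expressed on the log scale

print(f"raw skewness:    {skewness(raw):.3f}")     # clearly positive
print(f"logged skewness: {skewness(logged):.3f}")  # about 0: symmetric
```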
I have a passion for technology and what it can do for us in the workplace and in our “lifespace.” I did well in kindergarten (“Works well with others, and likes to share his things.”). Sharing (technology) is a bighearted deed. It helps you learn faster, achieve more, simplify your work and deepen your life, and all the while you’re having fun. I want to share my experience of straightening data.
Objective: To straighten the relationship between X1 and Y1 in Table 1, below. I have a solution, but would be curious to see yours. The correlation coefficient r for the original data is 0.84; for my re-expressed data, r = 1.0.
Please email me your straightening re-expression and I’ll email you mine.
Hope to hear from you!
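One common way to hunt for a straightening re-expression is Tukey’s ladder of powers: re-express Y as Y^p for p = 2, 1, 1/2, 0 (by convention, the log), -1/2, -1, and keep the rung that gives the largest |r|. The sketch below assumes hypothetical data in which Y = X² (Table 1 is not reproduced here), so the square-root rung recovers a perfectly straight line. It is not the author’s solution, which is not given here, and the function names are illustrative.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    return sxy / math.sqrt(sxx * syy)

def reexpress(v, p):
    """Ladder-of-powers re-expression of a positive value; p == 0 stands for the log."""
    return math.log(v) if p == 0 else v ** p

def best_power(xs, ys, powers=(2, 1, 0.5, 0, -0.5, -1)):
    """Re-express Y at each rung and keep the rung whose r is largest in absolute value."""
    results = {p: pearson_r(xs, [reexpress(y, p) for y in ys]) for p in powers}
    best = max(results, key=lambda p: abs(results[p]))
    return best, results

# Hypothetical data (not Table 1): Y bends upward as X grows.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [v ** 2 for v in x]

p, rs = best_power(x, y)
print(f"r on original data: {rs[1]:.3f}")       # r on original data: 0.976
print(f"best power: {p}, r = {rs[p]:.3f}")      # best power: 0.5, r = 1.000
```

Walking only Y down the ladder keeps the sketch short; re-expressing X, or both variables, is equally legitimate, and real data will rarely reach r = 1.0 exactly.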
The data are available for download in “txt” format.
For more information about this article, call Bruce Ratner at 516.791.3544 or 1 800 DM STAT-1, or e-mail br@dmstat1.com.