CHAID, a technique whose original intent was to detect interaction between variables (i.e., to find "combination" variables), recursively partitions a population into separate and distinct groups, defined by a set of independent (predictor) variables, such that the CHAID Objective is met: the variance of the dependent (target) variable is minimized within the groups and maximized across the groups. CHAID stands for CHi-squared Automatic Interaction Detection:
- CHi-squared
- Automatic
- Interaction
- Detection (not detector)
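The "CHi-squared" in the name refers to the Pearson chi-squared test of independence, which CHAID uses to judge whether the categories of a predictor differ in their response rates. The sketch below computes that statistic, its degrees of freedom, and a p-value for one predictor cross-tabulated against a binary response. It is a minimal pure-Python illustration that assumes a list-of-rows contingency table with no empty rows or columns. The counts are hypothetical, and full CHAID adds steps not shown here, such as merging categories that do not differ significantly and Bonferroni-adjusting the p-values.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) for a chi-squared variable with integer df >= 1.

    Uses Q(s + 1) = Q(s) + y**s * exp(-y) / Gamma(s + 1), with y = x / 2, starting from
    Q(1) = exp(-y) for even df and Q(1/2) = erfc(sqrt(y)) for odd df.
    """
    if x <= 0:
        return 1.0
    y = x / 2.0
    if df % 2 == 0:
        s, q = 1.0, math.exp(-y)
    else:
        s, q = 0.5, math.erfc(math.sqrt(y))
    while s < df / 2.0:
        # Add one recurrence term, computed in log space to avoid overflow.
        q += math.exp(s * math.log(y) - y - math.lgamma(s + 1.0))
        s += 1.0
    return q


def pearson_chi2(table):
    """Pearson chi-squared test of independence for a rows x columns table of counts.

    Rows are predictor categories; columns are response outcomes (e.g., no / yes).
    Returns (statistic, degrees_of_freedom, p_value).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(col_totals) - 1)
    return stat, df, chi2_sf(stat, df)


# Hypothetical counts: rows are categories of one predictor,
# columns are [non-responders, responders].
table = [
    [40, 10],
    [45, 5],
    [90, 10],
]
stat, df, p = pearson_chi2(table)
print(f"chi2 = {stat:.3f}, df = {df}, p = {p:.4f}")
```

A small p-value suggests the predictor's categories have genuinely different response rates, which makes that predictor a candidate for the next split.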
Its advantages are that its output is highly visual and contains no equations. The output commonly takes the form of an organization chart, more commonly referred to as a tree display. As an illustration, consider the CHAID Tree below. The tree can "loosely" be interpreted as follows: the overall Response of 10% (from a population of size 1000) is explained and predicted primarily by Marital Status, and secondarily by Gender and Pet Ownership. Note: CHAID does not work well with small sample sizes, as respondent groups can quickly become too small for reliable analysis.
CHAID detects interaction between independent variables, which serves explanatory studies concerned with the impact that many variables have on each other (e.g., in the Response Tree above, Marital Status & Gender and Marital Status & Pet Ownership are two interaction variables, as they differentially affect response rates across the bottom respondent groups). In addition, it is often used as a prediction method. Using CHAID, the data analyst can uncover relationships between a dependent variable, e.g., response to a mail solicitation, and a host of predictor variables such as product, price, promotion, recency, frequency, and prior purchases. The result is a CHAID regression tree that allows the data analyst to predict which individuals are most likely to respond to a similar mail solicitation in the future. The above describes CHAID's original intent and its frequent usage.
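Once a tree is grown, predicting with it amounts to routing an individual down the splits to a terminal node and reading off that node's response rate as the predicted likelihood of responding. The sketch below hard-codes a hypothetical tree shaped like the Response Tree described above (Marital Status first, then Gender for married individuals and Pet Ownership for divorced ones). It uses the terminal-node rates exactly as they appear in the segment list further down. The `Person` fields and the function names are illustrative assumptions, not part of CHAID or any library.

```python
from dataclasses import dataclass


@dataclass
class Person:
    marital_status: str  # "married", "divorced", or "single"
    gender: str          # "male" or "female"
    has_pets: bool


# Terminal-node response rates, as listed in the segment list in the text.
SEGMENT_RATES = {
    "Married Males": 0.50,
    "Married Females": 0.267,
    "Divorced with no Pets": 0.50,
    "Divorced with Pets": 0.071,
    "Singles": 0.10,
}


def assign_segment(person):
    """Route a person down the hypothetical, hand-coded tree to a terminal node."""
    if person.marital_status == "married":
        return "Married Males" if person.gender == "male" else "Married Females"
    if person.marital_status == "divorced":
        return "Divorced with Pets" if person.has_pets else "Divorced with no Pets"
    # Everyone else falls into the single branch.
    return "Singles"


def predicted_response(person):
    """Score a person with the response rate of the terminal node they land in."""
    return SEGMENT_RATES[assign_segment(person)]


prospects = [
    Person("married", "male", False),
    Person("divorced", "female", True),
    Person("single", "female", False),
]
# Rank prospects from most to least likely to respond.
for person in sorted(prospects, key=predicted_response, reverse=True):
    print(assign_segment(person), predicted_response(person))
```

Everyone in the same terminal node receives the same score, so the tree ranks individuals by segment rather than one by one.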
Today in database marketing, CHAID primarily serves as a market segmentation technique. The Response Tree above represents a market segmentation of the population under consideration. The five bottom branch "boxes," called nodes, are the segments that make up the resultant market segmentation. The segments are prioritized for targeting based first on their level of responsiveness and second on their size. The upper segments, defined by response rates larger than the overall response rate (10% in this case), are the "low-hanging" fruits, which are high-yielding (generate greater-than-average response) and require little effort to obtain. The lower segments, defined by response rates smaller than the average, are "high-floating" fruits, which are not high-yielding and require extra effort to acquire. However, the lower segments offer the marketer a challenge with a "juicy" yield if a high-octane strategy can be devised to efficiently tap into them. The middle segments, defined by response rates about equal to the average, offer the marketer a choice: either use the current business-as-usual strategy to yield average results (10%), or implement an unexpected, forceful strategy (like that for the lower segments) to efficiently stimulate these segments to produce greater-than-average results. Thus, the targeting priority of the five segments, namely three upper segments {1, 2, and 3}, one middle segment {4}, and one lower segment {5}, is:
- {Married Males, 50% response rate, size 50}
- {Divorced with no Pets, 50% response rate, size 50}
- {Married Females, 26.7% response rate, size 150}
- {Divorced with Pets, 7.1% response rate, size 350}
- {Singles, 10% response rate, size 400}
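The prioritization rule stated above can be sketched as follows. It sorts first by response rate and then by size, and it labels each segment upper, middle, or lower relative to the overall rate. The segment names and figures are hypothetical and are not the tree's segments. The `tolerance` that decides what counts as "about equal to the average" is an assumed parameter, because the text does not specify one.

```python
from typing import NamedTuple


class Segment(NamedTuple):
    name: str
    response_rate: float  # fraction responding, e.g. 0.25 for 25%
    size: int             # number of individuals in the segment


def tier(rate, overall_rate, tolerance):
    """Label a segment relative to the overall response rate."""
    if rate > overall_rate + tolerance:
        return "upper"
    if rate < overall_rate - tolerance:
        return "lower"
    return "middle"


def prioritize(segments, overall_rate, tolerance=0.01):
    """Order segments by response rate (descending), breaking ties by size (descending)."""
    ordered = sorted(segments, key=lambda s: (s.response_rate, s.size), reverse=True)
    return [(s.name, tier(s.response_rate, overall_rate, tolerance)) for s in ordered]


# Hypothetical segments (1000 people, 100 responders, so the overall rate is 10%).
segments = [
    Segment("A", 0.04, 500),
    Segment("B", 0.20, 100),
    Segment("C", 0.10, 200),
    Segment("D", 0.20, 200),
]
for name, label in prioritize(segments, overall_rate=0.10):
    print(name, label)
```

Sorting on the `(response_rate, size)` tuple in reverse order applies both criteria in a single pass: D comes before B because they tie on rate and D is larger.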
(Hidden object lesson: I noticed the tree has an arithmetic error in one of its nodes. I was going to correct it, but I leave it to the curious reader to find the error. The tree now serves another objective: to further the understanding of tree methodology itself. Notwithstanding the errant node, the tree still carries the article's original intent of defining target markets. For a correct version of the tree, go to http://www.geniq.net/res/OriginalCHAIDintent.html ; but please note that the "priority of the five segments" originally posted MUST be changed to reflect the correct tree.)
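One check is useful when reading any tree, including when hunting for an errant node. A parent node's response rate must equal the size-weighted average of its children's rates, and the children's sizes must add up to the parent's size. The sketch below applies that check to hypothetical numbers. The `check_split` helper and its tolerance (measured in responders, to absorb rounded percentages) are assumptions for illustration.

```python
def check_split(parent, children, tol=0.5):
    """Check that a node's children are arithmetically consistent with it.

    parent and children are (size, response_rate) pairs. Returns a list of
    problems found (empty if consistent). tol is the allowed gap, in responders,
    to absorb rounding in the displayed rates.
    """
    problems = []
    parent_size, parent_rate = parent
    child_size = sum(size for size, _ in children)
    if child_size != parent_size:
        problems.append(f"sizes: children sum to {child_size}, parent has {parent_size}")
    parent_responders = parent_size * parent_rate
    child_responders = sum(size * rate for size, rate in children)
    if abs(child_responders - parent_responders) > tol:
        problems.append(
            f"responders: children imply {child_responders:.1f}, "
            f"parent implies {parent_responders:.1f}"
        )
    return problems


# Hypothetical split: a root of 1000 at 10% divided into three children.
print(check_split((1000, 0.10), [(200, 0.20), (300, 0.10), (500, 0.06)]))  # consistent: []
print(check_split((1000, 0.10), [(200, 0.20), (300, 0.15), (500, 0.06)]))  # flags responders
```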