Data defines the model by dint of genetic programming, producing the best decile table.


Calculating the Average Correlation Coefficient: Why?
 Bruce Ratner, Ph.D.

The average correlation coefficient of a correlation matrix is a useful measure of the internal reliability of the set of variables in the matrix. Moreover, it is a measure of the degree of multi-collinearity among the predictor variables in a model. The smaller the value of the AVG_CORR the better: 1) the model's predictiveness; and 2) the assessment of a predictor variable's contribution to the dependent variable.

This report provides a SAS-code program for calculating the Average Correlation Coefficient. The program should be a welcomed entry in the tool kit of data analysts who frequently work with BIG data.



/************First Create Data IN ***** **/
data IN;
input ID 2.0 GENDER $1. MARITAL $1.;
cards;
01MS
02MM
03M
04
05FS
08FM
07F
08 M
09 S
10MD
;
run;

data IN;
set IN;
GENDER_ = GENDER; if GENDER =' ' then GENDER_ ='x';
MARITAL_= MARITAL;if MARITAL=' ' then MARITAL_='x';
run;

PROC TRANSREG data=IN DESIGN;
model class (GENDER_ / ZERO='x');
output out = GENDER_ (drop = Intercept _NAME_ _TYPE_);
id ID;
run;
proc print;
run;

proc sort data=GENDER_ ;by ID;
proc sort data=IN ;by ID;
run;

data IN;
merge IN GENDER_ ;
by ID;
run;
proc print data=IN;
run;

PROC TRANSREG data=IN DESIGN;
model class (MARITAL_ / ZERO='x');
output out=MARITAL_ (drop= Intercept _NAME_ _TYPE_);
id ID;
run;
proc print;
run;

proc sort data=MARITAL_;by ID;
proc sort data=IN ;by ID;
run;

data IN;
merge IN MARITAL_;
by ID;
run;
proc print data=IN;
run;
/***********End of Creating Data IN ******/

/************ SAS-code Program for Calculating Average Correlation Coefficient**********/

proc corr data=IN out=out;
var GENDER_M GENDER_F MARITAL_M MARITAL_S MARITAL_D;
run;

data out1;
set out;
if _type_='MEAN' or _type_='STD' or _type_='N' then delete;
drop _type_;
array vars (5)
GENDER_M GENDER_F MARITAL_M MARITAL_S MARITAL_D ;
array pos (5) x1 - x5;
do i= 1 to 5;
pos(i)=abs(vars(i));
end;
drop
GENDER_M GENDER_F MARITAL_M MARITAL_S MARITAL_D i;
run;

data out2;
set out1;
array poss (5) x1- x5;
do i= 1 to 5;
if poss(i) =1 then poss(i)=.;
drop i;
end;
run;
proc print;run;

proc means data=out2 sum;
output out=out3 sum=;
proc print;run;

data out4;
set out3;
sum_=sum(of x1-x5);
sum_div2= sum_/2;
bot= (((_freq_*_freq_) -_freq_))/2;
avg_corr= sum_div2/bot;
run;

data avg_corr;
set out4;
keep avg_corr;
proc print;run;

For more information about this article, call Bruce Ratner at 516.791.3544 or 1 800 DM STAT-1; or e-mail at br@dmstat1.com.
Sign-up for a free GenIQ webcast: Click here.