Data defines the model by dint of genetic programming, producing the best decile table.


Calculating the Average Correlation Coefficient
 Bruce Ratner, Ph.D.
Live chat by Boldchat
Live chat by Boldchat

The average correlation coefficient of a correlation matrix is a useful measure of the internal reliability of the set of variables in the matrix. Moreover, it is a (gross) measure of predictiveness of the variables - when the correlations are with respect to a target variable. This report provides a SAS-code program for calculating the Average Correlation Coefficient. The program should be a welcomed entry in the tool kit of data analysts who frequently work with BIG data.



/************First Create Data IN ***** **/
data IN;
input ID 2.0 GENDER $1. MARITAL $1.;
cards;
01MS
02MM
03M
04
05FS
08FM
07F
08 M
09 S
10MD
;
run;

data IN;
set IN;
GENDER_ = GENDER; if GENDER =' ' then GENDER_ ='x';
MARITAL_= MARITAL;if MARITAL=' ' then MARITAL_='x';
run;

PROC TRANSREG data=IN DESIGN;
model class (GENDER_ / ZERO='x');
output out = GENDER_ (drop = Intercept _NAME_ _TYPE_);
id ID;
run;
proc print;
run;

proc sort data=GENDER_ ;by ID;
proc sort data=IN ;by ID;
run;

data IN;
merge IN GENDER_ ;
by ID;
run;
proc print data=IN;
run;

PROC TRANSREG data=IN DESIGN;
model class (MARITAL_ / ZERO='x');
output out=MARITAL_ (drop= Intercept _NAME_ _TYPE_);
id ID;
run;
proc print;
run;

proc sort data=MARITAL_;by ID;
proc sort data=IN ;by ID;
run;

data IN;
merge IN MARITAL_;
by ID;
run;
proc print data=IN;
run;
/***********End of Creating Data IN ******/

/************ SAS-code Program for Calculating Average Correlation Coefficient**********/

proc corr data=IN out=out;
var GENDER_M GENDER_F MARITAL_M MARITAL_S MARITAL_D;
run;

data out1;
set out;
if _type_='MEAN' or _type_='STD' or _type_='N' then delete;
drop _type_;
array vars (5)
GENDER_M GENDER_F MARITAL_M MARITAL_S MARITAL_D ;
array pos (5) x1 - x5;
do i= 1 to 5;
pos(i)=abs(vars(i));
end;
drop
GENDER_M GENDER_F MARITAL_M MARITAL_S MARITAL_D i;
run;

data out2;
set out1;
array poss (5) x1- x5;
do i= 1 to 5;
if poss(i) =1 then poss(i)=.;
drop i;
end;
run;
proc print;run;

proc means data=out2 sum;
output out=out3 sum=;
proc print;run;

data out4;
set out3;
sum_=sum(of x1-x5);
sum_div2= sum_/2;
bot= ((_freq_*_freq_)-_freq_);
avg_corr= sum_div2/bot;
run;

data avg_corr;
set out4;
keep avg_corr;
proc print;run;


For more information about this article, call Bruce Ratner at 516.791.3544 or 1 800 DM STAT-1; or e-mail at br@dmstat1.com.
Sign-up for a free GenIQ webcast: Click here.