AutoReg

The method

The procedure is based on logistic regression. Although this is a powerful method, relying on it alone is a limitation of the process: you can remove this restriction by extending the code, or wait for a future release.
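For reference, a binary logistic regression models the log-odds of the event as a linear function of the explanatory variables. The notation below is the standard one and is not taken from the AutoReg code:

```latex
\log\frac{p}{1-p} = \beta_0 + \sum_{j=1}^{k} \beta_j x_j,
\qquad p = P(Y = 1 \mid x_1, \dots, x_k)
```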

The program is organized in several steps, the core of which is the regression.
Below, the process is described step by step, with reference to the code as it appears on the corresponding page (line numbers refer to the PDF version of the code).

Utility macros

Step 0 - Definition of utility macros (lines 25 - 200):
- definition of the simpson_c macro, used to compute the Simpson pseudo-correlation index (lines 25 - 130)
- definition of the mod_b_meno_a macro, used to compare two models statistically (lines 130 - 200)
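The page does not say which test mod_b_meno_a applies. For nested logistic models a common choice is the likelihood-ratio test, so the following Python sketch assumes that one: model B contains all the parameters of model A plus some more, and the extra parameters are judged significant when 2(llB - llA) is large compared with a chi-square distribution. The names chi2_sf and compare_models are hypothetical; the chi-square tail probability uses closed-form expressions valid for integer degrees of freedom, so no external library is needed.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X > x) of a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{i=0}^{df/2-1} (x/2)^i / i!
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    # Odd df: erfc term plus a finite series in x^((2i-1)/2) / (1*3*...*(2i-1))
    term = math.sqrt(x)
    series = 0.0
    for i in range(1, (df - 1) // 2 + 1):
        if i > 1:
            term *= x / (2 * i - 1)
        series += term
    return math.erfc(math.sqrt(half)) + 2.0 * math.exp(-half) / math.sqrt(2.0 * math.pi) * series


def compare_models(loglik_a, params_a, loglik_b, params_b, alpha=0.05):
    """Likelihood-ratio test of nested model A (smaller) against model B (larger).

    Returns (statistic, p_value, b_is_significantly_better).
    """
    df = params_b - params_a
    if df < 1:
        raise ValueError("model B must have more parameters than model A")
    stat = 2.0 * (loglik_b - loglik_a)
    p_value = chi2_sf(stat, df)
    return stat, p_value, p_value < alpha
```

For example, compare_models(-100.0, 3, -98.0, 4) gives a statistic of 4.0 on one degree of freedom, which is significant at the 5% level.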

Main macro (classizz)

Step 1 - Loading and checking parameters and input data (lines 200 - 560):
- the program verifies that the information supplied by the user exists and is valid.
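A minimal sketch of the kind of checks Step 1 performs, assuming the dataset is a dict of columns, the response is binary (as logistic regression requires) and each explanatory variable is tagged with one of the type letters used on this page. validate_inputs and ALLOWED_ROLES are hypothetical names, not taken from the macro.

```python
ALLOWED_ROLES = {"K", "O", "X", "Q", "C"}  # hypothetical: the type letters used on this page


def validate_inputs(data, target, roles):
    """Return a list of error messages; an empty list means all checks passed.

    data:   dict column name -> list of values
    target: name of the response variable (assumed binary)
    roles:  dict explanatory variable name -> type letter
    """
    errors = []
    if target not in data:
        errors.append(f"target variable '{target}' not found")
    elif len(set(data[target])) != 2:
        errors.append(f"target variable '{target}' is not binary")
    if target in roles:
        errors.append(f"target variable '{target}' is also listed as explanatory")
    if not roles:
        errors.append("no explanatory variables supplied")
    for name, role in roles.items():
        if name not in data:
            errors.append(f"variable '{name}' not found")
        if role not in ALLOWED_ROLES:
            errors.append(f"variable '{name}' has unknown role '{role}'")
    if len({len(col) for col in data.values()}) > 1:
        errors.append("columns have different lengths")
    return errors
```

Collecting every error instead of stopping at the first one lets the user fix all problems in a single pass.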
Step 2 - Management and reorganization of the input data (lines 560 - 1370):
- the data are copied so that the original dataset is left untouched (lines 560 - 603)
- nominal variables 'K' are handled (lines 603 - 850)
- ordinal variables 'O' are handled (lines 850 - 1030)
- 'X' variables (numeric variables that will be categorized) are handled (lines 1030 - 1275)
- variables that take a single value (and are therefore statistically irrelevant) are removed (lines 1275 - 1370)
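The following sketch illustrates three of these sub-steps: working on a copy, categorizing 'X' variables and dropping variables with a single value. Binning 'X' variables into quantile classes is an assumption made for illustration (the page does not say how classizz categorizes them), and the handling of 'K' and 'O' variables is left out. prepare and quantile_bins are hypothetical names.

```python
import copy


def quantile_bins(values, n_bins):
    """Map each value to a class 0..n_bins-1 using empirical quantile cut points."""
    ordered = sorted(values)
    n = len(ordered)
    cuts = [ordered[(k * n) // n_bins] for k in range(1, n_bins)]
    # class index = number of cut points the value reaches
    return [sum(v >= c for c in cuts) for v in values]


def prepare(data, roles, n_bins=4):
    """Return a prepared copy of data and the updated role mapping.

    data:  dict column name -> list of values (left unchanged)
    roles: dict variable name -> type letter ('K', 'O', 'X', ...)
    """
    work = copy.deepcopy(data)       # work on a copy of the input dataset
    new_roles = dict(roles)

    for name, role in roles.items():
        if role == "X":              # numeric variable to be categorized
            work[name] = quantile_bins(work[name], n_bins)

    for name in list(new_roles):     # drop variables with a single distinct value
        if len(set(work[name])) < 2:
            del work[name]
            del new_roles[name]

    return work, new_roles
```

The single-value check runs after categorization, so an 'X' variable whose values all fall into one class is dropped as well.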

Step 3 - Correlation analysis (lines 1370 - 1700):

The correlation between each pair of variables is measured according to the scheme below (the matrix summarizes which correlation index is used for each combination of variable types).

	O (Ordinal var.)	X (Numeric var. to compact)	Q (Quantitative var.)	C (Qualitative var.)
O (Ordinal var.)	Spearman (S)	Spearman (S)	Spearman (S)	Simpson (C)
X (Numeric var. to compact)	Spearman (S)	Pearson (P)	Pearson (P)	Simpson (C)
Q (Quantitative var.)	Spearman (S)	Pearson (P)	Pearson (P)	Simpson (C)
C (Qualitative var.)	Simpson (C)	Simpson (C)	Simpson (C)	Simpson (C)
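The matrix can be encoded directly as a lookup table. The sketch below (hypothetical function names) picks the index for a pair of variable types and computes the Pearson and Spearman coefficients, using average ranks for ties. The Simpson pseudo-correlation index is computed by the simpson_c macro and is not reproduced here, so that branch deliberately raises an error.

```python
import math

# Index for each pair of variable types, transcribed from the matrix above
INDEX = {
    ("O", "O"): "spearman", ("O", "X"): "spearman", ("O", "Q"): "spearman", ("O", "C"): "simpson",
    ("X", "X"): "pearson", ("X", "Q"): "pearson", ("X", "C"): "simpson",
    ("Q", "Q"): "pearson", ("Q", "C"): "simpson",
    ("C", "C"): "simpson",
}


def index_for(type_a, type_b):
    """The matrix is symmetric, so only one triangle is stored."""
    return INDEX.get((type_a, type_b)) or INDEX[(type_b, type_a)]


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def correlation(x, y, type_x, type_y):
    method = index_for(type_x, type_y)
    if method == "pearson":
        return pearson(x, y)
    if method == "spearman":
        return pearson(ranks(x), ranks(y))  # Spearman = Pearson computed on ranks
    raise NotImplementedError("Simpson pseudo-correlation (simpson_c) is not reproduced here")
```

As the matrix shows, Spearman is used whenever an ordinal variable is involved, because only the order of its values is meaningful.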

Step 4 - Regression (lines 1700 - 3445):
- the variables and tables used in the estimation process are initialized (lines 1700 - 1810)
- every candidate variable (that is, every variable not already in the model and not correlated with the variables already in it) is tentatively added to the model; the variable that yields the best model is inserted, provided that it is statistically significant (stepwise - lines 1810 - 2980)
- the variables in the model are removed one at a time, and the best-performing reduced model is kept if it is not statistically different from the previous model (backward - lines 2980 - 3393)
- end of the regression cycle (lines 3393 - 3445)
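The forward/backward cycle can be sketched as follows. This is an illustration under stated assumptions, not a translation of the SAS code: fit is a caller-supplied function returning the log-likelihood of the model fitted on a list of variables, corr returns the correlation between two variables, max_corr is a hypothetical threshold for "correlated", and significance is judged with a likelihood-ratio test assuming each variable adds exactly one parameter (5% critical value 3.841 for 1 degree of freedom).

```python
CHI2_CRIT_1DF = 3.841  # 5% critical value, chi-square with 1 df (assumes 1 parameter per variable)


def select(candidates, fit, corr, max_corr=0.7, max_steps=100):
    """Forward/backward selection driven by likelihood-ratio tests.

    candidates: names of the explanatory variables
    fit:        function(list_of_variables) -> log-likelihood of the fitted model
    corr:       function(var_a, var_b) -> correlation between the two variables
    max_corr:   hypothetical threshold above which two variables count as correlated
    """
    model = []
    ll_model = fit(model)
    for _ in range(max_steps):
        changed = False

        # Forward step: candidates not in the model and not correlated with it
        admissible = [
            v for v in candidates
            if v not in model and all(abs(corr(v, m)) < max_corr for m in model)
        ]
        if admissible:
            best = max(admissible, key=lambda v: fit(model + [v]))
            ll_best = fit(model + [best])
            if 2.0 * (ll_best - ll_model) > CHI2_CRIT_1DF:
                model = model + [best]
                ll_model = ll_best
                changed = True

        # Backward step: drop the variable whose removal costs least,
        # if the reduced model is not significantly worse
        if len(model) > 1:
            reduced = max(([m for m in model if m != v] for v in model), key=fit)
            ll_reduced = fit(reduced)
            if 2.0 * (ll_model - ll_reduced) <= CHI2_CRIT_1DF:
                model = reduced
                ll_model = ll_reduced
                changed = True

        if not changed:
            break
    return model
```

With fit wrapping an actual logistic regression, select returns the list of retained variables; max_steps guards against the add/remove cycle oscillating indefinitely.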
Step 5 - Cleanup and writing of output files (lines 3445 - 3960):
- temporary tables are deleted and the output files needed to use the model are created
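As an illustration of how an output file can be used to apply the model, the sketch below stores the coefficients in a two-column CSV and scores a record with the logistic function. The file layout, the '_intercept' key and the function names are hypothetical; the page does not describe the format of the actual output files.

```python
import csv
import io
import math


def write_coefficients(coefficients, out):
    """Write the fitted coefficients as a two-column CSV (hypothetical layout)."""
    writer = csv.writer(out)
    writer.writerow(["variable", "coefficient"])
    for name, value in coefficients.items():
        writer.writerow([name, value])


def read_coefficients(inp):
    """Read the CSV back into a dict variable -> coefficient."""
    return {row["variable"]: float(row["coefficient"]) for row in csv.DictReader(inp)}


def score(record, coefficients):
    """Predicted probability from a logistic model; '_intercept' holds beta_0."""
    eta = coefficients.get("_intercept", 0.0)
    eta += sum(beta * record[name] for name, beta in coefficients.items() if name != "_intercept")
    return 1.0 / (1.0 + math.exp(-eta))


# Usage: round-trip the coefficients through an in-memory file, then score a record
buffer = io.StringIO()
write_coefficients({"_intercept": -1.0, "age": 0.02}, buffer)
buffer.seek(0)
model = read_coefficients(buffer)
probability = score({"age": 50}, model)
```

Keeping the coefficients in a plain file means the model can be applied to new data without rerunning the selection.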



Creation date: 17 Sep 2010
Translation date: 30 Dec 2012
Last change: 17 May 2013

Translation reviewed by Giulia Di Lallo