AutoReg
The method
The procedure is based on logistic regression. Powerful as this method is,
relying on it alone limits the procedure: to lift this restriction you can either modify
the code yourself or wait for a future release.
The program runs in several steps, with regression at its core.
The process is described step by step below, with reference to the code as it appears on the
corresponding page (line numbers are
quoted from the PDF).
Utility macros
- Step 0 - Definition of utility macros (lines 25 - 200):
- definition of the simpson_c macro, which computes
the Simpson pseudo-correlation index (lines 25 - 130)
- definition of the mod_b_meno_a macro, which statistically compares
two different models (lines 130 - 200)
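The source does not say which test mod_b_meno_a applies. As an illustration only, one common way to compare two nested logistic models is a likelihood-ratio test. The sketch below assumes the two models differ by exactly one parameter (one degree of freedom); the function name and arguments are hypothetical and are not taken from the AutoReg code.

```python
import math
from typing import Tuple


def likelihood_ratio_test(loglik_small: float, loglik_big: float) -> Tuple[float, float]:
    """Likelihood-ratio test for two nested models differing by one parameter.

    loglik_small: log-likelihood of the smaller model (assumed input)
    loglik_big:   log-likelihood of the larger model (assumed input)
    Returns the test statistic and its p-value.
    """
    # The statistic is twice the gain in log-likelihood; clamp at zero
    # to absorb tiny negative values caused by numerical noise.
    stat = max(0.0, 2.0 * (loglik_big - loglik_small))
    # Survival function of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


stat, p_value = likelihood_ratio_test(-105.0, -100.0)
```

A small p-value suggests that the larger model fits significantly better than the smaller one; a large p-value suggests the two models are not statistically different.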
Main macro (classizz)
- Step 1 - Loading and checking of parameters and
input data (lines 200 - 560):
- the program verifies that the information supplied by the user
exists and is valid.
- Step 2 - Management and reorganization of input information (lines 560 - 1370):
- data is copied in order to preserve the integrity of the starting dataset (lines 560 - 603)
- nominal variables 'K' are handled (lines 603 - 850)
- ordinal variables 'O' are handled (lines 850 - 1030)
- 'X' variables (numeric variables that will be categorized) are handled (lines 1030 - 1275)
- variables with a single mode, i.e. taking only one value (statistically irrelevant), are removed (lines 1275 - 1370)
- Step 3 - Correlation analysis (lines 1370 - 1700):
- the correlation between each pair of variables is measured with an index
chosen according to the types of the two variables
(the matrix below summarizes the correlation index used for each pair).
| | O (Ordinal var.) | X (Numeric var. to compact) | Q (Quantitative var.) | C (Qualitative var.) |
|---|---|---|---|---|
| O (Ordinal var.) | Spearman (S) | Spearman (S) | Spearman (S) | Simpson (C) |
| X (Numeric var. to compact) | Spearman (S) | Pearson (P) | Pearson (P) | Simpson (C) |
| Q (Quantitative var.) | Spearman (S) | Pearson (P) | Pearson (P) | Simpson (C) |
| C (Qualitative var.) | Simpson (C) | Simpson (C) | Simpson (C) | Simpson (C) |
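The matrix follows a simple precedence rule, which can be sketched as a small lookup. This is only a restatement of the table in Python; the function name is hypothetical and is not part of the AutoReg code.

```python
VALID_TYPES = {"O", "X", "Q", "C"}  # type codes as they appear in the matrix


def correlation_index(type_a: str, type_b: str) -> str:
    """Return the index the matrix assigns to a pair of variable types."""
    pair = {type_a, type_b}
    if not pair <= VALID_TYPES:
        raise ValueError(f"unknown variable type in {sorted(pair)}")
    if "C" in pair:
        # Any pair involving a qualitative variable uses Simpson.
        return "Simpson"
    if "O" in pair:
        # Otherwise, any pair involving an ordinal variable uses Spearman.
        return "Spearman"
    # Only X and Q remain: both are numeric, so Pearson is used.
    return "Pearson"
```

For example, `correlation_index("X", "O")` gives `"Spearman"` and `correlation_index("Q", "X")` gives `"Pearson"`, matching the corresponding cells of the matrix.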
- Step 4 - Regression (lines 1700 - 3445):
- the variables and tables used in the estimation process are initialized (lines 1700 - 1810)
- every candidate variable (that is, every variable that is not already part
of the model and is not correlated with a variable already in the model)
is tested for inclusion in the model.
The variable that yields the best model is added to the model,
provided the improvement is statistically significant (stepwise - lines 1810 - 2980)
- the variables of the model are removed one at a time, and the best-performing reduced model is kept
if it is not statistically different from the previous model (backward - lines 2980 - 3393)
- end of the regression cycle (lines 3393 - 3445)
- Step 5 - System cleanup and writing of output files (lines 3445 - 3960):
- temporary tables are deleted and the output files
needed to use the model are created
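The regression cycle of Step 4 can be sketched as follows. This is a minimal illustration of a forward step followed by a backward pass, not the actual classizz implementation: the function name, the `score`, `improves` and `correlated` callbacks, the `max_rounds` guard and the toy data are all assumptions made for the example.

```python
from typing import Callable, FrozenSet, Iterable


def select_variables(
    candidates: Iterable[str],
    score: Callable[[FrozenSet[str]], float],
    improves: Callable[[FrozenSet[str], FrozenSet[str]], bool],
    correlated: Callable[[str, str], bool],
    max_rounds: int = 100,
) -> FrozenSet[str]:
    """Repeat a forward step and a backward pass until the model stops growing.

    score(model)       -> fit measure, higher is better (hypothetical)
    improves(new, old) -> True if `new` is statistically better than `old`
    correlated(a, b)   -> True if a and b are too correlated to coexist
    """
    pool = list(candidates)
    model: FrozenSet[str] = frozenset()
    for _ in range(max_rounds):  # safety guard against endless cycling
        # Forward: candidates are variables outside the model that are
        # not correlated with any variable already in the model.
        eligible = [v for v in pool
                    if v not in model
                    and not any(correlated(v, m) for m in model)]
        if not eligible:
            break
        best = max((model | {v} for v in eligible), key=score)
        if not improves(best, model):
            break
        model = best
        # Backward: drop one variable at a time while the best reduced
        # model is not statistically different from the current one.
        while len(model) > 1:
            reduced = max((model - {v} for v in model), key=score)
            if improves(model, reduced):
                break
            model = reduced
    return model


# Toy setup (invented for illustration): each variable adds a fixed gain.
gains = {"a": 5.0, "b": 3.0, "c": 0.1, "d": 4.0}


def toy_score(model):
    return sum(gains[v] for v in model)


def toy_improves(new, old):
    return toy_score(new) - toy_score(old) > 1.0


def toy_correlated(x, y):
    return {x, y} == {"a", "d"}


selected = select_variables(gains, toy_score, toy_improves, toy_correlated)
print(sorted(selected))  # ['a', 'b']
```

In the toy run, "d" is never tested once "a" is in the model because the two are flagged as correlated, and "c" is rejected because its gain is below the significance threshold.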
Creation date: 17 Sep 2010
Translation date: 30 Dec 2012
Last change: 17 May 2013
Translation reviewed by
Giulia Di Lallo