The stepAIC() function from the R package MASS can automate the submodel selection process. The authors state, on page 176 of their bookModern Applied Statistics with S (ISBN 0387954570), that “… selecting terms on basis of of AIC can be somewhat permissive in its choice of termsm being roughly equivalent to choosing an F-cutoff of 2″, and thus one have to proceed manually with iterative application of the dropterm() and update() function.

There is no doubt that the current implementation inspires good practice in model checking and thinking, but there are times where I want to completely automate the process. An example is when I want to quickly select the minimal submodel for many tens of phenotypes.

Here is a function that is capable of doing this. The speed is comparable with stepAIC. Use it at your own risk.

backstepAIC.glm <- function(fmla, data, family, AIC.p.cut=0.05, verbose=TRUE, …){

## this will only work reliably on datasets without missing values (same issue with stepAIC)
data <- data[ , all.vars(fmla)]
data[ complete.cases(data), ]

dt  <- data.frame(Pr=1)   # to initiate the loop

while( max(dt$Pr, na.rm=T) > AIC.p.cut ){

fit <- eval( substitute( glm( fmla, data=data, family=family, …) ), parent.frame() )

if( length( coefficients(fit) ) == 0 ) break()
## intercept only model, cannot drop anymore term so exit while loop

dt <- dropterm(fit , test=”Chisq” )
if(verbose) cat(“Attempting to drop term”, rownames(dt)[ which.max(dt$Pr) ], “\n”)
nm <- setdiff( rownames(dt)[ -which.max(dt$Pr) ], “<none>” )
nm <- c( “1″, nm )

fmla <- as.formula(paste(as.character(fmla)[2], “~”, paste(nm, collapse= “+”)))

}

return( fit )
}

For Cox models, replace the model fitting line with this:

fit <- eval( substitute( coxph( fmla, data=data, x=T, y=T, … ) ), parent.frame() )