Volume 69, Issue 3, Supplement, Page S70, 1 November 2007
Predicting Treatment Outcomes using a Genetic Algorithm
Purpose/Objective(s)
To develop a genetic algorithm for variable selection in logistic regression models used in the statistical analysis of radiotherapy outcomes, based on physiological, biological, clinical, and dose-volume factors.
Materials/Methods
An outcome analysis tool, EUCLID [1], was developed to perform univariate and multivariate analysis of clinical studies with a large number of variables. A given outcome can be correlated to a combination of input factors using a logistic regression model. In this work, a genetic algorithm (GA) was developed to determine the regression model with the highest predictive power, i.e. the set of variables that best predicts the outcome. The fitness of a model is determined by a combination of two functions: one that rewards the predictive power of the model, measured either by the area under the receiver operating characteristic (ROC) curve or by the Spearman correlation coefficient between the model output and the observed outcome; and one that rewards the statistical significance of the selected variables, based on their p-values. In each generation, offspring are produced by applying crossover and mutation operators to the best individuals, ranked by fitness score. The algorithm was designed to run for different model orders, i.e. the number of variables included in the model, and the best model for each order is presented in the output. The algorithm was tested on a data set from a prospective IRB-approved clinical study aimed at understanding RT-induced lung injury. This data set was analyzed concurrently with EUCLID and with DREES [2], which used sequential forward selection together with bootstrap or leave-one-out cross-validation (LOO-CV) methods to create the model and ensure its robustness. The goal was to extract the best predictors of radiation pneumonitis (RP) from a pool of 20 clinical factors for 150 NSCLC patients treated with 3DCRT at Duke University Medical Center (Fig.).
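The selection loop described above can be illustrated with a minimal Python sketch. It is not the EUCLID implementation: every name in it (`auc`, `fit_logistic`, `make_fitness`, `crossover`, `mutate`, `select_variables`), the population size, the mutation rate, and the synthetic data are assumptions chosen for illustration. The fitness here uses the area under the ROC curve only; the term that rewards statistical significance is left as an optional, hypothetical `significance_reward` callable, since the abstract does not specify how the p-value function is defined or weighted.

```python
import math
import random


def auc(scores, labels):
    """Area under the ROC curve by pairwise comparison (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def fit_logistic(X, y, steps=300, lr=0.1):
    """Gradient-ascent logistic regression; returns [intercept, coef...]."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(steps):
        grad = [0.0] * len(w)
        for row, yi in zip(X, y):
            z = w[0] + sum(wj * xj for wj, xj in zip(w[1:], row))
            p = 1.0 / (1.0 + math.exp(-max(-30.0, min(30.0, z))))
            grad[0] += yi - p
            for j, xj in enumerate(row):
                grad[j + 1] += (yi - p) * xj
        w = [wj + lr * g / len(X) for wj, g in zip(w, grad)]
    return w


def make_fitness(X, y, significance_reward=None, weight=0.0):
    """Fitness = AUC of the fitted model + weight * significance term."""
    def fitness(subset):
        cols = [[row[j] for j in subset] for row in X]
        w = fit_logistic(cols, y)
        scores = [w[0] + sum(wj * xj for wj, xj in zip(w[1:], row))
                  for row in cols]
        value = auc(scores, y)
        if significance_reward is not None:  # hypothetical p-value hook
            value += weight * significance_reward(subset)
        return value
    return fitness


def crossover(a, b, k, rng):
    """Child draws k distinct variables from the union of both parents."""
    return tuple(sorted(rng.sample(sorted(set(a) | set(b)), k)))


def mutate(ind, n_vars, rate, rng):
    """Swap each variable for an unused one with probability `rate`."""
    ind = list(ind)
    for i in range(len(ind)):
        unused = [v for v in range(n_vars) if v not in ind]
        if unused and rng.random() < rate:
            ind[i] = rng.choice(unused)
    return tuple(sorted(ind))


def select_variables(fitness, n_vars, k, pop_size=20, generations=15,
                     n_parents=6, mutation_rate=0.2, seed=0):
    """Return (best subset, fitness) among models with exactly k variables."""
    rng = random.Random(seed)
    cache = {}

    def score(ind):
        if ind not in cache:
            cache[ind] = fitness(ind)
        return cache[ind]

    pop = [tuple(sorted(rng.sample(range(n_vars), k))) for _ in range(pop_size)]
    for _ in range(generations):
        # Keep the fittest distinct individuals (elitism), then breed them.
        parents = sorted(set(pop), key=lambda s: (-score(s), s))[:n_parents]
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.choice(parents), rng.choice(parents)
            child = crossover(a, b, k, rng)
            children.append(mutate(child, n_vars, mutation_rate, rng))
        pop = parents + children
    best = max(pop, key=score)
    return best, score(best)


if __name__ == "__main__":
    # Synthetic data (assumption): outcome depends only on variables 0 and 3.
    rng = random.Random(42)
    X = [[rng.gauss(0, 1) for _ in range(6)] for _ in range(80)]
    y = [1 if row[0] + row[3] + rng.gauss(0, 0.5) > 0 else 0 for row in X]
    fitness = make_fitness(X, y)
    for order in (1, 2, 3):  # one best model per model order
        print(order, select_variables(fitness, n_vars=6, k=order))
```

Running the search once per value of `k` mirrors the per-order output described above. In this sketch, individuals are sorted tuples of variable indices, so crossover and mutation always yield a valid model of the requested order, and keeping the parents in the next generation ensures the best fitness never decreases.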
Results
For model orders up to 4, all methods found the same set of variables to be the best predictors of RP. For higher model orders, the additional variables differed from one method to another, but the correlation of these variables with RP was not statistically significant (p > 0.05). The ranking of the model orders depended on the weight of the p-value function in the fitness score, and differed between the EUCLID GA and the DREES methods. However, all methods agreed that V30 (p = 0.008) and the presence or absence of chemotherapy treatment (p = 0.036) were the best predictors, in a model of order 2.
Conclusions
Genetic algorithms provide a robust and efficient method for selecting statistically significant predictor variables in logistic regression analysis of large clinical studies.
Supported in part by NIH grant R01 CA69579.
Author Disclosure: O. Gayou, None; S.K. Das, None; S. Zhou, None; L.B. Marks, None; D.S. Parda, None; M. Miften, None.
PII: S0360-3016(07)01310-7
doi:10.1016/j.ijrobp.2007.07.128
© 2007 Elsevier Inc. All rights reserved.