Estimates of Alpha/Beta (α/β) Ratios for Individual Late Rectal Toxicity Endpoints: An Analysis of the CHHiP Trial

Purpose: Changes in fraction size of external beam radiation therapy exert nonlinear effects on subsequent toxicity. Commonly described by the linear-quadratic model, fraction size sensitivity of normal tissues is expressed by the α/β ratio. We sought to study individual α/β ratios for different late rectal effects after prostate external beam radiation therapy.
Methods and Materials: The CHHiP trial (ISRCTN97182923) randomized men with nonmetastatic prostate cancer 1:1:1 to 74 Gy/37 fractions (Fr), 60 Gy/20 Fr, or 57 Gy/19 Fr. Patients in the study had full dosimetric data and zero baseline toxicity. Toxicity scales were amalgamated to 6 bowel endpoints: bleeding, diarrhea, pain, proctitis, sphincter control, and stricture. Lyman-Kutcher-Burman models with or without equivalent dose in 2 Gy/Fr correction were log-likelihood fitted by endpoint, estimating α/β ratios. The α/β ratio estimate sensitivity was assessed using sequential inclusion of dose modifying factors (DMFs): age, diabetes, hypertension, inflammatory bowel or diverticular disease (IBD/diverticular), and hemorrhoids. 95% confidence intervals (CIs) were bootstrapped. Likelihood ratio testing of 632 estimator log-likelihoods compared the models.
Results: Late rectal α/β ratio estimates (without DMF) ranged from bleeding (G1+ α/β = 1.6 Gy; 95% CI, 0.9-2.5 Gy) to sphincter control (G1+ α/β = 3.1 Gy; 95% CI, 1.4-9.1 Gy). Bowel pain modelled poorly (α/β, 3.6 Gy; 95% CI, 0.0-840 Gy). Inclusion of IBD/diverticular disease as a DMF significantly improved fits for stool frequency G2+ (P = .00041) and proctitis G1+ (P = .00046). However, the α/β ratios were similar in these no-DMF versus DMF models for both stool frequency G2+ (α/β 2.7 Gy vs 2.5 Gy) and proctitis G1+ (α/β 2.7 Gy vs 2.6 Gy). Frequency-weighted averaging of endpoint α/β ratios produced: G1+ α/β ratio = 2.4 Gy; G2+ α/β ratio = 2.3 Gy.
Conclusions: We estimated α/β ratios for several common late adverse effects of rectal radiation therapy. When comparing dose-fractionation schedules, we suggest using a late rectal α/β ratio ≤3 Gy.

funder and sponsor guidelines. Requests are via a standard proforma describing the nature of the proposed research and extent of data requirements. Data recipients are required to enter a formal data sharing agreement, which describes the conditions for release and requirements for data transfer, storage, archiving, publication, and Intellectual Property. Requests are reviewed by the Trial Management Group (TMG) in terms of scientific merit and ethical considerations, including patients' consent. Data sharing is undertaken if proposed projects have a sound scientific or patients' benefit rationale, as agreed by the TMG and approved by the Independent Data Monitoring and Steering Committee, as required. Restrictions relating to patients' confidentiality and consent will be limited by aggregating and anonymizing identifiable patients' data. Additionally, all indirect identifiers that could lead to deductive disclosures will be removed in line with Cancer Research UK data sharing guidelines.

Introduction
Moderately hypofractionated external beam radiation therapy (EBRT) for the curative treatment of nonmetastatic prostate cancer (PCa) has gained broad acceptance following reports of efficacy and safety from the CHHiP, PROFIT, and RTOG 0415 hypofractionation studies. [1][2][3] Each trial randomized between moderately hypofractionated and conventional dose-escalated EBRT regimens, and all showed noninferiority of the hypofractionated regimens for 5-year biochemical and clinical progression-free survival. A fourth study, HYPRO, unfortunately failed to establish superiority of a dose-escalated, hypofractionated schedule, which demonstrated increased toxicity. 4
Rectal toxicity endpoints are important late adverse effects of prostate EBRT. Models have been produced for many common individual rectal endpoints, such as bleeding, proctitis, stool frequency, and fecal incontinence. [5][6][7][8][9][10][11] These models incorporate dose-volume histogram (DVH)-derived values as dosimetric predictors. In the hypofractionation era, researchers have adjusted the rectal dose bins using the linear-quadratic model, 12 describing normal tissue fraction sensitivity by means of the α/β ratio. Commonly, a late rectal α/β = 3 Gy is assumed 13,14 to produce equivalent dose in 2 Gy fractions (EQD2) and to enable comparison with standard 2 Gy/fraction treatments. 12 Similarly, EQD2 correction has been used when summating brachytherapy and EBRT doses, with α/β = 3 to 5.4 Gy. [15][16][17] These EQD2-corrected comparisons of regimens are dependent on an accurate estimate of the α/β ratio. Researchers have previously provided human estimates for the α/β ratio of overall late rectal toxicity in the range 2.7 to 7.2 Gy. [18][19][20][21] However, individual rectal toxicity endpoints (eg, bleeding, urgency) are driven by different upstream pathophysiologic processes 22 and may thus have distinct sensitivity to fraction size, as manifest by the α/β ratio.
Although individual endpoint estimates have been produced for the central nervous system, 23 to our knowledge, such estimates have not previously been made for pelvic normal tissues.
Using data from a phase 3 trial of hypofractionated radiation therapy (RT), this study aims to estimate α/β ratios for individual rectal toxicity endpoints: bleeding, stool frequency, proctitis, sphincter control, and stricture or ulcer. It also aims to test whether such α/β ratio estimates are influenced by inclusion of other predictive clinical factors: age, diabetes, hypertension, inflammatory bowel disease (IBD) or diverticular disease, and hemorrhoids.

Methods and Materials
The CHHiP trial
The CHHiP trial (ISRCTN97182923) has previously been described in detail. 1,24,25 Briefly, 3216 men were recruited, all with histologically confirmed T1b-T3aN0M0 prostate adenocarcinoma, prostate-specific antigen ≤40 ng/mL, and risk of lymph node involvement <30%. Open-label randomization was performed 1:1:1 between conventional (74 Gy in 37 fractions [Fr] over 7.4 weeks), higher dose hypofractionated (60 Gy in 20 Fr over 4 weeks), or lower dose hypofractionated (57 Gy in 19 Fr over 3.8 weeks) EBRT. The primary endpoint of biochemical or clinical failure was met, with noninferiority of the 60 Gy/20 Fr regimen confirmed. 1 Ethics approval has been described previously. 1 The Institute of Cancer Research Clinical Trials and Statistics Unit (ICR-CTSU, London, UK) coordinated the study and managed the data used in this analysis.

Patient cohort and Digital Imaging and Communications in Medicine files
CHHiP trial patients who received all fractions of one of the protocol RT regimens were eligible for inclusion in this substudy. Those without centrally available Digital Imaging and Communications in Medicine (DICOM) data from computed tomography, structures, and dose cube were excluded. Non-DICOM treatment plan file types were converted to DICOM.

Rectal contouring and dose-volume histogram generation
The CHHiP trial protocol recommended, ideally, an empty rectum. Contouring for the rectum, as a solid structure, was "from the anus (usually at the level of the ischial tuberosities or 1 cm below the lower margin of the PTV whichever is more inferior) to the recto-sigmoid junction." 1 Quality assurance (ie, adherence to the CHHiP trial protocol specifications of rectal contour) was undertaken for the contoured rectums on all DICOM data sets obtained, by 1 of 5 trained observers. In particular, attention was paid to the inferior and superior extent of contour. Once the rectal contour was checked, and recontoured where necessary, the rectal DVH was recalculated for use in this study.

Endpoints
The CHHiP trial collected bowel toxicity information in the form of both clinician-reported outcomes 1 and patient-reported outcomes (PROs). 25 Clinician-reported outcomes were chosen, because PRO measures changed during the course of the trial. These sources were Radiation Therapy Oncology Group (RTOG) late rectal toxicity, 26 the Royal Marsden Hospital (RMH) scale, 27 and Late Effects Normal Tissue - Subjective, Objective & Management (LENT-SOM). 28 Only the RMH and LENT-SOM data were collected at registration (baseline) and before RT. All scales were collected for late rectal toxicity at 6-, 12-, 18-, 24-, 36-, 48-, and 60-month follow-up after the start of RT. The scales were merged into new amalgamated endpoints representing underlying separate symptomatic issues, using a methodology described previously. 29 Grading was simplified to grade 0 for no toxicity, grade 1 for toxicity not needing intervention, and grade 2 for any toxicity requiring intervention. The scores were dichotomized to consider: grade 0 versus grade 1 and grade 2 or greater (G1+ comparison); grade 0 and grade 1 versus grade 2 or greater (G2+ comparison). For bowel pain, sphincter control, and stricture/ulcer, grade ≥2 events were rare (<5%); therefore, only a G1+ comparison was performed. No attempt was made to amalgamate endpoints to generate G3+ models, owing both to the rarity of G3+ events and to the difficulty of unifying such events between scales.
For each endpoint, patients were excluded if any relevant toxicity was reported at baseline or before RT assessments, or if both assessments were missing. The point of this exclusion was to avoid those with pre-existing symptoms registering as having treatment-induced toxicity events during follow-up. Patients were further excluded for an endpoint if they were missing the relevant follow-up data at more than 3 of the 7 (>50%) late toxicity assessments. Toxicity events were scored for any relevant toxicity of sufficient grade at any time point (ie, worst toxicity). A full description of the endpoint generation process is provided in Appendix E1.

Generalized Lyman-Kutcher-Burman model
A generalized Lyman-Kutcher-Burman (LKB) model has been described previously for rectal α/β ratio estimation. 20 Dose modifying factors (DMFs) were incorporated as modulators of each individual patient's effective dose parameter (D_Eff), per prior work by Tucker et al. 30 The model is expressed as a definite integral:
$$\mathrm{NTCP} = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{t} e^{-x^{2}/2}\,dx \qquad (1)$$
where NTCP is the normal tissue complication probability. Furthermore:
$$t = \frac{D_{\mathrm{Eff}} \cdot e^{\delta \cdot \mathrm{DMF}} - TD_{50}}{m \cdot TD_{50}} \qquad (2)$$
Here, TD50 represents the tolerance dose for 50% toxicity, at the median (steepest) part of the NTCP dose response curve; m is a parameter inversely controlling the steepness at TD50. DMF is the dose modifying factor covariate, corresponding to either ones and zeros for binary risk factors or a positive integer for age; δ is the dose modifying coefficient, used to adjust TD50 in the presence of the risk factor specified by DMF (multiplying D_Eff by e^{δ·DMF} is equivalent to dividing TD50 by the same factor). For binary DMFs, the coefficient is for presence of the risk factor; for numerical DMFs (age only), it is evaluated on a per-unit basis. Note that a DMF covariate of zero will result in no change to the effective dose (D_Eff), which is defined by:
$$D_{\mathrm{Eff}} = \left( \sum_{i=1}^{z} v_i \cdot \mathrm{EQD2}_i^{1/n} \right)^{n} \qquad (3)$$
where n represents the relative seriality of a tissue endpoint dose response, with values toward 0 being more serial and toward 1 being more parallel 31 ; z is the number of dose bins, iterated by dose bin i; and v_i is the relative volume of an organ present in the dose bin i. EQD2_i is the EQD2 for dose bin i, which is given by:
$$\mathrm{EQD2}_i = D_i \cdot \frac{\alpha/\beta + d_i}{\alpha/\beta + 2} \qquad (4)$$
where D_i is the total dose in Gy to a given DVH dose bin i; d_i is the dose in Gy per fraction to a given dose bin (ie, D_i divided by number of fractions); and α/β (Gy) is the theoretical single fraction dose giving equal contribution for linear (α) and quadratic (β) components of the linear-quadratic formula. 12 This model is termed LKB-EQD2, or LKB-EQD2-DMF with the inclusion of a DMF in Equation 2. The LKB-NoEQD2 model without EQD2 correction uses Equations 1 and 2 (without DMF inclusion), but substitutes physical dose bin dose for EQD2_i in Equation 3.
This LKB-NoEQD2 model was fitted separately for patients receiving 2 Gy per fraction (74 Gy in 37 Fr) and for patients receiving 3 Gy per fraction (60 Gy in 20 Fr and 57 Gy in 19 Fr).
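As an illustration of how the model components above fit together, the following minimal Python sketch evaluates an EQD2-corrected LKB NTCP for one patient. It is not the study's MATLAB code. The function names, the differential-DVH input format, and the exact placement of the dose modifying term (multiplying the effective dose by e raised to coefficient × covariate) are assumptions made for illustration.

```python
import math


def eqd2(total_dose, n_fractions, alpha_beta):
    """Equivalent dose in 2 Gy fractions for one dose bin (linear-quadratic model)."""
    dose_per_fraction = total_dose / n_fractions
    return total_dose * (alpha_beta + dose_per_fraction) / (alpha_beta + 2.0)


def effective_dose(dvh, n_fractions, n, alpha_beta):
    """Generalized effective dose from a differential DVH.

    dvh: list of (relative_volume, total_dose_gy) bins; volumes assumed to sum to 1.
    n:   volume-effect parameter (toward 0 = serial, toward 1 = parallel).
    """
    total = sum(v * eqd2(d, n_fractions, alpha_beta) ** (1.0 / n) for v, d in dvh)
    return total ** n


def lkb_ntcp(dvh, n_fractions, n, m, td50, alpha_beta, delta=0.0, dmf=0.0):
    """Probit NTCP; delta and dmf are the assumed dose modifying coefficient and covariate."""
    d_eff = effective_dose(dvh, n_fractions, n, alpha_beta)
    t = (d_eff * math.exp(delta * dmf) - td50) / (m * td50)
    # Standard normal cumulative distribution evaluated at t
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))


# Hypothetical two-bin rectal DVH for a 20-fraction plan (illustrative values only)
example_dvh = [(0.8, 30.0), (0.2, 58.0)]
example_ntcp = lkb_ntcp(example_dvh, 20, n=0.13, m=0.21, td50=74.0, alpha_beta=3.0)
```

With dmf left at zero the exponential term equals 1, so the sketch reduces to the model without a dose modifying factor.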

Initial grid search
For each model, initial fitting was done using the grid search method, as described previously. 7 Each unknown parameter was searched on a grid with dimensionality equal to the number of fit parameters (Table E1). LKB-EQD2 models with fixed α/β were also produced, using the same parameter grid as those with fitted α/β, but fixing the α/β to 3 or 4.8 Gy, per prior estimates. 19,20 Model performance was assessed in 2 ways. First, the naive performance was assessed by calculating a log likelihood sum. Better model performance will produce a less negative log likelihood sum. The likelihood for patient j, with observed outcome y_j (1 for toxicity, 0 for no toxicity), is:
$$\mathrm{Likelihood}_j = \mathrm{NTCP}_j^{\,y_j} \cdot (1 - \mathrm{NTCP}_j)^{1 - y_j} \qquad (5)$$
and the log likelihood sum was calculated as:
$$\mathrm{Performance} = \sum_{j=1}^{c} \ln(\mathrm{Likelihood}_j) \qquad (6)$$
where c = number of patients (with j as iterator through such patients).
The model parameter values generating the ten least negative performance metrics were recorded at the end of the grid search. The best (least negative) of these was noted as the naive model performance for later use in Equation 8.
The second action at each grid step was to assess performance of 2000 bootstraps, drawn with replacement, with unique bootstraps for each endpoint. The bootstrap performance was also assessed with Equation 6. At the end of the grid search, the parameters giving the 10 least negative performance metrics for each bootstrap were recorded. The parameters resulting in best bootstrap performance were noted, so that these could be used later for out-of-the-bag prediction in Equation 7. 32
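The performance metric and resampling described above can be sketched as follows. This is a minimal illustration in Python, not the study's MATLAB implementation. The function names, the clamping constant eps, and the seed handling are assumptions introduced here.

```python
import math
import random


def log_likelihood_sum(ntcp_values, outcomes, eps=1e-12):
    """Sum of per-patient log likelihoods for binary outcomes (less negative = better)."""
    total = 0.0
    for p, y in zip(ntcp_values, outcomes):
        p = min(max(p, eps), 1.0 - eps)  # avoid log(0) for extreme predictions
        total += math.log(p) if y else math.log(1.0 - p)
    return total


def bootstrap_indices(n_patients, n_boot, seed=0):
    """Draw n_boot bootstrap samples of patient indices, with replacement."""
    rng = random.Random(seed)
    return [[rng.randrange(n_patients) for _ in range(n_patients)]
            for _ in range(n_boot)]


def out_of_bag(boots, n_patients):
    """For each patient, list the bootstraps in which that patient was NOT drawn."""
    members = [set(b) for b in boots]
    return [[k for k, m in enumerate(members) if j not in m]
            for j in range(n_patients)]
```

In a grid search, log_likelihood_sum would be evaluated at every grid position for the full cohort and for each bootstrap sample, and the out-of-bag lists identify which fitted bootstrap models may be used to predict each patient without having seen them.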

Second-stage search
To account for the known sensitivity of fitting algorithms to initial starting parameters and hence to improve model performance, 33 a secondary optimization search for parameter values was undertaken. The values of n, m, TD50, α/β, and DMFs producing the 10 best performance metrics (by Equation 6) were used as the initial parameters in a constrained Nelder-Mead simplex algorithm search 34 to determine whether further improvement in performance could be found (ie, for each endpoint: [1 naive model + 2000 bootstraps] × 10 searches = 20,010 algorithm searches). This algorithm was run with constraints: n = 0.01 to 10; m = 0.01 to 10; TD50 = 0.01 to 1000 Gy.
Where freely fitted, α/β was searched in the space 0.001 to 1000 Gy. The dose modifying coefficient was searched in the space −10 to 10, which, as the exponent of the natural base e, corresponds to a dose multiplier range of 4.54 × 10^−5 to 22,026. This wide bounding of all fit parameters was chosen to prevent bootstrap distributions being inappropriately constrained, which would bias the coverage of the nonparametric 95% confidence interval. For the naive likelihood and each bootstrap, the final best model parameters were those resulting in best performance (by Equation 6) from any of the grid search positions or any of the subsequent 10 Nelder-Mead simplex algorithm searches.

Estimating test performance and model comparison
A model comprising more free parameters is always likely to improve naive likelihood performance, but this can be due to overfitting. 35 To address this difficulty, the 632 bootstrap estimator was used as an unbiased estimator of test performance. 36 It balances out the overoptimistic naive likelihood (fitted on the population) against the negatively biased out-of-the-bag bootstrap estimate. We preferred the 632 over the 632+ bootstrap estimator, owing to faster calculation and the low risk of near-perfect prediction with a relatively simple model. 32 The first step calculated the out-of-the-bag (OOB) performance for the model:
$$\mathrm{OOB\ Performance} = \sum_{j=1}^{c} \frac{1}{z} \sum_{boot=1}^{z} \ln(\mathrm{Predicted\ Likelihood}_{j,boot}) \qquad (7)$$
where c is the total number of patients (iterated by j), and z is the number of bootstraps not containing patient j (iterated by boot). The predicted likelihood is derived by inserting the predicted NTCP into Equation 5.
The 632 estimator was then calculated 32 :
$$632\ \mathrm{Estimator} = 0.368 \cdot \mathrm{Naive\ Performance} + 0.632 \cdot \mathrm{OOB\ Performance} \qquad (8)$$
Models were compared by means of the likelihood ratio test of the 632 estimators. First, we tested whether the LKB-EQD2 model with free-fitted α/β ratio had a significantly better 632 estimator than the model with the α/β ratio fixed at either of 2 reported literature values: α/β = 3 Gy or 4.8 Gy. 19,20 Second, we examined for significant improvement from LKB-EQD2 to LKB-EQD2-DMF, which was sequentially tested with each of the DMFs. Tests were planned only where log likelihood improvement occurred; with approximately 50 tests anticipated, a penalized P value of .001 was used for interpretation of significance. 37 Parameter estimates were obtained at the 50th centile of the bootstrap distribution; 95% bootstrap confidence intervals (CIs) for the optimal model parameter values were obtained as the 2.5th and 97.5th centiles of the corresponding parameter values producing the best summed log likelihood performance metric for each bootstrap.
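The model comparison steps above can be sketched in a few lines of Python. This is a hedged illustration rather than the study's code. The function names are hypothetical, the likelihood ratio test is assumed to have 1 degree of freedom (one extra fitted parameter, either α/β or a single dose modifying coefficient), and the percentile routine uses linear interpolation, which may differ from the centile method used in the study.

```python
import math


def estimator_632(naive_performance, oob_performance):
    """Weighted blend of naive and out-of-the-bag log likelihood performance."""
    return 0.368 * naive_performance + 0.632 * oob_performance


def lrt_p_value(ll_simple, ll_complex):
    """Likelihood ratio test with 1 degree of freedom (assumed).

    Returns None when the more complex model does not improve on the simpler
    one, mirroring the 'worse fit' case in which no test is performed.
    """
    statistic = 2.0 * (ll_complex - ll_simple)
    if statistic <= 0.0:
        return None
    # Survival function of a chi-square variable with 1 degree of freedom
    return math.erfc(math.sqrt(statistic / 2.0))


def percentile(values, q):
    """q-th percentile (0-100) with linear interpolation between order statistics."""
    xs = sorted(values)
    if not xs:
        raise ValueError("values must not be empty")
    pos = (q / 100.0) * (len(xs) - 1)
    lo, hi = math.floor(pos), math.ceil(pos)
    frac = pos - lo
    return xs[lo] * (1.0 - frac) + xs[hi] * frac


def bootstrap_summary(values):
    """Median estimate with a nonparametric 95% interval from bootstrap fits."""
    return percentile(values, 2.5), percentile(values, 50.0), percentile(values, 97.5)
```

A returned P value would then be compared against the adjusted threshold of .001 rather than the conventional .05.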

Graphical outputs of calibration
Model calibration was fitted as a logistic regression of predicted NTCP values for each patient as a single predictor against observed binary outcomes (toxicity or no toxicity). The fitted model was then displayed graphically against ideal (perfect) prediction, termed the calibration curve. Furthermore, binned calibration plots were examined, with patients grouped into deciles of predicted risk: average bin NTCP plotted against observed bin toxicity proportion.
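The binned calibration step can be illustrated with a short Python sketch. The function name and the equal-count binning rule are assumptions made for illustration; the study's plots were produced in MATLAB and may have handled ties or uneven bin sizes differently.

```python
def binned_calibration(predicted, observed, n_bins=10):
    """Group patients into bins of ascending predicted risk.

    Returns a list of (mean predicted NTCP, observed toxicity proportion),
    one tuple per non-empty bin, suitable for plotting against the identity line.
    """
    order = sorted(range(len(predicted)), key=lambda i: predicted[i])
    bins = []
    for b in range(n_bins):
        start = b * len(order) // n_bins
        stop = (b + 1) * len(order) // n_bins
        idx = order[start:stop]
        if not idx:
            continue  # fewer patients than bins
        mean_pred = sum(predicted[i] for i in idx) / len(idx)
        obs_rate = sum(observed[i] for i in idx) / len(idx)
        bins.append((mean_pred, obs_rate))
    return bins
```

For a well calibrated model the returned points lie close to the line where mean predicted NTCP equals the observed proportion, and the bins are spread along that line rather than clustered.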

Software
Processing of trial data into the endpoints used for this study was done with Stata (version 15; Statacorp). VODCA (version 5.4.1; Medical Software Solutions) was used to convert non-DICOM data to DICOM and for the checking of rectal contours. MATLAB (version 2018b; MathWorks) was used to import DVH data from DICOM files and for all modeling using custom scripts. Nelder-Mead simplex algorithm searches were performed with a modified bounded version of fminsearch (fminsearchbnd, version 1.4.0.0). 38 Tables were formatted in Excel 2019 and Word 2019 (Microsoft). All plots were produced in MATLAB.

Results
Two thousand two hundred fifteen patients from the CHHiP trial had appropriate data for this analysis. Figure 1 is a CONSORT-style flow diagram accounting for all patients who were originally randomized into the CHHiP study and their reasons for noninclusion in this analysis. Key relevant baseline and treatment characteristics for the included patients are shown in Table 1, which are similar to those in the CHHiP trial as a whole. These data indicate that patients in this study are representative of the whole trial cohort. The cumulative rectal DVH curves for all patients, separated by fractionation arm, are shown in Appendix E2. A summary of the number of patients meeting requirements (≥50% follow-up form completion) for each endpoint modelled is shown in Table 2, with the proportion of patients expressing toxicity ranging from 3.6% for stricture/ulcer G1+ (79/2206) to 38.1% for stool frequency G1+ (771/2025). The influence of excluding patients with baseline toxicity on categorical DMF proportions is examined in Table E2. For some endpoints, patients with DMF present were overrepresented in those excluded for baseline toxicity versus those included in the study: IBD/diverticular disease and both rectal bleeding G1+ and G2+; pelvic surgery and stricture/ulcer G1+; hemorrhoids and rectal bleeding G1+ and G2+, frequency G1+ and G2+, pain G1+, proctitis G1+ and G2+.
Table 3 (upper 2 sections) shows parameter estimates of n, m, and TD50 for fits of the LKB-NoEQD2 model to 2 groups: 74 Gy only or 57 and 60 Gy combined. Each endpoint is presented separately. Table 3 then shows LKB-EQD2 model fits for all patients combined, across the same endpoints, including estimates for the α/β ratio. We note that the α/β ratio estimates for most endpoints were <3 Gy, with the upper bound of the 95% CI for rectal bleeding G1+ being <3 Gy. The 95% CI for pain G1+ was extremely wide (α/β = 0.0-840 Gy), suggesting a poor fit for this endpoint (ie, limited dose dependency).
Table 3 also shows fits for the LKB-EQD2 model, with an α/β ratio fixed at 3 and 4.8 Gy. The P values for likelihood ratio test comparison between the LKB-EQD2 model (unfixed α/β) and the 2 fixed α/β models are shown. In many cases, the less flexible model (LKB-EQD2 with fixed α/β ratio) had a better fit (by 632 estimator), implying overfitting and making likelihood ratio testing inappropriate. The LKB-EQD2 model with free α/β ratio was significantly better than the model with fixed α/β 4.8 Gy for rectal bleeding G1+ (P = .00032). Other comparisons, in which the LKB-EQD2 model with fitted α/β ratio was better, did not meet the adjusted significance threshold.
The effect on model parameters of sequential inclusion of each DMF is reported in Table 4. For each endpoint, the LKB-EQD2 model results without inclusion of DMF are reproduced in the first row for ease of comparison. Where the goodness of fit (as assessed with the 632 estimator) was improved with inclusion of DMF, P values for likelihood ratio testing of the LKB-EQD2-DMF model against the LKB-EQD2 model are presented. Only 2 LKB-EQD2-DMF models improved on LKB-EQD2, by adjusted significance: IBD/diverticular disease for both stool frequency G2+ (DMF = 1.37; 95% CI, 1.13-1.82; P = .00041) and proctitis G1+ (DMF = 1.27; 95% CI, 1.10-1.58; P = .00046). In both cases, α/β ratio estimates of the LKB-EQD2 versus LKB-EQD2-DMF fits did not differ by a clinically relevant margin: stool frequency G2+ (2.7 vs 2.5 Gy), proctitis G1+ (2.7 vs 2.6 Gy). Although inclusion of other DMFs did not meet adjusted significance for model fit improvement, it can be seen in Table 4 that any differences between LKB-EQD2-DMF model and LKB-EQD2 model α/β ratio estimates are not clinically meaningful.
The calibration curve and binned calibration plot for the rectal bleeding G1+ LKB-EQD2 model are shown in Figure 2. Note that this is a well calibrated example. Calibration curves and binned calibration plots are presented for the LKB-EQD2 model fitted to each endpoint in Appendix E3 (Figs. E1-E16). The best calibrated models are those with the higher event rates (rectal bleeding G1+, stool frequency G1+, proctitis G1+). For those with lowest event rates (pain G1+, stricture/ulcer G1+), the calibration bin separation is less pronounced. Similar plots for the LKB-EQD2-DMF model, where it provided a statistically significant improvement in fit (IBD/diverticular disease for stool frequency G2+ and proctitis G1+), are presented in Appendix E4 (Figs. E17-E20). It can be seen that DMF inclusions cause higher decile risk bins to achieve better separation from other bins, compared with the equivalent LKB-EQD2 models without DMF (Figs. E6, E10).
One overall late rectal α/β ratio for use in the comparison of expected late rectal side effects between differing dose-fractionation schedules is desirable. The frequency-weighted average for modelled late rectal G1+ events (excluding pain because of its poor fit) was α/β = 2.4 Gy, and the equivalent for G2+ events was α/β = 2.3 Gy. The calculation of these estimates is shown in Table E3. Unfortunately, no transformation was found to normalize the highly positively skewed bootstrapped α/β ratio 95% CIs, meaning that pooling standard errors for a unified 95% CI is not appropriate. 39 We would advise caution in the application of any single figure because, as demonstrated, the true fraction size sensitivity may differ between endpoints.

Discussion
In this study, we have used data from a large phase 3 trial of moderately hypofractionated RT for nonmetastatic PCa.
Through fitting an EQD2-corrected LKB model, estimates of the relative fraction size sensitivity (expressed as α/β ratio) for various clinician-reported late rectal endpoints have been produced.
Our α/β ratio estimates are generally lower than previously published estimates of the late rectal α/β ratio in humans. Brenner estimated late rectal RTOG G2+ α/β ratio = 5.4 Gy (95% CI, 3.9-6.9 Gy) using the proportions of patients experiencing toxicity from 8 dose-fractionation schedules in PCa EBRT studies in the United States and Japan. 18
Regarding the components of the traditional LKB model (n, m, TD50), it is reassuring that the LKB-NoEQD2 estimates for conventionally fractionated patients are similar to those previously reported for individual rectal endpoints. 7,[40][41][42] Estimates from these cohorts for bleeding, stool frequency, and proctitis are compared with our data in Table E4. The landmark QUANTEC study conducted a meta-analysis of LKB parameters from 4 of these studies, examining either G2+ rectal bleeding or G2+ late toxicity. 43 Comparing our G2+ rectal bleeding LKB-NoEQD2 values for 74-Gy patients versus these QUANTEC meta-analysis values, we see fairly similar findings: n = 0.13 (95% CI, 0.01-0.42) versus 0.09 (95% CI, 0.04-0.14); m = 0.21 (95% CI, 0.06-0.43) versus 0.13 (95% CI, 0.10-0.17); and TD50 = 74.0 Gy (95% CI, 67.2-96.6) versus 76.9 Gy (95% CI, 73.7-80.1). Separately, we note that our models for pain produced wide CIs (eg, LKB-EQD2 α/β ratio estimate, 3.6 Gy; 95% CI, 0.01-840), suggestive of poor model fit for this endpoint. This is perhaps expected, given the relative subjectivity of pain.
Strengths of this study are drawn from the nature of the recorded data. The CHHiP trial is the largest study of hypofractionated RT for PCa, with two thirds of patients' data being used for this analysis. We have included only patients reporting zero baseline toxicity, to reduce possible pre-existent toxicity noise. Furthermore, we have undertaken data quality assurance by checking every rectal contour for protocol adherence and recalculating DVHs. This large, clean sample, combined with multiple dose-fractionation regimens, has permitted α/β ratio estimation with tight CIs and good calibration for more frequently occurring endpoints, without the need to fix any of the parameters when modeling, as done previously. 19 This study has also been aided by modern computing power facilitating the use of computationally intensive bootstrapping techniques. These techniques have facilitated nested model comparison using bootstrap-dependent estimates of test performance (632 estimate), reducing the potential influence of overfitting.
Limitations must also be considered, starting with the modeling approach itself. The LKB model is a traditional parametric method for the fitting of RT data, and more recent machine learning and artificial intelligence type modeling methodologies have been applied. 44 The model does, however, facilitate fitting of data, with and without EQD2 correction, to estimate endpoint α/β ratios. Future toxicity modeling work with newer methodologies could benefit from these α/β ratio estimates, when using the linear-quadratic model to rescale DVH data predictors from disparate dose-fractionation regimens.
For the DMF coefficient estimates, it must be remembered that these have been estimated on cohorts in which those with baseline toxicity were excluded. Although the risk attributable to RT is hopefully more closely approximated, the absolute risk could be higher for those with a DMF for which disproportionately more patients were excluded for baseline toxicity (eg, hemorrhoids and rectal bleeding G1+; refer to Table E2). An additional limitation is the use of CT-planned doses in this study, because interfractional motion of the rectum has been demonstrated during prostate RT. 45 We acknowledge that the endpoints modeled here are unlikely to recur in future trials, because of the amalgamation of multiple scales. This was a pragmatic choice based on the toxicity scales available, so there would be benefit to confirmatory studies with modern clinician-reported scales (eg, Common Terminology Criteria for Adverse Events) or patient-reported scales (eg, EPIC). 46 Finally, despite the use of out-of-the-bag techniques, the data are from a single study, and future validation on another hypofractionated prostate RT data set would be desirable.
Table 4. Model fits for the sequential inclusion of each dose modifying factor, including the 632 estimator for model performance. Each DMF model is compared against the LKB-EQD2 (no DMF) model for the same endpoint by likelihood ratio test. Note that "worse fit" implies that the more complicated LKB-EQD2-DMF has a worse 632 estimator fit than the simpler LKB-EQD2 (no DMF) model, implying overfitting and making likelihood ratio testing inappropriate.
* Bold P values are significant at adjusted P < .001.
It is worth examining the α/β ratio assumptions (Table E5) and subsequent toxicity outcomes (Table E6) of the published phase 3 hypofractionation trials. The CHHiP trial assumed a late rectal α/β ratio of 3 Gy and isoeffective design, with the 60- and 57-Gy arms reflecting uncertainty in the prostate α/β ratio (assumed α/β of 2.5 Gy and 1.5 Gy, respectively). The 60- and 57-Gy arms both showed nonsignificantly reduced cumulative rectal grade 2+ toxicity by 5 years (11.9% and 11.3% vs 13.7% for the control arm), with the 60-Gy arm being shown to be noninferior for disease control. 1 PROFIT assumed late rectal α/β ratio = 3 to 5 Gy with isoeffective design (prostate α/β ratio, 1-3 Gy), achieving noninferior disease control with reduced late grade 2+ rectal toxicity in the test arm (8.9% vs 13.9%). 2 RTOG 0415 assumed both tumor and late rectal α/β = 3 Gy, with the trial design escalating EQD2 to both. 3 The trial achieved noninferior disease control with hypofractionation. Given the rectal dose escalation, the increased G2+ rectal toxicity in the hypofractionated arm (22.4% vs 14.0%) is not surprising. The HYPRO trial adopted an isotoxic design, assuming the highest α/β ratio for late rectal toxicity (α/β = 4-6 Gy). Unfortunately, this study demonstrated increased late G2+ rectal toxicity (21.9% vs 17.7%), without superior disease control. It is worth noting that HYPRO is the only phase 3 moderately hypofractionated study in which the relative test versus control late rectal toxicity was worse than anticipated in the trial design, most likely because of the higher assumed rectal α/β ratio and therefore dose delivered to the test arm.
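To illustrate why the assumed late rectal α/β ratio matters in trial design, the following Python sketch converts the three CHHiP prescription schedules to EQD2 under different assumed α/β values. It is a worked example only: it treats the full prescription dose as if it were delivered to rectal tissue, which is a worst-case simplification, and the α/β values of 2.4, 3, and 5 Gy are chosen here for illustration.

```python
def eqd2(total_dose, n_fractions, alpha_beta):
    """Equivalent dose in 2 Gy fractions under the linear-quadratic model."""
    dose_per_fraction = total_dose / n_fractions
    return total_dose * (alpha_beta + dose_per_fraction) / (alpha_beta + 2.0)


# CHHiP prescription schedules: (total dose in Gy, number of fractions)
SCHEDULES = {
    "74 Gy/37 Fr": (74.0, 37),
    "60 Gy/20 Fr": (60.0, 20),
    "57 Gy/19 Fr": (57.0, 19),
}


def compare_schedules(alpha_beta_values=(2.4, 3.0, 5.0)):
    """EQD2 (Gy, rounded to 1 decimal) per schedule and assumed alpha/beta."""
    return {
        name: {ab: round(eqd2(dose, fractions, ab), 1) for ab in alpha_beta_values}
        for name, (dose, fractions) in SCHEDULES.items()
    }


if __name__ == "__main__":
    for name, row in compare_schedules().items():
        print(name, row)
```

Under these assumptions, 60 Gy in 20 Fr corresponds to an EQD2 of 72.0 Gy at α/β = 3 Gy but about 68.6 Gy at α/β = 5 Gy, so a higher assumed α/β makes a hypofractionated schedule appear less toxic relative to 74 Gy in 2 Gy fractions.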
Both large phase 3 randomized trials of prostate ultrahypofractionation, PACE-B 47 and HYPO-RT-PC, 48 have assumed a late rectal α/β of 3 Gy. The HYPO-RT-PC trial showed isoeffective cumulative grade 2 or worse late RTOG rectal toxicity for both arms: 42.7 Gy in 7 fractions (9.5%) and 78 Gy in 39 fractions (9.7%). 48 The QUANTEC study on rectal toxicity also recommended dose adjustment by an α/β ratio of 3 Gy, 43 an opinion that our data support. Corrected for multiple testing, our LKB-EQD2 models with freely fitted α/β ratios did not significantly outperform the same model with fixed α/β = 3 Gy. We do note that the upper bound of the 95% CI for rectal bleeding G1+ was less than 3 Gy and that the results were close to corrected significance. This is perhaps worth noting, given that the randomized ProtecT trial showed bloody stools to be the most common patient-reported adverse event after RT compared with radical prostatectomy, although the long-term effects on bowel habits and bother were minimal. 49 Future studies might use individual patient data-level analysis (accounting for baseline toxicity and dose distributions) of late toxicity from HYPO-RT-PC and, once released, PACE-B, 47 to more definitively confirm applicability of the LQ model to late toxicity in ultrahypofractionation, an area of some debate. 50 It is possible that improving RT delivery techniques could lower rectal doses to less than the level at which fraction size sensitivity meaningfully influences toxicity.

Conclusion
To our knowledge, this study is the first to provide α/β ratio estimates for individual late rectal toxicity endpoints seen after hypofractionated external beam RT for prostate cancer. Although symptom endpoints can occur concurrently, for G1+ rectal bleeding, one of the most objective endpoints, the α/β ratio 95% CI upper bound was <3 Gy. For G1+ endpoints, the frequency-weighted pooled estimate was late rectal α/β ratio = 2.4 Gy. However, adjusting for multiple testing, no significant improvement from an LKB-EQD2 model with α/β = 3 Gy was demonstrated. Future individual patient data-level analysis on ultrahypofractionated trials is desirable, but for now we suggest that a late rectal α/β ratio of no more than 3 Gy be used when comparing dose-fractionation regimens.