University of Houston University of Houston-Clear Lake ISSO Annual Report Y2002pp. 13-20
Evaluation of Cox Proportional Hazards Models for Time to Onset of Decompression Sickness in Hypobaric Environments
Laura A. Thompson (UHCL), Raj S. Chhikara (UHCL), and Johnny Conkin (NASA-JSC)
Abstract
Cox semi-parametric proportional hazards models are fitted to a subset of data taken from
the Hypobaric Decompression Sickness Databank.1 The data bank contains records
on the time to decompression sickness (DCS) for over 130,000 person-exposures to high
altitude in chamber tests. A research data subset contains 1,321 records, with 87 percent
right-censoring, and includes recent experimental tests on DCS made available from NASA. A
Cox model stratified on the quartiles of the final ambient pressure at altitude described
the data better than did Cox models without this stratification. Models were assessed
statistically and validated. The Cox model and its subsequent evaluation presented here
serve as guidelines for future analyses of DCS survival data.
ALTHOUGH THE SPACE SHUTTLE IS PRESSURIZED AT 14.7 pounds per square inch absolute (psia) and the airlock on the International Space Station (ISS) at 10.2 psia, crew members are exposed to a greatly reduced absolute pressure during extravehicular activity (EVA) because their spacesuits are pressurized to about 4.3 psia.2 Astronauts risk decompression sickness (DCS) each time they perform EVAs. The risk can be decreased or prevented through sufficient denitrogenation prior to EVA. Denitrogenation procedures involve prebreathing pure oxygen or oxygen-enriched mixtures. Modeling the time to DCS in hypobaric situations similar to those experienced by astronauts on EVAs is the focus of our study.
We build on previous analyses of survival data on DCS (e.g., Conkin et al., 1996; 1998; and Conkin and Powell, 2001).3-5 These studies primarily evaluated parametric survival models, which require a specific parametric form for the hazard function. The Cox semi-parametric model does not assume a particular form for the hazard function. However, the Cox model assumes that hazard rates over time are proportional across different covariate combinations that are independent of time. We show how to assess this assumption for our data, and we demonstrate one method for modifying the model to account for non-proportionality across one of the covariates in our data. We illustrate additional evaluation and validation techniques for the Cox model applied to DCS data.
Cox Proportional Hazards Model
Cox proposed a semi-parametric model for the hazard function that allows the addition of
explanatory variables, or covariates, but keeps the baseline hazard as an arbitrary,
unspecified, nonnegative functional of time.6 The hazard function for
fixed-time covariates, x, is then
| l(t; x) = 0(t)exp(x¢) | (1) |
The baseline hazard 0 in (1) corresponds to the hazard function when all covariates equal zero. Because the baseline hazard is not assumed to be of a parametric form, Coxs model is referred to as a semi-parametric model for the hazard function. The survival function corresponding to (1) is7
| |
(2) |
One of the restrictions to using the Cox model with time-fixed covariates is its proportional hazards (PH) property. It follows from (1) that the hazard ratio between two sets of covariates is constant over time, because the common baseline hazard function cancels out in the ratio of the two hazards. Thus, for fixed-time covariates, the exponent of a coefficient describes the relative change in the baseline hazard due to that covariate.
To apply Coxs model to DCS data, we analyzed a subset of data from NASAs Hypobaric Decompression Sickness Databank (HDSD).1 The subset contained observations from recent experimental tests on DCS from NASAto develop safe pre-breathe procedures for EVA. This subset of data has been analyzed previously.4,8,9 Cox models studied here differ from the previous models since a different set of covariates is included and stratification on different covariates is considered. Furthermore, we use an extensive set of diagnostics to evaluate model fits. We have shown that the models currently fitted account for more variability in the data than do previously fitted models.10
Data
Altitude test exposure records used for this analysis are from 1,321 individuals who
participated in chamber tests at Johnson Space Center and who were selected from the HDSD
for an analysis in Conkin et al.4 During a test, subjects were monitored for
Doppler-detectable gas bubbles, and the test was terminated for a subject either upon
reported incidence of a DCS symptom or, if the subject did not experience DCS, when the
test period was over. Thus, an observation for a subject was either his or her reported
DCS onset time or the test termination time, whichever came first. If it was the test
termination time, the observation was considered Type I right-censored.7 The
data set contained 1,154 right-censored observations, or a little over 87 percent of the
total records, and the remaining 167 observations had DCS symptoms.
In addition, a number of explanatory variables were measured. The first variable, EXER, is an indicator variable showing whether the subject exercised repetitively during his or her time at altitude. Repetitive exercise simulates vigorous work activity required on an actual EVA. The second variable, P2, is the final ambient pressure reached in the exposure. PN2360 is the calculated partial pressure of nitrogen in a designated theoretical tissue compartment after prebreathing oxygen-enriched mixture prior to ascent in the chamber. The calculation is obtained from a nitrogen-elimination model detailed in Conkin et al.,3 where a theoretical compartment with a half-time elimination of 360 minutes was used. The last variable, TR360, is a ratio of PN2360 to P2. The ratio represents a decompression stress index. In the models we consider, we do not need to use both PN2360 and TR360 as covariates. We only use TR360 as it is a unit-less index of decompression stress, and can be used to make direct comparisons.
Estimated Hazard Functions
We explore the influence of the explanatory variables on the time to onset of DCS. A
hazard function estimate shows how the instantaneous risk of DCS changes by hour. For each
of the explanatory variables, we plotted a smoothed estimated hazard function (see Hess et
al.11 and Collett12). For the plots, we categorized the continuous
variables P2 and TR360 into quartiles and split the binary variable EXER into its two
categories, and then computed a hazard estimate for each category. The curves are plotted
in Fig. 1.

Figure 1. Estimated hazard functions by EXER, TR360 quartiles, and P2 quartiles
The estimated hazard for EXER = 1 initially increases much more rapidly than that for EXER = 0 and has its peak between one and two hours. The hazard rate declines after this and eventually is below the rate for EXER = 0 after about three hours. The hazard estimates for the TR360 categories display the order that is expected based on their implied risk. The estimates for the first two categories starting from the smallest category are mostly flat. The hazard estimates for the two highest TR360 categories increase initially, then decline. Thus, risk of DCS is clearly related to TR360, but it may gradually have less impact over time. The declining pattern appears across exercise conditions, as well. But, the hazard rate estimate for a particular TR360 category differs depending on whether or not exercise is done at altitude (lower right panel). The hazard is much greater in the initial hours when people exercise. Only the last two TR360 categories are shown in the figure for clarity, as the hazard functions are all very small for lower TR360. However, the kernel density estimate of the hazard rate is actually higher for EXER = 0 than for EXER = 1 when TR360 £ 1.31, at almost all time points. Thus, there may be an interaction between EXER and TR360 in their influence on time to DCS. Finally, the hazard estimates for P2 clearly show that the most hazardous category for P2 is between 4.3 and 6.0 psia.
Thus, Fig. 1 shows some evidence against the PH assumption. That is, it appears that the covariates may not act persistently on the hazard in that the estimated hazard rate for high risk groups (e.g., EXER = 1 or high TR360) is not at a roughly constant distance across time from the hazard rates for other groups and sometimes falls lower than that for lower risk groups. Also, the converging hazard estimates for categories of P2 and EXER by TR360 are not consistent with a PH assumption. We examine the assessment of PH in more detail in the next section, where we fit several Cox models and assess their fit.
Initial Fit of a Cox PH Model
We fit the Cox PH model in (1) to the DCS data using partial maximum likelihood estimation
(see Thompson et al.10). We included the
covariates discussed above, with the exception of PN2360. We also include an interaction
term in TR360 and EXER. After checking for appropriate functional forms for the continuous
covariates by substituting restricted cubic splines for each in the model, we concluded
that a quadratic term in P2 was appropriate.10 Table 1 shows the maximum
partial likelihood estimates (MPLEs) of the coefficients of this model (called Model 1).
We used mean deviations (P2 - P2) in P2 to
reduce the correlation between estimated coefficients on the linear and quadratic terms.
The standard errors in parentheses are obtained from the inverse of observed information,
as computed from the partial likelihood.
Table 1. Maximum Partial Likelihood Estimates for Initial Cox Model
|
Because of the interaction between TR360 and EXER, the relative change in risk for repetitive exercisers versus non-exercisers depends on TR360. For TR360 greater than about 1.32 (= 2.096/1.583), the predicted risk for exercisers is higher than for non-exercisers. Otherwise, the reverse is true. This relationship is consistent with the hazard estimates in Fig. 1 because risk is not uniformly higher for exercisers. In particular, as mentioned in the discussion following Fig. 1, the kernel density estimate of the hazard rate for non-exercisers was higher than that for exercisers when TR360 £ 1.31, for all time points.
Assessment of Proportional Hazards
In order to use the Cox model, we must check the assumption of whether the effects of
covariates on risk remain constant over time. For binary covariates in general, a
comparison of nonparametric hazard estimates may be sufficient to decide PH because if the
hazards were proportional, the hazard estimates for the two binary conditions would not
cross each other. For continuous covariates, it is not sufficient to rely only on
stratified hazard estimates to assess PH because the choice of stratification points is
subjective and arbitrary in some cases. One alternative is to use time-varying
coefficients,13 where one or more coefficients multiplying their respective
covariates in the linear predictor x¢ in (1) may vary
with time. If the coefficient multiplying a covariate is not constant over time, then the
impact of that covariate on the hazard varies over time, leading to non-PH. If PH holds, a
plot of the coefficient versus time will be a horizontal line.
Test of Time-Varying Coefficients in a Cox PH Model
For the jth covariate, Grambsch and Therneau13 express a time
varying coefficient as
| bj(t) = b j + g jg j(t) | (3) |
where gj(t) is a specific function of time. They show that the scaled Schoenfeld14 residuals from a Cox model have, for the jth covariate, a mean at time t of approximately g jg j(t). Thus, a plot of the scaled Schoenfeld residuals by the event times may assess whether the coefficient gj is zero (i.e., the residuals show no pattern), or what the function gj(t) might be. A linear regression line can also be fitted to the plot along with a test for zero slope (PH).
In Fig. 2, we have added the estimate of the regression coefficient to the scaled Schoenfeld residual to get a scatter-plot of the scaled Schoenfeld residuals by time for each single covariate from Model 1 of Table 1. In the figure, D.P2 = (P2 - P2). The smoothed curve in the plot is a natural spline with four degrees of freedom. It gives an indication of the path of the regression coefficient by time. Ninety-five percent confidence bands are also given by dotted lines, (Therneau and Grambsch, 2000, p. 134-135). The horizontal dashed line is the MPLE from the Cox model in Table 1. The p-values given in the right-hand corners come from a test of significant linear change in the coefficient over time (Therneau and Grambsch, 2000, p. 131-134). Because the p-values are all greater than 0.19, the residual plots do not indicate significant linear change in any of the coefficients over time. However, not all forms of non-PH can be detected using this technique.15 Thus, because the hazard rate plots in Fig. 1 suggest some non-PH, we conducted additional graphical assessments of PH in Thompson et al.10 These assessments suggested possible non-PH for the variables EXER and P2.

Figure 2. Test of time-varying coefficients for Model 1
To correct for possible non-PH, we stratify the model across levels of the covariates EXER and P2. We then assume that PH holds within each stratum.
Stratified Cox Proportional Hazards Model
The model in (1) can be extended to account for stratification. The strata divide the
subjects into disjoint groups, each of which has a distinct (arbitrary) baseline hazard
function but common values for the coefficients.15 The hazard function for an
individual i who belongs to stratum k
is then
| l(t; xi) = lk(t)exp(x¢ib). | (4) |
With a stratified Cox model, a PH structure is assumed to hold within each stratum. The relative effect of each predictor is assumed the same across strata, unless there is a significant strata-by-covariate interaction, which implies that the effect of the particular covariate differs within strata.
One disadvantage of using a stratified model is that an effect of the stratification covariate cannot be estimated in the model, at least in the sense of a coefficient estimate. However, if a model has been stratified on a continuous variable that has been categorized, it is possible to also include the continuous variable in the model and thus estimate a relative effect for that covariate. This is done for the continuous covariate P2.
Table 2 shows the MPLEs of two stratified Cox models. In the first model, we stratified on EXER conditions. In the second model, we stratified on quartiles of P2. Standard errors in parentheses were computed using the robust sandwich variance estimator of Lin and Wei.16 The first model also includes a coefficient for the interaction between EXER strata and TR360, implying that the relative effect of TR360 on the hazard differs within each EXER stratum. This coefficient is indicated by b 5 (TR360:strata(EXER)) in the table. This notation implies that the term b 5 is only present in the prognostic index, exp(x¢b), for exercisers. That is, 100exp(b1 + b5)% = 100exp(2.71 + 0.718)% » 3,081% represents the increase in risk per unit of TR360 for exercisers at altitude, and 100exp(b1)% = 100exp(2.710)% » 1,503% represents the increase in risk per unit of TR360 for non-exercisers at altitude. Thus, there is a substantial increase in risk if one is exercising repetitively at altitude.
Table 2. Maximum Partial Likelihood Estimates for Stratified Cox Models
|
The second model (Model 3) stratifies on quartiles of P2, but it also includes P2 as a continuous covariate, with both linear and quadratic terms. By including a coefficient for a quadratic effect of a stratified continuous covariate, Model 3 hypothesizes that the quadratic effect of P2 is the same within the range of P2 values for each stratum. According to AIC, both models fit the data better than does Model 1. The model stratifying on P2 (Model 3) appears to fit better, although some of the standard errors are higher than they are for Model 2. Model 3 may be preferable because it allows a coefficient estimate for the relative effect of EXER on the hazard rate.
The exponents of the MPLEs in the Cox model are estimates of the relative risk of the respective covariate as compared to a baseline. For a stratified model, these estimates apply to all strata. Model 3 says that for exercisers, for each one-unit increase in TR360, when P2 is held constant within a stratum, the hazard increases over 73-fold (100exp(2.581 + 1.711) » 73). For non-exercisers, the DCS risk increases only over 13-fold for each one unit increase in TR360. The modeled effect of P2 is always nonpositive but quadratic. Without considering stratification, the estimates indicate that as P2 rises to its mean of 6.20, with TR360 kept constant, the effect on the hazard of DCS symptoms increases. As P2 increases beyond its mean, the effect on the hazard decreases. But within a stratum, the continuous values of P2 are limited to those values the stratum represents.
Plots of expected survival for Model 3 are shown and discussed in Thompson et al. 2003.
Test of the PH Assumption for Stratified Cox Models
The assumption of PH within strata can be checked using timevarying coefficients. Due
to space considerations we only show the plots for Model 3. However, the conclusion is
that stratifying only on EXER does not seem to remove evidence of non-PH.
Figure 3 shows the results of this test for Model 3 for coefficients that vary linearly with time in hours. None of the p-values show significant linear change in the coefficient, and the fitted splines through the residuals show little deviation from horizontal lines. Furthermore, the MPLEs are contained within the 95 percent confidence intervals on the spline fit.

Figure 3. Test of time-varying coefficients when stratifying on P2 (Model 3)
Note that Fig. 3 does not reflect a model with P2 as both a stratifying variable and a continuous covariate. Thus, we cannot test for a time-varying P2 coefficient within each P2 stratum.
Lack of Fit in the Stratified Cox Model
A formal test of overall goodness-of-fit of the Cox model was proposed by Parzen and
Lipsitz.17 The test compares observed and model-based expected numbers of
events within covariate risk groups and computes a chi-square test. The covariate regions
are defined by predicted risk scores,
, where
is the MPLE from the fit of a Cox model. The cut-points of G regions
are defined by percentiles of the
values, called percentiles of risk, such that each category ideally
contains roughly the same number of observations. Each observation is classified into one
of these G categories depending on its risk score, and dummy variables are
introduced into a Cox model. The score test of the resulting set of coefficients
constitutes a significance test for overall fit of the Cox model. The sum of the observed
number of events minus the sum of the estimated martingale residuals (at infinity) within
each category gives the estimated expected count for that category.17
We performed this test for Model 3. We partitioned the risk scores into seven categories, as defined by break points given in Table 3. The categories were chosen to achieve the expected sample size rules given by Parzen and Lipsitz. The value of the score test was 7.80 with p = 0.253 (df = 6). Thus, we do not reject the hypothesis of model fit, at a significance level of 0.05. Furthermore, all other groupings we tried also did not reject the hypothesis of model fit. A comparison of the observed and expected counts by risk group appears in Table 3. The two risk groups representing roughly the fifth and sixth septile groups, are the worst predicted. Thus, although the model fits significantly well, there is room for improvement, particularly in the high risk area. We discuss lack of fit at the observation level in Thompson et al. (2003). The results there show that observations that are poorly fit by this model tend to be uncensored cases that experienced DCS sooner than expected.
Table 3. Observed Counts and Expected Counts for Model 3
|
|||||||||||||||||||||||||||||||||||||
| *Includes censored times. |
Model ValidationPredictive Accuracy of Cox Models
Verweij and Van Houwelingen18 describe a measure of predictive value from the
fit of a Cox PH model. Predictions from the model are made using the prognostic index (PI)
. First, a PI
is obtained for each individual i, based on a model without that observation,
. Then a Cox regression
is performed on the complete data set with
as the only covariate. The partial
log likelihood (PLH) from this model is denoted by l*(c), where c is
the regression coefficient for the PI. If l(c) denotes the PLH from
the fit of the original data to the PI
, then c = 1 maximizes l(c).
Thus, l(1) is considered a measure of fit to the data from which the model was
derived. Similarly, l*(1) measures the fit to future data. In addition, Verweij and
Van Houwelingen note that the coefficient estimate
that maximizes l*(c)
is a shrinkage factor that can be used to estimate the amount by which regression
coefficients are overestimated in the original regression. Applying this shrinkage factor
to the linear combinations
will give adjusted survival estimates.
The values of l*(1) and
for Models 1 through 3 are given in
Table 4. Standard errors for
are given in parentheses. Stratification from Models 2 and 3 was also
included in the calculation of l*(1). So, to interpret the values in Table 4
correctly, the future data set would have to be stratified on these variables as well.
Furthermore, the same cut-points for P2 are assumed to apply to the new data set.
Table 4. Predictive Accuracy of Models
|
According to Table 4, Model 3 has the highest measure of predictive value but the lowest shrinkage factor. However, the standard errors on the shrinkage factors are large enough to indicate that any "true" difference among them is very slight. The order in the estimates reflects model complexity. The most complex model is Model 3 because it has four strata and five coefficients, and the least complex model is Model 1. The most complex model requires the greatest amount of shrinkage when applied to a future data set. We have used the shrinkage factors to calibrate the model predictions in Thompson et al. (2003).
Frailty Models for DCS Data
The model stratified on quartiles of P2 suffers some lack of fit. We showed that some
individuals are not fit well by the model, and some risk groups are not as well-described
as others. There may also be unmeasured prognostic factors. The addition of
subject-specific frailty terms into the partial likelihood may help the fit of a
stratified Cox model. A frailty is an unobserved continuous random variable that describes
excess risk or frailtyfor distinct groups, such as families or even single
individuals, in addition to that described by measured covariates.19 Thus,
frailties are like unobserved covariates. Individuals with greater frailty are expected to
experience the event earlier than those with lower frailties. The addition of frailties to
the Cox model is similar to the addition of random effects. Previous work on the use of
random effects in DCS research is found in Thompson et al. (2002).20
To use frailties in a model, we must specify a distribution from which the frailties are assumed a random sample. Then, a confidence interval on the variance of the frailty distribution may be used to determine whether frailties can be used to account for unmeasured covariates. Thompson et al. (2003) give details on fitting a gamma frailty model to the DCS data described here.
They also show that models that exclude certain covariates that we include here have frailty variances that differ significantly from zero, implying that those terms are needed in the model. For example, the model fit by Chhikara et al. (1998) included only the covariates EXER, P2, and PN2360. With gamma frailties added to this model, the frailty variance is estimated at 1.48, and the likelihood ratio statistic is 5.92 with a p-value of 0.02. As one major role of frailties is to pick up excess variation that is not already modeled, we conclude that previously considered Cox models are likely inadequate for the DCS data.
References
1J. Conkin, S. Bedahl, and H. Van Liew. "A Computerized Databank of
Decompression Sickness Incidence in Altitude Chambers," Aviation, Space, and
Environmental Medicine 72 (1992): 202-14.
2P. Foster, J. Conkin, M. Powell, J. Waligora, and R. Chhikara. "Role of
Metabolic Gases in Bubble Formation during Hypobaric Exposures," J. Applied
Physiology (1998): 1088-95.
3J. Conkin, K. Kumar, M. Powell, P. Foster, and J. Waligora. "A
Probabilistic Model of Hypobaric Decompression Sickness Based on 66 Chamber Tests," Aviation,
Space, and Environmental Medicine 67 (1996): 1-8.
4J. Conkin, M. Powell, P. Foster, and J. Waligora. "Information About
Venous Gas Emboli Improves Prediction of Hypobaric Decompression Sickness," Aviation,
Space, and Environmental Medicine 69 (1998): 8-16.
5J. Conkin and M. Powell. "Lower Body Adynamia as a Factor to Reduce the
Risk of Hypobaric Decompression Sickness," Aviation, Space, and Environmental
Medicine 72 (2001): 202-14.
6D. Cox. "Regression Models and Life Tables," J. Royal Statistical
Soc. B 34 (1972): 187-202.
7J. Lawless. Statistical Models and Methods for Lifetime Data, New York:
Wiley, 1982.
8R. Chhikara, K. Koti, and F. Spears. "Cox Regression Modeling and
Analysis of Decompression Sickness Data," ISSO Y2000 Annual Report, University
of Houston, 1998.
9T. English. Goodness of Fit Test for the Cox Proportional Hazards Model,
M.A. Thesis, University of Houston-Clear Lake, Houston, TX, 2000.
10L. Thompson, R. Chhikara, and J. Conkin. "Cox Proportional Hazards
Models for Modeling the Time to Onset of Decompression Sickness," NASA Technical
Publication 2003-210791, Houston: Johnson Space Center, 2003.
11K. Hess, D. Serachitopol, and B. Brown. "Hazard Function Estimators: A
Simulation Study," Statistics in Medicine 18 (1999): 3075-88.
12D. Collet. Modeling Survival Data in Medical Research. London: Chapman
& Hall, 1994.
13P. Grambsch and T. Therneau. "Proportional Hazards Tests and Diagnostics
Based on Weighted Residuals," Biometrika 81 (1994): 515-26.
14D. Schoenfeld. "Partial Residuals for the Proportional Hazards
Regression Model," Biometrika 69 (1982): 239-41.
15T. Therneau and P. Grambsch. Modeling Survival Data: Extending the Cox
Model. New York: Springer-Verlag, 2000.
16D. Lin and L. Wei. "The Robust Inference for the Cox Proportional
Hazards Model," J. American Statistical Association 84 (1989): 1074-78.
17M. Parzen and S. Lipsitz. "AGlobal Goodness-of-Fit Statistic for Cox
Regression Models, Biometrics 55 (1999): 580-84.
18P. Verweij and H. Van Houwelingen. "Cross-Validation in Survival
Analysis," Statistics in Medicine 12 (1993): 2305-14.
19T. Therneau, P. Grambsch, and S. Pankrantz. Penalized Survival Models and
Frailty. Technical Report Series No. 66, Department of Health Science Research, Mayo
Clinic, Rochester, MN, (2000).
2 0L. Thompson, J. Conkin, R. Chhikara, and M. Powell. "Modeling Grade IV
Venous Gas Emboli Using a Limited Failure Population Model with Random Effects," NASA
Te c h n i c a l Publication 2002-210781, Houston: Johnson Space Center, May 2002.
21J. Klein and M. Moeschberger. Survival Analysis: Techniques for Censored and
Truncated Data. New York: Springer-Verlag, 1997.
Publications
Thompson, L., R. Chhikara, and J. Conkin.
"Cox Proportional Hazards Models for Modeling the Time to Onset of Decompression
Sickness," NASA Technical Publication 2003-210791, Houston: Johnson Space Center,
(2003).
Thompson, L. and R. Chhikara. "Evaluation of Cox Proportional Hazards Models for Time
To Onset of Decompression Sickness in Hypobaric Environments," J. Data Sciences
(2003). (Submitted.)
Investigative Team UHCL PI: Raj S. Chhikara, Ph.D., Professor NASA-JSC Co-PI: Johnny Conkin, Ph.D. UHCL PDAF: Laura A. Thompson, Ph.D. |
PDF (K)
Table of Contents
Institute for Space Systems Operations - Y2002
Annual Report
Copyright © 2003
|
|