Performance of the score systems Acute Physiology and Chronic Health Evaluation II and III at an interdisciplinary intensive care unit, after customization


Authors: Rainer Markgraf, Gerd Deutschinoff, Ludger Pientka, Theo Scholten, Cristoph Lorenz

Journal: Critical Care (2001)

DOI: 10.1186/cc975

Abstract

Background:

Mortality predictions calculated using scoring scales are often not accurate in populations other than those in which the scales were developed because of differences in case-mix. The present study investigates the effect of first-level customization, using a logistic regression technique, on discrimination and calibration of the Acute Physiology and Chronic Health Evaluation (APACHE) II and III scales.

Method:

Probabilities of hospital death for patients were estimated by applying APACHE II and III and comparing these with observed outcomes. Using the split sample technique, a customized model to predict outcome was developed by logistic regression. The overall goodness-of-fit of the original and the customized models was assessed.

Results:

Of 3383 consecutive intensive care unit (ICU) admissions over 3 years, 2795 patients could be analyzed, and were split randomly into development and validation samples. The discriminative powers of APACHE II and III were unchanged by customization (areas under the receiver operating characteristic [ROC] curve 0.82 and 0.85, respectively). Hosmer-Lemeshow goodness-of-fit tests showed good calibration for APACHE II, but insufficient calibration for APACHE III. Customization improved calibration for both models, with a good fit for APACHE III as well. However, fit was different for various subgroups.

Conclusions:

The overall goodness-of-fit of APACHE III mortality prediction was improved significantly by customization, but uniformity of fit in different subgroups was not achieved. Therefore, application of the customized model provides no advantage, because differences in case-mix still limit comparisons of quality of care.

Introduction

Calibration measures how closely mortality prognosis fits the observed mortality. Poor calibration in a patient sample does not necessarily mean that the quality of care in that particular ICU is better or worse than in the development sample.

[…] In the latter case, the customized score can only be used in this subset of patients. If a customized model derived from the whole population is used, then uniformity of fit for the relevant subgroups should still be tested. Knowledge of the influence of subgroups is important, because future changes in case-mix may compromise the improvement achieved by customization.

The aim of the present study was to test the performance of APACHE II and III, after customization of these scales for use in future assessment of quality of care in our unit.

Patients

Over a 3-year period (October 1991-October 1994), 3382 patients were consecutively admitted to the 12-bed interdisciplinary ICU of a 571-bed, university-affiliated community hospital. For the APACHE II analysis, 274 patients who were readmitted to the ICU, 208 patients who were in the ICU for less than 4 h, 16 patients who were admitted for dialysis only, two patients who were younger than 16 years and 87 patients with missing data were excluded. Thus, 2795 patients were included in the analysis. For APACHE III, 79 patients who were admitted to rule out myocardial infarction and 55 cardiosurgical patients were additionally excluded, leaving 2661 for analysis.
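The exclusion arithmetic in this paragraph can be checked with a few lines of Python. This is only a bookkeeping sketch: the dictionary keys are shorthand labels chosen here, and it assumes that each excluded patient is counted in exactly one exclusion group.

```python
# Counts as reported in the Patients section.
admitted = 3382

# Exclusions applied for the APACHE II analysis.
apache2_exclusions = {
    "readmitted to the ICU": 274,
    "ICU stay under 4 h": 208,
    "admitted for dialysis only": 16,
    "younger than 16 years": 2,
    "missing data": 87,
}

# Further exclusions applied only for the APACHE III analysis.
apache3_extra_exclusions = {
    "rule out myocardial infarction": 79,
    "cardiosurgical": 55,
}

apache2_n = admitted - sum(apache2_exclusions.values())
apache3_n = apache2_n - sum(apache3_extra_exclusions.values())

print(apache2_n, apache3_n)
```

Both results agree with the sample sizes stated in the text (2795 and 2661).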

Data collection

The data were collected by ward doctors after 4 weeks of training in how to use the APACHE system. They had access to a detailed manual, including definitions and procedures. Constant supervision by a documentation assistant included regular comparison of the original with the collected data, and review of completeness. In order to assess reliability of data collection, data from a random sample of 50 patients were recorded by two data collectors independently. Interobserver reliability was analyzed by Kendall's coefficient of concordance and κ statistics. In addition, data collection software, which was provided by APACHE Medical Systems Inc (Washington, DC), automatically checked that the data were plausible. The whole data set was tested using a box-plot technique in order to analyze extreme values separately. Vital status at hospital discharge was recorded.
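As an illustration of the κ statistic used for categorical items, the following minimal sketch computes Cohen's κ for two data collectors. The function name `cohen_kappa` and the ten example ratings are hypothetical and are not taken from the study data.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa: agreement between two raters, corrected for chance."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must score the same non-empty set of patients")
    n = len(rater_a)
    # Proportion of patients on which the two raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance from each rater's category frequencies.
    freq_a = Counter(rater_a)
    freq_b = Counter(rater_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical category
    return (observed - expected) / (1.0 - expected)


# Hypothetical yes/no coding of one diagnosis by two collectors.
collector_1 = [1, 1, 0, 0, 0, 0, 0, 0, 1, 0]
collector_2 = [1, 0, 0, 0, 0, 0, 0, 0, 1, 1]
kappa = cohen_kappa(collector_1, collector_2)
print(round(kappa, 2))
```

In this made-up example the raters agree on 8 of 10 patients, but because chance agreement is already 0.58, κ is only moderate.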

Statistical analysis

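[…] led to the following equation:

$$\text{logit} = \beta_0 + \sum_i \beta_i x_i \qquad (1)$$

where $\sum_i \beta_i x_i$ encompasses the various patient factors that are included in the model. The probability of hospital death is calculated as follows:

$$P = \frac{e^{\text{logit}}}{1 + e^{\text{logit}}} \qquad (2)$$

The APACHE III equation was provided by APACHE Medical Systems Inc, and it has not been published for commercial reasons. In customizing the scales, the original logit was used as the independent variable and hospital death was used as the dependent variable. The new probability of hospital death was calculated as follows:

$$P(\text{cust}) = \frac{e^{\text{logit(cust)}}}{1 + e^{\text{logit(cust)}}} \qquad (3)$$

and logit(cust) is calculated as follows:

$$\text{logit(cust)} = \beta_0^{\text{cust}} + [\beta_1^{\text{cust}} \times \text{logit(or)}] \qquad (4)$$

where logit(or) is the original logit, and $\beta_0^{\text{cust}}$ and $\beta_1^{\text{cust}}$ are the customized constant and slope estimated by the logistic regression.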
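A first-level customization of this kind can be sketched as a logistic regression with the original logit as the only predictor. The code below is an illustration under stated assumptions, not the authors' implementation: the names `fit_first_level`, `b0` and `b1`, the simulated patients, the degree of miscalibration, and the two-thirds/one-third split are all hypothetical.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def fit_first_level(logit_or, died, n_iter=25):
    """Estimate (b0, b1) in logit(cust) = b0 + b1 * logit(or) by Newton-Raphson."""
    X = np.column_stack([np.ones_like(logit_or), logit_or])
    beta = np.zeros(2)
    for _ in range(n_iter):
        p = sigmoid(X @ beta)
        gradient = X.T @ (died - p)
        hessian = X.T @ (X * (p * (1.0 - p))[:, None])
        beta = beta + np.linalg.solve(hessian, gradient)
    return beta


def customized_probability(logit_or, beta):
    """Apply equations (3) and (4) to the original logit."""
    return sigmoid(beta[0] + beta[1] * logit_or)


# Simulated patients (hypothetical): the original model is miscalibrated.
rng = np.random.default_rng(0)
n = 3000
logit_or = rng.normal(-1.5, 1.5, size=n)
true_p = sigmoid(-0.4 + 0.8 * logit_or)
died = (rng.random(n) < true_p).astype(float)

# Split-sample technique: random development and validation samples.
order = rng.permutation(n)
dev, val = order[: 2 * n // 3], order[2 * n // 3:]

b0, b1 = fit_first_level(logit_or[dev], died[dev])
p_dev = customized_probability(logit_or[dev], (b0, b1))
p_val = customized_probability(logit_or[val], (b0, b1))
print(b0, b1, p_val.sum(), died[val].sum())
```

Because the fitted model contains a constant, expected and observed deaths coincide in the development sample by construction. The validation sample is what shows whether the customized coefficients generalize.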

[…] test and Mann-Whitney U test, because values were not normally distributed.

P < 0.05 was considered statistically significant.

Results

Completeness of data was good; excluding just one variable (24-h urine), 94.6% of all necessary data were collected on average for each patient; 24-h urine was available in only 78.1% of patients. Reliability analysis revealed Kendall's coefficients for clinical and laboratory data above 0.9 except for blood gas values (0.878) and 24-h urine (0.870). κ values were low only for diagnosis of renal failure (0.49) and Glasgow Coma Scale score (0.54). Despite that, differences in calculated scores were very low, with Kendall's coefficients above 0.92. Thus, overall reliability of data collection was good.

[…], and no significant differences were detected. Both models showed good discrimination, which was unchanged by customization.
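That discrimination is unchanged follows from how the area under the ROC curve is defined. It depends only on the rank order of patients, and a first-level customization with a positive slope preserves that order. The sketch below uses a hypothetical helper `roc_auc`, eight made-up patients, and assumed coefficients `b0` and `b1`.

```python
import numpy as np


def roc_auc(score, died):
    """Probability that a random non-survivor scores higher than a random
    survivor, counting ties as one half (Mann-Whitney formulation)."""
    score = np.asarray(score, dtype=float)
    died = np.asarray(died, dtype=bool)
    pos, neg = score[died], score[~died]
    diff = pos[:, None] - neg[None, :]
    return ((diff > 0).sum() + 0.5 * (diff == 0).sum()) / (pos.size * neg.size)


# Hypothetical original logits and outcomes (1 = died in hospital).
logit_or = np.array([-3.0, -2.0, -1.5, -1.0, 0.0, 0.5, 1.0, 2.0])
died = np.array([0, 0, 1, 0, 0, 1, 1, 1])

# Assumed customized coefficients; any positive slope keeps the ordering.
b0, b1 = -0.4, 0.8
p_cust = 1.0 / (1.0 + np.exp(-(b0 + b1 * logit_or)))

auc_original = roc_auc(logit_or, died)
auc_customized = roc_auc(p_cust, died)
print(auc_original, auc_customized)
```

Both areas are identical, so any gain from customization has to show up in calibration rather than in discrimination.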

The calibration curves reveal that calibration after customization was good for APACHE II up to the 70-80% mortality risk decile, but was still far from ideal for APACHE III. When interpreting the greater deviations from the ideal line in the 80-90% and 90-100% deciles, the small numbers of cases in these groups have to be borne in mind.
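A calibration comparison by risk decile can be sketched as a Hosmer-Lemeshow type statistic over fixed 10% risk intervals, which is the grouping usually associated with the H test. The function `hosmer_lemeshow_h` and the example data are hypothetical. The sketch returns only the statistic and the per-interval table, since the degrees of freedom for the chi-square comparison depend on whether the sample was also used to fit the model.

```python
import numpy as np


def hosmer_lemeshow_h(prob, died, n_bins=10):
    """Sum of (observed - expected)^2 / variance over fixed-width risk intervals."""
    prob = np.asarray(prob, dtype=float)
    died = np.asarray(died, dtype=float)
    # Interval index: 0 for 0-10% predicted risk, ..., 9 for 90-100%.
    bins = np.minimum((prob * n_bins).astype(int), n_bins - 1)
    statistic = 0.0
    table = []
    for b in range(n_bins):
        mask = bins == b
        n = int(mask.sum())
        if n == 0:
            continue  # empty intervals contribute nothing
        observed = died[mask].sum()
        expected = prob[mask].sum()
        mean_p = expected / n
        variance = n * mean_p * (1.0 - mean_p)
        if variance > 0:
            statistic += (observed - expected) ** 2 / variance
        table.append((b, n, observed, expected))
    return statistic, table


# Hypothetical predictions: 20 patients at 5%, 20 at 25%, 8 at 75% risk.
prob = np.array([0.05] * 20 + [0.25] * 20 + [0.75] * 8)

# Outcomes that match the predictions exactly (1, 5 and 6 deaths).
died_good = np.array([1] + [0] * 19 + [1] * 5 + [0] * 15 + [1] * 6 + [0] * 2)
# Same outcomes, but with 3 deaths instead of 1 in the lowest-risk interval.
died_poor = np.array([1] * 3 + [0] * 17 + [1] * 5 + [0] * 15 + [1] * 6 + [0] * 2)

stat_good, _ = hosmer_lemeshow_h(prob, died_good)
stat_poor, table_poor = hosmer_lemeshow_h(prob, died_poor)
print(stat_good, stat_poor)
```

The per-interval table also makes the caveat about sparsely populated high-risk intervals concrete: with few cases, a single death shifts the observed mortality of an interval considerably.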

[…] for APACHE III. Fit was not uniform for APACHE II, with varying standardized mortality ratios (SMRs). Goodness-of-fit was insufficient for patients younger than 65 years and for those directly admitted. Although goodness-of-fit improved for most subgroups after customization, it was still not uniform. These findings were similar for APACHE III. Goodness-of-fit was insufficient for medical, younger, directly admitted and cardiovascular patients. Fit was improved for all but younger and transferred patients. However, it was still not uniform after customization.
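The distinction between overall fit and uniformity of fit can be made concrete with the standardized mortality ratio, taken here as observed deaths divided by the sum of predicted probabilities. The helper `smr`, the subgroup labels and all numbers below are hypothetical.

```python
def smr(prob, died):
    """Standardized mortality ratio: observed deaths / expected deaths."""
    expected = sum(prob)
    if expected <= 0:
        raise ValueError("expected number of deaths must be positive")
    return sum(died) / expected


# Hypothetical subgroups: predicted probability of death and outcome per patient.
younger_prob = [0.1] * 10
younger_died = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]   # 2 deaths, 1 expected
older_prob = [0.4] * 10
older_died = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]     # 3 deaths, 4 expected

smr_younger = smr(younger_prob, younger_died)
smr_older = smr(older_prob, older_died)
smr_overall = smr(younger_prob + older_prob, younger_died + older_died)
print(smr_younger, smr_older, smr_overall)
```

In this made-up sample the overall SMR is 1 even though mortality is underestimated in one subgroup and overestimated in the other. A shift in the proportion of the two subgroups would therefore move the overall SMR without any change in quality of care.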

Discussion

Customization of APACHE II and III in a large patient population from a single unit led to an improvement in the overall goodness-of-fit of APACHE III, which showed poor calibration in its original version. Despite a similar improvement of fit in several subgroups that were large enough to be tested, good uniformity of fit was not achieved.

[…] that analyzed customization of the Mortality Prediction Model. In that study, a second-level customization, in which new coefficients were developed for each individual patient factor included in the original model, improved calibration even further. Second-level customization was not attempted in the present patient sample because there were not enough patients for that purpose. The time needed to collect data from a sufficiently large patient sample in a single unit would probably be so long that real changes in case-mix or ICU treatment might occur during the study, which would confound the results. First-level customization will probably be a more practical method for single units to improve the overall fit of score systems that are to be used for quality assessment.

At present, however, we would not recommend customization routinely. This is because a major problem is still unresolved; although good calibration can be achieved for the whole patient sample, uniformity of fit remains unsatisfactory. This is the case even for APACHE III, which accounts for more case-mix factors, such as diagnostic categories and lead time, than do the other models. Nevertheless, achievement of uniformity is important, because change in case-mix over time will otherwise lead to a loss of accuracy of a customized model. It would be difficult to interpret whether a change in the mortality ratio over time would be due to a change in quality of care or in case-mix.

[…] Customization for specific subgroups could be attempted in medical and cardiovascular patients at our unit for APACHE III, because these groups are sufficiently large and because general customization did not lead to a good fit. However, the practicality of such an approach is questionable.

Data collection is ongoing to test variation in the original and customized scores over time in a second large sample from our unit.

Figures and Tables

Calibration curves for APACHE II and APACHE III before and after customization for development and validation samples. The diagonal line is the line of ideal correspondence between observed and expected mortality. The solid line represents the development sample and the dotted line the validation sample. Case numbers for the development sample are represented by white bars, and those of the validation sample by grey bars.

Customized coefficients used in the calculation of probability of hospital death

Demographic and clinical data for APACHE II sample

There were no differences for the APACHE III sample (1772/889 cases). Direct admission is defined as admission from emergency room, operating theatre, or recovery room; and transferral is defined as admission from other hospital, other ICU, or floor. DOS, duration of stay.

Discrimination and calibration of APACHE II and III before and after customization

P < 0.001. C, Hosmer-Lemeshow goodness-of-fit C test; H, Hosmer-Lemeshow goodness-of-fit H test; CI, confidence interval.

Discrimination and calibration of APACHE II before and after customization for various subgroups of the validation sample

P = 0.01.

Discrimination and calibration of APACHE III before and after customization for various subgroups of the validation sample

P = 0.01.

Keywords

  • Acute Physiology and Chronic Health Evaluation
  • customization
  • logistic regression
  • mortality prediction
  • severity of illness
Content sourced from Wikipedia, available under CC BY-SA 4.0.