
Volume 14 No. 04





Scientific Investigations

High Interrater Reliability of Overnight Pulse Oximetry Interpretation Among Inexperienced Physicians Using a Structured Template

Mirna Ayache, MD, MPH (1, 2); Kingman P. Strohl, MD (1, 2)
(1) Division of Pulmonary, Critical Care and Sleep Medicine, University Hospitals Cleveland Medical Center, Cleveland, Ohio; (2) Division of Pulmonary, Critical Care and Sleep Medicine, Louis Stokes Cleveland VA Medical Center, Cleveland, Ohio

ABSTRACT

Study Objectives:

To assess the interrater reliability and accuracy of overnight pulse oximetry (OPO) interpretations by pulmonary fellows using a comprehensive structured template and after a brief educational session.

Methods:

Using the template, four pulmonary and critical care (PCC) fellows interpreted OPO saturation waveforms and parameters extracted from 50 consecutive adult in-laboratory sleep studies. The template included three saturation parameters (mean saturation, oxygen desaturation index [ODI], and cumulative desaturation time) and a description of the saturation waveform. A scoring system combining waveform characteristics and ODI was proposed to determine the suspicion for moderate to severe sleep apnea. Waveform description and mean saturation determined the suspicion for cardiopulmonary disease (CPD). Cumulative desaturation time determined the need for oxygen prescription. Apnea-hypopnea index was extracted from the sleep study results.

Results:

The overall interrater reliability for final recommendations (sleep apnea suspicion, presence of CPD, and oxygen prescription) was high (kappa = .81, 95% confidence interval [CI] .76–.88). Good agreement was noted in CPD evaluation and suspicion of moderate to severe sleep apnea (kappa = .70, 95% CI .46–.86 and kappa = .65, 95% CI .56–.77, respectively). The interrater reliability for oxygen prescription was in an excellent range (kappa = .98, 95% CI .91–1.00). The accuracy of a high sleep apnea suspicion score in detecting apnea-hypopnea index ≥ 15 events/h ranged from 88.0% to 94.0% (sensitivity 91.3% to 95.7%, specificity 81.5% to 92.6%). Desaturations due to CPD were identified by 75% of the raters as desaturations due to conditions other than sleep apnea.

Conclusions:

A structured template for OPO interpretation can produce high interrater agreement and good accuracy, making OPO interpretation a reliable clinical tool.

Commentary:

A commentary on this article appears in this issue on page 497.

Citation:

Ayache M, Strohl KP. High interrater reliability of overnight pulse oximetry interpretation among inexperienced physicians using a structured template. J Clin Sleep Med. 2018;14(4):541–548.


BRIEF SUMMARY

Current Knowledge/Study Rationale: There are no professional guidelines for, or formal training of, pulmonary and critical care (PCC) fellows in overnight pulse oximetry (OPO) interpretation. It is therefore not surprising that PCC physicians' interpretations of this relatively common test vary significantly and have poor reliability.

Study Impact: This study provides a tool (a comprehensive structured template for OPO interpretation) to reduce variation and produce reliable interpretation of OPO by physicians without sleep training or experience in OPO interpretation. Implementation of this tool will help physicians across the country systematically interpret OPO tests within the limitations of the test.

INTRODUCTION

Overnight pulse oximetry (OPO) continuously measures a patient's arterial oxygen saturation and heart rate over time [1]. In practice, its major use is to detect nocturnal hypoxemia requiring oxygen therapy; the test is commonly arranged by durable medical equipment providers and performed by a third party, and the results are sent to the physician to support an oxygen prescription. OPO is also classified as a type 4 sleep study (unattended, with one or two channels) [2], and some have suggested that it is an effective, objective test in a clinical management pathway for sleep-disordered breathing [3]. However, little attention has been paid to physicians' interpretation of OPO when a specific disease is not known. As previously reported, OPO interpretations vary significantly among pulmonary physicians, which calls into question the test's clinical utility and underscores the need for standardized training and interpretation [4]. One proposed explanation for this variation is that OPO is frequently presented to the clinician without immediate access to clinical indications or comorbidities, leading to incomplete or misleading context. An alternative explanation is a lack of training or of awareness of published evidence. In a pilot for this study, interrater reliability in OPO interpretation among pulmonary and critical care (PCC) physicians remained poor despite the assistance of an evidence-based worksheet [5]. Here, we address this challenge by developing and testing the feasibility of a structured template for the interpretation of any OPO report, one with numeric cutoffs and specific instructions for saturation waveform description. The hypothesis of this study is that the structured template, along with an educational session, will allow reliable, comprehensive OPO interpretation by PCC fellows who have no prior experience in OPO interpretation.
The second aim of the study is to assess the validity of the physicians' OPO interpretations against the corresponding polysomnography (PSG) report.

METHODS

Because continuous oximetry is part of an overnight PSG, an OPO-like record was constructed by extracting oximetry data over time from 50 consecutive adult (age older than 18 years) in-laboratory sleep studies done at an academic center over a 2-week period in January 2016. All studies were included regardless of whether they were diagnostic or full-night titration studies, so that a range of waveform patterns could be presented for interpretation. In the case of split-night studies, only the diagnostic part was used, to avoid confusing interpreters with an abrupt change in saturation waveform pattern. In addition to the time trace of arterial oxygen saturation and pulse rate, physicians were provided with patient demographics (age, body mass index), study characteristics (duration and whether the study was done on room air or oxygen), mean oxygen saturation, recording time spent with saturation below 88% and below 90%, oxygen desaturation index (ODI), defined as the number of desaturations of 4% or greater divided by recording time, and mean pulse rate (Figure 1). Institutional review board approval was obtained prior to study initiation.
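The report parameters listed above can be derived from a sampled saturation trace. The following Python sketch is an illustration only, not the software used in this study: it assumes one SpO2 sample per second and a hypothetical desaturation rule (baseline = highest value in the preceding 120 seconds; an event starts when SpO2 falls 4 or more percentage points below it), because the event-detection algorithm of the PSG system is not described here. As in the definition above, ODI is divided by total recording time, not sleep time.

```python
from dataclasses import dataclass


@dataclass
class OximetrySummary:
    mean_spo2: float          # mean saturation over the whole recording (%)
    odi_per_hour: float       # 4-point desaturations per hour of recording time
    minutes_below_88: float   # cumulative time with SpO2 < 88%
    minutes_below_90: float   # cumulative time with SpO2 < 90%


def count_desaturations(spo2, drop=4, baseline_window_s=120):
    """Count desaturation events in a 1-Hz SpO2 trace (hypothetical rule).

    Baseline = highest SpO2 in the preceding `baseline_window_s` samples.
    An event starts when SpO2 is `drop` or more points below that baseline
    and ends when SpO2 rises back above (event baseline - drop).
    """
    events = 0
    in_event = False
    event_baseline = 0
    for t in range(1, len(spo2)):
        s = spo2[t]
        if in_event:
            if s > event_baseline - drop:
                in_event = False  # resaturated: ready for the next event
            continue
        baseline = max(spo2[max(0, t - baseline_window_s):t])
        if s <= baseline - drop:
            events += 1
            in_event = True
            event_baseline = baseline
    return events


def summarize(spo2):
    """Summary parameters for a 1-Hz SpO2 trace (one value per second)."""
    recording_hours = len(spo2) / 3600
    return OximetrySummary(
        mean_spo2=sum(spo2) / len(spo2),
        odi_per_hour=count_desaturations(spo2) / recording_hours,
        minutes_below_88=sum(1 for s in spo2 if s < 88) / 60,
        minutes_below_90=sum(1 for s in spo2 if s < 90) / 60,
    )


# Synthetic 1-hour trace: 50 s at 96% followed by 10 s at 87%, repeated 60 times
trace = ([96] * 50 + [87] * 10) * 60
print(summarize(trace))  # mean 94.5%, ODI 60.0/h, 10.0 min below 88% and below 90%
```

Real scoring software uses its own baseline and event-termination rules, so counts from this sketch will not necessarily match a PSG report.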

Figure 1. A sample of the pulse oximetry reports presented to the physicians for interpretation (showing sleep apnea waveform pattern in this case). BMI = body mass index, HR = heart rate, ODI = oxygen desaturation index.

After obtaining informed consent, four PCC fellows without prior experience or training in OPO interpretation were independently asked to interpret the 50 OPO reports using the template shown in Figure 2. This template was developed using the available evidence from the literature on pulse oximetry in cardiopulmonary disease (CPD), sleep-related breathing disorders, and oxygen prescription [6–15]. Oxygen prescription was based on Medicare criteria, that is, a cumulative time of 5 minutes or more with saturation ≤ 88% [15]. The template was designed to guide the interpreters in deciding whether the saturation waveform pattern is consistent with sleep apnea desaturation and whether it is consistent with desaturation due to disorders other than sleep apnea. The latter waveforms are characterized by slow recovery, in contrast to the abrupt resaturation of sleep apnea, and are due to disorders such as CPD and hypoventilation. The template asked for recommendations for CPD evaluation through the interpretation of two parameters: mean saturation and saturation waveform description. A scoring system for suspicion of moderate to severe sleep apnea was developed based on ODI and the saturation waveform description for sleep apnea [6–9] (Table 1).
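The oxygen prescription rule stated above reduces to a cumulative-time threshold. A minimal Python sketch, assuming a saturation trace sampled at a known rate; the function names are hypothetical, and the helper encodes only the cutoff quoted in the text (≥ 5 cumulative minutes at SpO2 ≤ 88%), not any other coverage requirement.

```python
def minutes_at_or_below(spo2, threshold=88, sample_rate_hz=1.0):
    """Cumulative minutes with SpO2 at or below `threshold`.

    Time is summed across the whole trace; episodes need not be contiguous.
    """
    return sum(1 for s in spo2 if s <= threshold) / sample_rate_hz / 60


def meets_oxygen_cutoff(spo2, threshold=88, min_minutes=5.0, sample_rate_hz=1.0):
    """True if cumulative time at SpO2 <= threshold reaches min_minutes.

    Hypothetical helper: encodes only the cumulative-time cutoff quoted in
    the text, not any other coverage requirement.
    """
    return minutes_at_or_below(spo2, threshold, sample_rate_hz) >= min_minutes


# Two separate 150-s episodes at 88% add up to exactly 5 minutes
night = [94] * 3000 + [88] * 150 + [93] * 3000 + [88] * 150
print(meets_oxygen_cutoff(night))  # True
```

Because the criterion is cumulative, several short episodes count the same as one long episode, which is why the helper sums samples rather than looking for a single run.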

Figure 2. Highly structured template for overnight pulse oximetry interpretation.

Table 1. Scoring system for suspicion of moderate to severe sleep apnea.

Prior to OPO report interpretation, each fellow received an approximately 45-minute educational session about the template structure and saturation waveform interpretation. The fellows were not informed that the OPO reports were extracted from sleep studies and were blinded to the sleep study results. Apnea-hypopnea index (AHI) was extracted from the official reports of PSG scored using Centers for Medicare and Medicaid Services criteria. The PSG was considered positive for nocturnal desaturations due to CPD if the decreases in oxygen saturation were not associated with scored apnea or hypopnea events and artifact was excluded. The diagnosis of nocturnal desaturations not related to sleep apnea was made by the board-certified sleep specialists who interpreted the PSG tests.

Sample Size and Data Analysis

The number of oximetry reports was chosen based on the spectrum of oximetry tracings and the time available for voluntary participation by the PCC fellows. After extracting 50 pulse oximetry reports from 50 consecutive sleep studies, we reviewed additional reports and found that they did not increase the variation in oximetry profiles. Moreover, assuming a 40% prevalence of AHI ≥ 15 events/h in the sample, 80% power, and a significance level of P < .05, a sample size of 50 reports was considered sufficient to test a null hypothesis of 50% sensitivity and specificity against an alternative hypothesis of 80% [16].
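The sample-size reasoning above can be reproduced approximately with a normal-approximation formula for testing a single proportion (H0: 50% vs H1: 80%), inflated by the assumed prevalence. This Python sketch is one common way to do the calculation and is our assumption; it may not match exactly the procedure behind the tables in reference 16. Rounding the per-group count up before dividing by prevalence is also a choice made here.

```python
import math
from statistics import NormalDist


def n_for_proportion(p0, p1, alpha=0.05, power=0.80):
    """Subjects needed to distinguish a proportion p1 from p0 (two-sided alpha).

    Standard normal-approximation formula for a single proportion.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    numerator = (z_alpha * math.sqrt(p0 * (1 - p0))
                 + z_beta * math.sqrt(p1 * (1 - p1))) ** 2
    return math.ceil(numerator / (p1 - p0) ** 2)


def total_reports_needed(prevalence, p0=0.5, p1=0.8, alpha=0.05, power=0.80):
    """Total reports so that both sensitivity and specificity are powered."""
    n_per_group = n_for_proportion(p0, p1, alpha, power)
    # Sensitivity is estimated in AHI >= 15 reports, specificity in the rest
    total_for_sensitivity = math.ceil(n_per_group / prevalence)
    total_for_specificity = math.ceil(n_per_group / (1 - prevalence))
    return max(total_for_sensitivity, total_for_specificity)


print(n_for_proportion(0.5, 0.8))   # 20 reports with AHI >= 15 events/h
print(total_reports_needed(0.40))   # 50 reports in total
```

Under these assumptions about 20 reports with AHI ≥ 15 events/h are needed for the sensitivity test, which at 40% prevalence corresponds to 50 reports; the specificity test needs fewer (34), so sensitivity drives the total.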

A multiple-rater kappa statistic was used to measure interrater reliability. Confidence intervals (CIs) for the kappa statistic were calculated using the bootstrap method. In the subanalysis of agreement on the suspicion of sleep apnea, the smaller sample sizes did not allow calculation of meaningful CIs for the kappa statistic. Therefore, P values were calculated against the null hypothesis kappa = 0 (agreement equal to that expected by chance), with significance level alpha = .01. The sensitivity, specificity, accuracy, and likelihood ratios were calculated using Stata 12 software (StataCorp LLC, College Station, Texas, United States).
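For readers who want to reproduce this type of analysis outside Stata, the sketch below computes Fleiss' kappa for multiple raters together with a percentile bootstrap CI that resamples reports. It assumes that the multiple-rater kappa used here is Fleiss-type; the function names, the 2,000 replicates, and the fixed random seed are illustrative choices, not the study's settings. Input is a count matrix with one row per report and one column per category.

```python
import random
from collections import Counter


def fleiss_kappa(counts):
    """Fleiss' kappa from counts[i][j] = raters placing item i in category j."""
    n_items = len(counts)
    n_raters = sum(counts[0])  # assumes every item has the same number of raters
    n_categories = len(counts[0])
    # Observed agreement: proportion of agreeing rater pairs, averaged over items
    p_bar = sum(
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ) / n_items
    # Chance agreement from the marginal category proportions
    totals = [sum(row[j] for row in counts) for j in range(n_categories)]
    p_e = sum((t / (n_items * n_raters)) ** 2 for t in totals)
    if p_e == 1.0:
        return float("nan")  # every rating in one category: kappa undefined
    return (p_bar - p_e) / (1 - p_e)


def ratings_to_counts(ratings, categories):
    """Convert per-item lists of rater labels into a count matrix."""
    return [[Counter(item)[c] for c in categories] for item in ratings]


def bootstrap_ci(counts, n_boot=2000, level=0.95, seed=0):
    """Percentile bootstrap CI for Fleiss' kappa, resampling items with replacement."""
    rng = random.Random(seed)
    stats = []
    for _ in range(n_boot):
        sample = [rng.choice(counts) for _ in counts]
        k = fleiss_kappa(sample)
        if k == k:  # NaN != NaN, so undefined resamples are skipped
            stats.append(k)
    stats.sort()
    lo_idx = int((1 - level) / 2 * len(stats))
    hi_idx = int((1 + level) / 2 * len(stats)) - 1
    return stats[lo_idx], stats[hi_idx]


# Hypothetical: 4 raters, 3 suspicion levels, 6 reports
ratings = [
    ["high"] * 4,
    ["low"] * 4,
    ["low", "low", "low", "moderate"],
    ["moderate", "high", "high", "high"],
    ["low"] * 4,
    ["moderate", "moderate", "low", "moderate"],
]
counts = ratings_to_counts(ratings, ["low", "moderate", "high"])
print(fleiss_kappa(counts), bootstrap_ci(counts))
```

Resampling whole reports (rows) rather than individual ratings keeps each report's set of four ratings together, which matches the idea that reports are the sampled units.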

RESULTS

Interrater Reliability of OPO Interpretations

Of 50 sleep studies, 25 were diagnostic, 13 were continuous titration, and 12 were split-night studies. Mean age was 47.8 (± 15.9) years, mean body mass index 35.0 (± 7.8) kg/m², mean AHI 24 (± 30.4) events/h, and median AHI 9.0 events/h. The overall interrater reliability for final recommendations (CPD evaluation, sleep apnea suspicion, and oxygen prescription) was high (kappa = .81, 95% CI .76–.88; Table 2). Good agreement was noted in CPD evaluation and suspicion of sleep apnea (kappa = .70, 95% CI .46–.86 and kappa = .65, 95% CI .56–.77, respectively). The interrater reliability of oxygen prescription was in an excellent range (kappa = .98, 95% CI .91–1.00). In a subanalysis of agreement on the suspicion of sleep apnea, excellent agreement was noted in the high suspicion category and good agreement in the low suspicion category (kappa = .91, P < .0001 and kappa = .72, P < .0001, respectively). However, agreement was fair in the high moderate and low moderate suspicion categories (kappa = .24, P < .0001 and kappa = .22, P = .001, respectively) (Table 3).
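The per-level agreement in this subanalysis corresponds to category-specific kappas. The sketch below computes them from a count matrix (one row per report, one column per suspicion level) and tests each against kappa = 0. It assumes the commonly used large-sample null standard error sqrt(2 / (N·n·(n − 1))) for N reports and n raters, and reports a one-sided upper-tail P value; both are assumptions on our part, and this is not the study's Stata output. As a consistency check, the p(1 − p)-weighted mean of the category kappas equals the overall Fleiss' kappa.

```python
import math
from statistics import NormalDist


def _proportions(counts):
    """Marginal proportion of all ratings falling in each category."""
    n_items, n_raters = len(counts), sum(counts[0])
    return [sum(row[j] for row in counts) / (n_items * n_raters)
            for j in range(len(counts[0]))]


def category_kappas(counts):
    """Category-specific kappa for each column of a count matrix."""
    n_items, n_raters = len(counts), sum(counts[0])
    scale = n_items * n_raters * (n_raters - 1)
    kappas = []
    for j, p in enumerate(_proportions(counts)):
        if p in (0.0, 1.0):
            kappas.append(float("nan"))  # category never or always used
            continue
        # Rater pairs split between category j and the other categories
        split_pairs = sum(row[j] * (n_raters - row[j]) for row in counts)
        kappas.append(1 - split_pairs / (scale * p * (1 - p)))
    return kappas


def null_z_and_p(kappa, n_items, n_raters):
    """z statistic and one-sided P for H0: kappa = 0 (assumed null SE)."""
    se0 = math.sqrt(2 / (n_items * n_raters * (n_raters - 1)))
    z = kappa / se0
    return z, 1 - NormalDist().cdf(z)


def overall_kappa(counts):
    """Overall Fleiss' kappa as the p(1 - p)-weighted mean of category kappas."""
    weights = [p * (1 - p) for p in _proportions(counts)]
    pairs = [(w, k) for w, k in zip(weights, category_kappas(counts)) if w > 0]
    return sum(w * k for w, k in pairs) / sum(w for w, _ in pairs)


# Hypothetical: perfect agreement among 4 raters on 10 reports, 2 levels
perfect = [[4, 0], [0, 4]] * 5
print(category_kappas(perfect), null_z_and_p(0.5, n_items=50, n_raters=4))
```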

Table 2. Diagnostic accuracy of saturation waveform for detection of AHI ≥ 15 events/h.

Table 3. Interrater reliability of physicians' interpretations for each level of suspicion of moderate to severe sleep apnea.

Diagnostic Accuracy for Moderate to Severe Sleep Apnea

The sensitivity, specificity, accuracy, and likelihood ratios of saturation waveform description for each rater are shown in Table 2. A positive or indeterminate waveform for sleep apnea was associated with high sensitivity for detection of AHI ≥ 15 events/h (range 95.65% to 100.00%) but low specificity (range 48.15% to 74.07%). The sensitivity, specificity, and accuracy of ODI ≥ 10 events/h for detecting AHI ≥ 15 events/h were 95.65%, 81.48%, and 88.00%, respectively (Table 4). Diagnostic accuracy metrics of the moderate to severe sleep apnea suspicion score (defined in Table 1) for detection of AHI ≥ 15 events/h are shown in Table 5. The accuracy of score 3 in detecting AHI ≥ 15 events/h ranged from 88.00% to 94.00%, with sensitivities ranging from 91.30% to 95.65% and specificities ranging from 81.48% to 92.59%. A lower score cutoff (≥ 1) was associated with high sensitivity (100.0%) for all raters but low specificity, ranging between 44.44% and 62.97%.
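The percentages reported for ODI ≥ 10 events/h can be checked by hand. With 50 reports, 95.65% sensitivity and 81.48% specificity are consistent only with 23 reports having AHI ≥ 15 events/h (22 detected) and 27 without (22 correctly negative); these counts are our inference from the percentages, not figures stated in the text. A small Python sketch computing the metrics from such a 2 × 2 table:

```python
from dataclasses import dataclass


@dataclass
class DiagnosticMetrics:
    sensitivity: float
    specificity: float
    accuracy: float
    lr_positive: float
    lr_negative: float


def diagnostic_metrics(tp, fn, tn, fp):
    """Metrics from a 2 x 2 table (reference standard: AHI >= 15 events/h)."""
    sens = tp / (tp + fn)
    spec = tn / (tn + fp)
    acc = (tp + tn) / (tp + fn + tn + fp)
    lr_pos = sens / (1 - spec) if spec < 1 else float("inf")
    lr_neg = (1 - sens) / spec if spec > 0 else float("inf")
    return DiagnosticMetrics(sens, spec, acc, lr_pos, lr_neg)


# Counts inferred from the reported ODI >= 10 events/h percentages
m = diagnostic_metrics(tp=22, fn=1, tn=22, fp=5)
print(f"{m.sensitivity:.2%} {m.specificity:.2%} {m.accuracy:.2%}")  # 95.65% 81.48% 88.00%
print(f"{m.lr_positive:.2f} {m.lr_negative:.2f}")                    # 5.17 0.05
```

The same function applies to the waveform and suspicion-score cutoffs; only the four cell counts change.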

Table 4. Diagnostic accuracy of ODI ≥ 10 events/h for detection of AHI ≥ 15 events/h.

Table 5. Diagnostic accuracy of moderate to severe sleep apnea suspicion score for detection of AHI ≥ 15 events/h.

Detection of Desaturations Due to Disorders Other Than Sleep Apnea

Desaturations due to CPD were rare (2/50 = 4%) in the sample and occurred in patients with chronic obstructive pulmonary disease (COPD) (Figure 3). Each of these reports was identified as consistent with desaturations due to disorders other than sleep apnea by 75% (3/4) of the raters. However, the four raters also identified 2, 5, 1, and 7 other reports, respectively, of the 50 as consistent with desaturations due to disorders other than sleep apnea. This suggests that a desaturation waveform atypical for sleep apnea has poor specificity for desaturations due to CPD.

Figure 3. Nocturnal desaturations due to COPD in patients with mild sleep apnea (report 13, AHI 4.8 events/h) and no sleep apnea (report 50, AHI 0 events/h). In report 13, the first arrow points to starting oxygen 2 L/min and second arrow points to lowering oxygen to 1 L/min nasal cannula. AHI = apnea-hypopnea index, COPD = chronic obstructive pulmonary disease, HR = heart rate.

DISCUSSION

The results of this study confirm the hypothesis that a template approach would be associated with good interrater reliability among pulmonary fellows without prior experience in OPO interpretations. In addition, there was overall good agreement on final recommendations (ie, suspicion for CPD, suspicion for moderate to severe sleep apnea, and oxygen prescription).

The second aim of the study was to address validity. In this sample, the sensitivity, specificity, and accuracy of ODI ≥ 10 events/h for detecting AHI ≥ 15 events/h were 95.65%, 81.48%, and 88.00%, respectively. Chung et al. found comparable results, with a sensitivity of 93%, a specificity of 75%, and an accuracy of 82% for ODI > 10 events/h to detect moderate to severe sleep apnea in preoperative patients who underwent portable PSG and pulse oximetry [9]. Although it remains undetermined whether the saturation waveform description improves the diagnostic accuracy for sleep-disordered breathing compared to ODI alone, the proposed moderate to severe sleep apnea scoring system provides a way of incorporating the waveform description while placing greater emphasis on ODI, because ODI is better validated [6–9]. Further studies with larger sample sizes are required to evaluate such a scoring system. It is important to note that a theoretical advantage of waveform interpretation is the detection of moderate to severe sleep apnea when ODI is falsely low because sleep occupies only a small percentage of the recording time. When actual sleep time during OPO recording is low, moderate to severe sleep apnea will still likely manifest as waveform desaturations positive for sleep apnea, but with a low ODI, because the relatively small number of desaturations is divided by the full recording time. However, baseline saturation will affect the relationship between ODI and apnea and/or hypopnea events because of the shape of the oxyhemoglobin dissociation curve. For example, young healthy patients with a high baseline saturation approaching 99% may have major apnea and/or hypopnea events without a significant (3% or 4%) drop in their oxygen saturation, resulting in a false-negative pulse oximetry. Those with a baseline saturation of 92% to 93% are near the shoulder of the curve and may drop 4% even with a short event duration.
This is an intrinsic limitation of ODI in the diagnosis of sleep apnea, and interpretation of the saturation waveform is unlikely to mitigate it. Similarly, high altitude may enhance detection of events with ODI, but no normative data are available on this issue.
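Both effects described above can be illustrated numerically. In the hypothetical Python example below, a patient has 60 desaturations during 4 hours of sleep within an 8-hour recording; dividing by recording time rather than sleep time halves the ODI. The second part uses the Severinghaus approximation of the oxyhemoglobin dissociation curve (standard conditions assumed) to show that the same 10 mm Hg fall in PO2 produces a much smaller saturation drop from a high baseline than from a baseline near 92%. All numbers are illustrative, not study data.

```python
def odi(n_desaturations, hours):
    """Desaturation events per hour of the chosen denominator."""
    return n_desaturations / hours


def spo2_severinghaus(po2_mmhg):
    """Approximate saturation (%) from PO2 (mm Hg), Severinghaus equation."""
    return 100 / (23400 / (po2_mmhg ** 3 + 150 * po2_mmhg) + 1)


# Hypothetical night: 60 desaturations, 4 h asleep, 8 h recorded
print(odi(60, hours=8), odi(60, hours=4))  # 7.5 15.0

# Same 10 mm Hg fall in PO2 from two different starting points
for start in (120, 65):
    before = spo2_severinghaus(start)
    after = spo2_severinghaus(start - 10)
    print(f"PO2 {start}->{start - 10} mm Hg: SpO2 {before:.1f}% -> {after:.1f}% "
          f"(drop {before - after:.1f} points)")
# PO2 120->110 mm Hg: SpO2 98.7% -> 98.3% (drop 0.4 points)
# PO2 65->55 mm Hg: SpO2 92.4% -> 88.2% (drop 4.2 points)
```

Only the second fall would meet a 4% desaturation criterion, even though the underlying change in PO2 is identical.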

Because desaturations due to CPD were rare in our sample, we were unable to calculate diagnostic accuracy metrics. However, the two reports with desaturations due to COPD were detected by 75% of the raters. The raters also identified other reports (range 1–7 per rater) as positive for desaturations due to disorders other than sleep apnea. This could reflect overinterpretation of artifact or, alternatively, desaturations from other causes such as hypoventilation. This observation supports a template organization that evaluates the saturation waveform pattern for disorders other than sleep apnea, such as COPD, interstitial lung disease, neuromuscular disorders, and hypoventilation. Further studies are needed to accurately distinguish specific diseases among disorders causing desaturations not typical for sleep apnea.

The presented format of the pulse oximetry reports focuses on the parameters with established diagnostic importance in the literature. Other parameters not mentioned in the template and usually present in OPO reports, such as minimum saturation, average event duration, and number of events < 88%, can be distracting and overwhelming but should be presented to the physicians because they could be useful in certain clinical scenarios. Additional comments can be added per the physician's clinical judgement.

The strengths of this study include the use of inexperienced physicians in training to interpret the OPO reports, which makes the template practical for PCC physicians who do not have advanced sleep medicine training. Moreover, the template can be easily implemented in institutions without the need for special software. Although the diagnostic value of OPO is probably best when it is ordered for a specific purpose and interpreted by the physician taking care of the patient, the template allows standardized interpretation in circumstances in which the patient is not well known to the interpreter. Finally, the incorporation of the saturation waveform description into the template, with relatively good accuracy for the detection of moderate to severe sleep apnea, raises the question of whether PCC physicians could be trained to recognize desaturation patterns, and whether such training would be helpful in practice. Under appropriate conditions, an OPO waveform may be obtained in the inpatient setting, where a sleep study cannot be done and sleep specialists may not be available.

There are limitations to our study. First, the pulse oximetry reports were extracted from attended in-laboratory sleep studies and are therefore likely to have less artifact than many OPO tests conducted in patients' homes. Moreover, portable pulse oximetry devices may not have the short, high-resolution 1-second signal averaging time of the PSG oximeter, and this may affect the detection of desaturations [17]. Second, the study did not address the effect of pretest probability on OPO interpretations. The OPO reports were extracted from sleep studies that were ordered by sleep specialists for suspicion of sleep disorders, and therefore the overall pretest probability for sleep apnea was high in this sample. Further studies are needed to evaluate the template combined with the clinical suspicion that prompted the OPO. Third, the prevalence of desaturations due to CPD was low in this sample, and there were no examples of Cheyne-Stokes breathing. However, this template approach provides an opportunity to conduct more research and gain a better understanding of desaturations due to disorders other than sleep apnea. The desaturations due to COPD in our sample ranged from 20 minutes to more than 1 hour and had low nadir saturation (< 90%); however, further studies are needed to define such desaturations in terms of duration, drop in saturation, and nadir saturation. Fourth, the outcomes (AHI and desaturations due to CPD) were extracted from the official PSG reports and are subject to inherent errors. Fifth, the template did not address pulse (heart rate) variations and their association with desaturations; therefore, additional comments about the pulse may be needed when using the template for interpretation.
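The effect of signal averaging time can be illustrated with a simplified model. The Python sketch below treats the oximeter's averaging as a plain trailing moving average, which is an assumption (commercial devices use their own, often proprietary, algorithms), and the trace and window lengths are hypothetical. An 8-second dip from 96% to 91% is fully visible with 1-second or 8-second averaging but appears as only a 2.5-point drop with a 16-second window, below a 4% desaturation threshold.

```python
def trailing_average(signal, window):
    """Average of the last `window` samples at each point (fewer at the start)."""
    smoothed = []
    running = 0
    for i, x in enumerate(signal):
        running += x
        if i >= window:
            running -= signal[i - window]  # drop the sample leaving the window
        smoothed.append(running / min(i + 1, window))
    return smoothed


def apparent_drop(signal, window):
    """Fall from the starting value to the smoothed nadir, in percentage points."""
    smoothed = trailing_average(signal, window)
    return smoothed[0] - min(smoothed)


# Hypothetical 1-Hz trace: an 8-s dip from 96% to 91% between stable periods
trace = [96] * 30 + [91] * 8 + [96] * 30
for window in (1, 8, 16):
    print(window, apparent_drop(trace, window))
# 1 5.0
# 8 5.0
# 16 2.5
```

Longer averaging windows mainly affect brief events; a desaturation that lasts longer than the window would be preserved at its full depth.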

The template was designed to provide a simple evidence-based tool for OPO interpretation. However, additions can be made to expand and improve the template. For instance, recognizing artifact in different forms is probably best achieved by providing examples in the educational session, but a detailed description of artifact can also be added to the template. Moreover, desaturations clustered in three or four groups may indicate rapid eye movement–related events, and the concentration of desaturations in one part of the study may be due to a positional effect.

In conclusion, a highly structured template, along with a brief educational session, allows the comprehensive interpretation of OPO with good interrater reliability among PCC fellows with no prior experience in OPO interpretation. This study suggests that the template is also associated with good diagnostic accuracy for moderate to severe sleep apnea detection. Further studies are needed to accurately characterize desaturations due to other disorders and to differentiate them from the sleep apnea pattern.

DISCLOSURE STATEMENT

The authors have seen and approved the manuscript. Work for this study was performed at University Hospitals Cleveland Medical Center. Dr. Ayache reports no conflicts of interest. Dr. Strohl reports consulting for Inspire Medical Systems, Sommetrics, and Galvani Bioelectronics. He reports no conflicts of interest relevant to this manuscript.

ABBREVIATIONS

AHI = apnea-hypopnea index
CPD = cardiopulmonary disease
COPD = chronic obstructive pulmonary disease
CI = confidence interval
ODI = oxygen desaturation index
OPO = overnight pulse oximetry
PCC = pulmonary and critical care
PSG = polysomnography

REFERENCES

1. Netzer N, Eliasson AH, Netzer C, Kristo DA. Overnight pulse oximetry for sleep-disordered breathing in adults: a review. Chest. 2001;120(2):625–633.

2. Collop NA, Anderson WM, Boehlecke B, et al. Clinical guidelines for the use of unattended portable monitors in the diagnosis of obstructive sleep apnea in adult patients. Portable Monitoring Task Force of the American Academy of Sleep Medicine. J Clin Sleep Med. 2007;3(7):737–747.

3. Kuna ST, Badr MS, Kimoff RJ, et al. An official ATS/AASM/ACCP/ERS workshop report: Research priorities in ambulatory management of adults with obstructive sleep apnea. Proc Am Thorac Soc. 2011;8(1):1–16.

4. Ramsey R, Mehra R, Strohl KP. Variations in physician interpretation of overnight pulse oximetry monitoring. Chest. 2007;132(3):852–859.

5. Ayache M, May AM, Strohl KP. Poor inter-rater reliability in interpretation of overnight oximetry despite worksheet assistance [abstract]. Am J Respir Crit Care Med. 2017;195:A2619.

6. Williams AJ, Yu G, Santiago S, Stein M. Screening for sleep apnea using pulse oximetry and a clinical score. Chest. 1991;100(3):631–635.

7. Chiner E, Signes-Costa J, Arriero JM, Marco J, Fuentes I, Sergado A. Nocturnal oximetry for the diagnosis of the sleep apnea hypopnea syndrome: a method to reduce the number of polysomnographies? Thorax. 1999;54(11):968–971.

8. Magalang UJ, Dmochowski J, Veeramachaneni S, et al. Prediction of the apnea-hypopnea index from overnight pulse oximetry. Chest. 2003;124(5):1694–1701.

9. Chung F, Liao P, Elsaid H, Islam S, Shapiro CM, Sun Y. Oxygen desaturation index from nocturnal oximetry: a sensitive and specific tool to detect sleep-disordered breathing in surgical patients. Anesth Analg. 2012;114(5):993–1000.

10. Staniforth AD, Kinnear WJ, Starling R, Cowley AJ. Nocturnal desaturation in patients with stable heart failure. Heart. 1998;79(4):394–399.

11. Series F, Kimoff RJ, Morrison D, et al. Prospective evaluation of nocturnal oximetry for detection of sleep-related breathing disturbances in patients with chronic heart failure. Chest. 2005;127(5):1507–1514.

12. Plywaczewski R, Sliwinski P, Nowinski A, Kaminski D, Zielinski J. Incidence of nocturnal desaturation while breathing oxygen in COPD patients undergoing long-term oxygen therapy. Chest. 2000;117(3):679–683.

13. Marrone O, Salvaggio A, Insalaco G. Respiratory disorders during sleep in chronic obstructive pulmonary disease. Int J Chron Obstruct Pulmon Dis. 2006;1(4):363–372.

14. Lacasse Y, Series F, Vujovic-Zotovic N, et al. Evaluating nocturnal oxygen desaturation in COPD--revised. Respir Med. 2011;105(9):1331–1337.

15. Centers for Medicare & Medicaid Services website. Home Oxygen Therapy. https://www.cms.gov/Outreach-and-Education/Medicare-Learning-Network-MLN/MLNProducts/MLN-Publications-Items/ICN908804.html. Published October 2017. Accessed December 11, 2017.

16. Bujang MA, Adnan TH. Requirements for minimum sample size for sensitivity and specificity analysis. J Clin Diagn Res. 2016;10(10):YE01–YE06.

17. Zafar S, Ayappa I, Norman RG, Krieger AC, Walsleben JA, Rapoport DM. Choice of oximeter affects apnea-hypopnea index. Chest. 2005;127(1):80–88.