The formulation of these recommendation statements was guided by evidence from twenty-six validation studies that evaluated the diagnostic accuracy of HSAT against PSG,35,53,62,67,81,86–106 as well as seven RCTs that compared clinical outcomes from management pathways.83–85,107–110 Four of these RCTs were determined to be most relevant to clinical practice, as they did not require oximetry testing as a criterion for inclusion and used conventional methods for determination of PAP pressures (i.e., APAP or attended titration).83–85,110 This subset of studies will be referred to as “RCTs most generalizable to clinical practice” for the remainder of this discussion section.
ACCURACY: The following paragraphs are organized by type of HSAT device and components or combinations of components, as described in the literature.
A total of twenty-six validation studies were identified that reported accuracy outcomes. The data from these validation studies are summarized in the supplemental material, Table S37 through Table S58. In two studies that evaluated the performance of Type 2 HSAT devices against PSG,67,86 when using an AHI ≥ 5 cutoff, accuracy in a high-risk population (assuming a prevalence of 87%) ranged from 84% to 91%. Using a cutoff of AHI ≥ 15, the accuracy of these devices was 88% in a high-risk group (see supplemental material, Table S37 and Table S38).67,86 Seven studies evaluated the performance of Type 3 HSAT devices against PSG, but the AHI cutoffs employed varied across studies, resulting in sub-grouping by AHI cutoffs for our analyses.87–93 When using an AHI ≥ 5 cutoff, accuracy in a high-risk population (assuming a prevalence of 87%) ranged from 84% to 91%, whereas in a low-risk population (assuming a prevalence of 55%) accuracy ranged from 70% to 78% based on the seven studies (see supplemental material, Table S39). Using a cutoff of AHI ≥ 15, the accuracy of these devices in a high-risk population ranged from 65% to 91%, based on six studies87,89–92,94 (see supplemental material, Table S40). Using a cutoff of AHI ≥ 30, the accuracy of the devices in the high-risk population was 88% (95% CI: 81% to 94%), based on five studies (see supplemental material, Table S41).
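The accuracy figures above are reported at an assumed prevalence because overall accuracy is a prevalence-weighted average of sensitivity and specificity: accuracy = p·Se + (1 − p)·Sp. The short Python sketch below illustrates this relationship. The sensitivity and specificity values are hypothetical placeholders chosen for illustration, not results from the cited studies.

```python
def accuracy(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Fraction of tested patients correctly classified at a given prevalence.

    True positives make up prevalence * sensitivity of the tested population;
    true negatives make up (1 - prevalence) * specificity.
    """
    for name, value in (("sensitivity", sensitivity),
                        ("specificity", specificity),
                        ("prevalence", prevalence)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {value}")
    return prevalence * sensitivity + (1.0 - prevalence) * specificity


# Hypothetical device (illustrative values only): high sensitivity, modest specificity.
SE, SP = 0.90, 0.60
high_risk = accuracy(SE, SP, prevalence=0.87)  # prevalence assumed for the high-risk group
low_risk = accuracy(SE, SP, prevalence=0.55)   # prevalence assumed for the low-risk group
print(f"high-risk accuracy: {high_risk:.1%}")
print(f"low-risk accuracy:  {low_risk:.1%}")
```

When sensitivity exceeds specificity, as in this hypothetical device, accuracy falls as the assumed prevalence falls, which is the same direction as the lower accuracy reported above for low-risk populations.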
Five studies evaluated the performance of 2–3 channel HSAT devices against PSG. In a high-risk population using cutoffs of AHI ≥ 5,95–97 AHI ≥ 15,95–99 and AHI ≥ 30,96,97 accuracy ranged from 81% to 93%, 72% to 87%, and 71% to 90%, respectively. Using the same cutoffs in a low-risk population, accuracy ranged from 77% to 88%, 68% to 95%, and 88% to 91%, respectively (see supplemental material, Table S42 through Table S44). When the performance of 2–3 channel HSAT was evaluated against unattended in-home PSG, using a cutoff of AHI ≥ 15, accuracy in a high-risk population was 86% (95% CI: 76% to 93%);53 using a cutoff of AHI ≥ 30, accuracy ranged from 83% to 91% (see supplemental material, Table S45 and Table S46).53,81
Six studies evaluated the performance of single channel HSAT against attended or unattended PSG (see supplemental material, Table S47 through Table S50, and Table S51 through Table S53, respectively).73,100–103,111
A single study evaluated the performance of oximetry against unattended in-home PSG.111 Using a cutoff of AHI ≥ 5, accuracy was 73% (95% CI: 68 to 78%) in a high-risk population, and 79% (95% CI: 74 to 84%) in a low-risk population. Using oximetry to identify OSA at an AHI ≥ 5 cutoff, and assuming a prevalence of 87% in a high-risk population, the findings of the study111 would result in an estimated average of 274 misdiagnosed patients out of 1,000 tested, and 210 misdiagnosed patients out of 1,000 tested in a low-risk group (assuming a prevalence of 55%). Using cutoffs of AHI ≥ 15 and AHI ≥ 30, oximetry had an accuracy of 86% (95% CI: 83 to 91%) and 74% (95% CI: 71 to 76%), respectively, in a high-risk population, and an accuracy of 80% (95% CI: 75 to 84%) and 63% (95% CI: 59 to 67%), respectively, in a low-risk population (see supplemental material, Table S51 through Table S53).
A single study evaluated the performance of PAT, oximetry, and actigraphy against simultaneous unattended in-home PSG and reported a sensitivity of 0.88 (95% CI: 0.47 to 1.00), specificity of 0.87 (95% CI: 0.66 to 0.97), and accuracy of 88% (95% CI: 50 to 100%) in high-risk patients using a cutoff of AHI ≥ 5.104 These findings would result in 121 misdiagnosed patients out of 1,000 tested in a high-risk population (based on a prevalence of 87%), and 125 misdiagnosed patients out of 1,000 tested in a low-risk population (based on a prevalence of 55%) (see supplemental material, Table S54).104 Two cross-over studies randomized patients to home-based PAT and to in-laboratory simultaneous PSG and PAT.105,106 For comparison to in-laboratory PSG, only the home-based PAT data were used for this recommendation. A single study that evaluated the performance of the PAT device in the home against in-laboratory PSG using a cutoff of AHI ≥ 5,106 reported a specificity of 0.43 (95% CI: 0.22 to 0.66). Two studies that evaluated the home-based PAT device against in-laboratory PSG at an AHI cutoff of ≥ 15 reported specificity ranging from 0.77 to 1.00 and sensitivity ranging from 0.92 to 0.96.105,106 A single study evaluated the PAT device at an AHI cutoff of ≥ 30, and reported a specificity of 0.82 (95% CI: 0.57 to 0.96) and sensitivity of 0.92 (95% CI: 0.62 to 1.00)105 (see supplemental material, Table S55 through Table S57).
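The misdiagnosis counts reported for the PAT-based device can be reproduced from its sensitivity and specificity point estimates. The Python sketch below splits the expected errors per 1,000 tested into false negatives and false positives; the function name is illustrative, and the inputs are the point estimates and assumed prevalences stated above.

```python
def misclassified_per_1000(sensitivity, specificity, prevalence, n=1000):
    """Expected (false_negatives, false_positives, total) among n patients tested."""
    with_osa = n * prevalence
    without_osa = n - with_osa
    false_negatives = with_osa * (1.0 - sensitivity)      # OSA present, test negative
    false_positives = without_osa * (1.0 - specificity)   # OSA absent, test positive
    return false_negatives, false_positives, false_negatives + false_positives


SENS, SPEC = 0.88, 0.87  # point estimates reported at AHI >= 5 (ref 104)
for label, prevalence in (("high-risk", 0.87), ("low-risk", 0.55)):
    fn, fp, total = misclassified_per_1000(SENS, SPEC, prevalence)
    print(f"{label}: FN={fn:.1f}  FP={fp:.1f}  total={total:.1f}")
```

The totals come to about 121.3 and 124.5 per 1,000, which round to the reported 121 and 125. In the high-risk group, roughly 104 of the 121 errors are false negatives, the error type that matters most when pretest probability is high.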
The quality of evidence for diagnostic accuracy was downgraded due to indirectness, imprecision, or inconsistency. The quality ranged from low to high, depending on the tools and algorithms, diagnostic cutoffs, and risk groups evaluated.
The potential consequences for patients classified in true and false positive or negative categories are summarized in the supplemental material, Table S1. The TF concluded that the number of patients potentially misclassified by HSAT was high enough to be of clinical concern, particularly when tests were inconclusive or negative. In a population with an increased risk of moderate to severe OSA, the higher likelihood of false negatives, combined with the substantial impact of missed diagnoses on patient outcomes, can cause significant harm. This reasoning supports required use of a diagnostic test with higher sensitivity (PSG) in this population if HSAT provides a negative or non-diagnostic result.
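Why a negative HSAT is of particular concern in a high-risk population can be made concrete with Bayes' rule. The Python sketch below computes the probability that OSA is still present after a negative result. For illustration only, it borrows the sensitivity and specificity point estimates reported for one PAT-based device at AHI ≥ 5 (0.88 and 0.87) and the 87% and 55% prevalences assumed in this section; other devices and cutoffs would give different values.

```python
def prob_osa_given_negative(sensitivity, specificity, pretest):
    """Post-test probability of OSA after a negative test result (Bayes' rule)."""
    false_negatives = pretest * (1.0 - sensitivity)
    true_negatives = (1.0 - pretest) * specificity
    return false_negatives / (false_negatives + true_negatives)


# Illustrative inputs: point estimates reported for one PAT-based device at AHI >= 5.
SENS, SPEC = 0.88, 0.87
for label, pretest in (("high-risk (87%)", 0.87), ("low-risk (55%)", 0.55)):
    post = prob_osa_given_negative(SENS, SPEC, pretest)
    print(f"{label}: P(OSA | negative HSAT) = {post:.0%}")
```

Under these assumptions, about 48% of high-risk patients with a negative HSAT would still have OSA, compared with about 14% at 55% prevalence, which is consistent with the TF's reasoning that a negative HSAT in this population should be followed by PSG.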
CLINICAL OUTCOMES ASSESSMENT: The TF concluded that evaluating the impact of diagnostic accuracy on clinical outcomes is complicated by a number of factors that can cause discordance between tests, including night-to-night variability and inconsistent definitions of respiratory events (e.g., hypopneas) between HSAT and PSG. In addition, there is uncertainty regarding clinical outcomes for patients misclassified by HSAT.
For these reasons, studies that compared clinical outcomes in patients randomized to management pathways based on PSG or HSAT diagnostic assessment, within the same research protocol, provide the best opportunity to assess the acceptability of clinical outcomes using HSAT.
SUBJECTIVE SLEEPINESS: A meta-analysis of seven RCTs compared changes in patient-reported sleepiness, using the ESS, in patients diagnosed by HSAT or PSG, followed by PAP titration (see supplemental material, Figure S18).83–85,107–110 The meta-analysis showed a mean difference of −0.38 points (95% CI: −1.07 to 0.32 points), corresponding to 0.38 points greater improvement in patients randomized to the HSAT pathway versus the attended PSG pathway; this difference was neither clinically nor statistically significant. It indicates that subjective sleepiness is similarly improved in patients who initiate PAP treatment based on diagnosis using either HSAT or PSG. The quality of evidence for subjective sleepiness was high.
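The pooled ESS result can be checked against its confidence interval. The Python sketch below back-calculates the point estimate, standard error, and a two-sided p value from the reported 95% CI. It assumes the interval is symmetric and normal-based (common for inverse-variance meta-analysis, but not stated in the source) and assumes the difference is computed as HSAT change minus PSG change.

```python
import math


def summarize_from_ci(lower, upper, z=1.96):
    """Back-calculate estimate, SE, z statistic and two-sided p value from a
    95% CI, assuming the interval is symmetric and normal-based."""
    if not lower < upper:
        raise ValueError("lower bound must be below upper bound")
    estimate = (lower + upper) / 2.0          # midpoint of a symmetric CI
    se = (upper - lower) / (2.0 * z)          # CI half-width divided by z
    z_stat = estimate / se
    p_value = math.erfc(abs(z_stat) / math.sqrt(2.0))  # two-sided normal p value
    return estimate, se, z_stat, p_value


# Reported pooled ESS difference (assumed convention: HSAT minus PSG).
est, se, z_stat, p = summarize_from_ci(-1.07, 0.32)
print(f"estimate={est:.3f}  SE={se:.3f}  z={z_stat:.2f}  p={p:.2f}")
print("CI includes zero:", -1.07 < 0 < 0.32)
```

Under these assumptions the midpoint is about −0.375, matching the reported 0.38-point difference in magnitude, the standard error is about 0.35, and the p value is roughly 0.29; because the interval spans zero, the difference is not statistically significant.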
QUALITY OF LIFE: Six RCTs, using various validated instruments (i.e., FOSQ, SAQLI, and SF-36), compared QOL in patients diagnosed by HSAT or PSG, followed by PAP titration.84,85,107–110 Meta-analysis showed no significant differences in pooled effects between pathways (see supplemental material, Figure S19 through Figure S23, and Table S58). The quality of evidence ranged from moderate to high based on the measure used to assess QOL. The quality of evidence for the SF-36 physical and mental summary scores was downgraded due to imprecision. The TF considered the overall quality of evidence for QOL to be high, as FOSQ and SAQLI measures of QOL were considered more critical for decision-making than the SF-36 measures.
CPAP ADHERENCE: Six RCTs evaluated CPAP adherence (mean hours of use per night); meta-analysis found no significant difference between the two assessment pathways (see supplemental material, Figure S24).83–85,108–110 When adherence was measured as the number of nights with more than 4 hours of use, meta-analysis of five RCTs found a clinically insignificant trend toward greater CPAP adherence in the HSAT arm than in the PSG arm (see supplemental material, Figure S25).83–85,107,110 The quality of evidence for CPAP adherence was moderate to high across different AHI cutoffs after being downgraded due to imprecision. The TF determined that the overall quality of evidence across AHI cutoffs was high.
FAILURE TO COMPLETE DIAGNOSTIC ALGORITHM: Among the four RCTs most generalizable to clinical practice, three studies83–85 required use of PSG if HSAT was inconclusive (did not provide adequate data or showed a low AHI after 1 or 2 unsuccessful attempts) and after 1 or 2 failed APAP trials (e.g., insufficient use, elevated residual AHI, persistent large leak). Based on data reported by a multicenter RCT, there was concern regarding the risk of non-completion of diagnostic testing when initial HSAT did not provide a definitive result. Rosen et al. 201284 reported that 30% (10/33) of subjects with technically inadequate HSATs and 16% (14/88) of subjects with low AHI on HSAT failed to proceed per protocol to PSG. There was also evidence indicating reduced effectiveness of repeated HSAT attempts for technical failures: 82% (147/180) of initial HSAT attempts were technically acceptable, whereas only 60% (12/20) of second attempts resulted in a technically acceptable study. Although failure to complete the diagnostic algorithm was not originally considered a critical outcome, the TF ultimately determined that it was critical for decisions regarding follow-up for inconclusive HSAT attempts. The quality of evidence regarding performance of PSG after a single inconclusive HSAT (versus multiple attempts) was low.
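The proportions from Rosen et al. 2012 rest on small denominators, particularly the 20 second attempts. The Python sketch below recomputes each percentage from its reported counts and adds a Wilson score 95% interval. The source does not report intervals for these proportions, so the intervals are an illustrative re-analysis, and the Wilson method is one standard choice among several.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score confidence interval for a binomial proportion."""
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")
    p = successes / n
    denom = 1.0 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half_width, center + half_width


# Counts as reported in the text above (Rosen et al. 2012, ref 84).
counts = {
    "technically inadequate HSAT, no PSG": (10, 33),
    "low-AHI HSAT, no PSG": (14, 88),
    "first HSAT technically acceptable": (147, 180),
    "second HSAT technically acceptable": (12, 20),
}
for label, (k, n) in counts.items():
    lo, hi = wilson_interval(k, n)
    print(f"{label}: {k}/{n} = {k / n:.0%} (95% CI {lo:.0%} to {hi:.0%})")
```

The interval for second attempts (roughly 39% to 78%) is several times wider than that for first attempts, which illustrates the imprecision underlying the estimate of second-attempt success.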
OVERALL QUALITY OF EVIDENCE: The TF determined that the critical outcome for diagnostic accuracy assessment was the number of false negative results. The quality of evidence for accuracy was downgraded to moderate due to imprecision, inconsistency, or indirectness. The quality of evidence for the clinical outcomes of sleepiness, quality of life, and CPAP adherence was high. Depression and cardiovascular outcomes were also considered critical outcomes; however, evidence for these outcomes was not available. Therefore, the overall quality of evidence for recommendation 2 is moderate.
In addition to accuracy and clinical outcomes, the TF determined that failure to complete the diagnostic algorithm was a critical outcome for repeat testing after a negative, inconclusive or technically inadequate HSAT. The quality of evidence for performing PSG after a single inconclusive HSAT was determined to be low, as only one study addressed this outcome. Therefore, the overall quality of evidence for recommendation 3 is low.
RESOURCE USE: Although a single night of HSAT is less resource-intensive than a single night of PSG, the relative cost-effectiveness of management pathways that incorporate each of these diagnostic strategies is unclear. Economic analyses have compared the cost-effectiveness of management pathways that incorporate diagnostic strategies using HSAT or PSG.112–114 All concluded that PSG is the preferred diagnostic strategy from an economic perspective for adults suspected to have moderate to severe OSA. An important factor in these analyses is the favorable cost-effectiveness of OSA treatment in patients with moderate to severe OSA, particularly over longer time horizons. As a result, diagnostic strategies that increase false negatives (leaving patients untreated) or false positives (treating patients unnecessarily) have less favorable cost-effectiveness. It is important to note that these economic analyses are susceptible to error because of imprecision in modelling of management pathways and limitations in the quality of data available to estimate parameters. The impact of such errors can be magnified when extrapolated over long time horizons.
Relative cost-effectiveness of management pathways that use HSAT or PSG for diagnosis can be assessed in the context of an RCT, if resource utilization is measured. Among the four RCTs most generalizable to clinical practice,83–85,110 only one provided this information.84 The study reported that in-trial costs were 25% lower in the home arm than in the in-laboratory arm.84 These estimates were based on the Medicare Fee Schedule for the various study procedures, including office visits and diagnostic testing, and take into account the need to repeat studies.84 A subsequent cost minimization analysis of this RCT also considered costs from a provider perspective.115 While provider costs (capital, labor, overhead) were generally lower for the home program, this was not true for all modelled scenarios. The provider perspective highlighted the large number of cost components necessary to ensure high quality home-based OSA management, which narrowed the cost difference relative to lab management.
The available studies indicate that the potential cost advantages of HSAT over PSG are smaller than the cost difference of a single night of testing would suggest. Even when HSAT is used in appropriate populations and conditions, additional HSAT and PSG are needed to reach an accurate diagnosis in patients with technically inadequate or inconclusive studies. In addition, if a home management pathway is used in a manner that reduces its effectiveness relative to PSG, HSAT could in fact be less cost-effective than PSG. Examples include use in patient populations with predominantly mild OSA, in which a higher proportion of HSAT results are negative or indeterminate and require follow-up PSG, and use in patients at risk for non-obstructive sleep-related breathing disorders that may not be accurately diagnosed with HSAT. The TF determined that HSAT used in the recommended context and management pathway would be more cost-effective than HSAT used outside this framework.
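The point that follow-up testing erodes the per-night cost advantage of HSAT can be illustrated with a deliberately simple expected-cost model. In the Python sketch below, every patient in the HSAT pathway has one HSAT and a fraction of them also need PSG, while every patient in the PSG pathway has one PSG. The costs are hypothetical values in arbitrary units, not figures from the cited analyses, and the function names are illustrative.

```python
def expected_diagnostic_cost(cost_hsat, cost_psg, follow_up_psg_rate):
    """Expected per-patient diagnostic cost of each pathway in a toy model.

    HSAT pathway: one HSAT for everyone, plus PSG for the fraction with
    negative, inconclusive or technically inadequate results.
    PSG pathway: one PSG for everyone.
    """
    if not 0.0 <= follow_up_psg_rate <= 1.0:
        raise ValueError("follow_up_psg_rate must be in [0, 1]")
    hsat_pathway = cost_hsat + follow_up_psg_rate * cost_psg
    psg_pathway = cost_psg
    return hsat_pathway, psg_pathway


def break_even_rate(cost_hsat, cost_psg):
    """Follow-up PSG rate at which both pathways have equal diagnostic cost."""
    return 1.0 - cost_hsat / cost_psg


# Hypothetical relative costs (arbitrary units), chosen only for illustration.
COST_HSAT, COST_PSG = 300.0, 1000.0
for rate in (0.1, 0.3, 0.5):
    hsat, psg = expected_diagnostic_cost(COST_HSAT, COST_PSG, rate)
    print(f"follow-up PSG rate {rate:.0%}: HSAT pathway {hsat:.0f} vs PSG pathway {psg:.0f}")
print(f"break-even follow-up PSG rate: {break_even_rate(COST_HSAT, COST_PSG):.0%}")
```

The sketch omits repeat HSAT, PAP titration, follow-up visits, provider-side costs, and the downstream costs of false negatives, all of which the analyses discussed above indicate can narrow or reverse the difference.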
BENEFITS VERSUS HARMS: Use of HSAT may provide benefits to patients with suspected OSA. Such benefits could include convenience, comfort, increased access to testing, and decreased cost. HSAT can be performed in the home environment with fewer attached sensors during sleep. The availability of HSAT for diagnosis may improve access to diagnostic testing in resource-limited settings, or when the patient is unable to leave the home or healthcare setting for testing. In addition, HSAT may be less costly when used appropriately. These benefits must be weighed against the potential for harm. Harms could result from the need for additional diagnostic testing among patients with technically inadequate or inconclusive HSAT findings, or from misdiagnosis and subsequent inappropriate therapy or lack of therapy. As summarized above, HSAT has not been shown to provide inferior clinical benefit compared to PSG when used in the appropriate context. Therefore, the TF determined that if HSAT is used in the context described in the recommendations and remarks, the risk of harm is minimized and the probability of potential economic benefits increased.
The TF was concerned that, in clinical practice (in contrast to the RCT setting), there would be higher levels of dropout from diagnostic testing among patients whose initial study attempts did not result in a diagnosis of OSA. In particular, there was concern that patients with false negative HSAT results may not complete additional testing after learning of a negative result, despite the presence of symptoms of OSA. In addition, as described above, HSAT is less accurate than PSG and more likely to result in false negative results. For these reasons, the TF recommends that if the initial HSAT shows a negative or inconclusive result, PSG, rather than a second HSAT, should be performed. There are similar concerns that, following a technically inadequate HSAT, a repeat HSAT may be associated with a higher rate of technical failure and with increased risk of dropout from the diagnostic process. Therefore, the TF also recommends that if the initial HSAT is technically inadequate, PSG rather than a second HSAT should be performed. On the other hand, the TF recognizes that there may be specific circumstances in which repeat HSAT is appropriate after an initial failed HSAT. These circumstances would include cases in which both of the following are present: the clinician determines that there is a high likelihood of successful recording on a second attempt, and the patient expresses a preference for this approach.
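The follow-up logic described above can be summarized as a small decision function. The Python sketch below is a simplified reading of these recommendations, not a clinical tool. It assumes (an interpretation, labeled as such in the code) that the repeat-HSAT exception applies only to technically inadequate studies, since it is framed in terms of successful recording on a second attempt. The names `HsatResult` and `next_diagnostic_step` are hypothetical.

```python
from enum import Enum


class HsatResult(Enum):
    """Hypothetical categories for an initial HSAT outcome."""
    POSITIVE = "positive"
    NEGATIVE = "negative"
    INCONCLUSIVE = "inconclusive"
    TECHNICALLY_INADEQUATE = "technically inadequate"


def next_diagnostic_step(result, high_likelihood_of_success=False,
                         patient_prefers_repeat=False):
    """Suggest the next step after an initial HSAT (simplified sketch)."""
    if result is HsatResult.POSITIVE:
        return "OSA management pathway"
    if result is HsatResult.TECHNICALLY_INADEQUATE:
        # Assumed interpretation: the exception requires BOTH conditions
        # and applies only to technically inadequate studies.
        if high_likelihood_of_success and patient_prefers_repeat:
            return "repeat HSAT"
        return "PSG"
    # Negative or inconclusive: PSG rather than a second HSAT.
    return "PSG"


print(next_diagnostic_step(HsatResult.NEGATIVE))
print(next_diagnostic_step(HsatResult.TECHNICALLY_INADEQUATE, True, True))
```

Requiring both conditions in a single `and` mirrors the text's wording that both must be present; either one alone still leads to PSG.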
The TF recognizes that HSAT may have value to patients in some contexts beyond what is covered by these recommendations, but has limited the recommendations to apply to situations where there is sufficient evidence to guide evaluation of benefits versus harms.
PATIENTS' VALUES AND PREFERENCES: Individual patient preference for PSG or HSAT will differ depending on circumstances and values. In one of the four RCTs most generalizable to clinical practice, both HSAT and PSG were performed for each patient, and 76% preferred HSAT.110 Conversely, a substantial minority (24%) preferred PSG. Unfortunately, there are insufficient data about diagnostic testing preferences in clinical practice, where preferences may differ from those seen in the RCT setting. The availability of different options for diagnosis may increase satisfaction, if patient preferences are included in the process of choosing the diagnostic test type. If HSAT is used, the TF determined that patients would value accurate diagnosis, good clinical outcomes, and increased convenience. Based on their clinical judgment, the TF also determined that patients would prefer not having a repeat HSAT if the initial test result is negative, as repeated HSAT would be less likely to produce a definitive result and would unnecessarily inconvenience the patient. In this situation, proceeding directly to PSG, which has greater sensitivity to detect OSA, would be preferred by most patients. The TF also determined that most patients would prefer not to have a repeat HSAT if the initial test was technically inadequate, to avoid inconvenience, but that some patients may desire this option in specific cases in which there was a high likelihood of an adequate result with repeat testing.
SPECIAL CONSIDERATIONS: The following sections describe special considerations when using HSAT for the diagnosis of OSA. They provide additional support for, and explanation of, the Remarks, and are based on specifications used by studies that support the recommendation statements.
CLINICAL POPULATION: A review of RCTs that met inclusion criteria indicated that the following criteria should be used to establish the presence of increased risk of moderate to severe OSA and to determine if HSAT use is reasonable: excessive daytime sleepiness occurring on most days, AND the presence of at least two of the following three criteria: habitual loud snoring; witnessed apnea or gasping or choking; or diagnosed hypertension. Among the four RCTs most generalizable to clinical practice, two studies83,84 required ESS > 12 as an entry criterion; one110 required at least two of three criteria (i.e., sleepiness (ESS > 10), witnessed apnea, snoring) for participation; and one, which was performed in a Veterans Administration population, did not specify any entry criteria besides suspected OSA (though the average ESS for participants was elevated at > 12 and 95% were men).83 In the latter study, 9.9% of individuals in the PSG arm were found to have AHI < 5.83 In addition to sleepiness, at least two studies in this subset had specific inclusion criteria such as snoring, witnessed apnea, gasping or choking at night, or hypertension.83,85 One study incorporated neck circumference in the determination of high risk of OSA.84
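The increased-risk definition above is a simple Boolean rule: sleepiness on most days is required, plus at least two of three additional findings. The Python sketch below encodes it; the `Presentation` fields and the function name are hypothetical, and the sketch assumes each finding has already been assessed clinically as present or absent.

```python
from dataclasses import dataclass


@dataclass
class Presentation:
    """Hypothetical record of the findings named in the criteria above."""
    daytime_sleepiness_most_days: bool
    habitual_loud_snoring: bool
    witnessed_apnea_or_gasping_or_choking: bool
    diagnosed_hypertension: bool


def increased_risk_of_moderate_to_severe_osa(p: Presentation) -> bool:
    """Sleepiness on most days AND at least two of the three other findings."""
    supporting = sum([
        p.habitual_loud_snoring,
        p.witnessed_apnea_or_gasping_or_choking,
        p.diagnosed_hypertension,
    ])
    return p.daytime_sleepiness_most_days and supporting >= 2


print(increased_risk_of_moderate_to_severe_osa(
    Presentation(True, True, False, True)))
```

Summing Booleans counts how many supporting findings are present, which keeps the "two of three" condition explicit and easy to audit.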
EXCLUDED PATIENT POPULATIONS: Three of the four RCTs most generalizable to clinical practice excluded patients with significant cardiopulmonary disease and other significant sleep disorders.83,84,110 Two studies excluded patients who were taking opioids or had an uncontrolled psychiatric disorder, neuromuscular disease, or significant safety concerns related to driving or work. Other notable exclusion criteria, specified by at least one of the studies, included lack of an appropriate living situation, pregnancy, and alcohol abuse. The single study that did not mention exclusion criteria noted that 3 of 148 individuals in the HSAT arm were diagnosed with CSA and 4 of 148 individuals required supplemental oxygen or bi-level PAP and exited the study.85 In the PSG arm of the study, 6 of 148 individuals were diagnosed with CSA and 12 of 148 required supplemental oxygen or bi-level PAP. Studies outside the four RCTs most generalizable to clinical practice had similar inclusion/exclusion criteria.
Therefore, based on information from three of the four RCTs most generalizable to clinical practice that specified exclusion criteria, and for the reasons discussed above in Resource Use, Benefits and Harms, and Patients' Values and Preferences sections, the TF determined that HSAT should be used in an uncomplicated clinical population. This is defined as the absence of significant cardiopulmonary disease (e.g., heart failure, chronic obstructive pulmonary disease [COPD]), potential respiratory muscle weakness due to neuromuscular conditions, chronic opiate medication use, history of stroke, concern for a significant sleep disorder other than OSA (e.g., CSA, parasomnia, narcolepsy, severe insomnia), and environmental or personal factors that preclude the adequate acquisition and interpretation of data from HSAT.83,84,110
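The definition of an uncomplicated clinical population is a list of exclusions, any one of which makes HSAT inappropriate under these recommendations. The Python sketch below expresses it as a checklist; the field names are hypothetical shorthand for the conditions listed above, and each flag is assumed to reflect a clinician's judgment rather than a computed value.

```python
from dataclasses import dataclass, astuple


@dataclass
class ExclusionFactors:
    """Hypothetical flags for the exclusions listed above (True = present)."""
    significant_cardiopulmonary_disease: bool = False  # e.g., heart failure, COPD
    possible_respiratory_muscle_weakness: bool = False  # neuromuscular conditions
    chronic_opiate_use: bool = False
    history_of_stroke: bool = False
    suspected_other_sleep_disorder: bool = False  # e.g., CSA, parasomnia, narcolepsy, severe insomnia
    factors_precluding_adequate_hsat: bool = False  # environmental or personal factors


def is_uncomplicated(factors: ExclusionFactors) -> bool:
    """True only when none of the exclusion factors is present."""
    return not any(astuple(factors))


print(is_uncomplicated(ExclusionFactors()))
print(is_uncomplicated(ExclusionFactors(chronic_opiate_use=True)))
```

Defaulting every flag to `False` means a new record must have each relevant exclusion explicitly set; in practice, an unassessed factor should be treated as unknown rather than absent.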
FOLLOW-UP: Based on information from the four RCTs most generalizable to clinical practice,83–85,110 the TF determined that HSAT should be used in the context of an OSA management pathway that incorporates a PAP therapy initiation protocol for APAP or PSG titration, early follow-up after initiation of therapy, and PSG titration studies for patients failing APAP therapy. All RCTs incorporated early follow-up of APAP titration (within 2–7 days after HSAT) by skilled technical staff.83–85,110 As described above, the recommendation for using HSAT to diagnose OSA is based on clinically significant improvements in clinical outcomes. Therefore, the TF determined that HSAT should be used in the context of an OSA management pathway that incorporates a PAP therapy initiation protocol and early follow-up after initiation of therapy.
CLINICAL EXPERTISE: All four RCTs most generalizable to clinical practice administered HSAT at academic or tertiary sleep centers with highly skilled sleep medicine providers and technical staff.83–85,110 HSAT recordings were reviewed by a sleep medicine specialist. One RCT that was not included in this subset (because overnight oximetry was used as an entry criterion) used a simplified nurse-led model of care involving nurse specialists experienced in management of sleep disorders (mean of 8.3 years of experience with CPAP therapy). Therefore, the TF determined that HSAT should be administered by an accredited sleep center under the supervision of a board-certified sleep medicine physician or a physician who has completed a sleep fellowship but is awaiting the next opportunity to take the board examination.
HOME SLEEP APNEA TESTING DEVICE: Among the four RCTs that were most generalizable to clinical practice, three used conventional Type 3 devices (nasal pressure, thoracic and abdominal excursion using RIP technology, oxygen saturation, EKG, body position, and oral thermistor in some cases),84,85,110 and one used a 4-channel device83 based on PAT with three additional channels (heart rate, pulse oximetry, and actigraphy). The TF determined that testing should be performed using these types of HSAT devices that have been demonstrated to be technically adequate. Additional guidance on technical specifications regarding HSAT is provided in The AASM Manual for the Scoring of Sleep and Associated Events.24
RECORDING TIME: In the four RCTs most generalizable to clinical practice, the minimum requirement for an acceptable study was 4 hours of adequate flow and oximetry signals.83–85,110 One HSAT study83 used PAT as a surrogate for flow, two studies recorded nasal pressure flow,85,110 and one study recorded thermistor airflow in addition to nasal pressure flow.84 The latter three studies also recorded thoracic and abdominal movements.84,85,110 All three of these studies showed that adherence to PAP therapy and functional improvement in the home management pathway were at least equivalent to those in the in-laboratory pathway.84,85,110 Therefore, the TF determined that a protocol requirement of a minimum of 4 hours of good quality data from HSAT recording, during the habitual sleep period, is warranted to diagnose OSA.
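A recording-time requirement like this is typically applied by counting the time during which the required signals are simultaneously usable. The Python sketch below assumes per-epoch quality flags for flow and oximetry, a 30-second epoch length, and that "adequate" means both signals are valid in the same epoch, with epochs counted whether or not they are contiguous. These are illustrative assumptions, not the scoring rules used in the cited trials, and the sketch does not check that the data fall within the habitual sleep period.

```python
EPOCH_SECONDS = 30   # assumed epoch length
MIN_HOURS = 4.0      # minimum adequate recording time from the text above


def adequate_hours(flow_ok, oximetry_ok, epoch_seconds=EPOCH_SECONDS):
    """Hours in which BOTH flow and oximetry are flagged as adequate."""
    if len(flow_ok) != len(oximetry_ok):
        raise ValueError("signal quality flags must cover the same epochs")
    both = sum(1 for f, o in zip(flow_ok, oximetry_ok) if f and o)
    return both * epoch_seconds / 3600.0


def meets_minimum(flow_ok, oximetry_ok, min_hours=MIN_HOURS,
                  epoch_seconds=EPOCH_SECONDS):
    """True if the concurrent adequate signal time reaches min_hours."""
    return adequate_hours(flow_ok, oximetry_ok, epoch_seconds) >= min_hours


# Hypothetical 8-hour night (960 epochs) in which flow is lost for the first 500 epochs.
flow = [False] * 500 + [True] * 460
oximetry = [True] * 960
print(f"{adequate_hours(flow, oximetry):.2f} h adequate; meets minimum: {meets_minimum(flow, oximetry)}")
```

In this hypothetical night, a full 8-hour recording yields only about 3.8 hours of concurrently adequate flow and oximetry, so it would not meet the 4-hour requirement.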
Additionally, nine non-RCT validation studies reported minimum requirements for duration of acceptable signal quality.35,53,54,81,86,88,93,96,116 The required signals and minimum durations included nasal pressure flow and oximetry for at least 3 hours88,93,116 or 4 hours53,81,86,96 and single-channel nasal airflow recording for a minimum of 3 hours35 or only 2 hours.54 The diagnostic accuracy of the cardiorespiratory devices compared against PSG for the detection of OSA at different AHI cutoff points was relatively high. One study reported a sensitivity and specificity of 0.88 and 0.84, respectively, for a HSAT AHI cutoff point of ≥ 9 events/h.53 In a separate study, the sensitivity and specificity for unattended in-home PSG were 0.91 and 0.89, respectively, for an AHI cutoff of > 10 events/h, but 0.88 and 0.55, respectively, for an AHI cutoff of > 5 events/h.86 In another study, at an AHI cutoff of > 10 events/h, HSAT had a sensitivity of 0.87 and a specificity of 0.86.88
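Likelihood ratios summarize how much a positive or negative result shifts the odds of OSA, and they make the effect of the cutoff choice easier to compare across the studies above. The Python sketch below computes them from the reported sensitivity and specificity pairs; the dictionary labels are shorthand added for illustration.

```python
def likelihood_ratios(sensitivity, specificity):
    """Return (LR+, LR-) for a dichotomous test."""
    if not (0.0 <= sensitivity <= 1.0 and 0.0 < specificity < 1.0):
        raise ValueError("need 0 <= sensitivity <= 1 and 0 < specificity < 1")
    lr_positive = sensitivity / (1.0 - specificity)
    lr_negative = (1.0 - sensitivity) / specificity
    return lr_positive, lr_negative


# Sensitivity/specificity pairs reported in the text above.
reported = {
    "HSAT, AHI >= 9 (ref 53)": (0.88, 0.84),
    "unattended PSG, AHI > 10 (ref 86)": (0.91, 0.89),
    "unattended PSG, AHI > 5 (ref 86)": (0.88, 0.55),
    "HSAT, AHI > 10 (ref 88)": (0.87, 0.86),
}
for label, (se, sp) in reported.items():
    lr_pos, lr_neg = likelihood_ratios(se, sp)
    print(f"{label}: LR+ = {lr_pos:.2f}, LR- = {lr_neg:.2f}")
```

For ref 86, lowering the cutoff from > 10 to > 5 events/h reduces LR+ from about 8.3 to about 2.0, which quantifies how much the drop in specificity weakens a positive result.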
Overall, the body of evidence investigating the minimum number of hours of adequate data on HSAT required to accurately diagnose OSA is very limited. There are no data to suggest that fewer than 4 hours of technically adequate recording compromises the accuracy of test results, and there is no direct evidence on the impact of a minimum number of recording hours of HSAT on clinical outcomes. Based on available indirect evidence, the TF weighed the “risk” of undergoing less than the required duration of good quality HSAT with resultant false negative (or false positive) results, against the “benefit” of potentially increasing the accuracy by performing PSG. Performing PSG in the scenario of a “positive” diagnosis of OSA is less likely to alter clinical decision-making and may, in fact, result in unnecessary delays in care with increased cost. Conversely, a “negative” HSAT, in the scenario of a high pretest probability of OSA, will justify PSG even when the test is of adequate quality and duration. The TF believes that the goals of establishing an accurate diagnosis, while minimizing patient inconvenience and cost, align with patient preferences.
NIGHTS OF RECORDING TIME: The adequacy of a single night HSAT performed for the diagnosis of OSA in the context of an appropriate clinical population and management pathway is supported by published evidence. Our literature review identified only two studies relevant to the question of whether multiple nights of recording are superior to a single night.35,73 These studies compared three nights of recording with a single-channel HSAT device (i.e., nasal pressure transducer or oximetry) against the first night of recording. Using PSG as the reference, the studies found that recording over three consecutive nights may decrease the probability of insufficient data and marginally improve accuracy when compared against a single night of recording. However, the TF considered this evidence insufficient to establish the superiority of a multiple-night HSAT protocol over a single-night HSAT protocol, as the studies only included single-channel recordings and did not evaluate clinically meaningful outcomes or efficiency of care.
A single HSAT recording encompassing multiple nights may have potential advantages or drawbacks relative to only a single night of recording. For example, if multiple-night HSAT improved accuracy or resulted in fewer inconclusive or inadequate studies, patient outcomes or costs might improve. On the other hand, the potential for multiple-night recordings to increase cost and patient inconvenience must be considered. Insufficient evidence exists to support routine performance of more than a single night's recording for HSAT.