This study evaluated a novel artificial neural network (ANN) based sleep-disordered breathing (SDB) screening tool that combines nocturnal pulse oximetry with demographic, anatomic, and clinical data. The tool supports 6 thresholds of the apnea-hypopnea index (AHI) with a 4% oxyhemoglobin desaturation criterion: ≥ 5, 10, 15, 20, 25, and 30 events/h.
Using a general population dataset, the training set included 2,280 subjects and the test set included 570 subjects. The input to the tool was a set of 22 variables, and the tool comprised six neural network models, one for each AHI threshold. Several metrics were used to evaluate the performance of the tool: area under the receiver operating characteristic curve (AUC), sensitivity, specificity, positive predictive value, negative predictive value, and 95% confidence interval (CI).
The AUC was 0.904, 0.912, 0.913, 0.926, 0.930, and 0.954 for the models at the AHI ≥ 5, 10, 15, 20, 25, and 30 events/h thresholds, respectively. The sensitivities of all neural network models were higher than 95%. The AHI ≥ 30 events/h model had the maximum sensitivity: 98.31% (95% CI: 95.01%–100%).
The results of this study suggest that the ANN-based SDB screening tool can be used to identify the presence or absence of SDB. Future validation should be performed in other populations to determine the practicability of this screening tool in sleep clinics and other at-risk populations.
Li A, Quan SF, Silva GE, Perfect MM, Roveda JM. A novel artificial neural network based sleep-disordered breathing screening tool. J Clin Sleep Med. 2018;14(6):1063–1069.
Sleep-disordered breathing (SDB) is a potentially remediable risk factor for hypertension, diabetes, stroke, coronary artery disease, and heart failure.1 In one study of a heart failure population, the prevalence of SDB was 76%.2 Various SDB screening tools based on questionnaires and anthropometric data, such as the Berlin,3 STOP,4 STOP-BANG,4 NoSAS,5 and 4-Variable6 instruments, have been developed over the past 20 years. However, the accuracy of these existing screening tools in diagnosing SDB is relatively low.7 Therefore, confirmation requires a diagnostic study, either an overnight laboratory-based polysomnogram (PSG) or a home sleep apnea test. An overnight PSG is expensive, complex, and inconvenient. Although a home sleep apnea test is less expensive,8 false-negative studies can occur and SDB severity tends to be underestimated. Consequently, many patients who potentially have SDB never receive a diagnosis. One approach to address this deficiency would be a more accurate, convenient method of facilitating SDB screening.
Artificial neural networks (ANN) are increasingly being used in biomedical fields to aid in tasks such as the classification of biologic specimens, the prediction of pharmacokinetics of drugs, and the diagnosis and prognosis of diseases.9 For example, they have been used in cardiology to predict the presence of coronary artery and congenital heart disease,10,11 and in pulmonology to classify pulmonary nodules.12 Some attempts have been made to use an ANN in the diagnosis of SDB based solely on anthropometric, demographic, and historical clinical data with modest success. In this paper, we propose the inclusion of nocturnal physiologic data to potentially enhance accuracy. Pulse oximetry is a physiologic signal that is widely available. It has been considered as a screening tool for SDB, but is insufficient by itself to confirm a diagnosis of sleep apnea with an accuracy for an apnea-hypopnea index (AHI) > 15 events/h of 86% and 80% in a high- and low-risk population, respectively.13 We hypothesized that use of an ANN in combination with pulse oximetry would result in a more accurate screening tool for SDB. Therefore, in this study, we developed and tested a novel ANN-based SDB screening tool using a large general population database, the Sleep Heart Health Study (SHHS).
The SHHS database was used to develop the neural network–based screening tool.14–16 The SHHS is an ideal resource for this purpose because of its large database of 6,441 subjects with PSG results and associated anthropometric and medical history data. A complete description of the SHHS has been previously published.14–16 Only the baseline examination cycle between November 1, 1995 and January 31, 1998 was used as the study dataset. Six hundred two American Indian subjects were excluded because consent was withdrawn. The publicly accessible database included 1,280 variables and 5,804 subjects: 2,765 males (47.6%) and 3,039 females (52.4%). We manually selected 22 SDB-related variables from the 1,280 variables as candidate variables for the screening tool. A total of 1,866 subjects were missing responses for some of the 22 SDB-related variables in the baseline examination cycle. Additionally, 879 subjects responded "do not know" to the frequency of snoring question; these subjects were therefore removed from the final dataset. We also removed subjects who had poor pulse oximeter signal quality or a short PSG duration (< 5 hours). The resulting dataset thus included 2,850 subjects. Shown in Table 1 are the demographics and other relevant variables of the resulting dataset. Body mass index was computed as weight in kilograms divided by height in meters squared. Neck circumference (cm) was measured at the medial point. Frequency of snoring was categorized as not snoring, 1 night/wk, 1 or 2 nights/wk, 3 to 5 nights/wk, and 6 or 7 nights/wk. Chance of falling asleep while in a car, while sitting inactive in a public place, and while sitting and talking was categorized as no chance, slight chance, moderate chance, and high chance.
Demographics and related variables of the Sleep Heart Health Study Dataset.
The resulting dataset was further randomly separated into two datasets: a training set (80% of cases) and a test set (20% of cases). Table 2 shows the number of subjects classified as having SDB (positive) or not having SDB (negative) in the training set and the test set, and their respective prevalence at each AHI threshold. The mean AHI of the training set was 11.0 ± 13.7 events/h; the mean AHI of the test set was 12.3 ± 15.8 events/h. Table 3 compares the characteristics of the input variables in the training and test sets. There were no significant differences between the two sets in regard to the characteristics of the variables.
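For illustration, the 80/20 random split described above can be sketched with scikit-learn; the data here are synthetic stand-ins for the SHHS variables (access to the actual dataset requires a data use agreement), and `train_test_split` is an assumed implementation detail, not one stated in the paper:

```python
# Illustrative sketch of the 80/20 random train/test split.
# The 2,850 x 22 matrix below is synthetic; it only mimics the shape
# of the final SHHS-derived dataset described in the text.
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.RandomState(0)
X = rng.rand(2850, 22)                   # 22 candidate variables per subject
ahi = rng.uniform(0, 60, size=2850)      # synthetic AHI values (events/h)

X_train, X_test, ahi_train, ahi_test = train_test_split(
    X, ahi, test_size=0.20, random_state=42)

print(len(X_train), len(X_test))         # 2280 570
```

With 2,850 subjects, an exact 80/20 split yields 2,280 training and 570 test cases, matching the proportions reported in the text.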
Prevalence and number of SDB positive and negative subjects.
Key variables in the training set and test set.
Development of the Models
We classified subjects as having SDB (positive = 1) or not having SDB (negative = 0) at each of six AHI thresholds (with 4% oxyhemoglobin desaturation) as ground truth. Our team empirically selected 22 candidate variables from the 1,280 available variables as features, based on their known association with SDB. We developed six neural network models corresponding to the six AHI thresholds: AHI ≥ 5, 10, 15, 20, 25, and 30 events/h. We normalized the features to the range [0, 1] with a min-max normalization strategy, and used the extremely randomized trees algorithm to select the input features of the neural network models.17,18 The extremely randomized trees algorithm is a tree-based ensemble method that builds 10 randomized trees.18 Features used near the top of the trees have higher importance weights,17 and the importance weight was used as the selection criterion. For more details about normalization and feature selection, see the supplemental material.
A neural network is a category of mathematical model with optimizable parameters. The multilayer perceptron (MLP), a specific type of neural network (Figure 1), was used in this study. We used the training set to train each of the six MLP neural network models, each corresponding to one of the six AHI thresholds. The backpropagation algorithm in conjunction with the limited-memory Broyden-Fletcher-Goldfarb-Shanno (L-BFGS) optimization algorithm was used to train the models.19 Training is an optimization process: it adjusts the parameters of a neural network model to reduce the output error. The output of each model was a value between 0 and 1, and the output error was computed by comparing this output with the ground truth. The training process was repeated over all subjects in the training set for several iterations. After sufficient training, the neural network model learned to compute the output accurately (see supplemental material for more details).
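The per-threshold training step can be sketched as below. scikit-learn's `MLPClassifier` with `solver="lbfgs"` pairs backpropagated gradients with L-BFGS as described, but the hidden-layer size and the synthetic data are assumptions; the actual topology is specified in the paper's supplemental material:

```python
# Sketch of one of the six MLP models: binary SDB classification at a
# single AHI threshold, trained with backpropagation + L-BFGS.
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.RandomState(0)
X_train = rng.rand(400, 22)                         # 22 normalized features
y_train = (X_train.sum(axis=1) > 11).astype(int)    # synthetic threshold label

mlp = MLPClassifier(hidden_layer_sizes=(16,),       # assumed topology
                    solver="lbfgs", max_iter=500, random_state=0)
mlp.fit(X_train, y_train)

# The model output is a value between 0 and 1, compared against ground truth.
prob = mlp.predict_proba(X_train[:1])[0, 1]
```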
Multilayer perceptron neural network.
Evaluation and Statistical Analysis
The performance of the screening tool was evaluated by using the test set to estimate the tool's area under the receiver operating characteristic curve (AUC), sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), and 95% confidence interval (CI). The receiver operating characteristic curve plots the true-positive rate (sensitivity) against the false-positive rate (1 − specificity). The CIs were calculated with the normal approximation interval formula, CI = p ± z√(p(1 − p)/n) (equation 1), where p is the observed proportion, n is the sample size, and z is the z value for a 95% CI.
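The normal approximation interval can be written as a small helper. The function below is an illustration of equation 1, and the n = 59 in the example is an inferred, not reported, count of positive test subjects at the AHI ≥ 30 events/h threshold:

```python
# Normal-approximation (Wald) 95% CI for a proportion, clipped to [0, 1].
import math

def normal_approx_ci(p, n, z=1.96):
    """CI = p +/- z * sqrt(p * (1 - p) / n), bounded to [0, 1]."""
    half_width = z * math.sqrt(p * (1 - p) / n)
    return max(0.0, p - half_width), min(1.0, p + half_width)

# Example: sensitivity 98.31% with an assumed n = 59 positive test subjects
lo, hi = normal_approx_ci(0.9831, 59)   # approximately (0.950, 1.0)
```

With these inputs the helper reproduces a CI very close to the 95.01%–100% reported for the maximum-sensitivity model, which is why n = 59 was chosen for the illustration.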
Evaluation of the screening tool is displayed in Table 4 and Figure 2. The AUC was 0.904, 0.912, 0.913, 0.926, 0.930, and 0.954 when SDB was defined at the AHI thresholds ≥ 5, 10, 15, 20, 25, and 30 events/h, respectively. The AHI ≥ 30 events/h threshold model had the highest AUC, although there were no large differences among the AUCs of the models. As shown in Table 4, we selected operating points with high AUC and sensitivity for each model. The maximum sensitivity was 98.31% when the threshold was AHI ≥ 30 events/h; the minimum sensitivity was 95.12% when the threshold was AHI ≥ 5 events/h. The specificities of all MLP neural network models were greater than 60%. The minimum specificity was 62.81% when the threshold was AHI ≥ 5 events/h; the maximum specificity was 72.0% when the threshold was AHI ≥ 20 events/h. The AHI ≥ 30 events/h model had a 99.73% NPV based on an 8.52% prevalence of positive subjects in the dataset. The AHI ≥ 5 events/h model had a 77.61% PPV based on a 55.93% prevalence of positive subjects.
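Given a model's predicted scores on the test set, the reported metrics can be computed as below; the labels, scores, and 0.5 operating point are synthetic illustrations (the study's test-set outputs are not public), and scikit-learn is an assumed, not confirmed, implementation choice:

```python
# Deriving sensitivity, specificity, PPV, NPV, and AUC from model scores.
import numpy as np
from sklearn.metrics import confusion_matrix, roc_auc_score

y_true  = np.array([1, 1, 1, 0, 0, 0, 0, 1])                    # ground truth
y_score = np.array([0.9, 0.8, 0.7, 0.4, 0.2, 0.75, 0.1, 0.65])  # model outputs
y_pred  = (y_score >= 0.5).astype(int)    # threshold at the operating point

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
sensitivity = tp / (tp + fn)              # true-positive rate
specificity = tn / (tn + fp)              # 1 - false-positive rate
ppv = tp / (tp + fp)                      # positive predictive value
npv = tn / (tn + fn)                      # negative predictive value
auc = roc_auc_score(y_true, y_score)      # threshold-independent
```

Note that the AUC is computed from the raw scores, while the other four metrics depend on the chosen operating point, which is why Table 4 reports them per selected operating point.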
ROC curves of each AHI threshold model.
AHI = apnea-hypopnea index, AUC = area under the receiver operating characteristic curve, ROC = receiver operating characteristic.
In this study, we used ANN modeling of demographic, anthropometric, clinical, and pulse oximetry data to develop a tool that can be used for screening individuals for the presence of SDB. We found that at commonly used thresholds of AHI, the sensitivity, negative predictive value and AUC were greater than 90%. This suggests that addition of pulse oximetry in an MLP neural network model can be a useful screening tool for SDB in a general population.
In our study, we found 90% to 99% sensitivity, NPV, and AUC for AHI thresholds ranging from 5 to 30 events/h using our MLP neural network models. These results exceed those published for other commonly used screening instruments when tested in the same SHHS dataset and in comparison with other studies. The STOP-BANG questionnaire consists of 8 yes/no items.4 It has been tested in the SHHS dataset and has 87.0% sensitivity and 43.4% specificity for subjects with moderate to severe disease (15 ≤ AHI ≤ 30 events/h), and 70.4% sensitivity and 59.5% specificity for subjects with severe disease (AHI ≥ 30 events/h).7 The 4-Variable tool includes age, blood pressure, body mass index, and snoring as input data.6 It has 24.7% sensitivity and 93.2% specificity for patients with moderate to severe disease, and 41.5% sensitivity and 93.2% specificity for patients with severe disease when tested in the SHHS dataset.7 The Berlin questionnaire is another commonly used instrument. In a recent review, it was reported to have 69% to 93% sensitivity and 19% to 54% specificity using an AHI threshold of 30 events/h with a 4% oxygen desaturation requirement.20 As with the STOP-BANG and 4-Variable tools, these validation statistics indicate that a number of patients will be misclassified. From a clinical perspective, patients screened as positive using these instruments will still need a confirmatory PSG or home sleep apnea test, and patients deemed to be at high risk who screened as negative will also need further testing. Furthermore, in most validation studies, the test dataset consists of patients recruited from sleep clinics or those with a high suspicion of SDB, and results may not be applicable to a more diverse population.
There have been other efforts to apply ANN modeling of clinical data to predict the presence of SDB. El-Solh et al.21 developed a neural network model using 12 clinical input variables to predict AHI values. In the 80 subjects used to test their predictive model, they found comparable AUC at AHI thresholds of 10, 15, and 20 events/h.21 Kirby et al.22 introduced a generalized regression neural network model to predict AHI values. There were 150 subjects used to test their generalized regression neural network model, which had 23 input variables. This model achieved a high sensitivity of 98.9% when AHI ≥ 10 events/h was used to define obstructive sleep apnea (OSA).22 Teferra et al.23 used 9 input variables in an ANN model that had only 74% sensitivity and 78% specificity to predict SDB at an AHI threshold of 15 events/h. In a recent study by Karamanli et al.,24 an ANN model was developed using 4 input variables and correctly classified 86.6% of subjects. However, all previous ANN research efforts used clinical population datasets with relatively small numbers of subjects in the development of ANN models. The largest test dataset was 150 subjects,22 and therefore may not have had enough subjects to adequately validate the neural network models. Furthermore, unlike our study, not all commonly employed thresholds of AHI were evaluated.
In this study, our novel ANN based screening tool was developed and tested using a general population dataset. The AUC of all MLP neural network models was over 0.9. The sensitivities of all MLP neural network models were over 95%. The test results validate that the screening tool has high performance and its high NPV of 97.61% at an AHI threshold of 15 events/h indicates that it can be used in the general population to exclude the presence of moderate to severe OSA. This is clinically relevant because a recent comprehensive review concludes that it is unclear whether mild OSA is associated with an increase in cardiovascular or cerebrovascular events.25
Our study is not the first one to incorporate physiologic data in an ANN model to predict the presence of SDB. In a study by Lweesy et al.,26 features of the electrocardiogram were used with more than 90% accuracy in classifying a small number of subjects with symptoms of OSA. Although it is possible to record ambulatory electrocardiogram signals, correct placement of the leads is important and prone to error; thus, use of pulse oximetry may be easier for the layperson. Nevertheless, addition or substitution of other physiologic signals in our ANN model could produce better or comparable results.
This study does have some limitations. First, although the SHHS database is derived from the general population, it is oversampled with snorers and is limited to subjects older than 40 years.14 Second, in clinical practice, some patients may not identify their own sleep problems in questionnaires. Therefore, data from the SHHS population may not have the same predictive accuracy of responses from patients regarding sleep problems. Third, some patients deny sleep problems and will not voluntarily report them. Both the second and third limitations may reduce the performance of the screening tool in clinical use. Fourth, despite the large number of subjects in SHHS, there were a relatively small number of subjects with high values of AHI. This reduces the reliability of the evaluation results in the high threshold models. Finally, subjects with missing data were excluded from the analysis. We believe that this was nondifferential, and thus did not bias the results.
Despite these limitations, our study has important strengths. We used a large database of well-characterized subjects. It is one of the first studies to incorporate easy-to-implement physiologic monitoring to ANN modeling to predict the presence of SDB. Thus, it has the potential to be implemented in primary care physicians' offices to screen populations at high risk for SDB such as those with obesity, snoring, diabetes, and heart failure, and thus decrease the need for referral to a sleep physician. Used by sleep physicians, results from the tool may be sufficient in some patients to determine whether or not a patient has SDB. Thus, the tool could result in a decrease in health care costs by reducing the need for both PSG and home sleep apnea tests.
In summary, we have developed ANN models that incorporate clinical, anatomic, and pulse oximetry input data to accurately screen for the presence or absence of SDB. This tool may have utility in identifying patients with SDB. Future studies should be done in other populations to determine the feasibility of applying this screening tool in clinics and other at-risk populations.
Work for this study was performed at the University of Arizona. All authors have seen and approved the manuscript. Ao Li, Stuart F. Quan, and Janet M. Roveda declare they filed a patent for the artificial neural network based sleep-disordered breathing screening tool. All other authors report no conflicts of interest.
The Emerging Technologies section focuses on new tools and techniques of potential utility in the diagnosis and management of any and all sleep disorders. The technologies may not yet be marketed, and indeed may only exist in prototype form. Some preliminary evidence of efficacy must be available, which can consist of small pilot studies or even data from animal studies, but definitive evidence of efficacy will not be required, and the submissions will be reviewed according to this standard. The intent is to alert readers of Journal of Clinical Sleep Medicine of promising technology that is in early stages of development. With this information, the reader may wish to (1) contact the author(s) in order to offer assistance in more definitive studies of the technology; (2) use the ideas underlying the technology to develop novel approaches of their own (with due respect for any patent issues); and (3) focus on subsequent publications involving the technology in order to determine when and if it is suitable for application to their own clinical practice. The Journal of Clinical Sleep Medicine and the American Academy of Sleep Medicine expressly do not endorse or represent that any of the technology described in the Emerging Technologies section has proven efficacy or effectiveness in the treatment of human disease, nor that any required regulatory approval has been obtained.
Abbreviations
ANN = artificial neural network
AUC = area under the receiver operating characteristic curve
NPV = negative predictive value
OSA = obstructive sleep apnea
PPV = positive predictive value
ROC = receiver operating characteristic
SHHS = Sleep Heart Health Study
The database used in the study was developed using the following National Heart, Lung and Blood Institute cooperative agreements: U01HL53940 (University of Washington), U01HL53941 (Boston University), U01HL53938 (University of Arizona), U01HL53916 (University of California, Davis), U01HL53934 (University of Minnesota), U01HL53931 (New York University), U01HL53937 and U01HL64360 (Johns Hopkins University), U01HL63463 (Case Western Reserve University), and U01HL63429 (Missouri Breaks Research).
Sleep Heart Health Study (SHHS) acknowledges the Atherosclerosis Risk in Communities Study (ARIC), the Cardiovascular Health Study (CHS), the Framingham Heart Study (FHS), the Cornell/Mt. Sinai Worksite and Hypertension Studies, the Strong Heart Study (SHS), the Tucson Epidemiologic Study of Airways Obstructive Diseases (TES), and the Tucson Health and Environment Study (H&E) for allowing their cohort members to be part of the SHHS and for permitting data acquired by them to be used in the study. SHHS is particularly grateful to the members of these cohorts who agreed to participate in SHHS as well. SHHS further recognizes all of the investigators and staff who have contributed to its success. A list of SHHS investigators, staff and their participating institutions is available on the SHHS website, www.jhucct.com/shhs.
This material is based on work partially supported by the National Science Foundation under Grant No. 1433185. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.
Redline S, Quan SF. Sleep apnea: a common mechanism for the deadly triad—cardiovascular disease, diabetes, and cancer? Am J Respir Crit Care Med. 2012;186(2):123–124.
Oldenburg O, Lamp B, Faber L, Teschler H, Horstkotte D, Töpfer V. Sleep-disordered breathing in patients with symptomatic heart failure: a contemporary study of prevalence in and characteristics of 700 patients. Eur J Heart Fail. 2007;9(3):251–257.
Netzer NC, Stoohs RA, Netzer CM, Clark K, Strohl KP. Using the Berlin Questionnaire to identify patients at risk for the sleep apnea syndrome. Ann Intern Med. 1999;131(7):485–491.
Chung F, Yegneswaran B, Liao P, et al. STOP questionnaire: a tool to screen patients for obstructive sleep apnea. Anesthesiology. 2008;108(5):812–821.
Marti-Soler H, Hirotsu C, Marques-Vidal P, et al. The NoSAS score for screening of sleep-disordered breathing: a derivation and validation study. Lancet Respir Med. 2016;4(9):742–748.
Takegami M, Hayashino Y, Chin K, et al. Simple four-variable screening tool for identification of patients with sleep-disordered breathing. Sleep. 2009;32(7):939–948.
Silva GE, Vana KD, Goodwin JL, Sherrill DL, Quan SF. Identification of patients with sleep disordered breathing: comparing the four-variable screening tool, STOP, STOP-Bang, and Epworth Sleepiness Scales. J Clin Sleep Med. 2011;7(5):467–472.
Kim SH, Collop N. Cost of therapy. Sleep Med Clin. 2013;8:557–569.
Amato F, López A, Peña-Méndez EM, Vaňhara P, Hampl A, Havel J. Artificial neural networks in medical diagnosis. J Appl Biomed. 2013;11:47–58.
Li H, Luo M, Zheng J, et al. An artificial neural network prediction model of congenital heart disease based on risk factors: a hospital-based case-control study. Medicine. 2017;96(6):e6090.
Isma'eel HA, Cremer PC, Khalaf S, et al. Artificial neural network modeling enhances risk stratification and can reduce downstream testing for patients with suspected acute coronary syndromes, negative cardiac biomarkers, and normal ECGs. Int J Cardiovasc Imaging. 2015;32(4):687–696.
Li W, Cao P, Zhao D, Wang J. Pulmonary nodule classification with deep convolutional neural networks on computed tomography images. Comput Math Methods Med. 2016;2016:6215085.
Kapur VK, Auckley DH, Chowdhuri S, et al. Clinical practice guideline for diagnostic testing for adult obstructive sleep apnea: an American Academy of Sleep Medicine clinical practice guideline. J Clin Sleep Med. 2017;13(3):479–504.
Quan SF, Howard BV, Iber C, et al. The Sleep Heart Health Study: design, rationale, and methods. Sleep. 1997;20(12):1077–1085.
Redline S, Sanders MH, Lind BK, et al. Methods for obtaining and analyzing unattended polysomnography data for a multicenter study. Sleep Heart Health Research Group. Sleep. 1998;21(7):759–767.
Lind BK, Goodwin JL, Hill JG, Ali T, Redline S, Quan SF. Recruitment of healthy adults into a study of overnight sleep monitoring in the home: experience of the Sleep Heart Health Study. Sleep Breath. 2003;7(1):13–24.
Pedregosa F, Varoquaux G, Gramfort A, et al. Scikit-learn: machine learning in Python. J Mach Learn Res. 2011;12:2825–2830.
Geurts P, Ernst D, Wehenkel L. Extremely randomized trees. Mach Learn. 2006;63:3–42.
Liu DC, Nocedal J. On the limited memory BFGS method for large scale optimization. Math Program. 1989;45:503–528.
Senaratna CV, Perret JL, Matheson MC, et al. Validity of the Berlin questionnaire in detecting obstructive sleep apnea: a systematic review and meta-analysis. Sleep Med Rev. 2017;36:116–124.
El-Solh AA, Mador MJ, Ten-Brock E, Shucard DW, Abul-Khoudoud M, Grant BJ. Validity of neural network in sleep apnea. Sleep. 1999;22(1):105–111.
Kirby SD, Eng P, Danter W, et al. Neural network prediction of obstructive sleep apnea from clinical criteria. Chest. 1999;116(2):409–415.
Teferra RA, Grant BJ, Mindel JW, et al. Cost minimization using an artificial neural network sleep apnea prediction tool for sleep studies. Ann Am Thorac Soc. 2014;11(7):1064–1074.
Karamanli H, Yalcinoz T, Yalcinoz MA, Yalcinoz T. A prediction model based on artificial neural networks for the diagnosis of obstructive sleep apnea. Sleep Breath. 2016;20(2):509–514.
Chowdhuri S, Quan SF, Almeida F, et al. An official American Thoracic Society research statement: impact of mild obstructive sleep apnea in adults. Am J Respir Crit Care Med. 2016;193(9):e37–e54.
Lweesy K, Fraiwan L, Khasawneh N, Dickhaus H. New automated detection method of OSA based on artificial neural networks using P-wave shape and time changes. J Med Syst. 2011;35(4):723–734.