Volume 15 No. 08

Scientific Investigations

Tracheal Sound Analysis Using a Deep Neural Network to Detect Sleep Apnea

Hiroshi Nakano, PhD1; Tomokazu Furukawa, PhD1; Takeshi Tanigawa, PhD2
1Sleep Disorders Centre, National Hospital Organization Fukuoka National Hospital, Yakatabaru, Minami-ku, Fukuoka City, Japan; 2Department of Public Health, Graduate School of Medicine, Juntendo University, Hongo, Bunkyo-ku, Tokyo, Japan


Study Objectives:

Portable devices for home sleep apnea testing are often limited by their inability to discriminate sleep/wake status, possibly resulting in underestimations. Tracheal sound (TS), which can be visualized as a spectrogram, carries information about apnea/hypopnea and sleep/wake status. We hypothesized that image analysis of all-night TS recordings by a deep neural network (DNN) would be capable of detecting breathing events and classifying sleep/wake status. The aim of this study was to develop a DNN-based system for sleep apnea testing and to validate it using a large sample of polysomnography (PSG) data.


Methods:

PSG examinations for the evaluation of sleep-disordered breathing (SDB) were performed for 1,852 patients: 1,548 PSG records were used to develop the system, and the remaining 304 records were used for validation. TS spectrogram images were obtained every 60 seconds and labeled with the PSG scoring results (breathing event and sleep/wake status), then introduced to DNN learning. Two different DNNs were trained for breathing status and sleep/wake status, respectively.


Results:

A DNN with convolutional layers showed the best performance for discriminating breathing status. The same DNN structure was trained for sleep/wake discrimination. In the validation study, the DNN analysis was capable of discriminating the sleep/wake status with reasonable accuracy. The diagnostic sensitivity, specificity, and area under the receiver operating characteristic curve for the diagnosis of SDB at apnea-hypopnea index cutoffs of > 5, > 15, and > 30 events/h were 0.98, 0.76, and 0.99; 0.97, 0.90, and 0.99; and 0.92, 0.94, and 0.98, respectively.


Conclusions:

The developed system using the TS DNN analysis shows good performance for SDB testing.


Citation:

Nakano H, Furukawa T, Tanigawa T. Tracheal sound analysis using a deep neural network to detect sleep apnea. J Clin Sleep Med. 2019;15(8):1125–1133.


Current Knowledge/Study Rationale: Portable monitors for home sleep apnea testing can underestimate sleep apnea severity because of their inability to discriminate sleep/wake status. Tracheal sound carries information about breathing status and sleep/wake status that can be easily perceived visually when presented as a spectrogram.

Study Impact: We applied image analysis to tracheal sound spectrograms using a convolutional deep neural network (DNN). The resulting DNN-based system could detect apnea/hypopnea events and discriminate sleep/wake status with reasonable accuracy from tracheal sound alone, indicating that this method may become the basis for an innovative sleep apnea testing device.


Obstructive sleep apnea (OSA) is prevalent worldwide and is associated with significant morbidity and mortality.1,2 Various simplified methods for detecting OSA have been developed as a substitute for polysomnography (PSG), which is considered the diagnostic gold standard but is very cumbersome. These simplified methods are widely used at home (home sleep apnea test, HSAT)3 and have been officially accepted as diagnostic tools4 when administered appropriately. However, HSAT devices often have limited diagnostic ability. Most devices do not have the ability to discriminate the wake/sleep status. Therefore, they use the total recording time, instead of the total sleep time, as the denominator of the respiratory event index, often resulting in an underestimation of OSA severity.5

Recently, new artificial intelligence (AI) technologies, including deep neural networks (DNNs), are being increasingly used for medical diagnosis with successful results.6 For the diagnosis of OSA, AI analyses of clinical features,7,8 facial photographs,9 oximetry,10 and electrocardiogram data11 have been proposed. A few studies have applied machine learning to acoustic analyses for the diagnosis of OSA.12,13 We previously reported the usefulness of tracheal sound (TS) analysis for the diagnosis of OSA14 and the evaluation of snoring.15 A recent report demonstrated that snoring sound analysis can discriminate non-rapid eye movement OSA and rapid eye movement OSA.16 However, our previous method as well as the reported AI-assisted acoustic analyses for OSA have the same weakness as HSAT tools in that they cannot discriminate the sleep/wake status, nor can they classify the types (central/obstructive) of breathing events.

TS carries abundant information about the breathing status and can be easily perceived visually when presented as a spectrogram. Recently, a spectrogram image analysis of lung sounds using a DNN was proposed as a powerful method for classifying lung sounds.17 Thus, we hypothesized that DNN analyses of all-night TS spectrogram recordings would be capable of detecting breathing events accurately and would provide information about the sleep/wake status. In a pilot study (unpublished), we noticed that a DNN analysis of TS spectrogram data had such abilities. Therefore, the current study aimed to develop a new AI program using a DNN to yield diagnostic variables for OSA, including sleep/wake status and event type discrimination, from all-night TS recordings and to validate the developed method using a large sample of PSG data.


METHODS

Study Population

This study used PSG data previously obtained at the Fukuoka National Hospital. The study population comprised consecutive patients who had undergone PSG examinations for the diagnosis (including ruling out) or follow-up of sleep-disordered breathing between January 2008 and December 2016. We decided to use this dataset because during this period of time, the detection of hypopnea events was mainly based on the nasal pressure signal amplitude.18 PSG data from patients younger than 20 years, those undergoing continuous positive airway pressure (CPAP) therapy during PSG testing, those with a tracheostomy, and those receiving oxygen therapy were excluded. A total of 2,042 PSG examinations were reviewed; 139 cases in which TS recordings were not adequately acquired were subsequently excluded. In addition, 36 cases were excluded because of other signal failures that made sleep stage or respiratory event scoring impossible. Data obtained between 2008 and 2013 (n = 1,548) were used to develop the neural network for the detection of apnea/hypopnea events and the sleep/wake status. The remaining data obtained between 2014 and 2016 were used to validate the neural network (n = 304). Patients whose data were used in the development group (n = 15) were excluded from the validation group. This retrospective study was approved by the Institutional Review Board (Approval No. F30-10); patient consent was waived under the condition that an opportunity to opt out had been provided. Information about the study and the means of opting out was provided on the hospital’s homepage.


PSG data were recorded using a polygraph system (EEG7414, EEG1200, or EEG1524; Nihon Kohden, Kobe, Japan). Electroencephalography (C3-A2, O2-A1), bilateral electrooculography, submental electromyography, electrocardiography, and bilateral anterior tibial electromyography were recorded. Oronasal airflow was monitored using a thermocouple sensor and a nasal prong pressure transducer (PTAF; Pro-Tec, Mukilteo, Washington, USA). Thoracic and abdominal respiratory movements were monitored using respiratory inductive plethysmography (RIP; Inductotrace; Ambulatory Monitoring Inc., Brown Deer, Wisconsin, USA). Oxyhemoglobin saturation was monitored using pulse oximetry (OLV-3100; Nihon Kohden, Kobe, Japan). TS was recorded from an air-coupled microphone (RP-VC3, Panasonic; MC-TP2, Sharp; AT9904, Audio-Technica; or ECM-PC60, Sony, depending on the time period; all devices were made in Japan) attached to the neck over the trachea to quantify snoring.

TS was analyzed using a personal computer (PC) system that had been developed to detect snoring.15 In brief, TS was digitized using a sound system incorporated in the PC at a sampling frequency of 11,025 Hz with 16-bit resolution, from which power spectra of 1,024-point data segments (processed with a Hanning window) were calculated using a fast Fourier transform. The power spectra were log-transformed and stored as decibel values on the PC every 0.2 seconds. Although this procedure discards about half of the data, it seems sufficient to obtain a compressed sound spectrogram image of breathing events.
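The spectrogram computation described above can be sketched as follows. This is an illustrative reconstruction, not the authors' code: it assumes one 1,024-point Hanning-windowed FFT per 0.2-second step, which reproduces the stated geometry (300 columns per minute, about 10.77 Hz per frequency bin). All function and variable names are our own.

```python
import numpy as np

def tracheal_spectrogram(signal, fs=11025, nfft=1024, step_s=0.2):
    """Log-power spectra every step_s seconds from a 1-D audio signal.

    One 1,024-point Hanning-windowed FFT is taken per 0.2 s; since 0.2 s
    spans ~2,205 samples at 11,025 Hz, roughly half of each interval is
    skipped, as noted in the text.
    """
    step = int(round(fs * step_s))                  # 2,205 samples per column
    window = np.hanning(nfft)
    frames = []
    for start in range(0, len(signal) - nfft + 1, step):
        seg = signal[start:start + nfft] * window
        power = np.abs(np.fft.rfft(seg)) ** 2
        frames.append(10.0 * np.log10(power + 1e-12))  # store in dB
    return np.array(frames)                         # (n_columns, nfft // 2 + 1)

# One minute of synthetic audio: a 200 Hz tone buried in noise.
rng = np.random.default_rng(0)
t = np.arange(0, 60.0, 1.0 / 11025)
audio = 0.1 * np.sin(2 * np.pi * 200.0 * t) + 0.01 * rng.standard_normal(t.size)
spec = tracheal_spectrogram(audio)                  # 300 columns for 60 s
```

With these parameters the frequency resolution is 11,025/1,024 ≈ 10.77 Hz, matching the Figure 1 legend; keeping roughly rows 2 to 65 of each column would give the 22 to 700 Hz band of the 64-row images.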

Sleep stages and apnea/hypopnea events were scored according to the standard criteria.18 Hypopnea was defined as an episode of airflow amplitude reduction (> 50% in the nasal pressure signal or > 30% in the square-root-transformed nasal pressure signal) lasting ≥ 10 seconds and associated with ≥ 3% oxygen desaturation or an arousal. We assumed that a > 50% reduction in the nasal pressure signal corresponds to a > 30% reduction19 in the square-root-transformed nasal pressure signal. A RIP-sum signal (> 30% reduction) was used to detect hypopnea in case of nasal pressure signal failure. An apnea with or without respiratory effort was scored as obstructive or central, respectively. A hypopnea was scored as obstructive if it was associated with increased snoring, thoracoabdominal paradox, a flattened nasal pressure contour, or an increased pulse transit time swing; otherwise, it was scored as central. A mixed apnea was classified as obstructive in this study. The apnea-hypopnea index (AHI) was defined as the number of apnea/hypopnea events per hour of total sleep time (TST).
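The assumed equivalence of the two hypopnea thresholds follows from the approximately quadratic relationship between nasal pressure and airflow: the square-root-transformed pressure signal approximates flow, so a 50% pressure drop corresponds to roughly a 30% flow drop. A quick arithmetic check:

```python
import math

# Nasal pressure varies roughly with the square of airflow, so the
# square-root-transformed pressure signal approximates flow.
pressure_reduction = 0.50                      # a > 50% drop in raw pressure
flow_remaining = math.sqrt(1.0 - pressure_reduction)
flow_reduction = 1.0 - flow_remaining          # ~0.293, i.e. about 30%
```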

Development of the Neural Network Structure

Several thousand typical breathing events were sampled as pilot data, for which TS spectrogram images (64 rows corresponding to 22 to 700 Hz, and 300 columns corresponding to 60 seconds; 64 × 300 dots, 24-bit color; see Figure 1 legend) were constructed. The breathing events were classified into eight patterns: normal breathing, snoring, snoring with hypopnea, obstructive apnea, central apnea, body movement, vocalization, and irregular breathing. We tried to discriminate these images using several types of DNNs, including convolutional (two to four convolutional layers with one to two fully connected layers) and recurrent DNNs, on a PC-based deep learning tool (Neural Network Console; Sony Network Communications Inc., Tokyo, Japan). Based on the trial results, a DNN structure was selected for use in subsequent analyses.

Figure 1. Representative tracheal sound spectrogram images for 60 seconds.

Each image consists of pixels in 64 rows and 300 columns. Each pixel represents the power spectrum in dB by its color: dark blue to red corresponds to low to high dB values. The vertical axis indicates frequency with a resolution of 10.77 Hz, whereas the horizontal axis indicates time with a resolution of 0.2 seconds. The numbers 1 to 9 and 0 at the left of each image indicate the type of event (see text for details). The arrows labeled “R” indicate the resumption of breathing events. Breathing events whose resumption point occurred within 15 to 45 seconds were labeled as 1 to 4 depending on the event type, and these events were counted to obtain the tsAHI.

DNN Training

Sound spectrogram images obtained every 60 seconds (constructed as described previously) were generated from all the PSG records of the patients in the development group, yielding a total of 763,236 images. These image frames were automatically labeled with the breathing event status and the sleep/wake status using a custom-made PC program. Using these labels, two different DNN models were trained independently: one for breathing event status classification and the other for sleep/wake status classification.

The breathing statuses were as follows (Figure 1): (1) central apnea with termination (Figure S1 in the supplemental material), (2) obstructive apnea with termination (Figure S1), (3) central hypopnea with termination (Figure S1), (4) obstructive hypopnea with termination (Figure S1), (5) apnea without termination (Figure S2 in the supplemental material), (6) hypopnea without termination (Figure S2), (7) irregular breathing or body movement (Figure S2), (8) normal breathing (Figure S3 in the supplemental material), (9) snoring (Figure S3), and (0) others. Mixed apnea was regarded as obstructive apnea. These labels were based on stored records of manual scoring for apnea/hypopnea events (for statuses 1 to 6) or on automated analyses using the RIP-sum, electromyogram, and TS signals (for statuses 7 to 9 and 0; the details are described in the supplemental material). Image frames that included an apnea/hypopnea event were labeled as 1 to 4 when the event terminated within 15 to 45 seconds of the 60-second frame. This labeling method prevents duplicate detection when event detection is performed by the trained DNN using half-overlapping shifting windows (Figure 2).

Figure 2. Relationship between tracheal sound image frames (60 seconds) and polysomnography epochs (30 seconds).

To estimate breathing events or sleep stages using the trained DNN, 60-second image frames were successively shifted by 30 seconds for the all-night data. The breathing events were labeled or detected only when the event termination was located in the gray zone of the figure. The sleep stage of the image frame was labeled by the gray zone epoch. Therefore, the sleep stage estimated by the trained DNN is for the latter half of the image frame. DNN = deep neural network.
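The labeling rule above (an event is assigned to a half-overlapping 60-second frame only when its termination falls 15 to 45 seconds into that frame) can be sketched as follows; this is an illustrative reconstruction showing why each event is counted exactly once, with names of our own choosing.

```python
def frame_for_termination(t_end, frame_step=30.0):
    """Index of the 60-second frame (start = index * 30 s) whose 15-45 s
    'gray zone' contains the event-termination time t_end (seconds).

    With half-overlapping frames the zones [k*30 + 15, k*30 + 45) tile the
    timeline, so every termination after 15 s is counted exactly once.
    """
    return int((t_end - 15.0) // frame_step)

# Each termination time lands in exactly one frame's gray zone.
terminations = [16.0, 44.9, 45.0, 75.0, 100.0]
frames = [frame_for_termination(t) for t in terminations]
for t, k in zip(terminations, frames):
    assert 15.0 <= t - k * 30.0 < 45.0   # termination lies in frame k's zone
```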

For the sleep/wake status, each 60-second image frame was assigned a label of 0, 1, 2, or r based on the sleep stage (W, N1, N2 + N3, and R, respectively) of the 30-second PSG epoch corresponding to the latter half of the image frame (Figure 2).

Twenty thousand images of each status were randomly selected and used for breathing event discrimination training, and 50,000 images of each status were used for sleep/wake status discrimination training.

Validation of the Trained DNN

A 60-second TS spectrogram image was successively constructed every 30 seconds (50% overlap) for each patient in the validation group. The images were separately introduced to the DNN for sleep/wake status classification and breathing event classification.

The DNN-determined sleep/wake status (0, 1, 2, r) and breathing status (1–9, 0) were compared with the PSG manual scoring result (W, N1, N2/3, R) and the PSG breathing status (1–9, 0), respectively, on an epoch-by-epoch basis. The agreement was assessed using kappa statistics.
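Epoch-by-epoch agreement by Cohen's kappa can be computed as follows; this is a generic implementation for two label sequences, not the authors' code.

```python
import numpy as np

def cohens_kappa(a, b):
    """Cohen's kappa for two equally long label sequences."""
    a, b = np.asarray(a), np.asarray(b)
    labels = np.union1d(a, b)
    index = {lab: i for i, lab in enumerate(labels)}
    cm = np.zeros((labels.size, labels.size))
    for x, y in zip(a, b):                    # build the confusion matrix
        cm[index[x], index[y]] += 1
    n = a.size
    p_obs = np.trace(cm) / n                  # observed agreement
    p_exp = (cm.sum(axis=0) * cm.sum(axis=1)).sum() / n ** 2  # chance agreement
    return (p_obs - p_exp) / (1.0 - p_exp)
```

Kappa corrects the raw accuracy for the agreement expected by chance, which matters here because sleep epochs heavily outnumber wake epochs.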

The TST calculated from the TS DNN analysis (tsTST; sum of DNN-determined sleep status time) was compared with that determined using manual scoring of the PSG data. The agreement between both TSTs was then assessed using the intraclass correlation coefficient (ICC) and a Bland-Altman plot.

The time of occurrence of breathing statuses 1 to 4 was marked and counted as an apnea/hypopnea event if the epoch or the preceding epoch was estimated to be sleep status 1, 2, or r. This analysis was used to determine the TS-derived apnea-hypopnea index (tsAHI: the number of apnea/hypopnea events per hour of tsTST), which was then compared with the PSG-derived AHI. The agreement between the two AHIs was assessed using the ICC and a Bland-Altman plot. To evaluate the event type discrimination ability, the proportion of apnea/hypopnea events detected as central type by the DNN analysis was compared with that obtained from the PSG results.
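The tsAHI definition and the Bland-Altman comparison can be sketched as follows; function names and example values are illustrative, and the limits of agreement use the conventional mean ± 1.96 SD of the paired differences.

```python
import numpy as np

def ts_ahi(n_events, ts_tst_minutes):
    """Apnea/hypopnea events per hour of DNN-estimated total sleep time."""
    return n_events / (ts_tst_minutes / 60.0)

def bland_altman(x, y):
    """Mean difference and 95% limits of agreement between two measures."""
    d = np.asarray(x, float) - np.asarray(y, float)
    sd = d.std(ddof=1)                       # sample standard deviation
    return d.mean(), d.mean() - 1.96 * sd, d.mean() + 1.96 * sd

# e.g. 150 detected events over 380 minutes of estimated sleep:
example_ahi = ts_ahi(150, 380.0)             # about 23.7 events/h
```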


RESULTS

Patient Characteristics

Of the 1,852 patients, 1,488 (80%) were male. The mean age of the patients was 52 years, and the mean body mass index was 25.7 kg/m². The median AHI was 20.1 events/h. Overall, 262 patients did not have OSA (AHI < 5 events/h), 466 patients had mild OSA (AHI, 5–14.9 events/h), 468 patients had moderate OSA (AHI, 15–29.9 events/h), and 659 patients had severe OSA (AHI ≥ 30 events/h). The characteristics of the development and validation groups are shown in Table S1 in the supplemental material. The validation group consisted of patients with more severe OSA compared with the development group.

Development of Neural Network Structure

The use of both convolutional and recurrent neural networks to discriminate breathing events was examined. As a result, a five-layered neural network was found to have an excellent ability to discriminate these events. The main structure of the network consisted of three convolutional layers and two fully connected layers. To maximize the network’s ability, three max-pooling layers, two batch-normalization layers, and four activation function layers were included in the structure (Figure 3).

Figure 3. Neural network structure.

The output map size was 10 for breathing events and 4 for sleep/wake discrimination.
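The flow of data through the network above can be traced as follows. The paper specifies the layer types and counts, the 64 × 300 input, and the output sizes; the filter counts, 'same' convolution padding, 2 × 2 max pooling, and hidden width in this sketch are our own illustrative assumptions.

```python
def trace_shapes(h=64, w=300, channels=3,
                 conv_filters=(16, 32, 64), pool=2,
                 fc_width=128, n_classes=10):
    """Trace feature-map sizes through a 3-convolutional / 2-fully-connected
    network. Filter counts, padding, pooling, and hidden width are assumed;
    only the layer counts, input size, and output sizes come from the paper."""
    shapes = [("input", (h, w, channels))]
    for i, f in enumerate(conv_filters, 1):
        shapes.append((f"conv{i}", (h, w, f)))   # 'same' padding keeps h x w
        h, w = h // pool, w // pool              # max pooling halves each axis
        shapes.append((f"pool{i}", (h, w, f)))
    shapes.append(("flatten", (h * w * conv_filters[-1],)))
    shapes.append(("fc1", (fc_width,)))
    shapes.append(("fc2_output", (n_classes,)))  # 10 breathing / 4 sleep-wake
    return shapes

layers = dict(trace_shapes())
```

Under these assumptions the 64 × 300 spectrogram is reduced to an 8 × 37 map after the third pooling stage before the fully connected layers.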

DNN Training

DNN training (or machine learning) for discriminating breathing events was performed for 100 iterations over a total of 200,000 training images. The training progress was monitored using the loss-function outputs for the training dataset and for a separate dataset of 76,289 images randomly selected from the development dataset. The network parameters were considered most efficient at 90 training iterations, and these were adopted as the final parameters for the DNN.

The training for discriminating sleep stage was performed in the same manner. The network parameters were considered most efficient at 100 training iterations; these were adopted as the final DNN parameters.

Validation of Trained DNNs

Sleep/Wake Status

The epoch-by-epoch agreement for sleep stages between the DNN and PSG results is shown in Figure 4 and Table S2 in the supplemental material. The accuracy was 0.67 with a kappa statistic of 0.53 (95% confidence interval [CI], 0.528–0.533), indicating a moderate agreement. Regarding the sleep detection ability irrespective of the specific sleep stage, the DNN had a sensitivity of 0.92, a specificity of 0.72, an accuracy of 0.88, and a kappa statistic of 0.63 (95% CI, 0.627–0.634), indicating a substantial agreement with the PSG results. The positive predictive value of the DNN for sleep detection was 0.93.

Figure 4. Validation of the DNN for sleep stage discrimination.

The columns indicate the number of epochs of each sleep stage estimated by the trained DNN for the validation group polysomnography. DNN = deep neural network.

Breathing Status

The epoch-by-epoch agreement for breathing status between the DNN and PSG results is shown in Table S3 in the supplemental material. The accuracy was 0.60 with a kappa statistic of 0.54 (95% CI, 0.539–0.542), indicating a moderate agreement. Table 1 shows the epoch-by-epoch agreement for central and obstructive breathing events (apnea + hypopnea) between the DNN and PSG results. The accuracy was 0.82 with a kappa statistic of 0.70 (95% CI, 0.692–0.697), indicating a good agreement.

Table 1. Epoch-by-epoch agreement for central and obstructive breathing events between the deep neural network analysis and polysomnography results.

Total Sleep Time

The ICC between the individual TST determined by PSG and the tsTST determined by the DNN analysis was 0.67 (95% CI, 0.603–0.727). The ICC increased with OSA severity, with values of 0.50, 0.60, 0.71, and 0.74 for the no, mild, moderate, and severe OSA groups, respectively. The mean difference between the two TSTs was −3.4 minutes (PSG-TST > tsTST), with a standard deviation of 55.4 minutes (Figure 5).

Figure 5. Agreement between the TST determined using PSG and the tsTST determined by the deep neural network analysis.

The left panel is the TST versus tsTST scatterplot with an identity line. The right panel is a Bland-Altman plot indicating the difference between the TST and the tsTST. A prominent underestimation of the TST was seen among patients without obstructive sleep apnea. PSG = polysomnography, TST = total sleep time, tsTST = tracheal sound-derived total sleep time.

Apnea-Hypopnea Index

The ICC between the individual AHI and the tsAHI was 0.95 (95% CI, 0.931–0.956). The Bland-Altman analysis showed that the mean difference between the two AHIs was 0.6 (95% limits of agreement, −16.4 to 17.6) (Figure 6). A relatively high agreement was preserved even for patients of normal body weight and for patients with low sleep efficiency. The accuracies of the tsAHI for the diagnosis of patients with an AHI > 5, > 15, or > 30 events/h are shown in Table 2.

Figure 6. Agreement between the AHI determined by PSG and the tsAHI determined by the deep neural network analysis.

The left panel is the AHI versus tsAHI scatterplot with an identity line. The right panel is a Bland-Altman plot indicating the difference between the AHI and the tsAHI. AHI = apnea-hypopnea index, PSG = polysomnography, tsAHI = tracheal sound-derived apnea-hypopnea index.

Table 2. Performance of the tsAHI derived from a deep neural network analysis of tracheal sounds.

Dominant Event Type

The proportion of central events was calculated for patients with an AHI ≥ 5 events/h in both the PSG and DNN results (Figure S4). All six patients who were central-dominant by the PSG results were also estimated to be central-dominant by the DNN analysis, whereas 19 (7.3%) of 261 obstructive-dominant patients were falsely estimated to be central-dominant by the DNN analysis (sensitivity 1.0, specificity 0.93). The positive and negative predictive values for central dominance by the DNN analysis were 0.24 and 1.0, respectively. Visual inspection of the TS spectrogram for the falsely estimated patients revealed that the sensitivity of the microphone had dropped during the PSG recording in 10 of the 19 patients.
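These proportions can be checked arithmetically from the counts reported above (6 central-dominant patients, all detected; 19 of 261 obstructive-dominant patients misclassified):

```python
# Counts from the dominant-event-type validation above.
tp, fn = 6, 0            # central-dominant by PSG, all detected by the DNN
fp = 19                  # obstructive-dominant patients misread as central
tn = 261 - fp            # correctly identified obstructive-dominant patients

sensitivity = tp / (tp + fn)     # 1.0
specificity = tn / (tn + fp)     # 242/261, about 0.93
ppv = tp / (tp + fp)             # 6/25 = 0.24
npv = tn / (tn + fn)             # 1.0
```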


DISCUSSION

Two DNNs to detect apnea/hypopnea and sleep stages were constructed through training with a large amount of TS spectrogram data labeled with the breathing event and sleep/wake statuses. The DNNs consisted of a multilayer neural network that included three convolutional layers and two fully connected layers. A subsequent validation study demonstrated a close agreement between a PSG-derived AHI and the DNN-derived tsAHI. Moreover, the DNN was capable of discriminating central and obstructive apnea and determining the sleep/wake status with a certain level of accuracy based on TS data alone.

From the perspective of a simplified OSA monitoring method, TS recording with DNN analysis has many advantages over existing devices. The recording is not uncomfortable because it uses only a single small sensor attached to the lower neck. It can yield the tsAHI with the DNN-derived TST as the denominator, resulting in high agreement with the PSG-derived AHI; indeed, the diagnostic ability of the tsAHI was preserved even in patients with low sleep efficiency. A consistent diagnostic ability irrespective of body mass index is also very important: when limited-channel portable monitors are used for patients without obesity, hypopnea events can be missed because the oxygen saturation can remain unchanged in these patients despite the occurrence of hypopnea events.20 Finally, to the best of our knowledge, this is the first report to describe a single-channel OSA monitoring method capable of detecting apnea/hypopnea events, determining their type, and discriminating the sleep/wake status.

The black-box nature of DNNs is sometimes considered problematic. In the method used in the current study, however, the TS is visualized as a sound spectrogram; by viewing such representations, technologists or physicians can recognize breathing events themselves, provided they are accustomed to TS images such as those shown in Figure 1.

The DNN analysis of TS was capable of discriminating the sleep stage with a certain level of accuracy. A previous study reported that an artificial neural network could detect wakefulness by identifying ventilatory irregularity.21 Furthermore, a recent study demonstrated that an analysis of ambient breath sounds could discriminate the sleep/wake status through machine learning using features of the breathing pattern and snoring.22 We speculate that breathing pattern characteristics, snoring, body movement, and voice sounds are important elements recognized by the DNN to discriminate sleep stages. The accuracy of the DNN’s ability to discriminate the four sleep stages (wake, N1, N2/3, and R) was not very high. However, the accuracy of sleep/wake discrimination was relatively high (0.88), comparable to that obtained using wrist actigraphy.23 Unlike actigraphy (specificity, 0.33), in which wakefulness can be mistakenly judged as sleep if the patient does not move, the DNN analysis had a relatively high specificity (0.72; the ability to identify wakefulness correctly) for sleep detection.23 This propensity may actually be an advantage for HSAT devices because it is important to exclude awake time from the recording time when calculating the AHI. Also, the agreement between the TST determined using PSG data and the tsTST was high even among patients with severe OSA (ICC 0.74); these results are unlike those for conventional actigraphy, which has a reduced ability to detect sleep in patients with severe OSA.24 However, a sophisticated actigraphy algorithm developed specifically for patients with OSA has been reported to perform well compared with conventional actigraphy, with an accuracy, sensitivity, and specificity for sleep/wake discrimination similar to those obtained using the currently reported DNN method.25

Regarding breathing event type discrimination by the DNN, the epoch-by-epoch agreement between the DNN and PSG results was only moderate (kappa statistic 0.54). One reason for the disagreement is that many hypopnea events were estimated to be apnea events (and vice versa) by the DNN analysis. In clinical practice, however, apnea and hypopnea events are counted together to yield the AHI severity index. When the agreement was estimated for four categories (central apnea/hypopnea events, obstructive apnea/hypopnea events, off-target apnea/hypopnea events, and no apnea/hypopnea events), the kappa statistic was 0.70, indicating good agreement. This analysis (Table 1) shows that 13% of central events were falsely estimated to be obstructive, whereas 6% of obstructive events were falsely estimated to be central. This means that patients with similar numbers of central and obstructive events could be estimated to be obstructive-dominant. However, in the per-patient analysis of dominant type, all central-dominant patients were properly identified, whereas 7.3% of obstructive-dominant patients were falsely estimated to be central-dominant. One likely reason for this misestimation is decreased sensitivity of the microphone, because a weak breath sound is a feature of central events. Other reasons are that patients with central apnea were too few in the current sample and that PSG scoring of central apnea/hypopnea may not always be accurate. An exact evaluation of the clinical usefulness of event type discrimination will require a balanced sample of patients with central and obstructive apneas.

This study used a large amount of TS data for the DNN training. Only datasets for which the scorer’s comments included notations regarding TS signal failure or inability to complete PSG scoring because of other signal failures were excluded. Therefore, numerous inappropriate images and labels might have been included among the data used for the DNN training. Specifically, determinations of the type of hypopnea event were often inaccurate. Nevertheless, the performance, including event-type discrimination, was relatively good, probably because of the large amount of data used for the training.

TS was recorded using an electret condenser microphone (ECM), which was tightly connected to the neck surface using a rubber attachment with a thin air space between the microphone and the skin. We used four types of ECM, depending on the time period. One ECM was not as good as the others in terms of its sensitivity to TS. However, excluding the data from this ECM actually worsened the DNN training process. It is possible that including data obtained from various types of microphones in the DNN training makes the DNN system more robust.

This study had several limitations. First, the DNN training and validation were performed at the same facility. Therefore, the robustness of this method in coping with environmental noise, including snoring from a bed partner, was not examined. However, in our experience, TS recorded from the neck is of very high intensity and is far less affected by environmental noise than other methods.26 Second, we used data obtained mainly from patients with moderate to severe OSA for the DNN training, and the validation data likewise came predominantly from patients with moderate-to-severe OSA. Therefore, the method’s applicability to the general population remains to be tested. Third, the entire study population was Japanese; applicability to other ethnic groups, in whom obesity may be more prevalent, remains to be tested. Fourth, we did not include respiratory effort-related arousal (RERA) among the respiratory events, although the current definition of OSA includes RERA among obstructive respiratory events. Therefore, we cannot comment on the ability of the DNN to diagnose RERA-dominant OSA.

In conclusion, we demonstrated that TS spectrograms contain information that can be used for sleep/wake discrimination and apnea/hypopnea detection by DNN. We believe that the analysis of TS spectrograms using a DNN has potential as the basis for an innovative HSAT device.


DISCLOSURE STATEMENT

All authors have seen and approved the manuscript. This study was supported by a grant from the Japan Society for the Promotion of Science (Grant-in-Aid for Challenging Exploratory Research No. 17K19928). The authors report no conflicts of interest.



ABBREVIATIONS

AHI, apnea-hypopnea index
AI, artificial intelligence
CPAP, continuous positive airway pressure
DNN, deep neural network
ECM, electret condenser microphone
HSAT, home sleep apnea testing
ICC, intraclass correlation coefficient
OSA, obstructive sleep apnea
PC, personal computer
PSG, polysomnography
RERA, respiratory effort-related arousal
RIP, respiratory inductive plethysmography
TST, total sleep time
tsTST, TST calculated from the tracheal sound DNN analysis
tsAHI, tracheal sound-derived apnea/hypopnea index



REFERENCES

1. Senaratna CV, Perret JL, Lodge CJ, et al. Prevalence of obstructive sleep apnea in the general population: a systematic review. Sleep Med Rev. 2017;34:70–81.

2. Franklin KA, Lindberg E. Obstructive sleep apnea is a common disorder in the population—a review on the epidemiology of sleep apnea. J Thorac Dis. 2015;7:1311–1322.

3. Rosen IM, Kirsch DB, Chervin RD, et al. Clinical use of a home sleep apnea test: an American Academy of Sleep Medicine position statement. J Clin Sleep Med. 2017;13(10):1205–1207.

4. American Academy of Sleep Medicine. International Classification of Sleep Disorders. 3rd ed. Darien, IL: American Academy of Sleep Medicine; 2014.

5. Bianchi MT, Goparaju B. Potential underestimation of sleep apnea severity by at-home kits: rescoring in-laboratory polysomnography without sleep staging. J Clin Sleep Med. 2017;13(4):551–555.

6. Esteva A, Kuprel B, Novoa RA, et al. Dermatologist-level classification of skin cancer with deep neural networks. Nature. 2017;542(7639):115–118.

7. Teferra RA, Grant BJB, Mindel JW, et al. Cost minimization using an artificial neural network sleep apnea prediction tool for sleep studies. Ann Am Thorac Soc. 2014;11(7):1064–1074.

8. Li A, Quan SF, Silva GE, Perfect MM, Roveda JM. A novel artificial neural network based sleep-disordered breathing screening tool. J Clin Sleep Med. 2018;14(6):1063–1069.

9. de Chazal P, Tabatabaei Balaei A, Nosrati H. Screening patients for risk of sleep apnea using facial photographs. Conf Proc IEEE Eng Med Biol Soc. 2017;2017:2006–2009.

10. Andrés-Blanco AM, Álvarez D, Crespo A, et al. Assessment of automated analysis of portable oximetry as a screening test for moderate-to-severe sleep apnea in patients with chronic obstructive pulmonary disease. PLoS One. 2017;12(11):e0188094.

11. Urtnasan E, Park JU, Lee KJ. Multiclass classification of obstructive sleep apnea/hypopnea based on a convolutional neural network from a single-lead electrocardiogram. Physiol Meas. 2018;39(6):065003.

12. Kim T, Kim JW, Lee K. Detection of sleep disordered breathing severity using acoustic biomarker and machine learning techniques. Biomed Eng Online. 2018;17(1):16.

13. Erdenebayar U, Park JU, Jeong P, Lee KJ. Obstructive sleep apnea screening using a piezo-electric sensor. J Korean Med Sci. 2017;32(6):893–899.

14. Nakano H, Hayashi M, Ohshima E, Nishikata N, Shinohara T. Validation of a new system of tracheal sound analysis for the diagnosis of sleep apnea-hypopnea syndrome. Sleep. 2004;27(5):951–957.

15. Nakano H, Ikeda T, Hayashi M, Ohshima E, Onizuka A. Effects of body position on snoring in apneic and nonapneic snorers. Sleep. 2003;26(2):169–172.

16. Akhter S, Abeyratne UR, Swarnkar V, Hukins C. Snore sound analysis can detect the presence of obstructive sleep apnea specific to NREM or REM sleep. J Clin Sleep Med. 2018;14(6):991–1003.

17. Aykanat M, Kılıç Ö, Kurt B, Saryal S. Classification of lung sounds using convolutional neural networks. EURASIP J Image Video Process. 2017;2017(1):65.

18. Iber C, Ancoli-Israel S, Chesson AL Jr, Quan SF; for the American Academy of Sleep Medicine. The AASM Manual for the Scoring of Sleep and Associated Events: Rules, Terminology and Technical Specifications. 1st ed. Westchester, IL: American Academy of Sleep Medicine; 2007.

19. Berry RB, Budhiraja R, Gottlieb DJ, et al. Rules for scoring respiratory events in sleep: update of the 2007 AASM Manual for the Scoring of Sleep and Associated Events. J Clin Sleep Med. 2012;8(5):597–619.

20. Guilleminault C, Hagen CC, Huynh NT. Comparison of hypopnea definitions in lean patients with known obstructive sleep apnea hypopnea syndrome (OSAHS). Sleep Breath. 2009;13(4):341–347.

21. Ayappa I, Norman RG, Whiting D, et al. Irregular respiration as a marker of wakefulness during titration of CPAP. Sleep. 2009;32(1):99–104.

22. Dafna E, Tarasiuk A, Zigel Y. Sleep-wake evaluation from whole-night non-contact audio recordings of breathing sounds. PLoS One. 2015;10(2):e0117382.

23. Marino M, Li Y, Rueschman MN, Winkelman JW, et al. Measuring sleep: accuracy, sensitivity, and specificity of wrist actigraphy compared to polysomnography. Sleep. 2013;36(11):1747–1755.

24. Kim MJ, Lee GH, Kim CS, et al. Comparison of three actigraphic algorithms used to evaluate sleep in patients with obstructive sleep apnea. Sleep Breath. 2013;17(1):297–304.

25. Hedner J, Pillar G, Pittman SD, Zou D, Grote L, White DP. A novel adaptive wrist actigraphy algorithm for sleep-wake assessment in sleep apnea patients. Sleep. 2004;27(8):1560–1566.

26. Nakano H, Suzuki T, Yamauchi M, Ohnishi Y, Maekawa J. [Relationship between snoring measured at the anterior neck and ambient noise]. Clin Pharmacol Ther. 2003;13:333–337.

Supplemental Material (.pdf, 955 KB)