The Accuracy, Night-to-Night Variability, and Stability of Frontopolar Sleep Electroencephalography Biomarkers
To assess the validity of sleep architecture and sleep continuity biomarkers obtained from a portable, multichannel forehead electroencephalography (EEG) recorder.
Forty-seven subjects simultaneously underwent polysomnography (PSG) while wearing a multichannel frontopolar EEG recording device (Sleep Profiler). The PSG recordings independently staged by 5 registered polysomnographic technologists were compared for agreement with the autoscored sleep EEG before and after expert review. To assess the night-to-night variability and first night bias, 2 nights of self-applied, in-home EEG recordings obtained from a clinical cohort of 63 patients were used (41% with a diagnosis of insomnia/depression, 35% with insomnia/obstructive sleep apnea, and 17.5% with all three). The between-night stability of abnormal sleep biomarkers was determined by comparing each night's data to normative reference values.
The mean overall interscorer agreement among the 5 technologists was 75.9%, and the mean kappa score was 0.70. After visual review, the mean kappa score between the autostaging and the 5 raters was 0.67, and staging agreed with a majority of scorers in at least 80% of the epochs for all stages except stage N1. Sleep spindles, autonomic activation, and stage N3 exhibited the least between-night variability (P < .0001) and strongest between-night stability. Antihypertensive medications were found to have a significant effect on sleep quality biomarkers (P < .02).
A strong agreement was observed between the automated sleep staging and human-scored PSG. One night's recording appeared sufficient to characterize abnormal slow wave sleep, sleep spindle activity, and heart rate variability in patients, but a 2-night average improved the assessment of all other sleep biomarkers.
Two commentaries on this article appear in this issue on pages 771 and 773.
Levendowski DJ, Ferini-Strambi L, Gamaldo C, Cetel M, Rosenberg R, Westbrook PR. The accuracy, night-to-night variability, and stability of frontopolar sleep electroencephalography biomarkers. J Clin Sleep Med. 2017;13(6):791–803.
The use of cardiovascular, tissue, and blood biomarkers is quite common in clinical research and clinical practice in large part because these measures characterize and repeatedly demonstrate a context for the interpretation of clinical outcomes across treatments and conditions.1 Mounting evidence has linked sleep phenomenology with the manifestation of a number of chronic diseases such as hypertension, heart disease, and diabetes.2 Emerging evidence also suggests that sleep architecture and sleep continuity (ie, sleep biomarkers) may be beneficial in monitoring brain health in the setting of management of neurodegenerative disorders. For example, sleep spindle characteristics during non-rapid eye movement (NREM) sleep have been associated with cognitive decline in patients with Parkinson disease,3–5 and reduced slow wave sleep has been associated with increased beta amyloid (directly linked to Alzheimer disease) concentrations in the cerebrospinal fluid.6 Because sleep spindles and slow wave sleep are believed to be associated with the metabolic clearance systems of the brain,7 it is now argued that the routine monitoring of change in these measures is useful in evaluating the risk for, or progression of, neurodegeneration.8
Current Knowledge/Study Rationale: The accuracy of autostaged frontopolar electroencephalography (EEG) compared to human-scored polysomnography (PSG) in patients with sleep disorders has not been established. The repeatability of sleep biomarkers acquired in the home of patients with insomnia is not currently known.
Study Impact: This study demonstrates that autoscored multichannel frontopolar EEG is as accurate as human-staged PSG, and ranks sleep biomarkers by variability and stability. The results point toward the use of sleep biomarkers for directing patient care and for assessing the effect of pharmacological interventions on sleep quality.
Although sleep biomarkers are commonly used as outcome measures and covariables in clinical trials and clinical research,2 these biomarkers are rarely used in clinical practice to predict outcomes or direct therapies that could improve morbidity and mortality. For example, the diagnosis of insomnia is commonly based solely on a patient's self-report. However, there is clear evidence of an objectively measured insomnia subtype (ie, insomnia with short sleep time) that has been associated with increased inflammatory state profiles and significant morbidity and mortality.9,10 Evidence also suggests that electroencephalography (EEG)-based sleep time can be used to identify a “normal sleep duration” insomnia phenotype and positively predict those individuals most likely to respond to cognitive behavioral treatment intervention (the now recognized first-line therapy for insomnia).11 Additional phenotypes are emerging through the use of sleep biomarkers related to depressive and anxiety disorders and/or stress-related insomnia.12,13
A critical cost-benefit discussion remains paramount when the practitioner and patient decide whether, and which type of, pharmacotherapeutic sleep agent is most appropriate for treating the patient's sleep complaint.14,15 Moreover, the United States Food and Drug Administration has continued to stress the importance of monitoring all classes of sleep aids due to concerns over safety and habituation,16,17 which points to the potential clinical value of repeated-measures dose-response EEG studies. Although classes of drugs are currently available to enhance slow wave sleep (ie, carbamazepine, gabapentin, and tiagabine),2 the relative cost of acquiring the physiological data (primarily via in-laboratory polysomnography [PSG]) needed to objectively evaluate the long-term health benefits of drugs that deepen sleep, and to identify patients who may benefit from this type of intervention, has been a limiting factor.
A prerequisite for establishing the validity and clinical utility of a sleep biomarker is to demonstrate its accuracy and its reproducibility.1 Two previous reports evaluated the accuracy of the Sleep Profiler device used in this study, a portable, self-applied, multichannel recorder that could potentially meet the requirements for simple and routine assessment of sleep architecture and sleep continuity (ie, sleep biomarkers). One study compared the agreement between the autoscoring before and after manual editing to human-scored PSG in healthy subjects.18 A second study evaluated the agreement with PSG in a group that included elderly control patients for an Alzheimer disease investigation.19 The first part of the current study was intended to demonstrate the validity of sleep biomarkers by first evaluating the accuracy based on the agreement of the autostaging from frontopolar sites with simultaneously acquired PSG in a group of patients mainly referred for probable obstructive sleep apnea. In the second part of this study, we evaluated the night-to-night consistency of sleep biomarkers identified by this device in a clinical population of insomnia patients vulnerable to a first-night effect. Variability assessments were conducted to identify those biomarkers that provide a high degree of between-night stability such that a medication dose effect might be measurable in a single night. The stability of an abnormality across nights was used to identify the sleep biomarkers that may be useful in predicting a phenotypic trait, or in confirming a disease state.
Biomarker Accuracy: Part 1
For this Institutional Review Board-approved prospective accuracy study, 65 subjects underwent laboratory PSG at Complete Sleep Solutions in Murrieta, California, United States using Alice 5 systems (Philips Respironics, Monroeville, Pennsylvania, United States) while simultaneous multichannel frontopolar EEG recordings were acquired with a forehead-worn recording device (Sleep Profiler, Advanced Brain Monitoring, Carlsbad, California, United States). The sleep laboratory technician who conducted the study helped the subject apply the recording device.
After the PSG biocalibrations were performed, the recording device was turned on by the technician. Subjects were instructed to sit up in bed and slowly count to 10 so that the sound to be used for signal synchronization could be captured by both the PSG and the forehead EEG device, followed by “lights out” and instructions to recline. The 2 sets of recordings were synchronized by first identifying the clock time associated with the start of a PSG epoch just prior to the 10-count. This clock time established the start of PSG recording time. The Sleep Profiler record was then cut so each epoch started at the same clock time as the PSG.
Forty-seven records from 35 males and 12 females (ages 23–77 years) with a minimum of 3 hours of PSG-based recording time and ≥ 85% good signal quality across all 3 frontopolar EEG channels were submitted for staging by 5 independent scorers. Eleven records did not meet the minimum recording time criteria (ie, split-night studies) and 7 had < 85% good EEG quality. Forty records were from subjects referred for a PSG for an assessment of sleep-disordered breathing and the rest were from presumably healthy controls. Scorer 1 was a registered polysomnographic technologist (RPSGT) affiliated with the American Academy of Sleep Medicine (AASM)-accredited site where the PSG studies took place. Scorers 2 and 3 were independent RPSGTs, and scorers 4 and 5 were technologists specialized in sleep staging for research studies conducted at New York University. All of the PSG records were scored according to the AASM-recommended criteria. Scorer 1 staged sleep with a screen view that presented all of the PSG signals.20 Scorers 2 through 5 staged sleep using only the EEG, electrooculographic (EOG), and electromyographic (EMG) signals (ie, blinded to the cardiorespiratory signals). The staged epochs were exported and pooled into a single cross-tabulation for all studies. Interrater comparisons between the 5 scorers were made using Cohen kappa scores and by- and across-sleep stage agreements. Each scorer's staging was compared to the autostaging, before and after expert editing. Accuracy was also assessed with comparisons between those epochs where a majority of scorers agreed with the autostaging, both before and after review.
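The interrater statistics named above can be illustrated with a minimal sketch. It assumes each scorer's staging is available as a list of per-epoch stage labels; the function names and the example hypnograms are hypothetical and are not taken from the study data or software.

```python
from collections import Counter
from itertools import combinations


def percent_agreement(a, b):
    """Fraction of epochs on which two scorers assigned the same stage."""
    if len(a) != len(b) or not a:
        raise ValueError("stage lists must be non-empty and equal in length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cohen_kappa(a, b):
    """Cohen kappa: observed agreement corrected for chance agreement."""
    n = len(a)
    p_observed = percent_agreement(a, b)
    counts_a, counts_b = Counter(a), Counter(b)
    # Chance agreement from the product of each scorer's stage proportions.
    p_chance = sum(counts_a[s] * counts_b[s] for s in counts_a) / (n * n)
    if p_chance == 1.0:  # both scorers used one identical label throughout
        return 1.0
    return (p_observed - p_chance) / (1.0 - p_chance)


def pairwise_kappas(scorers):
    """Kappa for every scorer pair (5 scorers give 10 comparisons)."""
    return [cohen_kappa(x, y) for x, y in combinations(scorers, 2)]


# Hypothetical 10-epoch hypnograms, for illustration only.
scorer_a = ["W", "N1", "N2", "N2", "N3", "N3", "REM", "REM", "W", "N2"]
scorer_b = ["W", "N2", "N2", "N2", "N3", "N2", "REM", "REM", "W", "N1"]
agreement = percent_agreement(scorer_a, scorer_b)
kappa = cohen_kappa(scorer_a, scorer_b)
```

In this toy example the two scorers agree on 7 of 10 epochs, and kappa is lower than the raw agreement because part of that agreement is expected by chance.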
Biomarker Variability and Stability: Part 2
A retrospective study was conducted on studies acquired between August 2013 and July 2016 from the Sleep Disorders Center of Prescott Valley (Prescott Valley, Arizona, United States) and the Integrative Insomnia and Sleep Health Center (San Diego, California, United States). Patients were briefly instructed on how to self-apply the Sleep Profiler based on the written instructions. With in-home, self-application, patients acquired 2 nights of recordings and completed a medical questionnaire and sleep diary. An internet-based portal was used to enter the questionnaire responses, process the studies, enable expert review of the studies, and export the variables used for the analyses.
Selected records required 2 nights of data with no more than a 3-hour difference between each night's recording time, and with at least 85% of the epochs staged with the EEG channel. Records were further excluded based on missing entries in the medical questionnaire. These responses were used to characterize the cohort (Table 1) and to assess the effect of self-reported prescription sleeping aids and antidepressant medications on sleep architecture and sleep continuity. The responses were also used to select records that met the study's inclusion criteria of clinical insomnia. The sleep diary responses enabled confirmation that the use of prescription sleeping aids on the nights of the study matched the frequency of prescription sleep aids' use reported on the medical questionnaire.
Inclusion into the insomnia cohort required an Insomnia Severity Index (ISI) score ≥ 14 (n = 34),21 administration of prescription sleep aids (n = 23), or having an ISI ≥ 12 with either a reported clinician diagnosis of insomnia (n = 4) or a concurrent diagnosis of depression and anxiety (n = 2). Sleep biomarkers were extracted from 63 patient records, after expert review. To reduce the influence of outliers on the first-night reliability analyses, 3 subjects were excluded from rapid eye movement (REM) latency analysis due to less than 3 minutes of REM across an entire night. For the analysis of autonomic activation, 16 subjects were excluded due to less than 75% of reliable pulse rate recordings in each night.
Sleep biomarkers obtained from each night and averaged across both nights were compared to clinical thresholds to identify the stability of abnormal sleep characteristics. When available, abnormality was defined by the 10th or 90th percentile age- and sex-matched reference values reported for the Sleep Heart Health Study.22 For the remaining sleep measures, reference thresholds were obtained using the first or third interquartile cutoffs from a group of healthy controls. Inclusion in this healthy control group (n = 48) required not taking prescription sleeping aids or antidepressants, no diagnosis of obstructive sleep apnea, having an ISI score ≤ 12, and daytime somnolence (Epworth Sleepiness Scale), depression (Patient Health Questionnaire-9) and anxiety (Generalized Anxiety Disorder 7-Item) scores ≤ 10. The controls included records acquired by Washington University for the Knight Alzheimer's Disease Research Center (16 males and 18 females, ages 65 to 89 years) and by Advanced Brain Monitoring (9 males and 5 females, ages 24 to 44 years). Subjects 70 years or older were excluded from the interquartile range calculations for sleep spindle activity.
Intraclass correlations were used to compare the reliability of the first-night sleep quality measures. Bland-Altman plots were used to assess potential bias resulting from a first-night effect; t tests and Fisher exact tests were applied to the normally distributed/transformed variables.
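As a minimal sketch of the Bland-Altman calculation, the bias is the mean of the night 2 minus night 1 differences and the limits of agreement are the bias ± 1.96 standard deviations of those differences. The function name and the sleep time values are hypothetical, and the intraclass correlation is omitted because its specific form is not stated in the text.

```python
from statistics import mean, stdev


def bland_altman(night1, night2):
    """Return (bias, lower limit, upper limit) for paired nightly values."""
    if len(night1) != len(night2) or len(night1) < 2:
        raise ValueError("need at least 2 paired observations")
    diffs = [n2 - n1 for n1, n2 in zip(night1, night2)]
    bias = mean(diffs)              # systematic night 2 minus night 1 shift
    spread = 1.96 * stdev(diffs)    # half-width of the 95% limits of agreement
    return bias, bias - spread, bias + spread


# Hypothetical total sleep times (minutes) for 4 patients across 2 nights.
night1_sleep_min = [380, 412, 355, 440]
night2_sleep_min = [395, 410, 370, 452]
bias, lower, upper = bland_altman(night1_sleep_min, night2_sleep_min)
```

A positive bias in this sketch would correspond to longer sleep on night 2, which is the direction expected if a first-night effect were present.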
The Sleep Profiler used in this study was a battery-powered recorder designed to acquire 3 frontopolar EEG signals between AF7-AF8, AF7-Fpz, and AF8-Fpz (Figure 1). The EEG signals were sampled at 256 Hz with a gain of ± 1000 μV, and filtered with a 0.1-Hz high-pass and 80-Hz low-pass filter. The scalp/electrode impedances were obtained at each sensor site at the start of the study and every one-half hour throughout the night. The photoplethysmography (sampled at 100 Hz) obtained from the forehead was used to calculate the pulse rate at 1 Hz. Snoring sounds acquired with an acoustic microphone were sampled with firmware at 2 kHz, root mean square filtered to create a sound envelope that was downsampled to 10 Hz and saved in the study record. A triaxial accelerometer was sampled at 100 Hz, with the X|Y|Z signals converted to 360-degree angles and saved at 10 Hz.
For the in-home studies, patients were instructed to wipe their forehead thoroughly with an alcohol wipe prior to affixing the device to obtain acceptable skin-sensor impedances. Voice messages alerted patients when the impedances were too high at the beginning of the night. Patients replaced the forehead sensors prior to night 2.
Automated sleep staging was applied to sleep markers extracted from each 30-second epoch for the 3 frontopolar EEG channels (Figure 2). After rejection of periods when the absolute amplitude was ≥ 500 μV, the signals were notch filtered, and then infinite impulse response band-pass filtered to obtain 16 Hz samples of the power values for delta (1–3.5 Hz), DeltaC (delta power corrected for ocular activity), theta (4–6.5 Hz), alpha (8–12 Hz), sigma (12–16 Hz), beta (18–28 Hz), and EMG bands (> 40 Hz with an 80 Hz, 3 dB rolloff). A second set of power values was obtained after application of a 0.75-Hz high-pass filter. Both filtered and unfiltered power spectral data from the 3 frontopolar EEG channels were used to characterize and stage sleep.
If at least 15 seconds of valid data were available, the AF7-AF8 channel was used for autostaging, followed by the AF7-Fpz and AF8-Fpz channels. When either the AF7-Fpz or the AF8-Fpz signal was used for staging, the power spectra were increased to compensate for signal attenuation attributed to amplifier common mode rejection resulting from the substantially shorter interelectrode distances.
The power spectra values, averaged from 16 to 4 Hz, were used to detect sleep spindles, characterized by spikes in the absolute and relative alpha and sigma power that met empirically derived thresholds designed to ensure there was a sufficient sigma component (Figure 3). The minimum spindle length was 0.25 seconds with no maximum spindle length. To reduce the likelihood of misclassifying pseudospindles, the beta and EMG power bands required simultaneous suppression relative to the alpha and sigma power. When the sigma power peaked prior to the alpha power, the spindles were classified as fast-dominant. The spindle length (ie, from the start to the end of the spindle) was determined by either the alpha or sigma power crossing a minimum absolute power threshold. Spindle duration was tallied as the sum of all spindle lengths. Cortical arousals were detected when 3 or more seconds of absolute and relative alpha power exceeded the median alpha power from the preceding 2 minutes. Increased absolute and relative EMG powers were similarly compared to median values to detect microarousals. The durations of detected spindles and arousals were marked with a stripe in the staged channel. If a spindle occurred during a cortical arousal, the spindle was not marked.
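The cortical arousal criterion (3 or more seconds of alpha power above the median of the preceding 2 minutes) can be sketched as follows. This is a simplified illustration rather than the device's algorithm: it uses a single alpha power series at 4 samples per second, whereas the text requires both absolute and relative power, and the `ratio` multiplier applied to the median is a hypothetical parameter because the actual thresholds were empirically derived and are not reported.

```python
from statistics import median


def cortical_arousal_runs(alpha, fs=4, min_s=3, baseline_s=120, ratio=2.0):
    """Return (start_s, end_s) spans where alpha power stays above
    ratio * median(preceding baseline) for at least min_s seconds."""
    min_len = min_s * fs
    base_len = baseline_s * fs
    above = []
    for i, value in enumerate(alpha):
        window = alpha[max(0, i - base_len):i]   # preceding samples only
        above.append(bool(window) and value > ratio * median(window))
    runs, start = [], None
    for i, flag in enumerate(above + [False]):   # sentinel closes a final run
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            if i - start >= min_len:
                runs.append((start / fs, i / fs))
            start = None
    return runs


# Hypothetical series: 2 minutes of baseline, a 4-second alpha burst, baseline.
alpha_power = [1.0] * 480 + [5.0] * 16 + [1.0] * 40
arousals = cortical_arousal_runs(alpha_power)
```

Using a median baseline keeps a short burst from inflating its own reference level, which is presumably why a median rather than a mean is described.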
Patterns in the AF7-Fpz and AF8-Fpz signals that characterized and distinguished slow eye rolls from phasic REMs were recognized by computing Pearson correlations of the 6.5 Hz zero phase infinite impulse response filtered outputs. The distinction between elevated delta power resulting from phasic REM versus slow wave sleep was made by comparisons of the filtered and unfiltered DeltaC.
Autonomic activation events were detected when the pulse rate changed by 6 or more beats per minute compared to the pulse rate 10 seconds prior and/or 10 seconds subsequent to the current second. Movement intensities from 0 to 5 were assigned to each second based on the sum of the actigraphy changes across the X|Y|Z axes. Position changes typically resulted in a movement ranking of 3 or more. For each epoch, the average magnitude (dB) and the percentage of time snoring were calculated. Crescendo and loud snoring events terminating with a significant decrease in sound or a single loud snore (ie, gasp) were identified as likely indicators of sleep-disordered breathing for visual inspection.
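A literal reading of the autonomic activation rule can be sketched on a 1 Hz pulse rate series. The function name and the example series are hypothetical, and the text does not describe how flagged seconds are merged into discrete events, so the sketch only flags seconds.

```python
def autonomic_activation_seconds(pulse_bpm, lag_s=10, delta_bpm=6.0):
    """Flag each second whose pulse rate differs by >= delta_bpm from the
    rate lag_s seconds earlier and/or lag_s seconds later."""
    n = len(pulse_bpm)
    flags = []
    for t, rate in enumerate(pulse_bpm):
        prior = t - lag_s >= 0 and abs(rate - pulse_bpm[t - lag_s]) >= delta_bpm
        later = t + lag_s < n and abs(rate - pulse_bpm[t + lag_s]) >= delta_bpm
        flags.append(prior or later)
    return flags


# Hypothetical 1 Hz pulse rate with a brief 8 beats-per-minute surge.
pulse = [60] * 15 + [68] * 3 + [60] * 15
flagged = [t for t, f in enumerate(autonomic_activation_seconds(pulse)) if f]
```

Under this reading, baseline seconds located 10 seconds before and after a brief surge are flagged along with the surge itself, so any event count would depend on a grouping step that is an open assumption here.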
For each 30-second epoch, the power spectra values were averaged, and the number of arousals, spindles, movements, snoring, and other patterns tallied. These data in combination with ratios of the mean power spectra were used to assign sleep stages. A discriminant function analysis was used to differentiate stages N2 and N3. The number of seconds (ie, slow wave seconds) that delta power exceeded a threshold equivalent to ± 30 μV was primarily used to stage N3. Epochs with relatively high beta and low EMG power were classified as tonic REM and distinguished from tonic stage N1, based on temporal proximity to epochs with phasic REM. Brief but intense periods (eg, < 5 seconds) of increased EMG power typically resulted in an epoch being staged awake. A sub-stage labeled Light N2 was assigned to N2 epochs with no spindles and characterized with either a K-complex or dominant theta activity with relatively elevated levels of alpha or EMG power.
In the absence of a sleep spindle, the first sleep epoch following an awake epoch was staged N1. Epochs with at least 1 sleep spindle and no arousal, 2 sleep spindles and 1 arousal, or at least 3 sleep spindles with 2 cortical arousals were staged N2. Epochs with a cortical arousal or microarousal and no sleep spindle, or 2 arousals with 2 spindles were staged N1.
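The spindle and arousal counting rules in this paragraph can be written as a small lookup. This is a sketch of only the combinations stated in the text: the function name is hypothetical, cortical arousals and microarousals are pooled into one count as a simplification, and combinations the text does not mention return None rather than a guessed stage.

```python
def stage_from_counts(spindles, arousals, follows_awake=False):
    """Return "N1", "N2", or None for an epoch's spindle/arousal counts."""
    if spindles == 0 and (follows_awake or arousals >= 1):
        return "N1"   # no spindle: first sleep epoch after awake, or arousal
    if arousals == 0 and spindles >= 1:
        return "N2"   # at least 1 spindle and no arousal
    if arousals == 1 and spindles == 2:
        return "N2"   # 2 spindles and 1 arousal
    if arousals == 2 and spindles >= 3:
        return "N2"   # at least 3 spindles with 2 arousals
    if arousals == 2 and spindles == 2:
        return "N1"   # 2 arousals with 2 spindles
    return None       # combination not specified in the text
```

For example, `stage_from_counts(2, 1)` returns "N2" whereas `stage_from_counts(2, 2)` returns "N1", which shows how an additional arousal shifts the same spindle count toward lighter sleep.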
A number of additional rules and thresholds were developed to identify epochs that should be visually inspected. During visual inspection, these epochs were identified by the presentation of “primary” and “secondary” stage stripes (Figure 2). The greatest number of epochs assigned primary and secondary stripes were those transitioning between awake and sleep. The software enabled selection of a setting, applied to the entire record, that biased staging toward an increased classification of sleep. This awake = low setting reduced the editing needed for individuals with severe disruptions attributed to sleep-disordered breathing. The default setting applied a sequence of discriminant function analyses (awake versus N1, awake versus N2, and awake versus REM) to reassign the primary stage as awake, and provide a secondary stripe based on the original sleep stage. Epochs staged REM during the first 10 minutes after sleep onset were assigned a primary stage N1 and a secondary stage of REM. Intermittent NREM epochs within a block of REM epochs were assigned REM as the primary and NREM as the secondary stage. Intermittent REM epochs in proximity to multiple awake epochs were assigned primary and secondary stages of N1 and REM, respectively. Epochs with large phasic REM but staged NREM due to the incomplete correction for ocular activity were assigned an REM secondary stage.
Sleep onset was based on 4 consecutive sleep epochs during the initial 5 minutes of recording time, and 3 consecutive epochs after 5 minutes of recording time. All epochs were staged awake prior to sleep onset. The cortical and autonomic indexes were based on the total number of events divided by sleep time. Autonomic activation indexes were additionally computed for events that occurred during NREM and REM sleep time. Spindle and slow wave events were divided by the time staged N2 and N3.
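The sleep onset rule can be sketched on a list of 30-second epoch labels. Two points are assumptions not stated in the text: the required run length is chosen by where the run starts (within or after the first 10 epochs, ie, 5 minutes), and sleep onset is reported as the first epoch of the qualifying run. The function name and the label "W" for awake are hypothetical.

```python
EPOCHS_IN_FIRST_5_MIN = 10  # 5 minutes of 30-second epochs


def sleep_onset_epoch(stages):
    """Return the index of the first epoch of the first qualifying run of
    consecutive sleep epochs, or None if no run qualifies."""
    run_start = None
    for i, stage in enumerate(stages):
        if stage == "W":
            run_start = None          # an awake epoch breaks the run
            continue
        if run_start is None:
            run_start = i
        # 4 consecutive epochs early in the recording, 3 thereafter.
        required = 4 if run_start < EPOCHS_IN_FIRST_5_MIN else 3
        if i - run_start + 1 >= required:
            return run_start
    return None


# Hypothetical recording: a 3-epoch doze in the first 5 minutes is too short,
# so onset falls at the later 3-epoch run.
example = ["W", "W", "N1", "N1", "N1", "W", "W", "W", "W", "W",
           "W", "N1", "N2", "N2", "W"]
onset = sleep_onset_epoch(example)
```

The stricter requirement early in the recording would make it harder for a brief doze shortly after lights out to be counted as sleep onset.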
For visual inspection, the signal acquired from AF7-AF8 was labeled EEG, default scaled to ± 50 μV and presented with the ocular activity removed. The signals from AF7-Fpz and AF8-Fpz were labeled LEOG and REOG and default scaled to ± 75 μV.
One rater visually inspected the frontopolar EEG signal waveforms along with the presentations of the alpha, sigma, beta, and EMG power to confirm the veracity of the autostaging near REM transitions and when epochs were assigned secondary stripes (Figure 2). Minimal editing was made when the primary/secondary stripes were awake/N1 in the absence of snoring. The goals of the review were to: (1) adjust the beginning of the study when necessary if the device was turned on prior to lying in bed; (2) review the autoscoring to accurately identify sleep onset (the EMG power should be lower than the alpha, sigma, and beta power); (3) review epochs with gray “secondary” stripes; (4) check the start and end of all REM periods, and look for missed REM at the end of the first sleep cycle; (5) inspect the dark blue signal used for staging when it includes segments marked red (rejected signals), ie, gross differences between the LEOG and REOG indicate artifact that may cause stage N3 or REM to be incorrectly staged; and (6) adjust the end of the study if the device was inadvertently left on (signals are flat). A detailed description of the sleep staging rules and application of expert editing is provided in the training video.23
After pooling of all staged PSG epochs, the mean interscorer agreement among the 5 technologists was 75.9% overall, with 90.1%, 51.3%, 75.5%, 67.2%, and 91.1% for stages awake, N1, N2, N3, and REM, respectively. The mean kappa score across the 10 comparisons was 0.70 (range 0.61 to 0.78). Scorers 1, 4, and 5 staged over 4.5 times more N1 epochs than scorer 2, and over 2.5 times more N1 epochs than scorer 3. As compared to scorers 1, 4, and 5, scorers 2 and 3 staged over 2.5 times more N3 epochs.
When unedited default autostaged epochs were compared to each of the 5 scorers, the mean overall agreement was 71.3%, and 80.9%, 22.9%, 79.7%, 74.9%, and 71.5% for stages awake, N1, N2, N3, and REM, respectively. After expert review, the mean overall agreement improved to 73.9%, and to 85.3%, 27.9%, 80.6%, 75.3%, and 77.6% for stages awake, N1, N2, N3, and REM, respectively. The mean autostaging kappa score increased from 0.63 (range 0.62–0.65) to 0.67 (range 0.65–0.68) after visual review.
Table 2 and Table 3 compare the majority agreement in staging among the 5 scorers to the autostaging without and with the increased classification of awake time (which affected 3% of the epochs). Table 4 presents comparisons after the autostaging was expert reviewed. The primary benefit of technical review was improved sensitivity and specificity between REM and awake, and transitions between REM and N2. With each step, the overall agreements increased from 75.8% to 77.1% and 80.0%. As a result of the expert review, 3.3% of the total number of epochs were changed.
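The majority agreement comparison can be illustrated with a minimal sketch. It assumes a majority means the same stage assigned by more than half of the scorers (at least 3 of 5) and that epochs without a majority are left out of the comparison; both the function names and the example epochs are hypothetical.

```python
from collections import Counter


def majority_stage(votes):
    """Stage chosen by more than half of the scorers, else None."""
    stage, count = Counter(votes).most_common(1)[0]
    return stage if count > len(votes) / 2 else None


def agreement_with_majority(scorer_epochs, auto_epochs):
    """Return (fraction of majority epochs matched by autostaging,
    number of epochs that had a majority stage)."""
    kept = agree = 0
    for i, auto_stage in enumerate(auto_epochs):
        consensus = majority_stage([s[i] for s in scorer_epochs])
        if consensus is None:
            continue                  # no majority: epoch not compared
        kept += 1
        agree += consensus == auto_stage
    return (agree / kept if kept else None), kept


# Hypothetical staging of 4 epochs by 5 scorers and by the autostaging.
scorers = [
    ["W", "N1", "N2", "N3"],
    ["W", "N2", "N2", "N2"],
    ["W", "N1", "N2", "N3"],
    ["W", "N2", "N2", "N2"],
    ["W", "W", "N2", "N3"],
]
auto = ["W", "N1", "N2", "N2"]
fraction, n_majority = agreement_with_majority(scorers, auto)
```

In this toy example the second epoch has no majority stage, so only 3 epochs are compared and the autostaging matches 2 of them.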
In contrast to the contingency tables, which compared exact agreement of each epoch, Table 5 compares the agreement among sleep architecture biomarkers while allowing for different epochs to contribute to the method-specific percentage of staged time, and incorporating method specific differences in the staging of sleep and wake. Although minimal differences were observed across stages awake and REM, there was a threefold difference across scorers in the percent time staged N1 and N3. Differences resulting from the 3 autoscoring methods were far less than that of PSG interscorer variability.
The scatterplots in Figure 4 show a strong concordance between majority agreement and the autostaging after expert review for all stages except N1. The mean biases point to the autoscoring overreporting awake time, and underreporting stages N1 and REM, compared to majority agreement.
Figure 5 presents scatterplots displaying the night-to-night variability and between-night bias for biomarkers useful in characterizing sleep. Sleep spindles, autonomic activation index, and slow wave sleep had the most consistent patterns, whereas sleep time, sleep latency, and wake after sleep onset had the greatest night-to-night variability. The mean alpha and sigma power across stages N2 and N3 exhibited very limited night-to-night variability (intraclass correlation of 0.97 and 0.98) given the relative prominence of these frequencies during sleep spindles and slow wave sleep.
There was significantly greater sleep spindle activity, and greater sigma and beta power across stages N2/N3 in those taking antidepressants (P < .01). Patients taking antihypertensive medications exhibited significantly less stage N3, coupled with increased Light N2, and lower delta, theta, alpha, and sigma power across stages N2/N3 (P < .001). Additionally, patients taking antihypertensive medications exhibited lower autonomic activation during REM sleep (P < .03).
The stability of the sleep biomarkers was evaluated by comparing individual results to clinical thresholds used to assist with clinical interpretation (Table 6). Biomarkers with the strongest trait characteristic would be consistently normal or abnormal on nights 1 and 2 (ie, stable) despite the night-to-night variability. The biomarkers with the highest stability were stage N3/slow wave sleep, spindle activity, and autonomic activation, sleep biomarkers that have been associated with chronic disease and neurodegeneration.2,8 For measures with lower stabilities, 2-night averages were needed to determine a normal/abnormal state in more than 25% of patients. Short sleep time, based on fewer than 6 hours, had the least night-to-night stability, and there were substantial differences in the proportion of those classified as abnormal based on the 10th percentile versus fewer than 6 hours of sleep across 2-night averages (14% versus 52%, respectively). A greater number of patients were identified as long sleepers based on the 90th percentile compared to the more than 10-hour threshold; however, neither measure could be thoroughly evaluated, because the inclusion criteria required complaints of chronic insomnia. Use of prescription sleeping aids did not influence the distributions of cases classified as either normal or abnormal. Based on the stability of the majority of sleep biomarkers, a 2-night study appears necessary for accurate profiling or assessment of treatment outcomes.
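The stability concept (consistently normal or abnormal on both nights relative to a threshold) can be sketched as follows. The function name, the threshold of 10, and the example values are hypothetical and do not correspond to the reference values in Table 6.

```python
def stability_summary(night1, night2, threshold, abnormal_if_below=True):
    """Summarize how often a biomarker gives the same normal/abnormal
    classification on both nights, and how many patients are abnormal
    when the 2-night average is used instead."""
    def abnormal(value):
        return value < threshold if abnormal_if_below else value > threshold

    stable = abnormal_on_average = 0
    for first, second in zip(night1, night2):
        if abnormal(first) == abnormal(second):
            stable += 1               # same classification on both nights
        if abnormal((first + second) / 2):
            abnormal_on_average += 1
    return {
        "stable_fraction": stable / len(night1),
        "abnormal_on_average": abnormal_on_average,
    }


# Hypothetical percent time in stage N3 for 4 patients across 2 nights.
n3_night1 = [5, 12, 20, 8]
n3_night2 = [6, 9, 22, 14]
summary = stability_summary(n3_night1, n3_night2, threshold=10)
```

Patients whose classification flips between nights are the ones for whom a 2-night average would be needed to assign a normal or abnormal state.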
A framework for interpreting abnormal sleep biomarkers within the construct of a clinical evaluation or research investigation is presented in Table 7.
This study demonstrated that sleep biomarkers obtained from the multichannel frontopolar EEG recording device (Sleep Profiler) are valid, accurate, and reproducible. Three analyses were applied to assess the autostaging accuracy of frontopolar recordings compared to human-scored PSG-based sleep architecture measures: direct agreement of epoch-by-epoch staging, total staged sleep times, and proportional agreement based on the percentages of staged time. Our study supports previously published studies reporting a fairly broad range of interscorer variability.24,25 Thus, our findings also underscore the difficulty in validating the accuracy of autostaging software against the current gold standard practice of technician-based visual scoring24,26–28 or by RPSGT certification,29–31 even when the same equipment and software are used.
The interscorer variability in staging N1 and N3 observed in this study is consistent with reports from the AASM Inter-Scorer Reliability Program.25 One of the challenges of validating autostaging is overcoming the inherent scorer bias. If this study had been conducted where all of the scorers were trained to stage sleep in a manner similar to that of scorers 2 and 3, the autoscoring would have underreported stage N3 and overreported stage N1. Conversely, if the training reflected the styles of scorers 1, 4, and 5, the autoscoring would overreport stage N3 and underreport stage N1. In the absence of a true gold standard, the autoscoring achieved a level of “Goldilocks” accuracy.32
Autoscoring combines speed and consistency, with the capability to further improve results with a focused, relatively brief technical review.18,26 In this study, changes in alpha, sigma, beta, and EMG power viewed on a 10-minute screen enabled recognition of 30-second epochs that should be manually inspected due to transitions between NREM and REM or between sleep and wake. Autodetected sleep spindles and arousals were visually marked in the record, a technique shown to reduce interscorer variability.29 The presentation of secondary stripes assisted in the identification of epochs that might benefit from visual inspection, a concept similar to the editing helper feature described by Younes et al.33 A total of 3.3% of autostaged epochs were changed, with the greatest effect noted in the improved staging of REM, N1, and awake, specifically between awake and REM and transitions between REM and stage N2, consistent with a previous report.18 The Sleep Profiler provides the capability to optionally acquire submental EMG; however, the accuracy results in this study were achieved without use of this signal. Studies are underway to evaluate the benefit of including this signal for the in-home assessment of REM sleep behavior disorder, another sleep biomarker associated with neurodegeneration.34 After expert editing, the kappa score between frontopolar sleep EEG and majority agreement from PSG increased to 0.67, approximately the same as the mean kappa score from the 10 comparisons between the 5 scorers (0.70). Table 4 results suggest that accuracy may be further improved with more targeted editing during transitions between REM and N1.
Due to the challenges of human application of the N1 staging rules, a 5-fold difference in the percentage of epochs staged N1 was observed across scorers. Although there was relatively poor agreement between epochs scored by majority agreement and the autostaging, the percent times staged N1 for the 2 methods were quite similar, suggesting the discrepancies were related to the timing rather than inaccuracy in the recognition of light NREM sleep. Given the difficulty in human staging of N1 and N3, it is likely that computer-assisted scoring is required to further characterize sleep biomarkers based on the depth of NREM.35,36 For this study, a subtype of stage N2 (ie, Light N2) was used to classify epochs with K-complex or dominant theta activity with relatively elevated levels of alpha or EMG power, and absent spindle activity. The staging of Light N2 exhibited less night-to-night variability than the overall stage N2 (intraclass correlation 0.82 versus 0.77). Light N2 was highly stable across both nights and it increased significantly in those taking antihypertensive medications, whereas deeper stage N2 did not. This finding not only validated Light N2 as a unique biomarker for depth of NREM sleep, but also highlighted the benefit of subcharacterizing NREM sleep. The user option that increased autoscored sleep time was also evaluated; it resulted in improved concordance between apnea-hypopnea indexes obtained by PSG and by Sleep Profiler with cardiorespiratory signals.
Slow wave sleep has emerged as an important sleep biomarker, given its association with numerous chronic diseases and neurodegeneration.2,7,8 When these patients were stratified by condition, reduced slow wave sleep was noted in those with hypertension, but not depression, sleeping aids, or obstructive sleep apnea. It is unlikely, however, that the clinical application of this sleep biomarker can be broadened when the human scoring of standard PSG is not only time consuming and expensive, but unreliable. In this study the manual staging of N3 ranged from 7.5% to 26.5% of the pooled epochs, approximately equivalent to the 10th and 90th percentile cutoffs for healthy adults.22 For example, 2 sleep research technicians who underwent the same rigorous sleep staging training protocol as part of an academic research program still demonstrated relatively large differences in detected N3 sleep (8.1% and 12.9%, respectively). Conversely, when PSG epochs staged N3 by majority agreement were compared to the autoscored forehead EEG, there was less than a 1% difference. Because the manual editing of autoscored N3 is only recommended for epochs with artifact or arousals, the accuracies were similar for the unedited or edited results.
The benefit of multichannel frontopolar EEG is that all of the signal elements needed to visually stage sleep are present, including ocular, spindle, K-complex, slow wave, and cortical activity. The differential recordings acquired with Sleep Profiler were selected because of the simplicity of user application and the reduced likelihood of study failure across multinight studies. Additionally, differential recordings do not require removal of heartbeat artifact that can contribute to artifact-induced increases in theta power, and result in the autostaged misclassification of stage N2. The disadvantage of differential recordings is that the signal amplitudes are attenuated and thus the expected amplitudes used to visually or autostage sleep must be scaled (eg, 75 versus 60 μV for stage N3). The Sleep Profiler sensor placements enabled acquisition of frontopolar EEG that can be staged as well as used to detect ocular activity for the differentiation of stage N1 from REM. As compared to the conventional EOG sensor sites, the frontopolar EEG signals include blink activity, but not saccades.
Despite the differences in sleep time, sleep spindle duration showed the greatest night-to-night concordance, suggesting a very strong trait effect. The interscorer agreement achieved by manual sleep spindle staging of 115-second segments in one study was superior to the reliability obtained by human scoring of cortical arousals in another study.24,37 The accuracy of human sleep spindle staging under more realistic conditions, however, would be expected to decline as a result of scorer fatigue, given 5 times the number of sleep spindles were observed in our typical patient record as compared to cortical arousals (mean: 500 versus 100, maximum: 2900 versus 280, respectively). The approach used in this study to autodetect sleep spindles relied on patterns of the power spectral density, rather than extracting spindle patterns with filtering.38 It is likely that the sleep spindle length/duration measured by this power spectra approach will be less than by filtering, because of the rule that marks the spindle length, and as a result of elimination of 11–13 Hz spindles (because of the requirement for peaks in both the alpha (8–12 Hz) and sigma (12–16 Hz) power bands). The sleep spindle detection algorithms used in this study were based on thresholds selected for differential frontopolar EEG. Further research is required to compare the automated sleep spindle detection by this approach with automated routines applied to more conventional EEG spindle detection sites,38 and to evaluate the changes in this measure as it relates to age, sex, and neurodegeneration.3–5,39
A limitation of this study was the relatively dichotomous age representation of our healthy controls with very few participants in this cohort being between the ages of 40 and 60 years. Cognizant of the limitation, we excluded those older than 70 years when establishing the sleep spindle reference values. This enabled exclusion of all but one of the patients taking antihypertensive medication but may have excluded elderly women with sleep spindle activity appropriate for inclusion.40
This study introduced a highly stable sleep biomarker that combined sleep stage and cardiac autonomic tone. In the patients, the autonomic activation index (AAI), a measurement of brief but important episodic changes in pulse rate, was found to be 2 times greater during NREM sleep as compared to REM sleep, a difference similar to the NREM and REM ratio of low- to high-frequency heart rate variability (HRV).41,42 The night-to-night variability of the AAI and HRV was also similar during both NREM sleep (0.82 versus 0.90) and REM sleep (0.80 versus 0.89).41 These findings suggest that both low HRV and low AAI are measuring increased sympathetic dominance,43 a finding supported by the lower REM AAI in patients taking antihypertensive medications. Each autonomic activation event requires a 10-second detection window; thus, tallying and computing an index for REM and NREM is possible in patients with limited REM time or who suffer from highly fragmented sleep. By comparison, obtaining a valid stage-dependent HRV measure is more challenging, given the need for a 5-minute detection window. Further investigations are needed to determine whether the AAI during REM and NREM is associated with blood pressure, inflammation, and/or hypothalamic-pituitary-adrenal systems. If so, this biomarker may assist in subtyping patients with insomnia exacerbated by comorbid anxiety or depression. Longitudinal studies might benefit from the inclusion of the AAI biomarker combined with sleep spindle activity to monitor cognitive decline in old age, similar to the contributions made by measuring HRV.44
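The idea of tallying events by stage can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the Sleep Profiler algorithm: the index is assumed to be expressed as events per hour of staged time, event detection itself is not shown, and the function name, stage labels, and input layout are hypothetical.

```python
EPOCH_SEC = 30                      # 30-second staging epochs
NREM_STAGES = {"N1", "N2", "N3"}    # assumed stage labels


def _group(stage):
    """Map a stage label to 'REM', 'NREM', or None (wake or unscored)."""
    if stage == "REM":
        return "REM"
    if stage in NREM_STAGES:
        return "NREM"
    return None


def stage_indexes(stages, event_times_sec):
    """Events per hour of REM and NREM (assumed definition of an index).

    stages: one stage label per 30-second epoch, in recording order.
    event_times_sec: onset times of already-detected events, in seconds
    from the start of the recording.
    """
    epochs = {"REM": 0, "NREM": 0}
    counts = {"REM": 0, "NREM": 0}
    for stage in stages:
        group = _group(stage)
        if group:
            epochs[group] += 1
    for t in event_times_sec:
        i = int(t // EPOCH_SEC)     # epoch containing the event onset
        if 0 <= i < len(stages):
            group = _group(stages[i])
            if group:
                counts[group] += 1
    result = {}
    for group in counts:
        hours = epochs[group] * EPOCH_SEC / 3600
        # None signals that no time was staged in this group.
        result[group] = counts[group] / hours if hours else None
    return result
```

Because each event is assigned to a single epoch, an index can still be computed when REM is short or sleep is fragmented, provided some time was staged in that group. A measure that needs several uninterrupted minutes within one stage has no such fallback.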
To achieve the study objectives, a prospectively acquired dataset was used to assess accuracy, and a retrospectively acquired dataset was used to assess sleep biomarker variability and stability. This study was limited by the fact that patients with different combinations of comorbidities, medications, etc. were included. For example, patients with depressive disorders would be expected to have long REM sleep times, yet only 3% did, possibly because 35% were taking antidepressants, which suppress REM sleep time. Many of the patients reported depressive or anxiety disorders, yet few had long sleep times (ie, > 10 hours), likely as the result of overlapping use of antidepressants and sleeping aids. At the same time, the heterogeneity of our sample provides a more true-to-life representation of the relatively common complexity of comorbid conditions most sleep providers encounter in their sleep patient population.
Sleep biomarkers have been associated with a range of medical and neurodegenerative conditions and a number of medical therapies (Table 7); however, the neurobiological mechanisms underlying these associations remain poorly understood. Interpretive sleep biomarker profiling can serve as a critical step toward developing precision-based clinical protocols to improve the outcomes of patients frequently experiencing various combinations of these sleep and medical conditions. Treating patients with complex chronic medical conditions requires knowledge of the potential interplay between iatrogenic medication effects and sleep. For example, we found that antihypertensive medications were associated with increased sympathetic dominance during REM and suppressed slow wave sleep. Given the evidence suggesting learning, memory, and overall neuroprotective/health benefits of slow wave sleep, further studies are needed to examine which classes of antihypertensive medications (eg, alpha or beta blockers, angiotensin-converting-enzyme inhibitors, etc.) contribute most to the manifestation of these abnormal sleep biomarkers. Studies utilizing this biomarker approach are also needed to determine whether medications associated with enhanced slow wave sleep or sleep spindle activity (or other positive sleep biomarker features) can provide the same neuroprotective and overall health benefits as natural sleep.
This study confirms the validity of multichannel frontopolar EEG recordings for use in clinical or research applications, and suggests that autoscoring may be superior to human scoring of sleep biomarkers. Furthermore, our findings indicate that 1 night of recording with Sleep Profiler is sufficient to characterize abnormal slow wave sleep, spindle activity, and heart rate variability in patients; however, a 2-night average would improve the assessment of abnormality for the balance of sleep architecture and sleep continuity biomarkers.
Work for this study was performed at Complete Sleep Solutions, Murrieta, CA; Integrative Insomnia and Sleep Health Center, San Diego, CA; and Sleep Disorders Center of Prescott Valley, AZ. All authors have seen and approved the manuscript. Mr. Levendowski and Dr. Westbrook are employees of and shareholders in Advanced Brain Monitoring, Inc. Both would benefit financially if ownership of the Sleep Profiler intellectual property were to be sold to a third party.
AAI, autonomic activation index
AASM, American Academy of Sleep Medicine
BMI, body mass index
HRV, heart rate variability
NREM, non-rapid eye movement
REM, rapid eye movement
WASO, wake after sleep onset
8 Sleep and neurodegeneration: a critical appraisal. Chest. 2017 Jan 10 [Epub ahead of print].
15 Clinical practice guidelines for the pharmacological treatment of chronic insomnia in adults: an American Academy of Sleep Medicine clinical practice guideline. J Clin Sleep Med. 2017;13(2):307–349. PMID: 27998379.
16 FDA Drug Safety Communication: FDA warns of next-day impairment with sleep aid Lunesta (eszopiclone) and lowers recommended dosage. United States Food and Drug Administration website. http://www.fda.gov/Drugs/DrugSafety/ucm397260.htm. Published May 15, 2014. Updated February 10, 2016. Accessed March 24, 2017.
17 FDA Drug Safety Communication: Risk of next-morning impairment after use of insomnia drugs; FDA requires lower recommended doses for certain drugs containing zolpidem (Ambien, Ambien CR, Edluar, Zolpimist). United States Food and Drug Administration website. http://www.fda.gov/Drugs/DrugSafety/ucm334033.htm. Published January 10, 2013. Updated January 16, 2016. Accessed March 24, 2017.
20 for the American Academy of Sleep Medicine. The AASM Manual for Scoring of Sleep and Associated Events: Rules, Terminology and Technical Specification. Version 2.0. Darien, IL: American Academy of Sleep Medicine; 2012.
23 Sleep Profiler: Application and editing of sleep staging. YouTube website. https://www.youtube.com/watch?v=CN-6vvvXdwI&feature=youtu.be. Published December 18, 2015. Accessed March 24, 2017.
34 Diagnostic thresholds for quantitative REM sleep phasic burst duration, phasic and tonic muscle activity, and REM atonia index in REM sleep behavioral disorder with and without comorbid obstructive sleep apnea. Sleep. 2014;37(10):1649–1662. PMID: 25197816.
39 Age affects sleep microstructure more than sleep macrostructure. J Sleep Res. 2017 Jan 17 [Epub ahead of print].