To test the sleep-wake scoring reliability of a new wrist-worn sleep monitoring device.
Twenty-seven adult good sleepers underwent 1 night of polysomnography (PSG) while wearing both the new device (myCadian [MC]; CurAegis Technologies, Rochester, New York, United States) and commercially available actigraphy (Actiwatch 2 [AW]; Philips Respironics, Murrysville, Pennsylvania, United States) on their nondominant wrist. PSG tests were manually stage scored. After excluding missing data, 20 participants had full-night data on all three devices with 17,734 total 30-second epochs. Using PSG as the gold standard, pooled epoch-by-epoch agreement for sleep and wake was calculated for each device using percent agreement and Cohen kappa statistic. Positive predictive values for both sleep and wake epochs, as well as sleep continuity statistics, were calculated.
Percent agreement with PSG-scored wake and sleep was 91.3% for MC (kappa = 0.67) and 87.7% for AW (kappa = 0.50). Positive predictive values for sleep epochs were 94.4% and 90.8% for MC and AW, respectively, and 74.5% and 65.6% for wake. Both devices underestimated wake and overestimated sleep compared to PSG. Descriptively, compared to PSG, sleep latency was higher with MC and wake after sleep onset higher with AW. Total sleep time and sleep efficiency were more similar across devices.
The kappa statistic for MC is consistent with a high level of agreement with PSG. Overall, the reliability of MC compared to PSG scoring was slightly more favorable than that of AW. Findings suggest that MC provides reliable sleep-wake scoring during a nocturnal sleep period for good sleepers.
Pigeon WR, Taylor M, Bui A, Oleynk C, Walsh P, Bishop TM. Validation of the sleep-wake scoring of a new wrist-worn sleep monitoring device. J Clin Sleep Med. 2018;14(6):1057–1062.
The supply of wearable technology utilizing wrist-worn accelerometers (ie, actigraphy) to measure sleep has grown significantly in recent years. Price points vary from $100 for widely available commercial products to more than $1,000 per unit when including scoring software and/or data interface platforms for products that have historically been used primarily in research and clinical settings.
The determination of sleep and wake states by accelerometer-based wearable devices is based on movement activity recorded by the device and its associated software. The main device-specific factors that can contribute to a device's ability to accurately determine sleep from wake states include the type of accelerometer used and its technical specifications, the mode of calculating movement activity, and the algorithm used to score sleep and wake epochs. There are relative advantages and disadvantages to the approaches in each of these domains, which are thoroughly reviewed by Ancoli-Israel et al.1
The gold standard for the measurement of sleep and wake states is polysomnography (PSG).2 There is a fairly long history of validation work evaluating the capacity of actigraphy to accurately measure sleep as compared to PSG.3,4 Units that have been available for research and clinical use for many years have been well reviewed and standards exist for their use.5–7 In general, actigraphy tends to underestimate time to fall asleep, scoring early stages of wake as sleep, and is moderately reliable compared to PSG overall. Reliability of actigraphy can be diminished in sleep-disordered patients, particularly for those who have an elevated number of transitions between sleep and wake.8,9 More recently, validation work has been undertaken with many of the newer wearables available at the lower price points, which generally find that they have less precision, compared to PSG, than devices at a higher price point.10–13 Despite the growth in wearable devices that can measure sleep-wake activity, there remains room for commercially available devices that perform well when compared to PSG.
The purpose of the current study was to validate a new wrist-worn sleep-wake monitor, the myCadian watch (MC; CurAegis Technologies, Rochester, New York, United States), which has a triaxial microelectromechanical system (MEMS) accelerometer configured with a range of ± 4 g, a sampling rate of 25 Hz, and a bandwidth of 6.25 Hz. The study was designed to validate the MC against overnight PSG, the gold standard for the differentiation of wake from sleep, and to compare its performance against PSG with that of a validated sleep-wake monitor, the Actiwatch 2 (AW; Philips Respironics, Murrysville, Pennsylvania, United States).
Eligible and consenting research subjects underwent 1 full night of PSG in a University Sleep Research Laboratory (the Sleep Lab) while wearing both the MC and the AW on the wrist of their nondominant hand according to a protocol approved by the University Research Subjects Review Board and following an approved written informed consent process.
Participants were recruited from the community by study flyers and screened by phone. A total of 27 participants who met the eligibility criteria were enrolled in the study. The inclusion criteria were: (1) 18–64 years of age; (2) willingness to abstain from alcohol, nicotine, marijuana, illicit drugs, and over-the-counter products that may affect sleep for 24 hours (and caffeine for 8 hours) prior to the study visit as indicated by agreement during a phone screen and verified by self-report at the study visit; (3) ability to speak and read English as determined during screening by having subjects read aloud a paragraph of the consent form; and (4) ability to provide informed consent as determined by asking the subjects to repeat the main features of study involvement.
The exclusion criteria included: (1) a body mass index > 34 kg/m2; (2) an Insomnia Severity Index14,15 score of ≥ 10; (3) an Epworth Sleepiness Scale16 score of ≥ 10; (4) the presence or clinical suspicion of sleep apnea, narcolepsy, or circadian rhythm disorder; (5) regular shift work or any shift work in the past 4 weeks; (6) travel across more than two time zones in the past 3 weeks; (7) serious health conditions such as cardiac or blood vessel disease, respiratory conditions (eg, emphysema, chronic bronchitis), cancer, history of myocardial infarction or stroke, or any other conditions deemed by the principal investigator to be serious; (8) current pregnancy; (9) current or recent history (within 3 months) of major psychiatric disorders or drug dependency or history of schizophrenia or bipolar I disorder; (10) current use (or use in the past 3 months) of antipsychotics, mood stabilizers, sleep medications or opiate analgesic medications as determined by self-report and study questionnaires; and (11) symptoms of active illness (eg, fever) on the night of the study visit.
Demographics, medical conditions, and medications were captured on self-report questionnaires developed in the Sleep Lab for prior studies.
The Sleep Disorders Screening Questionnaire
The Sleep Disorders Screening Questionnaire is an unpublished instrument developed at the Sleep Lab as a screening tool to assess for possible sleep disorders including insomnia, sleep apnea, narcolepsy, restless legs syndrome, and circadian rhythm disorders, with questions derived from the diagnostic criteria for these disorders.
Insomnia Severity Index
The Insomnia Severity Index14 is a widely used and validated 7-item insomnia severity instrument with a summed score range of zero to 28 on which a total score ≥ 10 represents clinically meaningful insomnia.15
Epworth Sleepiness Scale
The Epworth Sleepiness Scale16 is a validated 8-item scale that assesses the propensity to fall asleep in certain situations. A summed score (zero to 24) on the instrument is widely used to assess sleepiness with a score ≥ 10 indicative of excessive daytime sleepiness.
Pittsburgh Sleep Quality Index
The Pittsburgh Sleep Quality Index17 is a 19-item self-report questionnaire that assesses sleep quality and sleep disturbances with acceptable internal homogeneity, test-retest reliability, and validity. A global score greater than 5 indicates the presence of a clinically meaningful sleep disturbance.
Sleep continuity variables were derived from each of the devices. These included: sleep latency (SL), defined as time from lights out to the first epoch of sleep; latency to persistent sleep (LPS), defined as minutes from lights out to the first 10 minutes of uninterrupted sleep; minutes of wake occurring after sleep onset (WASO); total wake time; total sleep time; and sleep efficiency, calculated by dividing total sleep time by total time between lights off and lights on.
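As an illustration only, the definitions above can be sketched in a few lines of Python. This is not the scoring software used in the study; the function name `sleep_continuity`, the dictionary keys, and the choice to count WASO through lights on (rather than through the final awakening) are assumptions made for this sketch. The input is one night of 30-second epochs coded 0 = wake and 1 = sleep, running from lights off to lights on.

```python
from typing import Dict, List, Optional

EPOCH_MIN = 0.5  # each epoch is 30 seconds


def sleep_continuity(epochs: List[int], lps_minutes: float = 10.0) -> Dict[str, Optional[float]]:
    """Derive sleep continuity variables (in minutes) from binary epochs.

    epochs: 0 = wake, 1 = sleep, covering lights off to lights on.
    """
    time_in_bed = len(epochs) * EPOCH_MIN
    total_sleep = sum(epochs) * EPOCH_MIN
    total_wake = time_in_bed - total_sleep

    # SL: time from lights off to the first epoch scored as sleep.
    first_sleep = epochs.index(1) if 1 in epochs else None
    sl = None if first_sleep is None else first_sleep * EPOCH_MIN

    # LPS: time from lights off to the start of the first run of
    # uninterrupted sleep lasting lps_minutes (20 epochs for 10 minutes).
    needed = int(lps_minutes / EPOCH_MIN)
    lps = None
    run = 0
    for i, epoch in enumerate(epochs):
        run = run + 1 if epoch == 1 else 0
        if run == needed:
            lps = (i - needed + 1) * EPOCH_MIN
            break

    # WASO: wake epochs after the first sleep epoch (assumed here to be
    # counted through lights on).
    waso = None if first_sleep is None else epochs[first_sleep:].count(0) * EPOCH_MIN

    # Sleep efficiency: total sleep time as a percentage of time in bed.
    efficiency = 100.0 * total_sleep / time_in_bed if time_in_bed else None

    return {
        "SL": sl,
        "LPS": lps,
        "WASO": waso,
        "total_wake": total_wake,
        "total_sleep": total_sleep,
        "sleep_efficiency": efficiency,
    }
```

Applying the same function to the PSG, MC, and AW epoch series for a given subject would yield directly comparable values for each device.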
Study Procedures and Assessments
All study procedures took place in the Sleep Lab, where participants arrived between 7:00 and 8:00 PM. Following the informed consent process, subjects completed study questionnaires. Eligible subjects who remained interested in participating were prepared for the overnight PSG. All subjects were recorded for approximately 7 to 8 hours on three devices simultaneously (PSG, MC, and AW), which were time-synchronized prior to the recording. Lights off was between 10:00 PM and midnight, with a minimum of 7 hours in bed and lights on occurring between 5:30 and 8:00 AM.
For PSG, electrode placement and equipment settings followed the recommendations established by the American Academy of Sleep Medicine (AASM).18 Electrodes placed on the head and face included six sites on the scalp (F3, F4, C3, C4, O1, and O2), one on either side of each eye, one reference electrode behind each ear (M1, M2), and three on the chin/jawline (with mental/submental positioning); two additional electrodes were placed on the upper torso for modified lead II electrocardiogram placement. Recordings did not include measurement of respiration or limb movement. The base sampling rate was set at 512 Hz with AASM recommended “desirable” individual channel sampling rates and filter settings with the exception that electroencephalography sampling rates were set at 512 Hz and the associated high frequency filter setting at 70 Hz. Recordings were achieved using Embla N7000 recording systems (Embla Systems Inc., Broomfield, Colorado, United States). PSG tests were visually scored in 30-second epochs according to revised scoring guidelines18 by a certified sleep technician.
For both MC and AW, data were downloaded from the unit to a computer via a USB connection.
The associated statistical software for each unit (Actiware version 6.0.2, Philips Respironics and CURA System, CurAegis Technologies, for the AW and MC respectively) transformed the downloaded data into estimates of wake and sleep time. AW has already been shown to be a reliable and valid measure to assess sleep/wake patterns.19–21 There is no manual scoring for either unit; the study used the sleep/wake scoring provided by the respective units' software for each 30 seconds of recording. For AW the scoring threshold was set to “medium.” The MC calculates activity for every epoch as the maximum acceleration of that epoch after it has been adjusted to zero mean. Sleep-wake scoring is accomplished with a proprietary algorithm, which combines a rule-based classification layer and a discriminant function analysis to predict the designation of an epoch as sleep or wake. A filed United States patent application,22 which is mostly focused on the alertness monitoring technology of the MC, describes the MC technology broadly (eg, “a bio-mathematical model is applied by the processor to the extracted coefficients and determined actigraphy data”). Nonetheless, the reader can interpret the technical descriptions therein as some indication of how the MC achieves its results.
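The activity measure described above can be illustrated with a rough sketch. The manufacturer's implementation is proprietary and is not described in further detail here, so the following rests on several assumptions: the input is a single stream of acceleration values (for example, a vector magnitude), each 30-second epoch contains 750 samples (25 Hz × 30 seconds), the epoch is made zero-mean by subtracting its own mean, and "maximum acceleration" is taken as the largest absolute deviation. The function name `epoch_activity` is hypothetical.

```python
from typing import List, Sequence

SAMPLING_HZ = 25        # sampling rate reported for the MC accelerometer
EPOCH_SECONDS = 30      # epoch length used for sleep-wake scoring
SAMPLES_PER_EPOCH = SAMPLING_HZ * EPOCH_SECONDS  # 750 samples


def epoch_activity(samples: Sequence[float],
                   samples_per_epoch: int = SAMPLES_PER_EPOCH) -> List[float]:
    """Return one activity value per complete epoch.

    Assumption: activity is the largest absolute deviation from the
    epoch mean. Trailing samples that do not fill an epoch are ignored.
    """
    activity = []
    last_start = len(samples) - samples_per_epoch
    for start in range(0, last_start + 1, samples_per_epoch):
        epoch = samples[start:start + samples_per_epoch]
        mean = sum(epoch) / len(epoch)           # remove the epoch mean
        activity.append(max(abs(x - mean) for x in epoch))
    return activity
```

The sketch covers only the activity calculation; it does not attempt to reproduce the rule-based layer or the discriminant function that assign sleep or wake to each epoch.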
Each unit was attached to the nondominant wrist; the subjects wore both units on the same wrist. The study alternated the placement of the units sequentially by subject, so that approximately half of the subjects had the MC placed closest to the hand and the other half had the AW placed closer to the hand.
Data from all three systems (PSG, MC, AW) were rendered in binary fashion (0 = wakefulness, 1 = sleep) for each 30-second epoch of recording time. Although the PSG scorer manually stage scored all stages of sleep, for analytic purposes all sleep stages were defined as sleep. Both actigraphy units provide data as sleep or wake. Missing data due to technical problems were not imputed and were excluded from the analyses. All analyses were conducted with SPSS 22.0 (IBM Corp., Armonk, New York, United States).
To determine the reliability of MC to differentiate sleep from wake with PSG as the gold standard, pooled epoch-by-epoch agreement is presented for sleep versus wake between PSG and MC by calculating percent agreement and Cohen kappa, which measures the agreement between two systems beyond what would be expected from chance alone.23 A kappa value of 0–0.2 is considered essentially no agreement, 0.2–0.4 low agreement, 0.4–0.6 moderate agreement, 0.6–0.8 high agreement, and 0.8–1.0 nearly perfect agreement.24
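For readers who wish to see how these statistics follow from paired epoch data, a minimal sketch is given below. The analyses in this study were run in SPSS; the Python function names `agreement_stats` and `kappa_label` are hypothetical, and the handling of values that fall exactly on a category boundary (assigned here to the lower category) is an assumption, because the ranges quoted above share their endpoints.

```python
from typing import Dict, Sequence


def agreement_stats(psg: Sequence[int], device: Sequence[int]) -> Dict[str, float]:
    """Epoch-by-epoch agreement of a device with PSG (0 = wake, 1 = sleep)."""
    if len(psg) != len(device) or not psg:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(psg)
    pairs = list(zip(psg, device))
    both_sleep = sum(1 for p, d in pairs if p == 1 and d == 1)
    both_wake = sum(1 for p, d in pairs if p == 0 and d == 0)
    psg_sleep_dev_wake = sum(1 for p, d in pairs if p == 1 and d == 0)
    psg_wake_dev_sleep = sum(1 for p, d in pairs if p == 0 and d == 1)

    # Observed agreement: proportion of epochs scored identically.
    observed = (both_sleep + both_wake) / n

    # Chance agreement from the marginal totals of each system.
    psg_sleep = both_sleep + psg_sleep_dev_wake
    psg_wake = both_wake + psg_wake_dev_sleep
    dev_sleep = both_sleep + psg_wake_dev_sleep
    dev_wake = both_wake + psg_sleep_dev_wake
    expected = (psg_sleep * dev_sleep + psg_wake * dev_wake) / (n * n)

    nan = float("nan")
    kappa = (observed - expected) / (1 - expected) if expected < 1 else nan
    # PPV: proportion of device-scored epochs of a state confirmed by PSG.
    ppv_sleep = both_sleep / dev_sleep if dev_sleep else nan
    ppv_wake = both_wake / dev_wake if dev_wake else nan

    return {
        "percent_agreement": 100.0 * observed,
        "kappa": kappa,
        "ppv_sleep": 100.0 * ppv_sleep,
        "ppv_wake": 100.0 * ppv_wake,
    }


def kappa_label(kappa: float) -> str:
    """Map kappa to the descriptive categories quoted in the text."""
    if kappa <= 0.2:
        return "essentially no agreement"
    if kappa <= 0.4:
        return "low"
    if kappa <= 0.6:
        return "moderate"
    if kappa <= 0.8:
        return "high"
    return "nearly perfect"
```

Pooled statistics would be obtained by concatenating the epochs of all subjects before calling `agreement_stats`, whereas per-subject statistics would be obtained by calling it once for each subject.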
To compare MC's reliability to the reliability of the AW, similar to the aforementioned analytic strategy, the pooled epoch-by-epoch agreement is presented for sleep versus wake between PSG and AW by calculating percent agreement and Cohen kappa. This can be descriptively compared to results of PSG versus MC.
As an additional comparison of reliability with PSG scoring between MC and AW, we conducted contingency analyses to determine whether one system outperformed the other at the individual subject (rather than pooled) level. Here, reliability statistics (percent agreement, positive predictive values [PPVs], and kappa) were computed for each individual participant. The percentage of subjects for which one system outperformed the other was calculated for each measure and chi square analyses were performed.
Descriptive statistics are provided for sleep continuity variables. Comparisons between PSG and MC and between PSG and AW on each of these variables were made with t tests.
There were no study withdrawals and no adverse events. All missing data were due to technical problems; none were due to subject or technician error. Three subjects had missing AW data. Two subjects had missing MC data. In addition, two subjects had partial night MC data. As a result, a total of 20 subjects had complete night data on PSG, MC, and AW with a total of 17,734 scored epochs. Subjects with missing data did not differ from the rest of the sample in terms of age, sex, or mean scores/times on either the sleep scales or the PSG variables. A total of 23 subjects had complete night data on PSG and MC with a total of 20,396 scored epochs. Subject characteristics of the entire sample (and the n = 20 subsample) are provided in Table 1.
Table 2 presents the results of interrater reliability of MC and of AW compared to the PSG scored epochs of wake and sleep for the 20 subjects with complete data. MC had a higher percentage of scoring agreement with PSG than did AW, higher PPVs for sleep epochs and wake epochs, and a larger kappa statistic. Because there were three additional subjects with complete data for both PSG and MC who did not have actigraphy data, comparison of the complete MC versus PSG sample was also undertaken with results displayed in the table.
Comparison of sleep-wake scoring to PSG scored epochs.
Contingency analyses revealed that MC outperformed AW on percent agreement with PSG, PPV of PSG-scored wake and kappa statistic in 90% of subjects (χ²(1) = 12.8, P < .001) and for 65% of subjects on the PPV of PSG-scored sleep (χ²(1) = 1.8, P = .180).
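The reported test statistics can be checked with a short calculation. The sketch below assumes that the analysis was a one-sample chi-square test (1 degree of freedom) of the observed split against an even 50/50 split, and that 90% and 65% of the 20 subjects correspond to 18 and 13 subjects, respectively. Under these assumptions the calculation gives statistics of 12.8 and 1.8, matching the values reported above. The function name `chi_square_even_split` is hypothetical.

```python
import math
from typing import Tuple


def chi_square_even_split(count_a: int, count_b: int) -> Tuple[float, float]:
    """Chi-square test (df = 1) of two counts against an even split.

    Returns the test statistic and its P value.
    """
    expected = (count_a + count_b) / 2
    statistic = ((count_a - expected) ** 2 + (count_b - expected) ** 2) / expected
    # With 1 degree of freedom, the upper-tail probability of the
    # chi-square distribution equals erfc(sqrt(statistic / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


# MC better than AW in 18 of 20 subjects (90%), and in 13 of 20 (65%).
stat_90, p_90 = chi_square_even_split(18, 2)
stat_65, p_65 = chi_square_even_split(13, 7)
```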
For all subjects with complete night data on all three devices (n = 20), mean values for sleep continuity variables are presented in Table 3. For all variables, the mean difference between the PSG scored variable and the MC scored variable was descriptively smaller than the mean difference between the PSG scored variable and the AW scored variable. These differences, however, were not statistically significant. In addition, the mean values of the sleep continuity variables as calculated by MC in the n = 23 subsample did not differ appreciably from those observed in the n = 20 sample.
Sleep continuity variables as scored by PSG, MC and AW in subjects with complete night data (n = 20).
The overall reliability of sleep-wake scoring of MC compared to the PSG gold standard scoring was similar to, or slightly more favorable than, the reliability of the AW scoring compared to PSG scoring with respect to percentage of agreement and PPV. The 0.67 Cohen kappa statistic for MC sleep-wake scoring was consistent with a high level of agreement with PSG and was somewhat higher than the kappa statistic for AW sleep-wake scoring (0.50), which was consistent with a moderate level of agreement with PSG scoring.
Both MC and AW tended to underscore wake epochs and overscore sleep epochs compared to PSG scoring. This is consistent with actigraphy reliability studies in the extant literature, although in this study the pattern was slightly more pronounced in the AW scoring. A different pattern emerged when assessing SL and WASO. Here, MC tended to have an elevated SL (but not LPS) compared to PSG, whereas AW had a dramatically shorter SL (and slightly shorter LPS). The opposite was true of WASO, with MC having less scored WASO than PSG and the AW more scored WASO. Overall, mean values of the sleep continuity variables calculated by MC in the n = 23 sample of subjects with full-night PSG and MC data were not substantially different from those calculated by PSG scoring.
One possibility for the apparent favorability of MC to AW scoring reliability in this study is that the AW scoring may have underperformed due to the settings used or some other unknown factor. As a check against this possibility, the AW findings in this study were descriptively compared to other published studies in which actigraphy was compared to PSG. AW findings in this study are comparable to published findings of the AW's performance compared to PSG scoring. In particular, Shambroom et al.25 reported that AW had 87.6% agreement in sleep-wake scoring to PSG (compared to 87.7% in the current study), PPV for sleep of 91.2% (compared to 90.8%), and PPV for wake of 53.4% (compared to 65.6%). This suggests that AW scoring in the current study is consistent with published performance of the device compared to PSG.
Although the PSG scorer was blinded to MC and AW scoring, it is possible that the study results are in part due to some aspect of PSG scoring that favored the MC scoring. Scoring of the PSG by a second scorer would more fully rule out this possibility, but this was not undertaken. Given that the reliability of AW scoring in this study, as noted previously, is similar to or more favorable than the reliability achieved in published AW studies, any bias favoring MC in the current study would likely be minimal. In addition, although the study sample included good sleepers, the mean PSG-measured SL was 45 minutes, which is high for good sleepers. Given that no participants had self-reported insomnia or sleep disturbance on our sleep instruments, the most likely explanation is that some participants experienced a first-night effect (ie, good sleepers tend to sleep more poorly in a laboratory environment than at home on an initial night in the laboratory).26 This would not, however, have had any effect on our outcomes. Technical issues with MC units did result in data loss in 4 of 27 subjects (15%), which seems high when taken at face value, but nonetheless was similar to data loss in the AW units (11%) as well as to rates observed in other actigraphy validation studies.10
Finally, an important limitation of the current study is that it was conducted in a sample of generally good sleepers who had no significant symptoms or complaints suggestive of a sleep disorder. The results are also based on 1 night of laboratory testing and not on sleep in the home environment. The results cannot be generalized, therefore, to patients with sleep disorders. The results do suggest that for healthy, good-sleeping adults the MC provides reliable scoring of sleep and wake during a nocturnal sleep period in a laboratory setting.
Validation work in sleep-disordered populations will be important to establish the benefits and drawbacks of the MC device in comparison to widely available consumer wearables and to more traditional actigraphy units (and their updated versions, which may also outperform the AW). One consideration, of course, is price. The anticipated price of the MC device will be approximately $250 (according to personal communication from the manufacturer), a price point above the $100–$200 range of the newer, widely available consumer wearables and somewhat below the higher price of approximately $1,000 for devices similar to the AW that have been used for research and clinical work. It will be interesting to observe how the price landscape may shift as a result of the expanding universe of wearables and with the increased features, capacities, and performance of consumer wearables and updated/new models of existing actigraphy devices.
Although the study was not supported by or conducted in the United States Department of Veterans Affairs (VA), authors Pigeon and Bishop are VA employees. The authors' views or opinions do not necessarily represent those of the Department of Veterans Affairs or the United States Government. There is a pending patent for the myCadian technology, but it is related to its alertness prediction system and does not describe the sleep-wake scoring algorithm. The current study did not evaluate the alertness prediction system and the authors make no representation about the manufacturers' claims in this regard. Work for this study was performed at the University of Rochester Medical Center. All authors have seen and approved the manuscript. CurAegis Technologies provided the test device and the comparison actigraphy units to the authors free of charge for the duration of the study. Dr. Pigeon has provided fee-based consultation to CurAegis Technologies, which was disclosed in the approved Institutional Review Board study protocol and the informed consent document. The other authors report no conflicts of interest.
The Emerging Technologies section focuses on new tools and techniques of potential utility in the diagnosis and management of any and all sleep disorders. The technologies may not yet be marketed, and indeed may only exist in prototype form. Some preliminary evidence of efficacy must be available, which can consist of small pilot studies or even data from animal studies, but definitive evidence of efficacy will not be required, and the submissions will be reviewed according to this standard. The intent is to alert readers of Journal of Clinical Sleep Medicine of promising technology that is in early stages of development. With this information, the reader may wish to (1) contact the author(s) in order to offer assistance in more definitive studies of the technology; (2) use the ideas underlying the technology to develop novel approaches of their own (with due respect for any patent issues); and (3) focus on subsequent publications involving the technology in order to determine when and if it is suitable for application to their own clinical practice. The Journal of Clinical Sleep Medicine and the American Academy of Sleep Medicine expressly do not endorse or represent that any of the technology described in the Emerging Technologies section has proven efficacy or effectiveness in the treatment of human disease, nor that any required regulatory approval has been obtained.
AASM, American Academy of Sleep Medicine
LPS, latency to persistent sleep
PPV, positive predictive value
WASO, wake after sleep onset
The authors express their thanks to sleep technicians Colin Gorman and Caitlin Casey.
Ancoli-Israel S, Martin JL, Blackwell T, et al. The SBSM guide to actigraphy monitoring: clinical and research applications. Behav Sleep Med. 2015;13 sup1:S4–S38. [PubMed]
Silber MH, Ancoli-Israel S, Bonnet MH, et al. The visual scoring of sleep in adults. J Clin Sleep Med. 2007;3(2):121–131. [PubMed]
Mullaney DJ, Kripke DF, Messin S. Wrist-actigraphic estimation of sleep time. Sleep. 1980;3(1):83–92. [PubMed]
Hauri P, Wisbey J. Wrist actigraphy in insomnia. Sleep. 1992;15(4):293–300. [PubMed]
Sadeh A, Acebo C. The role of actigraphy in sleep medicine. Sleep Med Rev. 2002;6(2):113–124. [PubMed]
Ancoli-Israel S, Cole R, Alessi C, Chambers M, Moorcroft W, Pollack C. The role of actigraphy in the study of sleep and circadian rhythms. Sleep. 2003;26(3):342–359. [PubMed]
Morgenthaler T, Alessi C, Friedman L, et al. Practice parameters for the use of actigraphy in the assessment of sleep and sleep disorders: an update for 2007. Sleep. 2007;30(4):519–529. [PubMed]
Kushida CA, Chang A, Gadkary C, Guilleminault C, Carrillo O, Dement WC. Comparison of actigraphic, polysomnographic, and subjective assessment of sleep parameters in sleep-disordered patients. Sleep Med. 2001;2(5):389–396. [PubMed]
Lichstein KL, Stone KC, Donaldson J, et al. Actigraphy validation with insomnia. Sleep. 2006;29(2):232–239. [PubMed]
Meltzer LJ, Hiruma LS, Avis K, Montgomery-Downs H, Valentin J. Comparison of a commercial accelerometer with polysomnography and actigraphy in children and adolescents. Sleep. 2015;38(8):1323–1330. [PubMed Central][PubMed]
Montgomery-Downs HE, Insana SP, Bond JA. Movement toward a novel activity monitoring device. Sleep Breath. 2012;16(3):913–917. [PubMed]
Cook JD, Prairie ML, Plante DT. Utility of the Fitbit Flex to evaluate sleep in major depressive disorder: A comparison against polysomnography and wrist-worn actigraphy. J Affect Disord. 2017;217:299–305. [PubMed]
Evenson KR, Goto MM, Furberg RD. Systematic review of the validity and reliability of consumer-wearable activity trackers. Int J Behav Nutr Phys Act. 2015;12:159. [PubMed Central][PubMed]
Morin CM. Insomnia: Psychological Assessment and Management. New York, NY: Guilford Press; 1993.
Bastien C, Vallieres C, Morin CM. Validation of the Insomnia Severity Index as an outcome measure for Insomnia research. Sleep Med. 2001;2(4):297–307. [PubMed]
Johns MW. A new method for measuring daytime sleepiness: the Epworth sleepiness scale. Sleep. 1991;14(6):540–545. [PubMed]
Buysse DJ, Reynolds CF 3rd, Monk TH, Berman SR, Kupfer DJ. The Pittsburgh Sleep Quality Index: a new instrument for psychiatric practice and research. Psychiatry Res. 1989;28(2):193–213. [PubMed]
Iber C, Ancoli-Israel S, Chesson AL Jr, Quan SF; for the American Academy of Sleep Medicine. The AASM Manual for the Scoring of Sleep and Associated Events: Rules, Terminology and Technical Specifications. 1st ed. Westchester, IL: American Academy of Sleep Medicine; 2007.
Kushida CA, Chang A, Gadkary C, Guilleminault C, Carrillo O, Dement WC. Comparison of actigraphic, polysomnographic, and subjective assessment of sleep parameters in sleep disordered patients. Sleep Med. 2001;2(5):389–396. [PubMed]
Lichstein KL, Stone KC, Donaldson J, et al. Actigraphy validation with insomnia. Sleep. 2006;29(2):232–239. [PubMed]
Edinger JD, Means MK, Stechuchak KM, Olsen MK. A pilot study of inexpensive sleep-assessment devices. Behav Sleep Med. 2004;2(1):41–49. [PubMed]
Kenyon M, Payne-Rogers C, Jones J; CurAegis Technologies Inc., assignee. Alertness prediction system and method. US patent application 20170238868A1. February 17, 2017.
Cohen J. Coefficient of agreement for nominal scales. Educ Psychol Meas. 1960;20(1):37–46.
Landis JR, Koch GG. The measurement of observer agreement for categorical data. Biometrics. 1977;33(1):159–174. [PubMed]
Shambroom JR, Fábregas SE, Johnstone J. Validation of an automated wireless system to monitor sleep in healthy adults. J Sleep Res. 2012;21(2):221–230. [PubMed]
Agnew HW Jr, Webb WB, Williams RL. The first night effect: an EEG study of sleep. Psychophysiology. 1966;2:263–266. [PubMed]