Abstract
Background: Greater awareness of sleep-disordered breathing and rising obesity rates have fueled demand for sleep studies. Sleep testing using level 3 portable devices may expedite diagnosis and reduce the costs associated with level 1 in-laboratory polysomnography. We sought to assess the diagnostic accuracy of level 3 testing compared with level 1 testing and to identify the appropriate patient population for each test.
Methods: We conducted a systematic review and meta-analysis of comparative studies of level 3 versus level 1 sleep tests in adults with suspected sleep-disordered breathing. We searched 3 research databases and grey literature sources for studies that reported on diagnostic accuracy parameters or disease management after diagnosis. Two reviewers screened the search results, selected potentially relevant studies and extracted data. We used a bivariate mixed-effects binary regression model to estimate summary diagnostic accuracy parameters.
Results: We included 59 studies involving a total of 5026 evaluable patients (mostly patients suspected of having obstructive sleep apnea). Of these, 19 studies were included in the meta-analysis. The estimated area under the receiver operating characteristic curve was high, ranging between 0.85 and 0.99 across different levels of disease severity. Summary sensitivity ranged between 0.79 and 0.97, and summary specificity ranged between 0.60 and 0.93 across different apnea–hypopnea cut-offs. We saw no significant difference in the clinical management parameters between patients who underwent either test to receive their diagnosis.
Interpretation: Level 3 portable devices showed good diagnostic performance compared with level 1 sleep tests in adult patients with a high pretest probability of moderate to severe obstructive sleep apnea and no unstable comorbidities. For patients suspected of having other types of sleep-disordered breathing or sleep disorders not related to breathing, level 1 testing remains the reference standard.
Undiagnosed sleep-disordered breathing places a substantial burden on patients, families, health care systems and society.1 Sleep fragmentation and recurrent hypoxemia cause daytime sleepiness and impaired concentration, which increase the risk of motor vehicle collisions and occupational accidents.2–7 In addition, sleep-disordered breathing is associated with hypertension, stroke, cardiovascular disease, obesity and type 2 diabetes,8–12 all of which involve greater use of health care resources.13–17
Obstructive sleep apnea is the most common type of sleep-disordered breathing. Narrowing of the upper airway during inspiration results in episodes of apnea (breathing cessation for at least 10 seconds), hypopnea (reduced airflow), oxygen desaturation and arousal from sleep due to respiratory effort.18 Clinical signs and symptoms include snoring, reports of nocturnal apnea, gasping or choking witnessed by a partner, daytime sleepiness, morning headaches and inability to concentrate. Patients with obesity or cardiovascular disease are at increased risk.19
The severity of obstructive sleep apnea is usually graded using the apnea–hypopnea index (the mean number of apneas and hypopneas per hour of sleep) as follows: mild (5–14), moderate (15–29) and severe (≥ 30).18,20
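The grading above can be sketched as a small function. This is a minimal illustration, not part of the review's methods: the function name, the "normal" label for values below 5 and the treatment of non-integer values (half-open intervals, so that 14.5 counts as mild) are assumptions.

```python
def classify_ahi(ahi: float) -> str:
    """Grade severity from the apnea-hypopnea index (events per hour of sleep).

    Thresholds follow the grading in the text: mild 5-14, moderate 15-29,
    severe >= 30. The "normal" label below 5 is an assumption.
    """
    if ahi < 0:
        raise ValueError("apnea-hypopnea index cannot be negative")
    if ahi < 5:
        return "normal"
    if ahi < 15:
        return "mild"
    if ahi < 30:
        return "moderate"
    return "severe"


if __name__ == "__main__":
    for value in (3, 12, 22, 41):
        print(value, classify_ahi(value))
```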
Other, less common types of sleep-disordered breathing include upper airway resistance syndrome, obesity hypoventilation syndrome, central sleep apnea, and nocturnal hypoventilation/hypoxemia secondary to cardiopulmonary or neuromuscular disease. It is not uncommon for patients to have more than 1 type of sleep-disordered breathing.
Estimates of the prevalence of sleep-disordered breathing vary depending on the population (e.g., by sex, age and comorbidities).21 According to the Wisconsin Sleep Cohort Study, the prevalence among American adults (aged 30–60 yr) is 24% for men and 9% for women.1 A Canadian survey found a self-reported prevalence of sleep apnea of 3% among adults older than 18 years and 5% among those older than 45 years.22 As the population ages and rates of obesity increase, the prevalence of sleep-disordered breathing is climbing.1,19,23,24 Given its clinical implications, accurate diagnosis and treatment of the condition are critical.
Level 1 sleep testing, or polysomnography, requires an overnight stay in a sleep laboratory with a technician in attendance. It captures a minimum of 7 channels of data (but typically ≥ 16), including respiratory, cardiovascular and neurologic parameters, to produce a comprehensive picture of sleep architecture. Level 1 is considered the reference standard for diagnosing all types of sleep-disordered breathing and sleep disorders.19,25–27 However, limited facilities and the growing demand for sleep studies have resulted in long wait times.28 Level 2 sleep testing uses level 1 equipment, but is performed without a technician in attendance.
Level 3 testing uses portable monitors that allow sleep studies to be done at the patient’s home or elsewhere. This option was introduced as a more accessible and less expensive alternative to in-laboratory polysomnography. Level 3 devices record at least 3 channels of data (e.g., oximetry, airflow, respiratory effort). Unlike level 1, level 3 testing cannot measure the duration of sleep, the number of arousals or sleep stages, nor can it detect nonrespiratory sleep disorders.27,29 Level 4 devices are also portable, but they capture less data — usually only 1 or 2 channels.27,30
We conducted a systematic review and meta-analysis to compare the diagnostic accuracy of the widely used level 3 portable monitors to in-laboratory polysomnography, and to determine the subpopulations of patients whose conditions might be most appropriately diagnosed with each test.
Methods
Literature search
We performed a comprehensive literature search of PubMed (MEDLINE and non-MEDLINE sources), the Cochrane Library and Embase for studies that compared level 3 to level 1 tests for the diagnosis of sleep-disordered breathing in adults (Appendix 1, available at www.cmaj.ca/lookup/suppl/doi:10.1503/cmaj.130952/-/DC1). We limited our search to English-language studies from 2007 to March 2012, with monthly updates from PubMed until March 2013. We also included studies from a previous systematic review prepared by our research unit, which covered the literature from 2004 to 2009. Consequently, this review covers the literature from 2004 to March 2013. We determined our date limit based on several previous Canadian and American assessments that examined the earlier literature.31–40
Study selection
Two reviewers independently screened titles and abstracts to identify possible studies for inclusion. All studies comparing level 3 with level 1 sleep tests involving adults were included if they reported on either diagnostic accuracy parameters or management after testing (Appendix 2, available at www.cmaj.ca/lookup/suppl/doi:10.1503/cmaj.130952/-/DC1). We followed PICOS (Patients, Intervention, Comparator, Outcomes and Study design) criteria to include or exclude studies. We assessed reviewer agreement using the κ statistic.
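For readers unfamiliar with the κ statistic, the sketch below shows how agreement between 2 reviewers on include/exclude decisions could be computed. It assumes Cohen's κ for 2 raters and a binary decision; the text does not specify the variant used, and the function name and the counts in the usage example are hypothetical.

```python
def cohen_kappa(both_include: int, only_a: int, only_b: int, both_exclude: int) -> float:
    """Cohen's kappa for two reviewers making include/exclude decisions.

    both_include / both_exclude: records on which the reviewers agreed.
    only_a / only_b: records included by reviewer A only or reviewer B only.
    """
    n = both_include + only_a + only_b + both_exclude
    if n == 0:
        raise ValueError("no screened records")
    observed = (both_include + both_exclude) / n
    a_include = (both_include + only_a) / n
    b_include = (both_include + only_b) / n
    # Agreement expected by chance, given each reviewer's inclusion rate.
    expected = a_include * b_include + (1 - a_include) * (1 - b_include)
    if expected == 1:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (observed - expected) / (1 - expected)


if __name__ == "__main__":
    # Hypothetical screening counts, for illustration only.
    print(round(cohen_kappa(20, 5, 10, 15), 2))
```

Values near 1 indicate agreement well beyond chance; values near 0 indicate agreement no better than chance.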
Data extraction
Two reviewers independently extracted data from included studies using a standard form. Our diagnostic accuracy parameters were sensitivity, specificity, area under the receiver operating characteristic (ROC) curve, and positive and negative likelihood ratios. We extracted safety data and technical failures from all of the studies that reported these parameters. Our clinical management parameters were acceptance of continuous positive airway pressure treatment, treatment adherence, mechanical estimates of residual apnea–hypopnea index, mean machine pressure difference between patients whose diagnoses were made with the 2 different tests, quality of life and functional status as measured by clinical sleepiness questionnaires (usually the Epworth Sleepiness Scale).
Disagreements were discussed and resolved between the reviewers. No third-party adjudication was needed.
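The diagnostic accuracy parameters listed above all derive from a 2 × 2 table that cross-classifies level 3 results against level 1 results. The following is a minimal sketch of those definitions; the function name and the example counts are hypothetical and are not taken from any included study.

```python
import math


def diagnostic_accuracy(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Sensitivity, specificity and likelihood ratios from a 2 x 2 table.

    tp, fn: reference-positive patients whom the index test classified
            as positive or negative, respectively.
    fp, tn: reference-negative patients whom the index test classified
            as positive or negative, respectively.
    """
    diseased = tp + fn
    healthy = fp + tn
    if diseased == 0 or healthy == 0:
        raise ValueError("need at least one patient with and one without the condition")
    sensitivity = tp / diseased
    specificity = tn / healthy
    false_positive_rate = fp / healthy
    false_negative_rate = fn / diseased
    # Likelihood ratios are unbounded when the denominator is zero.
    lr_positive = sensitivity / false_positive_rate if false_positive_rate > 0 else math.inf
    lr_negative = false_negative_rate / specificity if specificity > 0 else math.inf
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "lr_positive": lr_positive,
        "lr_negative": lr_negative,
    }


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    print(diagnostic_accuracy(tp=90, fp=20, fn=10, tn=80))
```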
Quality assessment
We used the Quality Assessment of Diagnostic Accuracy Studies-2 (QUADAS-2) tool, which assesses bias (internal validity) and applicability (external validity) in multiple domains: flow and timing, reference-standard test, index test and patient selection.41,42
Statistical analysis
We pooled patient characteristics (age, body mass index [BMI] and score on the Epworth Sleepiness Scale) to obtain weighted averages. We extracted and grouped comorbidities. We presented technical failures and safety data as frequencies and proportions.
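As an illustration of pooling by weighted average, the sketch below weights each study's mean by its number of patients. The weighting scheme is an assumption (the text does not state the weights used), and the function name and example values are hypothetical.

```python
def weighted_mean(study_means, study_sizes):
    """Pool study-level means, weighting each by its number of patients."""
    if len(study_means) != len(study_sizes):
        raise ValueError("one sample size is needed per study mean")
    total = sum(study_sizes)
    if total <= 0:
        raise ValueError("total sample size must be positive")
    return sum(m * n for m, n in zip(study_means, study_sizes)) / total


if __name__ == "__main__":
    # Hypothetical mean ages from two studies of 100 and 300 patients.
    print(weighted_mean([50.0, 60.0], [100, 300]))  # prints 57.5
```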
Because studies reported level 3 test performance at different apnea–hypopnea index severity levels, we examined diagnostic accuracy parameters in all studies to determine the overall ranges. We examined patterns at different severity levels in studies that reported multiple index cut-offs.
We performed a meta-analysis using a bivariate mixed-effects binary regression model. The model estimates the amount of between-study variation in sensitivity and specificity, as well as the degree of correlation between sensitivity and specificity through random effects, and uses the logit sensitivity and specificity to draw the summary ROC curves. This model requires the numbers of true-positive, false-positive, true-negative and false-negative results from each study. We included studies if they reported the parameters required for the model. If such parameters were not reported, we calculated them from the data provided, where possible. We estimated summary diagnostic accuracy parameters.43–45 We assessed overall heterogeneity using the Q statistic. When heterogeneity was significant, we quantified it using the I² statistic. We estimated the summary ROC curves at different apnea–hypopnea severity levels. We performed all analyses using Stata SE version 12.
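The heterogeneity statistics named above can be illustrated with a simplified calculation. The sketch below is not the bivariate model fitted in Stata: it assumes a univariate, inverse-variance analysis of a single logit-transformed proportion (e.g., sensitivity), with a 0.5 continuity correction for zero cells. All function names are hypothetical.

```python
import math


def logit_and_variance(events: float, total: float, correction: float = 0.5):
    """Logit of a proportion and its approximate variance.

    For sensitivity, events = true positives and total = true positives
    plus false negatives. A continuity correction (an assumption here)
    is added to both cells when either is zero.
    """
    non_events = total - events
    if events == 0 or non_events == 0:
        events += correction
        non_events += correction
    return math.log(events / non_events), 1 / events + 1 / non_events


def cochran_q_and_i2(estimates, variances):
    """Cochran's Q and I-squared (%) from study estimates and their variances."""
    if len(estimates) != len(variances) or len(estimates) < 2:
        raise ValueError("need matching estimates and variances from at least 2 studies")
    weights = [1 / v for v in variances]
    pooled = sum(w * e for w, e in zip(weights, estimates)) / sum(weights)
    q = sum(w * (e - pooled) ** 2 for w, e in zip(weights, estimates))
    df = len(estimates) - 1
    # I-squared is the share of variation beyond that expected by chance.
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return q, i2
```

Under this simplification, Q is compared with a χ² distribution on k − 1 degrees of freedom (k studies), and I² expresses the excess over those degrees of freedom as a percentage of Q.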
We conducted a subgroup sensitivity analysis to identify changes in diagnostic accuracy when studies that included only patients with comorbidities were removed from the analysis.
Results
We included 59 comparative studies (15 abstracts, 44 full-text articles) involving 5044 patients (5026 of whom were evaluable) in our analysis (Figure 1). Agreement between reviewers was high (κ = 0.86).
We classified the included studies as “combination” studies (10 studies involving 572 evaluable patients, in which the patients underwent simultaneous in-laboratory level 3 and level 1 tests, followed by an at-home level 3 test), “simultaneous” studies (20 studies involving 1152 evaluable patients, in which the patients underwent simultaneous in-laboratory level 3 and level 1 tests) and “separate” studies (29 studies involving 3302 evaluable patients, in which an at-home level 3 test and an in-laboratory level 1 test were conducted, either with the same patients or on 2 different arms) (Table 1).46–104
Patient characteristics
The included studies recruited patients with suspected obstructive sleep apnea (Appendix 3, available at www.cmaj.ca/lookup/suppl/doi:10.1503/cmaj.130952/-/DC1). Patients were referred for sleep testing after a pretest assessment that included sleep questionnaires, history and clinical examination.
When we pooled participant characteristics from all studies, patients had a mean age of 50.8 years, a mean score of 11.6 on the Epworth Sleepiness Scale and a mean BMI of 30.4. The ratio of male to female patients was 2.9 to 1. A total of 1382 comorbidities were reported, with cardiovascular conditions the most common (1080 patients, 78.1% of total comorbidities). Hypertension was the most frequently reported cardiovascular condition (574 patients), followed by stable chronic heart disease (142 patients) and coronary artery disease (113 patients). Respiratory comorbidities were limited to a single patient with asthma and 9 patients with chronic obstructive pulmonary disease (0.7% of total comorbidities).
Study characteristics
The 4 channels measured in all of the studies were nasal airflow, thoracoabdominal movement, oxygen saturation and body position.
Two studies reported adverse events with in-laboratory level 3 tests (1 hypertensive crisis, 1 pacemaker interference).46,52 One study reported sensor irritation in 27 patients.46
Technical failures affected 0.44% of patients who underwent level 1 tests, 1.30% of patients who underwent in-laboratory level 3 tests and 10.25% of patients who underwent level 3 tests at home (Appendix 4, available at www.cmaj.ca/lookup/suppl/doi:10.1503/cmaj.130952/-/DC1).
Diagnostic accuracy of sleep tests
Among all included studies, the area under the ROC curves for at-home (6 studies) and in-laboratory (7 studies) testing showed values of 0.90 or greater at all apnea–hypopnea index cutoffs, with the exception of 2 studies that reported values of 0.79 and 0.86 at an apnea–hypopnea index of moderate or severe (≥ 15 events/h) at home, and 2 studies that reported values ranging from 0.87 to 0.89 at moderate or severe cut-offs (≥ 15, ≥ 20 and ≥ 30 events/h) in laboratory (Appendix 5, available at www.cmaj.ca/lookup/suppl/doi:10.1503/cmaj.130952/-/DC1).
In studies reporting multiple cut-offs, with increasing disease severity, 7 of 10 at-home studies showed a decline in sensitivity and an increase in specificity, and 2 of the studies showed an increase in area under the ROC curve.46–48,52,55,81,88,89,98,104 In addition, 7 of 14 in-laboratory studies showed a decline in sensitivity and an increase in specificity, and 2 studies showed an increase in area under the ROC curve.47,48,51,52,58,61–63,65–67,69–71
We found no significant difference in baseline characteristics between the 2 groups of patients in all 8 studies that reported disease management after the diagnosis by either test. None of the studies found significant differences in disease management parameters.77–79,92,93,99,102,103
In most of the studies, patients underwent both level 1 and level 3 tests to avoid the risk of internal bias due to differences between study groups. In all of the simultaneous studies, level 3 tests were scored manually by the same technician who scored the level 1 test, which may have resulted in observer bias. In contrast, most of the studies reported blinding the interpreters of level 3 tests to the level 1 test results, mitigating the risk of observer bias.
Most of the studies adequately described the tests, number of patients, recruitment methods and dropouts. Fifteen studies (only available as abstracts) had incomplete reporting of 1 or more elements (Table 2).
Most studies recruited patients suspected of having simple obstructive sleep apnea without comorbidities or with stable cardiovascular comorbidities. None of the studies included patients with other forms of sleep-disordered breathing (Table 2).
Results of the meta-analysis
We identified 19 studies reporting the parameters needed for our meta-analysis (Table 3). Among these studies, we found moderate to high heterogeneity at a mild apnea–hypopnea index cut-off in laboratory (≥ 5 events/h) and at home (≥ 10 events/h), and at a moderate cut-off for both settings (≥ 15 events/h) (I² 53%–85%).105 Overall, diagnostic accuracy improved as disease severity increased (Figures 2 and 3).
Sensitivity analysis
When we removed the 3 studies that recruited only patients with comorbidities from the meta-analysis, the results of in-laboratory sleep testing remained unchanged, because the excluded studies had only been done at the patients’ homes. Sensitivity in the at-home setting showed a slight improvement, ranging from 1% to 3% at all apnea–hypopnea index cut-offs, with the exception of 10 or more events per hour (where sensitivity decreased from 83% to 81%). Specificity improved by 2% and 3% at cut-offs of 5 or more and 10 or more events per hour, respectively, but remained unchanged at cut-offs of 15 or more and 30 or more events per hour. The area under the ROC curve improved slightly (1%) at all cut-offs other than 10 or more events per hour.
Interpretation
Level 3 portable devices scored well for sensitivity (the ability of a test to correctly identify those who have the disease) and specificity (the ability of a test to correctly identify those who do not have the disease), with a trade-off of increasing specificity and decreasing sensitivity as disease severity increased. The areas under the ROC curves (a measure that combines sensitivity and specificity to show the overall discriminatory power of the test, with a value of 1 indicating perfect discrimination) confirmed the performance of level 3 devices. The performance of level 3 devices was better in the laboratory than at home, where the devices had a high technical failure rate. Bruyneel and colleagues reported similar rates of technical failure in their study comparing level 1 in-laboratory sleep studies with unattended level 1 at-home sleep studies (the latter is considered level 2 testing). Because the unattended studies used full polysomnography equipment, this finding suggests that the failures occurred because a sleep technician was not in attendance.106
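To make the area under the ROC curve concrete, the sketch below computes an empirical area by the trapezoidal rule from (1 − specificity, sensitivity) pairs observed at different cut-offs. This is an illustrative simplification, not the summary ROC estimation from the bivariate model used in this review; the function name and the example operating point are hypothetical.

```python
def trapezoidal_auc(points):
    """Empirical area under the ROC curve by the trapezoidal rule.

    points: iterable of (false_positive_rate, sensitivity) pairs, where
    false_positive_rate = 1 - specificity. The corner points (0, 0) and
    (1, 1) are added automatically.
    """
    for fpr, sens in points:
        if not (0 <= fpr <= 1 and 0 <= sens <= 1):
            raise ValueError("rates must lie between 0 and 1")
    ordered = sorted(set(points) | {(0.0, 0.0), (1.0, 1.0)})
    area = 0.0
    for (x0, y0), (x1, y1) in zip(ordered, ordered[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


if __name__ == "__main__":
    # One hypothetical operating point: sensitivity 0.8, specificity 0.8.
    print(trapezoidal_auc([(0.2, 0.8)]))
```

A test with no discriminatory power traces the diagonal and yields an area of 0.5, whereas a perfect test passes through the point (0, 1) and yields an area of 1.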
Despite the heterogeneity we saw at some apnea–hypopnea index cut-offs in our meta-analysis, the pooled estimates of diagnostic accuracy parameters appear reliable. We used a model that accounts for this heterogeneity,107–110 and although the studies used different level 3 devices, each device measured the same core parameters.
The studies included in this review were designed to evaluate diagnostic accuracy rather than identify subpopulations of patients who might benefit from each test. Most patients in these studies had uncomplicated obstructive sleep apnea without unstable comorbidities. The patients were typically referred from sleep or respiratory clinics where a comprehensive pre-test evaluation had been completed, suggesting a high pretest probability of obstructive sleep apnea (e.g., symptoms such as snoring and daytime sleepiness). Family physicians play a key role in the diagnosis of sleep-disordered breathing. Reuveni and colleagues discussed the need for educational programs to increase awareness among family physicians of the signs of obstructive sleep apnea.111 Such programs will likely increase testing, optimize the use of diagnostic resources and expedite treatment.112–114
Our findings confirm those of previous reviews, health technology assessments and clinical practice guidelines based on earlier evidence of portable monitor use in the diagnosis of sleep-disordered breathing.25–27,31–39,115 These reviews concluded that level 3 devices are useful in the diagnosis of obstructive sleep apnea in patients with a high pretest likelihood of having moderate to severe forms of the condition. The American Academy of Sleep Medicine and Canadian Sleep Society/Canadian Thoracic Society guidelines recommend that portable sleep studies be provided under the direction of health professionals with accreditation in sleep medicine and as part of a comprehensive assessment.25–27 The US Centers for Medicare & Medicaid Services has determined that portable devices (with a minimum of 3 channels) are acceptable for diagnosing obstructive sleep apnea in patients with clinical signs or symptoms suggestive of the condition.116
Limitations
We included only English-language studies in this review; it is therefore possible that relevant studies in other languages were excluded. In addition, none of the studies included patients with forms of sleep-disordered breathing other than obstructive sleep apnea, which limits the generalizability of the results to patients with other forms of sleep-disordered breathing.
Conclusion
Level 3 sleep studies are safe and convenient for diagnosing obstructive sleep apnea in patients with a high pretest probability of moderate to severe forms of the condition without substantial comorbidities. Level 1 polysomnography remains the cornerstone for the diagnosis in patients suspected of having comorbid sleep disorders, unstable medical conditions or complex sleep-disordered breathing. Further studies assessing the use of portable sleep studies in patients with conditions other than obstructive sleep apnea, and in patients with obstructive sleep apnea and comorbidities, are needed.
Acknowledgments
The authors thank Dr. Babak Bohlouli, University of Alberta, Department of Medicine, for his help with screening, reviewing and abstracting the data; Ms. Sarah Ndegwa for her help with reviewing and abstracting data; and Dr. Dominic Carney for his clinical advice throughout the project.
Footnotes
Competing interests: None declared.
This article has been peer reviewed.
Contributors: Mohamed El Shayeb selected the studies, extracted the data, conducted the meta-analysis, analyzed the results and drafted the manuscript. Leigh-Ann Topfer conducted the literature search, and edited and revised the manuscript. Tania Stafinski helped conceive the design of the review, extracted the data, and edited and revised the manuscript. Lawrence Pawluk edited and revised the manuscript. Devidas Menon helped conceive the design of the review, extracted the data, and edited and reviewed the manuscript. All of the authors approved the final version submitted for publication.
Funding: Production of this work has been made possible by a financial contribution from Alberta Health and under the auspices of the Alberta Health Technologies Decision Process: the Alberta Model for Health Technology Assessment and Policy Analysis. The views expressed herein do not necessarily represent the official policy of Alberta Health. The study sponsor had no role in the design of the study, the collection, analysis or interpretation of data, the writing of the report, or the decision to submit the article for publication.