Open Access 01-12-2024 | Research

Health estimate differences between six independent web surveys: different web surveys, different results?

Authors: Rainer Schnell, Jonas Klingwort

Published in: BMC Medical Research Methodology | Issue 1/2024

Abstract

Most general population web surveys are based on online panels maintained by commercial survey agencies. Many of these panels are based on non-probability samples. However, survey agencies differ in their panel selection and management strategies. Little is known about whether these different strategies cause differences in survey estimates. This paper presents the results of a systematic study designed to analyze the differences in web survey results between agencies. Six different survey agencies were commissioned with the same web survey using an identical standardized questionnaire covering factual health items. Five surveys were fielded at the same time. A calibration approach was used to control the effect of demographics on the outcome. Overall, the results show differences between probability and non-probability surveys in health estimates, which were reduced but not eliminated by weighting. Furthermore, the differences between non-probability surveys before and after weighting are larger than expected between random samples from the same population.

NPS-1 is operated by GMI (now part of Kantar), NPS-2 is operated by SSI (now named Dynata), NPS-3 is operated by Ipsos, NPS-4 is the WiSo-panel [39], PS-1 is operated by Forsa, and PS-2 is the GESIS panel.

Therefore, many different panel management strategies could impact differences between agencies, for example, recruitment, payment, control and web interface. Furthermore, providers may have different panel attrition problems or suffer from different panel conditioning effects. Separating these effects could form a research program on its own.

The data would have been available in a closely supervised research data center, but initially, PS-2 was not able to grant access to the research data center within six months. Later, Covid-19 restrictions delayed access further.

Since two heterogeneous kinds of samples have to be compared, we have no meta-analysis problem, which excludes standard measures of heterogeneity. Therefore, we use multiple pairwise comparisons (Tukey’s HSD) between the weighted means of surveys.

Comparing p-values with a fixed threshold is rarely advised [47]. We use t-tests here as rough indicators for differences larger than expected, not to make decisions about a hypothesis. However, the effect measure Cohen’s d is related to t: \(|t|=\sqrt{\left( n_1 n_2\right) /\left( n_1+n_2\right) }\, d\) [48]. The factor for multiplying d to yield t is between about 38.7 and 50 for all comparisons. Because this transformation is monotonic, an analysis based on d would yield comparable results. To help interpret the results, we additionally report effect sizes using Cohen’s d.
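As a rough illustration of the relation between d and t, the following sketch computes the multiplier \(\sqrt{n_1 n_2/(n_1+n_2)}\) for two pairs of group sizes. The sizes used (3,000 and 5,000 respondents per survey) are assumptions chosen only because they reproduce factors of about 38.7 and 50; they are not taken from the paper's tables, and the function names are hypothetical.

```python
import math


def d_to_t_factor(n1: int, n2: int) -> float:
    """Multiplier that converts Cohen's d into |t| for two groups of size n1 and n2."""
    return math.sqrt(n1 * n2 / (n1 + n2))


def t_from_d(d: float, n1: int, n2: int) -> float:
    """Absolute t statistic implied by an effect size d (pooled-variance case)."""
    return d_to_t_factor(n1, n2) * abs(d)


# Assumed, illustrative group sizes (not the actual survey sizes).
small_factor = d_to_t_factor(3000, 3000)   # sqrt(1500)
large_factor = d_to_t_factor(5000, 5000)   # sqrt(2500)

print(f"factor for n1 = n2 = 3000: {small_factor:.1f}")
print(f"factor for n1 = n2 = 5000: {large_factor:.1f}")
print(f"|t| implied by d = 0.04 at n1 = n2 = 5000: {t_from_d(0.04, 5000, 5000):.2f}")
```

Because the multiplier depends only on the group sizes, ranking comparisons by d or by |t| gives the same order whenever the sizes are held fixed.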

Age was used with six categories (18–24, 25–29, 30–39, 40–49, 50–64, and 65+), gender with two categories, education with five categories, size of the municipality with three categories (10,000–20,000, 20,001–100,000, 100,000 and more inhabitants) and region with 16 categories (the German federal states). The GREG weighting model can be written as age \(*\) gender \(*\) education \(*\) size of municipality \(*\) federal state.

Between 0.4% (NPS-2) and 8.3% (PS-1) of respondents did not answer at least one question on demography.

During the weight computations, empty cells in the weighting model were replaced with one pseudo-observation for each missing cell. The numbers of pseudo-observations created per survey were 1,566 for NPS-1, 1,628 for NPS-2, 1,505 for NPS-3, 1,965 for NPS-4, 1,714 for PS-1, and 1,839 for PS-2. After calculating the weights, the pseudo-observations were removed from the data set.
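The pseudo-observation step can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the helper names, the `pseudo` flag, and the toy data are hypothetical, and the weighting model is read as a full cross-classification of its variables. Under that reading, the category counts given for the model imply 6 × 2 × 5 × 3 × 16 = 2,880 cells.

```python
import math
from collections import Counter
from itertools import product


def add_pseudo_observations(records, levels):
    """Return the records plus one flagged pseudo-observation per empty cell.

    records: list of dicts holding the weighting variables.
    levels:  dict mapping each weighting variable to its list of categories.
    """
    variables = list(levels)
    observed = Counter(tuple(r[v] for v in variables) for r in records)
    flagged = [dict(r, pseudo=False) for r in records]
    for cell in product(*(levels[v] for v in variables)):
        if observed[cell] == 0:  # Counter yields 0 for cells never seen
            row = dict(zip(variables, cell))
            row["pseudo"] = True
            flagged.append(row)
    return flagged


def drop_pseudo_observations(records):
    """Remove the pseudo-observations again once the weights are computed."""
    return [r for r in records if not r["pseudo"]]


# Number of cells implied by the category counts, assuming a fully crossed model.
n_cells = math.prod([6, 2, 5, 3, 16])

# Toy data with two variables only (hypothetical, for illustration).
levels = {"gender": ["f", "m"], "age": ["18-24", "25-29", "30-39"]}
sample = [
    {"gender": "f", "age": "18-24"},
    {"gender": "f", "age": "18-24"},
    {"gender": "m", "age": "30-39"},
]

filled = add_pseudo_observations(sample, levels)
n_pseudo = sum(r["pseudo"] for r in filled)
restored = drop_pseudo_observations(filled)
print(n_cells, n_pseudo, len(restored))
```

In the toy data, two of the six cells are occupied, so four pseudo-observations are added and later dropped, leaving the three original records.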

Effect sizes of mode differences are rarely published in survey methodology. However, [56] reports 0.04 as the mean of Cohen’s d for 138 items compared between a face-to-face survey and a mixed-mode survey. Compared to these values, the mean effects of NPS vs PS are larger.

1. Liu H, Cella D, Gershon R, Shen J, Morales LS, Riley W, et al. Representativeness of the Patient-reported Outcomes Measurement Information System Internet Panel. J Clin Epidemiol. 2010;63(11):1169–78. https://doi.org/10.1016/j.jclinepi.2009.11.021.

2. Russell CW, Boggs DA, Palmer JR, Rosenberg L. Use of a Web-based Questionnaire in the Black Women’s Health Study. Am J Epidemiol. 2010;172(11):1286–91. https://doi.org/10.1093/aje/kwq310.

3. Haddad C, Sacre H, Zeenny RM, Hajj A, Akel M, Iskandar K, et al. Should samples be weighted to decrease selection bias in online surveys during the COVID-19 pandemic? Data from seven datasets. BMC Med Res Methodol. 2022;22(1):1–11. https://doi.org/10.1186/s12874-022-01547-3.

4. Klingwort J, Buelens B, Schnell R. Early Versus Late Respondents in Web Surveys: Evidence from a National Health Survey. Stat J IAOS. 2018;34(3):461–71. https://doi.org/10.3233/SJI-170421.

5. ADM. Jahresbericht 2019 [annual report 2019, in German]. 2020. https://www.adm-ev.de/wp-content/uploads/2020/09/ADM_Jahresbericht_2019_020920_WEB.pdf. Accessed 01 Sept 2021.

6. Sohlberg J, Gilljam M, Martinsson J. Determinants of Polling Accuracy: The Effect of Opt-in Internet Surveys. J Elections Public Opin Parties. 2017;27(4):433–47. https://doi.org/10.1080/17457289.2017.1300588.

7. Sturgis P, Kuha J, Baker N, Callegaro M, Fisher S, Green J, et al. An Assessment of the Causes of the Errors in the 2015 UK General Election Opinion Polls. J R Stat Soc A (Stat Soc). 2018;181(3):757–81. https://doi.org/10.1111/rssa.12329.

8. Blair J, Czaja R, Blair EA. Designing Surveys: A Guide to Decisions and Procedures. 3rd ed. Thousand Oaks: Sage; 2014.

9. Kreuter F, Presser S, Tourangeau R. Social Desirability Bias in CATI, IVR, and Web Surveys: The Effect of Mode and Question Sensitivity. Public Opin Q. 2008;72(5):847–65. https://doi.org/10.1093/poq/nfn063.

10. McPhee C, Barlas F, Brigham N, Darling J, Dutwin D, Jackson C, et al. Data Quality Metrics for Online Samples: Considerations for Study Design and Analysis. 2023. https://aapor.org/wp-content/uploads/2023/02/Task-Force-Report-FINAL.pdf. Accessed 18 Mar 2023.

11. Cornesse C, Blom AG, Dutwin D, Krosnick JA, De Leeuw ED, Legleye S, et al. A Review of Conceptual Approaches and Empirical Evidence on Probability and Nonprobability Sample Survey Research. J Surv Stat Methodol. 2020;8(1):4–36. https://doi.org/10.1093/jssam/smz041.

12. Pekari N, Lipps O, Roberts C, Lutz G. Conditional distributions of frame variables and voting behaviour in probability-based surveys and opt-in panels. Swiss Political Sci Rev. 2022;28(4):696–711. https://doi.org/10.1111/spsr.12539.

13. Couper MP. Web Surveys: A Review of Issues and Approaches. Public Opin Q. 2000;64(4):464–94. https://doi.org/10.1086/318641.

14. Bethlehem J. Web Surveys in Official Statistics. In: Engel U, Jann B, Lynn P, Scherpenzeel A, Sturgis P, editors. Improving Survey Methods: Lessons from Recent Research. New York: Routledge; 2015. p. 156–69.

15. Leenheer J, Scherpenzeel AC. Does It Pay Off to Include Non-Internet Households in an Internet Panel? Int J Internet Sci. 2013;8(1):17–29.

16. Blom AG, Herzing JME, Cornesse C, Sakshaug JW, Krieger U, Bossert D. Does the Recruitment of Offline Households Increase the Sample Representativeness of Probability-Based Online Panels? Evidence From the German Internet Panel. Soc Sci Comput Rev. 2016;35(4):498–520. https://doi.org/10.1177/0894439316651584.

17. Cornesse C, Schaurer I. The Long-Term Impact of Different Offline Population Inclusion Strategies in Probability-Based Online Panels: Evidence From the German Internet Panel and the GESIS Panel. Soc Sci Comput Rev. 2021;39(4):687–704. https://doi.org/10.1177/0894439320984131.

18. Eurostat. Households – level of internet access. 2023. https://ec.europa.eu/eurostat/databrowser/view/isoc_ci_in_h/default/table?lang=en. Accessed 06 Aug 2023.

19. Eurostat. Individuals – internet use. 2023. https://ec.europa.eu/eurostat/databrowser/view/ISOC_CI_IFP_IU/default/table?lang=en. Accessed 06 Aug 2023.

20. United States Census Bureau. Types of Computers and Internet subscriptions. 2023. https://data.census.gov/table?q=internet&tid=ACSST1Y2021.S2801. Accessed 06 Aug 2023.

21. Bethlehem J, Biffignandi S. Handbook of Web Surveys. Hoboken: Wiley; 2012.

22. Baker R, Blumberg SJ, Brick JM, Couper MP, Courtright M, Dennis JM, et al. Research Synthesis: AAPOR Report on Online Panels. Public Opin Q. 2010;74(4):711–81. https://doi.org/10.1093/poq/nfq048.

23. Meyer BD, Mok WK, Sullivan JX. Household Surveys in Crisis. J Econ Perspect. 2015;29(4):199–226. https://doi.org/10.1257/jep.29.4.199.

24. Czajka JL, Beyler A. Declining Response Rates in Federal Surveys: Trends and Implications (Background Paper). 2016. Technical Report Final Report – Volume I, Mathematica Policy Research.

25. Williams D, Brick JM. Trends in U.S. Face-to-face Household Survey Nonresponse and Level of Effort. J Surv Stat Methodol. 2017;6(2):186–211. https://doi.org/10.1093/jssam/smx019.

26. de Leeuw E, Hox J, Luiten A. International Nonresponse Trends Across Countries and Years: An Analysis of 36 Years of Labour Force Survey Data. Surv Insights Methods Field. 2018;1–11.

27. Daikeler J, Bošnjak M, Lozar Manfreda K. Web Versus Other Survey Modes: An Updated and Extended Meta-Analysis Comparing Response Rates. J Surv Stat Methodol. 2020;8(3):513–39. https://doi.org/10.1093/jssam/smz008.

28. Groves RM, Peytcheva E. The Impact of Nonresponse Rates on Nonresponse Bias: A Meta-Analysis. Public Opin Q. 2008;72(2):167–89. https://doi.org/10.1093/poq/nfn011.

29. Adams J, White M. Health Behaviours in People Who Respond to a Web-based Survey Advertised on Regional News Media. Eur J Pub Health. 2007;18(3):335–8. https://doi.org/10.1093/eurpub/ckm100.

30. Tourangeau R, Conrad FG, Couper MP. The Science of Web Surveys. New York: Oxford University Press; 2013.

31. Schnell R, Noack M, Torregroza S. Differences in General Health of Internet Users and Non-users and Implications for the Use of Web Surveys. Surv Res Methods. 2017;11(2):105–23. https://doi.org/10.18148/srm/2017.v11i2.6803.

32. Braekman E, Charafeddine R, Demarest S, Drieskens S, Berete F, Gisle L, et al. Comparing Web-based Versus Face-to-face and Paper-and-pencil Questionnaire Data Collected Through Two Belgian Health Surveys. Int J Publ Health. 2020;1–12. https://doi.org/10.1007/s00038-019-01327-9.

33. Dutwin D, Buskirk TD. A Deeper Dive into the Digital Divide: Reducing Coverage Bias in Internet Surveys. Soc Sci Comput Rev. 2022. https://doi.org/10.1177/08944393221093467.

34. Helsper EJ, Reisdorf BC. The emergence of a “digital underclass” in Great Britain and Sweden: Changing reasons for digital exclusion. New Media Soc. 2017;19(8):1253–70. https://doi.org/10.1177/1461444816634676.

35. Zhou XH, Zhou C, Liu D, Ding X. Applied Missing Data Analysis in the Health Sciences. Hoboken: Wiley; 2014.

36. Little RJA, Rubin DB. Statistical Analysis with Missing Data. 3rd ed. Hoboken: Wiley; 2020.

37. Särndal CE, Swensson B, Wretman J. Model Assisted Survey Sampling. New York: Springer; 1992.

38. Särndal CE, Lundström S. Estimation in Surveys with Nonresponse. Chichester: Wiley; 2005.

39. Göritz A. Determinants of the Starting Rate and the Completion Rate in Online Panel Studies. In: Callegaro M, Baker RP, Bethlehem J, Göritz A, Krosnick JA, Lavrakas PJ, editors. Online Panel Research: A Data Quality Perspective. Hoboken: Wiley; 2014. p. 154–70.

40. Güllner M, Schmitt LH. Innovation in der Markt- und Sozialforschung: das forsa.omninet-Panel [Innovations in market research: The forsa.omninet-panel, in German]. Sozialwissenschaften Berufspraxis. 2004;27(1):11–22.

41. Gößwald A, Lange M, Dölle R, Hölling H. Die erste Welle der Studie zur Gesundheit Erwachsener in Deutschland (DEGS1): Gewinnung von Studienteilnehmenden, Durchführung der Feldarbeit und Qualitätsmanagement [The First Wave of the Study of Adult Health in Germany (DEGS1): Recruitment of Study Participants, Fieldwork Implementation, and Quality Management, in German]. Bundesgesundheitsbl Gesundheitsforsch Gesundheitsschutz. 2013;56(5). https://doi.org/10.1007/s00103-013-1671-z.

42. Kamtsiuris P, Lange M, Hoffmann R, Rosario AS, Dahm S, Kuhnert R, et al. Die erste Welle der Studie zur Gesundheit Erwachsener in Deutschland (DEGS1): Stichprobendesign, Response, Gewichtung und Repräsentativität [The first wave of the Study of Adult Health in Germany (DEGS1): sampling design, response, weighting, and representativeness, in German]. Bundesgesundheitsbl Gesundheitsforsch Gesundheitsschutz. 2013;56(5). https://doi.org/10.1007/s00103-012-1650-9.

43. RKI. Beiträge zur Gesundheitsberichterstattung des Bundes - Daten und Fakten: Ergebnisse der Studie Gesundheit in Deutschland aktuell 2012 [Contributions to federal health reporting – Facts and figures: Results of the study on current health in Germany 2012, in German]. Abteilung für Epidemiologie und Gesundheitsmonitoring. Berlin: Robert Koch-Institut; 2014.

44. Saß AC, Lange C, Finger JD, Allen J, Born S, Hoebel J, et al. Supplement: Fragebogen zur Studie ‘Gesundheit in Deutschland aktuell’: GEDA 2014/2015-EHIS [Supplement: Questionnaire for the study ‘Current Health in Germany’: GEDA 2014/2015-EHIS, in German]. J Health Monit. 2017;2(1):106–34.

45. Forschungsdatenzentrum ALLBUS. ALLBUS 2014 Fragebogendokumentation: Material zu den Datensätzen der Studiennummern ZA5240 und ZA5241 [ALLBUS 2014 questionnaire documentation: material on the data sets of study numbers ZA5240 and ZA5241, in German]. 2014.

46. Destatis. Statistik und Wissenschaft: Demographische Standards Ausgabe 2010 [Statistics and Science: Demographic Standards Edition 2010, in German]. Wiesbaden; 2010.

47. Wasserstein RL, Schirm AL, Lazar NA. Moving to a World Beyond “p < 0.05”. Am Stat. 2019;73(sup1):1–19. https://doi.org/10.1080/00031305.2019.1583913.

48. Flury BK, Riedwyl H. Standard Distance in Univariate and Multivariate Analysis. Am Stat. 1986;40(3):249–51. https://doi.org/10.1080/00031305.1986.10475403.

49. Bickel DR. Genomics Data Analysis: False Discovery Rates and Empirical Bayes Methods. Boca Raton: CRC Press; 2020.

50. Callegaro M, Manfreda KL, Vehovar V. Web Survey Methodology. Los Angeles: Sage; 2015.

51. Toepoel V. Doing Surveys Online. London: Sage; 2016.

52. Potter F, Zheng Y. Methods and Issues in Trimming Extreme Weights in Sample Surveys. JSM Proc Surv Res Methods Sect. 2015;2707–2719.

53. Chen Q, Elliott MR, Haziza D, Yang Y, Ghosh M, Little RJA, et al. Approaches to Improving Survey-Weighted Estimates. Stat Sci. 2017;32(2):227–48.

54. Elliott MR. Model Averaging Methods for Weight Trimming. J Off Stat. 2008;24(4):517–40.

55. Cohen J. Statistical Power Analysis for the Behavioral Sciences. 2nd ed. Hillsdale: Erlbaum; 1988.

56. Christmann P, Gummer T, Hähnel S, Wolf C. Does the mode matter? An experimental comparison of survey responses between face-to-face and mixed-mode surveys. Unpublished presentation at the 8th Conference of the European Survey Research Association, Zagreb (Croatia). 2019. Available at https://www.europeansurveyresearch.org/conf2019/uploads/393/3000/71/ESRA_Christmann_et_al..pdf. Accessed 17 July 2019.

57. Haziza D, Beaumont JF. Construction of Weights in Surveys: A Review. Stat Sci. 2017;32(2):206–26. https://doi.org/10.1214/16-STS608.

58. Schonlau M, van Soest A, Kapteyn A. Are ‘Webographic’ or attitudinal questions useful for adjusting estimates from Web surveys using propensity scoring? Surv Res Methods. 2007;1(3):155–63. https://doi.org/10.18148/srm/2007.v1i3.70.

59. DiSogra C, Cobb C, Chan E, Dennis JM. Calibrating Non-Probability Internet Samples with Probability Samples Using Early Adopter Characteristics. In: Proceedings of Joint Statistical Meetings (JSM). Alexandria: American Statistical Association, Section on Survey Research Methods; 2011. p. 4501–4515.

60. Gelman A, Little TC. Poststratification into many categories using hierarchical logistic regression. Surv Methodol. 1997;23(2):127–35.

61. Rosenbaum PR, Rubin DB. The Central Role of the Propensity Score in Observational Studies for Causal Effects. Biometrika. 1983;70(1):41–55.

62. Lee S, Valliant R. Estimation for Volunteer Panel Web Surveys Using Propensity Score Adjustment and Calibration Adjustment. Sociol Methods Res. 2009;37(3):319–43. https://doi.org/10.1177/0049124108329643.

63. Bruch C, Felderer B. Applying multilevel regression weighting when only population margins are available. Commun Stat Simul Comput. 2022. https://doi.org/10.1080/03610918.2021.1988642.

64. Copas A, Burkill S, Conrad F, Couper MP, Erens B. An evaluation of whether propensity score adjustment can remove the self-selection bias inherent to web panel surveys addressing sensitive health behaviours. BMC Med Res Methodol. 2020;20(1):1–10. https://doi.org/10.1186/s12874-020-01134-4.

65. Si Y. On the Use of Auxiliary Variables in Multilevel Regression and Poststratification. arXiv. 2019. https://doi.org/10.48550/arXiv.2011.00360.

66. Hanretty C. An Introduction to Multilevel Regression and Post-Stratification for Estimating Constituency Opinion. Political Stud Rev. 2019;18(4):630–45. https://doi.org/10.1177/1478929919864773.

Title: Health estimate differences between six independent web surveys: different web surveys, different results?
Authors: Rainer Schnell, Jonas Klingwort
Publication date: 01-12-2024
Publisher: BioMed Central
Published in: BMC Medical Research Methodology / Issue 1/2024
Electronic ISSN: 1471-2288
DOI: https://doi.org/10.1186/s12874-023-02122-0
