Incorporating Machine Learning in Capture-Recapture Estimation of Survey Measurement Error


Bibliographic Details
Main Authors: Maaike Walraad, Jonas Klingwort, Joep Burger
Format: Article
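Language: English
Published: European Survey Research Association 2024-08-01
Series: Survey Research Methods
Subjects: total survey error; survey underreporting; road freight transport survey; weigh-in-motion sensor data; administrative data; gradient boosting
Online Access: https://ojs.ub.uni-konstanz.de/srm/article/view/8307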
author Maaike Walraad
Jonas Klingwort
Joep Burger
collection DOAJ
description Capture-recapture (CRC) is currently considered a promising method for using non-probability samples to estimate survey measurement error. In previous studies, we derived adjusted survey estimates using CRC by combining probability-based survey data (as the initial data source) and non-probability road sensor data (as the secondary data source). The design-based survey estimate was considerably lower than the CRC estimates, which are based on multiple data sources and statistical models. A likely explanation is measurement error in the survey, which is conceivable given the response burden of diary questionnaires. This paper explores the potential of machine learning as a more flexible alternative to the commonly used regression models as the basis for a number of CRC estimators. Moreover, we report on the potential impact of the quality of the non-probability source degrading over time. In particular, we study differences in prediction quality, point estimates, variance estimates, and estimates of measurement error over five years. Results show that machine learning clearly outperforms the regression models, but the obtained CRC point estimates remain largely unaffected. Log-linear estimators combined with machine learning models seem more sensitive to a declining number of working sensors than the Lincoln-Petersen estimator, the Huggins estimator, and log-linear estimators with regression models.
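For orientation, the simplest estimator named in the description, Lincoln-Petersen, can be sketched as below. This is a minimal illustration only, not the authors' implementation: the function name and the counts are hypothetical, and the model-based variants mentioned in the description (regression or machine learning models feeding the Huggins and log-linear estimators) are not shown.

```python
def lincoln_petersen(n_first: int, n_second: int, n_both: int) -> float:
    """Two-source capture-recapture estimate of the total population size.

    n_first  -- units observed in the first source (e.g. reported in a survey)
    n_second -- units observed in the second source (e.g. recorded by sensors)
    n_both   -- units observed in both sources after record linkage
    """
    if n_both <= 0:
        raise ValueError("at least one unit must be observed in both sources")
    if n_both > min(n_first, n_second):
        raise ValueError("overlap cannot exceed either source count")
    # Classic estimator: N_hat = n_first * n_second / n_both
    return n_first * n_second / n_both


# Hypothetical counts, chosen only to make the arithmetic easy to follow.
n_hat = lincoln_petersen(n_first=400, n_second=300, n_both=120)
print(n_hat)  # 1000.0
```

With these made-up counts, the first source alone sees 400 of an estimated 1000 units; a gap of that kind between a single-source count and the CRC estimate is what the description attributes to measurement error in the survey.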
format Article
id doaj-art-c34e4bb0a69c46eababe43dae6edf652
institution Kabale University
issn 1864-3361
language English
publishDate 2024-08-01
publisher European Survey Research Association
record_format Article
series Survey Research Methods
spelling doaj-art-c34e4bb0a69c46eababe43dae6edf652; 2025-02-09T14:16:09Z; eng; European Survey Research Association; Survey Research Methods; 1864-3361; 2024-08-01; 18; 2; Incorporating Machine Learning in Capture-Recapture Estimation of Survey Measurement Error; Maaike Walraad; Jonas Klingwort; 0; https://orcid.org/0000-0002-4545-9136; Joep Burger; 1; https://orcid.org/0000-0002-7298-5561; Statistics Netherlands (CBS); Statistics Netherlands (CBS)
title Incorporating Machine Learning in Capture-Recapture Estimation of Survey Measurement Error
topic total survey error
survey underreporting
road freight transport survey
weigh-in-motion sensor data
administrative data
gradient boosting
url https://ojs.ub.uni-konstanz.de/srm/article/view/8307