Incorporating Machine Learning in Capture-Recapture Estimation of Survey Measurement Error

Bibliographic Details
Main Authors:	Maaike Walraad, Jonas Klingwort, Joep Burger
Format:	Article
Language:	English
Published:	European Survey Research Association 2024-08-01
Series:	Survey Research Methods
Subjects:	total survey error; survey underreporting; road freight transport survey; weigh-in-motion sensor data; administrative data; gradient boosting
Online Access:	https://ojs.ub.uni-konstanz.de/srm/article/view/8307

collection	DOAJ
description	Capture-recapture (CRC) is currently considered a promising method for using non-probability samples to estimate survey measurement error. In previous studies, we derived adjusted survey estimates using CRC by combining probability-based survey data (as the initial data source) with non-probability road sensor data (as the secondary data source). The design-based survey estimate was considerably lower than the CRC estimates, which are based on multiple data sources and statistical models. A likely explanation is measurement error in the survey, which is plausible given the response burden of diary questionnaires. This paper explores the potential of machine learning as a more flexible alternative to the regression models commonly used as the basis for several CRC estimators. Moreover, we report on the potential impact of the quality of the non-probability source degrading over time. In particular, we study differences in prediction quality, point estimates, variance estimates, and estimates of measurement error over five years. Results show that machine learning clearly outperforms the regression models, but the resulting CRC point estimates remain largely unaffected. Log-linear estimators combined with machine learning models seem more sensitive to a declining number of working sensors than the Lincoln-Petersen estimator, the Huggins estimator, and log-linear estimators with regression models.
id	doaj-art-c34e4bb0a69c46eababe43dae6edf652
institution	Kabale University
issn	1864-3361
spelling	doaj-art-c34e4bb0a69c46eababe43dae6edf652 (record timestamp 2025-02-09T14:16:09Z); English; European Survey Research Association; Survey Research Methods; ISSN 1864-3361; 2024-08-01; vol. 18, no. 2; Maaike Walraad; Jonas Klingwort (ORCID https://orcid.org/0000-0002-4545-9136; Statistics Netherlands (CBS)); Joep Burger (ORCID https://orcid.org/0000-0002-7298-5561; Statistics Netherlands (CBS)); https://ojs.ub.uni-konstanz.de/srm/article/view/8307
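The two-source estimators named in the abstract can be illustrated with a minimal Python sketch, assuming a closed population and two sources such as the survey and the road sensors. It shows the textbook Lincoln-Petersen estimator and the Horvitz-Thompson step of the Huggins estimator, in which each observed unit is weighted by the inverse of its estimated probability of being captured by at least one source. This is not the authors' implementation: the function names `lincoln_petersen` and `huggins` and all numbers in the usage lines are hypothetical, and the step that fits the capture probabilities (by regression or, as the paper compares, by machine learning such as gradient boosting) is left out.

```python
from collections.abc import Sequence


def lincoln_petersen(n1: int, n2: int, m: int) -> float:
    """Two-source Lincoln-Petersen estimate of the population size N.

    n1: units captured by source 1 (e.g. the survey)
    n2: units captured by source 2 (e.g. the road sensors)
    m:  units captured by both sources
    Assumes a closed population, independent sources and equal capture
    probabilities for all units, so that m / n2 estimates n1 / N.
    """
    if m <= 0:
        raise ValueError("need at least one unit captured by both sources")
    return n1 * n2 / m


def huggins(p1: Sequence[float], p2: Sequence[float]) -> float:
    """Horvitz-Thompson step of the two-source Huggins estimator.

    p1[i], p2[i]: estimated probabilities that observed unit i is captured
    by source 1 and by source 2. In practice these come from a fitted model
    (hypothetically a regression or a gradient boosting model); here they
    are simply passed in.
    """
    if len(p1) != len(p2):
        raise ValueError("p1 and p2 must have one entry per observed unit")
    total = 0.0
    for a, b in zip(p1, p2):
        # Probability of being captured by at least one of the two sources
        p_any = 1.0 - (1.0 - a) * (1.0 - b)
        # Each observed unit stands in for 1 / p_any units in the population
        total += 1.0 / p_any
    return total


# Hypothetical counts and probabilities, not data from the paper
print(lincoln_petersen(600, 800, 400))  # → 1200.0
print(round(huggins([0.5] * 1000, [2 / 3] * 1000), 6))
```

With constant probabilities p1 = 0.5 and p2 = 2/3 for all 1000 distinct observed units (600 + 800 − 400), the Huggins sum gives (up to rounding) the same 1200 as the Lincoln-Petersen estimator, because the two coincide when capture probabilities are homogeneous. The Huggins form only adds something when the probabilities vary between units, which is where a more flexible prediction model can enter.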
