Incorporating Machine Learning in Capture-Recapture Estimation of Survey Measurement Error
Main Authors: | Maaike Walraad, Jonas Klingwort, Joep Burger |
---|---|
Format: | Article |
Language: | English |
Published: | European Survey Research Association, 2024-08-01 |
Series: | Survey Research Methods |
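Subjects: | total survey error; survey underreporting; road freight transport survey; weigh-in-motion sensor data; administrative data; gradient boosting |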
Online Access: | https://ojs.ub.uni-konstanz.de/srm/article/view/8307 |
_version_ | 1823861427331399680 |
---|---|
author | Maaike Walraad; Jonas Klingwort; Joep Burger |
author_facet | Maaike Walraad; Jonas Klingwort; Joep Burger |
author_sort | Maaike Walraad |
collection | DOAJ |
description | Capture-recapture (CRC) is currently considered a promising method to use non-probability samples to estimate survey measurement error. In previous studies, we derived adjusted survey estimates using CRC by combining probability-based survey data (as the initial data source) and non-probability road sensor data (as the secondary data source). The design-based survey estimate was considerably lower than the CRC estimates, which are based on multiple data sources and statistical models. A likely explanation is measurement error in the survey, which is conceivable given the response burden of diary questionnaires. This paper explores the potential of machine learning as a more flexible alternative to the commonly used regression models as the basis for a number of CRC estimators. Moreover, we report on the potential impact of the quality of the non-probability source degrading over time. In particular, we study differences in prediction quality, point estimates, variance estimates, and estimates of measurement error over five years. Results show that machine learning clearly outperforms the regression models, but the obtained CRC point estimates remain largely unaffected. Log-linear estimators in combination with machine learning models seem more sensitive to a declining number of working sensors than the Lincoln-Petersen estimator, the Huggins estimator, and log-linear estimators with regression models. |
format | Article |
id | doaj-art-c34e4bb0a69c46eababe43dae6edf652 |
institution | Kabale University |
issn | 1864-3361 |
language | English |
publishDate | 2024-08-01 |
publisher | European Survey Research Association |
record_format | Article |
series | Survey Research Methods |
spelling | doaj-art-c34e4bb0a69c46eababe43dae6edf652; 2025-02-09T14:16:09Z; eng; European Survey Research Association; Survey Research Methods; 1864-3361; 2024-08-01; 18; 2; Incorporating Machine Learning in Capture-Recapture Estimation of Survey Measurement Error; Maaike Walraad; Jonas Klingwort (https://orcid.org/0000-0002-4545-9136, Statistics Netherlands (CBS)); Joep Burger (https://orcid.org/0000-0002-7298-5561, Statistics Netherlands (CBS)); https://ojs.ub.uni-konstanz.de/srm/article/view/8307; total survey error; survey underreporting; road freight transport survey; weigh-in-motion sensor data; administrative data; gradient boosting |
spellingShingle | Maaike Walraad Jonas Klingwort Joep Burger Incorporating Machine Learning in Capture-Recapture Estimation of Survey Measurement Error Survey Research Methods total survey error survey underreporting road freight transport survey weigh-in-motion sensor data administrative data gradient boosting |
title | Incorporating Machine Learning in Capture-Recapture Estimation of Survey Measurement Error |
title_full | Incorporating Machine Learning in Capture-Recapture Estimation of Survey Measurement Error |
title_fullStr | Incorporating Machine Learning in Capture-Recapture Estimation of Survey Measurement Error |
title_full_unstemmed | Incorporating Machine Learning in Capture-Recapture Estimation of Survey Measurement Error |
title_short | Incorporating Machine Learning in Capture-Recapture Estimation of Survey Measurement Error |
title_sort | incorporating machine learning in capture recapture estimation of survey measurement error |
topic | total survey error; survey underreporting; road freight transport survey; weigh-in-motion sensor data; administrative data; gradient boosting |
url | https://ojs.ub.uni-konstanz.de/srm/article/view/8307 |
work_keys_str_mv | AT maaikewalraad incorporatingmachinelearningincapturerecaptureestimationofsurveymeasurementerror AT jonasklingwort incorporatingmachinelearningincapturerecaptureestimationofsurveymeasurementerror AT joepburger incorporatingmachinelearningincapturerecaptureestimationofsurveymeasurementerror |
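
The abstract names the Lincoln-Petersen estimator among the CRC estimators compared. As a reader aid only, the following is a minimal sketch of the textbook two-source form of that estimator, together with Chapman's bias-corrected variant. It is not the authors' implementation: the function names and the example counts are hypothetical, and the sketch ignores the covariates, regression models, and machine learning models that the article studies.

```python
def lincoln_petersen(n1: int, n2: int, m: int) -> float:
    """Textbook Lincoln-Petersen estimate of the total population size.

    n1: units recorded by the first source (e.g. a survey)
    n2: units recorded by the second source (e.g. sensors)
    m:  units recorded by both sources
    """
    if m <= 0:
        raise ValueError("the estimator is undefined without overlap (m must be > 0)")
    return n1 * n2 / m


def chapman(n1: int, n2: int, m: int) -> float:
    """Chapman's bias-corrected variant; defined even when m == 0."""
    return (n1 + 1) * (n2 + 1) / (m + 1) - 1


def share_missed_by_first_source(n1: int, n2: int, m: int) -> float:
    """Share of the estimated total that the first source did not record."""
    return 1 - n1 / lincoln_petersen(n1, n2, m)


if __name__ == "__main__":
    # Hypothetical counts, chosen only to illustrate the arithmetic.
    n1, n2, m = 400, 250, 200
    print(lincoln_petersen(n1, n2, m))
    print(chapman(n1, n2, m))
    print(share_missed_by_first_source(n1, n2, m))
```

With these hypothetical counts, the first source records 400 units against an estimated total of 500, which is the kind of gap the abstract interprets as possible survey underreporting. The estimate rests on the usual assumptions of a closed population, independent sources, and error-free linkage.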