What Influences Low-cost Sensor Data Calibration? - A Systematic Assessment of Algorithms, Duration, and Predictor Selection

Abstract The low-cost sensor has changed the air quality monitoring paradigm with the capacity for efficient network expansion and community engagement. The surge in its use has sparked new research interests in understanding its data quality. Many studies have employed field calibration to improve...


Bibliographic Details
Main Authors:	Lu Liang, Jacob Daniels
Format:	Article
Language:	English
Published:	Springer 2022-06-01
Series:	Aerosol and Air Quality Research
Subjects:	PurpleAir; Machine learning; Particulate matter; PM2.5; Air quality
Online Access:	https://doi.org/10.4209/aaqr.220076

_version_	1825197508617306112
author	Lu Liang, Jacob Daniels
collection	DOAJ
description	Abstract The low-cost sensor has changed the air quality monitoring paradigm with the capacity for efficient network expansion and community engagement. The surge in its use has sparked new research interests in understanding its data quality. Many studies have employed field calibration to improve sensor agreement with co-located reference monitors. Yet, studies that systematically examine the performance of different calibration techniques are limited in scope and depth. This study comprehensively assessed ten widely used data techniques, namely AdaBoost, Bayesian ridge, gradient tree boosting, K-nearest neighbors, Lasso, multivariable linear regression, neural network, random forest, ridge regression, and support vector machine. We compared their performance using a standardized baseline dataset and their responses to various parameter combinations. We further assessed the training sample size effect to understand the optimal duration of field calibration for achieving good accuracy. Finally, we tested different predictor combinations to address whether the inclusion of more predictors will lead to better performance. Using baseline data, the neural network achieved the best performance, followed by the four regression-based methods, showing very consistent and stable performance. While confirming that the latest research tendency is deep learning, regression is still a viable option for studies with limited effort in parameter tuning and method selection, especially considering its computational efficiency and simplicity. The sample size effect is most evident when the sample size drops below 30%, which is equivalent to six weeks of continuously collected hourly data. Although algorithms react differently to the number of predictors, their performance was typically boosted by adding more predictors, especially the particle count and humidity. Our study not only describes an approach of sophisticated data-driven calibration for practical applications, but also provides insights into the compounding impacts of parameters, samples, and predictors in algorithm performance.
format	Article
id	doaj-art-6e877f5339194086a51536a511cae517
institution	Kabale University
issn	1680-8584 2071-1409
language	English
publishDate	2022-06-01
publisher	Springer
record_format	Article
series	Aerosol and Air Quality Research
spelling	doaj-art-6e877f5339194086a51536a511cae517; 2025-02-09T12:18:28Z; eng; Springer; Aerosol and Air Quality Research; 1680-8584; 2071-1409; 2022-06-01; 229116; 10.4209/aaqr.220076; What Influences Low-cost Sensor Data Calibration? - A Systematic Assessment of Algorithms, Duration, and Predictor Selection; Lu Liang (Department of Geography and the Environment, University of North Texas); Jacob Daniels (Department of Electrical Engineering, University of North Texas); https://doi.org/10.4209/aaqr.220076; PurpleAir; Machine learning; Particulate matter; PM2.5; Air quality
title	What Influences Low-cost Sensor Data Calibration? - A Systematic Assessment of Algorithms, Duration, and Predictor Selection
topic	PurpleAir; Machine learning; Particulate matter; PM2.5; Air quality
url	https://doi.org/10.4209/aaqr.220076


Similar Items