Machine Learning Algorithm for Estimating Surface PM2.5 in Thailand

Abstract We have used NASA’s Modern-Era Retrospective analysis for Research and Applications, Version 2 (MERRA2) reanalysis data of aerosols and meteorology as input to a machine learning algorithm (MLA) to estimate surface PM2.5 concentrations in Thailand. One year of hourly data from 51 ground monitoring...

Bibliographic Details
Main Authors: Pawan Gupta, Shanshan Zhan, Vikalp Mishra, Aekkapol Aekakkararungroj, Amanda Markert, Sarawut Paibong, Farrukh Chishtie
Format: Article
Language: English
Published: Springer 2021-09-01
Series: Aerosol and Air Quality Research
Subjects: Thailand; MERRA2; PM2.5; Air quality; Machine learning
Online Access: https://doi.org/10.4209/aaqr.210105
author Pawan Gupta
Shanshan Zhan
Vikalp Mishra
Aekkapol Aekakkararungroj
Amanda Markert
Sarawut Paibong
Farrukh Chishtie
author_sort Pawan Gupta
collection DOAJ
description Abstract We have used NASA’s Modern-Era Retrospective analysis for Research and Applications, Version 2 (MERRA2) reanalysis data of aerosols and meteorology as input to a machine learning algorithm (MLA) to estimate surface PM2.5 concentrations in Thailand. One year of hourly data from 51 ground monitoring stations in Thailand was spatiotemporally collocated with MERRA2 fields. The integrated data were then used to train and validate a supervised MLA, a random forest, to estimate hourly and daily PM2.5 concentrations. The MLA is cross-validated using a 10-fold random sampling approach. The trained MLA can estimate PM2.5 with close to zero mean bias across the country. A correlation coefficient of 0.95, with slope and intercept values of 0.95 and 0.88, is achieved between observed and estimated PM2.5. At the hourly scale, the MLA underestimates under very clean conditions (PM2.5 < 10 µg m⁻³) and overestimates during high loading (PM2.5 > 80 µg m⁻³). The hourly estimates also show high skill in following the diurnal cycle during different seasons of the year. The daily mean (24-hour) PM2.5 values follow day-to-day variability very well (correlation coefficient of 0.98, RMSE = 3.14 µg m⁻³), with high values during the winter months (November–February) and lower values during other seasons. The trained MLA has the potential to reprocess the MERRA2 time series for the region, and the bias-corrected data can be used in other applications such as long-term trend analysis and health exposure studies. The MLA can also be applied to GEOS forecast fields to generate bias-corrected air quality forecasts for the region.
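The validation procedure named in the abstract (10-fold random sampling, then mean bias, RMSE, correlation, slope and intercept between observed and estimated PM2.5) can be illustrated with a minimal, standard-library-only sketch. It is not the authors' code. The data are synthetic, the single `predictor` variable is hypothetical, and a straight-line fit stands in for the random forest purely to keep the example dependency-free. Regressing estimated on observed values to obtain the slope and intercept is also an assumption, since the abstract does not state the direction.

```python
import math
import random
from statistics import mean


def kfold_indices(n, k=10, seed=0):
    """Shuffle indices 0..n-1 and split them into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def fit_line(x, y):
    """Ordinary least-squares slope and intercept of y on x."""
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def evaluate(observed, estimated):
    """Mean bias, RMSE, Pearson r, and the line fitted to estimated vs. observed."""
    diffs = [e - o for o, e in zip(observed, estimated)]
    bias = mean(diffs)
    rmse = math.sqrt(mean(d * d for d in diffs))
    mo, me = mean(observed), mean(estimated)
    cov = sum((o - mo) * (e - me) for o, e in zip(observed, estimated))
    var_o = sum((o - mo) ** 2 for o in observed)
    var_e = sum((e - me) ** 2 for e in estimated)
    r = cov / math.sqrt(var_o * var_e)
    slope, intercept = fit_line(observed, estimated)
    return {"bias": bias, "rmse": rmse, "r": r, "slope": slope, "intercept": intercept}


def cross_validate(x, y, k=10, seed=0):
    """Estimate every sample from a model trained on the other k-1 folds."""
    estimated = [0.0] * len(y)
    for test_idx in kfold_indices(len(y), k, seed):
        held_out = set(test_idx)
        train = [i for i in range(len(y)) if i not in held_out]
        # Stand-in model (assumption): a straight line, NOT a random forest.
        slope, intercept = fit_line([x[i] for i in train], [y[i] for i in train])
        for i in test_idx:
            estimated[i] = slope * x[i] + intercept
    return evaluate(y, estimated)


# Synthetic data only: a hypothetical predictor and noisy "observed" PM2.5.
rng = random.Random(42)
predictor = [rng.uniform(5, 100) for _ in range(500)]
pm25_obs = [0.9 * p + 3 + rng.gauss(0, 4) for p in predictor]
stats = cross_validate(predictor, pm25_obs)
print({name: round(value, 3) for name, value in stats.items()})
```

Because every sample is estimated by a model that never saw it during training, the resulting statistics describe out-of-sample skill. Swapping the stand-in line for a random forest regressor would change only the two lines inside the fold loop.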
format Article
id doaj-art-917cecd443d2486f8dd92c4fccc907a5
institution Kabale University
issn 1680-8584
2071-1409
language English
publishDate 2021-09-01
publisher Springer
record_format Article
series Aerosol and Air Quality Research
spelling doaj-art-917cecd443d2486f8dd92c4fccc907a5
2025-02-09T12:20:38Z
eng
Springer
Aerosol and Air Quality Research
1680-8584
2071-1409
2021-09-01
vol. 21, no. 11, pp. 1–13
10.4209/aaqr.210105
Machine Learning Algorithm for Estimating Surface PM2.5 in Thailand
Pawan Gupta (Universities Space Research Association (USRA))
Shanshan Zhan (Earth System Science Center, The University of Alabama in Huntsville)
Vikalp Mishra (Earth System Science Center, The University of Alabama in Huntsville)
Aekkapol Aekakkararungroj (Asian Disaster Preparedness Center)
Amanda Markert (Earth System Science Center, The University of Alabama in Huntsville)
Sarawut Paibong (Thai Pollution Control Department)
Farrukh Chishtie (Asian Disaster Preparedness Center)
https://doi.org/10.4209/aaqr.210105
Thailand; MERRA2; PM2.5; Air quality; Machine learning
title Machine Learning Algorithm for Estimating Surface PM2.5 in Thailand
topic Thailand
MERRA2
PM2.5
Air quality
Machine learning
url https://doi.org/10.4209/aaqr.210105