Performance of emergency triage prediction of an open access natural language processing based chatbot application (ChatGPT): A preliminary, scenario-based cross-sectional study


Bibliographic Details
Main Authors: İbrahim Sarbay, Göksu Bozdereli Berikol, İbrahim Ulaş Özturan
Format: Article
Language: English
Published: Wolters Kluwer Medknow Publications 2023-07-01
Series: Turkish Journal of Emergency Medicine
Online Access: https://journals.lww.com/10.4103/tjem.tjem_79_23
collection DOAJ
description OBJECTIVES: Artificial intelligence companies have recently been increasing their initiatives to improve the results of chatbots, which are software programs that can converse with a human in natural language. The role of chatbots in health care is deemed worthy of research. OpenAI’s ChatGPT is a supervised and empowered machine learning-based chatbot. The aim of this study was to determine the performance of ChatGPT in emergency medicine (EM) triage prediction.

METHODS: This was a preliminary, cross-sectional study conducted with case scenarios generated by the researchers based on the emergency severity index (ESI) handbook v4 cases. Two independent EM specialists who were experts in the ESI triage scale determined the triage category for each case. A third independent EM specialist was consulted as arbiter, if necessary. The consensus result for each case scenario was taken as the reference triage category. Subsequently, each case scenario was submitted to ChatGPT and the answer was recorded as the index triage category. Inconsistent classifications between ChatGPT and the reference category were defined as over-triage (false positive) or under-triage (false negative).

RESULTS: Fifty case scenarios were assessed in the study. Reliability analysis showed fair agreement between EM specialists and ChatGPT (Cohen’s Kappa: 0.341). Eleven cases (22%) were over-triaged and 9 cases (18%) were under-triaged by ChatGPT. In 9 cases (18%), ChatGPT reported two consecutive triage categories, one of which matched the expert consensus. It had an overall sensitivity of 57.1% (95% confidence interval [CI]: 34–78.2), specificity of 34.5% (95% CI: 17.9–54.3), positive predictive value (PPV) of 38.7% (95% CI: 21.8–57.8), negative predictive value (NPV) of 52.6% (95% CI: 28.9–75.6), and an F1 score of 0.461. In high acuity cases (ESI-1 and ESI-2), ChatGPT showed a sensitivity of 76.2% (95% CI: 52.8–91.8), specificity of 93.1% (95% CI: 77.2–99.2), PPV of 88.9% (95% CI: 65.3–98.6), NPV of 84.4% (95% CI: 67.2–94.7), and an F1 score of 0.821. The receiver operating characteristic curve showed an area under the curve of 0.846 (95% CI: 0.724–0.969, P < 0.001) for high acuity cases.

CONCLUSION: The performance of ChatGPT was best when predicting high acuity cases (ESI-1 and ESI-2). It may be useful when determining the cases requiring critical care. When trained with more medical knowledge, ChatGPT may be more accurate for other triage category predictions.
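The metrics reported in the RESULTS (sensitivity, specificity, PPV, NPV, F1) all follow from a 2x2 confusion matrix, and a minimal Python sketch can make the arithmetic concrete. The abstract does not state the cell counts. The counts used below (16 true positives, 2 false positives, 5 false negatives, 27 true negatives) are an assumption, back-calculated so that they sum to the 50 scenarios and are consistent with the reported high acuity percentages. The function name `diagnostic_metrics` is hypothetical and is not taken from the study.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Compute standard diagnostic accuracy metrics from 2x2 confusion matrix counts."""
    sensitivity = tp / (tp + fn)   # share of true high acuity cases that were flagged
    specificity = tn / (tn + fp)   # share of non-high-acuity cases correctly not flagged
    ppv = tp / (tp + fp)           # positive predictive value (precision)
    npv = tn / (tn + fn)           # negative predictive value
    # F1 is the harmonic mean of PPV (precision) and sensitivity (recall)
    f1 = 2 * ppv * sensitivity / (ppv + sensitivity)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "npv": npv,
        "f1": f1,
    }


# ASSUMED counts: back-calculated from the reported percentages, not given in the abstract.
high_acuity = diagnostic_metrics(tp=16, fp=2, fn=5, tn=27)

for name, value in high_acuity.items():
    print(f"{name}: {value:.3f}")
```

Under these assumed counts, the computed values round to the figures given for high acuity cases. The confidence intervals and the area under the curve are not reproduced here because they depend on methods the abstract does not describe.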
institution Kabale University
issn 2452-2473
citation İbrahim Sarbay, Göksu Bozdereli Berikol, İbrahim Ulaş Özturan. Performance of emergency triage prediction of an open access natural language processing based chatbot application (ChatGPT): A preliminary, scenario-based cross-sectional study. Turkish Journal of Emergency Medicine. 2023-07-01;23(3):156-161. doi:10.4103/tjem.tjem_79_23
topic chatbot
chatgpt
emergency severity index
triage