Speech-to-text models to transcribe emergency calls
Abstract
This thesis is part of the larger project “AI-Support in Medical Emergency Calls (AISMEC)”, which aims to develop a decision support system that helps Emergency Medical Communication Center (EMCC) operators better identify and respond to acute stroke. The system will use historical health data and a transcription of the emergency call to assist the EMCC operator in deciding whether to dispatch an ambulance, and with what priority and urgency. Our research primarily focuses on adapting the Automatic Speech Recognition (ASR) model Whisper to create a robust and accurate ASR model for transcribing Norwegian emergency calls. The model was fine-tuned on simulated emergency calls and on recordings we made ourselves. Furthermore, a proof-of-concept ASR web application was developed to streamline the manual task of transcribing emergency calls. After we demonstrated the application to the researchers involved in AISMEC and to potential users, both groups expressed optimism about its potential to streamline the transcription process. As part of our research, we conducted an experiment in which we took the transcriptions suggested by the application and corrected them for accuracy; this approach notably reduced our transcription time. We also found it feasible to establish a machine learning pipeline for fine-tuning the model on historical emergency calls. Further work would involve training the model on actual emergency calls. To investigate the efficiency of the ASR web application further, the semi-automatic transcription experiment could be conducted at a larger scale by the professional audio transcribers at Haukeland universitetssjukehus.