Implementation of a Radiology Speech Recognition System for Estonian Using Open Source Software
Abstract
Speech recognition has become increasingly popular in radiology reporting over the last decade. However, developing a speech recognition system for a new language in a highly specific domain requires considerable resources, expert knowledge and skills. As a result, commercial vendors do not offer ready-made radiology speech recognition systems for less-resourced languages. This paper describes the implementation of a radiology speech recognition system for Estonian, a language with fewer than one million native speakers. The system was developed in partnership with a hospital that provided a corpus of written reports for language modeling. Rewrite rules for pre-processing training texts and post-processing recognition results were written manually with the Thrax toolkit, based on a small parallel corpus created by the hospital’s radiologists. Deep neural network-based acoustic models were trained on 216 hours of out-of-domain data and adapted on 14 hours of spoken radiology data, using the Kaldi toolkit. The current word error rate of the system is 5.4%. The system is in active use in a real clinical environment.