Summary
- In a team of two students, we developed a speech-recognition system using weighted finite-state transducers (WFSTs) and a Viterbi decoder.
- Our goal was to reduce the word error rate and increase the computational efficiency of the algorithm.
- We did so by conducting various experiments on the system:
  - Tuning transition probabilities, self-loop probabilities, and final probabilities.
  - Testing different WFSTs based on unigram or n-gram word-occurrence probabilities, and adding optional silences between words.
  - Enhancing the Viterbi decoder by pruning the search tree with different strategies.
  - Improving the efficiency of the decoder by using a tree-structured lexicon with language-model look-ahead.
- We used the library openfst_python to model the WFSTs in this project and performed our experiments in a Jupyter notebook.
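To illustrate the pruning idea, here is a minimal sketch of Viterbi decoding with beam pruning. It is not our actual decoder: it runs on a hypothetical toy HMM in plain Python rather than on our WFSTs, and all probabilities are invented for illustration. At each time step, only the `beam_width` highest-scoring partial paths survive, which trades a small risk of search errors for a large reduction in explored hypotheses.

```python
import math

def viterbi_beam(obs, states, start_p, trans_p, emit_p, beam_width=3):
    """Viterbi decoding with beam pruning: after each time step, only the
    beam_width best-scoring partial paths are kept (log domain)."""
    paths = {}  # state -> (log score of best partial path, path so far)
    for s in states:
        p = start_p.get(s, 0.0) * emit_p[s].get(obs[0], 0.0)
        if p > 0.0:
            paths[s] = (math.log(p), [s])
    for o in obs[1:]:
        new_paths = {}
        for s, (lp, path) in paths.items():
            for ns in states:
                p = trans_p[s].get(ns, 0.0) * emit_p[ns].get(o, 0.0)
                if p == 0.0:
                    continue
                score = lp + math.log(p)
                if ns not in new_paths or score > new_paths[ns][0]:
                    new_paths[ns] = (score, path + [ns])
        # Prune: keep only the beam_width most promising hypotheses.
        paths = dict(sorted(new_paths.items(), key=lambda kv: kv[1][0],
                            reverse=True)[:beam_width])
    best = max(paths, key=lambda s: paths[s][0])
    return paths[best][1]

# Hypothetical toy HMM (numbers invented for illustration).
states = ["Rainy", "Sunny"]
start_p = {"Rainy": 0.6, "Sunny": 0.4}
trans_p = {"Rainy": {"Rainy": 0.7, "Sunny": 0.3},
           "Sunny": {"Rainy": 0.4, "Sunny": 0.6}}
emit_p = {"Rainy": {"walk": 0.1, "shop": 0.4, "clean": 0.5},
          "Sunny": {"walk": 0.6, "shop": 0.3, "clean": 0.1}}
best_path = viterbi_beam(["walk", "shop", "clean"], states,
                         start_p, trans_p, emit_p, beam_width=2)
```

With `beam_width=2` (no effective pruning in this two-state model) the sketch recovers the exact Viterbi path; an aggressive `beam_width=1` can commit to a locally best state too early and return a suboptimal path, which is exactly the trade-off we explored experimentally.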
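The tree-structured lexicon with language-model look-ahead can be sketched as follows. This is a simplified plain-Python version, not our openfst_python implementation, and the toy lexicon and unigram probabilities are invented: words sharing a pronunciation prefix share trie nodes, and each node caches the best unigram log-probability of any word below it, so the decoder can apply (and prune on) language-model scores before a word is complete.

```python
import math

def build_tree_lexicon(lexicon, unigram_lp):
    """Build a prefix tree (trie) over a pronunciation lexicon.
    lexicon: word -> list of phones; unigram_lp: word -> unigram log-prob.
    Each node stores a look-ahead score: the maximum unigram log-prob of
    any word reachable below that node."""
    def node():
        return {"children": {}, "word": None, "lookahead": float("-inf")}
    root = node()
    for word, phones in lexicon.items():
        lp = unigram_lp[word]
        cur = root
        cur["lookahead"] = max(cur["lookahead"], lp)
        for ph in phones:
            cur = cur["children"].setdefault(ph, node())
            cur["lookahead"] = max(cur["lookahead"], lp)
        cur["word"] = word  # the word identity is only known at the leaf
    return root

# Hypothetical toy lexicon and unigram probabilities.
lex = {"cat": ["k", "ae", "t"], "cab": ["k", "ae", "b"], "dog": ["d", "ao", "g"]}
lm = {"cat": math.log(0.5), "cab": math.log(0.2), "dog": math.log(0.3)}
tree = build_tree_lexicon(lex, lm)
```

Here "cat" and "cab" share the nodes for "k" and "ae", and the "k" node carries the look-ahead score of "cat" (the more probable word below it), so hypotheses entering the "k" branch are scored optimistically until the words diverge.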
Main learnings
- I learned how to communicate my ideas clearly to a teammate with a different background (computational linguistics). This was an interesting experience, since in most of my other projects all team members study computer science.
- Additionally, I now clearly understand how automatic speech recognition can work without neural networks, even though most current systems rely on them.