Neural Oracle Search on N-best Hypotheses

Series
Applied and Computational Mathematics Seminar
Time
Monday, September 12, 2022 - 2:00pm for 1 hour (actually 50 minutes)
Location
Skiles 005 and https://gatech.zoom.us/j/98355006347
Speaker
Tongzhou Chen – Google – tongzhou@google.com
Organizer
Martin Short

In this talk, we propose a Neural Oracle Search (NOS) model for Automatic Speech Recognition (ASR) that selects the most likely hypothesis given a sequence of acoustic representations and multiple hypotheses as input. The model produces a sequence-level score for each audio-hypothesis pair by integrating information from multiple sources: the input acoustic representations, the N-best hypotheses, additional 1st-pass statistics, and unpaired textual information through an external language model. These scores map the search problem of identifying the most likely hypothesis to a sequence classification problem. The definition of the proposed model is broad enough to allow its use either as an alternative to beam search in the 1st pass or as a 2nd-pass rescoring step. The model achieves up to 12% relative reduction in Word Error Rate (WER) across several languages over state-of-the-art baselines, with relatively few additional parameters. In addition, we investigate the use of the NOS model on a 1st-pass multilingual model and show that, like the 1st-pass model, the NOS model can be made multilingual.
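
As a rough illustration of the selection step and the metric mentioned in the abstract, the Python sketch below treats picking a hypothesis as classification over the N-best list and computes WER and a relative reduction. It is not the speaker's model: the scores passed to `select_hypothesis` are placeholders standing in for whatever a trained scorer would output, and all function names here are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def softmax(scores: Sequence[float]) -> List[float]:
    """Normalize raw scores into a probability distribution."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def select_hypothesis(
    hypotheses: Sequence[str], scores: Sequence[float]
) -> Tuple[str, List[float]]:
    """Pick the hypothesis with the highest score.

    Assumption: `scores` holds one sequence-level score per hypothesis,
    produced elsewhere (here they are just numbers supplied by the caller).
    """
    if not hypotheses or len(hypotheses) != len(scores):
        raise ValueError("need one score per hypothesis")
    probs = softmax(scores)
    best = max(range(len(probs)), key=probs.__getitem__)
    return hypotheses[best], probs


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        curr = [i]
        for j, h in enumerate(hyp, 1):
            curr.append(min(prev[j] + 1,               # deletion
                            curr[j - 1] + 1,           # insertion
                            prev[j - 1] + (r != h)))   # substitution or match
        prev = curr
    return prev[-1] / len(ref)


def relative_reduction(baseline_wer: float, new_wer: float) -> float:
    """Relative WER reduction, e.g. 0.12 for a 12% relative improvement."""
    return (baseline_wer - new_wer) / baseline_wer


if __name__ == "__main__":
    # Made-up N-best list and scores, for illustration only.
    nbest = ["the cat sat", "the bat sat", "a cat sat"]
    best, probs = select_hypothesis(nbest, [2.1, 0.3, 1.0])
    print(best, [round(p, 3) for p in probs])
    print(word_error_rate("the cat sat", best))
```

Under this reading, a hypothetical baseline WER of 10.0% dropping to 8.8% would correspond to a 12% relative reduction, since `relative_reduction(10.0, 8.8)` is about 0.12.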