- Series
- Applied and Computational Mathematics Seminar
- Time
- Monday, September 12, 2022 - 2:00pm for 1 hour (actually 50 minutes)
- Location
- Skiles 005 and https://gatech.zoom.us/j/98355006347
- Speaker
- Tongzhou Chen – Google – tongzhou@google.com
- Organizer
- Martin Short
In this talk, we propose a Neural Oracle Search (NOS) model for Automatic Speech Recognition (ASR) that selects the most likely hypothesis, taking a sequence of acoustic representations and multiple hypotheses as input. The model assigns a sequence-level score to each audio-hypothesis pair by integrating information from multiple sources: the input acoustic representations, the N-best hypotheses, additional 1st-pass statistics, and unpaired textual information supplied through an external language model. These scores are then used to map the search problem of identifying the most likely hypothesis to a sequence classification problem. The definition of the proposed model is broad enough to allow its use either as an alternative to beam search in the 1st pass or as a 2nd-pass rescoring step. The model achieves up to 12% relative reduction in Word Error Rate (WER) across several languages over state-of-the-art baselines, with relatively few additional parameters. In addition, we investigate the use of the NOS model on a 1st-pass multilingual model and show that, like the 1st-pass model, the NOS model can be made multilingual.
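As a rough illustration of the idea of treating hypothesis selection as classification over an N-best list, the sketch below scores each hypothesis and picks the most probable one. It is not the NOS model from the talk: the linear scorer, the feature layout (a 1st-pass log-probability and an external language model log-probability), the weights, and all function names are assumptions made up for this example. A small helper also shows how a relative WER reduction is computed.

```python
import math


def softmax(scores):
    """Normalize raw scores into a probability distribution."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def select_hypothesis(hypotheses, features, weights):
    """Pick one hypothesis from an N-best list.

    hypotheses: list of N candidate transcripts.
    features:   list of N feature vectors, one per hypothesis
                (hypothetical layout: [first_pass_logp, lm_logp]).
    weights:    one weight per feature (a stand-in for a learned scorer).
    """
    # One sequence-level score per audio-hypothesis pair.
    scores = [sum(w * f for w, f in zip(weights, feats)) for feats in features]
    # Classification view: a distribution over the N candidates.
    probs = softmax(scores)
    best = max(range(len(hypotheses)), key=lambda i: probs[i])
    return hypotheses[best], probs


def relative_wer_reduction(baseline_wer, new_wer):
    """Relative reduction, e.g. 10.0 -> 8.8 is a 12% relative reduction."""
    return (baseline_wer - new_wer) / baseline_wer


# Toy N-best list with invented feature values.
nbest = ["recognize speech", "wreck a nice beach"]
feats = [[-1.0, -2.0], [-1.5, -4.0]]
choice, probs = select_hypothesis(nbest, feats, weights=[1.0, 0.5])
print(choice, probs)
```

In a real system the hand-set weights would be replaced by a trained neural scorer that also consumes the acoustic representations; the selection step over the normalized scores would stay the same.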