US 7,457,745 B2
Method and apparatus for fast on-line automatic speaker/environment adaptation for speech/speaker recognition in the presence of changing environments
Shubha Kadambe, Thousand Oaks, Calif. (US); Ron Burns, Oceanside, Calif. (US); and Markus Iseli, Los Angeles, Calif. (US)
Assigned to HRL Laboratories, LLC, Malibu, Calif. (US)
Filed on Dec. 03, 2003, as Appl. No. 10/728,106.
Claims priority of provisional application 60/430788, filed on Dec. 03, 2002.
Prior Publication US 2004/0230420 A1, Nov. 18, 2004
Int. Cl. G10L 19/00 (2006.01)
U.S. Cl. 704—216  [704/243; 704/244] 123 Claims
1. A method for fast on-line automatic speaker/environment adaptation suitable for speech/speaker recognition in the presence of changing environmental conditions, the method comprising acts of:
performing front-end processing on an acoustic input signal, wherein the front-end processing generates MEL frequency cepstral features representative of the acoustic input signal;
performing recognition and adaptation by:
providing the MEL frequency cepstral features to a speech recognizer, wherein the speech recognizer utilizes the MEL frequency cepstral features and a current list of acoustic training models to determine at least one best hypothesis;
receiving, from the speech recognizer, at least one best hypothesis, associated acoustic training models, and associated probabilities;
computing a pre-adaptation acoustic score by recognizing an utterance using the associated acoustic training models;
choosing acoustic training models from the associated acoustic training models;
performing adaptation on the chosen associated acoustic training models;
computing a post-adaptation acoustic score by recognizing the utterance using the adapted acoustic training models;
comparing the pre-adaptation acoustic score with the post-adaptation acoustic score to check for improvement;
modifying the current list of acoustic training models to include the adapted acoustic training models, if the acoustic score improved after performing adaptation; and
performing recognition and adaptation iteratively until the acoustic score ceases to improve;
choosing the best hypothesis as recognized words once the acoustic score ceases to improve; and
outputting the recognized words.
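The front-end step of claim 1 only states that MEL frequency cepstral features are generated. It does not fix any parameters. The sketch below shows a textbook MFCC pipeline: Hamming window, power spectrum, triangular mel filterbank, log, and DCT-II. Every parameter here is an illustrative assumption and does not come from the patent: 8 kHz audio, 25 ms frames (200 samples), 10 ms hop (80 samples), a 256-point transform, 20 filters and 13 coefficients. The names `mfcc` and `mel_filterbank` are hypothetical. Pre-emphasis and liftering are omitted, and a naive DFT stands in for an FFT so the code stays dependency-free.

```python
import math

def hz_to_mel(f):
    """Convert frequency in Hz to the mel scale (common 2595*log10 form)."""
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters spaced evenly on the mel scale over bins 0..n_fft//2."""
    n_bins = n_fft // 2 + 1
    low, high = hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0)
    mels = [low + (high - low) * i / (n_filters + 1) for i in range(n_filters + 2)]
    bins = [int((n_fft + 1) * mel_to_hz(m) / sample_rate) for m in mels]
    bank = []
    for j in range(1, n_filters + 1):
        left, center, right = bins[j - 1], bins[j], bins[j + 1]
        filt = [0.0] * n_bins
        for k in range(left, center):        # rising edge
            filt[k] = (k - left) / (center - left)
        for k in range(center, right):       # falling edge, peak 1.0 at center
            filt[k] = (right - k) / (right - center)
        bank.append(filt)
    return bank

def mfcc(signal, sample_rate=8000, frame_len=200, hop=80,
         n_fft=256, n_filters=20, n_ceps=13):
    """Return one list of n_ceps cepstral coefficients per analysis frame."""
    n_bins = n_fft // 2 + 1
    window = [0.54 - 0.46 * math.cos(2 * math.pi * n / (frame_len - 1))
              for n in range(frame_len)]                       # Hamming window
    bank = mel_filterbank(n_filters, n_fft, sample_rate)
    # Precomputed twiddle factors for a naive DFT over bins 0..n_fft//2.
    cos_t = [[math.cos(2 * math.pi * k * n / n_fft) for n in range(n_fft)]
             for k in range(n_bins)]
    sin_t = [[math.sin(2 * math.pi * k * n / n_fft) for n in range(n_fft)]
             for k in range(n_bins)]
    features = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [signal[start + n] * window[n] for n in range(frame_len)]
        frame += [0.0] * (n_fft - frame_len)                   # zero-pad to n_fft
        power = []
        for k in range(n_bins):
            re = sum(x * c for x, c in zip(frame, cos_t[k]))
            im = sum(x * s for x, s in zip(frame, sin_t[k]))
            power.append((re * re + im * im) / n_fft)
        # Log mel-filterbank energies, floored to avoid log(0) on silence.
        log_e = [math.log(max(sum(w * p for w, p in zip(f, power)), 1e-10))
                 for f in bank]
        # DCT-II of the log energies yields the cepstral coefficients.
        ceps = [sum(e * math.cos(math.pi * c * (m + 0.5) / n_filters)
                    for m, e in enumerate(log_e)) for c in range(n_ceps)]
        features.append(ceps)
    return features
```

For example, `mfcc([math.sin(2 * math.pi * 1000 * n / 8000) for n in range(800)])` turns 0.1 s of a 1 kHz tone into 8 frames of 13 coefficients each. These feature vectors are what the claim passes to the speech recognizer.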
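The loop recited in claim 1 runs as follows: recognize, score before adaptation, adapt the chosen models, score again, keep the adapted models only if the score improved, and repeat until it stops improving. The sketch below shows that control flow under heavily simplified, hypothetical assumptions. Each word is modeled by a single diagonal Gaussian (`Gaussian`). The "recognizer" picks the word whose model gives the highest average log-likelihood. Only the model of the current best hypothesis is chosen for adaptation. Claim 1 does not name an adaptation method, so a MAP mean update with an assumed prior weight `tau` is swapped in purely for illustration. All function names, `min_gain`, and `max_iters` are hypothetical.

```python
import math
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Gaussian:
    """Hypothetical acoustic model: one diagonal-covariance Gaussian per word."""
    mean: tuple
    var: tuple

def frame_loglik(model, frame):
    """Log density of one feature vector under a diagonal Gaussian."""
    return sum(-0.5 * (math.log(2 * math.pi * v) + (x - m) ** 2 / v)
               for x, m, v in zip(frame, model.mean, model.var))

def acoustic_score(model, frames):
    """Average per-frame log-likelihood of the utterance (higher is better)."""
    return sum(frame_loglik(model, f) for f in frames) / len(frames)

def recognize(models, frames):
    """Toy isolated-word recognizer: best hypothesis is the top-scoring model."""
    return max(models, key=lambda word: acoustic_score(models[word], frames))

def map_adapt(model, frames, tau=10.0):
    """Assumed adaptation step: MAP update of the mean, variances unchanged."""
    n = len(frames)
    mean = tuple((tau * m + sum(f[d] for f in frames)) / (tau + n)
                 for d, m in enumerate(model.mean))
    return replace(model, mean=mean)

def recognize_and_adapt(models, frames, min_gain=1e-6, max_iters=50):
    models = dict(models)                      # current list of acoustic models
    for _ in range(max_iters):
        best = recognize(models, frames)       # best hypothesis
        pre = acoustic_score(models[best], frames)
        adapted = map_adapt(models[best], frames)
        post = acoustic_score(adapted, frames)
        if post - pre <= min_gain:             # score ceased to improve
            break
        models[best] = adapted                 # keep the improved model
    return recognize(models, frames), models   # recognized word, adapted models
```

With models `"yes"` at mean `(0.0, 0.0)` and `"no"` at mean `(3.0, 3.0)`, an utterance of frames `(1.3, 1.3)` and `(1.5, 1.5)` is recognized as `"yes"`. Over successive iterations the mean of the `"yes"` model is pulled toward the utterance mean `(1.4, 1.4)`, and `"no"` is left untouched. The `min_gain` threshold and `max_iters` cap are practical guards added here so that "ceases to improve" terminates under floating-point arithmetic.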