Identifying substitute pronunciations

  • US 9,747,897 B2
  • Filed: 12/17/2013
  • Issued: 08/29/2017
  • Est. Priority Date: 12/17/2013
  • Status: Active Grant
First Claim

1. A computer-implemented method comprising:

  • selecting, by a system that includes (i) a confusion matrix manager configured to store, for like-pronunciation groups, phones of expected phonetic transcriptions as substitute pronunciations for corresponding phones of actual phonetic transcriptions in a confusion matrix and (ii) an enhanced speech recognizer configured to obtain, from the confusion matrix manager, using output of an acoustic model associated with the enhanced speech recognizer, the substitute pronunciations before inputting the substitute pronunciations to a language model associated with the enhanced speech recognizer, one or more terms;

    obtaining, by the system, an expected phonetic transcription of an idealized native speaker of a natural language speaking the one or more terms;

    after obtaining the expected phonetic transcription of the idealized native speaker of the natural language speaking the one or more terms, receiving audio data corresponding to a particular user that is not the idealized native speaker of the natural language speaking the one or more terms in the natural language;

    receiving data identifying a like-pronunciation group associated with the particular user;

    obtaining, by the system, based on the audio data, an actual phonetic transcription of the particular user that is not the idealized native speaker of the natural language speaking the one or more terms in the natural language;

    aligning, by the system, the expected phonetic transcription of the idealized native speaker of the natural language with the actual phonetic transcription of the particular user that is not the idealized native speaker of the natural language;

    identifying, by the system, based on aligning the expected phonetic transcription of the idealized native speaker with the actual phonetic transcription of the particular user that is not the idealized native speaker of the natural language, one or more phones of the expected phonetic transcription that is different than one or more corresponding phones of the actual phonetic transcription;

    in response to identifying, based on aligning the expected phonetic transcription of the idealized native speaker with the actual phonetic transcription of the particular user that is not the idealized native speaker of the natural language, designating, by the system, the one or more phones of the expected phonetic transcription as a substitute pronunciation for the corresponding phones of the actual phonetic transcription for other terms that (i) are spoken by other users that are also associated with the like-pronunciation group, and (ii) have a respective phonetic transcription that includes the one or more corresponding phones;

    obtaining, by the enhanced speech recognizer that is configured to obtain, from the confusion matrix manager, using output of the acoustic model associated with the enhanced speech recognizer, the substitute pronunciations before inputting the substitute pronunciations to the language model associated with the enhanced speech recognizer, a transcription of another term that (i) is spoken by another user that is also associated with the like-pronunciation group, and (ii) has a phonetic transcription that includes the one or more corresponding phones based on the one or more phones of the expected phonetic transcription being designated as a substitute pronunciation for the one or more corresponding phones of the actual phonetic transcription; and

    inputting, by the enhanced speech recognizer, using the transcription of another term that (i) is spoken by another user that is also associated with the like-pronunciation group, and (ii) has a phonetic transcription that includes the one or more corresponding phones, the one or more phones of the expected phonetic transcription that is designated as the substitute pronunciation for the one or more corresponding phones of the actual phonetic transcription to the language model associated with the enhanced speech recognizer.
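
Illustrative sketch (not part of the patent text): the alignment, designation, and substitution steps recited in claim 1 can be pictured with a small Python example. Everything named here is an assumption made for illustration: the function `align_substitutions`, the class `ConfusionMatrix`, the group label `"group-A"`, and the ARPAbet-style phone symbols do not appear in the claim. The standard library's `difflib.SequenceMatcher` stands in for the unspecified alignment procedure, and a plain per-group dictionary stands in for the confusion matrix; the claim does not say how either is actually implemented.

```python
from collections import defaultdict
from difflib import SequenceMatcher


def align_substitutions(expected, actual):
    """Align two phone sequences and return (actual_phone, expected_phone)
    pairs for positions where the aligned phones differ.

    Assumption: SequenceMatcher is used as a stand-in aligner; the claim
    does not name an alignment algorithm.
    """
    matcher = SequenceMatcher(a=actual, b=expected, autojunk=False)
    pairs = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        # Keep only equal-length 'replace' spans, which map phone to phone.
        # Insertions and deletions are ignored in this sketch.
        if tag == "replace" and (i2 - i1) == (j2 - j1):
            pairs.extend(zip(actual[i1:i2], expected[j1:j2]))
    return pairs


class ConfusionMatrix:
    """Hypothetical store: like-pronunciation group -> {actual: expected}."""

    def __init__(self):
        self._table = defaultdict(dict)

    def designate(self, group, actual_phone, expected_phone):
        # Record the expected phone as the substitute for the actual phone.
        self._table[group][actual_phone] = expected_phone

    def substitute(self, group, phones):
        # Apply the group's substitutions; unknown groups change nothing.
        mapping = self._table.get(group, {})
        return [mapping.get(p, p) for p in phones]


# Expected (idealized speaker) vs. actual (particular user) for "three".
expected = ["th", "r", "iy"]
actual = ["t", "r", "iy"]

matrix = ConfusionMatrix()
for actual_phone, expected_phone in align_substitutions(expected, actual):
    matrix.designate("group-A", actual_phone, expected_phone)

# Another user in the same group says "thanks" with the same substitution.
acoustic_output = ["t", "ae", "ng", "k", "s"]
print(matrix.substitute("group-A", acoustic_output))
# prints ['th', 'ae', 'ng', 'k', 's']
```

In this sketch the substituted phone sequence is what would be handed to the language model. A one-to-one dictionary is deliberately naive: it would also rewrite a correctly spoken "t", so a practical system would presumably need counts, probabilities, or context to decide when a substitution applies, none of which the claim specifies.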
