System and method for learning alternate pronunciations for speech recognition

US 9,489,943 B2
Filed: 10/16/2014
Issued: 11/08/2016
Est. Priority Date: 10/16/2013
Status: Active Grant

Abstract

A system and method for learning alternate pronunciations for speech recognition is disclosed. Alternative name pronunciations may be covered, through pronunciation learning, that have not been previously covered in a general pronunciation dictionary. In an embodiment, the detection of phone-level and syllable-level mispronunciations in words and sentences may be based on acoustic models trained by Hidden Markov Models. Mispronunciations may be detected by comparing the likelihood of the potential state of the targeting pronunciation unit with a pre-determined threshold through a series of tests. It is also within the scope of an embodiment to detect accents.
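
The threshold comparison mentioned in the abstract can be pictured with a very small sketch. Everything here is an assumption made for illustration: the function name `flag_mispronunciations`, the per-unit log-likelihood scores, and the threshold value are hypothetical, and the abstract does not say how the scores are computed beyond the use of HMM-based acoustic models.

```python
def flag_mispronunciations(unit_scores, threshold):
    """Return the pronunciation units whose score falls below the threshold.

    unit_scores: list of (unit, score) pairs, e.g. per-phone log-likelihoods
                 (hypothetical values; the scoring itself is not shown here).
    threshold:   pre-determined cutoff; lower scores are treated as suspect.
    """
    return [unit for unit, score in unit_scores if score < threshold]


# Hypothetical scores for the phones of one spoken word.
scores = [("T", -2.1), ("AH", -9.7), ("M", -1.8)]
flagged = flag_mispronunciations(scores, threshold=-5.0)
```

In this toy input only the middle phone scores below the cutoff, so it is the only unit flagged. A real system would presumably apply several such tests in sequence, as the abstract suggests.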

37 Claims

1. A method for learning pronunciation in a given language comprising the steps of:
- a. training an acoustic model on a large speech corpus to distinguish phonemes;
  
  b. constructing a phoneme confusion matrix;
  
  c. constructing a phoneme replacement candidate list for each phoneme in a set of speech data containing pronunciations for recognition;
  
  d. learning alternative pronunciations of a word that has been mispronounced;
  
  e. combining said learned alternative pronunciations with a linguistic dictionary to create a pooled dictionary; and
  
  f. pruning said pooled dictionary to limit the number of learned alternative pronunciations in order to create an improved dictionary.
  - 2. The method of claim 1, wherein the acoustic model in step (a) is trained by one of:
    - maximum likelihood criterion and discriminative training criterion.
  - 3. The acoustic model of claim 1, wherein said acoustic model in step (a) is based on a Hidden Markov Model and Gaussian Mixture Model.
  - 4. The method of claim 1, wherein step (b) further comprises the step of merging an acoustic confusion matrix with a linguistic confusion matrix to construct said phoneme confusion matrix.
  - 5. The method of claim 4, wherein said acoustic confusion matrix is obtained by performing a phoneme recognition experiment on a test set of speech data.
  - 6. The acoustic confusion matrix of claim 5, wherein a low value in the acoustic confusion matrix indicates a phoneme is similar to other phonemes and confusable.
  - 7. The linguistic confusion matrix of claim 4, wherein the linguistic confusion matrix is provided by a group of linguistic experts.
  - 8. The linguistic confusion matrix of claim 7, wherein the linguistic confusion matrix comprises a binary matrix comprising the numbers 0 and 1.
  - 9. The linguistic confusion matrix of claim 8, wherein 0 indicates that a phoneme belongs to the same confusion cluster as an other phoneme and the phonemes are confusable.
  - 10. The method of claim 1, wherein step (c) further comprises the steps of:
    - a. selecting a phoneme from the speech data as a target phoneme for analysis and arranging the remaining phonemes based on distance to the target phoneme;
      
      b. applying a statistical clustering algorithm to similarly group the arranged phonemes;
      
      c. constructing the list of phoneme replacement candidates for the target phoneme from the similarly grouped phonemes; and
      
      d. repeating all of the steps for each phoneme in the speech data set.
  - 11. The method of claim 10, wherein the distance between a phoneme and said target phoneme in step (a) represents a confusion value in a phoneme confusion matrix.
  - 12. The phoneme confusion matrix of claim 11, wherein a low value indicates high confusion between a phoneme and a target phoneme.
  - 13. The method of claim 1, wherein step (d) further comprises the steps of:
    - a. obtaining the original pronunciation for each word that has been misrecognized;
      
      b. generating an alternative pronunciation data set wherein an improved new pronunciation is compared with an original pronunciation;
      
      c. performing recognition on the alternative pronunciation data set, the acoustic model, and the set of speech data;
      
      d. determining the best pronunciation from the alternative pronunciation data set;
      
      e. retaining selected pronunciations from said alternative pronunciation data set; and
      
      f. repeating the steps for all misrecognized words to form the learned pronunciation data set.
  - 14. The method of claim 13, wherein the original pronunciation in step (a) is obtained from one of:
    - a linguistic dictionary and an automatic word-to-phoneme generator.
  - 15. The method of claim 13, wherein step (b) further comprises the steps of:
    - a. placing groups of phonemes in their respective positions; and
      
      b. obtaining all of the phoneme combinations.
  - 16. The determination of the best pronunciation of claim 13, wherein the best pronunciation of the word in step (d) results in the highest recognition accuracy.
  - 17. The method of claim 16, wherein step (b) further comprises the step of:
    - determining the size of the alternative pronunciation data set by the mathematical equation;
      
      $X = \prod_{m=1}^{M} N_m$.
  - 18. The method of claim 13, wherein the recognition of step (c) is performed using a Viterbi decoding algorithm.
  - 19. The linguistic dictionary of claim 1, wherein the linguistic dictionary comprises a set of pronunciations of common words in a language and is provided by a group of linguistic experts.
  - 20. The method of claim 1, wherein creation of the improved dictionary in step (f) further comprises the steps of:
    - a. computing a distance from each word to an other word in the linguistic dictionary;
      
      b. creating a subset of similar words for each misrecognized word;
      
      c. performing recognition on the subset of similar words, the acoustic model and the set of speech;
      
      d. identifying the frequency of failure;
      
      e. removing a pronunciation contributing to frequency failure greater than a threshold; and
      
      f. repeating the process for all misrecognized words.
  - 21. The method of claim 20, wherein computing of step (a) comprises a dynamic programming algorithm based on word pronunciations.
  - 22. The method of claim 21, wherein the dynamic programming algorithm uses the accumulative confusion values in the phoneme confusion matrix as costs in computation.
  - 23. The confusion values of claim 22, wherein a low value indicates word similarity.
  - 24. The method of claim 20, wherein step (b) further comprises the step of selecting a number of words based on a threshold.
  - 25. The method of claim 20, wherein step (d) further comprises the step of identifying incorrect recognitions, a pronunciation associated with an incorrect recognition, and the frequency of failure related to an incorrect recognition.
  - 26. The method of claim 20, wherein the recognition of step (c) is performed using a Viterbi decoding algorithm.
  - 27. The method of claim 1, wherein step (f) further comprises the step of:
    - applying the improved dictionary in a grammar based speech recognition task to improve speech recognition accuracy.
  - 28. The method of claim 1 further comprising the step of optimizing the efficiency of learning alternative pronunciations, wherein said optimizing comprises one or more of the following:
    - a. reducing the length of a phoneme replacement candidate list for each phoneme in the original pronunciation of a word that has been mispronounced if a number of candidate pronunciations exceed a threshold; and
      
      b. optimizing a phoneme determination order when obtaining a desired pronunciation for a misrecognized word.
  - 29. The method of claim 28, wherein step (a) further comprises the step of determining the scale of length reduction of a phoneme replacement candidate list with the mathematical equation:
  - 30. The method of claim 28, wherein the phoneme determination order in step (b) continues from a phoneme with the longest phoneme replacement candidate list and continues in the descending order of the length of phoneme replacement candidate list for each phoneme.
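
Several of the steps in claims 10 to 22 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the patented implementation: the toy confusion values, the function names, the `max_distance` cutoff (used here in place of the statistical clustering named in claim 10), and the `gap_cost` for insertions and deletions are all hypothetical. Following claim 12, a low confusion value is read as high confusability. The sketch builds a replacement candidate list per phoneme, enumerates every phoneme combination (claim 15), checks the count against the product of the list lengths (claim 17), and computes a dynamic-programming distance that uses confusion values as substitution costs (claims 21 and 22).

```python
from itertools import product
from math import prod


def candidate_lists(pronunciation, confusion, max_distance):
    """One replacement list per phoneme: the phoneme itself plus close neighbours.

    A simple distance cutoff stands in for the clustering step (assumption).
    """
    lists = []
    for target in pronunciation:
        row = confusion.get(target, {})
        near = sorted(
            (p for p, d in row.items() if p != target and d <= max_distance),
            key=lambda p: (row[p], p),
        )
        lists.append([target] + near)
    return lists


def alternative_pronunciations(lists):
    """Place each list at its position and take every combination."""
    return [list(combo) for combo in product(*lists)]


def pronunciation_distance(a, b, confusion, gap_cost=1.0):
    """Weighted edit distance; substitution cost comes from the confusion matrix."""
    def sub(x, y):
        if x == y:
            return 0.0
        # Unknown pairs fall back to gap_cost (assumption).
        return confusion.get(x, {}).get(y, gap_cost)

    prev = [j * gap_cost for j in range(len(b) + 1)]
    for i, x in enumerate(a, 1):
        cur = [i * gap_cost]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + gap_cost,          # delete x
                           cur[j - 1] + gap_cost,       # insert y
                           prev[j - 1] + sub(x, y)))    # substitute x -> y
        prev = cur
    return prev[-1]


# Hypothetical confusion values: lower means more confusable.
confusion = {
    "T": {"D": 0.2, "K": 0.6},
    "AH": {"AA": 0.3},
    "M": {"N": 0.25},
}
lists = candidate_lists(["T", "AH", "M"], confusion, max_distance=0.4)
alternatives = alternative_pronunciations(lists)
size_by_formula = prod(len(group) for group in lists)
near_distance = pronunciation_distance(["T", "AH", "M"], ["D", "AH", "M"], confusion)
```

With these toy values each position has two candidates, so the enumerated set and the product of the list lengths agree. The set grows multiplicatively with pronunciation length, which is presumably why claims 28 to 30 address shortening the candidate lists.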

31. A method for learning alternative pronunciations for speech in a given language comprising the steps of:
- a. selecting a word instance for learning alternative pronunciations;
  
  b. performing a first test on the word instance to determine a baseline recognition result;
  
  c. performing hierarchical pronunciation learning on the word instance and selecting a pronunciation that is similar to the word instance; and
  
  d. performing an other test to assess if the selected pronunciation is recognized as the word instance wherein if the word is recognized, adding the selected pronunciation to a dictionary, otherwise, discarding the selected pronunciation.
  - 32. The method of claim 31, wherein the first test comprises multi-grammar recognition with reference pronunciations.
  - 33. The method of claim 32, wherein the reference pronunciations are scored against the word instance to determine matches.
  - 34. The method of claim 31, wherein the hierarchical pronunciation learning comprises learning alternative pronunciations for the word instance through iterations.
  - 35. The method of claim 31, wherein the other test comprises multi-grammar recognition.
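
The test, learn, re-test flow of claim 31 can be sketched as a single function. This is an illustrative sketch only: `learn_for_instance`, `toy_recognize`, and the `propose` callable are hypothetical names, the exact-match recognizer is a stand-in for multi-grammar recognition, and `propose` is a placeholder for the hierarchical pronunciation learning step, which the claims do not detail.

```python
def learn_for_instance(word, instance, dictionary, recognize, propose):
    """Try to learn one alternative pronunciation for a word instance.

    Returns True if the candidate was added to the dictionary, else False.
    """
    # Step (b): first test, giving the baseline recognition result.
    baseline = recognize(instance, dictionary)
    # Step (c): placeholder for hierarchical pronunciation learning.
    candidate = propose(word, instance, baseline)
    # Step (d): second test against a trial copy that includes the candidate.
    trial = {w: list(prons) for w, prons in dictionary.items()}
    trial.setdefault(word, []).append(candidate)
    accepted = recognize(instance, trial) == word
    if accepted:
        dictionary.setdefault(word, []).append(candidate)
    return accepted  # a rejected candidate is simply discarded


def toy_recognize(instance, dictionary):
    """Toy recognizer (assumption): exact match on the phoneme tuple."""
    for w, prons in dictionary.items():
        if instance in prons:
            return w
    return None


lexicon = {"tom": [("T", "AA", "M")]}
# Candidate matches the spoken instance, so it passes the second test.
ok = learn_for_instance("tom", ("T", "AH", "M"), lexicon, toy_recognize,
                        lambda w, inst, base: inst)
# Candidate does not match the spoken instance, so it is discarded.
rejected = learn_for_instance("tom", ("D", "AH", "N"), lexicon, toy_recognize,
                              lambda w, inst, base: ("T", "AH", "N"))
```

Testing against a trial copy keeps the dictionary unchanged until the candidate passes, which mirrors the add-or-discard wording of step (d).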

36. A system for language learning of mispronunciation detection comprising:
- a. a lexicon builder which is capable of integrating one or more of;
  
  pronunciation dictionaries, spelling-to-pronunciation interpretations, and text normalizations, to create a list of acceptable phoneme sequences;
  
  b. a speech corpus;
  
  c. an acoustic model;
  
  d. a word lexicon;
  
  e. a word grammar;
  
  f. a grammar-based recognizer which provides a hypothesized name based on the speech corpus, acoustic model, word lexicon, and the word grammar to a means for scoring; and
  
  g. a means for scoring which indicates accuracy of the hypothesized name.
  - 37. The system of claim 36, wherein the pronunciation dictionaries comprise a learned dictionary, a linguistic dictionary, and prototype dictionary.

Current Assignee: Interactive Intelligence Group Incorporated (Genesys Cloud Services Incorporated)
Original Assignee: Interactive Intelligence Group Incorporated (Genesys Cloud Services Incorporated)
Inventors: Ge, Zhenhao; Tyagi, Vivek; Ganapathiraju, Aravind; Iyer, Ananth Nagaraja; Randal, Scott Allen; Wyss, Felix Immanuel
Primary Examiner(s): Augustin, Marcellus
Application Number: US14/515,607
Publication Number: US 20150106082A1
Time in Patent Office: 754 Days
Field of Search: 704/274
US Class Current: 1/1

CPC Class Codes

G06F 40/242   Dictionaries
G06F 40/40   Processing or translation o...
G09B 19/04   Speaking with audible prese...
G09B 19/06   Foreign languages with audi...
G10L 15/063   Training
G10L 15/14   using statistical models, e...
G10L 15/187   Phonemic context, e.g. pron...
G10L 2015/081   Search algorithms, e.g. Bau...
