Methodology for generating enhanced demiphone acoustic models for speech recognition

US 20060136209A1
Filed: 12/16/2004
Published: 06/22/2006
Est. Priority Date: 12/16/2004
Status: Active Grant

First Claim

1. A system for implementing a speech recognition engine, comprising:

demiphone acoustic models that said speech recognition engine utilizes to perform speech recognition procedures, said demiphone acoustic models each having three states that collectively form a preceding demiphone and a succeeding demiphone; and

an acoustic model generator that analyzes speech context information to configure each of said demiphone acoustic models as either a succeeding-dominant demiphone acoustic model or a preceding-dominant demiphone acoustic model.

Abstract

A system and method for effectively performing speech recognition procedures includes enhanced demiphone acoustic models that a speech recognition engine utilizes to perform the speech recognition procedures. The enhanced demiphone acoustic models each have three states that are collectively arranged to form a preceding demiphone and a succeeding demiphone. An acoustic model generator may utilize a decision tree for analyzing speech context information from a training database. The acoustic model generator then effectively configures each of the enhanced demiphone acoustic models as either a succeeding-dominant enhanced demiphone acoustic model or a preceding-dominant enhanced demiphone acoustic model to accurately model speech characteristics.
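
As a rough illustration of the two configurations named in the abstract, the Python sketch below assigns each of a model's three states to either the preceding or the succeeding demiphone. The names `Dominance` and `split_states` are hypothetical and do not come from the patent. The 2+1 and 1+2 splits follow claims 5 and 6; the 3+0 and 0+3 variants in claims 9 and 10 are left out for brevity.

```python
from enum import Enum


class Dominance(Enum):
    """Hypothetical labels for the two configurations in the abstract."""
    PRECEDING = "preceding-dominant"
    SUCCEEDING = "succeeding-dominant"


def split_states(dominance):
    """Return (preceding_demiphone_states, succeeding_demiphone_states).

    States are numbered 1..3. A preceding-dominant model places states 1
    and 2 in the preceding demiphone (claim 6); a succeeding-dominant model
    places states 2 and 3 in the succeeding demiphone (claim 5).
    """
    if dominance is Dominance.PRECEDING:
        return [1, 2], [3]
    if dominance is Dominance.SUCCEEDING:
        return [1], [2, 3]
    raise ValueError(f"unknown dominance: {dominance!r}")
```

In both cases every state belongs to exactly one demiphone, so the two lists always cover states 1 to 3 without overlap.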

43 Claims

1. A system for implementing a speech recognition engine, comprising:
- demiphone acoustic models that said speech recognition engine utilizes to perform speech recognition procedures, said demiphone acoustic models each having three states that collectively form a preceding demiphone and a succeeding demiphone; and
  
  an acoustic model generator that analyzes speech context information to configure each of said demiphone acoustic models as either a succeeding-dominant demiphone acoustic model or a preceding-dominant demiphone acoustic model.
  - 2. The system of claim 1 wherein said acoustic models represent phones from a phone set utilized by said speech recognition engine.
  - 3. The system of claim 1 wherein said speech context information includes a preceding context corresponding to an immediately preceding phone with respect to a current phone represented by one of said demiphone models.
  - 4. The system of claim 1 wherein said speech context information includes a succeeding context corresponding to an immediately succeeding phone with respect to a current phone represented by one of said demiphone models.
  - 5. The system of claim 1 wherein said succeeding-dominant demiphone model has a first state in said preceding demiphone, said succeeding-dominant demiphone model also having a second state and a third state in said succeeding demiphone.
  - 6. The system of claim 1 wherein said preceding-dominant demiphone model has a first state and a second state in said preceding demiphone, said preceding-dominant demiphone model also having a third state in said succeeding demiphone.
  - 7. The system of claim 1 wherein said demiphone models are configured to model speech characteristics by utilizing said succeeding-dominant demiphone models to emphasize succeeding phone contexts, and by utilizing said preceding-dominant demiphone models to emphasize preceding phone contexts.
  - 8. The system of claim 1 wherein said speech context information is identified by decision trees that correspond to said three states, said decision trees being generated to summarize linguistic properties and acoustic characteristics observed in a database of speech samples.
  - 9. The system of claim 1 wherein said succeeding-dominant demiphone has no states in said preceding demiphone, said succeeding-dominant demiphone model having said three states in said succeeding demiphone.
  - 10. The system of claim 1 wherein said preceding-dominant demiphone has zero states in said succeeding demiphone, said preceding-dominant demiphone model having said three states in said preceding demiphone.
  - 11. The system of claim 1 wherein a contextual dominance for each demiphone state from a given one of said demiphone acoustic models is determined by analyzing predominant contextual information in a triphone decision tree corresponding to said each demiphone state.
  - 12. The system of claim 1 wherein said preceding demiphone includes said speech context information only from a preceding phone with respect to one of said demiphone models that includes said preceding demiphone.
  - 13. The system of claim 1 wherein said succeeding demiphone includes said speech context information only from a succeeding phone with respect to one of said demiphone models that includes said succeeding demiphone.
  - 14. The system of claim 1 wherein said speech context information is identified by decision trees that each include a series of questions, said questions each corresponding to a different acoustic speech characteristic, said questions each also being used to identify a contextual dominance characteristic corresponding to said different acoustic speech characteristic.
  - 15. The system of claim 14 wherein said acoustic model generator analyzes all of said questions for a given demiphone model to determine a predominant contextual dominance characteristic for said given demiphone model.
  - 16. The system of claim 14 wherein each of said three states is associated with a different one of said decision trees, each of said three states having a separate contextual dominance characteristic.
  - 17. The system of claim 16 wherein a dominance characteristic of a middle state from said three states determines whether said demiphone acoustic models are configured as either said succeeding-dominant demiphone acoustic model or said preceding-dominant demiphone acoustic model.
  - 18. The system of claim 14 wherein said decision trees are implemented as triphone decision trees that are based upon triphone acoustic models corresponding to said demiphone acoustic models.
  - 19. The system of claim 18 wherein said triphone acoustic models are implemented with three triphone states that each incorporate acoustic contexts from both a preceding phone and a succeeding phone.
  - 20. The system of claim 1 wherein said acoustic models are utilized to implement a speech recognition dictionary for use by said speech recognition engine during said speech recognition procedures.
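
Claims 14 to 17 can be sketched as follows, under stated assumptions. Each state's decision tree is reduced here to a list of labels saying whether each question asks about the preceding or the succeeding phone. A simple majority count stands in for the "predominant contextual dominance characteristic"; the claims do not specify how predominance is measured, and the tie-breaking rule is arbitrary. The function names `state_dominance` and `model_dominance` are hypothetical.

```python
from collections import Counter


def state_dominance(question_contexts):
    """Pick the context that most questions in one state's tree refer to.

    `question_contexts` holds one "preceding" or "succeeding" label per
    decision-tree question. Majority voting is an assumption of this
    sketch; ties fall back to "preceding" arbitrarily.
    """
    counts = Counter(question_contexts)
    if counts["succeeding"] > counts["preceding"]:
        return "succeeding"
    return "preceding"


def model_dominance(trees):
    """Classify a three-state model from its three per-state trees.

    Following claim 17, the middle state's dominance decides whether the
    whole model is preceding-dominant or succeeding-dominant.
    """
    if len(trees) != 3:
        raise ValueError("expected one question list per state (3 total)")
    per_state = [state_dominance(tree) for tree in trees]
    return per_state[1] + "-dominant", per_state


# Usage: the middle state's questions mostly concern the succeeding phone.
trees = [
    ["preceding", "preceding"],
    ["succeeding", "preceding", "succeeding"],
    ["succeeding"],
]
config, per_state = model_dominance(trees)
```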

21. A method for implementing a speech recognition engine, comprising:
- utilizing demiphone acoustic models to perform speech recognition procedures, each of said demiphone acoustic models having three states that collectively form a preceding demiphone and a succeeding demiphone; and
  
  analyzing speech context information with an acoustic model generator to configure each of said demiphone acoustic models as either a succeeding-dominant demiphone acoustic model or a preceding-dominant demiphone acoustic model.
  - 22. The method of claim 21 wherein said acoustic models represent phones from a phone set utilized by said speech recognition engine.
  - 23. The method of claim 21 wherein said speech context information includes a preceding context corresponding to an immediately preceding phone with respect to a current phone represented by one of said demiphone models.
  - 24. The method of claim 21 wherein said speech context information includes a succeeding context corresponding to an immediately succeeding phone with respect to a current phone represented by one of said demiphone models.
  - 25. The method of claim 21 wherein said succeeding-dominant demiphone model has a first state in said preceding demiphone, said succeeding-dominant demiphone model also having a second state and a third state in said succeeding demiphone.
  - 26. The method of claim 21 wherein said preceding-dominant demiphone model has a first state and a second state in said preceding demiphone, said preceding-dominant demiphone model also having a third state in said succeeding demiphone.
  - 27. The method of claim 21 wherein said demiphone models are configured to model speech characteristics by utilizing said succeeding-dominant demiphone models to emphasize succeeding phone contexts, and by utilizing said preceding-dominant demiphone models to emphasize preceding phone contexts.
  - 28. The method of claim 21 wherein said speech context information is identified by decision trees that correspond to said three states, said decision trees being generated to summarize linguistic properties and acoustic characteristics observed in a database of speech samples.
  - 29. The method of claim 21 wherein said succeeding-dominant demiphone has no states in said preceding demiphone, said succeeding-dominant demiphone model having said three states in said succeeding demiphone.
  - 30. The method of claim 21 wherein said preceding-dominant demiphone has zero states in said succeeding demiphone, said preceding-dominant demiphone model having said three states in said preceding demiphone.
  - 31. The method of claim 21 wherein a contextual dominance for each demiphone state from a given one of said demiphone acoustic models is determined by analyzing predominant contextual information in a triphone decision tree corresponding to said each demiphone state.
  - 32. The method of claim 21 wherein said preceding demiphone includes said speech context information only from a preceding phone with respect to one of said demiphone models that includes said preceding demiphone.
  - 33. The method of claim 21 wherein said succeeding demiphone includes said speech context information only from a succeeding phone with respect to one of said demiphone models that includes said succeeding demiphone.
  - 34. The method of claim 21 wherein said speech context information is identified by decision trees that each include a series of questions, said questions each corresponding to a different acoustic speech characteristic, said questions each also being used to identify a contextual dominance characteristic corresponding to said different acoustic speech characteristic.
  - 35. The method of claim 34 wherein said acoustic model generator analyzes all of said questions for a given demiphone model to determine a predominant contextual dominance characteristic for said given demiphone model.
  - 36. The method of claim 34 wherein each of said three states is associated with a different one of said decision trees, each of said three states having a separate contextual dominance characteristic.
  - 37. The method of claim 36 wherein a dominance characteristic of a middle state from said three states determines whether said demiphone acoustic models are configured as either said succeeding-dominant demiphone acoustic model or said preceding-dominant demiphone acoustic model.
  - 38. The method of claim 34 wherein said decision trees are implemented as triphone decision trees that are based upon triphone acoustic models corresponding to said demiphone acoustic models.
  - 39. The method of claim 38 wherein said triphone acoustic models are implemented with three triphone states that each incorporate acoustic contexts from both a preceding phone and a succeeding phone.
  - 40. The method of claim 21 wherein said acoustic models are utilized to implement a speech recognition dictionary for use by said speech recognition engine during said speech recognition procedures.

41. A system for implementing a speech recognition engine, comprising:
- means for performing speech recognition procedures, said means for performing speech recognition procedures each having three states that collectively form a preceding demiphone and a succeeding demiphone; and
  
  means for configuring each of said means for performing speech recognition procedures as either a succeeding-dominant demiphone acoustic model or a preceding-dominant demiphone acoustic model.

42. A system for implementing a speech recognition engine, comprising:
- demiphone acoustic models that each have three states that collectively form a succeeding demiphone and a preceding demiphone, said demiphone acoustic models all being configured in a succeeding-dominant configuration that has a first state forming said preceding demiphone, said succeeding-dominant configuration also having a second state and a third state forming said succeeding demiphone; and
  
  a speech recognition engine that utilizes said demiphone acoustic models to perform speech recognition procedures.

43. An electronic device comprising:
- an electronic data processor; and
  
  a speech recognition engine implemented by the electronic data processor;
  
  wherein the speech recognition engine comprises acoustic models, each acoustic model having three states, the three states being used to form a first demiphone and a second demiphone;
  
  wherein the first demiphone is based on a speech element immediately preceding a speech element being modeled, and the second demiphone is based on a speech element immediately succeeding the speech element being modeled;
  
  wherein for at least one of the acoustic models, the first demiphone is based on a first of the states and the second demiphone is based on the remaining two of the states; and
  
  wherein for at least one of the acoustic models, the first demiphone is based on two of the states and the second demiphone is based on the remaining one of the states.
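
To make the state-to-context mapping in claim 43 concrete, the sketch below labels each of the three states with the single neighbouring phone it depends on. The function name `state_labels` and the `prev-phone` / `phone+next` label notation are assumptions for illustration and do not appear in the patent.

```python
def state_labels(prev_phone, phone, next_phone, n_first):
    """Return a context label for each of the three states of one model.

    `n_first` is the number of states in the first demiphone: 1 gives the
    1+2 split and 2 gives the 2+1 split described in claim 43. States in
    the first demiphone see only the preceding phone; states in the second
    demiphone see only the succeeding phone.
    """
    if n_first not in (1, 2):
        raise ValueError("n_first must be 1 or 2 in this sketch")
    first = f"{prev_phone}-{phone}"
    second = f"{phone}+{next_phone}"
    return [first] * n_first + [second] * (3 - n_first)
```

For the phone `ae` between `k` and `t`, the 1+2 split gives one state tied to the `k` context and two tied to the `t` context, while the 2+1 split reverses that proportion.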

Current Assignee: Sony Corporation (Sony Group Corp.), Sony Electronics Inc. (Sony Group Corp.)
Original Assignee: Sony Corporation (Sony Group Corp.), Sony Electronics Inc. (Sony Group Corp.)
Inventors: Abrego, Gustavo Hernandez; Menendez-Pidal, Xavier; Olorenshaw, Lex S.

Granted Patent: US 7,467,086 B2
US Class Current: 704/254
CPC Class Codes:
- G10L 15/142 Hidden Markov Models [HMMs]
- G10L 2015/022 Demisyllables, biphones or ...
