Method, system, and apparatus for speech recognition

US 20010039492A1
Filed: 04/30/2001
Published: 11/08/2001
Est. Priority Date: 05/02/2000
Status: Active Grant

Abstract

The present invention can improve speech recognition accuracy, especially for characters, words, and the like that can correspond to a plurality of readings. The same person tends to keep the same reading throughout a conversation. For example, a person who pronounces “7” as “shichi” tends to pronounce it as “shichi” consistently in that conversation. Utilizing this tendency, recognition from the second response onward is executed after reducing the recognition probability of any reading that the person did not use in the first response of the conversation. Where the system repeats a recognition result by speech synthesis, it uses the reading that was already recognized from the speaker. For example, when the speaker pronounces “7” as “shichi”, the system also pronounces “shichi” at the time of repetition.
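
The repetition behavior described in the abstract can be illustrated with a minimal sketch. It is not taken from the patent: the `READINGS` table, the `ReadbackSession` class, and the rule of falling back to the first listed reading are all assumptions made for illustration.

```python
from dataclasses import dataclass, field

# Hypothetical reading table (an assumption, not from the patent):
# each written digit maps to the readings a speaker might use.
READINGS = {
    "1": ["ichi"],
    "4": ["shi", "yon"],
    "7": ["shichi", "nana"],
}


@dataclass
class ReadbackSession:
    """Remembers which reading the speaker used for each digit in one conversation."""

    used: dict = field(default_factory=dict)

    def observe(self, symbol: str, reading: str) -> None:
        # Record the reading that was recognized from the speaker.
        if reading not in READINGS.get(symbol, []):
            raise ValueError(f"{reading!r} is not a known reading of {symbol!r}")
        self.used[symbol] = reading

    def readback(self, symbols: str) -> list:
        # Prefer the speaker's own reading; otherwise fall back to the
        # first listed reading (the fallback rule is an assumption).
        return [self.used.get(s, READINGS[s][0]) for s in symbols]
```

For instance, after `session.observe("7", "nana")`, calling `session.readback("71")` produces the readings `nana` and `ichi`, so the synthesized confirmation echoes the speaker's own choice for “7”.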

36 Claims

1. A speech recognition system comprising:
- correspondence information, said correspondence information storing a correspondence between recognized words and a plurality of speech element arrays expressing pronunciation of said recognized words;
  
  said speech recognition system recognizing a recognizable word from a received user spoken utterance by comparing a speech element array generated from said user spoken utterance with said plurality of speech element arrays in said correspondence information;
  
  wherein, in a dialog of a single person occurring within a certain period of time, said generated speech element array corresponds to one of said plurality of speech element arrays, a pronunciation prediction probability corresponding to said one of said plurality of speech element arrays is lowered, said pronunciation prediction probability being different from said generated speech element array.
- Dependent claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14)
- - 2. The speech recognition system according to claim 1, wherein:
    - different speech element arrays expressing pronunciation for a single recognized word include a number corresponding to a previously measured pronunciation prediction probability and a recognized word corresponding to said previously measured pronunciation prediction probability.
  - 3. The speech recognition system of claim 1, wherein said certain period of time is a period of time for a continued dialog.
  - 4. The speech recognition system of claim 1, wherein said certain period of time is a period of time including a plurality of dialogs in one day.
  - 5. The speech recognition system of claim 1, further comprising:
    - means for detecting erroneously recognized words by referring a speaker to at least a part of said recognized words; and
      
      means for replacing one of said erroneously recognized words with a recognizable word which can be recognized as said one of said erroneously recognized words.
  - 6. The speech recognition system of claim 2, further comprising:
    - means for detecting erroneously recognized words by referring a speaker to at least a part of said recognized words; and
      
      means for replacing one of said erroneously recognized words with a recognizable word which can be recognized as said one of said erroneously recognized words.
  - 7. The speech recognition system of claim 1, further comprising:
    - means for replacing a recognized word which corresponds to a speech element comprising one syllable with a long vowel with a previously recognized word comprising one syllable with a short vowel corresponding to said long vowel, when a number of recognized words does not conform to a previously registered number in said speech recognition system.
  - 8. The speech recognition system of claim 2, further comprising:
    - means for replacing a recognized word which corresponds to a speech element comprising one syllable with a long vowel with a previously recognized word comprising one syllable with a short vowel corresponding to said long vowel, when a number of recognized words does not conform to a previously registered number in said speech recognition system.
  - 9. The speech recognition system of claim 5, further comprising:
    - means for replacing a recognized word which corresponds to a speech element comprising one syllable with a long vowel with a previously recognized word comprising one syllable with a short vowel corresponding to said long vowel, when a number of recognized words does not conform to a previously registered number in said speech recognition system.
  - 10. The speech recognition system of claim 6, further comprising:
    - means for replacing a recognized word which corresponds to a speech element comprising one syllable with a long vowel with a previously recognized word comprising one syllable with a short vowel corresponding to said long vowel, when a number of recognized words does not conform to a previously registered number in said speech recognition system.
  - 11. The speech recognition system of claim 7, further comprising:
    - means for replacing a recognized word corresponding to a speech element having one syllable with a short vowel with another previously recognized word corresponding to one syllable with a long vowel, said long vowel corresponding to said short vowel.
  - 12. The speech recognition system of claim 8, further comprising:
    - means for replacing a recognized word corresponding to a speech element having one syllable with a short vowel with another previously recognized word corresponding to one syllable with a long vowel, said long vowel corresponding to said short vowel.
  - 13. The speech recognition system of claim 9, further comprising:
    - means for replacing a recognized word corresponding to a speech element having one syllable with a short vowel with another previously recognized word corresponding to one syllable with a long vowel, said long vowel corresponding to said short vowel.
  - 14. The speech recognition system of claim 10, further comprising:
    - means for replacing a recognized word corresponding to a speech element having one syllable with a short vowel with another previously recognized word corresponding to one syllable with a long vowel, said long vowel corresponding to said short vowel.
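
Claims 3 and 4 bound the "certain period of time" either to one continued dialog or to several dialogs in one day. The following sketch shows one way such a scope could be enforced. The `AdjustmentStore` class, its method names, and the use of a calendar-date comparison for the "one day" case are assumptions for illustration and are not described in the claims.

```python
from datetime import datetime


class AdjustmentStore:
    """Keeps each speaker's lowered probabilities for a bounded period (hypothetical)."""

    def __init__(self, scope: str = "dialog"):
        if scope not in ("dialog", "day"):
            raise ValueError("scope must be 'dialog' or 'day'")
        self.scope = scope
        # speaker_id -> (time of first adjustment, {reading: probability})
        self._entries = {}

    def save(self, speaker_id: str, adjusted: dict, now: datetime) -> None:
        # Keep the original start time so the day boundary is measured from it.
        started = self._entries.get(speaker_id, (now, None))[0]
        self._entries[speaker_id] = (started, dict(adjusted))

    def load(self, speaker_id: str, now: datetime):
        entry = self._entries.get(speaker_id)
        if entry is None:
            return None
        started, adjusted = entry
        # "day" scope: discard adjustments once the calendar date changes.
        if self.scope == "day" and now.date() != started.date():
            del self._entries[speaker_id]
            return None
        return adjusted

    def end_dialog(self, speaker_id: str) -> None:
        # "dialog" scope: adjustments do not outlive the continued dialog.
        if self.scope == "dialog":
            self._entries.pop(speaker_id, None)
```

With `scope="dialog"` the adjustments are dropped when `end_dialog` is called, whereas with `scope="day"` they survive across dialogs until the date changes.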

15. A speech recognition method for use within a dialog of a single person, said dialog occurring in a certain period of time, said method comprising:
- receiving a first user spoken utterance and generating a first speech element array from said first user spoken utterance;
  
  searching correspondence information, said correspondence information associating recognizable words with a plurality of speech element arrays expressing pronunciation of said recognizable words;
  
  generating a first recognized word by comparing said first speech element array and said plurality of speech element arrays in said correspondence information;
  
  lowering a pronunciation prediction probability of one of said plurality of speech element arrays which differs from said first speech element array, wherein said one of said plurality of speech element arrays is made to correspond to said first speech element array;
  
  receiving a second user spoken utterance and generating a second speech element array from said second user spoken utterance;
  
  searching said correspondence information comprising said lowered pronunciation prediction probability; and
  
  generating a second recognized word by comparing said second speech element array and said plurality of speech element arrays in said correspondence information.
- Dependent claims (16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28)
- - 16. The speech recognition method of claim 15, wherein said correspondence information comprises one of said plurality of speech element arrays having a number corresponding to a measured pronunciation prediction probability corresponding to one of said recognizable words.
  - 17. The speech recognition method of claim 15, wherein said certain period of time is a period of time for a continued dialog.
  - 18. The speech recognition method of claim 15, wherein said certain period of time is a period of time including a plurality of dialogs in one day.
  - 19. The speech recognition method of claim 15, further comprising:
    - determining one of said recognized words to be erroneous by referring a speaker to at least part of said one of said recognized words; and
      
      replacing said erroneous word with a different recognizable word, said different recognizable word capable of being erroneously recognized as said erroneous word.
  - 20. The speech recognition method of claim 16, further comprising:
    - determining one of said recognized words to be erroneous by referring a speaker to at least part of said one of said recognized words; and
      
      replacing said erroneous word with a different recognizable word, said different recognizable word capable of being erroneously recognized as said erroneous word.
  - 21. The speech recognition method of claim 15, further comprising:
    - replacing one of said recognized words corresponding to a speech element comprising one syllable with a long vowel with a previously recognized word comprising one syllable with a short vowel corresponding to said long vowel wherein a number of said generated words does not conform to a previously registered number in said speech recognition system.
  - 22. The speech recognition method of claim 16, further comprising:
    - replacing one of said recognized words corresponding to a speech element comprising one syllable with a long vowel with a previously recognized word comprising one syllable with a short vowel corresponding to said long vowel wherein a number of said generated words does not conform to a previously registered number in said speech recognition system.
  - 23. The speech recognition method of claim 19, further comprising:
    - replacing one of said recognized words corresponding to a speech element comprising one syllable with a long vowel with a previously recognized word comprising one syllable with a short vowel corresponding to said long vowel wherein a number of said generated words does not conform to a previously registered number in said speech recognition system.
  - 24. The speech recognition method of claim 19, further comprising:
    - replacing one of said recognized words corresponding to a speech element comprising one syllable with a long vowel with a previously recognized word comprising one syllable with a short vowel corresponding to said long vowel wherein a number of said generated words does not conform to a previously registered number in said speech recognition system.
  - 25. The speech recognition method of claim 21, further comprising:
    - replacing a recognized word corresponding to a speech element comprising one syllable with a short vowel with another previously recognized word corresponding to one syllable with a long vowel, said long vowel corresponding to said short vowel.
  - 26. The speech recognition method of claim 22, further comprising:
    - replacing a recognized word corresponding to a speech element comprising one syllable with a short vowel with another previously recognized word corresponding to one syllable with a long vowel, said long vowel corresponding to said short vowel.
  - 27. The speech recognition method of claim 23, further comprising:
    - replacing a recognized word corresponding to a speech element comprising one syllable with a short vowel with another previously recognized word corresponding to one syllable with a long vowel, said long vowel corresponding to said short vowel.
  - 28. The speech recognition method of claim 24, further comprising:
    - replacing a recognized word corresponding to a speech element comprising one syllable with a short vowel with another previously recognized word corresponding to one syllable with a long vowel, said long vowel corresponding to said short vowel.
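
The steps of method claim 15 can be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: the small lexicon, its probabilities, the scoring rule (probability multiplied by a `difflib.SequenceMatcher` similarity), the lowering factor of 0.2, and the renormalization step are all hypothetical choices.

```python
from difflib import SequenceMatcher


def make_lexicon():
    # Hypothetical correspondence information:
    # word -> {speech element array: pronunciation prediction probability}
    return {
        "4": {("sh", "i"): 0.6, ("y", "o", "n"): 0.4},
        "7": {("sh", "i", "ch", "i"): 0.5, ("n", "a", "n", "a"): 0.5},
    }


def recognize(observed, lexicon):
    """Return the (word, reading) whose probability * similarity is highest."""
    best, best_score = None, -1.0
    for word, readings in lexicon.items():
        for reading, prob in readings.items():
            score = prob * SequenceMatcher(None, observed, reading).ratio()
            if score > best_score:
                best, best_score = (word, reading), score
    return best


def lower_unused_readings(lexicon, word, used_reading, factor=0.2):
    """Lower the probability of every reading of `word` the speaker did not use."""
    readings = lexicon[word]
    for reading in readings:
        if reading != used_reading:
            readings[reading] *= factor
    # Renormalize so the readings of this word still sum to 1 (an assumption).
    total = sum(readings.values())
    for reading in readings:
        readings[reading] /= total


# First utterance: the speaker says "yon" for 4.
lexicon = make_lexicon()
first_word, first_reading = recognize(("y", "o", "n"), lexicon)
lower_unused_readings(lexicon, first_word, first_reading)

# Second utterance: a truncated array that resembles both "shi" and "shichi".
ambiguous = ("sh", "i", "ch")
second_word, _ = recognize(ambiguous, lexicon)
```

In this toy setup the ambiguous second array is matched to “4” when scored against an unadjusted lexicon, but to “7” once the unused reading “shi” has been lowered, which is the effect the claim relies on.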

29. A machine readable storage, having stored thereon a computer program having a plurality of code sections executable by a machine for causing the machine to perform the steps of:
- receiving a first user spoken utterance and generating a first speech element array from said first user spoken utterance;
  
  searching correspondence information, said correspondence information comprising a correspondence between recognizable words and a plurality of speech element arrays expressing pronunciation of said recognizable words;
  
  generating a recognized word by comparing said first speech element array and said plurality of speech element arrays in said correspondence information; and
  
  lowering a pronunciation prediction probability of one of said plurality of speech element arrays which differs from said first speech element array, wherein said one of said plurality of speech element arrays is made to correspond to said first speech element array.

30. A machine readable storage, having stored thereon a computer program having a plurality of code sections executable by a machine for causing the machine to perform the steps of:
- receiving a first user spoken utterance and generating a first speech element array from said first user spoken utterance;
  
  searching correspondence information, said correspondence information associating recognizable words with a plurality of speech element arrays expressing pronunciation of said recognizable words;
  
  generating a first recognized word by comparing said first speech element array and said plurality of speech element arrays in said correspondence information;
  
  lowering a pronunciation prediction probability of one of said plurality of speech element arrays which differs from said first speech element array, wherein said one of said plurality of speech element arrays is made to correspond to said first speech element array;
  
  receiving a second user spoken utterance and generating a second speech element array from said second user spoken utterance;
  
  searching said correspondence information comprising said lowered pronunciation prediction probability; and
  
  generating a second recognized word by comparing said second speech element array and said plurality of speech element arrays in said correspondence information.
- Dependent claims (31, 32, 33, 34, 35, 36)
- - 31. The machine readable storage of claim 30, wherein said correspondence information comprises one of said plurality of speech element arrays having a number corresponding to a measured pronunciation prediction probability corresponding to one of said recognizable words.
  - 32. The machine readable storage of claim 30, wherein said certain period of time is a period of time for a continued dialog.
  - 33. The machine readable storage of claim 30, wherein said certain period of time is a period of time including a plurality of dialogs in one day.
  - 34. The machine readable storage of claim 30, further comprising:
    - determining one of said recognized words to be erroneous by referring a speaker to at least part of said one of said recognized words; and
      
      replacing said erroneous word with a different recognizable word, said different recognizable word capable of being erroneously recognized as said erroneous word.
  - 35. The machine readable storage of claim 30, further comprising:
    - replacing one of said recognized words corresponding to a speech element comprising one syllable with a long vowel with a previously recognized word comprising one syllable with a short vowel corresponding to said long vowel wherein a number of said generated words does not conform to a previously registered number in said speech recognition system.
  - 36. The machine readable storage of claim 35, further comprising:
    - replacing a recognized word corresponding to a speech element comprising one syllable with a short vowel with another previously recognized word corresponding to one syllable with a long vowel, said long vowel corresponding to said short vowel.

Current Assignee: Nuance Communications, Inc. (Microsoft Corporation)
Original Assignee: International Business Machines Corporation
Inventors: Nemoto, Kazuo
Granted Patent: US 6,968,310 B2
US Class Current: 704/231
CPC Class Codes

G10L 15/18   using natural language mode...

G10L 15/187   Phonemic context, e.g. pron...

G10L 15/22   Procedures used during a sp...

G10L 15/26   Speech to text systems G10L...

G10L 2015/0631   Creating reference template...

G10L 2015/227   of the speaker; Human-fact...

G10L 2015/228   of application context
