Speech recognition system having word-based and phoneme-based recognition means

US 5,315,689 A
Filed: 12/21/1992
Issued: 05/24/1994
Est. Priority Date: 05/27/1988
Status: Expired due to Fees

Abstract

A speech recognition system includes a parameter extracting section for extracting a speech parameter of input speech, a first word recognizing section for performing recognition processing by word-based matching, and a second word recognizing section for performing word recognition by matching in units of word constituent elements. The first word recognizing section segments the speech parameter in units of words to extract a word speech pattern and performs word recognition by matching the word speech pattern with a predetermined word reference pattern. The second word recognizing section performs recognition in units of word constituent elements by using the extracted speech parameter and performs word recognition on the basis of candidates of an obtained word constituent element series. The speech recognition system further includes a recognition result output section for obtaining a recognition result on the basis of the word recognition results obtained by the first and second word recognizing sections and outputting the obtained recognition result. The speech recognition system further includes a word reference pattern learning section for performing learning of a word reference pattern on the basis of the recognition result obtained by the recognition result output section and the word speech pattern.
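
The data flow described in the abstract can be sketched roughly as follows. This is only an illustration under stated assumptions, not the patented implementation: the class name `HybridRecognizer`, the callable-based stages, the score dictionaries, and the toy stub recognizers are all hypothetical and do not come from the patent.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Sequence, Tuple

Scores = Dict[str, float]  # hypothetical representation: word label -> similarity


@dataclass
class HybridRecognizer:
    # Each stage is a plain callable so the sketch stays implementation-neutral.
    extract: Callable[[Sequence[float]], List[float]]  # parameter extracting section
    match_words: Callable[[List[float]], Scores]       # first (word-based) recognizing section
    match_elements: Callable[[List[float]], Scores]    # second (constituent-element) section
    combine: Callable[[Scores, Scores], str]           # recognition result output section
    collected: List[Tuple[str, List[float]]] = field(default_factory=list)

    def recognize(self, speech: Sequence[float]) -> str:
        parameter = self.extract(speech)
        first = self.match_words(parameter)
        second = self.match_elements(parameter)
        result = self.combine(first, second)
        # Keep the (result, pattern) pair so a learning section could use it later.
        self.collected.append((result, parameter))
        return result


def pick_best_sum(first: Scores, second: Scores) -> str:
    """Toy output rule (an assumption): add both scores and take the maximum."""
    totals = {w: first.get(w, 0.0) + second.get(w, 0.0) for w in set(first) | set(second)}
    return max(totals, key=totals.get)


# Stub stages with fixed scores, purely for demonstration.
recognizer = HybridRecognizer(
    extract=lambda speech: list(speech),
    match_words=lambda p: {"yes": 0.6, "no": 0.4},
    match_elements=lambda p: {"yes": 0.7, "no": 0.5},
    combine=pick_best_sum,
)
result = recognizer.recognize([0.1, 0.2])
print(result)
```

Keeping the two recognizers behind the same score interface is a design choice of this sketch; it lets the output stage be swapped without touching either matcher.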

20 Claims

1. A speech recognition system comprising:
- parameter extracting means for analyzing input speech and extracting a speech parameter from the input speech;
  
  first storage means for storing a word reference pattern;
  
  first word recognizing means for segmenting the speech parameter extracted by said parameter extracting means into units of words and outputting a word speech pattern corresponding to one of the words, and for performing word recognition by matching the word speech pattern with the word reference pattern stored in said first storage means and outputting a first word-recognition result;
  
  second storage means for storing at least one word constituent element reference pattern;
  
  second word recognizing means for segmenting the speech parameter into units of word constituent elements and outputting a word constituent element speech pattern corresponding to one of the word constituent elements, for performing recognition of each of the word constituent elements by matching the word constituent element speech pattern with the word constituent element reference pattern stored in said second storage means and outputting a series of recognized word constituent elements, and for performing word recognition on the basis of the series of recognized word constituent elements and outputting a second word recognition result;
  
  recognition result output means connected to said first and second word recognizing means, for obtaining a final recognition result from the first and second word recognition results from said first and second word recognizing means, said recognition result output means including means for increasing a value representing a contribution of the first word recognition result to the final recognition result to be larger than a value representing a contribution of the second word recognition result to the final recognition result in accordance with an increase in the number of word speech patterns used in learning for the first recognition and means for determining the final recognition result in accordance with the increasing value representing the contribution of the first word recognition result thereto; and
  
  learning means for executing learning for forming a new word reference pattern from the recognition result obtained by said recognition result output means and the word speech pattern, said learning means transferring the new word reference pattern to said first storage means, to store it therein and increase the number of word speech patterns being stored in said first storage means.
- Dependent claims (2, 3, 4, 5, 6, 7, 8):
  - 2. A system according to claim 1, wherein said first word recognizing means includes storage means for storing a word reference pattern, and said learning means includes means for updating a storage content of said storage means.
  - 3. A system according to claim 1, wherein said second word recognizing means uses phonemes as units of a word constituent element.
  - 4. A system according to claim 1, wherein said second word recognizing means uses syllables as units of a word constituent element.
  - 5. A system according to claim 1, wherein said recognizing result output means includes means for weighting the first and second recognition results in accordance with recognition characteristics of said first and second word recognizing means, respectively, and outputting a synthetic recognition result.
  - 6. A system according to claim 1, wherein said learning means includes input means for designating category names of the recognized input speech as learning word speech patterns and means for collecting the learning word speech patterns in response to a designation input from said input means.
  - 7. A system according to claim 1, wherein said learning means includes means for storing non-learned recognition information in said first and second word recognizing means and said recognition result output means, monitor means for monitoring the amount of non-learned recognition information, and means for executing learning on the basis of the recognition information in response to said monitor means every time the amount of non-learned recognition information reaches a predetermined value.
  - 8. A system according to claim 1, wherein said predetermined determination rule is given as:
    - $S^{(l)} = \alpha^{(l)} P_1^{(l)} + (1 - \alpha^{(l)}) P_2^{(l)}$
      wherein $S^{(l)}$ is a final similarity value of the category $l$, $\alpha^{(l)}$ is a parameter representing a contribution of word recognition by word-based matching, and $P_1^{(l)}$ and $P_2^{(l)}$ are converted similarity values based upon word recognition of word- and phoneme-based matching, respectively.
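
The determination rule of claim 8 is a per-category convex combination of the two similarity values, and a small numeric sketch may make its effect clearer. The function name `combine_similarities`, the dictionary inputs, the example scores, and the check that each alpha lies in [0, 1] are assumptions made for illustration; the claim itself only gives the formula.

```python
from typing import Dict, Tuple


def combine_similarities(
    p1: Dict[str, float],
    p2: Dict[str, float],
    alpha: Dict[str, float],
) -> Tuple[str, Dict[str, float]]:
    """Compute S = alpha * P1 + (1 - alpha) * P2 per category and pick the largest S."""
    final = {}
    for category in p1.keys() & p2.keys():
        a = alpha[category]
        if not 0.0 <= a <= 1.0:  # assumed range; implied by the (1 - alpha) term
            raise ValueError(f"alpha for {category!r} must lie in [0, 1]")
        final[category] = a * p1[category] + (1.0 - a) * p2[category]
    best = max(final, key=final.get)
    return best, final


# Made-up similarity values: word-based matching favours "one",
# phoneme-based matching favours "nine".
p1 = {"one": 0.9, "nine": 0.6}
p2 = {"one": 0.5, "nine": 0.8}

best_low, s_low = combine_similarities(p1, p2, {"one": 0.2, "nine": 0.2})
best_high, s_high = combine_similarities(p1, p2, {"one": 0.8, "nine": 0.8})
print(best_low, best_high)
```

With a small alpha the phoneme-based score dominates and "nine" wins; with a large alpha the word-based score dominates and "one" wins, which is the shift in contribution that the independent claims tie to the amount of learning.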

9. A speech recognition system comprising:
- parameter extracting means for analyzing input speech and extracting a speech parameter from the input speech;
  
  first storage means for storing a word reference speech pattern;
  
  first word recognizing means for segmenting the speech parameter extracted by said parameter extracting means into units of words and outputting a word speech pattern corresponding to one of the words, and for performing word recognition by matching the word speech pattern with the word reference pattern stored in said first storage means and outputting a first word-recognition result;
  
  second storage means for storing at least one word constituent element reference pattern;
  
  second word recognizing means for segmenting the speech parameter into units of word constituent elements and outputting a word constituent element speech pattern corresponding to one of the word constituent elements, for performing recognition of each of the word constituent elements by matching the word constituent element speech pattern with the word constituent element reference pattern stored in said second storage means and outputting a series of recognized word constituent elements, and for performing word recognition on the basis of the series of recognized word constituent elements and outputting a second word recognition result;
  
  recognition result output means connected to said first and second word recognizing means, for obtaining a final recognition result from the first and second word recognition results of said first and second word recognizing means; and
  
  learning means for executing learning of the word reference pattern on the basis of the recognition result obtained by said recognition result output means and a word speech pattern extracted in the course of the recognition processing, wherein said recognition result output means includes means for increasing a value representing a contribution of the first word recognition result to the final recognition result to be larger than a value representing a contribution of the second word recognition result to the final recognition result as the amount of learning executed by said learning means increases and means for determining the final recognition result in accordance with the increasing value representing the contribution of the first word recognition result thereto.
- Dependent claims (10, 11, 12, 13, 14, 15, 16, 17, 18, 19):
  - 10. A system according to claim 9, wherein said first and second word recognizing means each output discrimination results and said recognition result output means includes means for determining the amount of learning executed by said learning means, and contribution control means for controlling values representing contributions of the first and second recognition results to the final recognition result, so as to increase a relative weight of the discrimination result obtained by said second word recognizing means to the discrimination result obtained by said first word recognizing means as the amount of learning executed increases.
  - 11. A system according to claim 10, wherein said means for determining an amount of learning includes means for discriminating an accumulation value of the number of word speech patterns supplied for learning of said learning means.
  - 12. A system according to claim 10, wherein said contribution control means includes means for controlling priorities of recognition results which are obtained by said first and second word recognizing means so as to obtain a final recognition result.
  - 13. A system according to claim 10, wherein said contribution control means includes means for selecting a processing algorithm from a plurality of processing algorithms having substantially different priorities for recognition results, which are obtained by said first and second word recognizing means so as to obtain a final recognition result, in accordance with a discrimination result obtained by said process discriminating means.
  - 14. A system according to claim 9, wherein said first word recognizing means includes means for storing a word reference pattern and said learning means includes means for updating a storage content of said storage means.
  - 15. A system according to claim 9, wherein said word constituent elements comprise phonemes.
  - 16. A system according to claim 9, wherein said word constituent elements comprise syllables.
  - 17. A system according to claim 9, wherein said recognition result output means includes means for obtaining a recognition result by weighting the first and second recognition results obtained by said first and second word recognizing means in accordance with recognition characteristics of said first and second word recognizing means, respectively.
  - 18. A system according to claim 9, wherein said learning means includes input means for designating a category name of the recognized input speech as a learning word speech pattern and means for collecting the word speech patterns in response to a designation input from said input means.
  - 19. A system according to claim 9, wherein said learning means includes means for storing recognition information in said first and second word recognizing means and said recognition result output means, counting means for counting an accumulation value of the number of word speech patterns subjected to recognition processing, and said means for executing learning executes learning on the basis of the recognition information in response to said counting means every time the accumulation value reaches a predetermined value.
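
Claims 9 and 11 tie the contribution of word-based matching to an accumulated count of learned word speech patterns, and claim 19 triggers learning each time a counter reaches a predetermined value. The sketch below shows one way such bookkeeping could look. The linear-then-saturating schedule, the `saturation` and `ceiling` parameters, and the `LearningScheduler` class are hypothetical; the claims do not specify any particular schedule, and the actual learning computation is left out.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


def contribution(num_learned: int, saturation: int = 50, ceiling: float = 0.9) -> float:
    """Hypothetical schedule: grows linearly with the learned-pattern count, then saturates."""
    if num_learned < 0:
        raise ValueError("num_learned must be non-negative")
    return ceiling * min(num_learned, saturation) / saturation


@dataclass
class LearningScheduler:
    batch_size: int  # plays the role of the "predetermined value" in claim 19
    pending: List[Tuple[str, List[float]]] = field(default_factory=list)
    num_learned: int = 0  # accumulated number of patterns used for learning

    def observe(self, label: str, pattern: List[float]) -> bool:
        """Queue one recognized pattern; return True if this call triggered a learning pass."""
        self.pending.append((label, pattern))
        if len(self.pending) < self.batch_size:
            return False
        # A real system would rebuild the word reference patterns here.
        self.num_learned += len(self.pending)
        self.pending.clear()
        return True


scheduler = LearningScheduler(batch_size=3)
triggers = [scheduler.observe("yes", [0.1]) for _ in range(7)]
alpha = contribution(scheduler.num_learned)
print(triggers, scheduler.num_learned, alpha)
```

Batching the learning step, rather than updating after every utterance, matches the counter-driven behaviour described in claim 19; the choice of a ceiling below 1 is an assumption that keeps the phoneme-based result from being ignored entirely.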

20. A speech recognition system comprising:
- parameter extracting means for analyzing input speech and extracting a speech parameter from the input speech;
  
  first reference pattern output means for outputting a word reference pattern;
  
  first word recognizing means for segmenting the speech parameter extracted by said parameter extracting means into units of words and outputting a word speech pattern corresponding to one of the words, and for performing word recognition by matching the word speech pattern with the word reference pattern output from said first reference pattern output means and outputting a first word-recognition result;
  
  second reference pattern output means for outputting a word constituent element reference pattern;
  
  second word recognizing means for segmenting the speech parameter into units of word constituent elements, for outputting a word constituent element speech pattern corresponding to one of the word constituent elements, for performing recognition of each of the word constituent elements by matching the word constituent element speech pattern with the word constituent element reference pattern output from said second reference pattern output means to obtain a series of recognized word constituent elements, and for performing word recognition on the basis of a series of recognized word constituent elements and outputting a second word recognition result;
  
  recognition result output means connected to said first and second word recognizing means, for obtaining a final recognition result from the first and second word recognition results from said first and second word recognizing means, said recognition result output means including means for increasing a value representing a contribution of the first word recognition result to the final recognition result to be larger than a value representing a contribution of the second word recognition result to the final recognition result as the number of word speech patterns used in learning for the first word recognition increases and means for determining the final recognition result in accordance with the increasing value representing the contribution of the first word recognition result thereto; and
  
  learning means for executing learning for forming a new word reference pattern from the final recognition result obtained by said recognition result output means and the word speech pattern, thereby increasing the number of word reference patterns.
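
The learning means of claim 20 adds new word reference patterns so that the number of stored patterns grows over time. A toy version of such a store is sketched below. Everything here is an assumption for illustration: the `WordReferenceStore` class, the use of raw feature vectors as reference patterns, and cosine similarity as the matching measure are stand-ins, since the claim does not name a matching measure or a storage layout.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity, used here only as a stand-in matching measure."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class WordReferenceStore:
    """Hypothetical store holding several reference patterns per word."""

    def __init__(self) -> None:
        self._patterns: Dict[str, List[List[float]]] = defaultdict(list)

    def learn(self, word: str, pattern: Sequence[float]) -> int:
        """Add a new reference pattern for `word`; return the new total count."""
        self._patterns[word].append(list(pattern))
        return self.count()

    def count(self) -> int:
        return sum(len(refs) for refs in self._patterns.values())

    def match(self, pattern: Sequence[float]) -> Dict[str, float]:
        """Score each word by its best-matching stored reference pattern."""
        return {
            word: max(cosine(pattern, ref) for ref in refs)
            for word, refs in self._patterns.items()
        }


store = WordReferenceStore()
store.learn("yes", [1.0, 0.0])
store.learn("no", [0.0, 1.0])
scores = store.match([0.9, 0.1])
total = store.learn("yes", [0.6, 0.8])  # a further learned pattern for "yes"
print(scores, total)
```

The value returned by `count()` is the kind of quantity that the recognition result output means could consult when deciding how much weight to give the word-based result.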

Current Assignee: Kabushiki Kaisha Toshiba (Toshiba Corporation)
Original Assignee: Kabushiki Kaisha Toshiba (Toshiba Corporation)
Inventors: Kanazawa, Hiroshi; Takebayashi, Yoichi
Primary Examiner(s): Fleming, Michael R.
Assistant Examiner(s): Doerrler, Michelle
Application Number: US07/996,859
Time in Patent Office: 519 Days
Field of Search: 381/41-47, 395/2.47, 395/2.6
US Class Current: 704/238
CPC Class Codes: G10L 15/00 Speech recognition G10L17/0...