Speech recognition by concatenating fenonic allophone hidden Markov models in parallel among subwords

US 5,502,791 A
Filed: 09/01/1993
Issued: 03/26/1996
Est. Priority Date: 09/29/1992
Status: Expired due to Fees

Abstract

A feature extractor 4 analyzes a word input from a speech input device 1 and produces a feature vector sequence corresponding to the word; a labeler 8 can apply a further transformation to turn this into a label sequence. Fenonic hidden Markov models for speech transformation candidates are combined with N-gram probabilities (where N is an integer greater than or equal to 2) to produce models of words. For each candidate word, the recognizer determines the probability that the composed speech model outputs the label sequence or feature vector sequence input as speech, and outputs the candidate word whose speech model has the highest probability to a display 19.
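The abstract does not say how labeler 8 maps feature vectors to labels. A commonly used approach in label-based (fenonic) recognizers is vector quantization: each feature vector is replaced by the index of the nearest prototype vector in a codebook. The following is a minimal Python sketch under that assumption; the function names, the squared Euclidean distance, and the toy codebook are illustrative choices, not taken from the patent.

```python
def squared_distance(a, b):
    """Squared Euclidean distance (assumed measure; the patent text does not specify one)."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def label_sequence(features, prototypes):
    """Replace each feature vector with the index (label) of its nearest prototype.

    Hypothetical labeler: ties go to the lowest-numbered prototype.
    """
    return [
        min(range(len(prototypes)), key=lambda k: squared_distance(x, prototypes[k]))
        for x in features
    ]


# Toy two-dimensional codebook with three prototypes (hypothetical values).
prototypes = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
features = [(0.1, 0.2), (4.8, 5.1), (0.9, 1.2)]
print(label_sequence(features, prototypes))  # → [0, 2, 1]
```

The resulting integer labels are what the fenonic models score; a recognizer working directly on the feature vector sequence would skip this step.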

258 Citations

6 Claims

1. A speech recognizer comprising:
- means for analyzing a word inputted as speech for its features and thus obtaining a label sequence or feature vector sequence corresponding to said word;
  
  means for retaining hidden Markov models respectively for one or more allophones of subwords of each speech transformation candidate;
  
  dictionary means for retaining a plurality of candidate words to be recognized;
  
  means for composing a speech model by concatenating each hidden Markov model for allophones of each speech transformation candidate in parallel among subwords in correspondence to a candidate word;
  
  means for determining a probability of a speech model composed with regard to each candidate word to output the label sequence or feature vector sequence of said word inputted as speech, and outputting the candidate word corresponding to a speech model of a highest probability as a result of recognition.

2. A speech recognizer comprising:
- means for analyzing a word inputted as speech for its features and thus obtaining a label sequence or feature vector sequence corresponding to said word;
  
  means for retaining fenonic hidden Markov models;
  
  means for retaining the label sequence for each speech transformation candidate on which subwords of a word are transformed as speech;
  
  dictionary means for retaining a plurality of candidate words to be recognized;
  
  means for applying fenonic hidden Markov models of allophones of subwords of each speech transformation candidate in correspondence to said candidate words and concatenating the models for each speech transformation candidate in parallel among the subwords to compose a speech model;
  
  means for determining a probability of a speech model composed with regard to each candidate word so as to output the label sequence or feature vector sequence of said word inputted as speech, and outputting the candidate word corresponding to the speech model of a highest probability as a result of recognition.

3. A speech recognizer comprising:
- means for analyzing a word inputted as speech for its features and thus obtaining a label sequence or feature vector sequence corresponding to said word;
  
  means for retaining hidden Markov models for each speech transformation candidate, the hidden Markov models comprising allophones of one or more subwords of the candidate, by assigning to each allophone an N-gram relation (N=an integer greater than or equal to 2) with the speech transformation candidates of other preceding subwords in the word;
  
  dictionary means for retaining a plurality of candidate words to be recognized;
  
  means for concatenating each hidden Markov model for each speech transformation candidate in parallel among the subwords in correspondence to said candidate words and on the basis of said N-gram relation to compose a speech model;
  
  means for determining the probability of a speech model composed with regard to each candidate word to output the label sequence or feature vector sequence of said word inputted as speech, and outputting a candidate word corresponding to the speech model of the highest probability as a result of recognition.

4. A speech recognizer comprising:
- means for analyzing a word inputted as speech for its features and thus obtaining a label sequence or feature vector sequence corresponding to said word;
  
  means for retaining fenonic hidden Markov models for each speech transformation candidate, the hidden Markov models comprising allophones of one or more subwords of the candidate, by assigning to each allophone an N-gram relation (N=an integer greater than or equal to 2) with the speech transformation candidates of other preceding subwords in the word;
  
  dictionary means for retaining a plurality of candidate words to be recognized;
  
  means for applying hidden Markov models to each speech transformation candidate in correspondence to said candidate words and on the basis of said N-gram relation, and concatenating each hidden Markov model for each of these speech transformation candidates in parallel among the subwords to compose a speech model;
  
  means for determining the probability of a speech model composed with regard to each candidate word to output the label sequence or feature vector sequence of said word inputted as speech, and outputting a candidate word corresponding to the speech model of the highest probability as a result of recognition.

5. A method of speech recognition comprising the steps of:
- retaining fenonic hidden Markov models for each speech transformation candidate, the hidden Markov models comprising allophones of one or more subwords of the candidate, by assigning to each allophone an N-gram relation (N=an integer greater than or equal to 2) with the speech transformation candidates of other preceding subwords in the word;
  
  retaining label sequences for each speech transformation candidate on which the subwords of a word are transformed as speech;
  
  retaining a plurality of candidate words to be recognized;
  
  analyzing a word inputted as speech for its features and obtaining a label sequence or feature vector sequence corresponding to the word concerned;
  
  applying fenonic hidden Markov models to each speech transformation candidate in correspondence to said candidate words and on the basis of said N-gram relation;
  
  concatenating each fenonic hidden Markov model for each of these speech transformation candidates in parallel among the subwords to compose a speech model;
  
  determining the probability of a speech model composed with regard to each candidate word to output the label sequence or feature vector sequence of said word inputted as speech, and outputting a candidate word corresponding to the speech model of the highest probability as a result of recognition.
6. A method of training a speech model as set forth in claim 5 wherein said hidden Markov model consists of a fenonic hidden Markov model.
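Claims 1 through 5 compose each candidate word's model by placing the HMMs of a subword's allophones (its speech transformation candidates) in parallel and concatenating the subwords in series; claims 3 to 5 additionally weight each allophone by an N-gram relation to the allophones of preceding subwords. The Python sketch below illustrates that composition for N = 2 (a bigram), computes the probability that each composed model outputs a label sequence with a forward-style dynamic program, and returns the highest-scoring candidate word. All concrete details are assumptions for illustration: the fenone topology (self-loop and forward transitions that emit a label, plus a null transition that does not, as fenonic models are commonly described), the shared transition probabilities, the peaked output distribution, the toy dictionary, and the names `segment_probs`, `word_probability`, and `recognize`.

```python
# Hypothetical fenone parameters: every fenone model shares these transition
# probabilities (they sum to 1), and its output distribution peaks on its own label.
P_LOOP, P_FWD, P_NULL = 0.3, 0.6, 0.1
VOCAB = 4  # number of distinct labels in this toy example


def emit(fenone, label):
    """Assumed output probability of `label` from the model for `fenone`."""
    return 0.85 if label == fenone else 0.15 / (VOCAB - 1)


def segment_probs(baseform, labels, start):
    """Forward pass of a chain of fenone models over labels[start:].

    Returns {t: P(the chain emits exactly labels[start:t] and reaches its end)}.
    State k sits before fenone k; state len(baseform) is the final state.
    """
    n = len(baseform)
    alpha = [1.0] + [0.0] * n
    ends = {}
    for t in range(start, len(labels) + 1):
        # Null transitions k -> k+1 consume no label, so apply them in place.
        for k in range(n):
            alpha[k + 1] += alpha[k] * P_NULL
        ends[t] = alpha[n]
        if t == len(labels):
            break
        # Self-loop and forward transitions each consume labels[t].
        nxt = [0.0] * (n + 1)
        for k in range(n):
            e = emit(baseform[k], labels[t])
            nxt[k] += alpha[k] * P_LOOP * e
            nxt[k + 1] += alpha[k] * P_FWD * e
        alpha = nxt
    return ends


def word_probability(subwords, bigram, labels):
    """P(labels | composed model) for one candidate word.

    subwords: list of {allophone name: fenone baseform}; the allophones of one
    subword are parallel branches, and the subwords are concatenated in series.
    bigram: {(previous allophone or "<s>", allophone): probability}.
    Allophone names are assumed unique within a word.
    """
    reach = {("<s>", 0): 1.0}  # (last allophone, labels consumed) -> probability
    for alternatives in subwords:
        new = {}
        for (prev, s), p in reach.items():
            for name, baseform in alternatives.items():
                w = bigram.get((prev, name), 0.0)
                if w == 0.0:
                    continue
                for t, q in segment_probs(baseform, labels, s).items():
                    new[(name, t)] = new.get((name, t), 0.0) + p * w * q
        reach = new
    return sum(p for (_, t), p in reach.items() if t == len(labels))


def recognize(dictionary, labels):
    """Return the candidate word whose composed model gives the highest probability."""
    return max(
        dictionary,
        key=lambda word: word_probability(dictionary[word][0], dictionary[word][1], labels),
    )


# Toy dictionary (hypothetical): word -> (subword alternatives, bigram table).
dictionary = {
    "A": (
        [{"a1": [0, 1], "a1'": [0, 0]}, {"a2": [2], "a2'": [2, 2]}],
        {("<s>", "a1"): 0.7, ("<s>", "a1'"): 0.3,
         ("a1", "a2"): 0.9, ("a1", "a2'"): 0.1,
         ("a1'", "a2"): 0.2, ("a1'", "a2'"): 0.8},
    ),
    "B": (
        [{"b1": [3]}, {"b2": [3, 2]}],
        {("<s>", "b1"): 1.0, ("b1", "b2"): 1.0},
    ),
}
print(recognize(dictionary, [0, 1, 2]))  # → A
```

Because `reach` sums over every allophone path ending at the same point, the parallel branches are merged rather than enumerated one sequence at a time. A practical recognizer would also score in the log domain or rescale to avoid underflow on long utterances.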

Current Assignee: International Business Machines Corporation
Original Assignee: International Business Machines Corporation
Inventors: Nishimura, Masafumi; Okochi, Masaaki
Primary Examiner(s): MacDonald, Allen R.
Assistant Examiner(s): Sartori, Michael A.
Application Number: US08/114,709
Time in Patent Office: 937 Days
Field of Search: 395/2, 395/2.64-2.66, 395/2.6-2.63, 381/41, 381/42, 381/43, 381/45
US Class Current: 704/256
CPC Class Codes:
G10L 15/142   Hidden Markov Models [HMMs]
G10L 15/187   Phonemic context, e.g. pron...
G10L 15/197   Probabilistic grammars, e.g...
G10L 2015/025   Phonemes, fenemes or fenone...
G10L 2015/0631   Creating reference template...