Speech recognition method using a two-pass search

US 5,515,475 A
Filed: 06/24/1993
Issued: 05/07/1996
Est. Priority Date: 06/24/1993
Status: Expired due to Term

Abstract

A method of recognizing speech comprises searching a vocabulary of words for a match to an unknown utterance. Words in the vocabulary are represented by concatenated allophone models and the vocabulary is represented as a network. On a first pass of the search, a one-state duration constrained model is used to search the vocabulary network. The one-state model has as its transition probability the maximum observed transitional probability (model distance) of the unknown utterance for the corresponding allophone model. Words having top scores are chosen from the first pass search and, in a second pass of the search, rescored using a full Viterbi trellis with the complete allophone models and model distances. The rescores are sorted to provide a few top choices. Using a second set of speech parameters these few top choices are again rescored. Comparison of the scores using each set of speech parameters determines a recognition choice. Post processing is also possible to further enhance recognition accuracy. Test results indicate that the two-pass search provides approximately the same recognition accuracy as a full Viterbi search of the vocabulary network.
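
As an illustration of the first pass described in the abstract, the following minimal Python sketch scores each vocabulary path in a reduced trellis in which every allophone is a single state that must be occupied for at least two frames. The names `first_pass_score` and `shortlist`, and the layout of `frame_scores`, are assumptions made for this example and do not come from the patent. The per-frame value for each allophone stands in for its maximum model distance and is treated here as a log-likelihood.

```python
import math

NEG_INF = -math.inf


def first_pass_score(allophones, frame_scores):
    """Best log-score of one vocabulary path in the reduced trellis.

    allophones:   non-empty list of allophone ids spelling out one word (assumed layout).
    frame_scores: one dict per frame, mapping allophone id to the
                  maximum model distance for that frame (assumed layout).
    """
    n = len(allophones)
    # enter[j]: the previous frame was the FIRST frame spent in allophone j
    # stay[j]:  the previous frame was the second or a later frame in allophone j
    enter = [NEG_INF] * n
    stay = [NEG_INF] * n
    for t, scores in enumerate(frame_scores):
        new_enter = [NEG_INF] * n
        new_stay = [NEG_INF] * n
        for j, a in enumerate(allophones):
            if j == 0:
                if t == 0:
                    new_enter[j] = scores[a]  # a path must start in the first allophone
            else:
                # Leaving allophone j-1 is allowed only after at least two frames in it.
                new_enter[j] = stay[j - 1] + scores[a]
            new_stay[j] = max(enter[j], stay[j]) + scores[a]
        enter, stay = new_enter, new_stay
    # A valid path ends in the last allophone after at least two frames there.
    return stay[-1]


def shortlist(vocabulary, frame_scores, n_best):
    """Return the n_best words with the highest first-pass end values."""
    scored = [(first_pass_score(path, frame_scores), word)
              for word, path in vocabulary.items()]
    scored.sort(reverse=True)
    return [word for _, word in scored[:n_best]]
```

The words returned by `shortlist` would then be passed to the more expensive full Viterbi rescoring, which is what keeps the overall search cheap in this scheme.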

310 Citations

8 Claims

1. A speech recognition method comprising the steps of:
- generating a first set of allophone models for use with acoustic parameter vectors of a first type;
  
  generating a second set of allophone models for use with acoustic parameter vectors of a second type;
  
  providing a network representing a recognition vocabulary, wherein each branch of the network is one of the allophone models and each complete path through the network is a sequence of models representing a word in the recognition vocabulary;
  
  analyzing an unknown utterance to generate a frame sequence of acoustic parameter vectors for each of the first and second types of acoustic parameter vectors;
  
  generating a reduced trellis for determining a path through the network having a highest likelihood;
  
  computing model distances for each frame of acoustic parameter vectors of the first type for all allophone models of the first set;
  
  finding a maximum model distance for each model of the first set;
  
  updating the reduced trellis for every frame assuming each allophone model is one-state model with a minimum duration of two frames and a transition probability equal to its maximum model distance;
  
  sorting end values from the reduced trellis of each path through the vocabulary network;
  
  choosing a first plurality of candidates for recognition having the highest end values;
  
  rescoring the first plurality of candidates using a full Viterbi method trellis corresponding to the vocabulary network with the model distances computed for the first set of allophone models;
  
  sorting candidates by score in descending order;
  
  choosing a second plurality of candidates smaller than the first plurality from the first plurality, for further rescoring using the second set of allophone models and second type of acoustic parameter vectors;
  
  finding allophone segmentation using the first type of acoustic parameter vectors to identify frames for model distance computations for the second type of acoustic parameter vectors;
  
  computing model distances for the frames of acoustic parameter vectors of the second type identified for the allophone models of the second set found in the second plurality of candidates;
  
  rescoring the second plurality of candidates using the Viterbi method with the model distances computed for the allophone models of the second set; and
  
  comparing the second plurality of candidates' scores for acoustic parameter vectors of first and second types to select a recognition candidate.
- Dependent claims (2, 3, 4)
  - 2. A speech recognition method as claimed in claim 1, wherein the acoustic parameter vectors of a first type include Cepstrum parameter vectors.
  - 3. A speech recognition method as claimed in claim 2, wherein the acoustic parameter vectors of the second type include LSP parameter vectors.
  - 4. A method of speech recognition as claimed in claim 1 further comprising the steps of:
    - identifying, with an endpointer, the beginning of words or phrases prior to the step of generating the reduced trellis; and
      
      identifying, with the endpointer, the end of speech to stop the updating of the reduced trellis.
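
The rescoring and segmentation steps of claim 1 both rely on a Viterbi alignment of frames to model states. The sketch below is a simplified illustration, not the patented procedure: it assumes a strictly left-to-right chain of states, ignores transition probabilities, and takes a hypothetical `state_scores` table of per-frame log-likelihoods. The function name `viterbi_align` is likewise an assumption for this example.

```python
import math


def viterbi_align(state_scores):
    """Align frames to a left-to-right chain of states.

    state_scores[t][s] is the assumed log-likelihood of frame t in state s.
    The path starts in state 0, ends in the last state, and at each frame
    either stays in the same state or advances by one.
    Returns (best_log_score, alignment), where alignment[t] is the state
    assigned to frame t. The alignment is meaningful only if the score is finite.
    """
    n_frames = len(state_scores)
    n_states = len(state_scores[0])
    neg_inf = -math.inf
    score = [[neg_inf] * n_states for _ in range(n_frames)]
    back = [[0] * n_states for _ in range(n_frames)]
    score[0][0] = state_scores[0][0]
    for t in range(1, n_frames):
        for s in range(n_states):
            stay = score[t - 1][s]
            advance = score[t - 1][s - 1] if s > 0 else neg_inf
            if advance > stay:
                best, back[t][s] = advance, s - 1
            else:
                best, back[t][s] = stay, s
            score[t][s] = best + state_scores[t][s]
    # Trace back from the final state to recover the frame-to-state segmentation.
    alignment = [n_states - 1]
    for t in range(n_frames - 1, 0, -1):
        alignment.append(back[t][alignment[-1]])
    alignment.reverse()
    return score[n_frames - 1][n_states - 1], alignment
```

The returned alignment gives the segment boundaries that, per the claim, identify which frames need model distances computed for the second parameter type.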

5. A speech recognition method comprising the steps of:
- generating a first set of allophone models for use with Cepstrum parameter vectors;
  
  generating a second set of allophone models for use with LSP (line spectral pair) parameter vectors;
  
  generating a network representing a recognition vocabulary, wherein each branch of the network is one of the allophone models and each complete path through the network is a sequence of models representing a word in the recognition vocabulary;
  
  generating a reduced trellis for determining a path through the network having a highest likelihood;
  
  analyzing an unknown utterance to generate a frame sequence of both Cepstrum and LSP parameter vectors;
  
  computing of Cepstrum model distances for each frame for all Cepstrum allophone models;
  
  finding a maximum model distance for each model;
  
  updating the reduced trellis for every frame assuming a one-state model with a minimum duration of two frames and a transition probability equal to its maximum model distance;
  
  sorting end values of each vocabulary network path for the reduced trellis;
  
  choosing top n values to provide n candidates for recognition;
  
  rescoring the top n candidates using a full Viterbi method trellis with the computed model distances;
  
  sorting candidates by score in descending order;
  
  choosing the top m candidates for further rescoring using the LSP parameter vectors, where m is less than n;
  
  finding allophone segmentation using Cepstrum parameters to identify frames for model distance computations for LSP parameters;
  
  computing LSP model distances for frames identified and for the LSP models found in the m candidates;
  
  rescoring the m candidates using the Viterbi method with the LSP model distances computed; and
  
  comparing the top m candidates' scores for Cepstrum and LSP parameters to select a recognition candidate.
- Dependent claims (6, 7, 8)
  - 6. A method of speech recognition as claimed in claim 5 further comprising the steps of:
    - identifying, with an endpointer, the beginning of words or phrases prior to the step of generating the reduced trellis; and
      
      identifying, with the endpointer, the end of speech to stop the updating of the reduced trellis.
  - 7. A method of speech recognition as claimed in claim 6 wherein the step of comparing the top m candidates includes the steps of multiplying together the probabilities resulting from Cepstrum and LSP parameters for each respective candidate and choosing the candidate with the highest combined probability as the recognition candidate.
  - 8. A method of speech recognition as claimed in claim 7 wherein the frames are constrained to be within 18 frames of the segment boundaries found using the Cepstrum parameters.
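
Claims 7 and 8 can be illustrated with a short Python sketch. It assumes scores are kept as log-probabilities, so multiplying the Cepstrum and LSP probabilities becomes adding their logs. It also reads the 18-frame constraint as a window extending 18 frames on either side of a Cepstrum-derived segment. That reading, and the names `lsp_frame_window` and `select_candidate`, are assumptions made for this example rather than details taken from the patent.

```python
def lsp_frame_window(seg_start, seg_end, n_frames, margin=18):
    """Frames eligible for LSP model distance computation for one segment.

    seg_start and seg_end are inclusive frame indices of a segment found
    with the Cepstrum parameters. The window is clipped to the utterance.
    """
    first = max(0, seg_start - margin)
    last = min(n_frames - 1, seg_end + margin)
    return range(first, last + 1)


def select_candidate(cepstrum_logp, lsp_logp):
    """Pick the candidate with the highest combined probability.

    Both arguments map each candidate word to a log-probability, so the
    product of probabilities is computed as a sum of logs.
    """
    combined = {word: cepstrum_logp[word] + lsp_logp[word]
                for word in cepstrum_logp}
    return max(combined, key=combined.get)
```

Working in the log domain avoids numerical underflow when many frame probabilities are multiplied, and it leaves the ranking of candidates unchanged.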

Current Assignee: RPX Clearinghouse LLC (RPX Corporation)
Original Assignee: Northern Telecom Limited (Nortel Networks Corporation)
Inventors: Gupta, Vishwa N.; Lennig, Matthew
Primary Examiner(s): MacDonald, Allen R.
Assistant Examiner(s): Dorvil, Richemond
Application Number: US08/080,543
Time in Patent Office: 1,048 Days
Field of Search: 395/2, 395/2.51, 395/2.62, 395/2.47, 395/2.63, 395/2.52, 395/2.62, 381/41, 381/42, 381/43
US Class Current: 704/242
CPC Class Codes:
G10L 15/08 Speech classification or se...
G10L 15/142 Hidden Markov Models [HMMs]