Method and apparatus for a time-synchronous tree-based search strategy

US 5,884,259 A
Filed: 02/12/1997
Issued: 03/16/1999
Est. Priority Date: 02/12/1997
Status: Expired due to Fees

1 Assignment

0 Petitions

Abstract

A method and apparatus for using a tree structure to constrain a time-synchronous, fast search for candidate words in an acoustic stream is described. A minimum stay of three frames in each graph node visited is imposed by allowing transitions only every third frame. This constraint enables the simplest possible Markov model for each phoneme while enforcing the desired minimum duration. The fast, time-synchronous search for likely words is done for an entire sentence/utterance. The list of hypotheses beginning at each time frame is stored for providing, on-demand, lists of contender/candidate words to the asynchronous, detailed match phase of decoding.
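The following is a minimal sketch of the minimum-duration argument. It assumes a decoded path is recorded as one phoneme label per frame triplet, with one single-state, self-looped model per phoneme and transitions allowed only at triplet boundaries. Expanding such a path to frames shows that every phoneme segment lasts a multiple of three frames and never fewer than three. The function names and phoneme labels are illustrative and are not taken from the patent.

```python
from itertools import groupby

FRAMES_PER_TRIPLET = 3  # transitions are allowed only at triplet boundaries


def expand_to_frames(triplet_path):
    """Expand a per-triplet state path into per-frame labels (hypothetical helper).

    Each triplet is occupied by exactly one single-state phoneme model, so the
    label can only change at frame indices that are multiples of three.
    """
    return [label for label in triplet_path for _ in range(FRAMES_PER_TRIPLET)]


def segment_durations(frame_labels):
    """Return (phoneme, duration in frames) for each run of identical labels."""
    return [(label, len(list(run))) for label, run in groupby(frame_labels)]


# Hypothetical path: the self loop keeps "AH" active for two triplets.
path = ["K", "AH", "AH", "T"]
print(segment_durations(expand_to_frames(path)))
```

A conventional left-to-right model would need three states per phoneme to enforce the same three-frame minimum. Here a single state suffices because the minimum comes from the triplet clock rather than from the model topology.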

79 Citations

27 Claims

1. A speech recognition method for recognizing an entire utterance, for a system including an asynchronous detailed match procedure, said method comprising the step of performing a synchronous fast match process for said entire utterance prior to executing said detailed match procedure.

2. A method as recited in claim 1, wherein said fast match process is performed in an iterative manner with an iteration performed for each of a plurality of frame triplets.

3. A method as recited in claim 2, wherein each phoneme in a fast match graph is represented as a single state with a self loop.

4. A method as recited in claim 1 wherein said fast match process proceeds backward from an end of said entire utterance towards a beginning of said entire utterance.

5. A method as recited in claim 1, further comprising:
- providing a fast match graph for a speech language vocabulary, wherein said fast match graph corresponds to a backward search, and wherein said graph has arcs having destinations exiting a given node and are stored as the successors to that node, while the sources of the incoming arcs are stored as its predecessors;
  
  storing the phoneme identity of each node in said fast match graph for use in a Viterbi search;
  
  storing an identity of each word formed by a group of phonemes; and
  
  invoking dynamic programming of said Viterbi search to enable construction of lists of potential words at each of said plurality of frame triplets.
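Claim 5's graph storage can be sketched as follows. The sketch makes several assumptions. The graph is built as a tree over reversed pronunciations, which fits the title's "tree-based" search and a backward pass. The lexicon and its ARPAbet-style labels are hypothetical. Self loops are left implicit here and handled by the search. Each node stores its phoneme, its successors (destinations of outgoing arcs), its predecessors (sources of incoming arcs), and the words whose beginning it marks.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    phoneme: Optional[str]                            # None for the root
    successors: list = field(default_factory=list)    # destinations of outgoing arcs
    predecessors: list = field(default_factory=list)  # sources of incoming arcs
    words: list = field(default_factory=list)         # words that begin at this node


def build_backward_tree(lexicon):
    """Build a tree over reversed pronunciations (hypothetical helper).

    Paths from the root run from each word's last phoneme to its first, which
    matches a search that moves from the end of the utterance to its
    beginning. Words sharing final phonemes share tree nodes.
    """
    nodes = [Node(phoneme=None)]
    for word, phonemes in lexicon.items():
        current = 0
        for ph in reversed(phonemes):
            # Reuse an existing successor carrying the same phoneme, if any.
            child = next((s for s in nodes[current].successors
                          if nodes[s].phoneme == ph), None)
            if child is None:
                child = len(nodes)
                nodes.append(Node(phoneme=ph, predecessors=[current]))
                nodes[current].successors.append(child)
            current = child
        # The word identity is stored where its first phoneme is reached.
        nodes[current].words.append(word)
    return nodes


# Hypothetical toy lexicon.
lexicon = {"cat": ["K", "AE", "T"], "at": ["AE", "T"], "bat": ["B", "AE", "T"]}
for i, node in enumerate(build_backward_tree(lexicon)):
    print(i, node.phoneme, node.successors, node.words)
```

In a pure tree every non-root node has exactly one predecessor. The list is kept only to mirror the claimed successor/predecessor storage, which would also cover a more general graph.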

6. A speech recognition system for recognizing an entire utterance and having means for receiving and executing a detailed match procedure, said system comprising:
- means for performing a synchronous fast match on said entire utterance prior to asynchronously executing said detailed match procedure.

7. A system as recited in claim 6, wherein said fast match process is performed in an iterative manner with an iteration performed for each of a plurality of frame triplets.

8. A system as recited in claim 7, wherein each phoneme in a fast match graph is represented as a single state with a self loop.

9. A system as recited in claim 6, wherein said fast match process proceeds backward from an end of said entire utterance towards a beginning of said entire utterance.

10. A speech recognition method for recognizing an entire utterance segmented into a plurality of frames and based upon a speech language vocabulary, said method comprising:
- receiving an utterance;
  
  forming an acoustic signal of a plurality of phoneme constituents making up said utterance;
  
  combining three of said frames to form a frame triplet;
  
  initiating a fast match for said utterance by forming a phoneme probability matrix table giving probabilities of each phoneme versus an acoustic observation time, wherein said phoneme matrix table has each column corresponding to a single frame;
  
  multiplying together a group of three individual probabilities of the three frames that make up each said triplet to produce a joint probability of the triplet for each particular said phoneme and triplet;
  
  forming a triplet probability matrix representing a complete observation time of said utterance and having a row for each phoneme of said utterance and a column for each said triplet; and
  
  invoking a synchronous iterative process to perform the fast match for the entire utterance in steps of frame triplets.
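The triplet probability matrix of claim 10 can be sketched as a row-by-row product over groups of three frame columns. This is a minimal version that assumes the phoneme probability table is a list of rows, one per phoneme and one column per frame. It also assumes that trailing frames which do not fill a whole triplet are dropped. The patent text above does not say how such frames are handled.

```python
from math import prod


def triplet_matrix(frame_probs):
    """Collapse a phoneme-by-frame table into a phoneme-by-triplet table.

    frame_probs[p][t] is the probability of phoneme p at frame t (hypothetical
    layout). Each output entry is the product of three consecutive frame
    probabilities, i.e. the joint probability of that triplet for phoneme p.
    Leftover frames beyond the last full triplet are ignored (an assumption).
    """
    return [[prod(row[3 * k:3 * k + 3]) for k in range(len(row) // 3)]
            for row in frame_probs]


# Two hypothetical phonemes observed over seven frames.
frame_probs = [
    [0.5, 0.5, 0.5, 0.25, 1.0, 0.5, 0.9],
    [1.0, 0.5, 0.25, 0.5, 0.5, 1.0, 0.3],
]
print(triplet_matrix(frame_probs))
```

Multiplying the three frame probabilities treats frames as independent given the phoneme. A practical decoder would usually store the sum of log probabilities instead, which is the same quantity without underflow.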

11. A speech recognition method for recognizing an entire utterance segmented into a plurality of frames and based upon a speech language vocabulary, said method comprising:
- receiving an utterance;
  
  forming an acoustic signal of a plurality of phoneme constituents making up said utterance;
  
  combining three of said frames to form a frame triplet;
  
  initiating a fast match for said utterance by forming a phoneme probability matrix table giving probabilities of each phoneme versus an acoustic observation time, wherein said phoneme matrix table has each column corresponding to a single frame;
  
  multiplying together a group of three individual probabilities of the three frames that make up each said triplet to produce a joint probability of the triplet for each particular said phoneme and triplet;
  
  forming a triplet probability matrix representing a complete observation time of said utterance and having a row for each phoneme of said utterance and a column for each said triplet;
  
  invoking a synchronous iterative process to perform the fast match for the entire utterance in steps of frame triplets;
  
  initializing to the root node and to the end of the utterance;
  
  determining for each potentially active node `n` at a next time τ, a maximum of a node at time τ+3 which maximizes the product of a score of said node with the transition probability from said node into a potentially active node;
  
  computing the score s(τ, n) of the potentially active node given by a product of said maximum and an observation probability at a current time of the phoneme identified with state `n`;
  
  determining a maximum score of the node scores at the current time;
  
  comparing the score for each potentially active node to said maximum score;
  
  including in a next active list, only active nodes for which the difference between the log of said active node score and the log of the maximum score is less than a user-specified range constant; and
  
  adding to a matrix of contender words at an appropriate time, a new node placed in said next active list which corresponds to a beginning of a whole word, and a new node score of said new node.

12. A method as recited in claim 11, further comprising the step of making available said matrix of contender words to a detailed match process.

13. A method as recited in claim 11, wherein said new node score is obtained by multiplying an unnormalized backward score of said new node by an unnormalized forward score of a root node.

14. A method as recited in claim 13, wherein said unnormalized forward score of said root node is obtained from a detailed match procedure.
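One backward step of claims 11 through 14 (moving from time τ+3 to time τ) might look like the sketch below. It makes three assumptions. Scores are kept as log probabilities, so the claimed products become sums and the range test compares log scores directly. The graph is a dictionary of log-probability arcs with self loops included. The forward score of the root is passed in as a plain number, standing in for the value the detailed match would supply. All names (`backward_step`, `arcs`, `words_at`, `beam`, and so on) are hypothetical.

```python
def backward_step(active, arcs, phoneme_of, words_at, log_obs_k, beam,
                  forward_log_score=0.0):
    """Advance a backward fast match by one frame triplet (hypothetical helper).

    active:      {node: log score at the later triplet, time tau + 3}
    arcs:        {node: {next_node: log transition probability}}, self loops included
    phoneme_of:  {node: phoneme label}
    words_at:    {node: [words whose beginning is this node]}
    log_obs_k:   {phoneme: log triplet probability at the current time tau}
    beam:        range constant, in log units
    forward_log_score: log forward score of the root (assumed input)
    """
    # Viterbi max: best (active score + log transition) into each potential node.
    best = {}
    for m, score_m in active.items():
        for n, log_a in arcs.get(m, {}).items():
            best[n] = max(best.get(n, float("-inf")), score_m + log_a)
    # Multiply by the observation probability of n's phoneme (add its log).
    scores = {n: s + log_obs_k[phoneme_of[n]] for n, s in best.items()}
    if not scores:
        return {}, []
    top = max(scores.values())
    # Keep only nodes whose log score lies within `beam` of the best one.
    next_active = {n: s for n, s in scores.items() if top - s < beam}
    # Word beginnings become contenders; backward x forward is a log-domain sum.
    contenders = [(w, s + forward_log_score)
                  for n, s in next_active.items() for w in words_at.get(n, [])]
    return next_active, contenders


# Toy graph: root 0 -> "T" (node 1) -> "AE" (node 2), which begins the word "at".
arcs = {0: {1: 0.0}, 1: {1: -1.0, 2: -1.0}, 2: {2: -1.0}}
phoneme_of = {1: "T", 2: "AE"}
words_at = {2: ["at"]}
active = {0: 0.0}  # start at the root at the end of the utterance
active, _ = backward_step(active, arcs, phoneme_of, words_at,
                          {"T": -1.0, "AE": -5.0}, beam=5.0)
active, contenders = backward_step(active, arcs, phoneme_of, words_at,
                                   {"T": -2.0, "AE": -1.0}, beam=5.0,
                                   forward_log_score=-0.5)
print(active, contenders)
```

Narrowing `beam` trades recall for speed. With `beam=0.5` in the second call above, node 1 falls outside the range and only the word-beginning node survives.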

15. A speech recognition method for recognizing an entire utterance segmented into a plurality of frames and based upon a speech language vocabulary, said method comprising:
- receiving an utterance;
  
  forming an acoustic signal of a plurality of phoneme constituents making up said utterance;
  
  combining three of said frames to form a frame triplet;
  
  initiating a fast match for said utterance by forming a phoneme probability matrix table giving probabilities of each phoneme versus an acoustic observation time, wherein said phoneme matrix table has each column corresponding to a single frame;
  
  multiplying together a group of three individual probabilities of the three frames that make up each said triplet to produce a joint probability of the triplet for each particular said phoneme and triplet;
  
  forming a triplet probability matrix representing a complete observation time of said utterance and having a row for each phoneme of said utterance and a column for each said triplet;
  
  invoking a synchronous iterative process to perform the fast match for the entire utterance in steps of frame triplets;
  
  forming a `next potentials list` from the `current active list` if an utterance beginning has not been reached;
  
  computing and storing a score for each node in the `potentials list`;
  
  finding and storing a current highest node score;
  
  choosing and using an inclusion range parameter to form the `next active list`;
  
  entering and storing active list entries for each triplet in a `matrix of contender words`;
  
  decrementing to a next backward frame triplet;
  
  modifying the `current active list` to correspond with the next active list; and
  
  stopping the fast match process if the utterance beginning has been reached.
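The control flow of claim 15 can be sketched as a loop over frame triplets that runs from the end of the utterance to its beginning. This is a minimal version under several assumptions. It works in the log domain. The root starts with score 0. The graph is given as a dictionary of log-probability arcs with self loops included. There is no re-entry to the root after a word beginning, so multi-word utterances would need extra arcs supplied by the caller. The function and argument names are hypothetical.

```python
def backward_fast_match(log_obs, arcs, phoneme_of, words_at, root, beam):
    """Backward, time-synchronous fast match over frame triplets (hypothetical helper).

    log_obs[k]: {phoneme: log probability of triplet k}, k = 0 .. K-1
    arcs, phoneme_of, words_at: graph description, self loops included in arcs
    beam: inclusion range constant in log units
    Returns the matrix of contender words as {k: [(word, log score), ...]},
    keyed by the triplet at which each word is hypothesized to begin.
    """
    contenders = {}
    current_active = {root: 0.0}      # initialize to the root at the utterance end
    k = len(log_obs) - 1              # last frame triplet
    while k >= 0 and current_active:  # stop once the beginning has been reached
        # Next potentials list, each node carrying its best predecessor score.
        potentials = {}
        for m, s_m in current_active.items():
            for n, log_a in arcs.get(m, {}).items():
                potentials[n] = max(potentials.get(n, float("-inf")), s_m + log_a)
        # Score every potential node, then find the current highest score.
        scores = {n: s + log_obs[k][phoneme_of[n]] for n, s in potentials.items()}
        if not scores:
            break
        top = max(scores.values())
        # Apply the inclusion range to form the next active list.
        next_active = {n: s for n, s in scores.items() if top - s < beam}
        # Enter word beginnings for this triplet in the contender matrix.
        for n, s in next_active.items():
            for w in words_at.get(n, []):
                contenders.setdefault(k, []).append((w, s))
        current_active = next_active  # the next active list becomes current
        k -= 1                        # decrement to the previous triplet
    return contenders


# Toy graph: root 0 -> "T" (node 1) -> "AE" (node 2), which begins the word "at".
arcs = {0: {1: 0.0}, 1: {1: -1.0, 2: -1.0}, 2: {2: -1.0}}
log_obs = [{"T": -2.0, "AE": -1.0}, {"T": -1.0, "AE": -5.0}]
print(backward_fast_match(log_obs, arcs, {1: "T", 2: "AE"}, {2: ["at"]},
                          root=0, beam=5.0))
```

Because the whole utterance is processed before the detailed match starts, the returned dictionary can be consulted on demand for any start time, in whatever order the asynchronous detailed match requests it.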

16. A speech recognition method for recognizing an entire utterance, for a system including a fast match process and a detailed match procedure, wherein said fast match process proceeds backward from an end of said entire utterance towards a beginning of said entire utterance.

17. A speech recognition method comprising:
- recognizing an utterance by performing an asynchronous detailed match and a synchronous fast match, wherein said fast match is performed in an iterative manner with an iteration performed for each of a plurality of frames.

18. A method as recited in claim 17, further comprising representing each phoneme in a fast match graph as a single state with a self loop.

19. A method as recited in claim 17, further comprising forming said plurality of frames comprised of a frame triplet.

20. A speech recognition system for recognizing an utterance, said system comprising a fast match process which proceeds backward from an end of said utterance towards a beginning of said utterance.

21. A speech recognition system as recited in claim 20, including a fast match process, wherein said fast match process is performed in an iterative manner with an iteration performed for each of a plurality of frames.

22. A system as recited in claim 21, wherein each phoneme in a fast match graph is represented as a single state with a self loop.

23. A system as recited in claim 21, wherein said plurality of frames comprises a frame triplet.

24. A speech recognition apparatus comprising:
- means for synchronously performing a fast match on an entire utterance; and
  
  means for executing a detailed match procedure asynchronously on said entire utterance so as to recognize said entire utterance.

25. A speech recognition apparatus as recited in claim 24, further comprising means for receiving said entire utterance.

26. A speech recognition method comprising:
- multiplying phoneme probabilities together in groups of three frames, each group forming a triplet, and employing each triplet in a fast match process using a non-replicated one state model.

27. A speech recognition method as recited in claim 26, further comprising constructing a matrix of phoneme probabilities versus time triplets.

Current Assignee
International Business Machines Corporation
Original Assignee
International Business Machines Corporation
Inventors
Eide, Ellen Marie; Bahl, Lalit Rai
Primary Examiner(s)
Hudspeth, David R.
Assistant Examiner(s)
Smits, Talivaldis Ivars

Application Number
US08/798,011
Time in Patent Office
762 Days
Field of Search
704/231, 704/241, 704/253, 704/243, 704/240, 704/242, 704/252, 704/256
US Class Current
704/252
CPC Class Codes
G10L 15/08 Speech classification or se...
