Speech recognition providing multiple outputs

US 5,388,183 A
Filed: 09/30/1991
Issued: 02/07/1995
Est. Priority Date: 09/30/1991
Status: Expired due to Term

Abstract

In the speech recognition system disclosed herein, the Viterbi decoding of an acoustic recognition network is augmented by implementing additional data structures for each arc in the network: an arc score, which represents the best path cost of reaching the last point on that arc, and an arc intime, which represents, for the best path, the time of leaving the previous arc. These additional data structures enable a traceback procedure which identifies not only the presumably optimal path but also alternate paths having good scores.

5 Claims

1. In a speech recognition system of the type in which, in response to an input utterance characterized by a sequence of input acoustic segments, a path is dynamically decoded through a recognition network of sound characterizing arcs, predetermined ones of which connect at nodes which define permissible transitions, by matching sequential input acoustic segments to corresponding points along each possible arc and by measuring closeness of match to obtain a cost metric for reaching each node by the best path for each input time;
- a computer implemented method of generating an output network structure which is composed of a selection of said arcs explaining segments of the input utterance and interconnections thereof, said method comprising;
  
  for each arc, calculating and storing along with said node cost metrics, an arc score which represents the best path cost of reaching the last point on that arc at the corresponding time;
  
  when the end of the recognition network is reached, initiating a traceback procedure which comprises;
  
  assigning a minimal node out score to the end of the recognition network;
  
  for each successive node in said traceback procedure, selecting all incoming arcs having arc scores which, when combined with the node out score of the respective terminating node, do not exceed the best path score for the end of the recognition network by more than a preselected margin; and
  
  calculating, for each node which initiated selected arcs, a node out score which represents the cost of the best path from that node to the end of the recognition network, whereby the successively selected arcs and the joining nodes define the output network structure.
- Dependent claims (2, 3, 4)
  - 2. The method as set forth in claim 1 wherein each arc comprises a succession of spectral states.
  - 3. The method as set forth in claim 2 wherein an input utterance is converted to a succession of spectral frames and said frames are matched to said states with said cost metric being a function of the closeness of match.
  - 4. The method as set forth in claim 3 wherein said margin is a fixed offset in the cost metric.
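
The traceback recited in claim 1 can be illustrated with a minimal Python sketch. This is not the patent's implementation. It assumes a time-free toy network in which each arc carries a single precomputed match cost, nodes are integers numbered so that every arc runs from a lower to a higher id, and the margin is a fixed offset as in claim 4. The names `Arc`, `forward`, and `traceback` are hypothetical.

```python
from dataclasses import dataclass

INF = float("inf")


@dataclass(frozen=True)
class Arc:
    src: int     # node where the arc starts
    dst: int     # node where the arc ends (always > src in this sketch)
    label: str   # sound unit the arc stands for
    cost: float  # hypothetical total match cost along the arc


def forward(arcs, num_nodes):
    """Return the best cost of reaching each node and an arc score per arc."""
    node_cost = [INF] * num_nodes
    node_cost[0] = 0.0
    arc_score = {}
    # Sorting by source id is a topological order because dst > src.
    for arc in sorted(arcs, key=lambda a: a.src):
        arc_score[arc] = node_cost[arc.src] + arc.cost
        node_cost[arc.dst] = min(node_cost[arc.dst], arc_score[arc])
    return node_cost, arc_score


def traceback(arcs, num_nodes, node_cost, arc_score, margin):
    """Select every arc lying on a path within `margin` of the best path."""
    end = num_nodes - 1
    best = node_cost[end]
    node_out = [INF] * num_nodes
    node_out[end] = 0.0  # minimal node out score at the end of the network
    selected = []
    for node in range(end, -1, -1):
        if node_out[node] == INF:
            continue  # node is not on any selected path
        for arc in arcs:
            if arc.dst != node:
                continue
            # Best complete path through this arc versus the overall best.
            if arc_score[arc] + node_out[node] <= best + margin:
                selected.append(arc)
                node_out[arc.src] = min(node_out[arc.src],
                                        arc.cost + node_out[node])
    return selected, node_out


arcs = [
    Arc(0, 1, "k", 1.0), Arc(0, 1, "g", 1.5),
    Arc(1, 2, "ae", 2.0), Arc(1, 2, "eh", 4.0),
    Arc(2, 3, "t", 1.0), Arc(2, 3, "d", 1.2),
]
node_cost, arc_score = forward(arcs, 4)
selected, node_out = traceback(arcs, 4, node_cost, arc_score, margin=0.6)
labels = [a.label for a in selected]
```

In this toy network the best path costs 4.0, so with a margin of 0.6 the arc labelled "eh" (best complete path cost 6.0) is dropped while the close alternatives "g" and "d" are kept alongside the best path.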

5. In a speech recognition system of the type in which, in response to an input utterance characterized by a sequence of input acoustic segments, a path is dynamically decoded through a recognition network of sound characterizing arcs each of which comprises a succession of preselected spectral states, predetermined ones of said arcs being connected at nodes which define permissible transitions, by matching the spectral characteristics of sequential input acoustic segments to corresponding spectral states along each possible arc and by measuring closeness of match to obtain a cost metric for reaching each node by the best path for each input time;
- a computer implemented method of generating an output network structure which is composed of a selection of said arcs and interconnections thereof together with corresponding costs, said method comprising;
  
  for each arc, calculating and storing along with said node cost metrics, an arc score which represents the best path cost of reaching the last point on that arc at the corresponding time and an arc intime which represents, for the best path, the time of leaving the previous arc;
  
  when the end of the recognition network is reached, initiating a traceback procedure which comprises;
  
  assigning a minimal node out score to the end of the recognition network;
  
  for each successive node encountered in said traceback procedure, selecting all incoming arcs having arc scores which, when combined with the node out score of the respective terminating node, do not exceed the best path score for the end of the recognition network by more than a preselected margin; and
  
  calculating, for each node which initiates selected arcs as identified by the respective arc intime, a node out score which represents the cost of the best path from that node to the end of the recognition network, whereby the successively selected arcs and the joining nodes define the output structure.
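
Claim 5 adds the arc intime, which lets the traceback locate the node and time at which each selected arc was entered. The Python sketch below illustrates that idea under loudly stated assumptions. Each frame is a single number, and each arc is reduced to one hypothetical `mean` value instead of a succession of spectral states. The distance in `seg_cost` is a toy stand-in for the spectral match, and all names (`Arc`, `seg_cost`, `decode`, `traceback`) are invented for illustration.

```python
from dataclasses import dataclass

INF = float("inf")


@dataclass(frozen=True)
class Arc:
    src: int
    dst: int
    label: str
    mean: float  # hypothetical one-number stand-in for the arc's spectral states


def seg_cost(arc, frames, t_in, t_out):
    """Toy cost of matching frames[t_in:t_out] to the arc (an assumption)."""
    return sum(abs(f - arc.mean) for f in frames[t_in:t_out])


def decode(arcs, num_nodes, frames):
    """Forward pass storing node costs, arc scores and arc intimes per end time."""
    T = len(frames)
    node_cost = [[INF] * (T + 1) for _ in range(num_nodes)]
    node_cost[0][0] = 0.0
    arc_score = {a: [INF] * (T + 1) for a in arcs}
    arc_intime = {a: [None] * (T + 1) for a in arcs}
    for t_out in range(1, T + 1):
        for a in arcs:
            for t_in in range(t_out):  # every arc consumes at least one frame
                c = node_cost[a.src][t_in] + seg_cost(a, frames, t_in, t_out)
                if c < arc_score[a][t_out]:
                    arc_score[a][t_out], arc_intime[a][t_out] = c, t_in
            node_cost[a.dst][t_out] = min(node_cost[a.dst][t_out],
                                          arc_score[a][t_out])
    return node_cost, arc_score, arc_intime


def traceback(arcs, num_nodes, frames, node_cost, arc_score, arc_intime, margin):
    """Walk back from (end node, final time), following each arc's intime."""
    T, end = len(frames), num_nodes - 1
    best = node_cost[end][T]
    node_out = {(end, T): 0.0}  # minimal node out score at the end
    selected = []               # entries are (label, t_in, t_out)
    for t in range(T, 0, -1):   # an arc's intime is always earlier than its end
        for node in range(num_nodes):
            out = node_out.get((node, t))
            if out is None:
                continue
            for a in arcs:
                if a.dst != node or arc_score[a][t] + out > best + margin:
                    continue
                t_in = arc_intime[a][t]
                selected.append((a.label, t_in, t))
                # Cost from (src, t_in) to the end when leaving via this arc.
                through = arc_score[a][t] - node_cost[a.src][t_in] + out
                key = (a.src, t_in)
                node_out[key] = min(node_out.get(key, INF), through)
    return selected, node_out


frames = [1.0, 1.0, 3.0, 3.0]
arcs = [Arc(0, 1, "a", 1.0), Arc(0, 1, "b", 1.5),
        Arc(1, 2, "c", 3.0), Arc(1, 2, "d", 5.0)]
node_cost, arc_score, arc_intime = decode(arcs, 3, frames)
selected, node_out = traceback(arcs, 3, frames, node_cost, arc_score,
                               arc_intime, margin=1.5)
```

Because only one intime is stored per arc and end time, this sketch yields alternative arcs but not alternative segmentations of the same arc, which matches the claim's wording of an intime "for the best path".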

Current Assignee
Nuance Communications, Inc. (Microsoft Corporation)
Original Assignee
Kurzweil Applied Intelligence, Inc.
Inventors
Lynch, Thomas E.
Primary Examiner(s)
MacDonald, Allen R.
Assistant Examiner(s)
Hafiz, Tariq R.

Application Number

US07/767,437
Time in Patent Office

1,226 Days
Field of Search

381/36-53, 392/9, 395/2, 395/2.1-2.87
US Class Current

704/242
CPC Class Codes

G10L 15/083 Recognition networks G10L15...
