Speech recognition providing multiple outputs
Abstract
In the speech recognition system disclosed herein, the Viterbi decoding of an acoustic recognition network is augmented with two additional data structures for each arc in the network: an arc score, which represents the best path cost of reaching the last point on that arc, and an arc intime, which represents, for the best path, the time of leaving the previous arc. These additional data structures enable a traceback procedure which identifies not only the presumably optimal path but also alternate paths having good scores.
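The bookkeeping described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation. The names `Arc`, `decode`, `node_cost`, `arc_score` and `arc_intime` are hypothetical, as is the toy acoustic model: each frame and each arc state is a single number, and the match cost is their absolute difference.

```python
import math
from dataclasses import dataclass

INF = math.inf

@dataclass
class Arc:
    src: int        # node the arc leaves
    dst: int        # node the arc enters
    states: list    # hypothetical one-number templates, one per point on the arc

def decode(arcs, n_nodes, frames, start=0):
    """Viterbi pass that also stores, per arc and end time, a score and an intime."""
    T = len(frames)
    # node_cost[n][t]: best cost of reaching node n after consuming t frames
    node_cost = [[INF] * (T + 1) for _ in range(n_nodes)]
    node_cost[start][0] = 0.0
    # arc_score[a][t]: best path cost of reaching the last point of arc a at time t
    arc_score = [[INF] * (T + 1) for _ in arcs]
    # arc_intime[a][t]: for that best path, the time at which the previous arc was left
    arc_intime = [[None] * (T + 1) for _ in arcs]
    # cur[a][k]: (cost, intime) of the best path currently in state k of arc a
    cur = [[(INF, None)] * len(arc.states) for arc in arcs]
    for t in range(1, T + 1):
        x = frames[t - 1]
        for a, arc in enumerate(arcs):
            new = []
            for k, template in enumerate(arc.states):
                cands = [cur[a][k]]                       # stay in the same state
                if k > 0:
                    cands.append(cur[a][k - 1])           # advance along the arc
                else:
                    cands.append((node_cost[arc.src][t - 1], t - 1))  # enter the arc
                cost, intime = min(cands, key=lambda c: c[0])
                new.append((cost + abs(x - template), intime))
            cur[a] = new
            arc_score[a][t], arc_intime[a][t] = new[-1]
            node_cost[arc.dst][t] = min(node_cost[arc.dst][t], arc_score[a][t])
    return node_cost, arc_score, arc_intime

# Toy network: two parallel arcs from node 0 to node 1, then one arc to node 2.
arcs = [Arc(0, 1, [1.0]), Arc(0, 1, [2.0]), Arc(1, 2, [5.0])]
node_cost, arc_score, arc_intime = decode(arcs, n_nodes=3, frames=[1.0, 5.0])
```

In this toy run the second parallel arc ends one unit worse than the first. A plain Viterbi pass would keep only the node cost and lose that alternative, whereas the per-arc tables retain it for a later traceback.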
5 Claims
1. In a speech recognition system of the type in which, in response to an input utterance characterized by a sequence of input acoustic segments, a path is dynamically decoded through a recognition network of sound characterizing arcs, predetermined ones of which connect at nodes which define permissible transitions, by matching sequential input acoustic segments to corresponding points along each possible arc and by measuring closeness of match to obtain a cost metric for reaching each node by the best path for each input time;
a computer implemented method of generating an output network structure which is composed of a selection of said arcs explaining segments of the input utterance and interconnections thereof, said method comprising;
for each arc, calculating and storing along with said node cost metrics, an arc score which represents the best path cost of reaching the last point on that arc at the corresponding time;
when the end of the recognition network is reached, initiating a traceback procedure which comprises;
assigning a minimal node out score to the end of the recognition network;
for each successive node in said traceback procedure, selecting all incoming arcs having arc scores which, when combined with the node out score of the respective terminating node, do not exceed the best path score for the end of the recognition network by more than a preselected margin; and
calculating, for each node which initiated selected arcs, a node out score which represents the cost of the best path from that node to the end of the recognition network, whereby the successively selected arcs and the joining nodes define the output network structure.
Dependent claims: 2, 3, 4.
5. In a speech recognition system of the type in which, in response to an input utterance characterized by a sequence of input acoustic segments, a path is dynamically decoded through a recognition network of sound characterizing arcs each of which comprises a succession of preselected spectral states, predetermined ones of said arcs being connected at nodes which define permissible transitions, by matching the spectral characteristics of sequential input acoustic segments to corresponding spectral states along each possible arc and by measuring closeness of match to obtain a cost metric for reaching each node by the best path for each input time;
a computer implemented method of generating an output network structure which is composed of a selection of said arcs and interconnections thereof together with corresponding costs, said method comprising;
for each arc, calculating and storing along with said node cost metrics, an arc score which represents the best path cost of reaching the last point on that arc at the corresponding time and an arc intime which represents, for the best path, the time of leaving the previous arc;
when the end of the recognition network is reached, initiating a traceback procedure which comprises;
assigning a minimal node out score to the end of the recognition network;
for each successive node encountered in said traceback procedure, selecting all incoming arcs having arc scores which, when combined with the node out score of the respective terminating node, do not exceed the best path score for the end of the recognition network by more than a preselected margin; and
calculating, for each node which initiates selected arcs as identified by the respective arc intime, a node out score which represents the cost of the best path from that node to the end of the recognition network, whereby the successively selected arcs and the joining nodes define the output structure.
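The traceback recited in claims 1 and 5 can be sketched in Python as follows. This is an illustrative reading, not the patented implementation. The function name `traceback`, the table layout (`node_cost[n][t]`, `arc_score[a][t]`, `arc_intime[a][t]`), the choice of 0 as the "minimal" node out score, and the toy values are all assumptions made for the example.

```python
import math

INF = math.inf

def traceback(arcs, node_cost, arc_score, arc_intime, end, T, margin):
    """Select every arc lying on a path within `margin` of the best path score.

    arcs is a list of (src, dst) node pairs. Returns the selected arcs as
    (arc index, intime, end time, arc score) and the node out scores,
    keyed by (node, time).
    """
    best = node_cost[end][T]             # best path score for the end of the network
    node_out = {(end, T): 0.0}           # assumed "minimal" node out score
    selected = []
    # Every arc ends later than its intime, so walking time backwards ensures
    # a node out score is final before the arcs entering that node are tested.
    for t in range(T, 0, -1):
        for node in sorted(n for (n, tt) in node_out if tt == t):
            out = node_out[(node, t)]
            for a, (src, dst) in enumerate(arcs):
                score = arc_score[a][t]
                if dst != node or math.isinf(score) or score + out > best + margin:
                    continue
                t_in = arc_intime[a][t]  # identifies when the initiating node was left
                selected.append((a, t_in, t, score))
                arc_cost = score - node_cost[src][t_in]   # cost contributed by this arc
                key = (src, t_in)
                node_out[key] = min(node_out.get(key, INF), arc_cost + out)
    return selected, node_out

# Hypothetical toy tables for a three-node network and two input frames.
arcs = [(0, 1), (0, 1), (1, 2)]
node_cost = [[0.0, INF, INF], [INF, 0.0, 4.0], [INF, INF, 0.0]]
arc_score = [[INF, 0.0, 4.0], [INF, 1.0, 4.0], [INF, INF, 0.0]]
arc_intime = [[None, 0, 0], [None, 0, 0], [None, None, 1]]

wide, wide_out = traceback(arcs, node_cost, arc_score, arc_intime, end=2, T=2, margin=1.5)
narrow, _ = traceback(arcs, node_cost, arc_score, arc_intime, end=2, T=2, margin=0.5)
```

With a margin of 1.5 both parallel arcs from node 0 to node 1 survive, so the output structure encodes two alternative paths. With a margin of 0.5 only the best path remains. In both cases the node out score computed for the start node equals the best path score, which serves as a simple consistency check.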
Specification