System for using silence in speech recognition
Abstract
A system for recognizing speech based on an input data stream indicative of the speech provides possible words represented by the input data stream as a prefix tree including a plurality of phoneme branches connected at nodes. The plurality of phoneme branches is bracketed by at least one input silence branch corresponding to a silence phone on an input side of the prefix tree and at least one output silence branch corresponding to a silence phone on an output side of the prefix tree. The prefix tree is traversed to obtain a word that is likely represented by the input data stream. The silence phones provided in the prefix tree can vary based on context.
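The structure described in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the `Node` class, the `sil` label, the `build_tree` and `words` helpers, and the three-word lexicon are all assumptions made for illustration. Each word's phoneme sequence is bracketed by a silence branch on the input side and another on the output side, and words with a common prefix share branches.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

SIL = "sil"  # hypothetical label for the silence phone


@dataclass
class Node:
    # Outgoing branches, keyed by the phone each branch corresponds to.
    children: Dict[str, "Node"] = field(default_factory=dict)
    # Set only on the node reached through an output silence branch.
    word: Optional[str] = None


def build_tree(lexicon):
    """Build a prefix tree from a mapping of word -> list of phonemes."""
    root = Node()
    for word, phonemes in lexicon.items():
        node = root
        # Bracket the phoneme branches with input and output silence branches.
        for phone in [SIL, *phonemes, SIL]:
            node = node.children.setdefault(phone, Node())
        node.word = word
    return root


def words(node, path=()):
    """Yield (word, branch path) for every word stored in the tree."""
    if node.word is not None:
        yield node.word, path
    for phone, child in node.children.items():
        yield from words(child, path + (phone,))


# Toy lexicon (assumed for illustration only).
LEXICON = {"cat": ["k", "ae", "t"], "cap": ["k", "ae", "p"], "at": ["ae", "t"]}
tree = build_tree(LEXICON)
paths = dict(words(tree))
```

In this sketch "cat" and "cap" share the branches `sil`, `k` and `ae`, and every stored word ends on its own output silence branch.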
23 Claims
1. A method of recognizing speech based on an input data stream indicative of the speech, the method comprising:
providing possible words represented by the input data stream, and formed of phonemes, as a prefix tree including a plurality of phoneme branches connected at nodes, each phoneme branch corresponding to a phoneme, the plurality of phoneme branches being bracketed by at least one input silence branch corresponding to a silence phone on an input side of the prefix tree and at least one output silence branch corresponding to a silence phone on an output side of the prefix tree; and
traversing the prefix tree to obtain a word that is likely represented by the input data stream.
-
2. The method of claim 1 wherein traversing the prefix tree comprises:
traversing the prefix tree by assigning a score to a plurality of successive nodes from the input side of the prefix tree to the output side of the prefix tree, the score being indicative of a likelihood that the input data is representative of the phonemes corresponding to branches leading to the nodes to which the score is then being assigned; and
choosing N words corresponding to the silence nodes at the output side of the prefix tree, having scores assigned thereto which meet a threshold level, as likely words represented by the input data stream.
-
3. The method of claim 1 wherein providing possible words comprises:
providing the prefix tree with a plurality of silence branches on the input side of the prefix tree, each silence branch being connected at nodes to at least one phoneme branch.
-
4. The method of claim 3 wherein providing the prefix tree with a plurality of silence branches comprises:
providing the prefix tree with the plurality of silence branches wherein the silence phones represented by the plurality of silence branches vary based on context.
-
5. The method of claim 3 wherein providing possible words comprises:
providing the prefix tree with the plurality of silence branches on the input side of the prefix tree, a silence phone represented by each silence branch varying from phones represented by other silence branches based on the phonemes to which the silence branch is connected.
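One way to picture the context-dependent input silence branches of claims 3 to 5 is sketched below. The `sil+<phoneme>` naming scheme, the nested-dictionary tree, the `#word` key and the toy lexicon are assumptions for illustration; the claims do not specify how the silence phones are labeled or modeled. Only the input side varies here; the output silence is left as a single plain `sil` label.

```python
def bracket(phonemes):
    """Return the branch labels for one word.

    The input silence phone is named after the phoneme it connects to
    (hypothetical naming); the output silence phone is left context-free.
    """
    return [f"sil+{phonemes[0]}", *phonemes, "sil"]


def build_tree(lexicon):
    """Build a nested-dict prefix tree; '#word' marks a completed word."""
    root = {}
    for word, phonemes in lexicon.items():
        node = root
        for phone in bracket(phonemes):
            node = node.setdefault(phone, {})
        node["#word"] = word
    return root


# Toy lexicon (assumed for illustration only).
LEXICON = {"cat": ["k", "ae", "t"], "cap": ["k", "ae", "p"], "at": ["ae", "t"]}
tree = build_tree(LEXICON)
input_silences = sorted(tree)
```

With this lexicon the tree has two input silence branches, one leading into words that start with `k` and one leading into words that start with `ae`.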
-
6. The method of claim 3 wherein traversing the prefix tree comprises:
assigning a score to the nodes connected between the silence branches and the phoneme branches indicative of a likelihood that the input data is representative of the silence phone corresponding to the silence branch leading to the node to which the score is then being assigned.
-
7. The method of claim 6 wherein traversing the prefix tree comprises:
pruning branches from the prefix tree based on the scores assigned to the nodes connected between the silence branches and the phoneme branches.
-
8. The method of claim 7 wherein pruning comprises:
discontinuing further traversing of branches in the prefix tree leading out of nodes for which the scores assigned thereto meet a pruning threshold level.
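The scoring and pruning steps of claims 2 and 6 to 8 can be sketched as follows. This is a toy model under stated assumptions, not the claimed method itself: the input data stream is reduced to one observed phone label per branch, the `MATCH` and `MISMATCH` log-likelihoods and both thresholds are invented values, and `recognize` is a hypothetical helper. A real recognizer would typically score acoustic frames against phone models instead.

```python
import math

SIL = "sil"  # hypothetical label for the silence phone
# Toy per-branch log-likelihoods (assumed values).
MATCH, MISMATCH = math.log(0.9), math.log(0.1)


def build_tree(lexicon):
    """Prefix tree with each word bracketed by silence branches."""
    root = {"children": {}, "word": None}
    for word, phonemes in lexicon.items():
        node = root
        for phone in [SIL, *phonemes, SIL]:
            node = node["children"].setdefault(
                phone, {"children": {}, "word": None})
        node["word"] = word
    return root


def recognize(root, observed, n_best=2,
              prune_below=math.log(0.05), accept_above=math.log(0.5)):
    """Score nodes from input side to output side, pruning weak paths."""
    results, visited = [], 0
    stack = [(root, 0, 0.0)]  # (node, branches consumed, accumulated score)
    while stack:
        node, depth, score = stack.pop()
        visited += 1
        # Keep words at output silence nodes whose score meets the threshold.
        if (node["word"] is not None and depth == len(observed)
                and score >= accept_above):
            results.append((score, node["word"]))
        # Pruning: do not traverse branches leading out of a weak node.
        if score < prune_below or depth >= len(observed):
            continue
        for phone, child in node["children"].items():
            step = MATCH if phone == observed[depth] else MISMATCH
            stack.append((child, depth + 1, score + step))
    results.sort(reverse=True)
    return [word for _, word in results[:n_best]], visited


# Toy lexicon and input (assumed for illustration only).
LEXICON = {"cat": ["k", "ae", "t"], "cap": ["k", "ae", "p"], "at": ["ae", "t"]}
tree = build_tree(LEXICON)
best, visited = recognize(tree, ["sil", "k", "ae", "t", "sil"])
```

Here the path for "at" accumulates two mismatches, falls below the pruning threshold, and its output silence branch is never scored, so only 10 of the tree's 11 nodes are visited.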
-
9. A method of recognizing speech based on an input data stream indicative of the speech, the method comprising:
providing a lexicon including entries formed of possible words represented by the input data stream, the entries being bracketed by silence phones; and
searching the lexicon, based on the input data stream, to determine a word likely represented by the input data stream;
wherein providing a lexicon includes providing the lexicon as a prefix tree including a plurality of phoneme branches connected at nodes, each phoneme branch corresponding to a phoneme, the plurality of phoneme branches being bracketed by at least one input silence branch corresponding to a silence phone on an input side of the prefix tree and at least one output silence branch corresponding to a silence phone on an output side of the prefix tree.
-
10. The method of claim 9 wherein providing the lexicon comprises:
providing the prefix tree with a plurality of silence branches on the input side of the prefix tree, each silence branch being connected at nodes to at least one phoneme branch.
-
11. The method of claim 10 wherein providing the prefix tree with a plurality of silence branches comprises:
providing the prefix tree with the plurality of silence branches wherein the silence phones represented by the plurality of silence branches vary based on context.
-
12. The method of claim 10 wherein providing the lexicon comprises:
providing the prefix tree with the plurality of silence branches on the input side of the prefix tree, a silence phone represented by each silence branch varying from phones represented by other silence branches based on the phonemes to which the silence branch is connected.
-
13. The method of claim 10 wherein traversing the prefix tree comprises:
assigning a score to the nodes connected between the silence branches and the phoneme branches indicative of a likelihood that the input data is representative of the silence phone corresponding to the silence branch leading to the node to which the score is then being assigned.
-
14. A method of recognizing speech from input data indicative of the speech, the method comprising:
providing speech unit models representative of speech units;
providing silence models of context dependent silence phones; and
selecting speech units and context dependent silence phones, based on the input data and based on the speech unit models and the silence models, that are likely represented by the input data;
wherein providing the speech unit models and providing the silence models comprises providing the speech unit models and the silence models as a prefix tree including a plurality of phoneme branches connected at nodes, each phoneme branch corresponding to a phoneme, the plurality of phoneme branches being bracketed by at least one input silence branch corresponding to a silence phone on an input side of the prefix tree and at least one output silence branch corresponding to a silence phone on an output side of the prefix tree.
-
15. The method of claim 14 [...]:
traversing the prefix tree to obtain a word that is likely represented by the input data stream.
-
16. A computer readable medium having stored thereon components comprising:
a prefix tree including a plurality of phonemes corresponding to phoneme branches connected at nodes, the plurality of phoneme branches being bracketed by at least one input silence branch corresponding to a silence phone on an input side of the prefix tree and at least one output silence branch corresponding to a silence phone on an output side of the prefix tree.
-
17. The computer readable medium of claim 16 [...]:
a traversing component configured to traverse the prefix tree to obtain a word that is likely represented by an input data stream which is indicative of speech to be recognized.
-
18. The computer readable medium of claim 17 wherein the prefix tree further includes:
a plurality of silence branches on the input side of the prefix tree, each silence branch being connected at a node to at least one of the phoneme branches.
-
19. The computer readable medium of claim 18 wherein silence phones represented by the plurality of silence branches vary based on context.
-
20. The computer readable medium of claim 17 wherein the plurality of silence branches are provided on the input side of the prefix tree and wherein a silence phone represented by a silence branch varies from silence phones represented by other silence branches based on the phonemes to which the silence branch is connected.
-
21. A computer readable medium having stored thereon a data structure, comprising:
a first data portion containing data indicative of at least one input silence phone;
a second data portion containing data indicative of a plurality of phonemes;
a third data portion containing data indicative of at least one output silence phone; and
the first, second and third data portions being arranged to function, when traversed, as a prefix tree which yields a word likely representative of an input data stream.
Specification