System for using silence in speech recognition

US 6,374,219 B1
Filed: 02/20/1998
Issued: 04/16/2002
Est. Priority Date: 09/19/1997
Status: Expired due to Term

Abstract

A system for recognizing speech based on an input data stream indicative of the speech provides possible words represented by the input data stream as a prefix tree including a plurality of phoneme branches connected at nodes. The plurality of phoneme branches is bracketed by at least one input silence branch corresponding to a silence phone on an input side of the prefix tree and at least one output silence branch corresponding to a silence phone on an output side of the prefix tree. The prefix tree is traversed to obtain a word that is likely represented by the input data stream. The silence phones provided in the prefix tree can vary based on context.
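
The structure described in the abstract can be sketched as follows. This is a minimal illustration, not the patented implementation: the `Node` class, the `build_silence_bracketed_tree` and `words` helpers, the `"sil"` label, and the phoneme symbols are all hypothetical names chosen for the example. It uses a single context-independent silence phone on each side of the tree.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterator, List, Optional, Tuple

SIL = "sil"  # hypothetical label for the silence phone


@dataclass
class Node:
    # Outgoing branches, keyed by the phone each branch corresponds to.
    children: Dict[str, "Node"] = field(default_factory=dict)
    # Set only on nodes reached through an output silence branch.
    word: Optional[str] = None


def build_silence_bracketed_tree(lexicon: Dict[str, List[str]]) -> Node:
    """Build a prefix tree whose phoneme branches are bracketed by silence.

    `lexicon` maps each word to its phoneme sequence. Words with a common
    phoneme prefix share branches. Homophones would overwrite each other
    in this simplified sketch.
    """
    root = Node()
    # Input side: one silence branch leading into all phoneme branches.
    entry = root.children.setdefault(SIL, Node())
    for word, phonemes in lexicon.items():
        node = entry
        for phoneme in phonemes:
            node = node.children.setdefault(phoneme, Node())
        # Output side: a silence branch closing the word.
        leaf = node.children.setdefault(SIL, Node())
        leaf.word = word
    return root


def words(node: Node, path: Tuple[str, ...] = ()) -> Iterator[Tuple[str, List[str]]]:
    """Yield (word, phone path) for every word reachable from `node`."""
    if node.word is not None:
        yield node.word, list(path)
    for label, child in node.children.items():
        yield from words(child, path + (label,))


if __name__ == "__main__":
    tree = build_silence_bracketed_tree(
        {"or": ["ao", "r"], "orange": ["ao", "r", "ix", "n", "jh"]}
    )
    for word, path in words(tree):
        print(word, path)
```

In this sketch "or" and "orange" share the `ao` and `r` branches, and every phone path begins and ends with the silence label.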

23 Claims

1. A method of recognizing speech based on an input data stream indicative of the speech, the method comprising:
- providing possible words represented by the input data stream, and formed of phonemes, as a prefix tree including a plurality of phoneme branches connected at nodes, each phoneme branch corresponding to a phoneme, the plurality of phoneme branches being bracketed by at least one input silence branch corresponding to a silence phone on an input side of the prefix tree and at least one output silence branch corresponding to a silence phone on an output side of the prefix tree; and
  
  traversing the prefix tree to obtain a word that is likely represented by the input data stream.
2. The method of claim 1 wherein traversing the prefix tree comprises:
3. The method of claim 1 wherein providing possible words comprises:
- providing the prefix tree with a plurality of silence branches on the input side of the prefix tree each silence branch being connected at nodes to at least one phoneme branch.
4. The method of claim 3 wherein providing the prefix tree with a plurality of silence branches comprises:
- providing the prefix tree with the plurality of silence branches wherein the silence phones represented by the plurality of silence branches vary based on context.
5. The method of claim 3 wherein providing possible words comprises:
- providing the prefix tree with the plurality of silence branches on the input side of the prefix tree, a silence phone represented by each silence branch varying from phones represented by other silence branches based on the phonemes to which the silence branch is connected.
6. The method of claim 3 wherein traversing the prefix tree comprises:
- assigning a score to the nodes connected between the silence branches and the phoneme branches indicative of a likelihood that the input data is representative of the silence phone corresponding to the silence branch leading to the node to which the score is then being assigned.
7. The method of claim 6 wherein traversing the prefix tree comprises:
- pruning branches from the prefix tree based on the scores assigned to the nodes connected between the silence branches and the phoneme branches.
8. The method of claim 7 wherein pruning comprises:
- discontinuing further traversing of branches in the prefix tree leading out of nodes for which the scores assigned thereto meet a pruning threshold level.

9. A method of recognizing speech based on an input data stream indicative of the speech, the method comprising:
- providing a lexicon including entries formed of possible words represented by the input data stream, the entries being bracketed by silence phones; and
  
  searching the lexicon, based on the input data stream, to determine a word likely represented by the input data stream;
  
  wherein providing a lexicon includes providing the lexicon as a prefix tree including a plurality of phoneme branches connected at nodes, each phoneme branch corresponding to a phoneme, the plurality of phoneme branches being bracketed by at least one input silence branch corresponding to a silence phone on an input side of the prefix tree and at least one output silence branch corresponding to a silence phone on an output side of the prefix tree.
10. The method of claim 9 wherein providing the lexicon comprises:
11. The method of claim 10 wherein providing the prefix tree with a plurality of silence branches comprises:
- providing the prefix tree with the plurality of silence branches wherein the silence phones represented by the plurality of silence branches vary based on context.
12. The method of claim 10 wherein providing the lexicon comprises:
- providing the prefix tree with the plurality of silence branches on the input side of the prefix tree, a silence phone represented by each silence branch varying from phones represented by other silence branches based on the phonemes to which the silence branch is connected.
13. The method of claim 10 wherein traversing the prefix tree comprises:
- assigning a score to the nodes connected between the silence branches and the phoneme branches indicative of a likelihood that the input data is representative of the silence phone corresponding to the silence branch leading to the node to which the score is then being assigned.

14. A method of recognizing speech from input data indicative of the speech, the method comprising:
- providing speech unit models representative of speech units;
  
  providing silence models of context dependent silence phones; and
  
  selecting speech units and context dependent silence phones, based on the input data and based on the speech unit models and the silence models, that are likely represented by the input data;
  
  wherein providing the speech unit models and providing the silence models comprises providing the speech unit models and the silence models as a prefix tree including a plurality of phoneme branches connected at nodes, each phoneme branch corresponding to a phoneme, the plurality of phoneme branches being bracketed by at least one input silence branch corresponding to a silence phone on an input side of the prefix tree and at least one output silence branch corresponding to a silence phone on an output side of the prefix tree.
15. The method of claim 14 wherein selecting speech units and context dependent silence phones comprises:

16. A computer readable medium having stored thereon components comprising:
- a prefix tree including a plurality of phonemes corresponding to phoneme branches connected at nodes, the plurality of phoneme branches being bracketed by at least one input silence branch corresponding to a silence phone on an input side of the prefix tree and at least one output silence branch corresponding to a silence phone on an output side of the prefix tree.
17. The computer readable medium of claim 16 wherein the components further comprise:
18. The computer readable medium of claim 17 wherein the prefix tree further includes:
- a plurality of silence branches on the input side of the prefix tree, each silence branch being connected at a node to at least one of the phoneme branches.
19. The computer readable medium of claim 18 wherein silence phones represented by the plurality of silence branches vary based on context.
20. The computer readable medium of claim 17 wherein the plurality of silence branches are provided on the input side of the prefix tree and wherein a silence phone represented by a silence branch varies from silence phones represented by other silence branches based on the phonemes to which the silence branch is connected.

21. A computer readable medium having stored thereon a data structure, comprising:
- a first data portion containing data indicative of at least one input silence phone;
  
  a second data portion containing data indicative of a plurality of phonemes;
  
  a third data portion containing data indicative of at least one output silence phone; and
  
  the first, second and third data portions being arranged to function, when traversed, as a prefix tree which yields a word likely representative of an input data stream.
22. The computer readable medium of claim 21 wherein the first and third data portions each include a plurality of silence phones such that the prefix tree includes different input silence phones and output silence phones connected to each of the plurality of phonemes.
23. The computer readable medium of claim 21 wherein the data in the first and third data portions is indicative of context dependent silence phones, the context dependent silence phones varying based on the phonemes to which they are connected in the prefix tree.
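
The context dependent silence phones of claims 22 and 23 can be illustrated with a small sketch. The function name and the `sil(-,x)` / `sil(x,-)` labelling scheme are hypothetical, chosen only to show one distinct silence phone per adjoining phoneme; the patent text quoted here does not specify a naming convention.

```python
from typing import Dict, List, Tuple


def context_dependent_silences(
    lexicon: Dict[str, List[str]],
) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Derive one silence phone per word-initial and word-final phoneme.

    Returns (input_silences, output_silences). Each maps a phoneme to the
    label of the silence phone that would be connected to it in the tree.
    """
    input_silences: Dict[str, str] = {}
    output_silences: Dict[str, str] = {}
    for phonemes in lexicon.values():
        first, last = phonemes[0], phonemes[-1]
        # Input silence depends on the phoneme that follows it.
        input_silences.setdefault(first, f"sil(-,{first})")
        # Output silence depends on the phoneme that precedes it.
        output_silences.setdefault(last, f"sil({last},-)")
    return input_silences, output_silences


if __name__ == "__main__":
    lexicon = {
        "or": ["ao", "r"],
        "orange": ["ao", "r", "ix", "n", "jh"],
        "cat": ["k", "ae", "t"],
    }
    input_sil, output_sil = context_dependent_silences(lexicon)
    print(input_sil)
    print(output_sil)
```

With this lexicon, "or" and "orange" share one input silence phone because both begin with the same phoneme, while "cat" gets its own.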

Current Assignee: Microsoft Technology Licensing LLC (Microsoft Corporation)
Original Assignee: Microsoft Corporation
Inventors: Jiang, Li
Primary Examiner(s): Korzuch, William
Assistant Examiner(s): Azad, Abul K.
Application Number: US09/026,841
Time in Patent Office: 1,516 Days
Field of Search: 704/251, 704/252, 704/253, 704/255, 704/256, 704/257, 704/242
US Class Current: 704/255
CPC Class Codes:
G10L 15/05   Word boundary detection
G10L 15/08   Speech classification or se...
G10L 15/187   Phonemic context, e.g. pron...
G10L 2015/085   Methods for reducing search...
