Word boundary acoustic units

US 6,606,594 B1
Filed: 09/29/1999
Issued: 08/12/2003
Est. Priority Date: 09/29/1998
Status: Expired due to Term

First Claim

1. A speech recognition system for recognizing an input utterance of spoken words, the system comprising:

a set of word models for modeling vocabulary to be recognized, each word model being associated with a word in the vocabulary, each word in the vocabulary considered as a sequence of phones including a first phone and a last phone, wherein each word model begins in the middle of the first phone of its associated word and ends in the middle of the last phone of its associated word;

a set of word connecting models for modeling acoustic transitions between the middle of a word's last phone and the middle of an immediately succeeding word's first phone; and

a recognition engine for processing the input utterance in relation to the set of word models and the set of word connecting models to cause recognition of the input utterance.

7 Assignments

0 Petitions

Abstract

A speech recognition system recognizes an input utterance of spoken words. The system includes a set of word models for modeling vocabulary to be recognized, each word model being associated with a word in the vocabulary, each word in the vocabulary considered as a sequence of phones including a first phone and a last phone, wherein each word model begins in the middle of the first phone of its associated word and ends in the middle of the last phone of its associated word; a set of word connecting models for modeling acoustic transitions between the middle of a word's last phone and the middle of an immediately succeeding word's first phone; and a recognition engine for processing the input utterance in relation to the set of word models and the set of word connecting models to cause recognition of the input utterance.
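
The boundary placement described in the abstract can be illustrated with a minimal sketch. It assumes each phone can be split into a left half and a right half, and that words have at least two phones; the function names, the half-phone notation, and the example pronunciations are all hypothetical and are not taken from the patent's specification.

```python
from typing import List, Optional, Tuple

# Hypothetical unit notation: (label, "L") is the left half of a phone,
# (label, "R") is the right half, and (label, "-") is an inter-word transition.
Unit = Tuple[str, str]

def word_model_units(phones: List[str]) -> List[Unit]:
    """Units covered by a word model: from the middle of the first phone
    to the middle of the last phone."""
    if len(phones) < 2:
        raise ValueError("this sketch assumes words of at least two phones")
    units: List[Unit] = [(phones[0], "R")]       # start mid-way through the first phone
    for p in phones[1:-1]:
        units += [(p, "L"), (p, "R")]            # interior phones are covered whole
    units.append((phones[-1], "L"))              # stop mid-way through the last phone
    return units

def connecting_units(prev_phones: List[str],
                     next_phones: List[str],
                     transition: Optional[str] = None) -> List[Unit]:
    """Units covered by a word connecting model: from the middle of the previous
    word's last phone to the middle of the next word's first phone."""
    units: List[Unit] = [(prev_phones[-1], "R")]
    if transition is not None:                   # e.g. "pause", "silence", "noise"
        units.append((transition, "-"))
    units.append((next_phones[0], "L"))
    return units

# Assumed pronunciations, for illustration only.
cat = ["k", "ae", "t"]
sat = ["s", "ae", "t"]
sequence = word_model_units(cat) + connecting_units(cat, sat, "pause") + word_model_units(sat)
print(sequence)
```

In this sketch the word boundary falls inside the connecting model rather than at the edge of a word model, which is the arrangement the abstract describes.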

Citations

36 Claims

1. A speech recognition system for recognizing an input utterance of spoken words, the system comprising:
- a set of word models for modeling vocabulary to be recognized, each word model being associated with a word in the vocabulary, each word in the vocabulary considered as a sequence of phones including a first phone and a last phone, wherein each word model begins in the middle of the first phone of its associated word and ends in the middle of the last phone of its associated word;
  
  a set of word connecting models for modeling acoustic transitions between the middle of a word's last phone and the middle of an immediately succeeding word's first phone; and
  
  a recognition engine for processing the input utterance in relation to the set of word models and the set of word connecting models to cause recognition of the input utterance.
- Dependent claims (2, 3, 4, 5, 6, 7, 8, 9)
  - 2. A system as in claim 1, wherein each word model uses context-dependent phone models to represent the sequence of phones.
  - 3. A system as in claim 2, wherein the context-dependent phone models are triphones.
  - 4. A system as in claim 1, wherein the acoustic transitions include a pause.
  - 5. A system as in claim 1, wherein the acoustic transitions include a period of silence.
  - 6. A system as in claim 1, wherein the acoustic transitions include a period of noise.
  - 7. A system as in claim 1, wherein each word connecting model further includes a previous word identification field which represents the word associated with the word model immediately preceding the word connecting model.
  - 8. A system as in claim 1, wherein each word connecting model further includes an ending score field which represents a best score from the beginning of the input utterance to reach the word connecting model.
  - 9. A system as in claim 1, wherein each word connecting model further includes a type field which represents specific details of the word connecting model.
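
The fields recited in claims 7 to 9 can be pictured as a small record. The following is a sketch only: the class and method names are hypothetical, scores are assumed to be log-probabilities where higher is better, and using the type field to distinguish the pause, silence, and noise transitions of claims 4 to 6 is an assumption, since the claims say only that the type field "represents specific details of the word connecting model".

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional

class TransitionType(Enum):
    # Assumed values; the claims do not enumerate the contents of the type field.
    DIRECT = "direct"
    PAUSE = "pause"
    SILENCE = "silence"
    NOISE = "noise"

@dataclass
class WordConnectingModel:
    last_phone: str                               # last phone of the preceding word
    next_first_phone: str                         # first phone of the succeeding word
    type: TransitionType = TransitionType.DIRECT  # claim 9: type field
    previous_word_id: Optional[int] = None        # claim 7: previous word identification
    ending_score: float = float("-inf")           # claim 8: best score to reach this model

    def offer(self, word_id: int, score: float) -> bool:
        """Record word_id as the preceding word if its score beats the current best."""
        if score > self.ending_score:
            self.ending_score = score
            self.previous_word_id = word_id
            return True
        return False

# Two hypothetical word hypotheses ending at the same connecting model.
model = WordConnectingModel("t", "s", TransitionType.PAUSE)
model.offer(word_id=7, score=-12.5)
model.offer(word_id=3, score=-20.0)   # worse score, so the record is unchanged
print(model.previous_word_id, model.ending_score)
```

Keeping only the best-scoring predecessor per connecting model is one common way to support backtracking through the recognized word sequence; whether the patented system does this is not stated in the claims.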

10. A method of a speech recognition system for recognizing an input utterance of spoken words, the method comprising:
- modeling vocabulary to be recognized with a set of word models, each word model being associated with a word in the vocabulary, each word in the vocabulary being considered as a sequence of phones including a first phone and a last phone, wherein each word model begins in the middle of the first phone of its associated word and ends in the middle of the last phone of its associated word;
  
  modeling acoustic transitions between the middle of a word's last phone and the middle of an immediately succeeding word's first phone with a set of word connecting models; and
  
  processing with a recognition engine the input utterance in relation to the set of word models and the set of word connecting models to cause recognition of the input utterance.
- Dependent claims (11, 12, 13, 14, 15, 16, 17, 18)
  - 11. A method as in claim 10, wherein each word model uses context-dependent phone models to represent the sequence of phones.
  - 12. A method as in claim 11, wherein the context-dependent phone models are triphones.
  - 13. A method as in claim 10, wherein the acoustic transitions include a pause.
  - 14. A method as in claim 10, wherein the acoustic transitions include a period of silence.
  - 15. A method as in claim 10, wherein the acoustic transitions include a period of noise.
  - 16. A method as in claim 10, wherein each word connecting model further includes a previous word identification field which represents the word associated with the word model immediately preceding the word connecting model.
  - 17. A method as in claim 10, wherein each word connecting model further includes an ending score field which represents a best score from the beginning of the input utterance to reach the word connecting model.
  - 18. A method as in claim 10, wherein each word connecting model further includes a type field which represents specific details of the word connecting model.

19. An improved speech recognition system of the type employing word models, wherein the improvement comprises:
- a set of word models for modeling vocabulary to be recognized, each word model being associated with a word in the vocabulary, each word in the vocabulary considered as a sequence of phones including a first phone and a last phone, wherein each word model begins in the middle of the first phone of its associated word and ends in the middle of the last phone of its associated word; and
  
  a set of word connecting models for modeling acoustic transitions between the middle of a word's last phone and the middle of an immediately succeeding word's first phone.
- Dependent claims (20, 21, 22, 23, 24, 25, 26, 27)
  - 20. A system as in claim 19, wherein each word model uses context-dependent phone models to represent the sequence of phones.
  - 21. A system as in claim 20, wherein the context-dependent phone models are triphones.
  - 22. A system as in claim 19, wherein the acoustic transitions include a pause.
  - 23. A system as in claim 19, wherein the acoustic transitions include a period of silence.
  - 24. A system as in claim 19, wherein the acoustic transitions include a period of noise.
  - 25. A system as in claim 19, wherein each word connecting model further includes a previous word identification field which represents the word associated with the word model immediately preceding the word connecting model.
  - 26. A system as in claim 19, wherein each word connecting model further includes an ending score field which represents a best score from the beginning of the input utterance to reach the word connecting model.
  - 27. A system as in claim 19, wherein each word connecting model further includes a type field which represents specific details of the word connecting model.

28. An improved method of a speech recognition system for recognizing an input utterance of spoken words, the improvement comprising:
- modeling vocabulary to be recognized with a set of word models, each word model being associated with a word in the vocabulary, each word in the vocabulary being considered as a sequence of phones including a first phone and a last phone, wherein each word model begins in the middle of the first phone of its associated word and ends in the middle of the last phone of its associated word; and
  
  modeling acoustic transitions between the middle of a word's last phone and the middle of an immediately succeeding word's first phone with a set of word connecting models.
- Dependent claims (29, 30, 31, 32, 33, 34, 35, 36)
  - 29. A method as in claim 28, wherein each word model uses context-dependent phone models to represent the sequence of phones.
  - 30. A method as in claim 29, wherein the context-dependent phone models are triphones.
  - 31. A method as in claim 28, wherein the acoustic transitions include a pause.
  - 32. A method as in claim 28, wherein the acoustic transitions include a period of silence.
  - 33. A method as in claim 28, wherein the acoustic transitions include a period of noise.
  - 34. A method as in claim 28, wherein each word connecting model further includes a previous word identification field which represents the word associated with the word model immediately preceding the word connecting model.
  - 35. A method as in claim 28, wherein each word connecting model further includes an ending score field which represents a best score from the beginning of the input utterance to reach the word connecting model.
  - 36. A method as in claim 28, wherein each word connecting model further includes a type field which represents specific details of the word connecting model.

Current Assignee
Nuance Communications, Inc. (Microsoft Corporation)
Original Assignee
ScanSoft, Inc. n/k/a Nuance Communications, Inc. (Microsoft Corporation)
Inventors
Sarukkai, Ramesh; Lynch, Tom; Sejnoha, Vladimir
Primary Examiner(s)
ABEBE, DANIEL DEMELASH

Application Number

US09/408,388
Time in Patent Office

1,413 Days
Field of Search

704/231, 704/232, 704/242, 704/243, 704/245, 704/256, 704/257
US Class Current

704/250
CPC Class Codes

G10L 15/05   Word boundary detection

G10L 15/187   Phonemic context, e.g. pron...

G10L 2015/022   Demisyllables, biphones or ...
