SYSTEM AND METHOD OF WORD LATTICE AUGMENTATION USING A PRE/POST VOCALIC CONSONANT DISTINCTION

US 20090112591A1
Filed: 10/31/2007
Published: 04/30/2009
Est. Priority Date: 10/31/2007
Status: Active Grant

Abstract

Disclosed are systems and methods for recognizing speech in a spoken dialogue system. The method includes (1) receiving an input speech having at least one pre-vocalic consonant or at least one post-vocalic consonant, (2) generating at least one output lattice that calculates a first score by comparing the input speech to a training model to provide a result, (3) distinguishing between the at least one pre-vocalic consonant and the at least one post-vocalic consonant in the input speech, (4) calculating a second score by measuring a similarity between the at least one pre-vocalic consonant or the at least one post-vocalic consonant in the input speech and the first score, (5) determining at least one category for at least one pre-vocalic match or mismatch or at least one post-vocalic match or mismatch by using the second score, and (6) refining the results of an automated speech recognition (ASR) system by using the at least one category for at least one pre-vocalic match or mismatch or at least one post-vocalic match or mismatch.
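
The distinction in step (3) can be illustrated with a minimal sketch. It assumes the input has already been split into syllables of phone symbols, and it treats a consonant as pre-vocalic when it precedes the syllable's first vowel and as post-vocalic otherwise. The vowel inventory and the function name `label_consonants` are hypothetical and are not taken from the patent, which leaves the distinguishing mechanism to a trained model.

```python
# Illustrative vowel inventory (ARPAbet-style symbols). This set is an
# assumption made for the example; it is not specified by the patent.
VOWELS = {
    "AA", "AE", "AH", "AO", "AW", "AY", "EH", "ER",
    "EY", "IH", "IY", "OW", "OY", "UH", "UW",
}


def label_consonants(syllable):
    """Tag each phone of one syllable as 'pre', 'vowel', or 'post'.

    `syllable` is a list of phone symbols. Consonants before the first
    vowel are tagged 'pre' (pre-vocalic); all other consonants are
    tagged 'post' (post-vocalic).
    """
    vowel_positions = [i for i, phone in enumerate(syllable) if phone in VOWELS]
    if not vowel_positions:
        raise ValueError("syllable has no vowel")
    first_vowel = vowel_positions[0]

    labels = []
    for i, phone in enumerate(syllable):
        if phone in VOWELS:
            labels.append((phone, "vowel"))
        elif i < first_vowel:
            labels.append((phone, "pre"))
        else:
            labels.append((phone, "post"))
    return labels


# "stop" as S T AA P: S and T are pre-vocalic, P is post-vocalic.
stop_labels = label_consonants(["S", "T", "AA", "P"])
```

A rule of this kind only shows what the two labels mean; an actual recognizer would have to infer them from the acoustics.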

21 Claims

1. The method for recognizing speech, the method comprising:
- receiving an input speech having at least one pre-vocalic consonant or at least one post-vocalic consonant;
  
  generating at least one output lattice that calculates a first score by comparing the input speech to a training model to provide a result;
  
  distinguishing between the at least one pre-vocalic consonant and the at least one post-vocalic consonant in the input speech;
  
  calculating a second score by measuring a similarity between the at least one pre-vocalic consonant or the at least one post-vocalic consonant in the input speech and the first score;
  
  determining at least one category for at least one pre-vocalic match or mismatch or at least one post-vocalic match or mismatch by using the second score; and
  
  refining the results of an automated speech recognition (ASR) system by using the at least one category for at least one pre-vocalic match or mismatch or at least one post-vocalic match or mismatch.
- Dependent claims:
  - 2. The method of claim 1, wherein if there is a match between the second output lattice and the second score, a word probability is increased.
  - 3. The method of claim 1, wherein at least one output lattice comprises syllabified words.
  - 4. The method of claim 1, wherein if there is a mismatch between the second output lattice and the second score, a word probability is decreased.
  - 5. The method of claim 1, wherein a goodness score is determined by calculating the similarity between the input speech and the training model.
  - 6. The method of claim 1, wherein the ASR system is trained to distinguish each of the pre-vocalic consonants and the post-vocalic consonants.
  - 7. The method of claim 1, wherein the training model distinguishes between the at least one pre-vocalic consonant and the at least one post-vocalic consonant can be accomplished by using hidden Markov models (HMMs), support vector machines (SVMs) and neural networks (NNs).
  - 9. The system of claim 1, wherein if there is a match between the second output lattice and the second score, a word probability is increased.
  - 10. The system of claim 1, wherein at least one output lattice comprises syllabified words.
  - 11. The system of claim 1, wherein if there is a mismatch between the second output lattice and the second score, a word probability is decreased.
  - 12. The system of claim 1, wherein a goodness score is determined by calculating the similarity between the input speech and the training model.
  - 13. The system of claim 1, wherein the ASR system is trained to distinguish each of the pre-vocalic consonants and the post-vocalic consonants.
  - 14. The system of claim 1, wherein the training model distinguishes between the at least one pre-vocalic consonant and the at least one post-vocalic consonant can be accomplished by using hidden Markov models (HMMs), support vector machines (SVMs) and neural networks (NNs).
  - 16. The computer-readable medium of claim 1, wherein if there is a match between the second output lattice and the second score, a word probability is increased.
  - 17. The computer-readable medium of claim 1, wherein at least one output lattice comprises syllabified words.
  - 18. The computer-readable medium of claim 1, wherein if there is a mismatch between the second output lattice and the second score, a word probability is decreased.
  - 19. The computer-readable medium of claim 1, wherein a goodness score is determined by calculating the similarity between the input speech and the training model.
  - 20. The computer-readable medium of claim 1, wherein the ASR system is trained to distinguish each of the pre-vocalic consonants and the post-vocalic consonants.
  - 21. The computer-readable medium of claim 1, wherein the training model distinguishes between the at least one pre-vocalic consonant and the at least one post-vocalic consonant can be accomplished by using hidden Markov models (HMMs), support vector machines (SVMs) and neural networks (NNs).
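
The adjustment described in claims 2 and 4 (a word probability is increased on a match and decreased on a mismatch) can be sketched as follows. The claims do not say by how much the probability changes or whether the result is renormalized, so the multiplicative factors `boost` and `penalty`, the renormalization step, and the function name `rescore` are all assumptions made for illustration.

```python
def rescore(word_probs, matches, boost=1.5, penalty=0.5):
    """Raise or lower word probabilities, then renormalize.

    word_probs: dict mapping each candidate word to its probability.
    matches:    dict mapping each candidate word to True (match) or
                False (mismatch).
    boost, penalty: hypothetical multiplicative factors; the claims
                only state that the probability goes up or down.
    """
    adjusted = {}
    for word, prob in word_probs.items():
        factor = boost if matches[word] else penalty
        adjusted[word] = prob * factor

    # Renormalizing so the values sum to 1 is an assumption of this sketch.
    total = sum(adjusted.values())
    return {word: prob / total for word, prob in adjusted.items()}


# Two hypotheses that differ in where /t/ sits: pre-vocalic in "top",
# post-vocalic in "pot". Suppose the input supports the pre-vocalic reading.
rescored = rescore({"top": 0.5, "pot": 0.5}, {"top": True, "pot": False})
```

With equal starting probabilities, the matching hypothesis ends up ranked above the mismatching one.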

8. A system for recognizing speech, the system comprising:
- a module configured to receive an input speech having at least one pre-vocalic consonant or at least one post-vocalic consonant;
  
  a module configured to generate at least one output lattice that calculates a first score by comparing the input speech to a training model to provide a result;
  
  a module configured to distinguish between the at least one pre-vocalic consonant and the at least one post-vocalic consonant in the input speech;
  
  a module configured to calculate a second score by measuring a similarity between the at least one pre-vocalic consonant or the at least one post-vocalic consonant in the input speech and the first score;
  
  a module configured to determine at least one category for at least one pre-vocalic match or mismatch or at least one post-vocalic match or mismatch by using the second score; and
  
  a module configured to refine the results of an automated speech recognition (ASR) system by using the at least one category for at least one pre-vocalic match or mismatch or at least one post-vocalic match or mismatch.

15. A computer-readable medium storing instructions for controlling a computing device to process speech, the instructions comprising:
- receiving an input speech having at least one pre-vocalic consonant or at least one post-vocalic consonant;
  
  generating at least one output lattice that calculates a first score by comparing the input speech to a training model to provide a result;
  
  distinguishing between the at least one pre-vocalic consonant and the at least one post-vocalic consonant in the input speech;
  
  calculating a second score by measuring a similarity between the at least one pre-vocalic consonant or the at least one post-vocalic consonant in the input speech and the first score;
  
  determining at least one category for at least one pre-vocalic match or mismatch or at least one post-vocalic match or mismatch by using the second score; and
  
  refining the results of an automated speech recognition (ASR) system by using the at least one category for at least one pre-vocalic match or mismatch or at least one post-vocalic match or mismatch.
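
The category-determination step shared by the three independent claims can be sketched as a mapping from a consonant position and a second score to one of four categories. The claims do not state how the second score is turned into a match or mismatch decision; the fixed `threshold`, the assumption that a higher score means greater similarity, and the function name `categorize` are hypothetical.

```python
POSITIONS = ("pre-vocalic", "post-vocalic")


def categorize(position, second_score, threshold=0.5):
    """Return one of four categories, e.g. 'pre-vocalic match'.

    position:     'pre-vocalic' or 'post-vocalic'.
    second_score: similarity score; higher is assumed to mean more similar.
    threshold:    hypothetical cut-off between match and mismatch.
    """
    if position not in POSITIONS:
        raise ValueError(f"unknown position: {position!r}")
    outcome = "match" if second_score >= threshold else "mismatch"
    return f"{position} {outcome}"


high = categorize("pre-vocalic", 0.8)
low = categorize("post-vocalic", 0.2)
```

The resulting category is what the refining step would consume when adjusting the recognizer's output.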

Current Assignee
Nuance Communications, Inc. (Microsoft Corporation)
Original Assignee
AT&T Labs Incorporated (AT&T, Inc.)
Inventors
Kim, Yeon-Jun; Conkie, Alistair; Syrdal, Ann K.; Ljolje, Andrej

Granted Patent

US 8,024,191 B2
Field of Search
US Class Current

704/246
CPC Class Codes

G10L 15/02 Feature extraction for spee...

G10L 25/78 Detection of presence or ab...
