CONTEXT SENSITIVE MULTI-STAGE SPEECH RECOGNITION

US 20090182559A1
Filed: 10/07/2008
Published: 07/16/2009
Est. Priority Date: 10/08/2007
Status: Abandoned Application

First Claim

10. A method of enrolling a voice segment comprising:

detecting a speech signal representing a verbal utterance;

digitizing the detected speech signal;

generating a phonetic representation of the speech signal that is designated a first recognition result;

generating variants of the phonetic representation based on a plurality of context information provided for the phonetic representation;

selecting one or more variants of the phonetic representation that are designated a second recognition result;

matching the second recognition result with stored phonetic representations of entries of one or more stored lexical lists; and

adding the second recognition result to the stored phonetic representations when none of the phonetic representations of the entries of the one or more stored lexical lists match the second recognition result better than a predetermined matching threshold.

1 Assignment

0 Petitions

Abstract

A system enables devices to recognize and process speech. The system includes a database that retains one or more lexical lists. A speech input detects a verbal utterance and generates a speech signal corresponding to the detected verbal utterance. A processor generates a phonetic representation of the speech signal that is designated a first recognition result. The processor generates variants of the phonetic representation based on context information provided by the phonetic representation. One or more of the variants of the phonetic representation selected by the processor are designated as a second recognition result. The processor matches the second recognition result with stored phonetic representations of one or more of the stored lexical lists.
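The abstract leaves open how the variants are produced. Dependent claims 2 and 3 base them on a predetermined probability of mistaking one phoneme for another and rank them by score. A minimal Python sketch under those assumptions follows. The confusion table `CONFUSION`, its probabilities, the function `generate_variants` and the `n_best` cut-off are hypothetical illustrations, not taken from the application. The sketch treats the first recognition result as a list of phoneme symbols and scores each variant by the sum of the log substitution probabilities.

```python
import heapq
import itertools
import math

# Hypothetical confusion table: for each recognized phoneme, the probability
# that the speaker actually said each listed phoneme. Values are illustrative.
CONFUSION = {
    "b": {"b": 0.80, "p": 0.15, "d": 0.05},
    "a": {"a": 0.90, "e": 0.10},
    "n": {"n": 0.85, "m": 0.15},
}


def generate_variants(first_result, confusion, n_best=3):
    """Expand a phoneme sequence into scored variants and keep the n_best highest."""
    # Phonemes missing from the table are kept unchanged with probability 1.
    options = [confusion.get(p, {p: 1.0}).items() for p in first_result]
    scored = []
    for combo in itertools.product(*options):
        phonemes = tuple(p for p, _ in combo)
        # Summing log-probabilities avoids underflow on long utterances.
        log_score = sum(math.log(prob) for _, prob in combo)
        scored.append((log_score, phonemes))
    # Highest log-probability first; this list plays the role of the second result.
    return heapq.nlargest(n_best, scored)


second_result = generate_variants(["b", "a", "n"], CONFUSION)
print(["".join(phonemes) for _, phonemes in second_result])  # ['ban', 'pan', 'bam']
```

Enumerating every combination grows exponentially with utterance length, so a practical system would more likely prune during expansion (for example with a beam search); the exhaustive product is used here only to keep the sketch short.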

32 Citations

21 Claims

10. A method of enrolling a voice segment comprising:
- detecting a speech signal representing a verbal utterance;
  
  digitizing the detected speech signal;
  
  generating a phonetic representation of the speech signal that is designated a first recognition result;
  
  generating variants of the phonetic representation based on a plurality of context information provided for the phonetic representation;
  
  selecting one or more variants of the phonetic representation that are designated a second recognition result;
  
  matching the second recognition result with stored phonetic representations of entries of one or more stored lexical lists; and
  
  adding the second recognition result to the stored phonetic representations when none of the phonetic representations of the entries of the one or more stored lexical lists match the second recognition result better than a predetermined matching threshold.
- Dependent Claims (11)
- - 11. The method of claim 10, further comprising generating an entry in the one or more lexical lists that corresponds to the second recognition result.
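Claim 10 enrolls the second recognition result only when no stored entry matches it better than a predetermined threshold. The application does not say here how match quality is measured (claim 6 refers to acoustic features). The sketch below instead substitutes a plain symbol-level similarity, one minus the Levenshtein distance normalized by the longer sequence, and assumes that only the top-ranked variant is stored on enrollment. The names `edit_distance`, `similarity`, `enroll` and the default threshold of 0.8 are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance between two phoneme sequences."""
    prev = list(range(len(b) + 1))
    for i, pa in enumerate(a, start=1):
        curr = [i]
        for j, pb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete pa
                curr[j - 1] + 1,           # insert pb
                prev[j - 1] + (pa != pb),  # substitute (free when equal)
            ))
        prev = curr
    return prev[-1]


def similarity(a, b):
    """Map edit distance to a score in [0, 1]; 1.0 means identical sequences."""
    longest = max(len(a), len(b)) or 1
    return 1.0 - edit_distance(a, b) / longest


def enroll(second_result, lexicon, threshold=0.8):
    """Match ranked variants against the lexicon; enroll the top variant if nothing matches.

    Returns (entry, added): the best existing entry and False, or the new entry and True.
    """
    best_entry, best_score = None, -1.0
    for variant in second_result:
        for entry in lexicon:
            score = similarity(variant, entry)
            if score > best_score:
                best_entry, best_score = entry, score
    # An entry counts as a match only if it scores better than the threshold.
    if best_score > threshold:
        return best_entry, False
    new_entry = tuple(second_result[0])  # assumption: store only the top-ranked variant
    lexicon.append(new_entry)
    return new_entry, True


lexicon = [("b", "a", "n"), ("k", "a", "t")]
print(enroll([("p", "a", "n"), ("b", "a", "n")], lexicon))  # (('b', 'a', 'n'), False)
print(enroll([("d", "o", "g")], lexicon))                   # (('d', 'o', 'g'), True)
```

Claim 11's extra step, generating a lexical-list entry for the enrolled result, corresponds here to the `lexicon.append` call.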

12. A computer-readable storage medium that stores instructions that, when executed by a processor, cause the processor to recognize speech by executing software that causes the following acts:
- digitizing a speech signal representing a verbal utterance;
  
  generating a phonetic representation of the speech signal that is designated as a first recognition result;
  
  generating variants of the phonetic representation based on a plurality of context information provided for the phonetic representation;
  
  selecting one or more variants of the phonetic representation that are designated as a second recognition result;
  
  matching the second recognition result with stored phonetic representations of entries of one or more stored lexical lists; and
  
  adding the second recognition result to the stored phonetic representations when the phonetic representations of the entries of the one or more stored lexical lists do not match the second recognition result better than a predetermined matching threshold.
- Dependent Claims (13)
- - 13. The computer-readable storage medium of claim 12, further comprising generating an entry in the one or more lexical lists that corresponds to the second recognition result.

14. A speech recognition device, comprising: a database comprising one or more lexical lists;
- a speech input interface configured to detect a verbal utterance and configured to generate a speech signal corresponding to the detected verbal utterance;
  
  a processor programmed to:
  
  generate a phonetic representation of the speech signal as a first recognition result;
  
  generate variants of the phonetic representation based on context information provided for the phonetic representation;
  
  select one or more of the variants of the phonetic representation as a second recognition result; and
  
  match the second recognition result with stored phonetic representations of entries of the one or more stored lexical lists.
- Dependent Claims (1, 2, 3, 4, 5, 6, 7, 8, 9, 15, 16, 17, 18)
- - 1. A method that recognizes speech comprising:
    - detecting a speech signal representing a voiced or unvoiced segment;
      
      converting the speech signal into a discrete output;
      
      generating a representation of the speech signal as a set of distinct characters or symbols;
      
      designating the representation as a first recognition result;
      
      generating variants of the first recognition result; and
      
      selecting one or more variants of the first recognition result as a second recognition result.
  - 2. The method of claim 1 where the phonetic representation comprises phonemes and the variants are based on a predetermined probability of mistaking one phoneme for another phoneme.
  - 3. The method of claim 2 further comprising scoring the variants of the phonetic representation and generating the second recognition result based on the scores of the variants of the phonetic representation.
  - 4. The method of claim 3 where the variants of the phonetic representation are based on context information comprising a polyphone model.
  - 5. The method of claim 3 where the variants of the phonetic representation are based on context information comprising a triphone model.
  - 6. The method of claim 1 further comprising accessing a database comprising phonetic representations of entries of one or more lexical lists scored against variants of the second recognition result using the same acoustic features of the second recognition result.
  - 7. The method of claim 1 further comprising dividing the speech signal into speech segment intervals.
  - 8. The method of claim 7 where the division of the speech signal into speech segment intervals is based on a plurality of prosodic features of a verbal utterance.
  - 9. The method of claim 8 where the division of the speech signal into speech segment intervals is based on a plurality of speech pauses included in the verbal utterance.
  - 15. The speech recognition system of claim 14 where the processor is further configured to add the second recognition result to the stored phonetic representations, if none of the phonetic representations of the entries of the one or more stored lexical lists match the second recognition result better than a predetermined matching threshold retained in the database.
  - 16. The speech recognition system of claim 15 where the processor is configured to generate the variants of the phonetic representation based on a predetermined probability of mistaking one phoneme for another phoneme.
  - 17. The speech recognition system of claim 14 where the processor is configured to generate the variants of the phonetic representation based on a predetermined probability of mistaking one phoneme for another phoneme.
  - 18. The speech recognition system of claim 14 where the processor is configured to score the variants of the one phonetic representation and generate the second recognition result based on the scores of the variants of the one phonetic representation.
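Claims 4 and 5 draw the context information from a polyphone or triphone model without describing that model. One hypothetical way to use triphone context is to look up substitution probabilities by the phoneme together with its left and right neighbours, falling back to a context-independent table when no triphone entry exists. The tables, the `#` boundary symbol and the functions below are illustrative assumptions, not the application's method; for brevity the sketch only considers variants that change a single phoneme.

```python
import math

# Hypothetical triphone-keyed table: (left, centre, right) -> {alternative: probability}.
# "#" marks an utterance boundary. Every entry includes the unchanged phoneme.
TRIPHONE_CONFUSION = {
    ("#", "b", "a"): {"b": 0.7, "p": 0.3},
    ("a", "n", "#"): {"n": 0.6, "m": 0.4},
}
# Context-independent fallback used when no triphone entry exists.
MONOPHONE_CONFUSION = {
    "b": {"b": 0.9, "p": 0.1},
    "n": {"n": 0.9, "m": 0.1},
}


def alternatives(phonemes, i):
    """Substitution probabilities for phonemes[i], preferring its triphone context."""
    centre = phonemes[i]
    left = phonemes[i - 1] if i > 0 else "#"
    right = phonemes[i + 1] if i + 1 < len(phonemes) else "#"
    triphone = (left, centre, right)
    if triphone in TRIPHONE_CONFUSION:
        return TRIPHONE_CONFUSION[triphone]
    return MONOPHONE_CONFUSION.get(centre, {centre: 1.0})


def single_substitution_variants(phonemes):
    """Rank the input and every variant that changes exactly one phoneme, by log-probability."""
    tables = [alternatives(phonemes, i) for i in range(len(phonemes))]
    # Log-probability of keeping each phoneme as recognized.
    keep = [math.log(table.get(p, 1.0)) for p, table in zip(phonemes, tables)]
    total_keep = sum(keep)
    variants = {tuple(phonemes): total_keep}
    for i, table in enumerate(tables):
        for alt, prob in table.items():
            if alt == phonemes[i]:
                continue
            variant = list(phonemes)
            variant[i] = alt
            # Swap the "keep" term at position i for the substitution term.
            variants[tuple(variant)] = total_keep - keep[i] + math.log(prob)
    return sorted(variants.items(), key=lambda item: item[1], reverse=True)


ranked = single_substitution_variants(["b", "a", "n"])
print(["".join(v) for v, _ in ranked])  # ['ban', 'bam', 'pan']
```

With these illustrative numbers the context-independent table alone would rate `bam` and `pan` equally (0.09 each), whereas the triphone entries rank `bam` (0.28) above `pan` (0.18). The point is only that neighbouring phonemes can reorder the candidates.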

19. The speech recognition system of claim 14, where the processor is further configured to divide the verbal utterance into intervals.
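Claims 7 to 9, and the claim just above, divide the utterance into intervals, with claims 8 and 9 basing the split on prosodic features and speech pauses. Neither says how pauses are detected. A common, simple stand-in is short-time energy: frames whose RMS energy stays below a threshold for long enough count as a pause. The Python sketch below follows that approach. `split_on_pauses`, `frame_energies` and the defaults (160-sample frames, i.e. 10 ms at an assumed 16 kHz sampling rate, an energy threshold of 0.01 for samples scaled to [-1, 1], and a minimum pause of three frames) are hypothetical choices, not values from the application.

```python
import math


def frame_energies(samples, frame_len):
    """RMS energy of consecutive, non-overlapping frames (a trailing partial frame is dropped)."""
    energies = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        energies.append(math.sqrt(sum(x * x for x in frame) / frame_len))
    return energies


def split_on_pauses(samples, frame_len=160, energy_threshold=0.01, min_pause_frames=3):
    """Return (start, end) sample indices of speech intervals separated by pauses.

    A pause is at least min_pause_frames consecutive frames below energy_threshold;
    shorter dips are treated as part of the surrounding speech.
    """
    energies = frame_energies(samples, frame_len)
    intervals = []
    start = None     # first frame of the current speech interval
    silent_run = 0   # consecutive low-energy frames since the last voiced frame
    for i, energy in enumerate(energies):
        if energy >= energy_threshold:
            if start is None:
                start = i
            silent_run = 0
        elif start is not None:
            silent_run += 1
            if silent_run >= min_pause_frames:
                end = i - silent_run + 1  # one past the last voiced frame
                intervals.append((start * frame_len, end * frame_len))
                start, silent_run = None, 0
    if start is not None:
        # Close an interval that is still open when the signal ends.
        end = len(energies) - silent_run
        intervals.append((start * frame_len, end * frame_len))
    return intervals


signal = [0.5] * 12 + [0.0] * 12 + [0.5] * 8
print(split_on_pauses(signal, frame_len=4, energy_threshold=0.1))  # [(0, 12), (24, 32)]
```

Energy alone ignores other prosodic cues such as pitch or duration; it is used here only because it is the simplest pause detector to show.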

20. A speech recognition device, comprising: a database comprising one or more lexical lists;
- a speech input interface configured to detect a verbal utterance and configured to generate a speech signal corresponding to the detected verbal utterance;
  
  a signal processor in communication with the speech input to beamform the verbal utterance received through a sensor array; and
  
  a processor programmed to:
  
  generate a phonetic representation of the speech signal as a first recognition result;
  
  generate variants of the phonetic representation based on context information provided for the phonetic representation;
  
  select one or more of the variants of the phonetic representation as a second recognition result; and
  
  match the second recognition result with stored phonetic representations of entries of at least one stored lexical list.
- Dependent Claims (21)
- - 21. The speech recognition device of claim 20 where the processor comprises the signal processor.
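Claim 20 adds a signal processor that beamforms the utterance received through a sensor array but does not name a beamforming method. As one hedged illustration, the classic delay-and-sum beamformer for a uniform linear microphone array is sketched below in Python. It assumes far-field sources, a known steering angle and delays rounded to whole samples; `steering_delays`, `delay_and_sum` and their parameters are hypothetical names. A real front end might instead use fractional delays or adaptive weights.

```python
import math


def steering_delays(num_mics, spacing_m, angle_deg, sample_rate, speed_of_sound=343.0):
    """Integer per-microphone delays (in samples) for a uniform linear array.

    angle_deg is measured from broadside; positive angles mean the plane wave
    reaches microphone 0 first. Far-field (plane-wave) arrival is assumed.
    """
    raw = [
        m * spacing_m * math.sin(math.radians(angle_deg)) / speed_of_sound * sample_rate
        for m in range(num_mics)
    ]
    offset = min(raw)  # shift so every delay is non-negative
    return [round(r - offset) for r in raw]


def delay_and_sum(channels, delays):
    """Advance each channel by its delay so the wavefronts line up, then average.

    channels: equal-length sample lists, one per microphone.
    delays: non-negative integer lag of each channel relative to the earliest one.
    """
    if len(channels) != len(delays):
        raise ValueError("need exactly one delay per channel")
    out_len = len(channels[0]) - max(delays)
    return [
        sum(ch[t + d] for ch, d in zip(channels, delays)) / len(channels)
        for t in range(out_len)
    ]


source = [0, 1, 0, 0, 2, 0, 0, 0]
mic0 = source
mic1 = [0, 0] + source[:-2]  # same wavefront arriving two samples later
print(delay_and_sum([mic0, mic1], [0, 2]))  # [0.0, 1.0, 0.0, 0.0, 2.0, 0.0]
```

Averaging the aligned channels reinforces sound from the steered direction while sound from other directions adds incoherently, which is why the aligned toy signal comes back unchanged.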

Current Assignee: Nuance Communications, Inc. (Microsoft Corporation)
Original Assignee: Nuance Communications, Inc. (Microsoft Corporation)
Inventors: Romer, Roland; Schatz, Ulrich; Gerl, Franz; Hillebrecht, Christian
Application Number: US12/247,201
Publication Number: US 20090182559A1
US Class Current: 704/235
CPC Class Codes:
G10L 15/08 Speech classification or se...
G10L 2015/025 Phonemes, fenemes or fenone...