Semi-discrete utterance recognizer for carefully articulated speech

US 20040210437A1
Filed: 04/15/2003
Published: 10/21/2004
Est. Priority Date: 04/15/2003
Status: Abandoned Application

Abstract

A method for performing speech recognition of a user's speech includes performing a first speech recognition process on each utterance of the user's speech, using acoustic models that are based on training data of non-discrete utterances. The method also includes performing a second speech recognition process on each utterance of the user's speech, using acoustic models that are based on training data of discrete utterances. The method further includes obtaining a first match score for each utterance of the user's speech from the first speech recognition process and obtaining a second match score for each utterance of the user's speech from the second speech recognition process. The method also includes determining a highest match score from the first and second match scores. The method further includes providing a speech recognition output for the user's speech, based on highest match scores of each utterance as obtained from the first and second speech recognition processes.
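The per-utterance selection described in the abstract can be sketched as follows. This is only an illustrative sketch, not the applicant's implementation: the names `Recognizer`, `Hypothesis`, and `select_best` are hypothetical, each recognizer is assumed to be a plain function returning a text and a match score, the two scores are assumed to be directly comparable, and ties are assumed to go to the non-discrete (continuous) recognizer.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

# Hypothetical interface: a recognizer maps one utterance (a sequence of
# acoustic samples or features) to a (text, match score) pair.
Recognizer = Callable[[Sequence[float]], Tuple[str, float]]


@dataclass
class Hypothesis:
    text: str
    score: float
    source: str  # which recognizer produced the winning result


def select_best(
    utterances: Sequence[Sequence[float]],
    continuous: Recognizer,
    discrete: Recognizer,
) -> List[Hypothesis]:
    """Run both recognizers on every utterance and keep the higher score."""
    results = []
    for utterance in utterances:
        c_text, c_score = continuous(utterance)  # non-discrete training data
        d_text, d_score = discrete(utterance)    # discrete training data
        # Assumption: scores are on the same scale; ties favor "continuous".
        if d_score > c_score:
            results.append(Hypothesis(d_text, d_score, "discrete"))
        else:
            results.append(Hypothesis(c_text, c_score, "continuous"))
    return results
```

In practice the two scores may need normalization before they can be compared, a detail this sketch leaves out.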

17 Claims

1. A method for performing speech recognition of a user's speech, comprising:

  performing a first speech recognition process on each utterance of the user's speech, using acoustic models that are based on training data of non-discrete utterances;

  performing a second speech recognition process on each utterance of the user's speech, using acoustic models that are based on training data of discrete utterances;

  obtaining a first match score for each utterance of the user's speech from the first speech recognition process and obtaining a second match score for each utterance of the user's speech from the second speech recognition process, determining a highest match score from the first and second match scores; and

  providing a speech recognition output for the user's speech, based on highest match scores of each utterance as obtained from the first and second speech recognition processes.

  - 2. The method according to claim 1, wherein each utterance of the user's speech corresponds to portions of the user's speech that exist between pauses of at least a predetermined duration in the user's speech.
  - 3. The method according to claim 1, wherein the user's speech is divided into frames, and wherein each utterance of the user's speech is disposed within a particular group of adjacent frames.
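Claims 2 and 3 describe utterances as groups of adjacent frames separated by pauses of at least a predetermined duration. A minimal sketch of that segmentation is given below, under assumptions the claims do not state: each frame is reduced to a single energy value, a frame counts as silent when its energy is below `silence_threshold`, and the pause duration is expressed as a frame count `min_pause_frames`. The function name and all parameters are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple


def split_on_pauses(
    energies: Sequence[float],
    silence_threshold: float,
    min_pause_frames: int,
) -> List[Tuple[int, int]]:
    """Return utterances as (start, end) frame indices, end exclusive.

    A pause is a run of at least `min_pause_frames` silent frames.
    Shorter silent runs stay inside the surrounding utterance.
    """
    utterances: List[Tuple[int, int]] = []
    start: Optional[int] = None  # first frame of the utterance in progress
    silent_run = 0               # consecutive silent frames seen so far

    for i, energy in enumerate(energies):
        if energy >= silence_threshold:
            if start is None:
                start = i
            silent_run = 0
        else:
            silent_run += 1
            if start is not None and silent_run >= min_pause_frames:
                # Close the utterance at the first frame of the pause.
                utterances.append((start, i - silent_run + 1))
                start = None

    if start is not None:
        # Trim any trailing silence that was too short to count as a pause.
        utterances.append((start, len(energies) - silent_run))
    return utterances
```

For example, with a threshold of 1.0 and a two-frame minimum pause, the single silent frame at index 3 in `[0, 5, 5, 0, 5, 0, 0, 0, 5, 5, 0]` does not split the first utterance, while the three-frame silence at indices 5 to 7 does.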

4. A method for performing speech recognition of a user's speech, comprising:

  performing a first speech recognition process on the user's speech in a first mode of operation, using acoustic models that are based on training data of non-discrete utterances;

  performing a second speech recognition process on the user's speech in a second mode of operation, using acoustic models that are based on training data of discrete utterances, and providing a speech recognition output for the user's speech, based on respective outputs from the first and second speech recognition processes, wherein only one of the first and second speech recognition processes is capable of being operative at any particular moment in time.

  - 5. The method according to claim 4, wherein the first mode of operation corresponds to a normal dictation mode of a speech recognizer, and the second mode of operation corresponds to an error correction mode of the speech recognizer.
  - 6. The method according to claim 4, wherein the first mode of operation corresponds to a normal dictation mode of a speech recognizer, and the second mode of operation corresponds to a command and control mode.
  - 8. The system according to claim 6, further comprising:
    - a display unit configured to display a textual output corresponding to speech recognition output of the first speech recognition unit, wherein a user reviews the textual output to make a determination as to whether or not to initiate the error correction mode.
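The mode-based variant in claims 4 to 6 differs from claim 1 in that only one recognizer is operative at a time. One way to picture this is sketched below. It is a hypothetical illustration only: the `Mode` enum, the `ModalRecognizer` class, and the assumption that each recognizer is a function from an utterance to text are not taken from the application.

```python
from enum import Enum
from typing import Callable, Dict, Sequence

# Hypothetical interface: a recognizer maps one utterance to recognized text.
Recognizer = Callable[[Sequence[float]], str]


class Mode(Enum):
    DICTATION = "dictation"                # first mode: non-discrete models
    ERROR_CORRECTION = "error_correction"  # second mode: discrete models


class ModalRecognizer:
    """Dispatch each utterance to exactly one recognizer, chosen by mode."""

    def __init__(self, continuous: Recognizer, discrete: Recognizer) -> None:
        self._recognizers: Dict[Mode, Recognizer] = {
            Mode.DICTATION: continuous,
            Mode.ERROR_CORRECTION: discrete,
        }
        self.mode = Mode.DICTATION  # assumption: start in normal dictation

    def set_mode(self, mode: Mode) -> None:
        self.mode = mode

    def recognize(self, utterance: Sequence[float]) -> str:
        # Only the recognizer for the current mode is invoked, so the two
        # processes are never operative at the same moment.
        return self._recognizers[self.mode](utterance)
```

Because the inactive recognizer is never called, this arrangement does roughly half the work of the score-comparison approach, at the cost of needing some external signal to switch modes.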

7. A system for performing speech recognition of a user's speech, comprising:

  a control unit for receiving the user's speech and for determining whether or not an error correction mode is to be initiated based on utterances made in the user's speech, and to output a control signal indicative of whether or not the error correction mode is in operation;

  a first speech recognition unit configured to receive the user's speech and to perform a first speech recognition processing on the user's speech when the control signal provided by the control unit indicates that the error correction mode is not in operation; and

  a second speech recognition unit configured to receive the user's speech and to perform a second speech recognition processing on the user's speech when the control signal provided by the control unit indicates that the error correction mode is in operation;

  wherein the second speech recognition unit utilizes training data of speech that is spoken in a slower word rate than training data of speech used by the first speech recognition unit.

9. A system for performing speech recognition of a user's speech, comprising:

  a first speech recognition unit configured to receive the user's speech and to perform a first speech recognition processing on the user's speech based in part on training data of speech spoken at a first speech rate or higher, the first speech recognition unit outputting a first match score for each utterance of the user's speech;

  a second speech recognition unit configured to receive the user's speech and to perform a first speech recognition processing on the user's speech based in part on training data of speech spoken at a speech rate lower than the first speech rate, the second speech recognition unit outputting a second match score for each utterance of the user's speech; and

  a comparison unit configured to receive the first and second match scores and to determine, for each utterance of the user's speech, which of the first and second match scores is highest, wherein a speech recognition output corresponds to a highest match score for each utterance of the user's speech, as output from the comparison unit.

  - 10. The system according to claim 9, wherein the second speech recognition unit utilizes training data of speech that is spoken in a slower word rate than training data of speech used by the first speech recognition unit.

11. A program product having machine readable code for performing speech recognition of a user's speech, the program code, when executed, causing a machine to perform the following steps:

  performing a first speech recognition process on each utterance of the user's speech, using acoustic models that are based on training data of non-discrete utterances;

  performing a second speech recognition process on each utterance of the user's speech, using acoustic models that are based on training data of discrete utterances;

  obtaining a first match score for each utterance of the user's speech from the first speech recognition process and obtaining a second match score for each utterance of the user's speech from the second speech recognition process, determining a highest match score from the first and second match scores; and

  providing a speech recognition output for the user's speech, based on highest match scores of each utterance as obtained from the first and second speech recognition processes.

  - 12. The program product according to claim 11, wherein each utterance of the user's speech corresponds to portions of the user's speech that exist between pauses of at least a predetermined duration in the user's speech.
  - 13. The program product according to claim 11, wherein the user's speech is divided into frames, and wherein each utterance of the user's speech is disposed within a particular group of adjacent frames.

14. A program product for performing speech recognition of a user's speech, comprising:

  performing a first speech recognition process on the user's speech in a first mode of operation, using acoustic models that are based on training data of non-discrete utterances;

  performing a second speech recognition process on the user's speech in a second mode of operation, using acoustic models that are based on training data of discrete utterances, and providing a speech recognition output for the user's speech, based on respective outputs from the first and second speech recognition processes, wherein only one of the first and second speech recognition processes is capable of being operative at any particular moment in time.

  - 15. The program product according to claim 14, wherein each utterance of the user's speech corresponds to portions of the user's speech that exist between pauses of at least a predetermined duration in the user's speech.
  - 16. The program product according to claim 14, wherein the first mode of operation corresponds to a normal dictation mode of a speech recognizer, and the second mode of operation corresponds to an error correction mode of the speech recognizer.
  - 17. The program product according to claim 14, wherein the user's speech is divided into frames, and wherein each utterance of the user's speech is disposed within a particular group of adjacent frames.

Current Assignee: Aurilab LLC
Original Assignee: Aurilab LLC
Inventors: Baker, James K.
Application Number: US10/413,375
Publication Number: US 20040210437A1
US Class Current: 704/251
CPC Class Codes:
G10L 15/08 Speech classification or se...
G10L 15/32 Multiple recognisers used i...