SYSTEM AND METHOD OF PERFORMING USER-SPECIFIC AUTOMATIC SPEECH RECOGNITION

US 20120185237A1
Filed: 03/26/2012
Published: 07/19/2012
Est. Priority Date: 03/20/2001
Status: Active Grant

4 Assignments

0 Petitions

Abstract

Speech recognition models are dynamically re-configurable based on user information, application information, background information such as background noise, and transducer information such as transducer response characteristics, to provide users with alternate input modes to keyboard text entry. Word recognition lattices are generated for each data field of an application and dynamically concatenated into a single word recognition lattice. A language model is applied to the concatenated word recognition lattice to determine the relationships between the word recognition lattices, and the process is repeated until the generated word recognition lattices are acceptable or differ from a predetermined value only by a threshold amount. These techniques of dynamic re-configurable speech recognition provide for deployment of speech recognition on small devices such as mobile phones and personal digital assistants, as well as in environments such as an office, home, or vehicle, while maintaining the accuracy of the speech recognition.
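
The lattice handling described in the abstract can be illustrated with a toy sketch. Everything here is an assumption made for illustration and is not taken from the specification: a lattice is reduced to a list of word slots with acoustic log-scores, the language model is a small bigram table, the names `concatenate` and `best_path` are hypothetical, and the repeat-until-acceptable loop is omitted. The point is only that once per-field lattices are joined, a single language model can score word pairs that span a field boundary.

```python
from typing import Dict, List, Tuple

# Toy lattice (assumption): one dict per word slot, candidate word -> acoustic log-score.
Lattice = List[Dict[str, float]]

def concatenate(field_lattices: List[Lattice]) -> Lattice:
    """Join the per-field lattices, in field order, into a single lattice."""
    combined: Lattice = []
    for lattice in field_lattices:
        combined.extend(lattice)
    return combined

def best_path(lattice: Lattice, bigram: Dict[Tuple[str, str], float],
              unseen: float = -5.0) -> List[str]:
    """Viterbi search maximizing acoustic score plus bigram language-model score."""
    if not lattice:
        return []
    # best[word] = (total score, word sequence ending in that word)
    best = {word: (score, [word]) for word, score in lattice[0].items()}
    for slot in lattice[1:]:
        nxt = {}
        for word, acoustic in slot.items():
            nxt[word] = max(
                (score + bigram.get((prev, word), unseen) + acoustic, path + [word])
                for prev, (score, path) in best.items()
            )
        best = nxt
    return max(best.values())[1]

# Hypothetical data: a "city" field lattice and a "state" field lattice.
city = [{"new": -0.2, "knew": -0.1}, {"york": -0.1}]
state = [{"new": -0.3}, {"jersey": -0.2, "journey": -0.1}]
bigram = {("new", "york"): -0.1, ("new", "jersey"): -0.1}

result = best_path(concatenate([city, state]), bigram)
print(result)
```

In this example the acoustically preferred candidates "knew" and "journey" lose to "new" and "jersey" once the bigram scores are added, which is the kind of correction a language model over the concatenated lattice is meant to supply.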

308 Citations

20 Claims

1. A method comprising:
- receiving a voice request from a speaker;
  
  identifying the speaker based on samples of the voice request, the samples obtained at periodic intervals;
  
  receiving a data field selection by the speaker; and
  
  applying, via a processor, one of a plurality of language models to the voice request for speech recognition based on the data field, wherein the speech recognition uses a different language model for each data field selected by the speaker.
- Dependent claims (2, 3, 4, 5, 7):
  - 2. The method of claim 1, further comprising:
    - generating a response to the voice request.
  - 3. The method of claim 2, wherein the response is one of a voice response, a text response, a tactile response, and a Braille response.
  - 4. The method of claim 1, wherein the plurality of language models comprises at least a background model and a transducer model.
  - 5. The method of claim 1, further comprising:
    - determining a confidence score for the voice request from the application of one of the plurality of language models to the voice request.
  - 7. The method of claim 1, further comprising:
    - adjusting the periodic intervals based at least in part on the samples.

6. The method of claim 5, further comprising:
- restarting speech recognition when the confidence score is below a threshold.
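
As a rough illustration of how the steps in claims 1 and 5 to 7 might fit together, the sketch below selects a language model by data field, restarts recognition while the confidence score is below a threshold, and adjusts a sampling interval. It is a sketch under stated assumptions, not the claimed implementation: the `Recognizer` callable, the model names, the threshold values, and the halving and doubling rule in `adjust_interval` are all hypothetical, and the speaker-identification step is left out.

```python
from typing import Callable, Dict, Tuple

# Hypothetical recognizer: (audio, language_model_name) -> (text, confidence in [0, 1]).
Recognizer = Callable[[bytes, str], Tuple[str, float]]

def recognize_field(audio: bytes, data_field: str, models: Dict[str, str],
                    recognizer: Recognizer, threshold: float = 0.6,
                    max_attempts: int = 3) -> Tuple[str, float, int]:
    """Apply the language model for the selected data field, restarting
    recognition while the confidence score stays below the threshold."""
    model = models[data_field]  # a different language model per data field
    text, confidence, attempt = "", 0.0, 0
    for attempt in range(1, max_attempts + 1):
        text, confidence = recognizer(audio, model)
        if confidence >= threshold:
            break
    return text, confidence, attempt

def adjust_interval(interval_ms: float, sample_change: float,
                    low: float = 0.1, high: float = 0.5) -> float:
    """Assumed rule: sample more often when samples change a lot, less often when stable."""
    if sample_change > high:
        return max(interval_ms / 2, 50.0)
    if sample_change < low:
        return min(interval_ms * 2, 5000.0)
    return interval_ms

# Usage with a stand-in recognizer whose first attempt scores below the threshold.
confidences = iter([0.3, 0.9])

def fake_recognizer(audio: bytes, model: str) -> Tuple[str, float]:
    return f"{model}:result", next(confidences)

models = {"city": "city_lm", "date": "date_lm"}
outcome = recognize_field(b"", "city", models, fake_recognizer)
print(outcome)
```

A real system would presumably re-prompt the speaker or re-capture audio before each restart; reusing the same audio here only keeps the example short.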

8. A system comprising:
- a processor; and
  
  a non-transitory computer-readable storage medium storing instructions which, when executed on the processor, perform a method comprising:
  
  receiving a voice request from a speaker;
  
  identifying the speaker based on samples of the voice request, the samples obtained at periodic intervals;
  
  receiving a data field selection by the speaker; and
  
  applying one of a plurality of language models to the voice request for speech recognition based on the data field, wherein the speech recognition uses a different language model for each data field selected by the speaker.
- Dependent claims (9, 10, 11, 12, 13, 14):
  - 9. The system of claim 8, the instructions causing the method to further comprise:
    - generating a response to the voice request.
  - 10. The system of claim 9, wherein the response is one of a voice response, a text response, a tactile response, and a Braille response.
  - 11. The system of claim 8, wherein the plurality of language models comprises at least a background model and a transducer model.
  - 12. The system of claim 8, the instructions causing the method to further comprise:
    - determining a confidence score for the voice request from the application of one of the plurality of language models to the voice request.
  - 13. The system of claim 12, the instructions causing the method to further comprise:
    - restarting speech recognition when the confidence score is below a threshold.
  - 14. The system of claim 8, the instructions causing the method to further comprise:
    - adjusting the periodic intervals based at least in part on the samples.

15. A non-transitory computer-readable storage medium storing instructions which, when executed by a computing device, cause the computing device to perform steps comprising:
- receiving a voice request from a speaker;
  
  identifying the speaker based on samples of the voice request, the samples obtained at periodic intervals;
  
  receiving a data field selection by the speaker; and
  
  applying one of a plurality of language models to the voice request for speech recognition based on the data field, wherein the speech recognition uses a different language model for each data field selected by the speaker.
- Dependent claims (16, 17, 18, 19, 20):
  - 16. The non-transitory computer-readable storage medium of claim 15, the steps further comprising:
    - generating a response to the voice request.
  - 17. The non-transitory computer-readable storage medium of claim 16, wherein the response is one of a voice response, a text response, a tactile response, and a Braille response.
  - 18. The non-transitory computer-readable storage medium of claim 15, wherein the plurality of language models comprises at least a background model and a transducer model.
  - 19. The non-transitory computer-readable storage medium of claim 15, the steps further comprising:
    - determining a confidence score for the voice request from the application of one of the plurality of language models to the voice request.
  - 20. The non-transitory computer-readable storage medium of claim 15, the steps further comprising:
    - restarting speech recognition when the confidence score is below a threshold.

Current Assignee
Nuance Communications, Inc. (Microsoft Corporation)
Original Assignee
AT&T Intellectual Property II LP (AT&T, Inc.)
Inventors
Narayanan, Shrikanth Sambasivan; Parthasarathy, Sarangarajan; Rose, Richard Cameron; Rosenberg, Aaron Edward; Gajic, Bojana

Granted Patent

US 9,058,810 B2
US Class Current

704/8
CPC Class Codes

G10L 15/07 to the speaker

G10L 15/20 Speech recognition techniqu...
