System and method for personalization of acoustic models for automatic speech recognition

US 9,837,072 B2
Filed: 05/15/2017
Issued: 12/05/2017
Est. Priority Date: 09/16/2009
Status: Active Grant

Abstract

Disclosed herein are methods, systems, and computer-readable storage media for automatic speech recognition. The method includes selecting a speaker independent model, and selecting a quantity of speaker dependent models, the quantity of speaker dependent models being based on available computing resources, the selected models including the speaker independent model and the quantity of speaker dependent models. The method also includes recognizing an utterance using each of the selected models in parallel, and selecting a dominant speech model from the selected models based on recognition accuracy using the group of selected models. The system includes a processor and modules configured to control the processor to perform the method. The computer-readable storage medium includes instructions for causing a computing device to perform the steps of the method.
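The selection flow described in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the `Model` type, the `make_stub` recognizers, the `available_workers` budget, and the use of a confidence score as a stand-in for recognition accuracy are all assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

# Hypothetical result shape: (recognized text, confidence in [0, 1]).
Result = Tuple[str, float]


@dataclass(frozen=True)
class Model:
    name: str
    speaker_dependent: bool
    recognize: Callable[[str], Result]


def make_stub(name: str, confidence: float) -> Callable[[str], Result]:
    """Stand-in recognizer that returns a fixed confidence (illustration only)."""
    return lambda utterance: (f"{name}:{utterance}", confidence)


def select_models(si: Model, sd_candidates: Sequence[Model],
                  available_workers: int) -> List[Model]:
    """Always keep the speaker independent model; add as many speaker
    dependent models as the assumed resource budget allows."""
    quantity = max(0, available_workers - 1)  # one worker is reserved for SI
    return [si] + list(sd_candidates[:quantity])


def pick_dominant(models: Sequence[Model],
                  utterance: str) -> Tuple[Model, List[Result]]:
    """Run every selected model on the utterance in parallel and return the
    model with the highest confidence (assumed proxy for accuracy)."""
    with ThreadPoolExecutor(max_workers=len(models)) as pool:
        results = list(pool.map(lambda m: m.recognize(utterance), models))
    best = max(range(len(models)), key=lambda i: results[i][1])
    return models[best], results
```

With a budget of three workers, `select_models` would return the speaker independent model plus the first two speaker dependent candidates, and `pick_dominant` would then choose among those three.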

20 Claims

1. A method comprising:
- starting a current automatic speech recognition session for recognizing speech received from a user via a device;
  
  identifying, via a processor, a group of speech recognition models comprising a speaker independent model and a speaker dependent model;
  
  recognizing the speech via each model in the group of speech recognition models, to yield recognition results;
  
  selecting, based on the recognition results, a dominant speech model from the group of speech recognition models to yield a remainder set of dropped speech recognition models; and
  
  continuously using only the dominant speech model, without applying the remainder set of dropped speech recognition models, to recognize additional speech received from the user.
- Dependent claims (2, 3, 4, 5, 6, 7, 8, 9, 10):
  - 2. The method of claim 1, wherein recognizing the speech via each model in the group of speech recognition models is performed in parallel.
  - 3. The method of claim 1, wherein selecting the dominant speech model from the group of speech recognition models is performed using a heuristic search algorithm.
  - 4. The method of claim 1, wherein the additional speech received from the user is received during a remainder of the automatic speech recognition session.
  - 5. The method of claim 1, further comprising dropping a speech model from the group of speech recognition models when recognition accuracy is below a threshold.
  - 6. The method of claim 1, further comprising selecting the dominant speech model based on the device.
  - 7. The method of claim 6, further comprising selecting the dominant speech model based on a plurality of users associated with the device.
  - 8. The method of claim 1, further comprising receiving additional utterances from the device and clustering the additional utterances to generate a new speaker dependent model.
  - 9. The method of claim 1, further comprising iteratively generating a group of selected models, recognizing the speech, and selecting the dominant speech model, each time a new automatic speech recognition session is initiated.
  - 10. The method of claim 1, wherein the dominant speech model is associated with a current location of the device for use in future speech dialogs.
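The session behavior recited in claims 1 and 4 can be sketched as follows. This is a hedged illustration only: the `Session` class, the recognizer signature, and the confidence-based choice of the dominant model are assumptions, not details taken from the specification.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical recognizer signature: utterance -> (text, confidence).
Recognizer = Callable[[str], Tuple[str, float]]


class Session:
    """One automatic speech recognition session over a group of models."""

    def __init__(self, models: Dict[str, Recognizer]) -> None:
        self.active: Dict[str, Recognizer] = dict(models)
        self.dominant: Optional[str] = None
        self.dropped: List[str] = []

    def recognize(self, utterance: str) -> str:
        # Once a dominant model exists, only that model is applied.
        if self.dominant is not None:
            text, _ = self.active[self.dominant](utterance)
            return text

        # First utterance: recognize with every model in the group.
        results = {name: rec(utterance) for name, rec in self.active.items()}

        # Assumed selection rule: highest confidence wins.
        best = max(results, key=lambda name: results[name][1])
        self.dominant = best

        # The remainder set is dropped and never applied again in this session.
        self.dropped = [name for name in self.active if name != best]
        self.active = {best: self.active[best]}
        return results[best][0]
```

Creating a fresh `Session` for each new dialog would repeat the comparison, which corresponds loosely to the per-session iteration in claim 9.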

11. A system comprising:
- a processor; and
  
  a computer-readable storage medium having instructions stored which, when executed by the processor, cause the processor to perform operations comprising:
  
  starting a current automatic speech recognition session for recognizing speech received from a user via a device;
  
  identifying a group of speech recognition models comprising a speaker independent model and a speaker dependent model;
  
  recognizing the speech via each model in the group of speech recognition models, to yield recognition results;
  
  selecting, based on the recognition results, a dominant speech model from the group of speech recognition models to yield a remainder set of dropped speech recognition models; and
  
  continuously using only the dominant speech model, without applying the remainder set of dropped speech recognition models, to recognize additional speech received from the user.
- Dependent claims (12, 13, 14, 15, 16, 17, 18, 19):
  - 12. The system of claim 11, wherein the computer-readable storage medium stores additional instructions which, when executed by the processor, cause the processor to perform further operations comprising:
    - recognizing the speech via each model in the group of speech recognition models in parallel.
  - 13. The system of claim 11, wherein selecting the dominant speech model from the group of speech recognition models is performed using a heuristic search algorithm.
  - 14. The system of claim 11, wherein the additional speech received from the user is received during a remainder of the automatic speech recognition session.
  - 15. The system of claim 11, wherein the computer-readable storage medium stores additional instructions which, when executed by the processor, cause the processor to perform further operations comprising:
    - dropping a speech model from the group of speech recognition models when recognition accuracy is below a threshold.
  - 16. The system of claim 11, wherein the computer-readable storage medium stores additional instructions which, when executed by the processor, cause the processor to perform further operations comprising:
    - selecting the dominant speech model based on the device.
  - 17. The system of claim 16, wherein the computer-readable storage medium stores additional instructions which, when executed by the processor, cause the processor to perform further operations comprising:
    - selecting the dominant speech model based on a plurality of users associated with the device.
  - 18. The system of claim 11, wherein the computer-readable storage medium stores additional instructions which, when executed by the processor, cause the processor to perform further operations comprising:
    - receiving additional utterances from the device and clustering the additional utterances to generate a new speaker dependent model.
  - 19. The system of claim 11, wherein the computer-readable storage medium stores additional instructions which, when executed by the processor, cause the processor to perform further operations comprising:
    - iteratively generating a group of selected models, recognizing the speech, and selecting the dominant speech model, each time a new automatic speech recognition session is initiated.

20. A computer-readable storage device having instructions stored which, when executed by a computing device, result in the computing device performing operations comprising:
- starting a current automatic speech recognition session for recognizing speech received from a user via a device;
  
  identifying a group of speech recognition models comprising a speaker independent model and a speaker dependent model;
  
  recognizing the speech via each model in the group of speech recognition models, to yield recognition results;
  
  selecting, based on the recognition results, a dominant speech model from the group of speech recognition models to yield a remainder set of dropped speech recognition models; and
  
  continuously using only the dominant speech model, without applying the remainder set of dropped speech recognition models, to recognize additional speech received from the user.

Current Assignee: Microsoft Technology Licensing LLC (Microsoft Corporation)
Original Assignee: Nuance Communications, Inc. (Microsoft Corporation)
Inventors: Ljolje, Andrej; Caseiro, Diamantino Antonio; Conkie, Alistair D.
Primary Examiner(s): He, Jialong
Application Number: US15/595,131
Publication Number: US 20170249937A1
Time in Patent Office: 204 Days
Field of Search: None

CPC Class Codes

G10L 15/04   Segmentation; Word boundary...
G10L 15/063   Training
G10L 15/07   to the speaker
G10L 15/083   Recognition networks G10L15...
G10L 15/14   using statistical models, e...
G10L 15/22   Procedures used during a sp...
G10L 15/28   Constructional details of s...
G10L 15/32   Multiple recognisers used i...