System and method for rapid customization of speech recognition models

US 9,679,561 B2
Filed: 03/28/2011
Issued: 06/13/2017
Est. Priority Date: 03/28/2011
Status: Active Grant

Abstract

Disclosed herein are systems, methods, and non-transitory computer-readable storage media for generating domain-specific speech recognition models for a domain of interest by combining and tuning existing speech recognition models when a speech recognizer does not have access to a speech recognition model for that domain of interest and when available domain-specific data is below a minimum desired threshold to create a new domain-specific speech recognition model. A system configured to practice the method identifies a speech recognition domain and combines a set of speech recognition models, each speech recognition model of the set of speech recognition models being from a respective speech recognition domain. The system receives an amount of data specific to the speech recognition domain, wherein the amount of data is less than a minimum threshold to create a new domain-specific model, and tunes the combined speech recognition model for the speech recognition domain based on the data.
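The threshold test described in the abstract can be sketched as a simple branch. This is an illustrative sketch only: the names `Strategy` and `choose_strategy` are hypothetical, the source does not say how the amount of data is measured (a plain item count is assumed here), and the source does not say what happens when the amount equals the threshold (this sketch falls back to tuning).

```python
from enum import Enum


class Strategy(Enum):
    """Hypothetical labels for the two customization paths."""
    BUILD_NEW_MODEL = "build_new_model"
    TUNE_COMBINED_MODEL = "tune_combined_model"


def choose_strategy(sample_count: int, min_threshold: int) -> Strategy:
    """Pick a customization path from the amount of domain-specific data.

    Assumption: data volume is a simple count of samples. The case where
    the count equals the threshold is not covered by the source; it is
    treated here as "not enough data" and routed to tuning.
    """
    if sample_count > min_threshold:
        # Enough data: train a new domain-specific model.
        return Strategy.BUILD_NEW_MODEL
    # Too little data: tune the combined multi-domain model instead.
    return Strategy.TUNE_COMBINED_MODEL
```

For example, `choose_strategy(50, 1000)` selects the tuning path, while `choose_strategy(5000, 1000)` selects building a new model.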

20 Claims

1. A method comprising:
- receiving speech from a user as part of a speech dialog between the user and a speech recognition service;
  
  identifying, based on the speech, a speech pattern of the user;
  
  identifying, based on the speech pattern of the user, a plurality of speech recognition models stored in a cloud computing storage environment, each speech recognition model of the plurality of speech recognition models being from a respective speech recognition domain;
  
  combining, via a processor, the plurality of speech recognition models, to yield a multi-domain combined speech recognition model;
  
  identifying in the speech a specific speech recognition domain, wherein the specific speech recognition domain does not match a specific speech recognition model in the plurality of speech recognition models;
  
  receiving sample data associated with the specific speech recognition domain, wherein the sample data is independent of the speech dialog between the user and the speech recognition service;
  
  when the sample data is more than a minimum threshold, generating a new domain-specific speech recognition model for the specific speech recognition domain; and
  
  when the sample data is less than the minimum threshold, modifying the multi-domain combined speech recognition model specifically to the specific speech recognition domain by weighting components of the multi-domain combined speech recognition model associated with the specific speech recognition domain to have more influence in recognition of the speech from the user.
- Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10)
  - 2. The method of claim 1, wherein each respective speech recognition domain comprises one of business, finance, travel, medical, sports, news, politics, entertainment, and education.
  - 3. The method of claim 1, wherein each speech recognition model in the plurality of speech recognition models performs automatic speech recognition using a list of words specific to the respective speech recognition domain of that speech recognition model.
  - 4. The method of claim 1, wherein the modifying of the multi-domain combined speech recognition model is performed on-demand in response to a request.
  - 5. The method of claim 1, wherein the plurality of speech recognition models comprises two speech recognition models from different domains.
  - 6. The method of claim 1, wherein the multi-domain combined speech recognition model and one of the plurality of speech recognition models are from different domains.
  - 7. The method of claim 1, further comprising receiving, in addition to the speech specific to the specific speech recognition domain, one of text, transition data, metadata, and audio, specific to the specific speech recognition domain.
  - 8. The method of claim 1, wherein the specific speech recognition domain is specific to speech patterns of a particular user over time, wherein the speech patterns of the particular user are updated based on the speech.
  - 9. The method of claim 1, wherein the modifying of the multi-domain combined speech recognition model further comprises sampling the speech.
  - 10. The method of claim 1, further comprising recognizing additional speech using one of the new domain-specific speech recognition model and a modified multi-domain combined speech recognition model.
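The final step of claim 1, weighting components of the combined model so that those associated with the target domain have more influence, might look like the sketch below. Everything here is an assumption for illustration: the claims do not define what a "component" is or how weights are adjusted. This sketch treats each source-domain model as one component with a mixture weight, multiplies the weights of the related components by a hypothetical `boost` factor, and renormalizes so the weights still sum to 1.

```python
from typing import Dict, Set


def boost_components(
    weights: Dict[str, float],
    related: Set[str],
    boost: float = 2.0,
) -> Dict[str, float]:
    """Return new mixture weights that favor the related components.

    weights: hypothetical per-component mixture weights (positive values).
    related: names of components judged relevant to the target domain.
    boost:   hypothetical multiplier (> 1) applied to related components.
    """
    # Scale up related components, leave the others unchanged.
    boosted = {
        name: w * (boost if name in related else 1.0)
        for name, w in weights.items()
    }
    # Renormalize so the result is again a valid set of mixture weights.
    total = sum(boosted.values())
    return {name: w / total for name, w in boosted.items()}
```

With equal starting weights for `"travel"` and `"finance"` and a boost of 3.0 applied to `"travel"`, the result gives `"travel"` three quarters of the total weight.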

11. A system comprising:
- a processor; and
  
  a computer-readable storage medium having instructions stored which, when executed by the processor, result in the processor performing operations comprising:
  
  receiving speech from a user as part of a speech dialog between the user and a speech recognition service;
  
  identifying, based on the speech, a speech pattern of the user;
  
  identifying, based on the speech pattern of the user, a plurality of speech recognition models stored in a cloud computing storage environment, each speech recognition model of the plurality of speech recognition models being from a respective speech recognition domain;
  
  combining the plurality of speech recognition models, to yield a multi-domain combined speech recognition model;
  
  identifying in the speech a specific speech recognition domain, wherein the specific speech recognition domain does not match a specific speech recognition model in the plurality of speech recognition models;
  
  receiving sample data associated with the specific speech recognition domain, wherein the sample data is independent of the speech dialog between the user and the speech recognition service;
  
  when the sample data is more than a minimum threshold, generating a new domain-specific speech recognition model for the specific speech recognition domain; and
  
  when the sample data is less than the minimum threshold, modifying the multi-domain combined speech recognition model specifically to the specific speech recognition domain by weighting components of the multi-domain combined speech recognition model associated with the specific speech recognition domain to have more influence in recognition of the speech from the user.
- Dependent Claims (12, 13, 14, 15)
  - 12. The system of claim 11, wherein the modifying of the multi-domain combined speech recognition model is performed on-demand in response to a request.
  - 13. The system of claim 11, wherein the plurality of speech recognition models comprises two speech recognition models from different domains.
  - 14. The system of claim 11, wherein the multi-domain combined speech recognition model and one of the plurality of speech recognition models are from different domains.
  - 15. The system of claim 11, the computer-readable storage medium having additional instructions stored which, when executed by the processor, cause the processor to perform operations comprising receiving, in addition to the speech specific to the specific speech recognition domain, one of text, transition data, metadata, and audio, specific to the specific speech recognition domain.

16. A computer-readable storage device having instructions stored which, when executed by a computing device, cause the computing device to perform operations comprising:
- receiving speech from a user as part of a speech dialog between the user and a speech recognition service;
  
  identifying, based on the speech, a speech pattern of the user;
  
  identifying, based on the speech pattern of the user, a plurality of speech recognition models stored in a cloud computing storage environment, each speech recognition model of the plurality of speech recognition models being from a respective speech recognition domain;
  
  combining the plurality of speech recognition models, to yield a multi-domain combined speech recognition model;
  
  identifying in the speech a specific speech recognition domain, wherein the specific speech recognition domain does not match a specific speech recognition model in the plurality of speech recognition models;
  
  receiving sample data associated with the specific speech recognition domain, wherein the sample data is independent of the speech dialog between the user and the speech recognition service;
  
  when the sample data is more than a minimum threshold, generating a new domain-specific speech recognition model for the specific speech recognition domain; and
  
  when the sample data is less than the minimum threshold, modifying the multi-domain combined speech recognition model specifically to the specific speech recognition domain by weighting components of the multi-domain combined speech recognition model associated with the specific speech recognition domain to have more influence in recognition of the speech from the user.
- Dependent Claims (17, 18, 19, 20)
  - 17. The computer-readable storage device of claim 16, wherein combining the plurality of speech recognition models is performed at one of a core n-gram level and a sentence level.
  - 18. The computer-readable storage device of claim 16, wherein the modifying of the multi-domain combined speech recognition model further comprises sampling the speech.
  - 19. The computer-readable storage device of claim 16, having additional instructions stored which, when executed by the computing device, cause the computing device to perform operations comprising recognizing speech using the multi-domain combined speech recognition model.
  - 20. The computer-readable storage device of claim 16, wherein the specific speech recognition domain is specific to speech patterns of a particular user over time, wherein the speech patterns of the particular user are updated based on the speech.
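Claim 17 refers to combining models at the n-gram level without saying how. One common technique for this is linear interpolation of n-gram probabilities, sketched below. This is a swapped-in illustration, not the method the claims prescribe. The function `interpolate`, the toy models, and the weights are all hypothetical.

```python
from typing import Mapping, Tuple

NGram = Tuple[str, ...]

# Toy per-domain n-gram probabilities (made-up values for illustration).
MODELS = {
    "travel": {("book", "flight"): 0.25},
    "finance": {("book", "flight"): 0.75, ("book", "value"): 0.5},
}

# Hypothetical mixture weights; they should sum to 1.
WEIGHTS = {"travel": 0.5, "finance": 0.5}


def interpolate(
    models: Mapping[str, Mapping[NGram, float]],
    weights: Mapping[str, float],
    ngram: NGram,
) -> float:
    """Weighted sum of each domain model's probability for one n-gram.

    An n-gram missing from a model contributes 0.0 here; a real system
    would typically apply smoothing or back-off instead.
    """
    return sum(
        weights[name] * probs.get(ngram, 0.0)
        for name, probs in models.items()
    )
```

Calling `interpolate(MODELS, WEIGHTS, ("book", "flight"))` blends the two domain estimates according to the weights, and raising one domain's weight shifts the combined estimate toward that domain's model.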

Current Assignee
Microsoft Technology Licensing LLC (Microsoft Corporation)
Original Assignee
Nuance Communications, Inc. (Microsoft Corporation)
Inventors
Bell, Robert; Gilbert, Mazin; Caseiro, Diamantino Antonio; Haffner, Patrick; Bangalore, Srinivas
Primary Examiner(s)
BAKER, MATTHEW H

Application Number

US13/072,920
Publication Number

US 20120253799A1
Time in Patent Office

2,269 Days
Field of Search

704/231, 704/235, 704/9, 704/270, 704/270.1
US Class Current
CPC Class Codes

G10L 15/06   Creation of reference templ...

G10L 15/065   Adaptation

G10L 15/183   using context dependencies,...

G10L 15/22   Procedures used during a sp...

G10L 15/30   Distributed recognition, e....

G10L 2015/0635   updating or merging of old ...

G10L 2015/0636   Threshold criteria for the ...

G10L 2015/228   of application context
