System and method for rapid customization of speech recognition models
First Claim
1. A method comprising:
receiving speech from a user as part of a speech dialog between the user and a speech recognition service;
identifying, based on the speech, a speech pattern of the user;
identifying, based on the speech pattern of the user, a plurality of speech recognition models stored in a cloud computing storage environment, each speech recognition model of the plurality of speech recognition models being from a respective speech recognition domain;
combining, via a processor, the plurality of speech recognition models, to yield a multi-domain combined speech recognition model;
identifying in the speech a specific speech recognition domain, wherein the specific speech recognition domain does not match a specific speech recognition model in the plurality of speech recognition models;
receiving sample data associated with the specific speech recognition domain, wherein the sample data is independent of the speech dialog between the user and the speech recognition service;
when the sample data is more than a minimum threshold, generating a new domain-specific speech recognition model for the specific speech recognition domain; and
when the sample data is less than the minimum threshold, modifying the multi-domain combined speech recognition model specifically to the specific speech recognition domain by weighting components of the multi-domain combined speech recognition model associated with the specific speech recognition domain to have more influence in recognition of the speech from the user.
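The final two limitations of the claim describe a branch on the amount of sample data. A minimal Python sketch of that branch follows. It is an illustration only, not the patented implementation: the names `CombinedModel`, `combine`, `boost`, `customize`, the value of `MIN_SAMPLES`, the equal starting weights, and the boost factor are all assumptions, since the claim does not specify how the models are combined or weighted. The claim also does not say what happens when the sample data exactly equals the threshold; the sketch treats that case as the reweighting branch.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Set

MIN_SAMPLES = 1000  # hypothetical threshold; the claim gives no value


@dataclass
class CombinedModel:
    # Maps a domain name to the weight of that domain's component.
    weights: Dict[str, float]


def combine(domains: Iterable[str]) -> CombinedModel:
    """Combine per-domain models with equal weights (an assumed starting point)."""
    names = list(domains)
    return CombinedModel({name: 1.0 / len(names) for name in names})


def boost(model: CombinedModel, related: Set[str], factor: float = 2.0) -> CombinedModel:
    """Give components associated with the target domain more influence, then renormalize."""
    raw = {d: w * (factor if d in related else 1.0) for d, w in model.weights.items()}
    total = sum(raw.values())
    return CombinedModel({d: w / total for d, w in raw.items()})


def customize(model, target_domain, related, sample_count, min_samples=MIN_SAMPLES):
    """Choose between building a new model and reweighting the combined one."""
    if sample_count > min_samples:
        # Enough data: a new domain-specific model would be trained here.
        return ("new_model", target_domain)
    # Too little data: reweight the existing multi-domain model instead.
    return ("reweighted", boost(model, related))
```

For example, `customize(combine(["banking", "travel", "medical"]), "pharmacy", {"medical"}, 200)` takes the reweighting branch and returns a model in which the `medical` component carries half of the total weight.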
Abstract
Disclosed herein are systems, methods, and non-transitory computer-readable storage media for generating domain-specific speech recognition models for a domain of interest by combining and tuning existing speech recognition models when a speech recognizer does not have access to a speech recognition model for that domain of interest and when available domain-specific data is below a minimum desired threshold to create a new domain-specific speech recognition model. A system configured to practice the method identifies a speech recognition domain and combines a set of speech recognition models, each speech recognition model of the set of speech recognition models being from a respective speech recognition domain. The system receives an amount of data specific to the speech recognition domain, wherein the amount of data is less than a minimum threshold to create a new domain-specific model, and tunes the combined speech recognition model for the speech recognition domain based on the data.
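One way to picture "tuning the combined model based on the data" is to treat the combined model as a weighted mixture of per-domain models and to fit the mixture weights on the small in-domain sample. The Python sketch below does this with expectation-maximization over toy unigram models. This is an assumed technique chosen for illustration; the abstract does not say which tuning method is used, and `tune_weights`, the toy models, and the `floor` probability for unseen words are hypothetical.

```python
from typing import Dict, List


def tune_weights(
    models: Dict[str, Dict[str, float]],
    sample: List[str],
    iterations: int = 20,
    floor: float = 1e-9,
) -> Dict[str, float]:
    """Fit mixture weights for per-domain unigram models on a small sample via EM."""
    if not sample:
        raise ValueError("sample must contain at least one word")
    names = list(models)
    weights = {n: 1.0 / len(names) for n in names}  # start from equal weights
    for _ in range(iterations):
        counts = {n: 0.0 for n in names}
        for word in sample:
            # E-step: share of responsibility each domain model takes for this word.
            scores = {n: weights[n] * models[n].get(word, floor) for n in names}
            total = sum(scores.values())
            for n in names:
                counts[n] += scores[n] / total
        # M-step: new weight is the average responsibility over the sample.
        weights = {n: counts[n] / len(sample) for n in names}
    return weights


toy_models = {
    "medical": {"dose": 0.5, "refill": 0.5},
    "travel": {"flight": 0.5, "hotel": 0.5},
}
tuned = tune_weights(toy_models, ["dose", "refill", "dose"])
```

With this toy sample, nearly all of the weight moves to the `medical` component, which is the sense in which a few in-domain examples can steer an existing multi-domain model without training a new one.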
20 Claims
1. A method comprising:
receiving speech from a user as part of a speech dialog between the user and a speech recognition service;
identifying, based on the speech, a speech pattern of the user;
identifying, based on the speech pattern of the user, a plurality of speech recognition models stored in a cloud computing storage environment, each speech recognition model of the plurality of speech recognition models being from a respective speech recognition domain;
combining, via a processor, the plurality of speech recognition models, to yield a multi-domain combined speech recognition model;
identifying in the speech a specific speech recognition domain, wherein the specific speech recognition domain does not match a specific speech recognition model in the plurality of speech recognition models;
receiving sample data associated with the specific speech recognition domain, wherein the sample data is independent of the speech dialog between the user and the speech recognition service;
when the sample data is more than a minimum threshold, generating a new domain-specific speech recognition model for the specific speech recognition domain; and
when the sample data is less than the minimum threshold, modifying the multi-domain combined speech recognition model specifically to the specific speech recognition domain by weighting components of the multi-domain combined speech recognition model associated with the specific speech recognition domain to have more influence in recognition of the speech from the user.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10
11. A system comprising:
a processor; and
a computer-readable storage medium having instructions stored which, when executed by the processor, result in the processor performing operations comprising:
receiving speech from a user as part of a speech dialog between the user and a speech recognition service;
identifying, based on the speech, a speech pattern of the user;
identifying, based on the speech pattern of the user, a plurality of speech recognition models stored in a cloud computing storage environment, each speech recognition model of the plurality of speech recognition models being from a respective speech recognition domain;
combining the plurality of speech recognition models, to yield a multi-domain combined speech recognition model;
identifying in the speech a specific speech recognition domain, wherein the specific speech recognition domain does not match a specific speech recognition model in the plurality of speech recognition models;
receiving sample data associated with the specific speech recognition domain, wherein the sample data is independent of the speech dialog between the user and the speech recognition service;
when the sample data is more than a minimum threshold, generating a new domain-specific speech recognition model for the specific speech recognition domain; and
when the sample data is less than the minimum threshold, modifying the multi-domain combined speech recognition model specifically to the specific speech recognition domain by weighting components of the multi-domain combined speech recognition model associated with the specific speech recognition domain to have more influence in recognition of the speech from the user.
Dependent claims: 12, 13, 14, 15
16. A computer-readable storage device having instructions stored which, when executed by a computing device, cause the computing device to perform operations comprising:
receiving speech from a user as part of a speech dialog between the user and a speech recognition service;
identifying, based on the speech, a speech pattern of the user;
identifying, based on the speech pattern of the user, a plurality of speech recognition models stored in a cloud computing storage environment, each speech recognition model of the plurality of speech recognition models being from a respective speech recognition domain;
combining the plurality of speech recognition models, to yield a multi-domain combined speech recognition model;
identifying in the speech a specific speech recognition domain, wherein the specific speech recognition domain does not match a specific speech recognition model in the plurality of speech recognition models;
receiving sample data associated with the specific speech recognition domain, wherein the sample data is independent of the speech dialog between the user and the speech recognition service;
when the sample data is more than a minimum threshold, generating a new domain-specific speech recognition model for the specific speech recognition domain; and
when the sample data is less than the minimum threshold, modifying the multi-domain combined speech recognition model specifically to the specific speech recognition domain by weighting components of the multi-domain combined speech recognition model associated with the specific speech recognition domain to have more influence in recognition of the speech from the user.
Dependent claims: 17, 18, 19, 20
Specification