
Systems and methods for providing metadata-dependent language models

  • US 9,626,960 B2
  • Filed: 04/25/2013
  • Issued: 04/18/2017
  • Est. Priority Date: 04/25/2013
  • Status: Active Grant
First Claim

1. A method comprising:

  • using at least one computer hardware processor to perform:

    (A) obtaining language data comprising training data and associated values for one or more metadata attributes, the language data comprising a plurality of instances of language data, an instance of language data comprising an instance of training data and one or more metadata attribute values associated with the instance of training data;

    (B) identifying, by processing the language data, a set of one or more of the metadata attributes to use for clustering the instances of training data, the set of metadata attributes comprising a first set of metadata attributes and a second set of metadata attributes;

    (C) clustering, using an automated clustering technique, the training data instances based on their respective values for the first set of metadata attributes into a first plurality of clusters;

    (D) generating a basis language model for each of the first plurality of clusters to obtain a plurality of basis language models and storing the plurality of basis language models in at least one computer hardware memory;

    (E) clustering the training data instances based on their respective values for the second set of metadata attributes into a second plurality of clusters different from the first plurality of clusters, the second plurality of clusters comprising a first cluster of training data instances and a second cluster of training data instances;

    (F) generating a language model for each of the second plurality of clusters as a respective mixture of the plurality of basis language models at least in part by:

    generating a first language model for the first cluster of training data instances as a first mixture of basis language models in the plurality of basis language models, the first mixture of basis language models comprising at least a first basis language model weighted by a first mixture weight and a second basis language model weighted by a second mixture weight, wherein generating the first language model comprises using an expectation-maximization technique to estimate the first mixture weight and the second mixture weight using data in the first cluster of training data instances; and

    generating a second language model for the second cluster of training data instances as a second mixture of basis language models in the plurality of basis language models by estimating mixture weights of basis language models in the second mixture using data in the second cluster of training data instances; and

    (G) receiving a voice utterance and recognizing the voice utterance using the generated first language model to obtain text corresponding to the voice utterance.
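
Steps (A) through (D) of the claim describe collecting metadata-tagged training instances, clustering them on a first set of metadata attributes, and training a basis language model per cluster. The sketch below is an illustrative Python rendering of those steps, not the patented implementation: the toy data, the attribute names "app" and "region", exact-value grouping in place of the unspecified automated clustering technique, and add-alpha unigram models in place of the unspecified basis language models are all assumptions.

```python
from collections import Counter, defaultdict

# Toy language data: each instance pairs training text with metadata attribute
# values (illustrative attributes; the patent does not fix a particular schema).
language_data = [
    {"text": "play some jazz music",    "metadata": {"app": "music", "region": "us"}},
    {"text": "play the next song",      "metadata": {"app": "music", "region": "uk"}},
    {"text": "navigate to the airport", "metadata": {"app": "nav",   "region": "us"}},
    {"text": "navigate home please",    "metadata": {"app": "nav",   "region": "uk"}},
]
VOCAB = {w for inst in language_data for w in inst["text"].split()}

def cluster_by(instances, attrs):
    """Group instances by their values for a set of metadata attributes
    (a simple stand-in for the claim's automated clustering technique)."""
    clusters = defaultdict(list)
    for inst in instances:
        clusters[tuple(inst["metadata"][a] for a in attrs)].append(inst)
    return clusters

def train_unigram_lm(instances, alpha=0.1):
    """Add-alpha-smoothed unigram model standing in for a basis language model."""
    counts = Counter(w for inst in instances for w in inst["text"].split())
    total = sum(counts.values()) + alpha * len(VOCAB)
    return {w: (counts[w] + alpha) / total for w in VOCAB}

# (C)-(D): cluster on the first attribute set, then train one basis LM per cluster.
first_clusters = cluster_by(language_data, ["app"])
basis_lms = [train_unigram_lm(insts) for insts in first_clusters.values()]
print(len(basis_lms), "basis language models trained")
```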
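Steps (E) and (F) then re-cluster the instances on a second attribute set and build each resulting cluster's language model as a mixture of the basis models, with mixture weights estimated by expectation-maximization on that cluster's data. The self-contained sketch below shows one standard EM recipe for linear-interpolation weights; the hand-written unigram basis models, uniform weight initialization, and fixed iteration count are assumptions, since the claim only requires that an expectation-maximization technique be used to estimate the weights.

```python
# Hypothetical basis language models (word -> probability); in practice these
# would be the models trained from the first clustering step.
basis_lms = [
    {"play": 0.4, "music": 0.4, "navigate": 0.1, "home": 0.1},
    {"play": 0.1, "music": 0.1, "navigate": 0.4, "home": 0.4},
]

def em_mixture_weights(tokens, basis_lms, iters=25, floor=1e-12):
    """EM for linear-interpolation weights: the E-step computes each basis
    model's responsibility for every token, the M-step renormalizes the
    summed responsibilities into new mixture weights."""
    k = len(basis_lms)
    weights = [1.0 / k] * k                      # uniform initialization (assumed)
    for _ in range(iters):
        resp_totals = [0.0] * k
        for w in tokens:
            scores = [weights[j] * basis_lms[j].get(w, floor) for j in range(k)]
            z = sum(scores)
            for j in range(k):
                resp_totals[j] += scores[j] / z  # posterior responsibility of model j
        weights = [r / len(tokens) for r in resp_totals]
    return weights

# Training data belonging to one cluster from the second clustering step.
cluster_tokens = "navigate home navigate play".split()
weights = em_mixture_weights(cluster_tokens, basis_lms)

def mixture_lm(word):
    """Cluster-specific language model as a weighted mixture of the basis models."""
    return sum(w_j * lm.get(word, 1e-12) for w_j, lm in zip(weights, basis_lms))

print(weights, mixture_lm("navigate"))
```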
