Systems and methods for providing metadata-dependent language models

US 10,102,849 B2
Filed: 03/24/2017
Issued: 10/16/2018
Est. Priority Date: 04/25/2013
Status: Active Grant

Abstract

Techniques for generating language models. The techniques include: obtaining language data comprising training data and associated values for one or more metadata attributes, the language data comprising a plurality of instances of language data, an instance of language data comprising an instance of training data and one or more metadata attribute values associated with the instance of training data; identifying, by processing the language data using at least one processor, a set of one or more of the metadata attributes to use for clustering the instances of training data into a plurality of clusters; clustering the training data instances based on their respective values for the identified set of metadata attributes into the plurality of clusters; and generating a language model for each of the plurality of clusters.
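The clustering and per-cluster model generation summarized in the abstract can be illustrated with a minimal sketch. It assumes each instance of language data is a (text, metadata dictionary) pair and uses a maximum-likelihood unigram model as a stand-in for whatever language model an implementation would actually build. The function names, attribute names, and toy data are hypothetical and do not come from the patent.

```python
from collections import Counter, defaultdict


def cluster_by_attributes(instances, attributes):
    """Group (text, metadata) instances by their values for the chosen attributes."""
    clusters = defaultdict(list)
    for text, metadata in instances:
        # The cluster key is the tuple of this instance's values for the attributes.
        key = tuple(metadata.get(name) for name in attributes)
        clusters[key].append(text)
    return dict(clusters)


def unigram_model(texts):
    """Maximum-likelihood unigram probabilities over whitespace-separated tokens."""
    counts = Counter(token for text in texts for token in text.split())
    total = sum(counts.values())
    return {token: count / total for token, count in counts.items()}


# Hypothetical toy language data: training text plus metadata attribute values.
instances = [
    ("send a text to mom", {"app": "sms", "locale": "en-US"}),
    ("text dad i am late", {"app": "sms", "locale": "en-US"}),
    ("navigate to the airport", {"app": "maps", "locale": "en-US"}),
]

# Cluster on the (assumed) identified attribute set, then build one model per cluster.
clusters = cluster_by_attributes(instances, ["app"])
models = {key: unigram_model(texts) for key, texts in clusters.items()}
```

Here the identified attribute set is simply given as `["app"]`; how such a set might be chosen automatically is the subject of the dependent claims.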

20 Claims

1. A method comprising:
- training, using at least one computer hardware processor to perform an automated two-stage training procedure having a first training stage and a second training stage different from the first training stage, an automatic speech recognition (ASR) engine at least in part by generating one or more language models for use as part of the ASR engine, the training comprising:
  
  obtaining language data comprising training data and associated values for one or more metadata attributes, the language data comprising a plurality of instances of language data, an instance of language data comprising an instance of training data and one or more metadata attribute values associated with the instance of training data;
  
  identifying, by processing the language data, a set of the one or more metadata attributes to use for clustering the instances of training data, the set of metadata attributes comprising first and second sets of metadata attributes;
  
  performing the first training stage, comprising:
  
  clustering the training data instances based on their respective values for the first set of metadata attributes to obtain a first plurality of clusters, the clustering comprising dividing the training data instances into the first plurality of clusters based on their respective values for the first set of metadata attributes; and
  
  generating a respective language model for multiple clusters of the first plurality of clusters to obtain a plurality of language models, the generating comprising using training data in each of one or more of the multiple clusters to generate a respective language model in the plurality of language models;
  
  performing the second training stage, comprising:
  
  clustering the training data instances based on their respective values for the second set of metadata attributes to obtain a second plurality of clusters, the clustering comprising subdividing the training data instances in the first plurality of clusters based on their respective values for the second set of metadata attributes to obtain the second plurality of clusters; and
  
  generating a first language model for a first cluster in the second plurality of clusters as a first weighted mixture of language models in the plurality of language models by estimating weights of the language models in the first weighted mixture using training data instances in the first cluster; and
  
  storing the plurality of language models and the first language model for use as part of the ASR engine.
- - 2. The method of claim 1, wherein the one or more metadata attributes comprise a plurality of metadata attributes, wherein the act of identifying the set of metadata attributes to use for clustering the instances of training data comprises:
    - automatically evaluating multiple of the plurality of metadata attributes, the automatically evaluating a first of the multiple metadata attributes comprising generating at least one language model for at least one group of training data instances obtained by dividing the training data instances based on their respective values for the first metadata attribute; and
      
      identifying the set of metadata attributes based on results of the evaluation.
  - 3. The method of claim 2, wherein automatically evaluating the first metadata attribute comprises:
    - dividing the training data instances, based on their respective values for the first metadata attribute, to obtain two or more groups of training data instances;
      
      generating a child language model for each of the obtained groups of training data instances;
      
      generating a parent language model using all the training data instances; and
      
      calculating a score for the first metadata attribute based at least in part on how a measure of the goodness of fit of the generated child language models compares with a measure of the goodness of fit of the generated parent language model.
  - 4. The method of claim 3, wherein generating the child language model for each of the obtained groups comprises generating a unigram language model for each of the obtained groups.
  - 5. The method of claim 3, wherein the measure of the goodness of fit of the generated parent language model is a likelihood, according to the generated parent language model, of the training data instances used to generate the parent language model.
  - 6. The method of claim 1, further comprising:
    - generating a second language model for a second cluster in the second plurality of clusters as a second weighted mixture of language models in the plurality of language models by estimating weights of the language models in the second weighted mixture using training data instances in the second cluster.
  - 7. The method of claim 1, wherein estimating the weights is performed using an expectation maximization algorithm.
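The weight estimation recited in claims 1 and 7 can be illustrated with a short sketch of expectation maximization for a linear mixture of fixed component models. It assumes the components are unigram models stored as token-to-probability dictionaries and that unseen tokens receive a small floor probability; both are simplifications chosen for illustration, and the function name and toy data are hypothetical.

```python
def estimate_mixture_weights(component_models, tokens, iterations=50, floor=1e-12):
    """Estimate mixture weights for fixed component models with EM.

    component_models: list of dicts mapping token -> probability (assumed unigram).
    tokens: tokens from the training data instances of the target cluster.
    """
    if not tokens:
        raise ValueError("at least one token is needed to estimate weights")
    k = len(component_models)
    weights = [1.0 / k] * k  # start from a uniform mixture
    for _ in range(iterations):
        expected = [0.0] * k
        for token in tokens:
            # E-step: posterior responsibility of each component for this token.
            joint = [w * m.get(token, floor) for w, m in zip(weights, component_models)]
            norm = sum(joint)
            for i, value in enumerate(joint):
                expected[i] += value / norm
        # M-step: new weight is the average responsibility over all tokens.
        weights = [e / len(tokens) for e in expected]
    return weights


# Hypothetical first-stage models and second-stage cluster data.
model_a = {"a": 0.9, "b": 0.1}
model_b = {"a": 0.1, "b": 0.9}
tokens = ["a"] * 7 + ["b"] * 3
weights = estimate_mixture_weights([model_a, model_b], tokens)
```

In this toy case the data contain "a" with relative frequency 0.7, so the maximum-likelihood weight w on the first component satisfies 0.9w + 0.1(1 - w) = 0.7, giving w = 0.75. Only the weights are re-estimated; the component models stay fixed, which is what lets a small cluster reuse models trained on larger ones.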

8. A system comprising:
- at least one processor configured to perform acts of:
  
  training, using an automated two-stage training procedure having a first training stage and a second training stage different from the first training stage, an automatic speech recognition (ASR) engine at least in part by generating one or more language models for use as part of the ASR engine, the training comprising:
  
  obtaining language data comprising training data and associated values for one or more metadata attributes, the language data comprising a plurality of instances of language data, an instance of language data comprising an instance of training data and one or more metadata attribute values associated with the instance of training data;
  
  identifying, by processing the language data, a set of the one or more metadata attributes to use for clustering the instances of training data, the set of metadata attributes comprising first and second sets of metadata attributes;
  
  performing the first training stage, comprising:
  
  clustering the training data instances based on their respective values for the first set of metadata attributes to obtain a first plurality of clusters, the clustering comprising dividing the training data instances into the first plurality of clusters based on their respective values for the first set of metadata attributes; and
  
  generating a respective language model for multiple of the first plurality of clusters to obtain a plurality of language models, the generating comprising using training data in each of one or more of the multiple clusters to generate a respective language model in the plurality of language models;
  
  performing the second training stage, comprising:
  
  clustering the training data instances based on their respective values for the second set of metadata attributes to obtain a second plurality of clusters, the clustering comprising subdividing the training data instances in the first plurality of clusters based on their respective values for the second set of metadata attributes to obtain the second plurality of clusters; and
  
  generating a first language model for a first cluster in the second plurality of clusters as a first weighted mixture of language models in the plurality of language models by estimating weights of the language models in the first weighted mixture using training data instances in the first cluster; and
  
  storing the plurality of language models and the first language model for use as part of the ASR engine.
- - 9. The system of claim 8, wherein the one or more metadata attributes comprise a plurality of metadata attributes, wherein the act of identifying the set of metadata attributes to use for clustering the instances of training data comprises:
    - automatically evaluating multiple of the plurality of metadata attributes, the automatically evaluating a first of the multiple metadata attributes comprising generating at least one language model for at least one group of training data instances obtained by dividing the training data instances based on their respective values for the first metadata attribute; and
      
      identifying the set of metadata attributes based on results of the evaluation.
  - 10. The system of claim 9, wherein automatically evaluating the first metadata attribute comprises:
    - dividing the training data instances, based on their respective values for the first metadata attribute, to obtain two or more groups of training data instances;
      
      generating a child language model for each of the obtained groups of training data instances;
      
      generating a parent language model using all the training data instances; and
      
      calculating a score for the first metadata attribute based at least in part on how a measure of the goodness of fit of the generated child language models compares with a measure of the goodness of fit of the generated parent language model.
  - 11. The system of claim 10, wherein generating the child language model for each of the obtained groups comprises generating a unigram language model for each of the obtained groups.
  - 12. The system of claim 10, wherein the measure of the goodness of fit of the generated parent language model is a likelihood, according to the generated parent language model, of the training data instances used to generate the parent language model.
  - 13. The system of claim 8, wherein the at least one processor is further configured to generate a second language model for a second cluster in the second plurality of clusters as a second weighted mixture of language models in the plurality of language models by estimating weights of the language models in the second weighted mixture using training data instances in the second cluster.
  - 14. The system of claim 8, wherein estimating the weights is performed using an expectation maximization algorithm.
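The attribute evaluation recited in claims 9 through 12 (and in the parallel method and storage-medium claims) can be sketched as a comparison of training-data log-likelihoods. The sketch assumes unigram child and parent models and scores an attribute as the summed child log-likelihood minus the parent log-likelihood. That particular score is one possible reading of "how a measure of the goodness of fit ... compares", not a formula stated in the claims, and the names and toy data are hypothetical.

```python
import math
from collections import Counter, defaultdict


def unigram_log_likelihood(texts):
    """Log-likelihood of the texts under the ML unigram model trained on them."""
    counts = Counter(token for text in texts for token in text.split())
    total = sum(counts.values())
    return sum(count * math.log(count / total) for count in counts.values())


def score_attribute(instances, attribute):
    """Gain in training-data log-likelihood from splitting on one attribute."""
    groups = defaultdict(list)
    for text, metadata in instances:
        groups[metadata.get(attribute)].append(text)
    # Parent model: one model over all training data instances.
    parent = unigram_log_likelihood([text for text, _ in instances])
    # Child models: one model per group obtained by dividing on the attribute.
    children = sum(unigram_log_likelihood(texts) for texts in groups.values())
    return children - parent


# Hypothetical toy language data.
instances = [
    ("send a text to mom", {"app": "sms", "locale": "en-US"}),
    ("text dad i am late", {"app": "sms", "locale": "en-US"}),
    ("navigate to the airport", {"app": "maps", "locale": "en-US"}),
    ("directions to the station", {"app": "maps", "locale": "en-US"}),
]
app_score = score_attribute(instances, "app")
locale_score = score_attribute(instances, "locale")
```

With maximum-likelihood models this gain is never negative and tends to grow as groups get smaller, so a practical system would presumably normalize it, penalize the number of groups, or measure fit on held-out data before ranking attributes.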

15. At least one non-transitory computer-readable storage medium storing processor-executable instructions that, when executed by at least one processor, cause the at least one processor to perform a method comprising acts of:
- training, using an automated two-stage training procedure having a first training stage and a second training stage different from the first training stage, an automatic speech recognition (ASR) engine at least in part by generating one or more language models for use as part of the ASR engine, the training comprising:
  
  obtaining language data comprising training data and associated values for one or more metadata attributes, the language data comprising a plurality of instances of language data, an instance of language data comprising an instance of training data and one or more metadata attribute values associated with the instance of training data;
  
  identifying, by processing the language data, a set of the one or more metadata attributes to use for clustering the instances of training data, the set of metadata attributes comprising first and second sets of metadata attributes;
  
  performing the first stage, comprising:
  
  clustering the training data instances based on their respective values for the first set of metadata attributes to obtain a first plurality of clusters, the clustering comprising dividing the training data instances into the first plurality of clusters based on their respective values for the first set of metadata attributes; and
  
  generating a respective language model for multiple of the first plurality of clusters to obtain a plurality of language models, the generating comprising using the training data in each of the one or more of the multiple clusters to generate a respective language model in the plurality of language models;
  
  performing the second stage, comprising:
  
  clustering the training data instances based on their respective values for the second set of metadata attributes to obtain a second plurality of clusters, the clustering comprising subdividing the training data instances in the first plurality of clusters based on their respective values for the second set of metadata attributes to obtain the second plurality of clusters; and
  
  generating a first language model for a first cluster in the second plurality of clusters as a first weighted mixture of language models in the plurality of language models by estimating weights of the language models in the first weighted mixture using training data instances in the first cluster; and
  
  storing the plurality of language models and the first language model for use as part of the ASR engine.
- - 16. The at least one non-transitory computer-readable storage medium of claim 15, wherein the one or more metadata attributes comprise a plurality of metadata attributes, wherein the act of identifying the set of metadata attributes to use for clustering the instances of training data comprises:
    - automatically evaluating multiple of the plurality of metadata attributes, the automatically evaluating a first of the multiple metadata attributes comprising generating at least one language model for at least one group of training data instances obtained by dividing the training data instances based on their respective values for the first metadata attribute; and
      
      identifying the set of metadata attributes based on results of the evaluation.
  - 17. The at least one non-transitory computer-readable storage medium of claim 16, wherein automatically evaluating the first metadata attribute comprises:
    - dividing the training data instances, based on their respective values for the first metadata attribute, to obtain two or more groups of training data instances;
      
      generating a child language model for each of the obtained groups of training data instances;
      
      generating a parent language model using all the training data instances; and
      
      calculating a score for the first metadata attribute based at least in part on how a measure of the goodness of fit of the generated child language models compares with a measure of the goodness of fit of the generated parent language model.
  - 18. The at least one non-transitory computer-readable storage medium of claim 17, wherein generating the child language model for each of the obtained groups comprises generating a unigram language model for each of the obtained groups.
  - 19. The at least one non-transitory storage medium of claim 15 further storing processor-executable instructions that, when executed by at least one processor, cause the at least one processor to generate a second language model for a second cluster in the second plurality of clusters as a second weighted mixture of language models in the plurality of language models by estimating weights of the language models in the second weighted mixture using training data instances in the second cluster.
  - 20. The at least one non-transitory storage medium of claim 15, wherein estimating the weights is performed using an expectation maximization algorithm.
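The relationship between the two clustering stages in the independent claims, where the second stage subdivides the first-stage clusters, can be sketched by extending each first-stage cluster key with the values of the second attribute set. The data layout, function name, and attribute names are assumptions made for illustration only.

```python
from collections import defaultdict


def two_stage_clusters(instances, first_attrs, second_attrs):
    """Return first-stage clusters and their second-stage subdivisions."""
    first = defaultdict(list)
    second = defaultdict(list)
    for text, metadata in instances:
        # First stage: key on the first set of metadata attributes.
        first_key = tuple(metadata.get(name) for name in first_attrs)
        # Second stage: subdivide by appending the second set's values.
        second_key = first_key + tuple(metadata.get(name) for name in second_attrs)
        first[first_key].append(text)
        second[second_key].append(text)
    return dict(first), dict(second)


# Hypothetical toy language data.
instances = [
    ("send a text to mom", {"app": "sms", "locale": "en-US"}),
    ("text dad i am late", {"app": "sms", "locale": "en-GB"}),
    ("navigate to the airport", {"app": "maps", "locale": "en-US"}),
]
first, second = two_stage_clusters(instances, ["app"], ["locale"])
```

Because every second-stage key begins with a first-stage key, each finer cluster has a known parent. In the claimed procedure the first-stage clusters would each get their own language model, while a second-stage cluster's model would be a weighted mixture of those models.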


Current Assignee: Microsoft Technology Licensing LLC (Microsoft Corporation)
Original Assignee: Nuance Communications, Inc. (Microsoft Corporation)
Inventors: Vozila, Paul J.; Tam, Wilson; Lenke, Nils
Primary Examiner(s): Shin, Seong Ah A
Application Number: US15/469,312
Publication Number: US 20170200447A1
Time in Patent Office: 571 Days
Field of Search: 704/9, 704/235, 704/243, 704/245, 704/257
US Class Current
CPC Class Codes

G10L 15/063   Training
G10L 15/18   using natural language mode...
G10L 15/183   using context dependencies,...
G10L 2015/0631   Creating reference template...
G10L 2015/226   using non-speech characteri...
