Adaptation of exponential models
First Claim
1. A non-transitory computer storage medium having computer-executable instructions that when executed by a processor cause the processor to perform steps comprising:
- for each of a first set of feature threshold counts, performing steps comprising:
selecting a set of features from background data, where each feature in the set appears in the background data more than a number of times represented by the feature threshold count;
for each of a first set of variances of a prior model, performing steps comprising:
training a set of weights comprising a separate weight for each feature in the selected set of features from the background data such that the set of weights maximizes the likelihood of the set of background data using update equations for the weights that are based on an exponential probability model and relative frequencies in the background data of co-occurrences of contexts and capitalization tags, wherein each trained set of weights and respective selected set of features from the background data represent a separate model;
applying each separate model to a set of background development data and selecting the model with the best accuracy as an initial model having an initial set of weights and an initial set of features from the background data;
for each of a second set of feature threshold counts, performing steps comprising:
selecting a set of features from adaptation data, where each feature in the set of features appears in the adaptation data more than a number of times represented by the feature threshold count from the second set of feature threshold counts, wherein the adaptation data is smaller than the background data;
for each of a second set of variances of the prior model, performing steps comprising:
the processor determining an adapted set of weights comprising a separate weight for each feature in a union of the selected set of features from the adaptation data and the initial set of features from the background data such that the set of weights maximizes the likelihood of a set of adaptation data, and such that a weight for a feature that is present in the initial set of features from the background data but that is not present in the selected set of features from the adaptation data is updated when determining an adapted set of weights, wherein the likelihood of the set of adaptation data is based on:
a second exponential probability model;
a prior model for the set of weights that comprises means with values equal to the initial set of weights for features that are present in the initial set of features from the background data and means with values equal to zero for features that are not present in the initial set of features from the background data but that are present in the set of features from the adaptation data; and
relative frequencies in the adaptation data of co-occurrences of contexts and capitalization tags;
selecting a set of adapted weights as a final adapted model by determining which set of adapted weights provides the highest likelihood for a set of adaptation development data.
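The two nested loops recited in the claim (over feature threshold counts, then over prior variances) amount to a grid search in which every combination yields a candidate model and the best candidate on development data is kept. The following is a minimal sketch of that control flow only. The names `grid_search`, `train`, `score`, `toy_train`, and `toy_score` are hypothetical and do not come from the patent; the toy functions are stand-ins so the sketch runs and perform no real training.

```python
from itertools import product


def grid_search(thresholds, variances, train, score):
    """Train one model per (threshold, variance) pair; return the best one.

    `train` and `score` are hypothetical callables supplied by the caller:
    train(threshold, variance) -> model, and score(model) -> float measured
    on held-out development data (higher is better).
    """
    best = None
    for threshold, variance in product(thresholds, variances):
        model = train(threshold, variance)
        s = score(model)
        # Strict '>' keeps the earliest candidate when scores tie.
        if best is None or s > best[0]:
            best = (s, threshold, variance, model)
    return best


# Toy stand-ins so the sketch runs end to end (not real training).
def toy_train(threshold, variance):
    return {"threshold": threshold, "variance": variance}


def toy_score(model):
    # Pretend the development data favors threshold 2 and variance 0.5.
    return -abs(model["threshold"] - 2) - abs(model["variance"] - 0.5)


best_score, best_threshold, best_variance, best_model = grid_search(
    [0, 1, 2, 5], [0.1, 0.5, 1.0], toy_train, toy_score
)
```

In the claim the same pattern runs twice: once on background data, scored by accuracy on background development data, and once on adaptation data, scored by likelihood on adaptation development data.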
Abstract
A method and apparatus are provided for adapting an exponential probability model. In a first stage, a general-purpose background model is built from background data by determining a set of model parameters for the probability model based on a set of background data. The background model parameters are then used to define a prior model for the parameters of an adapted probability model that is more specific to an adaptation data set of interest. The adaptation data set is generally much smaller than the background data set. A second set of model parameters is then determined for the adapted probability model based on the set of adaptation data and the prior model.
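One way to write down the second stage is as a penalized log-likelihood. The formulation below is a sketch under stated assumptions: it takes the prior to be Gaussian with a diagonal covariance, which is consistent with the claims' references to "means" and "variances" but is not spelled out in this excerpt, and all symbols (\lambda_i, f_i, \tilde{p}, \sigma_i, \mathcal{F}_{\mathrm{bg}}) are notation introduced here for illustration.

```latex
% Exponential (maximum entropy) model over tags y given context x
p_\Lambda(y \mid x) =
  \frac{\exp\bigl(\sum_{i=1}^{F} \lambda_i f_i(x, y)\bigr)}
       {\sum_{y'} \exp\bigl(\sum_{i=1}^{F} \lambda_i f_i(x, y')\bigr)}

% Adaptation objective: log-likelihood of the adaptation data, weighted by
% relative frequencies \tilde{p}(x, y), plus the log of a Gaussian prior
L(\Lambda) =
  \sum_{x, y} \tilde{p}(x, y) \log p_\Lambda(y \mid x)
  - \sum_{i=1}^{F} \frac{(\lambda_i - \lambda_i^{0})^2}{2 \sigma_i^2}

% Prior means: background weight where one exists, zero otherwise
\lambda_i^{0} =
  \begin{cases}
    \lambda_i^{\mathrm{bg}} & \text{if } f_i \in \mathcal{F}_{\mathrm{bg}} \\
    0 & \text{otherwise}
  \end{cases}
```

Under this reading, a small variance keeps the adapted weights close to the background weights, while a large variance lets the adaptation data dominate.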
10 Claims
6. A method comprising:
a processor selecting a set of features from a set of background data by selecting features that occur in the set of background data more than a number of times represented by a threshold count for the background data;
the processor determining an initial set of weights that maximize the likelihood of a set of background data, wherein the likelihood is based on an exponential probability model and wherein there is a separate initial weight for each feature in the selected set of features from the background data;
the processor selecting a set of features from a set of adaptation data by selecting features that occur in the set of adaptation data more than a number of times represented by a threshold count for the adaptation data;
the processor determining an adapted set of weights that maximize the likelihood of a set of adaptation data, wherein the set of adaptation data is smaller than the set of background data and wherein the likelihood is based on a second exponential probability model and a prior model of a distribution of weights comprising a separate mean for each feature in the union of the set of features from the set of background data and the set of features from the set of adaptation data, wherein each mean for a feature in the set of features from the background data has a value equal to the value of the initial weight for that feature and wherein each mean for a feature that is not in the set of features from the background data but is in the set of features from the adaptation data has a value equal to zero.
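The steps of the method claim can be sketched in a few lines of Python. This is an illustration under several assumptions, not the patented implementation: features are modeled as hypothetical (context word, tag) pairs, the prior is taken to be Gaussian with a single shared variance, and plain gradient ascent is swapped in for whatever update equations the specification uses. The names `select_features`, `prior_means`, and `adapt` are invented for this example.

```python
import math
from collections import Counter


def select_features(observations, threshold):
    """Keep features that occur more than `threshold` times."""
    counts = Counter(f for feats in observations for f in feats)
    return {f for f, c in counts.items() if c > threshold}


def prior_means(initial_weights, adaptation_features):
    """Mean is the background weight if the feature has one, else zero."""
    union = set(initial_weights) | set(adaptation_features)
    return {f: initial_weights.get(f, 0.0) for f in union}


def adapt(data, tags, means, variance, steps=200, lr=0.1):
    """Gradient ascent on average log-likelihood plus Gaussian log-prior.

    Assumption: gradient ascent replaces the patent's update equations.
    `data` is a list of (context_words, tag) pairs.
    """
    w = dict(means)  # start from the prior means
    n = len(data)
    for _ in range(steps):
        # Prior term pulls each weight back toward its mean.
        grad = {f: -(w[f] - means[f]) / variance for f in w}
        for context, tag in data:
            scores = {t: sum(w.get((c, t), 0.0) for c in context) for t in tags}
            z = sum(math.exp(s) for s in scores.values())
            for t in tags:
                p = math.exp(scores[t]) / z
                for c in context:
                    if (c, t) in grad:
                        # Observed minus expected feature count, averaged.
                        grad[(c, t)] += ((1.0 if t == tag else 0.0) - p) / n
        for f in w:
            w[f] += lr * grad[f]
    return w


# Toy example: the background model prefers lowercase "bank",
# but the adaptation data only shows it capitalized.
background_weights = {("bank", "LOW"): 1.0}
adaptation_data = [(("bank",), "CAP")] * 3
observed = [[(c, tag) for c in ctx] for ctx, tag in adaptation_data]
adapt_feats = select_features(observed, threshold=1)
means = prior_means(background_weights, adapt_feats)
adapted = adapt(adaptation_data, ["CAP", "LOW"], means, variance=1.0)
```

In this toy run the weight for ("bank", "LOW") is pulled below its background value even though that feature never occurs in the adaptation data, which mirrors the requirement in claim 1 that background-only features are still updated. The new feature ("bank", "CAP") starts from a mean of zero and rises.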
Specification