Structured prediction model learning apparatus, method, program, and recording medium
Abstract
A structured prediction model learning apparatus, method, program, and recording medium maintain prediction performance with a smaller amount of memory. An auxiliary model is introduced by defining the auxiliary model parameter set θ(k) with a log-linear model. A set Θ of auxiliary model parameter sets which minimizes the Bregman divergence between the auxiliary model and a reference function indicating the degree of pseudo accuracy is estimated by using unsupervised data. A base-model parameter set λ which minimizes an empirical risk function defined beforehand is estimated by using supervised data and the set Θ of auxiliary model parameter sets.
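The divergence mentioned in the abstract can be illustrated with a small sketch. It assumes the generalized relative entropy (generalized KL divergence) as the Bregman divergence and an unnormalized log-linear form q = exp(θ · g) for the auxiliary model; the function names and the toy values are hypothetical and are not taken from the patent text.

```python
import math


def log_linear(theta, g):
    """Unnormalized log-linear score exp(theta . g) for one output candidate.

    Assumed form of an auxiliary model; names are illustrative only.
    """
    return math.exp(sum(t * gi for t, gi in zip(theta, g)))


def generalized_relative_entropy(r, q):
    """Generalized KL divergence between nonnegative values r and positive q.

    Computes sum_i [ r_i * log(r_i / q_i) - r_i + q_i ], using the
    convention 0 * log(0) = 0. Neither r nor q has to sum to one.
    """
    total = 0.0
    for ri, qi in zip(r, q):
        if ri > 0.0:
            total += ri * math.log(ri / qi)
        total += qi - ri
    return total


# Toy example: two output candidates with one auxiliary feature each.
reference = [1.0, 2.0]          # hypothetical values of the reference function
features = [[0.0], [1.0]]       # hypothetical auxiliary feature vectors
theta = [0.0]                   # all-zero parameters give q = 1 everywhere
q_values = [log_linear(theta, g) for g in features]
divergence = generalized_relative_entropy(reference, q_values)
```

The divergence is zero only when the auxiliary model reproduces the reference function on every candidate, which is why minimizing it over the unsupervised data pulls q toward the reference function.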
13 Claims
1. A structured prediction model learning apparatus, having a central processing unit, for learning a structured prediction model used to predict an output structure y corresponding to an input structure x, by using supervised data D_L and unsupervised data D_U, the structured prediction model learning apparatus comprising:
- an output candidate graph generator implemented by the central processing unit to generate a supervised data output candidate graph for the supervised data and an unsupervised data output candidate graph for the unsupervised data, by using a set of definition data for generating output candidates identified by a structured prediction problem;
- a feature vector generator extracting features from the supervised data output candidate graph and the unsupervised data output candidate graph by using a feature extraction template, generating a D-dimensional base-model feature vector f_{x,y} corresponding to a set of the features extracted from the supervised data output candidate graph, dividing a set of the features extracted from the unsupervised data output candidate graph into K subsets, and generating a D_k-dimensional auxiliary model feature vector g(k)_{x,y} corresponding to features included in a subset k of the K subsets, where K is a natural number and k ∈ {1, 2, ..., K};
- a parameter generator generating a base-model parameter set λ which includes a first parameter set w formed of D first parameters in one-to-one correspondence with D elements of the base-model feature vector f_{x,y}, generating an auxiliary model parameter set θ(k) formed of D_k auxiliary model parameters in one-to-one correspondence with D_k elements of the auxiliary model feature vector g(k)_{x,y}, and generating a set Θ = {θ(1), θ(2), ..., θ(K)} of auxiliary model parameter sets, formed of K auxiliary model parameter sets θ(k);
- an auxiliary model parameter estimating unit estimating the set Θ of auxiliary model parameter sets which minimizes the Bregman divergence having a regularization term obtained from the auxiliary model parameter set θ(k), between each auxiliary model q_k and a reference function r̃(x, y) which is a nonnegative function and indicates the degree of pseudo accuracy of the output structure y corresponding to the input structure x, by using the regularization term and the unsupervised data D_U, where the auxiliary model q_k is obtained by defining the auxiliary model parameter set θ(k) with a log-linear model; and
- a base-model parameter estimating unit estimating a base-model parameter set λ which minimizes an empirical risk function defined beforehand, by using the supervised data D_L and the set Θ of auxiliary model parameter sets, where the base-model parameter set λ includes a second parameter set v = {v_1, v_2, ..., v_K} formed of K second parameters in one-to-one correspondence with K auxiliary models;
- wherein the auxiliary model parameter estimating unit uses the auxiliary model parameter set θ(k) to obtain an L1 norm regularization term |θ(k)|_1, obtains the Bregman divergence having the regularization term as the following empirical generalized relative entropy having a regularization term
Dependent claims: 2, 3, 4, 5, 6, 13
7. A structured prediction model learning method for learning a structured prediction model used to predict an output structure y corresponding to an input structure x, by using supervised data D_L and unsupervised data D_U, the structured prediction model learning method comprising:
- an output candidate graph generating step of generating a supervised data output candidate graph for the supervised data and an unsupervised data output candidate graph for the unsupervised data, by using a set of definition data for generating output candidates identified by a structured prediction problem;
- a feature vector generating step of extracting features from the supervised data output candidate graph and the unsupervised data output candidate graph by using a feature extraction template, generating a D-dimensional base-model feature vector f_{x,y} corresponding to a set of the features extracted from the supervised data output candidate graph, dividing a set of the features extracted from the unsupervised data output candidate graph into K subsets, and generating a D_k-dimensional auxiliary model feature vector g(k)_{x,y} corresponding to features included in a subset k of the K subsets, where K is a natural number and k ∈ {1, 2, ..., K};
- a parameter generating step of generating a base-model parameter set λ which includes a first parameter set w formed of D first parameters in one-to-one correspondence with D elements of the base-model feature vector f_{x,y}, generating an auxiliary model parameter set θ(k) formed of D_k auxiliary model parameters in one-to-one correspondence with D_k elements of the auxiliary model feature vector g(k)_{x,y}, and generating a set Θ = {θ(1), θ(2), ..., θ(K)} of auxiliary model parameter sets, formed of K auxiliary model parameter sets θ(k);
- an auxiliary model parameter estimating step of estimating the set Θ of auxiliary model parameter sets which minimizes the Bregman divergence having a regularization term obtained from the auxiliary model parameter set θ(k), between each auxiliary model q_k and a reference function r̃(x, y) which is a nonnegative function and indicates the degree of pseudo accuracy of the output structure y corresponding to the input structure x, by using the regularization term and the unsupervised data D_U, where the auxiliary model q_k is obtained by defining the auxiliary model parameter set θ(k) with a log-linear model; and
- a base-model parameter estimating step of estimating a base-model parameter set λ which minimizes an empirical risk function defined beforehand, by using the supervised data D_L and the set Θ of auxiliary model parameter sets, where the base-model parameter set λ includes a second parameter set v = {v_1, v_2, ..., v_K} formed of K second parameters in one-to-one correspondence with K auxiliary models;
- wherein, in the auxiliary model parameter estimating step, the auxiliary model parameter set θ(k) is used to obtain an L1 norm regularization term |θ(k)|_1, the Bregman divergence having the regularization term is obtained as the following empirical generalized relative entropy having a regularization term
Dependent claims: 8, 9, 10, 11, 12
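The feature-splitting and parameter-generating steps of the method can be sketched as below. The claim does not say how the features are divided into K subsets or how the base model combines its parameters, so the round-robin split and the additive score w · f + Σ_k v_k (θ(k) · g(k)) are both assumptions made for illustration; all function names are hypothetical.

```python
def split_features(feature_names, K):
    """Divide a feature set into K subsets (round-robin; an assumed rule)."""
    subsets = [[] for _ in range(K)]
    for i, name in enumerate(feature_names):
        subsets[i % K].append(name)
    return subsets


def init_parameters(D, subsets):
    """Create w (D entries), theta(k) (D_k entries each), and v (K entries)."""
    w = [0.0] * D
    theta = [[0.0] * len(subset) for subset in subsets]
    v = [0.0] * len(subsets)
    return w, theta, v


def dot(a, b):
    """Inner product of two equal-length lists."""
    return sum(x * y for x, y in zip(a, b))


def base_score(w, v, theta, f, g):
    """Assumed base-model score: w . f plus each auxiliary score weighted by v_k."""
    return dot(w, f) + sum(vk * dot(th, gk) for vk, th, gk in zip(v, theta, g))


# Toy run: five unsupervised-data features split into K = 2 subsets.
subsets = split_features(["a", "b", "c", "d", "e"], 2)
w, theta, v = init_parameters(3, subsets)
score = base_score([1.0, 2.0], [0.5], [[2.0]], [1.0, 0.0], [[3.0]])
```

Each θ(k) has one entry per feature in its subset and v has one entry per auxiliary model, mirroring the one-to-one correspondences recited in the claim.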
Specification