Semi-supervised learning based on semiparametric regularization
First Claim
1. A semisupervised learning method, comprising:
- analyzing a data set using at least one automated processor, comprising labeled data and unlabeled data, by performing a principal component analysis to derive parameters of a parametric function of the feature space reflecting a geometric structure of a marginal distribution of the data set according to its principal components;
- performing supervised learning on the labeled data using the at least one automated processor, based on the parametric function of the feature space reflecting the geometric structure of the marginal distribution of the entire data set; and
- storing information derived from said supervised learning in a computer memory,
wherein the parametric function is dependent on both the data set and said principal component analysis.
Abstract
Semi-supervised learning plays an important role in machine learning and data mining. The semi-supervised learning problem is approached through semiparametric regularization, which learns a parametric function by exploiting the geometric structure of the marginal distribution of the data. The learned parametric function is then incorporated as prior knowledge into supervised learning on the available labeled data. The approach thus brings the unlabeled data into supervised learning through a parametric function learned from the whole data set, labeled and unlabeled alike, and this function reflects the geometric structure of the marginal distribution of the data. Because the approach extends naturally to out-of-sample data, it is inductive in nature.
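The two-stage idea in the abstract can be illustrated with a minimal sketch. It is not the patented implementation. It assumes linear PCA for the parametric function psi, a Gaussian kernel, and a regularized least-squares loss; the names `rbf_kernel`, `learn_psi`, `fit`, `gamma`, and `sigma` are hypothetical and chosen only for this example.

```python
import numpy as np

def rbf_kernel(A, B, sigma=1.0):
    """Gaussian kernel matrix between the rows of A and the rows of B."""
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-sq / (2.0 * sigma ** 2))

def learn_psi(X_all):
    """Stage 1: learn psi(x) = v . (x - mean) from ALL data (labeled and unlabeled).

    Assumption: v is the first principal direction from linear PCA.
    """
    mean = X_all.mean(axis=0)
    _, _, vt = np.linalg.svd(X_all - mean, full_matrices=False)
    v = vt[0]
    return lambda X: (X - mean) @ v

def fit(X_lab, y_lab, psi, gamma=1e-2, sigma=1.0):
    """Stage 2: supervised fit on labeled data only.

    Assumed model: f(x) = sum_i alpha_i K(x_i, x) + beta * psi(x), minimizing
    (1/n) * ||y - K alpha - beta p||^2 + gamma * alpha^T K alpha,
    where p = psi(X_lab) and beta is left unpenalized.
    """
    n = len(y_lab)
    K = rbf_kernel(X_lab, X_lab, sigma)
    p = psi(X_lab)
    # Stationarity conditions written as one (n+1) x (n+1) linear system:
    #   (K + gamma*n*I) alpha + beta p = y,   p^T alpha = 0
    A = np.zeros((n + 1, n + 1))
    A[:n, :n] = K + gamma * n * np.eye(n)
    A[:n, n] = p
    A[n, :n] = p
    sol = np.linalg.solve(A, np.append(y_lab, 0.0))
    alpha, beta = sol[:n], sol[n]

    def predict(X):
        # Works for any new point, so the fitted model is inductive.
        return rbf_kernel(X, X_lab, sigma) @ alpha + beta * psi(X)

    return predict
```

Because `predict` evaluates both the kernel term and psi at arbitrary inputs, the sketch also shows why such an approach extends to out-of-sample data without retraining.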
19 Citations
20 Claims
1. A semisupervised learning method, comprising:
- analyzing a data set using at least one automated processor, comprising labeled data and unlabeled data, by performing a principal component analysis to derive parameters of a parametric function of the feature space reflecting a geometric structure of a marginal distribution of the data set according to its principal components;
- performing supervised learning on the labeled data using the at least one automated processor, based on the parametric function of the feature space reflecting the geometric structure of the marginal distribution of the entire data set; and
- storing information derived from said supervised learning in a computer memory,
wherein the parametric function is dependent on both the data set and said principal component analysis.
Dependent claims: 2, 3, 4, 5, 6
7. An apparatus for performing semisupervised learning on a data set, comprising:
- a memory adapted to store a data set, comprising labeled data and unlabeled data;
- at least one automated processor, configured to analyze the data set through a parametric function derived by principal component analysis of the feature space reflecting a geometric structure of a marginal distribution of the data set according to its principal components, and performing supervised learning on the labeled data based on the parametric function derived by principal component analysis of the feature space reflecting the geometric structure of the entire data set; and
- a memory adapted to store information derived from said supervised learning in a computer memory.
Dependent claims: 8, 9, 10, 11, 12
13. A method, comprising:
- storing a data set comprising both labeled and unlabeled data;
- analyzing, with at least one automated processor, the entire data set using a statistical analysis of variance within the feature space of the data set to determine a geometric structure of the data set dependent on the statistical analysis of variance, by performing at least one orthogonal linear transform;
- analyzing, with the at least one automated processor, the labeled data in dependence on the determined geometric structure of the data set dependent on the statistical analysis of variance, to learn at least one classification criterion from the classification and features of the labeled data; and
- automatically classifying unlabeled data based on the learned classification criterion.
Dependent claims: 14, 15, 16, 17, 18, 19, 20
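The steps recited in claim 13 can be pictured with a short sketch, given here only as an illustration under stated assumptions. The orthogonal linear transform is taken to be PCA computed from the covariance of the entire data set, and the classification criterion is taken to be a nearest-centroid rule in the projected space; the claim names neither choice, and `pca_transform`, `classify_unlabeled`, and `k` are hypothetical names.

```python
import numpy as np

def pca_transform(X_all, k=1):
    """Variance analysis of the ENTIRE data set via an orthogonal linear transform.

    Assumption: the transform is PCA, keeping the k highest-variance directions.
    Expects at least two feature columns.
    """
    mean = X_all.mean(axis=0)
    cov = np.cov(X_all - mean, rowvar=False)
    # eigh returns ascending eigenvalues with orthonormal eigenvectors as columns.
    _, eigvecs = np.linalg.eigh(cov)
    W = eigvecs[:, ::-1][:, :k]
    return lambda X: (X - mean) @ W

def classify_unlabeled(X_lab, y_lab, X_unl, transform):
    """Learn a criterion from labeled data in the projected space, then classify.

    Assumption: the criterion is the nearest class centroid.
    """
    y_lab = np.asarray(y_lab)
    Z_lab, Z_unl = transform(X_lab), transform(X_unl)
    classes = np.unique(y_lab)
    centroids = np.stack([Z_lab[y_lab == c].mean(axis=0) for c in classes])
    # Squared distance from every unlabeled point to every class centroid.
    d = ((Z_unl[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return classes[d.argmin(axis=1)]
```

The unlabeled points influence the result only through `pca_transform`, which is fitted on all of the data, while the labels enter only when the centroids are computed.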
Specification