SEMI-SUPERVISED RANDOM DECISION FORESTS FOR MACHINE LEARNING
Abstract
Semi-supervised random decision forests for machine learning are described, for example, for interactive image segmentation, medical image analysis, and many other applications. In examples, a random decision forest comprising a plurality of hierarchical data structures is trained using both unlabeled and labeled observations. In examples, a training objective is used which seeks to cluster the observations based on the labels and similarity of the observations. In an example, a transducer assigns labels to the unlabeled observations on the basis of the clusters and certainty information. In an example, an inducer forms a generic clustering function by counting examples of class labels at leaves of the trees in the forest. In an example, an active learning module identifies regions in a feature space from which the observations are drawn using the clusters and certainty information; new observations from the identified regions are used to train the random decision forest.
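The inducer described in the abstract, which counts class labels at the leaves of the trees, can be illustrated with a minimal sketch. It assumes the trees are already trained and that each tree can be called to map an observation to a leaf identifier. The two threshold "trees", the function names `fit_leaf_histograms` and `predict_proba`, and the toy data are hypothetical and are not taken from the patent.

```python
from collections import Counter, defaultdict

def fit_leaf_histograms(trees, labeled):
    """For each tree, count the class labels of labeled observations per leaf."""
    histograms = []
    for tree in trees:
        counts = defaultdict(Counter)
        for x, y in labeled:
            counts[tree(x)][y] += 1
        histograms.append(counts)
    return histograms

def predict_proba(trees, histograms, x, classes):
    """Average the per-leaf class frequencies over the trees in the forest."""
    totals = {c: 0.0 for c in classes}
    used = 0
    for tree, counts in zip(trees, histograms):
        leaf = counts.get(tree(x))
        if not leaf:
            continue  # this leaf received no labeled observations
        n = sum(leaf.values())
        for c in classes:
            totals[c] += leaf[c] / n
        used += 1
    if used == 0:
        # Assumption: fall back to a uniform distribution when no tree can vote.
        return {c: 1.0 / len(classes) for c in classes}
    return {c: totals[c] / used for c in classes}

# Hypothetical stand-ins for trained trees: each maps a 2-D point to a leaf id.
trees = [lambda x: int(x[0] > 0.5), lambda x: int(x[1] > 0.5)]
labeled = [((0.1, 0.2), "a"), ((0.9, 0.8), "b"), ((0.2, 0.9), "a")]

histograms = fit_leaf_histograms(trees, labeled)
proba = predict_proba(trees, histograms, (0.8, 0.7), ["a", "b"])
```

Because the leaf histograms are stored per tree, the resulting function can be applied to observations that were not available during training, which is what distinguishes this inductive use from assigning labels only to the existing unlabeled observations.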
20 Claims
1. A machine learning process comprising:
accessing, using a processor, a plurality of labeled observations, each labeled observation having a label indicating one of a plurality of classes that the labeled observation is a member of;
accessing a plurality of unlabeled observations which are unlabeled in that, for each unlabeled observation, it is not known to which one of the plurality of classes the unlabeled observation belongs;
training a plurality of random decision trees to form a semi-supervised random decision forest using both the labeled observations and the unlabeled observations such that each random decision tree partitions the labeled and the unlabeled observations into clusters according to similarity of the observations and according to the labels.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14
15. A machine learning process comprising:
accessing, using a processor, a plurality of labeled observations, each labeled observation having a label indicating one of a plurality of classes that the labeled observation is a member of;
accessing a plurality of unlabeled observations which are unlabeled in that, for each unlabeled observation, it is not known to which one of the plurality of classes the unlabeled observation belongs;
training a plurality of random decision trees to form a semi-supervised random decision forest using both the labeled observations and the unlabeled observations and according to a training objective which optimizes an information gain comprising an unsupervised term and a supervised term.
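An information gain with an unsupervised term and a supervised term, as recited in claim 15, can be sketched for a single candidate split. The sketch below makes several assumptions that the claim does not state: features are one-dimensional, the unsupervised term is the entropy reduction of Gaussians fitted to all observations, the supervised term is the Shannon entropy reduction over the labeled observations only, and the two terms are combined with a hypothetical weight `alpha`. All function names are illustrative.

```python
import math
from collections import Counter

def shannon_entropy(labels):
    """Entropy (in nats) of the empirical class distribution."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())

def gaussian_entropy(values, min_var=1e-6):
    """Differential entropy of a 1-D Gaussian fitted to the values."""
    n = len(values)
    if n == 0:
        return 0.0
    mean = sum(values) / n
    # Floor the variance so that tight clusters do not give log(0).
    var = max(sum((v - mean) ** 2 for v in values) / n, min_var)
    return 0.5 * math.log(2 * math.pi * math.e * var)

def gain(parent, left, right, entropy):
    """Parent entropy minus the size-weighted entropy of the two children."""
    n = len(parent)
    if n == 0:
        return 0.0
    return entropy(parent) - sum(len(c) / n * entropy(c) for c in (left, right))

def features(points):
    return [x for x, _ in points]

def known_labels(points):
    return [y for _, y in points if y is not None]

def mixed_information_gain(points, threshold, alpha=1.0):
    """points: list of (x, label) pairs; label is None when unlabeled."""
    left = [p for p in points if p[0] < threshold]
    right = [p for p in points if p[0] >= threshold]
    unsupervised = gain(features(points), features(left), features(right),
                        gaussian_entropy)
    supervised = gain(known_labels(points), known_labels(left),
                      known_labels(right), shannon_entropy)
    return unsupervised + alpha * supervised

# Toy data: two labeled points and four unlabeled points in two groups.
points = [(0.0, "a"), (0.1, None), (0.2, None),
          (1.0, "b"), (1.1, None), (1.2, None)]
gain_between_groups = mixed_information_gain(points, 0.6)
gain_inside_group = mixed_information_gain(points, 0.15)
```

In this toy data both thresholds separate the two labeled points equally well, so the supervised term alone cannot choose between them. The unsupervised term favors the threshold that falls in the gap between the two groups of observations.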
16. A machine learning system comprising:
an input arranged to receive a plurality of labeled observations, each labeled observation having a label indicating one of a plurality of classes that the labeled observation is a member of;
the input also arranged to access a plurality of unlabeled observations which are unlabeled in that, for each unlabeled observation, it is not known to which one of the plurality of classes the unlabeled observation belongs;
a training engine arranged to train a plurality of random decision trees to form a semi-supervised random decision forest using both the labeled observations and the unlabeled observations such that each random decision tree partitions the labeled and the unlabeled observations into clusters according to similarity of the observations and according to the labels.
Dependent claims: 17, 18, 19, 20
Specification