UNSUPERVISED LEARNING USING GLOBAL FEATURES, INCLUDING FOR LOG-LINEAR MODEL WORD SEGMENTATION
First Claim
1. In a computing environment, a method performed on at least one processor, comprising, performing unsupervised learning on examples in training data, including processing the examples to extract global features, in which the global features are based on a plurality of the examples, and learning a model from the global features.
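As an illustration of what "global features ... based on a plurality of the examples" can mean, the following minimal sketch computes corpus-level quantities that no single example determines by itself. The function name `global_features` and the three feature names are assumptions chosen for illustration; the claim does not name specific features.

```python
from collections import Counter

def global_features(segmented_corpus):
    """Compute corpus-level features from many segmented words at once.

    segmented_corpus: list of segmentations, each a list of morpheme strings.
    The feature names below are illustrative assumptions, not claim language.
    """
    morpheme_counts = Counter(m for word in segmented_corpus for m in word)
    return {
        # Number of distinct morphemes across the whole corpus.
        "lexicon_size": len(morpheme_counts),
        # Total characters needed to list each distinct morpheme once.
        "lexicon_chars": sum(len(m) for m in morpheme_counts),
        # Total number of morpheme occurrences in the corpus.
        "morpheme_tokens": sum(morpheme_counts.values()),
    }

corpus = [["walk", "ed"], ["talk", "ed"], ["walk", "ing"]]
features = global_features(corpus)
```

Changing the segmentation of any one word can change these values for the whole corpus, which is what distinguishes them from features computed on a single example in isolation.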
Abstract
Described is a technology for performing unsupervised learning using global features extracted from unlabeled examples. The unsupervised learning process may be used to train a log-linear model, such as for use in morphological segmentation of words. For example, segmentations of the examples are sampled based upon the global features to produce a segmented corpus and log-linear model, which are then iteratively reprocessed to produce a final segmented corpus and a log-linear model.
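The sampling step mentioned above might be sketched as follows: one word's segmentation is resampled while all other words stay fixed, with each candidate scored by an unnormalized log-linear function of global features. This is a sketch under stated assumptions, not the patented procedure: the two features, the fixed weights, and all function names are hypothetical, and the weights are held constant here rather than learned.

```python
import math
import random
from itertools import combinations

def segmentations(word):
    """Yield every way to split `word` into contiguous, non-empty pieces."""
    n = len(word)
    for k in range(n):
        for cuts in combinations(range(1, n), k):
            bounds = (0, *cuts, n)
            yield [word[a:b] for a, b in zip(bounds, bounds[1:])]

def score(corpus, weights):
    """Unnormalized log-linear score: exp(sum of weight * feature value)."""
    lexicon = {m for seg in corpus for m in seg}
    feats = {
        "lexicon_chars": sum(len(m) for m in lexicon),       # assumed feature
        "morpheme_tokens": sum(len(seg) for seg in corpus),  # assumed feature
    }
    return math.exp(sum(weights[name] * value for name, value in feats.items()))

def sample_segmentation(index, corpus, words, weights, rng):
    """Resample one word's segmentation given all other words' segmentations."""
    candidates = list(segmentations(words[index]))
    scores = [
        score(corpus[:index] + [cand] + corpus[index + 1:], weights)
        for cand in candidates
    ]
    # Draw a candidate with probability proportional to its score.
    return rng.choices(candidates, weights=scores, k=1)[0]

words = ["walked", "talked", "walking"]
corpus = [[w] for w in words]                 # start with every word unsegmented
weights = {"lexicon_chars": -1.0, "morpheme_tokens": -0.5}  # assumed, not learned
rng = random.Random(0)
corpus[0] = sample_segmentation(0, corpus, words, weights, rng)
```

Enumerating every segmentation is exponential in word length, so this only suits short words; it is used here to keep the sketch small.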
20 Claims
1. In a computing environment, a method performed on at least one processor, comprising, performing unsupervised learning on examples in training data, including processing the examples to extract global features, in which the global features are based on a plurality of the examples, and learning a model from the global features.
10. In a computing environment, a method performed on at least one processor, comprising:
(a) processing unlabeled examples of words into an interim segmented corpus and an interim log-linear model;
(b) using the interim log-linear model to reprocess the interim segmented corpus into a revised segmented corpus and a revised log-linear model;
(c) iterating until a stop criterion is met by returning to step (b) with the revised segmented corpus being used as the interim corpus and the revised log-linear model being used as the interim model; and
(d) when the stop criterion is met, outputting the log-linear model for use in morphological segmentation.
Dependent claims: 11, 12, 13, 14, 15
16. One or more computer-readable media having computer-executable instructions, which when executed perform steps, comprising:
(a) processing unlabeled examples of words into global features, in which the global features are based on a plurality of the examples;
(b) sampling segmentations of the examples to produce an interim segmented corpus and an interim log-linear model that uses the global features;
(c) using the interim log-linear model to reprocess the interim segmented corpus into a revised segmented corpus and a revised log-linear model;
(d) iterating by returning to step (c) until a stop criterion is met, with the revised segmented corpus being used as the interim corpus and the revised log-linear model being used as the interim model.
Dependent claims: 17, 18, 19, 20
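The control flow recited in steps (a) through (d) of claim 10 can be sketched as a loop in which each pass's revised corpus and model become the next pass's interim corpus and model. The sketch below is an assumption-laden illustration: `train`, `initialize`, `reprocess`, `should_stop`, and `max_iterations` are hypothetical names, and the stand-in `reprocess` uses a fixed suffix list purely so the loop has something to do. It is not the sampling-based reprocessing the claims describe.

```python
from collections import Counter

def train(words, initialize, reprocess, should_stop, max_iterations=50):
    """Iterate interim -> revised until the stop criterion is met."""
    corpus, model = initialize(words)                              # step (a)
    for iteration in range(1, max_iterations + 1):
        revised_corpus, revised_model = reprocess(corpus, model)   # step (b)
        finished = should_stop(corpus, revised_corpus, iteration)
        corpus, model = revised_corpus, revised_model              # step (c)
        if finished:
            break
    return corpus, model                                           # step (d)

# --- Deliberately trivial stand-ins (hypothetical, for exercising the loop) ---
SUFFIXES = ("ed", "ing")  # fixed list; a real system would not be given this

def initialize(words):
    """Start with every word unsegmented; the 'model' is just morpheme counts."""
    return [[w] for w in words], Counter(words)

def reprocess(corpus, model):
    """Split a known suffix off the last piece of each word, then recount."""
    revised = []
    for seg in corpus:
        last = seg[-1]
        suffix = next(
            (s for s in SUFFIXES if last.endswith(s) and len(last) > len(s)),
            None,
        )
        revised.append(seg[:-1] + [last[:-len(suffix)], suffix] if suffix else seg)
    return revised, Counter(m for seg in revised for m in seg)

corpus, model = train(
    ["walked", "talked", "walking"],
    initialize,
    reprocess,
    should_stop=lambda old, new, iteration: old == new,  # stop when nothing changes
)
```

Here the stop criterion is "the corpus did not change in the last pass", with `max_iterations` as a safety cap; the claims leave the criterion open, so other choices (a fixed iteration count, a change in objective below a threshold) would fit the same loop.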
Specification