Creating taxonomies and training data in multiple languages
First Claim
1. A method of creating a taxonomy and categorization system in a target language based on a set of training documents in a source language comprising the steps of:
selecting a source set of training documents in said source language, said set representing one or more categories;
translating said source set of training documents into a target set of target language training documents; and
extracting a set of differentiating features for each category from said target set.
Abstract
The problem of creating taxonomies of objects, particularly objects that can be represented as text in various languages, and of categorizing such objects, is addressed by a method that takes training documents generated in a first (source) language, translates them into a target language, and then generates from the plurality of translated training documents one or more sets of features representing one or more categories in the target language. The method includes the steps of: forming a first list of items such that each item in the first list represents a particular training document having an association with one or more elements related to a particular category; developing a second list from the first list by deleting one or more candidate documents which satisfy at least one deletion criterion; translating the documents in the second list from the source language to the target language; and extracting the one or more sets of features from the translated second list using one or more feature selection criteria.
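The four steps in the abstract (first list, pruned second list, translation, feature extraction) can be sketched as a small pipeline. This is only an illustrative sketch under stated assumptions: the names `TrainingDoc`, `build_target_features`, and `toy_translate` are hypothetical, the word-for-word dictionary stands in for a real translation system, the "too short" rule is just one possible deletion criterion, and the frequency-difference score is just one possible feature selection criterion. The patent abstract does not specify any of these.

```python
from collections import Counter
from typing import Callable, Dict, List, NamedTuple


class TrainingDoc(NamedTuple):
    """Hypothetical item of the 'first list': a document tied to a category."""
    category: str
    text: str


def build_target_features(
    first_list: List[TrainingDoc],
    should_delete: Callable[[TrainingDoc], bool],
    translate: Callable[[str], str],
    top_k: int = 3,
) -> Dict[str, List[str]]:
    # Second list: drop candidates that satisfy the deletion criterion.
    second_list = [d for d in first_list if not should_delete(d)]

    # Translate the surviving documents from source to target language.
    translated = [TrainingDoc(d.category, translate(d.text)) for d in second_list]

    # Count target-language terms per category.
    per_category: Dict[str, Counter] = {}
    for doc in translated:
        per_category.setdefault(doc.category, Counter()).update(doc.text.lower().split())

    # Assumed selection criterion: term count inside the category minus
    # its count in all other categories; keep the top_k positive scorers.
    features: Dict[str, List[str]] = {}
    for category, counts in per_category.items():
        elsewhere = Counter()
        for other, other_counts in per_category.items():
            if other != category:
                elsewhere.update(other_counts)
        scores = {term: n - elsewhere[term] for term, n in counts.items()}
        ranked = sorted(scores, key=lambda term: (-scores[term], term))
        features[category] = [t for t in ranked if scores[t] > 0][:top_k]
    return features


# Toy stand-in for machine translation (English to German, word by word).
TOY_DICT = {"football": "fussball", "match": "spiel", "goal": "tor",
            "loan": "kredit", "rate": "zins"}


def toy_translate(text: str) -> str:
    return " ".join(TOY_DICT.get(word, word) for word in text.lower().split())


docs = [
    TrainingDoc("sports", "football match goal"),
    TrainingDoc("sports", "football"),           # removed by the toy criterion
    TrainingDoc("finance", "bank loan rate"),
]
features = build_target_features(
    docs,
    should_delete=lambda d: len(d.text.split()) < 2,  # assumed criterion
    translate=toy_translate,
)
print(features)
```

Pruning before translation, as the abstract orders the steps, means that documents rejected by the deletion criterion are never sent to the translator.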
60 Citations
20 Claims
1. A method of creating a taxonomy and categorization system in a target language based on a set of training documents in a source language comprising the steps of:
selecting a source set of training documents in said source language, said set representing one or more categories;
translating said source set of training documents into a target set of target language training documents; and
extracting a set of differentiating features for each category from said target set.
Dependent claims: 2, 3, 4, 5, 6
7. A computer system for creating a categorization system in a target language based on a set of training data in a source language, comprising a processing unit for processing data and a storing unit for storing data, in which said processing unit contains instructions for executing a method comprising:
selecting a source set of training documents in said source language;
translating said source set of training documents into a target set of target language training documents; and
extracting a set of differentiating features, corresponding to a set of categories, from said target set.
Dependent claims: 8, 9, 10, 11, 12, 13
14. An article of manufacture in computer readable form comprising means for performing a method for operating a computer system having a program, said method comprising the steps of:
selecting a source set of training documents in said source language;
translating said source set of training documents into a target set of target language training documents; and
extracting a set of differentiating features, corresponding to a set of categories, from said target set.
Dependent claims: 15, 16, 17, 18, 19, 20
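To illustrate how a set of differentiating features per category might serve as a categorization system in the target language, the following sketch assigns a new target-language document to the category whose features it overlaps most. Everything here is an assumption made for illustration: the function name `categorize`, the example feature sets, and the simple overlap count are hypothetical and are not taken from the claims, which do not say how the extracted features are applied.

```python
from typing import Dict, List, Optional


def categorize(text: str, features_by_category: Dict[str, List[str]]) -> Optional[str]:
    """Return the category whose feature set shares the most terms with text.

    Returns None when no category shares any term with the document.
    Ties are broken alphabetically by category name.
    """
    tokens = set(text.lower().split())
    scores = {
        category: len(tokens & set(feature_terms))
        for category, feature_terms in features_by_category.items()
    }
    if not scores:
        return None
    best = max(sorted(scores), key=lambda category: scores[category])
    return best if scores[best] > 0 else None


# Hypothetical target-language (German) feature sets, one per category.
example_features = {
    "sports": ["fussball", "spiel", "tor"],
    "finance": ["bank", "kredit", "zins"],
}

print(categorize("Das Spiel endete ohne Tor", example_features))
print(categorize("Wetter heute", example_features))
```

A production system would more plausibly weight features or train a statistical classifier on them; the overlap count is used only because it keeps the example short.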
Specification