Learning system for classification of terminology
Abstract
A learning system learns terms in the context of a set of documents. During an accumulation phase, the learning system accumulates contextual data for a term in the form of a categorization schema. The categorization schema, which is based on a classification hierarchy, classifies the term in categories such that the classifications are based on uses of the term in the set of documents. During a computational phase, the learning system analyzes the categorization schema and, if sufficient contextual data has been accumulated, selects a single category in the classification system to classify the term. A content processing system, which understands the thematic content of documents, is used in conjunction with the learning system.
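The two phases described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the names `accumulate`, `learn`, and `MIN_OBSERVATIONS` are hypothetical, the categorization schema is reduced to a per-term tally of categories, and a simple majority vote stands in for the hierarchy-based analysis.

```python
from collections import Counter

# Hypothetical threshold: how many classified documents must mention a term
# before the data counts as "sufficient contextual data".
MIN_OBSERVATIONS = 3

def accumulate(schema, term, category):
    """Accumulation phase: record one more use of `term` under `category`."""
    schema.setdefault(term, Counter())[category] += 1

def learn(schema, term):
    """Computational phase: return a single category for `term`,
    or None when too little contextual data has been accumulated."""
    counts = schema.get(term)
    if counts is None or sum(counts.values()) < MIN_OBSERVATIONS:
        return None
    # Assumption: a majority vote stands in for the hierarchy-based analysis.
    return counts.most_common(1)[0][0]

# Three documents mention "bank": two classified under finance, one under geology.
schema = {}
for category in ["finance", "finance", "geology"]:
    accumulate(schema, "bank", category)

# Only one document mentions "delta", which is below the threshold.
accumulate(schema, "delta", "geology")
```

Here `learn(schema, "bank")` selects `"finance"`, while `learn(schema, "delta")` returns `None` because only one observation has been accumulated.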
28 Claims
1. A method for automated learning of terminology from an input set of documents, the method comprising the steps of:
- storing a classification system comprising a plurality of categories of terminology arranged to reflect associations among related categories;
- processing the input set of documents to classify each of the documents into a category of the classification system, wherein the documents classified contain a term for learning;
- generating contextual data for the term for learning by mapping the categories selected to the classification system so as to reflect associations among the categories selected;
- analyzing the contextual data, including analyzing all of the categories selected in the classification system for the term, prior to selecting a single category for the term; and
- selecting a single category, based on the associations of the categories in the classification system, for the term to learn the term as the single category.
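As an illustration of the mapping step in claim 1, the sketch below projects the per-document categories onto a small hierarchy so that related categories reinforce one another. The `PARENT` table, the function names, and the scoring rule (sum of weights along a category's ancestor chain) are all assumptions made for the example; the claim does not prescribe them.

```python
from collections import Counter

# Hypothetical classification system: child category -> parent category.
PARENT = {
    "banking": "finance",
    "investment": "finance",
    "finance": "business",
    "rivers": "geography",
}

def path_to_root(category):
    """Return the category followed by its ancestors, nearest first."""
    path = [category]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path

def contextual_data(selected):
    """Map the selected categories onto the hierarchy: each node's weight is
    the number of documents classified at that node or anywhere beneath it."""
    weights = Counter()
    for category in selected:
        weights.update(path_to_root(category))
    return weights

def select_category(selected):
    """Weigh every selected category before choosing one. Assumed rule:
    prefer the category whose ancestor chain carries the most total weight."""
    weights = contextual_data(selected)
    return max(sorted(set(selected)),
               key=lambda c: sum(weights[n] for n in path_to_root(c)))

# One category per document that contains the term being learned.
selected = ["banking", "investment", "banking", "rivers"]
```

With this input, `contextual_data(selected)` gives `finance` a weight of 3 even though no document was classified there directly, and `select_category(selected)` picks `"banking"`.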
3. A method for automated learning of at least one term from an input set of documents, the method comprising the steps of:
- storing a classification system comprising a plurality of categories of terminology arranged in a hierarchy of categories so as to reflect associations among related categories;
- processing the input set of documents to generate contextual data by classifying the term in a plurality of categories of the classification system; and
- analyzing the contextual data by performing hierarchical clustering analysis on the selected categories of the hierarchy of categories in the classification system to identify a cluster of categories and to select a single category in the cluster from the plurality of selected categories to learn the term as the single category selected.
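The clustering step of claim 3 could be sketched as follows. This is one possible reading, not the method of the specification: it uses single-linkage agglomerative clustering with the number of edges between two categories as the distance, and the `PARENT` table, the `max_distance` cutoff, and the rule of picking the most frequent category in the heaviest cluster are all hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical hierarchy of categories: child -> parent.
PARENT = {
    "banking": "finance",
    "investment": "finance",
    "finance": "business",
    "rivers": "geography",
}

def ancestors(category):
    """Return the category followed by its ancestors, nearest first."""
    path = [category]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path

def tree_distance(a, b):
    """Edges between two categories; infinite if they share no ancestor."""
    pa, pb = ancestors(a), ancestors(b)
    common = set(pa) & set(pb)
    if not common:
        return float("inf")
    return min(pa.index(c) + pb.index(c) for c in common)

def cluster(categories, max_distance=2):
    """Single-linkage agglomerative clustering: merge the two closest
    clusters until no pair of clusters is within `max_distance`."""
    clusters = [{c} for c in sorted(set(categories))]
    while True:
        best = None
        for i, j in combinations(range(len(clusters)), 2):
            d = min(tree_distance(a, b)
                    for a in clusters[i] for b in clusters[j])
            if d <= max_distance and (best is None or d < best[0]):
                best = (d, i, j)
        if best is None:
            return clusters
        _, i, j = best          # j > i, so deleting j leaves index i valid
        clusters[i] |= clusters[j]
        del clusters[j]

def learn_term(selected):
    """Pick the cluster covering the most documents, then the most
    frequently selected category inside that cluster."""
    counts = Counter(selected)
    heaviest = max(cluster(selected),
                   key=lambda cl: sum(counts[c] for c in cl))
    return max(sorted(heaviest), key=lambda c: counts[c])

# Categories chosen for the documents that contain the term.
selected = ["banking", "investment", "banking", "rivers"]
```

In this example `banking` and `investment` merge into one cluster because they are siblings under `finance`, `rivers` stays isolated, and `learn_term(selected)` returns `"banking"`.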
15. A computer readable medium having a set of instructions stored therein, which when executed by a computer, causes the computer to perform the steps of:
- storing a classification system comprising a plurality of categories of terminology arranged in a hierarchy of categories so as to reflect associations among related categories;
- processing the input set of documents to generate contextual data by classifying the term in a plurality of categories of the classification system; and
- analyzing the contextual data by performing hierarchical clustering analysis on the selected categories of the hierarchy of categories in the classification system to identify a cluster of categories and to select a single category in the cluster from the plurality of selected categories to learn the term as the single category selected.
27. A text processing system for learning at least one term in an input set of documents comprising:
- a classification system comprising a plurality of categories of terminology arranged to reflect associations among the categories;
- a content processing system for processing the input set of documents to generate a thematic profile for each document that identifies a category for classification of the documents that include the term for learning; and
- a learning system coupled to receive the thematic profile and categories of the classification system for generating a categorization schema by classifying the term into a plurality of different categories in the classification system, and for analyzing the categorization schema to select a single category from the plurality of different categories in the classification system to learn the term as the single category selected.
28. A computer readable medium having a set of instructions stored therein, which when executed by a computer, causes the computer to perform the steps of:
- storing a classification system comprising a plurality of categories of terminology arranged to reflect associations among the categories;
- processing the input set of documents to select a plurality of categories for a plurality of documents, one category for each document, to classify the documents, wherein the documents classified include a term for learning;
- generating contextual data for the term for learning by mapping the categories selected to the classification system so as to reflect associations among said category selected;
- analyzing the contextual data, including analyzing all of the categories selected in the classification system for the term, prior to selecting a single category for the term; and
- selecting a single category from the plurality of categories in the contextual data for the term to learn the term as the single category.
Specification