SYSTEM AND METHOD FOR TEXT CATEGORIZATION BASED ON ONTOLOGIES
First Claim
1. A system for text categorization based on ontologies, the system comprising:
a plurality of data collector software modules stored and operating on a plurality of network-attached computers;
a categorizer software module stored and operating on a network-attached server computer; and
a database server comprising an indexed database of documents and their categorizations, and further comprising a plurality of ontologies, each ontology comprising a plurality of hierarchical taxonomies and each hierarchical taxonomy comprising a plurality of taxons;
wherein the data collector software modules receive a document to be classified and submit it to the categorizer software module; and
further wherein the categorizer performs the following steps to categorize each received document:
splitting the document into sentences;
selecting words or phrases that are present in one or more of the plurality of ontologies stored in the database server;
selecting a plurality of subtrees from the plurality of ontologies based on the presence of one or more of a set of specific subcategories in the document;
determining a weight for each subcategory within the set of specific subcategories;
creating a plurality of modified subtrees by pruning subcategories having a weight below a threshold from each of the selected plurality of subtrees; and
for each of the plurality of modified subtrees, computing a conditionality coefficient.
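The categorizer steps recited above can be illustrated with a minimal Python sketch. Everything concrete in it is an assumption made for illustration and is not taken from the claim: the toy `ONTOLOGY` dictionary, the regex sentence splitter, the weight (a subcategory's share of all matched terms), the default threshold, and in particular the coefficient, which the claim names but does not define. Here it is a placeholder equal to the fraction of a subtree's weight that survives pruning.

```python
import re
from collections import Counter

# Hypothetical toy ontology: subtree root -> subcategory -> signalling terms.
ONTOLOGY = {
    "sports": {
        "football": ["goal", "striker"],
        "tennis": ["serve", "racket"],
    },
    "finance": {
        "banking": ["loan", "interest rate"],
        "markets": ["stock", "bond"],
    },
}


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def count_terms(sentences, terms):
    """Count whole-word, case-insensitive occurrences of each ontology term."""
    counts = Counter()
    for sentence in sentences:
        for term in terms:
            pattern = r"\b" + re.escape(term) + r"\b"
            counts[term] += len(re.findall(pattern, sentence, re.IGNORECASE))
    return counts


def categorize(text, ontology=ONTOLOGY, threshold=0.2):
    sentences = split_sentences(text)
    all_terms = {t for subs in ontology.values() for ts in subs.values() for t in ts}
    counts = count_terms(sentences, all_terms)
    total = sum(counts.values())
    if total == 0:
        return {}  # no ontology term occurs in the document

    results = {}
    for root, subcategories in ontology.items():
        # Assumed weight: share of all matched terms falling in the subcategory.
        weights = {
            name: sum(counts[t] for t in terms) / total
            for name, terms in subcategories.items()
        }
        subtree_weight = sum(weights.values())
        if subtree_weight == 0:
            continue  # subtree not selected: none of its subcategories is present
        # Modified subtree: drop subcategories whose weight is below the threshold.
        pruned = {name: w for name, w in weights.items() if w >= threshold}
        # Placeholder coefficient (assumption): weight retained after pruning.
        coefficient = sum(pruned.values()) / subtree_weight
        results[root] = {"subcategories": pruned, "coefficient": coefficient}
    return results
```

Calling `categorize` on a short text that mentions "striker", "goal" and "interest rate" selects both the sports and finance subtrees and discards the tennis and markets subcategories, since neither has any matching term.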
Abstract
A system for text categorization based on ontologies comprising data collector software modules; a categorizer software module; and a database comprising an indexed database of documents and their categorizations, and further comprising a plurality of ontologies, each ontology comprising a plurality of hierarchical taxonomies and each hierarchical taxonomy comprising a plurality of taxons. The data collector software modules receive a document to be classified and submit it to the categorizer software module; and the categorizer performs the following steps to categorize each document: splitting the document into sentences; selecting words or phrases that are present in ontologies stored in the database server; selecting a plurality of subtrees from the ontologies based on the presence of specific subcategories in the document; determining a weight for each subcategory; pruning subcategories having a weight below a threshold; and for each of the plurality of modified subtrees, computing a conditionality coefficient.
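The nesting described in the abstract (ontologies containing hierarchical taxonomies, which in turn contain taxons) could be modeled along the following lines. This is only a sketch: the class names, the `terms` field, and the `walk` and `vocabulary` helpers are hypothetical and are not drawn from the patent text.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Set


@dataclass
class Taxon:
    """One node of a hierarchical taxonomy (hypothetical structure)."""
    name: str
    terms: List[str] = field(default_factory=list)        # words or phrases signalling this taxon
    children: List["Taxon"] = field(default_factory=list)

    def walk(self) -> Iterator["Taxon"]:
        """Yield this taxon and all of its descendants, depth first."""
        yield self
        for child in self.children:
            yield from child.walk()


@dataclass
class Taxonomy:
    name: str
    root: Taxon


@dataclass
class Ontology:
    name: str
    taxonomies: List[Taxonomy] = field(default_factory=list)

    def vocabulary(self) -> Set[str]:
        """All lower-cased terms that any taxon in this ontology can match."""
        return {
            term.lower()
            for taxonomy in self.taxonomies
            for taxon in taxonomy.root.walk()
            for term in taxon.terms
        }


# Usage: a one-taxonomy ontology with two leaf taxons.
finance = Taxonomy(
    "finance",
    Taxon("finance", children=[
        Taxon("banking", terms=["loan", "Interest Rate"]),
        Taxon("markets", terms=["stock", "bond"]),
    ]),
)
business = Ontology("business", [finance])
```

Keeping each taxon's terms on the node itself makes the word-or-phrase selection step a simple lookup against `vocabulary()`, while `walk()` gives a way to enumerate a subtree for later pruning.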
2 Claims
1. A system for text categorization based on ontologies, the system comprising:
a plurality of data collector software modules stored and operating on a plurality of network-attached computers;
a categorizer software module stored and operating on a network-attached server computer; and
a database server comprising an indexed database of documents and their categorizations, and further comprising a plurality of ontologies, each ontology comprising a plurality of hierarchical taxonomies and each hierarchical taxonomy comprising a plurality of taxons;
wherein the data collector software modules receive a document to be classified and submit it to the categorizer software module; and
further wherein the categorizer performs the following steps to categorize each received document:
splitting the document into sentences;
selecting words or phrases that are present in one or more of the plurality of ontologies stored in the database server;
selecting a plurality of subtrees from the plurality of ontologies based on the presence of one or more of a set of specific subcategories in the document;
determining a weight for each subcategory within the set of specific subcategories;
creating a plurality of modified subtrees by pruning subcategories having a weight below a threshold from each of the selected plurality of subtrees; and
for each of the plurality of modified subtrees, computing a conditionality coefficient.
2. A method for text categorization based on ontologies, the method comprising the steps of:
(a) receiving, via a plurality of data collector software modules stored and operating on a plurality of network-attached computers, a document to be classified;
(b) submitting the received document to a categorizer software module stored and operating on a network-attached server computer;
(c) performing the following using the categorizer software module:
(c1) splitting the document into sentences;
(c2) selecting words or phrases that are present in one or more of the plurality of ontologies stored in the database server;
(c3) selecting a plurality of subtrees from the plurality of ontologies based on the presence of one or more of a set of specific subcategories in the document;
(c4) determining a weight for each subcategory within the set of specific subcategories;
(c5) creating a plurality of modified subtrees by pruning subcategories having a weight below a threshold from each of the selected plurality of subtrees; and
(c6) for each of the plurality of modified subtrees, computing a conditionality coefficient;
(d) storing a resulting document categorization in a database server comprising an indexed database of documents and their categorizations, and further comprising a plurality of ontologies, each ontology comprising a plurality of hierarchical taxonomies and each hierarchical taxonomy comprising a plurality of taxons.
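The overall flow of steps (a), (b) and (d) can be sketched in a single process as follows. The sketch assumes in-memory stand-ins for the networked components: `DataCollector`, `CategorizationStore`, and `keyword_categorizer` are hypothetical names, and the keyword rule is only a placeholder for steps (c1) to (c6), not the claimed categorization procedure.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Set


class CategorizationStore:
    """In-memory stand-in for the indexed database of documents and categorizations."""

    def __init__(self) -> None:
        self.documents: Dict[str, str] = {}
        self.categories: Dict[str, List[str]] = {}
        self.index: Dict[str, Set[str]] = defaultdict(set)  # category -> document ids

    def save(self, doc_id: str, text: str, categories: List[str]) -> None:
        # Drop stale index entries if the document was categorized before.
        for old in self.categories.get(doc_id, []):
            self.index[old].discard(doc_id)
        self.documents[doc_id] = text
        self.categories[doc_id] = list(categories)
        for category in categories:
            self.index[category].add(doc_id)

    def find(self, category: str) -> List[str]:
        """Return the ids of all documents filed under a category."""
        return sorted(self.index.get(category, set()))


class DataCollector:
    """Receives a document (step a) and submits it to the categorizer (step b)."""

    def __init__(self, categorize: Callable[[str], List[str]], store: CategorizationStore) -> None:
        self.categorize = categorize
        self.store = store

    def submit(self, doc_id: str, text: str) -> List[str]:
        categories = self.categorize(text)           # step (c), delegated
        self.store.save(doc_id, text, categories)    # step (d)
        return categories


def keyword_categorizer(text: str) -> List[str]:
    """Placeholder for steps (c1) to (c6): simple substring rules, for illustration only."""
    rules = {"finance": ("loan", "stock"), "sports": ("goal", "match")}
    lowered = text.lower()
    return [cat for cat, keywords in rules.items() if any(k in lowered for k in keywords)]


# Usage: two collectors sharing one categorizer and one store.
store = CategorizationStore()
collector_a = DataCollector(keyword_categorizer, store)
collector_b = DataCollector(keyword_categorizer, store)
collector_a.submit("doc-1", "The bank approved the loan.")
collector_b.submit("doc-2", "A late goal won the match.")
```

Passing the categorizer in as a callable keeps the collectors independent of how categorization is done, so a remote call to a server-side categorizer could replace `keyword_categorizer` without changing the collector code.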
Specification