Method and system for classifying text
First Claim
1. A method comprising:
creating a data structure using a data structure generation engine by identifying a plurality of words and mapping each word to one or more categories and storing the data structure in one or more databases;
indexing the data structure using an index generation engine;
identifying an item of content; and
classifying the item of content using a classification engine based on the data structure, the classifying comprising:
identifying all one-or-more-word combinations in the item of content;
for each word of at least a pre-determined number of characters in length in each of the word combinations, identifying each of the categories to which it is mapped; and
determining a weight for each of the words based on an inverse proportion to the number of categories to which it is mapped.
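One possible reading of the final weighting step can be written as a formula. The symbols $w$, $t$, and $C(t)$ are notation introduced here for illustration and do not appear in the claim; the claim only requires that the weight be based on an inverse proportion, so the proportionality constant of 1 is an assumption.

$$
w(t) = \frac{1}{\lvert C(t) \rvert}
$$

Here $C(t)$ is the set of categories to which word $t$ is mapped. Under this reading, a word mapped to a single category has weight $1$, while a word mapped to four categories has weight $1/4$, so words that appear in many categories contribute less to the classification.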
Abstract
A content classification system, method and computer product are presented. In connection with the invention, a data structure is created by identifying a plurality of words and mapping each word to one or more categories. The data structure is indexed. An item of content is identified and classified based on the data structure. The classification includes identifying all one-or-more-word combinations in the item of content; for each word of at least a pre-determined number of characters in length in each of the word combinations, identifying each of the categories to which it is mapped; and determining a weight for each of the words based on an inverse proportion to the number of categories to which it is mapped.
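The steps in the abstract can be illustrated with a minimal Python sketch. Everything named here is hypothetical: the sample mapping `WORD_CATEGORIES`, the threshold `MIN_WORD_LENGTH`, and the functions `word_combinations`, `word_weight`, and `classify` are not taken from the patent. The sketch also makes three assumptions the abstract does not state: that a "word combination" is a contiguous run of words, that the weight is exactly 1 divided by the number of mapped categories, and that weights are summed per category to produce a score.

```python
from collections import defaultdict

# Hypothetical word-to-category mapping (the "data structure" in the abstract).
WORD_CATEGORIES = {
    "python": {"programming", "animals"},
    "compiler": {"programming"},
    "snake": {"animals"},
    "the": {"programming", "animals", "sports", "finance"},
}

# Assumed value for the pre-determined minimum word length.
MIN_WORD_LENGTH = 4


def word_combinations(text):
    """Return every contiguous run of one or more words (an assumed reading)."""
    words = text.lower().split()
    n = len(words)
    return [
        tuple(words[i:i + k])
        for k in range(1, n + 1)
        for i in range(n - k + 1)
    ]


def word_weight(word, mapping):
    """Weight inversely proportional to the number of mapped categories."""
    categories = mapping.get(word, set())
    return 1.0 / len(categories) if categories else 0.0


def classify(text, mapping=WORD_CATEGORIES, min_len=MIN_WORD_LENGTH):
    """Sum word weights per category (the summation is an assumption)."""
    scores = defaultdict(float)
    for combo in word_combinations(text):
        for word in combo:
            if len(word) < min_len:
                continue  # skip words shorter than the threshold
            weight = word_weight(word, mapping)
            for category in mapping.get(word, set()):
                scores[category] += weight
    return dict(scores)


if __name__ == "__main__":
    print(classify("python compiler"))
```

Because a word is visited once for every combination that contains it, words in longer items of content are counted several times in this sketch; an implementation could equally weight each distinct word once.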
39 Claims
1. A method comprising:
creating a data structure using a data structure generation engine by identifying a plurality of words and mapping each word to one or more categories and storing the data structure in one or more databases;
indexing the data structure using an index generation engine;
identifying an item of content; and
classifying the item of content using a classification engine based on the data structure, the classifying comprising:
identifying all one-or-more-word combinations in the item of content;
for each word of at least a pre-determined number of characters in length in each of the word combinations, identifying each of the categories to which it is mapped; and
determining a weight for each of the words based on an inverse proportion to the number of categories to which it is mapped.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13
14. A system comprising:
a data structure generation engine configured to create a data structure by identifying a plurality of words and mapping each word to one or more categories;
one or more databases that store the data structure;
an index generation engine configured to index the data structure;
a classification engine configured to identify an item of content and classify the item of content based on the data structure by identifying all one-or-more-word combinations in the item of content;
for each word of at least a pre-determined number of characters in length in each of the word combinations, identifying each of the categories to which it is mapped; and
determining a weight for each of the words based on an inverse proportion to the number of categories to which it is mapped.
Dependent claims: 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26
27. A computer program product including a computer readable medium having stored thereon computer executable instructions that, when executed by a computer, direct the computer to perform a method comprising the steps of:
creating a data structure by identifying a plurality of words and mapping each word to one or more categories;
indexing the data structure;
identifying an item of content; and
classifying the item of content based on the data structure, the classifying comprising:
identifying all one-or-more-word combinations in the item of content;
for each word of at least a pre-determined number of characters in length in each of the word combinations, identifying each of the categories to which it is mapped; and
determining a weight for each of the words based on an inverse proportion to the number of categories to which it is mapped.
Dependent claims: 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39
Specification