Method and system for classifying text
First Claim
1. A method comprising:
using a programmed computer, creating a data structure by identifying a plurality of words and mapping each word to one or more categories;
storing the data structure in one or more databases;
indexing the data structure;
identifying an item of electronic content;
classifying the item of electronic content using the data structure, the classifying comprising:
identifying all single words and word combinations comprising two or more words in the item of electronic content that include one or more words;
for each of the single words of at least a pre-determined number of characters in length and each of the words in the word combinations words of at least a pre-determined number of characters in length in each of the word combinations, identifying each of the categories to which the word is mapped;
assigning a weight for each of the words based on an inverse proportion to the number of categories to which the word is mapped, and assigning a weight based on a direct proportion to a number of words in the word combination using a multiplier; and
adding a result of classifying the electronic content to the data structure.
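The steps of the claim can be sketched in a few lines of Python. This is only an illustrative reading, not the patented implementation: the constants `MIN_CHARS`, `MULTIPLIER`, and `MAX_WORDS`, the function names, the dictionary layout, the treatment of a "word combination" as a run of consecutive words, and the summing of weights into per-category scores are all assumptions made for the example.

```python
import re
from collections import defaultdict

MIN_CHARS = 4     # assumed value for the "pre-determined number of characters"
MULTIPLIER = 1.0  # assumed multiplier applied per word in a combination
MAX_WORDS = 2     # assumed cap on combination length, to keep the sketch small


def build_structure(word_to_categories):
    """Create the word -> categories map and index it by category."""
    index = defaultdict(set)
    for word, categories in word_to_categories.items():
        for category in categories:
            index[category].add(word)
    return {"words": word_to_categories, "index": dict(index), "results": {}}


def word_combinations(text, max_words=MAX_WORDS):
    """Yield every run of 1..max_words consecutive words in the text."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    for n in range(1, max_words + 1):
        for start in range(len(tokens) - n + 1):
            yield tokens[start:start + n]


def classify(item_id, text, structure):
    """Score categories for one item and store the result in the structure."""
    scores = defaultdict(float)
    for combo in word_combinations(text):
        # Direct proportion to the number of words in the combination.
        combo_factor = MULTIPLIER * len(combo)
        for word in combo:
            if len(word) < MIN_CHARS:
                continue  # skip words shorter than the minimum length
            categories = structure["words"].get(word, ())
            for category in categories:
                # Inverse proportion: words mapped to many categories count less.
                scores[category] += combo_factor / len(categories)
    # Add the classification result back to the data structure.
    structure["results"][item_id] = dict(scores)
    return dict(scores)


structure = build_structure({
    "python": {"programming", "animals"},
    "compiler": {"programming"},
    "cat": {"animals"},
})
scores = classify("doc-1", "Python compiler", structure)
```

In this sketch "python" is shared by two categories, so each occurrence contributes half as much per category as "compiler", and the two-word combination doubles the contribution of both words. The word "cat" is mapped but would be ignored because it is shorter than the assumed minimum length.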
Abstract
A content classification system, method and computer product is presented. In connection with the invention, a data structure is created by identifying a plurality of words and mapping each word to one or more categories. The data structure is indexed. An item of content is identified and classified based on the data structure. The classification includes identifying all one-or-more-word combinations in the item of content; for each word of at least a pre-determined number of characters in length in each of the word combinations, identifying each of the categories to which it is mapped; and determining a weight for each of the words based on an inverse proportion to the number of categories to which it is mapped.
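As a worked illustration of the inverse-proportion weighting, one simple reading (an assumption, since the abstract does not fix a formula) sets the weight of a word $t$ to the reciprocal of the number of categories $C(t)$ to which it is mapped:

```latex
w(t) = \frac{1}{|C(t)|}, \qquad
|C(t)| = 1 \;\Rightarrow\; w(t) = 1, \qquad
|C(t)| = 4 \;\Rightarrow\; w(t) = 0.25
```

Under this reading, a word tied to a single category contributes four times as much evidence as a word shared among four categories.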
33 Claims
1. A method comprising:
using a programmed computer, creating a data structure by identifying a plurality of words and mapping each word to one or more categories;
storing the data structure in one or more databases;
indexing the data structure;
identifying an item of electronic content;
classifying the item of electronic content using the data structure, the classifying comprising:
identifying all single words and word combinations comprising two or more words in the item of electronic content that include one or more words;
for each of the single words of at least a pre-determined number of characters in length and each of the words in the word combinations words of at least a pre-determined number of characters in length in each of the word combinations, identifying each of the categories to which the word is mapped;
assigning a weight for each of the words based on an inverse proportion to the number of categories to which the word is mapped, and assigning a weight based on a direct proportion to a number of words in the word combination using a multiplier; and
adding a result of classifying the electronic content to the data structure.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11
12. A system comprising:
one or more processors that are programmed to:
create a data structure by identifying a plurality of words and mapping each word to one or more categories;
store the data structure in one or more databases;
index the data structure;
identify an item of electronic content;
classify the item of electronic content using the data structure by identifying all single words and word combinations comprising two or more words in the item of electronic content;
for each of the single words of at least a pre-determined number of characters in length and each of the words in the word combinations of at least a pre-determined number of characters in length, identifying each of the categories to which the word is mapped;
assigning a weight for each of the words based on an inverse proportion to the number of categories to which the word is mapped;
assigning a weight based on a direct proportion to a number of words in the word combination using a multiplier; and
adding a result of classifying the electronic content to the data structure.
Dependent claims: 13, 14, 15, 16, 17, 18, 19, 20, 21, 22
23. A non-transitory computer readable medium having stored thereon computer executable instructions that, when executed by a computer, direct the computer to perform a method comprising the steps of:
creating a data structure by identifying a plurality of words and mapping each word to one or more categories;
storing the data structure in one or more databases;
indexing the data structure;
identifying an item of electronic content;
classifying the item of electronic content using the data structure, the classifying comprising:
identifying all single words and word combinations comprising two or more words in the item of electronic content;
for each of the single words of at least a pre-determined number of characters in length and each of the words in the word combinations of at least a pre-determined number of characters in length, identifying each of the categories to which the word is mapped;
assigning a weight for each of the words based on an inverse proportion to the number of categories to which the word is mapped; and
assigning a weight based on a direct proportion to a number of words in the word combination using a multiplier; and
adding a result of classifying the electronic content to the data structure.
Dependent claims: 24, 25, 26, 27, 28, 29, 30, 31, 32, 33
Specification