System and method for automatically classifying text
First Claim
1. In a system comprising perspectives and categories, each perspective including at least one category representative of that perspective, a computerized method for classifying at least one item across multiple perspectives, said computerized method comprising:
associating category features with each category, wherein each of said category features represents one of a plurality of tokens;
producing a category vector for each category, wherein each category vector includes a weight corresponding to each category feature, said weight indicative of a degree of association between said category feature and said category;
associating item features with each item, wherein each of said item features represents one of a plurality of tokens found in said item;
producing a feature vector for each item, wherein each feature vector includes said item features with a count corresponding to each item feature, said count indicative of the number of times said item feature appears in said item;
multiplying said category vector by said item vector to produce a plurality of category scores for each item; and
for each perspective, across multiple perspectives, classifying an item into a category provided said category score exceeds a predetermined threshold.
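The multiplying and classifying steps can be read as a dot product followed by a threshold test. The notation below is assumed for illustration and does not appear in the claim: $w_{c,f}$ is the weight of feature $f$ in category $c$, $n_{d,f}$ is the count of feature $f$ in item $d$, and $\theta$ is the predetermined threshold. The sketch also assumes that "said item vector" refers to the item's feature vector of counts.

```latex
s_c(d) = \sum_{f} w_{c,f}\, n_{d,f},
\qquad
d \text{ is classified into } c \text{ if } s_c(d) > \theta .
```

As a hypothetical worked case, suppose a category weights two features at 0.5 and 0.2, and an item contains them 3 times and 1 time. The score is $0.5 \cdot 3 + 0.2 \cdot 1 = 1.7$, so the item is classified into that category for any threshold below 1.7. The test is repeated for the categories of every perspective, so one item can receive a category from each perspective.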
Abstract
A method is provided for automatically classifying text into categories. In operation, a plurality of tokens or features are manually or automatically associated with each category. A weight is then coupled to each feature, wherein the weight indicates a degree of association between the feature and the category. Next, a document is parsed into a plurality of unique tokens with associated counts, wherein the counts are indicative of the number of times the feature appears in the document. A category score representative of a sum of products of each feature count in the document times the corresponding feature weight in the category for each document is then computed. Next, the category scores are sorted by perspective, and a document is classified into a particular category, provided the category score exceeds a predetermined threshold.
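The steps in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the patented implementation: the perspectives, categories, weights, tokenizer, and threshold are all hypothetical, and the function names `tokenize`, `category_score`, and `classify` are invented for this example.

```python
from collections import Counter

# Hypothetical data: perspective -> category -> {token: weight}.
# The weight expresses how strongly a token is associated with a category.
CATEGORY_WEIGHTS = {
    "topic": {
        "sports": {"goal": 0.8, "team": 0.5, "match": 0.6},
        "finance": {"stock": 0.9, "market": 0.7, "team": 0.1},
    },
    "tone": {
        "positive": {"win": 0.7, "great": 0.9},
        "negative": {"loss": 0.8, "poor": 0.6},
    },
}

PUNCTUATION = ".,;:!?\"'()"


def tokenize(text):
    """Assumed tokenizer: lowercase, split on whitespace, strip punctuation."""
    tokens = (raw.strip(PUNCTUATION) for raw in text.lower().split())
    return [token for token in tokens if token]


def category_score(counts, weights):
    """Sum of (token count in the document) x (token weight in the category)."""
    return sum(count * weights.get(token, 0.0) for token, count in counts.items())


def classify(text, category_weights, threshold):
    """Return, per perspective, the categories whose score exceeds the threshold.

    Categories are listed from highest to lowest score within each perspective.
    """
    counts = Counter(tokenize(text))  # unique tokens with their counts
    result = {}
    for perspective, categories in category_weights.items():
        scores = {name: category_score(counts, w) for name, w in categories.items()}
        ranked = sorted(scores.items(), key=lambda pair: pair[1], reverse=True)
        result[perspective] = [(name, s) for name, s in ranked if s > threshold]
    return result


doc = "The team scored a late goal to win the match. Great goal!"
labels = classify(doc, CATEGORY_WEIGHTS, threshold=1.0)
for perspective, matches in labels.items():
    print(perspective, [name for name, _ in matches])
```

With these made-up weights, the sample sentence is placed in "sports" under the "topic" perspective and in "positive" under the "tone" perspective, which shows one document being classified across multiple perspectives at once. Storing weights as sparse dictionaries keeps the score a dot product over only the tokens that actually occur in the document.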
11 Claims
Specification