System and method for automatically classifying text
Abstract
A method is provided for automatically classifying text into categories. In operation, a plurality of tokens, or features, is manually or automatically associated with each category. A weight is then assigned to each feature, indicating the degree of association between the feature and the category. Next, a document is parsed into a plurality of unique tokens with associated counts, each count indicating the number of times the corresponding feature appears in the document. For each document, a category score is then computed as the sum, over features, of the feature's count in the document multiplied by the corresponding feature weight in the category. Finally, the category scores are sorted by perspective, and the document is classified into a particular category provided that its category score exceeds a predetermined threshold.
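The scoring step described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the whitespace tokenizer, the function names (`tokenize`, `category_scores`, `classify`), and the sample weights are all hypothetical.

```python
from collections import Counter


def tokenize(text):
    """Split text into lowercase tokens (assumed: simple whitespace parsing)."""
    return text.lower().split()


def category_scores(text, weights):
    """Return {category: score} for one document.

    `weights` maps category -> {feature: weight}. Each score is the sum over
    the category's features of (feature count in document) * (feature weight).
    """
    counts = Counter(tokenize(text))
    return {
        category: sum(counts[feature] * weight
                      for feature, weight in feature_weights.items())
        for category, feature_weights in weights.items()
    }


def classify(text, weights, threshold):
    """Return (category, score) pairs above the threshold, highest score first."""
    scores = category_scores(text, weights)
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [(category, score) for category, score in ranked if score > threshold]


# Hypothetical weights for two categories.
example_weights = {
    "sports": {"goal": 2.0, "team": 1.0},
    "finance": {"stock": 3.0, "goal": 0.5},
}
example_result = classify("Goal goal team stock", example_weights, threshold=4.5)
print(example_result)  # [('sports', 5.0)]
```

Here "sports" scores 2 × 2.0 + 1 × 1.0 = 5.0 and "finance" scores 1 × 3.0 + 2 × 0.5 = 4.0, so only "sports" clears the assumed threshold of 4.5.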
92 Claims
1-12. (canceled)
13. A method for associating at least one of a plurality of features with at least one of a plurality of categories, said method comprising at least one of manually or automatically associating at least one of said plurality of features to at least a first category, said plurality of features contributing to a decision to classify a document into said at least first category.
17-18. (canceled)
35-37. (canceled)
43-47. (canceled)
48. In a system including perspectives and categories, each perspective comprising at least one category representative of that perspective, a method for constructing a classifier to classify at least one item across multiple perspectives, the method including:
associating at least one feature with each category, in which each feature is configured for being detected in at least a portion of the at least one item for classification of that item;
determining an initial weight indicating a degree of association between each associated feature and category; and
in which weights for a category are initially related to weights for other categories of the same perspective but are initially substantially unrelated to weights for categories in different perspectives.
Dependent claims: 49, 50, 51, 52, 53, 54, 55
56. In a system comprising perspectives and categories, each perspective comprising at least one category representative of that perspective, the system also comprising weights, each weight indicating a degree of an association between a feature and a category, a method for classifying at least one item across multiple perspectives, the method comprising:
identifying feature instances in the items;
representing, for each item, which features were identified in that item and the number of instances each such feature was identified in that item;
computing, for each item, a category score for each category associated with at least one feature identified in that item, the computing using the weight associating the category and the at least one feature identified in that item;
selecting one or more categories to represent each perspective according to the category scores; and
classifying the items across the selected categories representing the multiple perspectives.
Dependent claims: 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71
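The steps of claim 56 can be illustrated with a short sketch. It assumes feature counts have already been extracted into a dictionary, that the model is a nested mapping of perspective to category to feature weights, and that exactly one category is selected per perspective. The name `classify_across_perspectives`, the `threshold` parameter, and the sample data are hypothetical and are not taken from the claims.

```python
def classify_across_perspectives(counts, model, threshold=0.0):
    """Pick the best-scoring category in each perspective for one item.

    counts: {feature: number of instances identified in the item}
    model:  {perspective: {category: {feature: weight}}}
    Returns {perspective: category or None}.
    """
    result = {}
    for perspective, categories in model.items():
        # Score only categories associated with at least one identified feature.
        scores = {
            category: sum(counts.get(feature, 0) * weight
                          for feature, weight in feature_weights.items())
            for category, feature_weights in categories.items()
            if any(feature in counts for feature in feature_weights)
        }
        best = max(scores, key=scores.get, default=None)
        if best is not None and scores[best] > threshold:
            result[perspective] = best
        else:
            result[perspective] = None
    return result


# Hypothetical item and model with two perspectives.
item_counts = {"goal": 2, "great": 1}
example_model = {
    "topic": {"sports": {"goal": 2.0}, "finance": {"stock": 3.0}},
    "tone": {"positive": {"great": 1.5}, "negative": {"awful": 1.5}},
}
print(classify_across_perspectives(item_counts, example_model))
# {'topic': 'sports', 'tone': 'positive'}
```

Because each perspective is scored independently, one item receives a classification per perspective (here a topic and a tone) rather than a single label.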
72. In a system for classifying items to categories, a method including:
receiving user-input defining all associations between classification features and categories; and
statistically determining weights corresponding to the user-defined associations, each weight indicating a degree to which the association's feature identifies the association's category and discriminates against other categories.
Dependent claims: 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88
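One way to picture the weight determination in claim 72 is sketched below. The claim does not specify the statistic; the smoothed log-ratio used here (how often a feature occurs inside the category versus outside it) is an assumption chosen for illustration, as are the name `learn_weights`, the `smoothing` parameter, and the training data format.

```python
import math
from collections import Counter


def learn_weights(associations, labeled_docs, smoothing=1.0):
    """Estimate a weight for each user-defined (feature, category) association.

    associations: {category: set of features} supplied by the user
    labeled_docs: list of (tokens, category) training examples
    Weight = log of (smoothed feature rate inside the category) divided by
    (smoothed feature rate outside it). This statistic is an assumption.
    """
    weights = {}
    for category, features in associations.items():
        inside, outside = Counter(), Counter()
        for tokens, label in labeled_docs:
            (inside if label == category else outside).update(tokens)
        n_in = sum(inside.values())
        n_out = sum(outside.values())
        weights[category] = {}
        for feature in features:
            p_in = (inside[feature] + smoothing) / (n_in + 2 * smoothing)
            p_out = (outside[feature] + smoothing) / (n_out + 2 * smoothing)
            weights[category][feature] = math.log(p_in / p_out)
    return weights


# Hypothetical user-defined associations and training documents.
user_associations = {"sports": {"goal", "market"}, "finance": {"stock"}}
training_docs = [
    (["goal", "team", "goal"], "sports"),
    (["stock", "market"], "finance"),
]
learned = learn_weights(user_associations, training_docs)
```

Only the associations the user defined receive weights. A feature that is more frequent inside its category than outside gets a positive weight, while one that is more frequent elsewhere (here "market" for "sports") gets a negative weight, reflecting discrimination against other categories.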
89. In a system for classifying items to categories, a method including:
receiving user-input creating user-defined associations between classification features and categories;
statistically determining machine-defined associations that are capable of being different from the user-defined associations; and
classifying items to the categories using weights corresponding to the user-defined associations and the machine-defined associations.
Dependent claims: 90, 91, 92
Specification