Systems, methods, and software for classifying text from judicial opinions and other documents
Abstract
To reduce cost and improve accuracy, the inventors devised systems, methods, and software to aid classification of text, such as headnotes and other documents, to target classes in a target classification system. For example, one system computes composite scores based on: similarity of input text to text assigned to each of the target classes; similarity of non-target classes assigned to the input text and target classes; probability of a target class given a set of one or more non-target classes assigned to the input text; and/or probability of the input text given text assigned to the target classes. The exemplary system then evaluates the composite scores using class-specific decision criteria, such as thresholds, ultimately assigning or recommending assignment of the input text to one or more of the target classes. The exemplary system is particularly suitable for classification systems having thousands of classes.
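The composite scoring and class-specific thresholding described above can be illustrated with a minimal sketch. It assumes the composite is a plain weighted sum of two scores and that a class is recommended when the composite meets or exceeds its threshold; the names `ClassParams`, `composite_score`, and `recommend` are hypothetical and do not come from the patent.

```python
from dataclasses import dataclass


@dataclass
class ClassParams:
    """Hypothetical per-class parameters: two weights and a decision threshold."""
    w1: float
    w2: float
    threshold: float


def composite_score(s1: float, s2: float, p: ClassParams) -> float:
    # Assumption: the composite is a weighted sum of the two scores,
    # each scaled by a weight specific to the target class.
    return p.w1 * s1 + p.w2 * s2


def recommend(scores: dict, params: dict) -> list:
    """Return the target classes whose composite score meets their own threshold.

    scores maps class name -> (first score, second score) for one input text.
    params maps class name -> ClassParams.
    """
    recommended = []
    for cls, (s1, s2) in scores.items():
        p = params[cls]
        # Assumption: "meets the threshold" means greater than or equal to it.
        if composite_score(s1, s2, p) >= p.threshold:
            recommended.append(cls)
    return recommended


params = {
    "A": ClassParams(w1=0.5, w2=0.5, threshold=0.4),
    "B": ClassParams(w1=1.0, w2=0.0, threshold=0.9),
}
scores = {"A": (0.6, 0.4), "B": (0.5, 1.0)}
result = recommend(scores, params)
```

Because weights and thresholds are stored per class, each target class can be tuned independently, which is the point of making them class-specific.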
21 Claims
1. An automated method of classifying input text according to a target classification system having two or more target classes, the method comprising:
for each target class, determining a composite score based on a first score scaled by a first class-specific weight for the target class and a second score scaled by a second class-specific weight for the target class, with the first and second scores based on an input text and text associated with the target class; and
for each target class, classifying or recommending classification of the input text to the target class based on the composite score and a class-specific decision threshold for the target class.
Dependent claims: 2, 3
4. An automated method of classifying text to one or more target classes in a target classification system, the method comprising:
identifying one or more noun-word pairs in a portion of text; and
determining one or more scores based on frequencies of one or more of the identified noun-word pairs in the portion of text and one or more noun-word pairs in text associated with one of the target classes.
Dependent claims: 5, 6, 7, 8, 9
10. An automated method of classifying input text to one or more target classes in a target classification system, the method comprising:
identifying a first set of noun-word pairs in the input text, with the first set including at least one noun-word pair formed from a noun and non-adjacent word in the input text;
identifying two or more second sets of noun-word pairs, with each second set including at least one noun-word pair formed from a noun and non-adjacent word in text associated with a respective one of the target classes;
determining a set of scores based on the first and second sets of noun-word pairs; and
classifying or recommending classification of the input text to one or more of the target classes based on the set of scores.
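The noun-word pair steps recited above can be sketched roughly as follows. This is an illustration under stated assumptions, not the patented method: nouns are supplied as a hypothetical precomputed set rather than found by a part-of-speech tagger, pairs are limited to a hypothetical `window` of neighboring positions, and cosine similarity over pair frequencies is a swapped-in scoring choice that the claims do not specify.

```python
import math
from collections import Counter


def noun_word_pairs(tokens, nouns, window=3):
    """Count (noun, word) pairs for each noun and every other token
    within `window` positions of it, whether adjacent or not."""
    pairs = Counter()
    for i, tok in enumerate(tokens):
        if tok not in nouns:
            continue
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs[(tok, tokens[j])] += 1
    return pairs


def cosine(a, b):
    """Cosine similarity between two pair-frequency counters (assumed measure)."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical input text and class text, already tokenized and lowercased.
nouns = {"court", "motion"}
input_pairs = noun_word_pairs(["court", "denied", "the", "motion"], nouns)
class_pairs = noun_word_pairs(["the", "court", "granted", "the", "motion"], nouns)
score = cosine(input_pairs, class_pairs)
```

In this sketch, one such score would be computed per target class, and the resulting set of scores would then drive the classification or recommendation step. A fuller version would likely filter stop words such as "the" before pairing.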
11. A system for classifying input text to a target classification system having two or more target classes, the system comprising:
a scoring module for determining for each of the target classes at least first and second scores based on the input text and the target class;
a composite scoring module for determining for each of the target classes a corresponding composite score based on the first score scaled by a first class-specific weight for the target class and the second score scaled by a second class-specific weight for the target class; and
a classification module for determining for each of the target classes whether to classify or recommend classification of the input text to the target class based on the corresponding composite score and a class-specific decision threshold for the target class.
Dependent claims: 12
13. A machine-readable medium comprising instructions related to classifying input text to a target classification system having two or more target classes, the instructions comprising:
a first set of instructions for determining first and second scores based on the input text and one of the target classes, wherein the first score is based on:
similarity of at least one or more portions of the input text to text associated with the one target class; or
similarity of a set of one or more non-target classes associated with the input text and a set of one or more non-target classes associated with the one target class; and
wherein the second score is based on:
probability of the one target class given a set of one or more non-target classes associated with the input text; or
probability of the one target class given at least a portion of the input text;
a second set of instructions for determining a composite score based on the first and second scores; and
a third set of instructions for comparing the composite score to a decision threshold.
Dependent claims: 14, 15, 16, 17
18. A machine-readable medium comprising instructions for classifying input text to a target classification system having two or more target classes, the instructions comprising:
a first set of instructions for determining first and second scores based on the input text and one of the target classes, wherein the first score is based on similarity of a set of one or more non-target classes associated with the input text and a set of one or more non-target classes associated with the one target class; and
wherein the second score is based on probability of the one target class given at least a portion of the input text;
a second set of instructions for determining a composite score based on a linear combination of the first and second scores; and
a third set of instructions for comparing the composite score to a decision threshold.
Dependent claims: 19, 20, 21
Specification