Computer-based system for classifying documents into a hierarchy and linking the classifications to the hierarchy
Abstract
A computer system uses a legal hierarchy annotated with seed citations to generate a control file and then, using the control file, permits legal documents to be classified automatically into the legal hierarchy without manual intervention. Each classification within the legal hierarchy receives a unique numerical classification key that identifies the location of the classification within the legal hierarchy. Each level of the hierarchy also receives a unique hierarchy location key that identifies a hierarchical document through which a user can retrieve a legal document that displays a classification to the user. The control file is an automatically generated intermediate file that identifies the legal classifications, their classification keys, and the hierarchy location keys to which the classifications map. The control file is input to a legal classification generator, along with a document to be classified. The unclassified legal document is scanned electronically for citations, which are stripped and normalized and then compared to the seed citations in the control file for matches. For each match, each new classification with which the seed citation is associated is stored in memory with an initial numerical classification score of zero, and each previously identified classification has its classification score incremented. Heuristic rules are employed to increment the classification scores based on the seed citation matched. After all citations have been checked against the seed citations, all classification scores are checked against a threshold value. If the classification score for any particular classification is greater than or equal to the threshold value, then the classification key and the hierarchy location key associated with the classification are inserted into the legal document.
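By way of illustration only, the scoring and threshold steps described in the abstract might be sketched in Python as follows. The seed table, the key values, and the one-point heuristic rule are all hypothetical; the abstract does not disclose the actual heuristic rules, and this sketch assumes that every match, including the first match for a classification, adds the heuristic increment.

```python
# Hypothetical control-file contents: normalized seed citation ->
# list of (classification_key, hierarchy_location_key) pairs.
SEED_CITATIONS = {
    "42 usc 1983": [(10020, "H-7")],
    "17 usc 107": [(30010, "H-12")],
    "17 usc 106": [(30010, "H-12")],
}


def heuristic_increment(seed_citation):
    """Assumed heuristic rule: each matched seed citation is worth one point."""
    return 1


def classify(doc_citations, seeds, threshold):
    """Return the (classification_key, location_key) pairs whose score
    is greater than or equal to the threshold."""
    scores = {}         # classification_key -> running classification score
    location_keys = {}  # classification_key -> hierarchy location key
    for citation in doc_citations:
        for class_key, loc_key in seeds.get(citation, []):
            if class_key not in scores:
                # A newly identified classification starts at zero.
                scores[class_key] = 0
                location_keys[class_key] = loc_key
            scores[class_key] += heuristic_increment(citation)
    return sorted(
        (class_key, location_keys[class_key])
        for class_key, score in scores.items()
        if score >= threshold
    )


if __name__ == "__main__":
    stripped = ["17 usc 107", "17 usc 106", "42 usc 1983"]
    print(classify(stripped, SEED_CITATIONS, threshold=2))
```

In this sketch the returned key pairs are the values that would be inserted into the legal document; the insertion step itself is omitted.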
13 Claims
1. A computer system for classifying electronic text according to multiple classifications arranged in a hierarchy, comprising:
a memory for storing and retrieving electronic text;
identification means for identifying embedded citations contained in the electronic text;
means for stripping embedded citations identified by said identification means and storing them in memory;
matching means for comparing stripped citations to stored citations associated with at least one classification in the hierarchy, and for identifying stripped citations which match at least one stored citation;
scoring means for assigning scores to the matching citations identified by said matching means, based on heuristic rules;
calculating means for calculating a classification score for each classification associated with the stored citations which match the matching citations identified by said matching means, based on the scores assigned to the matching citations and the heuristic rules;
comparison means for comparing each classification score with a threshold value;
classification means for classifying the electronic text within the hierarchy based on the comparison of the classification score with the threshold value; and
association means for associating the electronic text with stored classification identifying strings to produce a classified electronic text.
Dependent claims: 2, 3, 4, 5, 6
7. A method for classifying electronic text according to multiple classifications arranged in a hierarchy using a computer having a memory and a processor, comprising the steps of:
(a) inputting into the memory an electronic text to be classified within the hierarchy;
(b) identifying embedded citations contained in the electronic text;
(c) stripping embedded citations identified by the identification means and storing them in the memory;
(d) comparing stripped citations to stored citations associated with at least one classification in the hierarchy, and identifying stripped citations which match at least one stored citation, using the computer processor;
(e) assigning scores to the matching citations based on heuristic rules, using the computer processor;
(f) calculating a classification score for each classification associated with the stored citations which match the stripped citations using the scores assigned to the matching citations, using the computer processor;
(g) comparing each classification score with a threshold value, using the computer processor;
(h) classifying the electronic text within the hierarchy based on the comparison of the classification score with the threshold value, using the computer processor; and
(i) associating the electronic text with stored classification identifying strings to produce a classified electronic text, using the computer processor.
Dependent claims: 8, 9, 10
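As a non-limiting illustration of steps (b) and (c), the sketch below identifies embedded citations with a regular expression and strips them into a normalized form. The pattern covers only United States Code citations, and both the pattern and the normalized form "title usc section" are assumptions made for this example; the claim does not specify how citations are recognized or normalized.

```python
import re

# Assumed pattern: "<title> U.S.C. [§] <section>", with or without periods.
USC_PATTERN = re.compile(
    r"\b(\d+)\s+U\.?\s?S\.?\s?C\.?\s*(?:§+\s*)?(\d+)\b"
)


def strip_citations(text):
    """Return the citations found in the text, in order of appearance,
    each normalized to the hypothetical form '<title> usc <section>'."""
    return [
        f"{match.group(1)} usc {match.group(2)}"
        for match in USC_PATTERN.finditer(text)
    ]


if __name__ == "__main__":
    sample = "Plaintiff sued under 42 U.S.C. § 1983 and cited 17 USC 107."
    print(strip_citations(sample))
```

Normalizing before comparison lets variant spellings of the same citation match a single stored citation in step (d).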
11. A computer system for linking a hierarchical electronic text representing an electronically compatible hierarchy having multiple levels and classifications associated with the levels, with individual electronic text documents containing the classifications within the hierarchy, comprising:
a memory;
storing means for storing a hierarchy in memory, the hierarchy having levels, classifications associated with the levels, and citations associated with the classifications;
means for stripping the citations;
generating means for generating a unique classification key associated with each classification within the hierarchy of each citation;
generating means for generating a unique location key associated with the location within the hierarchy from which each classification came;
means for writing the stripped citations, the classification keys, and the location keys to a control file;
means for generating a hierarchical electronic text with the classification and hierarchy location keys from the computer file;
first searching means for searching on the unique classification key associated with a classification; and
second searching means for searching on the unique location key associated with each level of the hierarchy.
Dependent claims: 12, 13
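The key generation and control-file writing recited above might, purely as an example, be sketched as follows. The nested-dictionary hierarchy, the sequential integer classification keys, the dotted location keys, and the tab-separated control-file layout are all hypothetical choices; the claim requires only that the keys be unique.

```python
# Hypothetical hierarchy: each level has a title, classifications that map
# to already-stripped seed citations, and optional child levels.
HIERARCHY = [
    {
        "title": "Civil Rights",
        "classifications": {"Section 1983 actions": ["42 usc 1983"]},
        "children": [
            {
                "title": "Immunity",
                "classifications": {"Qualified immunity": ["42 usc 1983"]},
                "children": [],
            }
        ],
    },
    {
        "title": "Copyright",
        "classifications": {"Fair use": ["17 usc 107"]},
        "children": [],
    },
]


def build_control_records(hierarchy):
    """Walk the hierarchy depth-first and return a list of
    (citation, classification_key, location_key) records."""
    records = []
    next_key = 0

    def walk(levels, path):
        nonlocal next_key
        for index, level in enumerate(levels, start=1):
            here = path + (index,)
            # Assumed scheme: location key is the dotted path of positions.
            location_key = ".".join(str(part) for part in here)
            for citations in level.get("classifications", {}).values():
                next_key += 1  # assumed scheme: sequential integer keys
                for citation in citations:
                    records.append((citation, next_key, location_key))
            walk(level.get("children", []), here)

    walk(hierarchy, ())
    return records


def format_control_file(records):
    """Render the records as tab-separated control-file lines."""
    return "\n".join(f"{c}\t{k}\t{loc}" for c, k, loc in records)


if __name__ == "__main__":
    print(format_control_file(build_control_records(HIERARCHY)))
```

Searching on either key then reduces to filtering these records by the second or third field.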
Specification