Computer-implemented system and method for generating document training sets
First Claim
1. A computer-implemented method for generating document training sets, comprising:
- providing a set of unclassified documents to each of two or more trained classifiers and receiving a classification code assigned to each unclassified document from each classifier;
- comparing via a server the classification codes assigned to each unclassified document by two or more of the classifiers, wherein the server comprises a central processing unit, memory, an input port to receive the set of unclassified documents, and an output port to provide a training set for a matter;
- determining for at least one of the unclassified documents that a disagreement exists between the classification codes from the two or more classifiers;
- providing via the server for further review the unclassified document with a disagreement in classification codes, wherein results of the further review comprise one of a new classification code and confirmation of one of the assigned classification codes;
- generating the training set for the matter via the server by grouping the unclassified documents for which the disagreement exists; and
- generating a further training set for a same or different matter, comprising:
  - training two or more other classifiers by identifying features within one or more coded documents, classifying the features, and utilizing the classified features for training the other classifiers;
  - identifying via the other classifiers one or more features within at least one of the unclassified documents;
  - assigning, by each of the other classifiers, a classification code to each of the identified features;
  - comparing the classification codes assigned to each feature;
  - determining whether a disagreement exists between the classification codes assigned to at least one of the features via the other classifiers;
  - providing the features with a disagreement in classification codes for further review, wherein results of the further review comprise one of a new classification code and confirmation of one of the assigned classification codes; and
  - grouping as the further training set the unclassified documents associated with the features for which a disagreement exists.
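The feature-level steps that close the claim (training other classifiers on coded documents, coding features, and grouping documents whose features are disputed) might be sketched in Python as follows. This is only an illustration under stated assumptions: features are taken to be lowercase word tokens, "training" is a simple majority-vote lookup table, and the names `extract_features`, `train`, and `further_training_set` are hypothetical. The claim does not prescribe any of these choices.

```python
from collections import Counter, defaultdict

def extract_features(text):
    # Assumption: a "feature" is a lowercase word token with punctuation stripped.
    tokens = (w.strip(".,;:!?").lower() for w in text.split())
    return [t for t in tokens if t]

def train(coded_docs):
    """Build a feature classifier from (text, code) pairs.

    Each feature is assigned the code it most often co-occurs with.
    The returned callable yields None for features never seen in training.
    """
    counts = defaultdict(Counter)
    for text, code in coded_docs:
        for feature in set(extract_features(text)):
            counts[feature][code] += 1
    table = {f: c.most_common(1)[0][0] for f, c in counts.items()}
    return lambda feature: table.get(feature)

def further_training_set(unclassified, classifiers):
    """Return (disputed features, ids of documents containing them)."""
    disputed_features = {}
    doc_ids = []
    for doc_id, text in unclassified.items():
        has_dispute = False
        for feature in set(extract_features(text)):
            codes = [clf(feature) for clf in classifiers]
            # Assumption: a disagreement needs two distinct known codes;
            # a feature unknown to one classifier is not counted.
            if len({c for c in codes if c is not None}) > 1:
                disputed_features[feature] = codes
                has_dispute = True
        if has_dispute:
            doc_ids.append(doc_id)
    return disputed_features, doc_ids

# Hypothetical coded documents, e.g. coded by two different reviewers.
coded_a = [("merger draft", "responsive"), ("holiday party", "non-responsive")]
coded_b = [("merger rumor newsletter", "non-responsive"), ("draft contract", "responsive")]
classifiers = [train(coded_a), train(coded_b)]

unclassified = {"u1": "Merger timeline", "u2": "Draft agenda", "u3": "Party newsletter"}
features, training_ids = further_training_set(unclassified, classifiers)
```

In this toy data only the feature "merger" receives conflicting codes, so only document `u1` is grouped into the further training set, and "merger" is the feature that would be sent for further review.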
Abstract
A computer-implemented system and method for generating document training sets is provided. Unclassified documents are provided to two or more classifiers. A classification code assigned to each unclassified document is received. A determination is made as to whether a disagreement exists between classification codes assigned to a common unclassified document via different classifiers. The common unclassified document with a disagreement in classification codes is provided for further review. Results of the further review include one of a new classification code and confirmation of one of the assigned classification codes. The unclassified documents for which a disagreement exists are grouped as a training set.
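The document-level disagreement check summarized in the abstract can be illustrated with a minimal Python sketch. The two toy classifiers, the classification codes ("privileged", "non-privileged"), and the function name `find_disagreements` are assumptions chosen for illustration and do not come from the patent text.

```python
from typing import Callable, Dict, List

# Assumption: a classifier is any callable mapping document text to a code.
Classifier = Callable[[str], str]

def find_disagreements(documents: Dict[str, str],
                       classifiers: List[Classifier]) -> Dict[str, List[str]]:
    """Return {doc_id: [code from each classifier]} for documents whose codes differ."""
    disputed = {}
    for doc_id, text in documents.items():
        codes = [clf(text) for clf in classifiers]
        if len(set(codes)) > 1:  # at least two classifiers disagree
            disputed[doc_id] = codes
    return disputed

# Two deliberately simple, hypothetical classifiers.
def keyword_classifier(text: str) -> str:
    return "privileged" if "attorney" in text.lower() else "non-privileged"

def length_classifier(text: str) -> str:
    return "privileged" if len(text.split()) > 6 else "non-privileged"

docs = {
    "d1": "Please forward this to our attorney before Friday",
    "d2": "Lunch at noon?",
    "d3": "Attorney notes attached",
}

disputed = find_disagreements(docs, [keyword_classifier, length_classifier])
training_set = list(disputed)  # documents with a disagreement form the training set
```

Here `d1` and `d2` receive the same code from both classifiers, while `d3` is coded "privileged" by one and "non-privileged" by the other, so `d3` alone would go to further review and into the training set.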
16 Claims
1. A computer-implemented method for generating document training sets, comprising:
- providing a set of unclassified documents to each of two or more trained classifiers and receiving a classification code assigned to each unclassified document from each classifier;
- comparing via a server the classification codes assigned to each unclassified document by two or more of the classifiers, wherein the server comprises a central processing unit, memory, an input port to receive the set of unclassified documents, and an output port to provide a training set for a matter;
- determining for at least one of the unclassified documents that a disagreement exists between the classification codes from the two or more classifiers;
- providing via the server for further review the unclassified document with a disagreement in classification codes, wherein results of the further review comprise one of a new classification code and confirmation of one of the assigned classification codes;
- generating the training set for the matter via the server by grouping the unclassified documents for which the disagreement exists; and
- generating a further training set for a same or different matter, comprising:
  - training two or more other classifiers by identifying features within one or more coded documents, classifying the features, and utilizing the classified features for training the other classifiers;
  - identifying via the other classifiers one or more features within at least one of the unclassified documents;
  - assigning, by each of the other classifiers, a classification code to each of the identified features;
  - comparing the classification codes assigned to each feature;
  - determining whether a disagreement exists between the classification codes assigned to at least one of the features via the other classifiers;
  - providing the features with a disagreement in classification codes for further review, wherein results of the further review comprise one of a new classification code and confirmation of one of the assigned classification codes; and
  - grouping as the further training set the unclassified documents associated with the features for which a disagreement exists.
Dependent claims: 2, 3, 4, 5, 6, 7, 8
9. A computer-implemented system for generating document training sets, comprising:
- a set of unclassified documents provided to each of two or more trained classifiers, wherein a classification code assigned to each unclassified document is received from each classifier; and
- a server comprising a central processing unit, memory, an input port to receive the set of unclassified documents, and an output port to provide a training set for a matter, wherein the central processing unit is configured to:
  - compare the classification codes assigned to each unclassified document by two or more of the classifiers;
  - determine for at least one of the unclassified documents that a disagreement exists between the classification codes from the two or more classifiers;
  - provide for further review the unclassified document with a disagreement in classification codes;
  - receive results of the further review comprising one of a new classification code and confirmation of one of the assigned classification codes;
  - generate the training set for the matter by grouping the unclassified documents for which the disagreement exists; and
  - generate a further training set for a same or different matter, comprising:
    - train two or more other classifiers by identifying features within one or more coded documents, classifying the features, and utilizing the classified features for training the classifiers;
    - identify via the other classifiers one or more features within at least one of the unclassified documents;
    - assign, by each of the other classifiers, a classification code to each of the identified features;
    - compare the classification codes assigned to each feature;
    - determine whether a disagreement exists between the classification codes assigned to at least one of the features via the other classifiers;
    - provide the features with a disagreement in classification codes for further review, wherein results of the further review comprise one of a new classification code and confirmation of one of the assigned classification codes; and
    - group as the further training set the unclassified documents associated with the features for which a disagreement exists.
Dependent claims: 10, 11, 12, 13, 14, 15, 16
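One way to picture the server-side duties recited in claim 9 (compare codes, hold disputed documents for review, receive review results, output the training set) is the small Python sketch below. The class name `ReviewServer`, its method names, and the sample codes are hypothetical; the claim describes a server with a CPU, memory, and input and output ports, not any particular software design.

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class ReviewServer:
    # doc_id -> codes assigned by the classifiers, kept only when they disagree
    disputed: Dict[str, List[str]] = field(default_factory=dict)
    # doc_id -> code settled on by further review
    final_codes: Dict[str, str] = field(default_factory=dict)

    def compare(self, assigned: Dict[str, List[str]]) -> None:
        """Input side: take the codes each classifier assigned per document
        and keep the documents on which the classifiers disagree."""
        for doc_id, codes in assigned.items():
            if len(set(codes)) > 1:
                self.disputed[doc_id] = list(codes)

    def record_review(self, doc_id: str, code: str) -> str:
        """Store a review result and report which of the two outcomes it is.

        Raises KeyError if the document was never flagged as disputed.
        """
        assigned = self.disputed[doc_id]
        outcome = "confirmation" if code in assigned else "new code"
        self.final_codes[doc_id] = code
        return outcome

    def training_set(self) -> List[str]:
        """Output side: the documents for which a disagreement exists."""
        return sorted(self.disputed)

server = ReviewServer()
server.compare({
    "d1": ["responsive", "responsive"],   # classifiers agree
    "d2": ["responsive", "privileged"],   # classifiers disagree
})
first = server.record_review("d2", "privileged")       # matches an assigned code
second = server.record_review("d2", "non-responsive")  # code no classifier assigned
```

The two calls to `record_review` illustrate the two review outcomes named in the claim: confirming one of the assigned classification codes, or supplying a new one.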
Specification