Classification system with methodology for efficient verification
Abstract
Techniques for a classification system with methodology for enhanced verification are described. In one approach, a classification computer trains a classifier based on a set of training documents. After training is complete, the classification computer iterates over a collection of unlabeled documents and uses the trained classifier to predict a label for each unlabeled document. A verification computer retrieves one of the documents assigned a label by the classification computer. The verification computer then generates a user interface that displays select information from the document and provides an option to verify the label predicted by the classification computer or provide an alternative label. The document and the verified label are then fed back into the set of training documents and are used to retrain the classifier to improve subsequent classifications. In addition, the document is indexed by a query computer based on the verified label and made available for search and display.
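The loop described in the abstract (train, predict, verify, feed back, index) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the word-count classifier, the function names `train`, `predict` and `verify_and_retrain`, and the `reviewer` callback that stands in for the verification user interface. The patent does not prescribe a particular classifier or retraining schedule.

```python
from collections import Counter, defaultdict


def train(examples):
    """Build per-label word counts from (text, label) pairs (toy classifier)."""
    counts = defaultdict(Counter)
    for text, label in examples:
        counts[label].update(text.lower().split())
    return counts


def predict(model, text):
    """Return (candidate_label, scores); each label is scored by word overlap."""
    words = text.lower().split()
    scores = {label: sum(c[w] for w in words) for label, c in model.items()}
    return max(scores, key=scores.get), scores


def verify_and_retrain(training, unlabeled, reviewer):
    """Predict a label per document, let a reviewer confirm or replace it,
    feed the verified pair back into the training set, and index the document.

    `reviewer(doc, candidate, scores)` is a hypothetical stand-in for the
    verification user interface; it returns the verified label.
    """
    model = train(training)
    index = defaultdict(list)  # verified label -> documents, for later search
    for doc in unlabeled:
        candidate, scores = predict(model, doc)
        verified = reviewer(doc, candidate, scores)
        training.append((doc, verified))
        model = train(training)  # assumption: retrain after every verification
        index[verified].append(doc)
    return model, index
```

Retraining after every single verification keeps the sketch short; a batch schedule would fit the description in the abstract equally well.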
18 Claims
1. A method comprising:
obtaining a document;
determining, using a trained classifier, a candidate label for the document from a plurality of different labels;
selecting two or more different linguistic structures from the document;
displaying a user interface that presents data from the document, including at least a portion of the two or more linguistic structures, and the plurality of labels including the candidate label, and respective scores in association with each different label among the plurality of labels, wherein the portion of the two or more linguistic structures are displayed by the user interface, wherein the user interface includes two or more user interface controls which present a first option to accept the candidate label for the document and a second option to select a different label for the document, the two or more user interface controls further presenting an element for highlighting the two or more linguistic structures within the document;
wherein one of the user interface controls is configured to allow selection from the plurality of different labels;
receiving, via the two or more user interface controls, input representing selection of the first option or the second option, and further input comprising a highlighted section of the two or more linguistic structures that was important to the selection of the first option or the second option;
associating the document with a verified label;
changing, based on the further input, one or more weights assigned to the highlighted section relative to a non-highlighted section during retraining of the trained classifier;
wherein the method is performed by one or more computing devices.
Dependent claims: 2, 3, 4, 5, 6, 7, 8
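The last step of claim 1 changes weights for the highlighted section relative to a non-highlighted section during retraining, without fixing a weighting scheme. One possible reading is sketched below. The multiplier `highlight_weight`, the function names, the token-level treatment of the highlighted section, and the normalised scores are all assumptions for illustration and are not taken from the claim.

```python
from collections import defaultdict


def weighted_counts(examples, highlight_weight=3.0):
    """Retrain on (text, label, highlighted_text) triples.

    Tokens that also occur in the reviewer's highlighted section contribute
    `highlight_weight` to their label; every other token contributes 1.0.
    Simplification (assumption): highlights are matched as token sets,
    not as character spans.
    """
    counts = defaultdict(lambda: defaultdict(float))
    for text, label, highlighted in examples:
        hot = set(highlighted.lower().split())
        for token in text.lower().split():
            counts[label][token] += highlight_weight if token in hot else 1.0
    return counts


def score(counts, text):
    """Return a normalised score per label, as a UI might show beside each label."""
    tokens = text.lower().split()
    raw = {
        label: sum(c.get(t, 0.0) for t in tokens)
        for label, c in counts.items()
    }
    total = sum(raw.values()) or 1.0  # avoid dividing by zero on no overlap
    return {label: value / total for label, value in raw.items()}
```

With a multiplier above 1.0, a token the reviewer highlighted pulls later documents toward the verified label more strongly than a token that merely appeared in the same document.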
9. A non-transitory computer-readable storage medium storing one or more instructions which, when executed by one or more processors, cause the one or more processors to perform steps comprising:
obtaining a document;
determining, using a trained classifier, a candidate label for the document from a plurality of different labels;
selecting two or more different linguistic structures from the document;
displaying a user interface that presents data from the document, including at least a portion of the two or more linguistic structures, the plurality of labels including the candidate label and respective scores in association with each different label among the plurality of labels, wherein the portion of the two or more linguistic structures are displayed by the user interface, wherein the user interface includes two or more user interface controls which present a first option to accept the candidate label for the document and a second option to select a different label for the document, the two or more user interface controls further presenting an element for highlighting the two or more linguistic structures within the document;
receiving, via the two or more user interface controls, input representing selection of the first option or the second option, and further input comprising a highlighted section of the two or more linguistic structures that was important to the selection of the first option or the second option;
associating the document with a verified label;
changing, based on the further input, one or more weights assigned to the highlighted section relative to a non-highlighted section during retraining of the trained classifier.
Dependent claims: 10, 11, 12, 13, 14, 15, 16
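The receiving and associating steps can be pictured as a small data structure plus a resolution rule. The sketch below is hypothetical: the `ReviewInput` type, its field names, and the validation in `verified_label` are assumptions chosen for illustration, not language from the claims.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class ReviewInput:
    """What the user interface controls send back (hypothetical shape)."""
    accepted: bool                     # first option: accept the candidate label
    alternative: Optional[str] = None  # second option: a different label
    highlighted: str = ""              # section the reviewer marked as important


def verified_label(candidate: str, labels: Sequence[str], review: ReviewInput) -> str:
    """Resolve the reviewer's input to the label associated with the document."""
    if review.accepted:
        return candidate
    # Assumption: the second option must name a known label other than the candidate.
    if review.alternative not in labels or review.alternative == candidate:
        raise ValueError("second option requires a different label from the known labels")
    return review.alternative
```

The `highlighted` field is carried alongside the choice so that a later retraining step can use it; this function only settles which label is verified.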
17. A system comprising:
an unlabeled document database storing one or more unlabeled documents;
a classification computer configured to:
obtain a document from the unlabeled document database;
determine, using a trained classifier, a candidate label for the document from a plurality of different labels;
change, based on a further input, one or more weights assigned to a highlighted section relative to a non-highlighted section during retraining of the trained classifier;
a verification computer configured to:
select two or more different linguistic structures from the document;
display a user interface that presents data from the document, including at least a portion of the two or more linguistic structures, the plurality of labels including the candidate label and respective scores in association with each different label among the plurality of labels, wherein the portion of the two or more linguistic structures are displayed by the user interface, wherein the user interface includes two or more user interface controls which present a first option to accept the candidate label for the document and a second option to select a different label for the document, the two or more user interface controls further presenting an element for highlighting the two or more linguistic structures within the document;
wherein one of the user interface controls is configured to allow selection from the plurality of different labels;
receive, via the one or more user interface controls, input representing selection of the first option or the second option, and the further input comprising a highlighted section of the two or more linguistic structures that was important to the selection of the first option or the second option;
associate the document with a verified label.
Dependent claims: 18
Specification