Classification system with methodology for efficient verification

US 9,390,086 B2
Filed: 09/11/2014
Issued: 07/12/2016
Est. Priority Date: 09/11/2014
Status: Active Grant


Abstract

Techniques for a classification system with methodology for enhanced verification are described. In one approach, a classification computer trains a classifier based on a set of training documents. After training is complete, the classification computer iterates over a collection of unlabeled documents and uses the trained classifier to predict a label for each unlabeled document. A verification computer retrieves one of the documents assigned a label by the classification computer. The verification computer then generates a user interface that displays select information from the document and provides an option to verify the label predicted by the classification computer or provide an alternative label. The document and the verified label are then fed back into the set of training documents and are used to retrain the classifier to improve subsequent classifications. In addition, the document is indexed by a query computer based on the verified label and made available for search and display.
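
The predict, verify, and retrain loop described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the `CountClassifier` class, the `verify` function (a stand-in for the user interface), and the sample documents and labels are all hypothetical, and a toy word-count model is swapped in for whatever classifier the system actually uses.

```python
from collections import Counter, defaultdict


class CountClassifier:
    """Toy bag-of-words classifier; a hypothetical stand-in only."""

    def train(self, labeled_docs):
        # labeled_docs: list of (text, label) pairs.
        self.word_counts = defaultdict(Counter)
        for text, label in labeled_docs:
            self.word_counts[label].update(text.lower().split())

    def predict(self, text):
        # Score each label by how often it has seen the document's words.
        words = text.lower().split()
        scores = {label: sum(counts[w] for w in words)
                  for label, counts in self.word_counts.items()}
        return max(scores, key=scores.get), scores


def verify(text, candidate, reviewer_labels):
    # Stand-in for the verification user interface: the reviewer either
    # accepts the candidate label or supplies a different one.
    return reviewer_labels.get(text, candidate)


# Illustrative data (assumed, not from the patent).
training = [("kinase inhibitor binds target", "kinase"),
            ("receptor agonist signalling", "receptor")]
unlabeled = ["novel kinase binds substrate", "membrane channel gating"]
reviewer_labels = {"membrane channel gating": "receptor"}  # one override

clf = CountClassifier()
clf.train(training)
verified = []
for doc in unlabeled:
    candidate, _ = clf.predict(doc)
    label = verify(doc, candidate, reviewer_labels)
    verified.append((doc, label))
    training.append((doc, label))  # feed the verified pair back
    clf.train(training)            # retrain on the enlarged set
```

In this sketch the second document gets no evidence for either label, so the reviewer's override is what teaches the retrained classifier the words "channel" and "gating".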

18 Claims

1. A method comprising:
- obtaining a document;
  
  determining, using a trained classifier, a candidate label for the document from a plurality of different labels;
  
  selecting two or more different linguistic structures from the document;
  
  displaying a user interface that presents data from the document, including at least a portion of the two or more linguistic structures, and the plurality of labels including the candidate label, and respective scores in association with each different label among the plurality of labels, wherein the portion of the two or more linguistic structures are displayed by the user interface, wherein the user interface includes two or more user interface controls which present a first option to accept the candidate label for the document and a second option to select a different label for the document, the two or more user interface controls further presenting an element for highlighting the two or more linguistic structures within the document;
  
  wherein one of the user interface controls is configured to allow selection from the plurality of different labels;
  
  receiving, via the two or more user interface controls, input representing selection of the first option or the second option, and further input comprising a highlighted section of the two or more linguistic structures that was important to the selection of the first option or the second option;
  
  associating the document with a verified label;
  
  changing, based on the further input, one or more weights assigned to the highlighted section relative to a non-highlighted section during retraining of the trained classifier;
  
  wherein the method is performed by one or more computing devices.
- - 2. The method of claim 1, wherein the trained classifier has been trained using a set of labeled documents, the verified label is the candidate label if the first option is selected by the input, and the verified label is the different label if the second option is selected by the input, and further comprising:
    - adding the document and the verified label to the set of labeled documents;
      
      retraining the trained classifier based on the set of labeled documents to which the document and the verified label have been added.
  - 3. The method of claim 1, wherein determining, using the trained classifier, the candidate label for the document includes at least:
    - receiving, from the trained classifier, a particular respective score for each different label of the plurality of labels, wherein the particular respective score represents a confidence of the trained classifier with respect to the label being correct for the document.
  - 4. The method of claim 3, wherein the trained classifier determines the respective score for each classification by at least:
    - determining, for each document portion of a plurality of document portions of the document, a respective sub-score for the document portion;
      
      determining the particular respective score of the document based on aggregating the respective sub-score for each document portion of the plurality of document portions.
  - 5. The method of claim 4, further comprising:
    - assigning, to each document portion of the plurality of document portions a respective weight, wherein the sub-score for each document portion of the plurality of document portions is weighted by the respective weight for the document portion when aggregating the respective sub-score for each document portion of the plurality of document portions.
  - 6. The method of claim 4, further comprising:
    - selecting one or more document portions of the plurality of document portions of the document to include in the data based on the respective sub-score of each document portion of the plurality of document portions.
  - 7. The method of claim 1, wherein the document represents a medical report that discuss a respective target protein of a plurality of proteins and each label of the plurality of labels relates to at least one protein of the plurality of proteins.
  - 8. The method of claim 1, wherein each linguistic structure of the one or more linguistic structures is associated with at least one label of the plurality of labels, the two or more linguistic structures are displayed by the user interface in a visually distinguished manner compared to other two or more linguistic structures displayed by the user interface, and the user interface includes second two or more user interface controls associated with the plurality of labels and which, when selected, each cause toggling of the visually distinguished manner of one or more respective linguistic structures, of the portion of the two or more linguistic structures displayed by the user interface, that are associated with a respective label of the plurality of labels.
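
Claims 4 through 6 describe scoring a document by aggregating per-portion sub-scores, optionally weighted, and choosing which portions to display based on those sub-scores. The claims do not say how the aggregation is computed, so the sketch below assumes a weighted mean. The function names, portion names, sub-scores, and weights are hypothetical.

```python
def aggregate_score(sub_scores, weights):
    """Weighted mean of per-portion sub-scores (the combining rule is assumed)."""
    total_weight = sum(weights)
    if total_weight == 0:
        return 0.0
    return sum(s * w for s, w in zip(sub_scores, weights)) / total_weight


def top_portions(portions, sub_scores, k):
    """Pick the k portions with the highest sub-scores for display."""
    ranked = sorted(zip(portions, sub_scores), key=lambda p: p[1], reverse=True)
    return [portion for portion, _ in ranked[:k]]


portions = ["Title", "Abstract", "Methods", "References"]
sub_scores = [0.9, 0.6, 0.3, 0.1]  # sub-scores for one label (illustrative)
weights = [2.0, 1.0, 1.0, 0.0]     # hypothetical per-portion weights

score = aggregate_score(sub_scores, weights)       # document-level score
shown = top_portions(portions, sub_scores, k=2)    # portions to display
```

A zero weight, as given to "References" here, removes a portion from the document-level score without removing it from the document.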

9. A non-transitory computer-readable storage medium storing one or more instructions which, when executed by one or more processors, cause the one or more processors to perform steps comprising:
- obtaining a document;
  
  determining, using a trained classifier, a candidate label for the document from a plurality of different labels;
  
  selecting two or more different linguistic structures from the document;
  
  displaying a user interface that presents data from the document, including at least a portion of the two or more linguistic structures, the plurality of labels including the candidate label and respective scores in association with each different label among the plurality of labels, wherein the portion of the two or more linguistic structures are displayed by the user interface, wherein the user interface includes two or more user interface controls which present a first option to accept the candidate label for the document and a second option to select a different label for the document, the two or more user interface controls further presenting an element for highlighting the two or more linguistic structures within the document;
  
  receiving, via the two or more user interface controls, input representing selection of the first option or the second option, and further input comprising a highlighted section of the two or more linguistic structures that was important to the selection of the first option or the second option;
  
  associating the document with a verified label;
  
  changing, based on the further input, one or more weights assigned to the highlighted section relative to a non-highlighted section during retraining of the trained classifier.
- - 10. The non-transitory computer-readable storage medium of claim 9, wherein the trained classifier has been trained using a set of labeled documents, the verified label is the candidate label if the first option is selected by the input, and the verified label is the different label if the second option is selected by the input, and the steps further comprise:
    - adding the document and the verified label to the set of labeled documents;
      
      retraining the trained classifier based on the set of labeled documents to which the document and the verified label have been added.
  - 11. The non-transitory computer-readable storage medium of claim 9, wherein determining, using the trained classifier, the candidate label for the document includes at least:
    - receiving, from the trained classifier, a particular respective score for each label of the plurality of labels, wherein the particular respective score represents a confidence of the trained classifier with respect to the label being correct for the document.
  - 12. The non-transitory computer-readable storage medium of claim 11, wherein the trained classifier determines the particular respective score for each classification by at least:
    - determining, for each document portion of a plurality of document portions of the document, a respective sub-score for the document portion;
      
      determining the particular respective score of the document based on aggregating the respective sub-score for each document portion of the plurality of document portions.
  - 13. The non-transitory computer-readable storage medium of claim 12, wherein the steps further comprise:
    - assigning, to each document portion of the plurality of document portions a respective weight, wherein the sub-score for each document portion of the plurality of document portions is weighted by the respective weight for the document portion when aggregating the respective sub-score for each document portion of the plurality of document portions.
  - 14. The non-transitory computer-readable storage medium of claim 12, wherein the steps further comprise:
    - selecting one or more document portions of the plurality of document portions of the document to include in the data based on the respective sub-score of each document portion of the plurality of document portions.
  - 15. The non-transitory computer-readable storage medium of claim 9, wherein the document represents a medical report that discuss a respective target protein of a plurality of proteins and each label of the plurality of labels relates to at least one protein of the plurality of proteins.
  - 16. The non-transitory computer-readable storage medium of claim 9, wherein each linguistic structure of the two or more linguistic structures is associated with at least one label of the plurality of labels, the two or more linguistic structures are displayed by the user interface in a visually distinguished manner compared to other one or more linguistic structures displayed by the user interface, and the user interface includes second user interface controls which, when selected, each cause toggling of the visually distinguished manner of one or more respective linguistic structures, of the portion of the two or more linguistic structures displayed by the user interface, that are associated with a particular respective label of the plurality of labels.
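
The last step of claims 1 and 9 changes the weights assigned to a highlighted section relative to a non-highlighted section during retraining. The claims leave the mechanism open. One possible reading, sketched here purely as an assumption, is to boost the feature counts of tokens that fall inside the reviewer's highlighted spans before the retraining pass. The `weighted_features` function, the `boost` multiplier, and the sample sentence are hypothetical.

```python
from collections import Counter


def weighted_features(text, highlights, boost=3.0):
    """Count tokens, counting highlighted tokens `boost` times each.

    highlights: list of (start, end) character offsets into text.
    boost: hypothetical multiplier for tokens inside a highlighted span.
    """
    features = Counter()
    pos = 0
    for token in text.split():
        start = text.index(token, pos)  # locate this token's offset
        end = start + len(token)
        pos = end
        inside = any(h_start <= start and end <= h_end
                     for h_start, h_end in highlights)
        features[token.lower()] += boost if inside else 1.0
    return features


text = "Compound inhibits EGFR kinase in vitro"
span_start = text.index("EGFR kinase")
highlights = [(span_start, span_start + len("EGFR kinase"))]

# Highlighted tokens carry more weight than the rest of the sentence.
features = weighted_features(text, highlights)
```

The resulting feature counts could then be passed to whatever training routine the classifier uses, so that highlighted text influences the retrained model more than surrounding text.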

17. A system comprising:
- an unlabeled document database storing one or more unlabeled documents;
  
  a classification computer configured to;
  
  obtain a document from the unlabeled document database;
  
  determine, using a trained classifier, a candidate label for the document from a plurality of different labels;
  
  change, based on a further input, one or more weights assigned to a highlighted section relative to a non-highlighted section during retraining of the trained classifier;
  
  a verification computer configured to;
  
  select two or more different linguistic structures from the document;
  
  display a user interface that presents data from the document, including at least a portion of the two or more linguistic structures, the plurality of labels including the candidate label and respective scores in association with each different label among the plurality of labels, wherein the portion of the two or more linguistic structures are displayed by the user interface, wherein the user interface includes two or more user interface controls which present a first option to accept the candidate label for the document and a second option to select a different label for the document, the two or more user interface controls further presenting an element for highlighting the two or more linguistic structures within the document;
  
  wherein one of the user interface controls is configured to allow selection from the plurality of different labels;
  
  receive, via the one or more user interface controls, input representing selection of the first option or the second option, and the further input comprising a highlighted section of the two or more linguistic structures that was important to the selection of the first option or the second option;
  
  associate the document with a verified label.
- - 18. The system of claim 17, further comprising a labeled document database storing one or more unlabeled documents, wherein the trained classifier has been trained using a set of labeled documents, the verified label is the candidate label if the first option is selected by the input, the verified label is the different label if the second option is selected by the input, the verification computer is further configured to add the document and the verified label to the labeled document database, and the classification computer is further configured to retrain the trained classifier based on a second set of labeled documents from the labeled document database, wherein the second set of labeled documents includes the document and the verified label.


Current Assignee: Palantir Technologies Incorporated
Original Assignee: Palantir Technologies Incorporated
Inventors: Lisuk, David; Holtzen, Steven
Primary Examiner(s): Chaki, Kakali
Assistant Examiner(s): PELLETT, DANIEL T
Application Number: US14/483,527
Publication Number: US 20160078022A1
Time in Patent Office: 670 Days
Field of Search: None
US Class Current: 1/1

CPC Class Codes

G06F 16/353   into predefined classes
G06F 3/04842   Selection of displayed obje...
G06F 40/117   Tagging; Marking up details...
G06F 40/289   Phrasal analysis, e.g. fini...
G06F 40/53   Processing of non-Latin tex...
G06N 20/00   Machine learning
