Interactive learning-based document annotation

US 8,726,144 B2
Filed: 12/23/2005
Issued: 05/13/2014
Est. Priority Date: 12/23/2005
Status: Active Grant

6 Assignments

0 Petitions

Abstract

A document annotation system includes a graphical user interface used by an annotator to annotate documents. An active learning component trains an annotation model and proposes annotations to documents based on the annotation model. A request handler conveys annotation requests from the graphical user interface to the active learning component, conveys proposed annotations from the active learning component to the graphical user interface, and selectably conveys evaluation requests from the graphical user interface to a domain expert. During annotation, at least some low probability proposed annotations are presented to the annotator by the graphical user interface. The presented low probability proposed annotations enhance training of the annotation model by the active learning component.
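The loop the abstract describes (train, propose annotations with a probability, show the annotator the proposals the model is least sure of, retrain) can be sketched in Python. This is a minimal illustration under stated assumptions, not the patented implementation: the one-string feature per element, the Laplace-smoothed label frequency used as the "probability of acceptance", the `oracle` dictionary standing in for the human annotator, and all function names are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical toy model: each document element is reduced to one feature string
# (e.g. its font style) and labelled with the annotation (target tag) it should get.

def train(annotated):
    """Count how often each label was given to each feature value."""
    counts = defaultdict(Counter)
    for feature, label in annotated:
        counts[feature][label] += 1
    return counts

def propose(model, feature, labels):
    """Best label plus a Laplace-smoothed estimate of its probability of acceptance."""
    seen = model.get(feature, Counter())
    best = max(labels, key=lambda lab: seen[lab])
    return best, (seen[best] + 1) / (sum(seen.values()) + len(labels))

def active_learning(pool, oracle, labels, seed, rounds=3, batch=2):
    """pool: (element_id, feature) pairs still unannotated.
    oracle: element_id -> correct label (stands in for the human annotator)."""
    annotated, pool = list(seed), list(pool)
    for _ in range(rounds):
        if not pool:
            break
        model = train(annotated)
        # Score every pending element, lowest probability of acceptance first.
        scored = sorted((propose(model, feat, labels)[1], eid, feat) for eid, feat in pool)
        for _, eid, feat in scored[:batch]:
            annotated.append((feat, oracle[eid]))  # annotator confirms or corrects
            pool.remove((eid, feat))
    return train(annotated), pool

LABELS = ["title", "author", "date"]
POOL = [("e1", "bold-18pt"), ("e2", "italic-10pt"), ("e3", "plain-9pt"),
        ("e4", "bold-18pt"), ("e5", "italic-10pt")]
ORACLE = {"e1": "title", "e2": "author", "e3": "date", "e4": "title", "e5": "author"}

model, remaining = active_learning(POOL, ORACLE, LABELS, seed=[("bold-18pt", "title")])
print(propose(model, "italic-10pt", LABELS))  # ('author', 0.6)
```

Each round spends the annotator's effort on the elements whose proposals are least likely to be accepted; this uncertainty-sampling choice is what makes the presented low probability proposals useful for training.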

11 Claims

1. A document annotation system comprising:
  - a graphical user interface for annotating documents, the graphical user interface including at least one user input device and a display device configured to display documents;
  - a probabilistic active learning component configured to train an annotation model and to propose annotations to documents based on the annotation model, the probabilistic active learning component also outputting a probability of acceptance associated with each proposed annotation; and
  - a request handler configured to convey annotation requests from the graphical user interface to the active learning component and to convey proposed annotations from the active learning component to the graphical user interface, the request handler including a mode selector that selects at least between (i) a training mode in which low probability proposed annotations are presented by the graphical user interface and (ii) an annotation mode in which high probability proposed annotations are presented by the graphical user interface.

Dependent claims (2, 3, 4, 5, 6, 7):
2. The document annotation system as set forth in claim 1, wherein the documents being annotated are one of (i) extensible markup language (XML) documents, the annotation model being a target XML schema, and (ii) hypertext markup language (HTML) documents.
3. The document annotation system as set forth in claim 1, wherein the mode selector is switchable during a document annotation to switchably effectuate both (i) rapid training of the annotation model through presentation of low probability proposed annotations in the training mode and (ii) rapid annotation through presentation of high probability proposed annotations in the annotation mode.
4. The document annotation system as set forth in claim 1, wherein (i) in the training mode the graphical user interface requires one or more user operations to make an annotation and (ii) in the annotation mode the graphical user interface requires a single user operation to annotate a plurality of elements.
5. The document annotation system as set forth in claim 1, wherein the probabilistic active learning component comprises a probabilistic classifier that probabilistically classifies unannotated document elements respective to classes corresponding to annotations.
6. The document annotation system as set forth in claim 5, wherein the probabilistic classifier is selected from a group consisting of a k-nearest neighbor classifier, a maximum entropy classifier, and an assembly method classifier.
7. The document annotation system as set forth in claim 1, wherein the request handler further conveys learning requests from the graphical user interface to the active learning component, each learning request using previously annotated documents or document portions for training of the annotation model.
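Claims 1 and 3 to 6 combine two ideas: a probabilistic classifier that attaches a probability to each proposed annotation, and a mode selector that decides which proposals the GUI presents. The Python sketch below uses the k-nearest neighbor option from claim 6, taking the probability of a label as the fraction of the k closest annotated elements that carry it. The (font size, bold flag) features, the 0.8 threshold, the `limit` on training-mode proposals and every function name are assumptions made for illustration; the claims do not specify them.

```python
import math
from collections import Counter

TRAINING, ANNOTATION = "training", "annotation"

def knn_proba(train, x, k=3):
    """k-nearest-neighbour class probabilities: the share of the k closest
    annotated elements that carry each label."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], x))[:k]
    votes = Counter(label for _, label in nearest)
    return {label: n / len(nearest) for label, n in votes.items()}

def propose_all(train, elements, k=3):
    """(element_id, best label, probability of acceptance) for each element."""
    proposals = []
    for eid, x in elements:
        proba = knn_proba(train, x, k)
        best = max(proba, key=proba.get)
        proposals.append((eid, best, proba[best]))
    return proposals

def select(proposals, mode, threshold=0.8, limit=2):
    """Hypothetical mode selector deciding which proposals the GUI presents."""
    if mode == TRAINING:
        # Least confident first: each answer teaches the model the most.
        return sorted(proposals, key=lambda p: p[2])[:limit]
    if mode == ANNOTATION:
        # Confident proposals only, so the user can accept them in one action.
        return [p for p in proposals if p[2] >= threshold]
    raise ValueError(f"unknown mode: {mode!r}")

def accept_all(selected, annotations):
    """Annotation mode: a single user operation annotates every presented element."""
    for eid, label, _ in selected:
        annotations[eid] = label
    return annotations

# Toy features (font size, bold flag); the values are invented for illustration.
TRAIN = [((20, 1), "title"), ((19, 1), "title"), ((18, 1), "title"),
         ((11, 0), "author"), ((10, 0), "author"), ((8, 0), "date"), ((7, 0), "date")]
ELEMENTS = [("e1", (19.5, 1)), ("e2", (9.2, 0)), ("e3", (7.5, 0))]

proposals = propose_all(TRAIN, ELEMENTS)
print(select(proposals, TRAINING, limit=1))  # [('e2', 'author', 0.6666666666666666)]
print(accept_all(select(proposals, ANNOTATION), {}))  # {'e1': 'title'}
```

Switching `mode` between calls mirrors claim 3: training mode spends the user's attention on uncertain elements, while annotation mode lets one confirmation (claim 4) apply every confident proposal at once.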

8. A document annotation system comprising:
  - a graphical user interface for annotating documents, the graphical user interface including at least one user input device and a display device configured to display documents;
  - an active learning component for training an annotation model and for proposing annotations to documents based on the annotation model, the active learning component comprising a probabilistic active learning component that outputs a probability of acceptance associated with each proposed annotation; and
  - an asynchronous request handler configured to convey annotation requests from the graphical user interface to the active learning component and to convey proposed annotations from the active learning component to the graphical user interface, the asynchronous request handler (i) buffering annotation requests conveyed from the graphical user interface to the active learning component and (ii) buffering proposed annotations to documents conveyed from the active learning component to the graphical user interface, wherein the asynchronous request handler comprises a mode selector that selects at least between (i) a training mode in which low probability proposed annotations are presented by the graphical user interface and (ii) an annotation mode in which high probability proposed annotations are presented by the graphical user interface.

Dependent claims (9, 10, 11):
9. The document annotation system as set forth in claim 8, wherein the asynchronous request handler further comprises a domain expert request handler for conveying evaluation requests from the graphical user interface to a human domain expert and for conveying responses from the human domain expert to the graphical user interface.
10. The document annotation system as set forth in claim 9, wherein the evaluation request conveyed by the domain expert request handler includes (i) at least one proposed annotation to a document generated by the active learning component and (ii) the document or a link to the document.
11. The document annotation system as set forth in claim 10, wherein the domain expert request handler comprises an automated email message generator that generates an email addressed to the domain expert and having content including at least (i) the at least one proposed annotation and (ii) the document or the link to the document.
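Claim 8's asynchronous request handler buffers traffic in both directions, and claims 9 to 11 add an automatically generated email to a domain expert. A minimal Python sketch, assuming two standard-library `queue.Queue` buffers, one background worker thread, and `email.message.EmailMessage` for the message, is shown below. The class and function names, the `toy_propose` stand-in for the active learning component, and the addresses are hypothetical, and the email is only built, not sent.

```python
import queue
import threading
from email.message import EmailMessage

class AsyncRequestHandler:
    """Buffers GUI requests and learner proposals so neither side blocks the other."""

    def __init__(self, propose):
        self.requests = queue.Queue()   # buffer: GUI -> active learning component
        self.proposals = queue.Queue()  # buffer: active learning component -> GUI
        self._propose = propose         # assumed callable: document text -> proposals
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def submit(self, doc_id, document):
        """Called by the GUI; returns at once instead of waiting for the model."""
        self.requests.put((doc_id, document))

    def _run(self):
        while True:
            item = self.requests.get()
            if item is None:  # sentinel posted by close()
                break
            doc_id, document = item
            self.proposals.put((doc_id, self._propose(document)))

    def close(self):
        self.requests.put(None)
        self._worker.join()

def expert_email(expert, proposals, doc_link):
    """Build an evaluation request listing proposed annotations and a document link."""
    msg = EmailMessage()
    msg["To"] = expert
    msg["Subject"] = "Annotation evaluation request"
    lines = [f"- {label} (p={p:.2f}): {text!r}" for text, label, p in proposals]
    msg.set_content("Please review:\n" + "\n".join(lines) + f"\nDocument: {doc_link}\n")
    return msg

def toy_propose(document):
    """Hypothetical stand-in for the active learning component."""
    return [(line, "title" if line.isupper() else "para", 0.9)
            for line in document.splitlines()]

handler = AsyncRequestHandler(toy_propose)
handler.submit("doc-1", "INTRODUCTION\nSome text")
doc_id, props = handler.proposals.get(timeout=5)  # GUI polls the outgoing buffer
handler.close()
print(expert_email("expert@example.com", props, "https://example.com/doc-1"))
```

The queues let the GUI keep accepting user input while the model is still computing proposals; a real system would hand the built message to a mail transport rather than print it.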

Current Assignee
Xerox Corporation (Xerox Holdings Corp.)
Original Assignee
Xerox Corporation (Xerox Holdings Corp.)
Inventors
Chidlovskii, Boris; Jacquin, Thierry
Primary Examiner(s)
Mills, Frank D

Application Number

US11/316,771
Publication Number

US 20070150801A1
Time in Patent Office

3,063 Days
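The 3,063-day figure matches the number of calendar days between the filing date (12/23/2005) and the issue date (05/13/2014). Assuming that is how "Time in Patent Office" is defined, Python's `datetime` reproduces it:

```python
from datetime import date

filed = date(2005, 12, 23)   # Filed
issued = date(2014, 5, 13)   # Issued
days = (issued - filed).days
print(f"{days:,} days")  # 3,063 days
```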
Field of Search

715/231, 715/241
US Class Current

715/231
CPC Class Codes

G06F 18/2185   the supervisor being an aut...

G06F 18/41   Interactive pattern learnin...

G06F 40/143   Markup, e.g. Standard Gener...

G06F 40/169   Annotation, e.g. comment da...
