Method for transforming data elements within a classification system based in part on input from a human annotator or expert

US 8,612,373 B2
Filed: 06/03/2010
Issued: 12/17/2013
Est. Priority Date: 12/14/2006
Status: Active Grant

6 Assignments

0 Petitions

Abstract

A method is provided for transforming data elements within a classification system based in part on input from a human annotator or expert. A first concept evolution model as a training set is composed from a first set of selectively determinable annotations and the first concept evolution model. A trained model is generated after training a learning algorithm with the training set and the concept evolution model. A confidence factor is computed that a predicted annotation is accurately identified. A selected element instance and a corresponding suggested annotation are identified to have a low confidence factor. The classifying of the applied annotation is adjusted where a second concept evolution model is composed for more accurate classifying of the data item.
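The loop the abstract describes (train, predict, score confidence, route low-confidence suggestions to an annotator) can be sketched as follows. This is a minimal illustration, not the patented implementation: the token-count classifier, the margin-based confidence measure, the `threshold` value, and all function names are assumptions introduced here, and the margin is a stand-in rather than the confidence formula the patent defines.

```python
import math
from collections import Counter, defaultdict


def train(labeled):
    """Build a tiny naive Bayes model from (tokens, class_label) pairs."""
    token_counts = defaultdict(Counter)
    class_counts = Counter()
    for tokens, label in labeled:
        class_counts[label] += 1
        token_counts[label].update(tokens)
    vocab = {t for counts in token_counts.values() for t in counts}
    return token_counts, class_counts, vocab


def predict_proba(model, tokens):
    """Return a {class_label: probability} dict using add-one smoothing."""
    token_counts, class_counts, vocab = model
    total = sum(class_counts.values())
    log_scores = {}
    for label, n in class_counts.items():
        denom = sum(token_counts[label].values()) + len(vocab)
        score = math.log(n / total)
        for t in tokens:
            score += math.log((token_counts[label][t] + 1) / denom)
        log_scores[label] = score
    # Normalize in a numerically stable way.
    top = max(log_scores.values())
    exp = {k: math.exp(v - top) for k, v in log_scores.items()}
    z = sum(exp.values())
    return {k: v / z for k, v in exp.items()}


def confidence(probs):
    """Assumed measure: margin between the two most probable classes."""
    ranked = sorted(probs.values(), reverse=True)
    return ranked[0] - (ranked[1] if len(ranked) > 1 else 0.0)


def suggest_for_review(model, unlabeled, threshold=0.5):
    """Return (tokens, suggested_label, confidence) for uncertain elements,
    least confident first, so an annotator can confirm or correct them."""
    queue = []
    for tokens in unlabeled:
        probs = predict_proba(model, tokens)
        label = max(probs, key=probs.get)
        conf = confidence(probs)
        if conf < threshold:
            queue.append((tokens, label, conf))
    return sorted(queue, key=lambda item: item[2])
```

In this sketch, corrections collected from the review queue would be appended to the labeled set and `train` called again, which corresponds to composing a second model from the adjusted annotations.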

9 Claims

1. A method for evolving an annotating model for classifying a document or a data item therein, comprising:
- composing a first concept evolution model as a training set comprised of a first set of selectively determinable class labels of element instances within the document that are detectable within the document to produce a result of predicting class labels to be assigned to unlabeled element instances and the first concept evolution model;
- training a learning algorithm with the training set and the concept evolution model to generate a trained model wherein the learning algorithm comprises a global approach to reshape a list of the classes and adjusts the set of features, or wherein the learning algorithm comprises a local approach that creates a local model of one or few events, the definition set of classes remains unchanged, and the training set can be extended with new examples;
- using the trained model to predict class labels for unlabeled element instances within the document;
- computing a confidence factor for a predicted class label is accurately predicted for unlabeled elements;
- identifying an unlabeled element instance within the document with a corresponding suggested annotation having a confidence factor less than zero; and
- adjusting the classifying of the unlabeled element instance wherein a second concept evolution model is composed for more accurate classifying of the document, and wherein the composing and applying are executed by a designer of the annotating model and the computing is machine implemented.

2. The method of claim 1 wherein the composing comprises associating a class with detectable annotations.
3. The method of claim 2 wherein the computing comprises determining a probability that a detected annotation corresponds to a class, and when the probabilities for all classes correspond to the confidence factor satisfying the predetermined condition of the uncertainty, suggesting annotating of the class to an annotator or expert.
4. The method of claim 3 wherein the adjusting comprises the local approach concept evaluation comprising associating a local model for each evolution event including a concept evolution command.
5. The method of claim 4 wherein the associating a local model comprises corresponding an event model to an internal node of a concept evolution DAG.
6. The method of claim 3 wherein the adjusting comprises a global approach concept evolution including associating a global model for a most recent changing of the associate features for the predicted class comprising issuing of a concept evolution command by the annotator or expert.
7. The method of claim 6 wherein the associating a global model comprises changing the set of classes in accordance with the issued concept evolution command and removing annotations for the data items that are obsolete from the changing.
8. The method according to claim 1, wherein the confidence factor is calculated using the formula:
9. The method according to claim 1, wherein the confidence factor is normalized using the formula:
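
The contrast between the global and local approaches in the dependent claims can be illustrated with a small sketch. Everything here is an assumption made for illustration: the `ConceptDAG` class, the `split` command name, and the dictionary-based annotation store are hypothetical, and the reading of the local approach (existing annotations left untouched, a local training set attached to the internal node and extended with new examples) is one interpretation of the claim language, not a confirmed implementation.

```python
from dataclasses import dataclass, field


@dataclass
class ConceptDAG:
    """Hypothetical concept evolution DAG: class name -> refined classes."""
    children: dict = field(default_factory=dict)

    def add_class(self, name):
        self.children.setdefault(name, [])

    def split(self, parent, new_classes):
        """Assumed concept evolution command: refine `parent` into new classes."""
        if parent not in self.children:
            raise KeyError(parent)
        for name in new_classes:
            self.add_class(name)
            if name not in self.children[parent]:
                self.children[parent].append(name)

    def leaves(self):
        """Classes that have not been refined further."""
        return sorted(c for c, kids in self.children.items() if not kids)

    def internal_nodes(self):
        """Classes that were refined by at least one evolution command."""
        return sorted(c for c, kids in self.children.items() if kids)


def apply_global(annotations, dag, parent, new_classes):
    """Global approach (sketch): change the class set and drop annotations
    that carry the now-obsolete label, so they can be re-annotated."""
    dag.split(parent, new_classes)
    return {item: label for item, label in annotations.items() if label != parent}


def apply_local(annotations, dag, local_training, parent, new_examples):
    """Local approach (sketch): keep existing annotations as they are and
    extend a local training set attached to the internal node `parent`."""
    dag.split(parent, sorted(set(new_examples.values())))
    local_training.setdefault(parent, {}).update(new_examples)
    return annotations
```

Under these assumptions, the global variant trades re-annotation effort for a single consistent model, while the local variant preserves existing work and trains one small model per evolution event.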

Current Assignee: Xerox Corporation (Xerox Holdings Corp.)
Original Assignee: Xerox Corporation (Xerox Holdings Corp.)
Inventors: Chidlovskii, Boris
Primary Examiner(s): Chaki, Kakali
Assistant Examiner(s): Tran, Mai T
Application Number: US12/792,973
Publication Number: US 20100306141A1
Time in Patent Office: 1,293 Days
Field of Search: 706/20
US Class Current: 706/20
CPC Class Codes:
G06F 16/35 Clustering; Classification
G06N 20/00 Machine learning
