System and method for incremental annotation of datasets

US 10,496,369 B2
Filed: 07/30/2018
Issued: 12/03/2019
Est. Priority Date: 07/31/2017
Status: Active Grant

1 Assignment

0 Petitions

Abstract

Systems and methods for incremental annotation of datasets are provided. For example, a group of labeled examples and a group of unlabeled examples may be obtained, a first inference model may be generated using the group of labeled examples, labels may be assigned to at least part of the group of unlabeled examples using the first inference model, confidence levels may be assigned to the assigned labels, a subset of the group of unlabeled examples may be selected using the confidence levels, and in some cases a second inference model may be generated using the selected subset and/or the corresponding assigned labels.
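
The first round described in the abstract (train on the labeled examples, label the unlabeled ones, score each label's confidence, keep a confident subset, retrain) can be sketched in a few lines of Python. This is a minimal, hypothetical sketch: the patent does not specify a model or a confidence measure, so a nearest-centroid classifier with a softmax over negative squared distances stands in for the inference model. The names `train_centroids`, `predict_with_confidence`, `self_training_round`, and the `threshold` parameter are illustrative assumptions.

```python
import math
from collections import defaultdict


def train_centroids(examples):
    """Stand-in inference model (an assumption): one mean feature vector per label.

    `examples` is a list of (features, label) pairs with equal-length features.
    """
    sums = {}
    counts = defaultdict(int)
    for features, label in examples:
        if label not in sums:
            sums[label] = [0.0] * len(features)
        sums[label] = [s + v for s, v in zip(sums[label], features)]
        counts[label] += 1
    return {label: [s / counts[label] for s in total] for label, total in sums.items()}


def predict_with_confidence(model, features):
    """Assign the nearest-centroid label; confidence is a softmax over negative squared distances."""
    scores = {
        label: -sum((a - b) ** 2 for a, b in zip(features, centroid))
        for label, centroid in model.items()
    }
    best = max(scores.values())
    weights = {label: math.exp(s - best) for label, s in scores.items()}
    label = max(weights, key=weights.get)
    return label, weights[label] / sum(weights.values())


def self_training_round(labeled, unlabeled, threshold=0.9):
    """One round: first model -> labels + confidences -> confident subset -> second model."""
    first_model = train_centroids(labeled)
    predictions = [(x, *predict_with_confidence(first_model, x)) for x in unlabeled]
    # Most confident first; Python's sort is stable, so ties keep their input order.
    predictions.sort(key=lambda p: p[2], reverse=True)
    selected = [(x, label) for x, label, conf in predictions if conf >= threshold]
    # Claim 1 requires at least one unlabeled example to stay out of the subset.
    if selected and len(selected) == len(predictions):
        selected = selected[:-1]
    remaining = [x for x, _, _ in predictions[len(selected):]]
    second_model = train_centroids(labeled + selected)
    return second_model, selected, remaining


labeled = [([0.0, 0.0], "cat"), ([0.2, 0.1], "cat"), ([5.0, 5.0], "dog"), ([5.2, 4.9], "dog")]
second_model, selected, remaining = self_training_round(labeled, [[0.1, 0.0], [5.1, 5.0], [2.5, 2.5]])
print(selected)   # [([0.1, 0.0], 'cat'), ([5.1, 5.0], 'dog')]
print(remaining)  # [[2.5, 2.5]]
```

The ambiguous point `[2.5, 2.5]` lies almost midway between the two centroids, so its confidence (about 0.73) falls below the assumed threshold and it stays unlabeled for a later round or for a human annotator.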

18 Claims

1. A system for incremental annotation of datasets, the system comprising:
   - at least one storage device configured to store a plurality of labeled examples and a plurality of unlabeled examples; and
   - at least one processor configured to:
     - use the plurality of labeled examples to generate a first inference model;
     - use the first inference model to assign labels to at least part of the unlabeled examples;
     - calculate confidence levels corresponding to the assigned labels;
     - use the confidence levels to select a subset of the plurality of unlabeled examples, where at least one of the plurality of unlabeled examples is not included in the selected subset of the plurality of unlabeled examples;
     - generate a second inference model based on the plurality of labeled examples, the selected subset of the plurality of unlabeled examples, and the assigned labels corresponding to the selected subset of the plurality of unlabeled examples;
     - use the confidence levels to select a user of a plurality of alternative users;
     - provide a request to the selected user to assign labels;
     - in response to the request, receive from the user an assignment of labels to one or more of the unlabeled examples; and
     - generate a third inference model based on the plurality of labeled examples, the one or more of the unlabeled examples, and the assignment of labels received from the user.
2. The system of claim 1, wherein the at least one processor is further configured to:
   - use the confidence levels to determine that user intervention is required; and
   - based on said determination, provide the request for the selected user to assign labels.
3. The system of claim 1, wherein the selection of the subset of the plurality of unlabeled examples is further based on the assigned labels.
4. The system of claim 1, wherein a size of the subset of the plurality of unlabeled examples depends on a size of the plurality of labeled examples.
5. The system of claim 1, wherein the assigned labels comprise images.
6. The system of claim 1, wherein the at least one processor is further configured to:
   - remove the selected subset of the plurality of unlabeled examples out of the plurality of unlabeled examples to obtain an updated plurality of unlabeled examples;
   - use the second inference model to assign labels to at least part of the updated plurality of unlabeled examples;
   - calculate confidence levels corresponding to the labels assigned using the second inference model; and
   - based on the confidence levels corresponding to the labels assigned using the second inference model, select a subset of the updated plurality of unlabeled examples, wherein the generation of the third inference model is further based on the labels assigned using the second inference model and corresponding to the selected subset of the updated plurality of unlabeled examples.
7. The system of claim 1, wherein the at least one processor is further configured to:
   - until at least a selected number of the plurality of unlabeled examples are included in at least one selected subset, repeat the following steps:
     - using the second inference model to update at least part of the assigned labels;
     - recalculating at least part of the confidence levels based on the assigned labels;
     - based on the confidence levels, selecting a subset of the plurality of unlabeled examples; and
     - updating the second inference model based on at least part of the selected subsets.
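
Claims 1 and 2 use the confidence levels for two further decisions: whether a person is needed at all, and which of several alternative users receives the labeling request. The claims do not say how either decision is made. The following is a hypothetical sketch only: the `Annotator` class, its `skill` rating, the thresholds `low` and `max_low_fraction`, the `batch_size` parameter, and the rule "route the least confident examples to the least-skilled annotator who is still qualified" are all illustrative assumptions, not the patented method.

```python
from dataclasses import dataclass


@dataclass
class Annotator:
    name: str
    skill: float  # hypothetical expertise rating in [0, 1]; higher means more expert


def needs_user_intervention(confidences, low=0.6, max_low_fraction=0.2):
    """Hypothetical rule (cf. claim 2): ask a person when too many labels are uncertain."""
    if not confidences:
        return False
    n_low = sum(1 for c in confidences if c < low)
    return n_low / len(confidences) > max_low_fraction


def select_annotator(confidences, annotators):
    """Hypothetical rule (cf. claim 1): lower model confidence calls for more expertise.

    Returns the least-skilled annotator whose skill covers the required level,
    or the most skilled annotator if nobody does.
    """
    required = 1.0 - min(confidences)
    qualified = [a for a in annotators if a.skill >= required]
    if qualified:
        return min(qualified, key=lambda a: a.skill)
    return max(annotators, key=lambda a: a.skill)


def build_request(examples, confidences, annotators, batch_size=2):
    """Return (annotator, examples to label), or None if no intervention is needed."""
    if not needs_user_intervention(confidences):
        return None
    # Send the least confident examples first.
    order = sorted(range(len(examples)), key=lambda i: confidences[i])
    batch = [examples[i] for i in order[:batch_size]]
    return select_annotator(confidences, annotators), batch


annotators = [Annotator("crowd", 0.5), Annotator("analyst", 0.7), Annotator("expert", 0.95)]
who, batch = build_request(["img_001", "img_002", "img_003", "img_004"], [0.95, 0.55, 0.4, 0.9], annotators)
print(who.name, batch)  # analyst ['img_003', 'img_002']
```

Choosing the *least* skilled qualified annotator is one way to keep expert time for the hardest examples; a different cost model (for example, per-user workload, which the CPC code G06F 9/505 hints at) would change only `select_annotator`.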

8. A method for incremental annotation of datasets, the method comprising:
   - accessing a plurality of labeled examples and a plurality of unlabeled examples;
   - using the plurality of labeled examples to generate a first inference model;
   - using the first inference model to assign labels to at least part of the unlabeled examples;
   - calculating confidence levels corresponding to the assigned labels;
   - using the confidence levels to select a subset of the plurality of unlabeled examples, where at least one of the plurality of unlabeled examples is not included in the selected subset of the plurality of unlabeled examples;
   - generating a second inference model based on the plurality of labeled examples, the selected subset of the plurality of unlabeled examples, and the assigned labels corresponding to the selected subset of the plurality of unlabeled examples;
   - using the confidence levels to select a user of a plurality of alternative users;
   - providing a request to the selected user to assign labels;
   - in response to the request, receiving from the user an assignment of labels to one or more of the unlabeled examples; and
   - generating a third inference model based on the plurality of labeled examples, the one or more of the unlabeled examples, and the assignment of labels received from the user.
9. The method of claim 8, further comprising:
   - using the confidence levels to determine that user intervention is required; and
   - based on said determination, providing the request for the selected user to assign labels.
10. The method of claim 8, wherein the selection of the subset of the plurality of unlabeled examples is further based on the assigned labels.
11. The method of claim 8, wherein the subset of the plurality of unlabeled examples is a single unlabeled example.
12. The method of claim 8, wherein the subset of the plurality of unlabeled examples comprises at least two unlabeled examples.
13. The method of claim 8, wherein a size of the subset of the plurality of unlabeled examples depends on a size of the plurality of labeled examples.
14. The method of claim 8, wherein a size of the subset of the plurality of unlabeled examples depends on a size of the plurality of unlabeled examples.
15. The method of claim 8, wherein the assigned labels comprise images.
16. The method of claim 8, further comprising:
    - removing the selected subset of the plurality of unlabeled examples out of the plurality of unlabeled examples to obtain an updated plurality of unlabeled examples;
    - using the second inference model to assign labels to at least part of the updated plurality of unlabeled examples;
    - calculating confidence levels corresponding to the labels assigned using the second inference model; and
    - based on the confidence levels corresponding to the labels assigned using the second inference model, selecting a subset of the updated plurality of unlabeled examples, wherein the generation of the third inference model is further based on the labels assigned using the second inference model and corresponding to the selected subset of the updated plurality of unlabeled examples.
17. The method of claim 8, further comprising:
    - until at least a selected number of the plurality of unlabeled examples are included in at least one selected subset, repeating the following steps:
      - using the second inference model to update at least part of the assigned labels;
      - recalculating at least part of the confidence levels based on the assigned labels;
      - based on the confidence levels, selecting a subset of the plurality of unlabeled examples; and
      - updating the second inference model based on at least part of the selected subsets.
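
Claims 13, 16 and 17 describe the iterative part of the method: the subset size may depend on the size of the labeled set, selected examples are removed from the unlabeled pool, and the label, score, select and retrain steps repeat until a chosen number of examples has been selected. A minimal sketch under stated assumptions: a one-dimensional nearest-centroid model stands in for the inference model, and the names `fit`, `predict`, `incremental_annotation`, and the `target` and `ratio` parameters are hypothetical.

```python
import math


def fit(train):
    """Stand-in 1-D nearest-centroid model (an assumption): label -> mean feature value."""
    groups = {}
    for x, label in train:
        groups.setdefault(label, []).append(x)
    return {label: sum(xs) / len(xs) for label, xs in groups.items()}


def predict(model, x):
    """Return (label, confidence) via a softmax over negative squared distances."""
    scores = {label: -(x - c) ** 2 for label, c in model.items()}
    best = max(scores.values())
    weights = {label: math.exp(s - best) for label, s in scores.items()}
    label = max(weights, key=weights.get)
    return label, weights[label] / sum(weights.values())


def incremental_annotation(labeled, unlabeled, target, ratio=0.5):
    """Repeat label -> score -> select -> retrain until `target` examples were selected.

    The per-round subset size depends on the size of the labeled set (cf. claim 13);
    selected examples leave the pool (cf. claim 16); the stop rule follows claim 17.
    """
    k = max(1, int(ratio * len(labeled)))  # hypothetical size rule
    pool = list(unlabeled)
    pseudo_labeled = []
    model = fit(labeled)  # first inference model
    while len(pseudo_labeled) < target and pool:
        # (Re)assign labels to the remaining pool with the current model, most confident first.
        scored = sorted(((x, *predict(model, x)) for x in pool), key=lambda t: t[2], reverse=True)
        pseudo_labeled.extend((x, label) for x, label, _ in scored[:k])
        pool = [x for x, _, _ in scored[k:]]
        # Update the (second) inference model on the labeled set plus all selected subsets.
        model = fit(labeled + pseudo_labeled)
    return model, pseudo_labeled, pool


labeled = [(0.0, "low"), (1.0, "low"), (9.0, "high"), (10.0, "high")]
model, pseudo, pool = incremental_annotation(labeled, [0.5, 9.5, 5.2, 2.0, 8.0], target=4)
print(sorted(pseudo))  # [(0.5, 'low'), (2.0, 'low'), (8.0, 'high'), (9.5, 'high')]
print(pool)            # [5.2]
```

With four labeled examples and `ratio=0.5`, each round selects two examples; after two rounds the target is reached and the borderline value 5.2 is left in the pool, where it would be a natural candidate for the user request of claim 8.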

18. A non-transitory computer readable medium storing data and computer implementable instructions for carrying out a method for incremental annotation of datasets, the method comprising:
    - accessing a plurality of labeled examples and a plurality of unlabeled examples;
    - using the plurality of labeled examples to generate a first inference model;
    - using the first inference model to assign labels to at least part of the unlabeled examples;
    - calculating confidence levels corresponding to the assigned labels;
    - using the confidence levels to select a subset of the plurality of unlabeled examples, where at least one of the plurality of unlabeled examples is not included in the selected subset of the plurality of unlabeled examples;
    - generating a second inference model based on the plurality of labeled examples, the selected subset of the plurality of unlabeled examples, and the assigned labels corresponding to the selected subset of the plurality of unlabeled examples;
    - using the confidence levels to select a user of a plurality of alternative users;
    - providing a request to the selected user to assign labels;
    - in response to the request, receiving from the user an assignment of labels to one or more of the unlabeled examples; and
    - generating a third inference model based on the plurality of labeled examples, the one or more of the unlabeled examples, and the assignment of labels received from the user.
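
Claims 3 and 10 add that the subset selection may also depend on the assigned labels themselves, without saying how. One hypothetical reading is a per-label quota, so that a class the model finds easy does not fill the whole subset. The function name `select_per_label` and its `per_label` and `min_confidence` parameters are illustrative assumptions.

```python
from collections import defaultdict


def select_per_label(predictions, per_label=1, min_confidence=0.0):
    """Hypothetical label-aware selection (cf. claims 3 and 10).

    `predictions` is a list of (example, label, confidence) triples. For each
    assigned label, keep at most `per_label` of the most confident examples
    whose confidence is at least `min_confidence`.
    """
    by_label = defaultdict(list)
    for example, label, confidence in predictions:
        if confidence >= min_confidence:
            by_label[label].append((confidence, example))
    subset = []
    for label in sorted(by_label):
        ranked = sorted(by_label[label], key=lambda t: t[0], reverse=True)
        subset.extend((example, label) for _, example in ranked[:per_label])
    return subset


preds = [("a", "cat", 0.99), ("b", "cat", 0.97), ("c", "cat", 0.95), ("d", "dog", 0.80), ("e", "dog", 0.60)]
print(select_per_label(preds, per_label=1, min_confidence=0.7))  # [('a', 'cat'), ('d', 'dog')]
```

A plain global top-2 by confidence would have returned two "cat" examples here; the quota trades a little confidence for class coverage.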

Current Assignee: Allegro Artificial Intelligence Ltd.
Original Assignee: Allegro Artificial Intelligence Ltd.
Inventors: Guttmann, Moshe
Primary Examiner(s): Hicks, Austin

Application Number: US16/048,377
Publication Number: US 20180336481A1
Time in Patent Office: 491 Days

CPC Class Codes

G06F 16/2379   Updates performed during on...

G06F 16/24565   Triggers; Constraints

G06F 16/285   Clustering or classification

G06F 18/214   Generating training pattern...

G06F 18/217   Validation; Performance eva...

G06F 21/6218   to a system of files or obj...

G06F 7/14   Merging, i.e. combining at ...

G06F 9/505   considering the load

G06N 20/00   Machine learning

G06N 20/10   using kernel methods, e.g. ...

G06N 3/044   Recurrent networks, e.g. Ho...

G06N 3/045   Combinations of networks

G06N 3/048   Activation functions

G06N 3/08   Learning methods

G06N 3/082   modifying the architecture,...

G06N 3/084   Backpropagation, e.g. using...

G06N 5/022   Knowledge engineering; Know...

G06N 5/04   Inference or reasoning models

G06N 5/046   Forward inferencing; Produc...

G06N 7/01   Probabilistic graphical mod...

G06Q 10/06311   Scheduling, planning or tas...

H04L 63/0823   using certificates cryptogr...

H04L 63/102   Entity profiles

H04N 23/661   Transmitting camera control...
