Systems and methods for selecting machine learning training data
First Claim
1. An entity resolution system utilizing active learning for training a machine learning model of the entity resolution system, the entity resolution system comprising:
one or more processors and a memory storing instructions that, when executed by the one or more processors, cause the system to:
obtain a machine learning model and a training dataset, the training dataset including a plurality of training examples, each training example of at least a portion of the training examples including one or more records, each record including an entity identification field and an entity location field;
determine uncertainty scores for the plurality of training examples according to the machine learning model;
select a first example batch from the plurality of training examples according to uncertainty scores of the plurality of training examples;
update the machine learning model according to at least one labeled training example of the first example batch;
determine updated uncertainty scores for the plurality of training examples according to the updated machine learning model;
select a second example batch from the plurality of training examples according to the updated uncertainty scores of the plurality of training examples;
update the machine learning model according to at least one labeled training example of the second example batch;
resolving, based at least in part on the machine learning model updated according to the at least one labeled training example of the second example batch, matching entities associated with one or more sets of sets of records, the one or more sets of records including at least a portion of the training dataset.
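The steps recited in the claim can be read as an uncertainty-sampling loop: score, select a batch, label, update, and repeat. The Python sketch below is one possible illustration, not the claimed implementation. The logistic model, the string-similarity features, the `name` and `location` record keys, and the `oracle` callback standing in for the human labeler are all assumptions introduced here.

```python
import math
from difflib import SequenceMatcher

def features(pair):
    """Similarity of the identification and location fields of two records."""
    a, b = pair
    return [
        SequenceMatcher(None, a["name"].lower(), b["name"].lower()).ratio(),
        SequenceMatcher(None, a["location"].lower(), b["location"].lower()).ratio(),
    ]

def predict(weights, x):
    """Logistic model (assumed): probability that the two records match."""
    z = weights[0] + sum(w * xi for w, xi in zip(weights[1:], x))
    return 1.0 / (1.0 + math.exp(-z))

def uncertainty(weights, x):
    """1.0 when the model predicts 0.5, falling to 0.0 as it nears 0 or 1."""
    return 1.0 - abs(2.0 * predict(weights, x) - 1.0)

def select_batch(weights, examples, labeled, batch_size):
    """Indices of the most uncertain examples that are not yet labeled."""
    candidates = [i for i in range(len(examples)) if i not in labeled]
    candidates.sort(
        key=lambda i: uncertainty(weights, features(examples[i])), reverse=True
    )
    return candidates[:batch_size]

def update(weights, examples, labeled, epochs=200, lr=0.5):
    """Refit by plain gradient descent on every label collected so far."""
    w = list(weights)
    for _ in range(epochs):
        for i, y in labeled.items():
            x = features(examples[i])
            err = predict(w, x) - y
            w[0] -= lr * err
            for j, xi in enumerate(x):
                w[j + 1] -= lr * err * xi
    return w

def active_learning(examples, oracle, rounds=2, batch_size=2):
    """Score, select a batch, label it, update the model; repeat per round."""
    weights = [0.0, 0.0, 0.0]  # bias, name weight, location weight
    labeled = {}               # example index -> 0/1 label from the oracle
    for _ in range(rounds):
        batch = select_batch(weights, examples, labeled, batch_size)
        for i in batch:
            labeled[i] = oracle(examples[i])
        weights = update(weights, examples, labeled)
    return weights, labeled
```

With `rounds=2`, the loop mirrors the first and second example batches of the claim: the second batch is chosen using scores from the model as updated on the first.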
Abstract
Systems and methods are provided for selecting training examples to increase the efficiency of supervised active machine learning processes. Training examples for presentation to a user may be selected according to a measure of the model's uncertainty in labeling the examples. A number of training examples may be selected to increase efficiency between the user and the processing system by selecting the number of training examples to minimize user downtime in the machine learning process.
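The abstract does not state how the number of examples is chosen. As a hedged illustration only, the sketch below assumes the system updates the model in the background while the user labels the next batch, so the batch should take the user at least as long to label as one update takes to run. The function name, its parameters, and the clamping bounds are hypothetical.

```python
import math

def choose_batch_size(update_seconds, seconds_per_label, min_size=1, max_size=100):
    """Smallest batch whose labeling time covers one model update (assumed rule)."""
    if update_seconds < 0 or seconds_per_label <= 0:
        raise ValueError("update_seconds must be >= 0 and seconds_per_label > 0")
    # Enough examples that the user is still labeling when the update finishes.
    size = math.ceil(update_seconds / seconds_per_label)
    # Clamp so the batch is never empty and never unreasonably large.
    return max(min_size, min(max_size, size))
```

For example, under these assumptions a 30-second update and 4 seconds per label give a batch of 8 examples.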
20 Claims
11. A method for entity resolution utilizing active learning for training a machine learning model, the method being performed on a computer system having one or more physical processors programmed with computer program instructions that, when executed by the one or more physical processors, cause the computer system to perform the method, the method comprising:
obtaining, by the computer system, a machine learning model and a training dataset, the training dataset including a plurality of training examples, each training example of at least a portion of the training examples including one or more records, each record including an entity identification field and an entity location field;
determining, by the computer system, uncertainty scores for the plurality of training examples according to the machine learning model;
selecting, by the computer system, a first example batch from the plurality of training examples according to uncertainty scores of the plurality of training examples;
updating, by the computer system, the machine learning model according to at least one labeled training example of the first example batch;
determining, by the computer system, updated uncertainty scores for the plurality of training examples according to the updated machine learning model;
selecting, by the computer system, a second example batch from the plurality of training examples according to the updated uncertainty scores of the plurality of training examples;
updating, by the computer system, the machine learning model according to at least one labeled training example of the second example batch;
resolving, by the computer system based at least in part on the machine learning model updated according to the at least one labeled training example of the second example batch, matching entities associated with one or more sets of sets of records, the one or more sets of records including at least a portion of the training dataset.
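The final resolving step is not spelled out in the claim. One common way to turn pairwise match predictions into resolved entities is to merge every pair the model scores above a threshold, using a union-find structure. The sketch below assumes that approach; `resolve_entities`, the `(id_a, id_b, probability)` tuple layout, and the 0.5 threshold are hypothetical choices made for illustration.

```python
def resolve_entities(record_ids, scored_pairs, threshold=0.5):
    """Group records into entities by merging pairs scored at or above threshold."""
    parent = {r: r for r in record_ids}

    def find(r):
        # Follow parent links to the group root, shortening the path as we go.
        while parent[r] != r:
            parent[r] = parent[parent[r]]
            r = parent[r]
        return r

    for a, b, match_probability in scored_pairs:
        if match_probability >= threshold:
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                parent[root_b] = root_a

    groups = {}
    for r in record_ids:
        groups.setdefault(find(r), []).append(r)
    return sorted(groups.values())
```

Because merging is transitive, two records can end up in the same entity without ever being compared directly, which may or may not be desirable for a given dataset.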
Specification