Training Machine Learning Algorithms with Temporally Variant Personal Data, and Applications Thereof
First Claim
1. A computer-implemented method for training a machine learning algorithm with temporally variant personal data, comprising:
- (a) at a plurality of times, monitoring a data source to determine whether data relating to a person has updated;
(b) when data for the person has been updated, storing the updated data in a database such that the database includes a running log specifying how the person'"'"'s data has changed over time, wherein the person'"'"'s data includes values for a plurality of properties relating to the person;
(c) receiving an indication that a value for the particular property in the person'"'"'s data was verified as accurate or inaccurate at a particular time;
(d) retrieving, from the database based on the particular time, the person'"'"'s data, including values for the plurality of properties, that were up-to-date at the particular time; and
(e) training a model using the retrieved data and the indication such that the model can predict whether another person'"'"'s value for the particular property is accurate, whereby having the retrieved data be current to the particular time maintains the retrieved data'"'"'s significance in training the model.
2 Assignments
0 Petitions
Accused Products
Abstract
To train models, training data is needed. As personal data changes over time, the training data can get stale, obviating its usefulness in training the model. Embodiments deal with this by developing a database with a running log specifying how each person'"'"'s data changes at the time. When data is ingested, it may not he normalized. To deal with this, embodiments clean the data to ensure the ingested data fields are normalized. Finally, the various tasks needed to train the model and solve for accuracy of personal data can quickly become cumbersome to a computing device. They can conflict with one another and compete inefficiently for computing resources, such as processor power and memory capacity. To deal with these issues, a scheduler is employed to queue the various tasks involved.
-
Citations
22 Claims
-
1. A computer-implemented method for training a machine learning algorithm with temporally variant personal data, comprising:
-
(a) at a plurality of times, monitoring a data source to determine whether data relating to a person has updated; (b) when data for the person has been updated, storing the updated data in a database such that the database includes a running log specifying how the person'"'"'s data has changed over time, wherein the person'"'"'s data includes values for a plurality of properties relating to the person; (c) receiving an indication that a value for the particular property in the person'"'"'s data was verified as accurate or inaccurate at a particular time; (d) retrieving, from the database based on the particular time, the person'"'"'s data, including values for the plurality of properties, that were up-to-date at the particular time; and (e) training a model using the retrieved data and the indication such that the model can predict whether another person'"'"'s value for the particular property is accurate, whereby having the retrieved data be current to the particular time maintains the retrieved data'"'"'s significance in training the model. - View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9)
-
-
10. A non-transitory program storage device having instructions stored thereon that, when executed by at least one computing device, causes the at least one computing device to perform a method for training a machine learning algorithm with temporally variant personal data, the method comprising:
-
(a) at a plurality of times, monitoring a data source to determine whether data relating to a person has updated; (b) when data for the person has been updated, storing the updated data in a database such that the database includes a running log specifying how the person'"'"'s data has changed over time, wherein the person'"'"'s data includes values for a plurality of properties relating to the person; (c) receiving an indication that a value for the particular property in the person'"'"'s data was verified as accurate or inaccurate at a particular time; (d) retrieving, from the database based on the particular time, the person'"'"'s data, including values for the plurality of properties, that were up-to-date at the particular time; and (e) training a model using the retrieved data and the indication such that the model can predict whether another person'"'"'s value for the particular property is accurate, whereby having the retrieved data be current to the particular time maintains the retrieved data'"'"'s significance in training the model. - View Dependent Claims (11, 12, 13, 14, 15, 16, 17, 18, 22)
-
-
19. A system for training a machine learning algorithm with temporally variant personal data, comprising:
-
a computing device; a database that includes a running log specifying how a person'"'"'s data has changed over time, wherein the person'"'"'s data includes values for a plurality of properties relating to the person; a data ingestion process implemented on the computing device and configured to;
(i) at a plurality of times, monitor a data source to determine whether data relating to the person has updated; and
(ii) when data for the person has been updated, storing the updated data in the database;an API monitor implemented on the computing device and configured to receive an indication that a value for the particular property in the person'"'"'s data was verified as accurate or inaccurate at a particular time; a queries implemented on the computing device and configured to retrieve, from the database based on the particular time, the person'"'"'s data, including values for the plurality of properties, that were up-to-date at the particular time; and a trainer implemented on the computing device and configured to train a model using the retrieved data and the indication such that the model can predict whether another person'"'"'s value for the particular property is accurate, whereby having the retrieved data be current to the particular time maintains the retrieved data'"'"'s significance in training the model. - View Dependent Claims (20, 21)
-
Specification