
Consistent filtering of machine learning data

  • US 10,540,606 B2
  • Filed: 08/14/2014
  • Issued: 01/21/2020
  • Est. Priority Date: 06/30/2014
  • Status: Active Grant
First Claim

1. A system, comprising:

  • one or more computing devices configured to:

    generate consistency metadata to be used for one or more training-and-evaluation iterations of a machine learning model, wherein the consistency metadata comprises at least a particular initialization parameter value for a pseudo-random number source;

    sub-divide an address space of a particular data set of the machine learning model into a plurality of contiguous chunks, including a first chunk comprising a first plurality of observation records, and a second chunk comprising a second plurality of observation records;

    retrieve, from one or more persistent storage devices, observation records of the first chunk into a memory of a first server, and observation records of the second chunk into a memory of a second server;

    select, using a first set of pseudo-random numbers, a first training set from the plurality of contiguous chunks, wherein the first training set includes at least a portion of the first chunk, wherein observation records of the first training set are used to train the machine learning model during a first training-and-evaluation iteration of the one or more training-and-evaluation iterations, and wherein the first set of pseudo-random numbers is obtained using the consistency metadata; and

    select, using a second set of pseudo-random numbers, a first test set from the plurality of contiguous chunks, wherein the first test set includes at least a portion of the second chunk, wherein observation records of the first test set are used to evaluate the machine learning model during the first training-and-evaluation iteration, and wherein the second set of pseudo-random numbers is obtained using the consistency metadata;

    wherein the first and second sets of pseudo-random numbers cause each individual observation record to be selected for exactly one of the first training set or the first test set.
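The consistency property in the claim can be illustrated with a short Python sketch. It is a minimal illustration under stated assumptions, not the patented implementation: `ConsistencyMetadata`, `chunk_address_space`, `assign_chunks`, `chunk_rng`, and `select` are hypothetical names, the seed value 42 is arbitrary, chunks are assumed to be placed on servers round-robin, and each chunk is assumed to get its own pseudo-random stream derived from the seed and the chunk index. Because the training pass and the test pass regenerate the same pseudo-random value for every record, each record lands in exactly one of the two sets.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class ConsistencyMetadata:
    """Hypothetical holder for the claim's 'consistency metadata'."""
    seed: int  # initialization parameter for the pseudo-random number source


def chunk_address_space(num_records: int, chunk_size: int) -> list[range]:
    """Sub-divide record indices [0, num_records) into contiguous chunks."""
    return [
        range(start, min(start + chunk_size, num_records))
        for start in range(0, num_records, chunk_size)
    ]


def assign_chunks(chunks: list[range], num_servers: int) -> dict[int, list[int]]:
    """Assumed round-robin placement: chunk i is loaded on server i % num_servers."""
    placement: dict[int, list[int]] = {s: [] for s in range(num_servers)}
    for chunk_id in range(len(chunks)):
        placement[chunk_id % num_servers].append(chunk_id)
    return placement


def chunk_rng(meta: ConsistencyMetadata, chunk_id: int) -> random.Random:
    """Per-chunk PRNG derived only from the metadata and the chunk id, so any
    server can regenerate the same sequence for its chunk independently."""
    return random.Random(meta.seed * 1_000_003 + chunk_id)


def select(chunks: list[range], chunk_ids: list[int], meta: ConsistencyMetadata,
           train_fraction: float, want_train: bool) -> list[int]:
    """Return training records (want_train=True) or test records from the chunks."""
    selected = []
    for chunk_id in chunk_ids:
        rng = chunk_rng(meta, chunk_id)  # re-seeded identically on every pass
        for record in chunks[chunk_id]:
            u = rng.random()  # same value for this record in both passes
            if (u < train_fraction) == want_train:
                selected.append(record)
    return selected


# Usage: two simulated servers each select train and test records from their chunks.
meta = ConsistencyMetadata(seed=42)
chunks = chunk_address_space(num_records=1000, chunk_size=100)
placement = assign_chunks(chunks, num_servers=2)

train: list[int] = []
test: list[int] = []
for server_id, chunk_ids in placement.items():
    train += select(chunks, chunk_ids, meta, 0.8, want_train=True)
    test += select(chunks, chunk_ids, meta, 0.8, want_train=False)

print(len(train), len(test))
```

Deriving a separate stream per chunk is one possible design choice: it lets each server split its own in-memory chunks without exchanging random numbers, while the training and test selections stay disjoint and together cover every record.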
