
INPUT PROCESSING FOR MACHINE LEARNING

  • US 20150379072A1
  • Filed: 08/14/2014
  • Published: 12/31/2015
  • Est. Priority Date: 06/30/2014
  • Status: Active Grant
First Claim

1. A system, comprising:

  • one or more computing devices configured to:

    receive, via a programmatic interface of a machine learning service of a provider network, a request to extract observation records of a particular data set from one or more file sources, wherein a size of the particular data set exceeds a size of a first memory portion available for the particular data set at a first server of the machine learning service;

    map the particular data set to a plurality of contiguous chunks, including a particular contiguous chunk whose size does not exceed the first memory portion;

    generate, based at least in part on a filtering descriptor indicated in the request, a filtering plan to perform a sequence of chunk-level filtering operations on the plurality of contiguous chunks, wherein an operation type of individual ones of the sequence of filtering operations comprises one or more of: (a) sampling, (b) shuffling, (c) splitting, or (d) partitioning for parallel computation, and wherein the filtering plan includes a first chunk-level filtering operation followed by a second chunk-level filtering operation;

    execute, to implement the first chunk-level filtering operation, at least a set of reads directed to one or more persistent storage devices at which at least a subset of the plurality of contiguous chunks are stored, wherein, subsequent to the set of reads, the first memory portion comprises at least the particular contiguous chunk;

    implement the second chunk-level filtering operation on an in-memory result set of the first chunk-level filtering operation, without re-reading from the one or more persistent storage devices, and without copying the particular contiguous chunk; and

    extract a plurality of observation records from an output of the sequence of chunk-level filtering operations.
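
The sequence recited in claim 1 can be illustrated with a minimal sketch. It is not the patented implementation: the names CountingStore, map_to_chunks, load_chunks, shuffle_chunks, split_chunks, extract_records and run_plan are hypothetical, records are assumed to be fixed-width so that chunk boundaries coincide with record boundaries, and an in-process byte string stands in for the persistent storage devices. For brevity the toy loads every chunk and ignores the per-server memory limit and the provider-network request interface. It shows a chunk-level shuffle followed by a chunk-level split, where the second operation works on the in-memory result of the first, with no further reads and no copies of chunk contents.

```python
import random
from typing import List, Tuple

RECORD_SIZE = 4  # assumption: fixed-width records, so chunk edges align with records


class CountingStore:
    """Hypothetical stand-in for persistent storage; counts read calls."""

    def __init__(self, data: bytes) -> None:
        self._data = data
        self.reads = 0

    def size(self) -> int:
        return len(self._data)

    def read(self, offset: int, length: int) -> bytes:
        self.reads += 1
        return self._data[offset:offset + length]


def map_to_chunks(total_size: int, chunk_size: int) -> List[Tuple[int, int]]:
    """Map the data set to contiguous (offset, length) ranges."""
    if chunk_size <= 0 or chunk_size % RECORD_SIZE:
        raise ValueError("chunk_size must be a positive multiple of RECORD_SIZE")
    return [(off, min(chunk_size, total_size - off))
            for off in range(0, total_size, chunk_size)]


def load_chunks(store: CountingStore, ranges: List[Tuple[int, int]]) -> List[bytes]:
    """Issue one read per chunk; afterwards the chunks reside in memory."""
    return [store.read(off, length) for off, length in ranges]


def shuffle_chunks(chunks: List[bytes], seed: int) -> List[bytes]:
    """First chunk-level operation: reorder references to the chunks."""
    order = list(chunks)  # copies the list of references, not the chunk bytes
    random.Random(seed).shuffle(order)
    return order


def split_chunks(chunks: List[bytes], fraction: float) -> Tuple[List[bytes], List[bytes]]:
    """Second chunk-level operation: split the in-memory result by chunk count."""
    cut = int(len(chunks) * fraction)
    return chunks[:cut], chunks[cut:]


def extract_records(chunks: List[bytes]) -> List[bytes]:
    """Extract fixed-width observation records from the filtered chunks."""
    return [c[i:i + RECORD_SIZE] for c in chunks
            for i in range(0, len(c), RECORD_SIZE)]


def run_plan(store: CountingStore, chunk_size: int, seed: int = 0,
             fraction: float = 0.75) -> Tuple[List[bytes], List[bytes]]:
    """Run a two-step plan: shuffle, then split, then extract records."""
    ranges = map_to_chunks(store.size(), chunk_size)
    in_memory = load_chunks(store, ranges)       # the only reads from storage
    shuffled = shuffle_chunks(in_memory, seed)   # operates on memory only
    first, second = split_chunks(shuffled, fraction)
    return extract_records(first), extract_records(second)
```

In this sketch the split never touches the store: after load_chunks returns, store.reads stays fixed at the number of chunks, and the shuffled list holds the same chunk objects that were read, only in a different order.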
