
Distributed data set indexing

  • US 10,303,670 B2
  • Filed: 05/21/2018
  • Issued: 05/28/2019
  • Est. Priority Date: 02/13/2017
  • Status: Active Grant
First Claim

1. An apparatus comprising a processor of a first node device of multiple node devices, and a storage of the first node device to store instructions that, when executed by the processor, cause the processor to perform operations comprising:

  • read data values from a first data field and a second data field of a set of data fields within each data record of multiple data records of a first data cell of a data set in a single pass through the multiple data records of the first data cell;

    for each data record within the first data cell, index the multiple data records of the first data cell, at the first node device and at least partially in parallel with other node devices of the multiple node devices, by performance of operations comprising:

    determine, based on the data value retrieved from the first data field, whether the first data field stores a unique data value, wherein the data value has not been read from the first data field of any preceding data record in the single pass;

    in response to a determination that the first data field stores a unique data value, add an identifier of the data record to a first unique values index of a first cell index that corresponds to the first data cell, wherein identifiers of data records within the first unique values index are ordered based on corresponding unique data values in the first data field to enable use of the first unique values index to perform a search of the data values within the first data field of the data records of the first data cell;

    determine, based on the data value retrieved from the second data field, whether the second data field stores a unique data value, wherein the data value has not been read from the second data field of any preceding data record in the single pass; and

    in response to a determination that the second data field stores a unique data value, add an identifier of the data record to a second unique values index of the first cell index, wherein identifiers of data records within the second unique values index are ordered based on corresponding unique data values in the second data field to enable use of the second unique values index to perform a search of the data values within the second data field of the data records of the first data cell;

    request provision, by a control device, of a first pointer to a location within a data file maintained by at least one data device at which to store at least the first data cell and the first cell index; and

    transmit, to the at least one data device, and at least partially in parallel with other node devices of the multiple node devices, at least the first data cell and the first cell index with an instruction to store at least the first data cell and the first cell index, with the first data cell stored in the data file starting at the location pointed to by the first pointer, and with the first cell index stored in the data file at a location after at least the first data cell.
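The per-record indexing steps in the claim can be illustrated with a minimal Python sketch. It assumes that a data cell is a list of dictionaries, that a record's identifier is its position in the cell, and that field values are hashable and mutually comparable. The names `index_cell` and `lookup` are hypothetical, and this is not the patentee's implementation. One pass over the records builds a separate "unique values index" for every requested field. An identifier is added only the first time its value appears in that field, and each index is kept ordered by value so that a binary search can find it.

```python
from bisect import bisect_left, insort
from typing import Any, Dict, List, Optional, Sequence, Tuple

# Hypothetical record shape: a mapping from field name to data value.
Record = Dict[str, Any]
# One unique values index: (unique value, record identifier) pairs kept sorted by value.
UniqueIndex = List[Tuple[Any, int]]


def index_cell(records: Sequence[Record], fields: Sequence[str]) -> Dict[str, UniqueIndex]:
    """Build one unique values index per field in a single pass over a data cell.

    Assumption: a record's identifier is its position within the cell.
    """
    seen: Dict[str, set] = {field: set() for field in fields}
    cell_index: Dict[str, UniqueIndex] = {field: [] for field in fields}
    for record_id, record in enumerate(records):  # the single pass
        for field in fields:
            value = record[field]
            # "Unique" here means: not read from this field in any preceding record.
            if value not in seen[field]:
                seen[field].add(value)
                # insort keeps the identifiers ordered by their unique values.
                insort(cell_index[field], (value, record_id))
    return cell_index


def lookup(index: UniqueIndex, value: Any) -> Optional[int]:
    """Binary-search a unique values index; return the first record holding value, or None."""
    keys = [v for v, _ in index]
    pos = bisect_left(keys, value)
    if pos < len(keys) and keys[pos] == value:
        return index[pos][1]
    return None
```

For example, in a cell where "Oslo" appears in records 0 and 2, the city index holds only `("Oslo", 0)`. A search for "Oslo" resolves to record 0 without scanning the cell.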
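The storage steps at the end of the claim can be simulated with another Python sketch. A control device hands out pointers (offsets) into a shared data file. Each node then writes its data cell at that offset, and its cell index is written after the cell. Several assumptions apply here. `ControlDevice`, `DataDevice`, `store_cell` and `store_cells_in_parallel` are hypothetical names. The data file is an in-memory `bytearray`. Threads stand in for node devices. The cell index is placed immediately after the data cell, although the claim only requires that it come after the cell.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence, Tuple


class ControlDevice:
    """Hypothetical control device that hands out non-overlapping offsets in one data file."""

    def __init__(self) -> None:
        self._next_offset = 0
        self._lock = threading.Lock()

    def request_pointer(self, size: int) -> int:
        # Reserve `size` bytes and return the start of the reserved region.
        with self._lock:
            offset = self._next_offset
            self._next_offset += size
            return offset


class DataDevice:
    """Hypothetical data device holding the shared data file in memory."""

    def __init__(self) -> None:
        self.data_file = bytearray()
        self._lock = threading.Lock()

    def store(self, offset: int, payload: bytes) -> None:
        # Write payload at offset, growing the file (zero-filled) if necessary.
        with self._lock:
            end = offset + len(payload)
            if len(self.data_file) < end:
                self.data_file.extend(bytes(end - len(self.data_file)))
            self.data_file[offset:end] = payload


def store_cell(control: ControlDevice, device: DataDevice,
               cell: bytes, cell_index: bytes) -> Tuple[int, int]:
    """Store one data cell followed by its cell index; return (cell offset, index offset)."""
    offset = control.request_pointer(len(cell) + len(cell_index))
    # Layout assumption: the cell index begins immediately after the data cell.
    device.store(offset, cell + cell_index)
    return offset, offset + len(cell)


def store_cells_in_parallel(control: ControlDevice, device: DataDevice,
                            cells: Sequence[Tuple[bytes, bytes]]) -> List[Tuple[int, int]]:
    """Simulate several node devices storing their cells at least partially in parallel."""
    with ThreadPoolExecutor(max_workers=max(1, len(cells))) as pool:
        return list(pool.map(lambda pair: store_cell(control, device, *pair), cells))
```

Each node reserves space for its cell and cell index together, so concurrent writers never overlap. The order in which cells land in the file depends on thread scheduling.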
