Distributed data set indexing

  • US 9,977,807 B1
  • Filed: 12/11/2017
  • Issued: 05/22/2018
  • Est. Priority Date: 02/13/2017
  • Status: Active Grant
First Claim

1. An apparatus comprising a processor of a first node device of multiple node devices, and a storage of the first node device to store instructions that, when executed by the processor, cause the processor to perform operations comprising:

  • store, at the first node device, a first super cell of multiple super cells into which a data set is divided from a data file maintained by at least one data device, wherein:

    the multiple super cells are distributed among the multiple node devices;

    each super cell comprises multiple data cells;

    each data cell of the multiple data cells comprises multiple data records; and

    each data record of the multiple data records comprises a set of data fields at which data values of the data set are stored;

    store, for each data cell within the first super cell, a cell index that corresponds to the data cell, wherein the cell index comprises a first hash values vector that corresponds to a first data field of the set of data fields, and that comprises hash values generated from each unique value among the data values stored within the first data field;

    receive, at the first node device, from a control device, and at least partially in parallel with other node devices of the multiple node devices, query instructions specifying search criteria of a search to be performed of the data set for data records that meet the search criteria, wherein the search criteria comprises at least one data value to be searched for within the first data field;

    in response to the receipt of the query instructions, generate a first hash value from a first data value of the at least one data value of the search criteria, and for each data cell within the first super cell, the processor is caused to perform operations of the search, the operations comprising:

    compare the first hash value to the hash values within the first hash values vector in the corresponding cell index to determine whether the data cell includes at least one data record that meets the search criteria for at least the first data value; and

    in response to a determination that the data cell includes at least one data record that meets the search criteria, search the data records of the data cell to identify one or more data records that meet the search criteria; and

    in response to identifying at least one data record within at least one data cell of the first super cell that meets the search criteria for at least the first data value, the processor is caused to perform operations comprising:

    generate results data indicative of the first super cell including at least one data record that meets the search criteria for at least the first data value; and

    provide the results data to the control device.
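
The node-side steps recited in the claim (a per-cell index of hash values, a hash comparison that screens out cells, and a record-level search of only the cells that pass) can be illustrated with a minimal sketch. This is not the patented implementation: the names `DataCell`, `hash_value`, `build_index`, and `search_super_cell` are hypothetical, the choice of SHA-256 truncated to 64 bits is an assumption, and keeping the hash values vector sorted for binary search is an assumption the claim does not state.

```python
import bisect
import hashlib
from dataclasses import dataclass, field


def hash_value(value) -> int:
    """Hash a data value to a 64-bit integer (hash function is an assumption)."""
    digest = hashlib.sha256(repr(value).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


@dataclass
class DataCell:
    """A data cell: multiple records, each a dict of data field -> data value."""
    records: list
    # Cell index: data field name -> hash values vector (kept sorted here).
    cell_index: dict = field(default_factory=dict)

    def build_index(self, field_name: str) -> None:
        # One hash per unique value stored in the field within this cell.
        unique_values = {record[field_name] for record in self.records}
        self.cell_index[field_name] = sorted(hash_value(v) for v in unique_values)


def search_super_cell(super_cell: list, field_name: str, wanted) -> dict:
    """Search every data cell of one super cell for records matching `wanted`."""
    target = hash_value(wanted)  # hash the searched-for value once per query
    matches = []
    for cell in super_cell:
        vector = cell.cell_index[field_name]
        pos = bisect.bisect_left(vector, target)
        if pos == len(vector) or vector[pos] != target:
            continue  # hash absent: the cell cannot hold a match, skip it
        # Hash present: confirm against the records, since hashes may collide.
        matches.extend(r for r in cell.records if r[field_name] == wanted)
    # Results data that would be provided to the control device.
    return {"has_match": bool(matches), "records": matches}
```

In this sketch a node would call `build_index` once per data cell when the super cell is stored, then call `search_super_cell` for each query received from the control device. The comparison against the hash values vector only rules cells out; the record scan still checks actual values, so a hash collision costs an unnecessary scan but cannot produce a false result.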
