×

Distributed data set indexing

  • US 10,013,441 B1
  • Filed: 12/11/2017
  • Issued: 07/03/2018
  • Est. Priority Date: 02/13/2017
  • Status: Active Grant
First Claim
Patent Images

1. An apparatus comprising a processor of a first node device of multiple node devices, and a storage of the first node device to store instructions that, when executed by the processor, cause the processor to perform operations comprising:

  • store, at the first node device, a super cell of multiple super cells into which a data set is divided from a data file maintained by at least one data device, wherein;

    the multiple super cells are distributed among the multiple node devices;

    each super cell comprises multiple data cells;

    each data cell of the multiple data cells comprises multiple data records; and

    each data record of the multiple data records comprises a set of data fields at which data values of the data set are stored;

    index, at the first node device, the multiple data records within a first data cell of a super cell by a first data field and by a second data field of the set of data fields in a single read pass through the first data cell, wherein for each data record within the first data cell, the processor is caused to;

    retrieve data values from the first data field and the second data field;

    perform operations to generate a first binary tree of unique data values of the first data field of the first data cell to determine whether the data value retrieved from the first data field comprises a unique value or a duplicate value, the operations comprising;

    search the first binary tree to determine whether the data value retrieved from the first data field comprises a duplicate data value that is already present within the first binary tree or a unique value that is not already present within the first binary tree; and

    add the data value retrieved from the first data field to the first binary tree in response to a determination that the data value retrieved from the first data field is a unique value; and

    perform operations to generate a second binary tree of unique data values of the second data field of the first data cell to determine whether the data value retrieved from the second data field comprises a unique value or a duplicate value, the operations comprising;

    search the second binary tree to determine whether the data value retrieved from the second data field comprises a duplicate data value that is already present within the second binary tree or a unique value that is not already present within the second binary tree; and

    add the data value retrieved from the second data field to the second binary tree in response to a determination that the data value retrieved from the second data field is a unique value;

    generate a first unique values index of identifiers of the data records associated with the unique data values within the first binary tree based on an in-order traversal of the first binary tree to enable use of the first unique values index to perform a binary search of the data values within the first data field of the data records of the first data cell;

    generate a second unique values index of identifiers of the data records associated with the unique data values within the second binary tree based on an in-order traversal of the second binary tree to enable use of the second unique values index to perform a binary search of the data values within the second data field of the data records of the first data cell;

    generate, within a super cell index corresponding to the super cell, and from the in-order traversals of the first and second binary trees, an indication of a range of the data values of the first data field within the data records of the first data cell, and an indication of a range of the data values of the second data field within the data records of the first data cell; and

    provide the super cell index to a control device to enable use of the super cell index by the control device.

View all claims
  • 0 Assignments
Timeline View
Assignment View
    ×
    ×