Distributed data set indexing
First Claim
1. An apparatus comprising a processor of a first node device of multiple node devices, and a storage of the first node device to store instructions that, when executed by the processor, cause the processor to perform operations comprising:
store, at the first node device, a first super cell of multiple super cells into which a data set is divided from a data file maintained by at least one data device, wherein:
the multiple super cells are distributed among the multiple node devices;
each super cell comprises multiple data cells;
each data cell of the multiple data cells comprises multiple data records; and
each data record of the multiple data records comprises a set of data fields at which data values of the data set are stored;
store, for each data cell within the first super cell, a cell index that corresponds to the data cell, wherein the cell index comprises a first hash values vector that corresponds to a first data field of the set of data fields, and that comprises hash values generated from each unique value among the data values stored within the first data field;
receive, at the first node device, from a control device, and at least partially in parallel with other node devices of the multiple node devices, query instructions specifying search criteria of a search to be performed of the data set for data records that meet the search criteria, wherein the search criteria comprises at least one data value to be searched for within the first data field;
in response to the receipt of the query instructions, generate a first hash value from a first data value of the at least one data value of the search criteria, and for each data cell within the first super cell, the processor is caused to perform operations of the search, the operations comprising:
compare the first hash value to the hash values within the first hash values vector in the corresponding cell index to determine whether the data cell includes at least one data record that meets the search criteria for at least the first data value; and
in response to a determination that the data cell includes at least one data record that meets the search criteria, search the data records of the data cell to identify one or more data records that meet the search criteria; and
in response to identifying at least one data record within at least one data cell of the first super cell that meets the search criteria for at least the first data value, the processor is caused to perform operations comprising:
generate results data indicative of the first super cell including at least one data record that meets the search criteria for at least the first data value; and
provide the results data to the control device.
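The operations recited in this claim can be illustrated with a minimal sketch. It is not the patented implementation: the hash function (`hash_value`, truncated SHA-256), the class names `DataCell` and `CellIndex`, the use of a Python `set` to hold the hash values vector, and the sample `city` records are all assumptions made for illustration only.

```python
import hashlib
from dataclasses import dataclass


def hash_value(value) -> int:
    """Hypothetical hash: first 8 bytes of SHA-256 over the value's string form."""
    digest = hashlib.sha256(str(value).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


@dataclass
class DataCell:
    records: list  # each record is a dict mapping data field name -> data value


@dataclass
class CellIndex:
    hash_vectors: dict  # data field name -> hashes of the unique values in that field


def build_cell_index(cell: DataCell, fields) -> CellIndex:
    """Build one hash values vector per indexed data field of a data cell."""
    vectors = {}
    for name in fields:
        unique_values = {rec[name] for rec in cell.records}
        vectors[name] = {hash_value(v) for v in unique_values}
    return CellIndex(vectors)


def search_super_cell(cells, indexes, field_name, wanted):
    """Return (cell number, record) pairs that match, scanning only candidate cells."""
    wanted_hash = hash_value(wanted)
    hits = []
    for cell_no, (cell, index) in enumerate(zip(cells, indexes)):
        if wanted_hash not in index.hash_vectors[field_name]:
            continue  # hash absent: the cell cannot hold the value, so skip the scan
        for rec in cell.records:
            if rec[field_name] == wanted:
                hits.append((cell_no, rec))
    return hits


# A toy "first super cell" made of three data cells (sample data is invented).
cells = [
    DataCell([{"city": "Oslo", "qty": 3}, {"city": "Lima", "qty": 5}]),
    DataCell([{"city": "Cary", "qty": 7}, {"city": "Oslo", "qty": 1}]),
    DataCell([{"city": "Pune", "qty": 2}]),
]
indexes = [build_cell_index(c, ["city"]) for c in cells]

hits = search_super_cell(cells, indexes, "city", "Oslo")
# Results data that a node might send back to the control device.
results_data = {"super_cell_has_match": bool(hits), "match_count": len(hits)}
print(results_data)  # {'super_cell_has_match': True, 'match_count': 2}
```

In this sketch the third data cell is never scanned for "Oslo", because the hash of "Oslo" is absent from its hash values vector.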
Abstract
An apparatus including a processor to: receive search criteria including a data value; in response to receiving the search criteria, generate a hash value from the data value of the search criteria, and for each data cell of a super cell, compare the hash value to hash values within a hash values vector in the corresponding cell index to determine whether the data cell includes at least one data record meeting the search criteria, and in response to determining that the data cell includes at least one such data record, search the data records to identify one or more data records meeting the search criteria; and in response to identifying at least one data record within at least one data cell of the super cell meeting the search criteria, provide results data indicative of the super cell including at least one such data record.
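One way to read the two-step flow in the abstract (hash comparison first, record search second) is that a matching hash only marks a data cell as a candidate, since two different values can share a hash. The sketch below illustrates that point under stated assumptions: `weak_hash` is a deliberately collision-prone hypothetical function, and the `word` field and sample records are invented; none of this is taken from the patent.

```python
def weak_hash(value: str) -> int:
    """Deliberately weak hypothetical hash: byte sum modulo 251 (anagrams collide)."""
    return sum(value.encode("utf-8")) % 251


# One data cell and its hash values vector for the "word" field.
cell_records = [{"word": "listen"}, {"word": "google"}]
hash_vector = {weak_hash(rec["word"]) for rec in cell_records}


def cell_may_contain(value: str) -> bool:
    """Step 1: index check. False means the value is definitely not in the cell."""
    return weak_hash(value) in hash_vector


def scan_cell(value: str) -> list:
    """Step 2: search the actual data records to confirm or reject the candidate."""
    return [rec for rec in cell_records if rec["word"] == value]


for query in ("listen", "silent", "cat"):
    if cell_may_contain(query):
        print(query, "-> candidate cell, confirmed records:", scan_cell(query))
    else:
        print(query, "-> cell skipped without scanning")
```

Here "silent" collides with "listen", so the cell is scanned but yields no records, while "cat" is rejected by the index alone. A stronger hash makes such false candidates rarer but does not remove the need for the confirming record search.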
43 Citations
30 Claims
1. An apparatus comprising a processor of a first node device of multiple node devices, and a storage of the first node device to store instructions that, when executed by the processor, cause the processor to perform operations comprising:
store, at the first node device, a first super cell of multiple super cells into which a data set is divided from a data file maintained by at least one data device, wherein:
the multiple super cells are distributed among the multiple node devices;
each super cell comprises multiple data cells;
each data cell of the multiple data cells comprises multiple data records; and
each data record of the multiple data records comprises a set of data fields at which data values of the data set are stored;
store, for each data cell within the first super cell, a cell index that corresponds to the data cell, wherein the cell index comprises a first hash values vector that corresponds to a first data field of the set of data fields, and that comprises hash values generated from each unique value among the data values stored within the first data field;
receive, at the first node device, from a control device, and at least partially in parallel with other node devices of the multiple node devices, query instructions specifying search criteria of a search to be performed of the data set for data records that meet the search criteria, wherein the search criteria comprises at least one data value to be searched for within the first data field;
in response to the receipt of the query instructions, generate a first hash value from a first data value of the at least one data value of the search criteria, and for each data cell within the first super cell, the processor is caused to perform operations of the search, the operations comprising:
compare the first hash value to the hash values within the first hash values vector in the corresponding cell index to determine whether the data cell includes at least one data record that meets the search criteria for at least the first data value; and
in response to a determination that the data cell includes at least one data record that meets the search criteria, search the data records of the data cell to identify one or more data records that meet the search criteria; and
in response to identifying at least one data record within at least one data cell of the first super cell that meets the search criteria for at least the first data value, the processor is caused to perform operations comprising:
generate results data indicative of the first super cell including at least one data record that meets the search criteria for at least the first data value; and
provide the results data to the control device.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10
11. A computer-program product tangibly embodied in a non-transitory machine-readable storage medium, the computer-program product including instructions operable to cause a processor of a first node device of multiple node devices to perform operations comprising:
store, at the first node device, a first super cell of multiple super cells into which a data set is divided from a data file maintained by at least one data device, wherein:
the multiple super cells are distributed among the multiple node devices;
each super cell comprises multiple data cells;
each data cell of the multiple data cells comprises multiple data records; and
each data record of the multiple data records comprises a set of data fields at which data values of the data set are stored;
store, for each data cell within the first super cell, a cell index that corresponds to the data cell, wherein the cell index comprises a first hash values vector that corresponds to a first data field of the set of data fields, and that comprises hash values generated from each unique value among the data values stored within the first data field;
receive, at the first node device, from a control device, and at least partially in parallel with other node devices of the multiple node devices, query instructions specifying search criteria of a search to be performed of the data set for data records that meet the search criteria, wherein the search criteria comprises at least one data value to be searched for within the first data field;
in response to the receipt of the query instructions, generate a first hash value from a first data value of the at least one data value of the search criteria, and for each data cell within the first super cell, the processor is caused to perform operations of the search, the operations comprising:
compare the first hash value to the hash values within the first hash values vector in the corresponding cell index to determine whether the data cell includes at least one data record that meets the search criteria for at least the first data value; and
in response to a determination that the data cell includes at least one data record that meets the search criteria, search the data records of the data cell to identify one or more data records that meet the search criteria; and
in response to identifying at least one data record within at least one data cell of the first super cell that meets the search criteria for at least the first data value, the processor is caused to perform operations comprising:
generate results data indicative of the first super cell including at least one data record that meets the search criteria for at least the first data value; and
provide the results data to the control device.
Dependent claims: 12, 13, 14, 15, 16, 17, 18, 19, 20
21. A computer-implemented method comprising:
storing, at a first node device of multiple node devices, a first super cell of multiple super cells into which a data set is divided from a data file maintained by at least one data device, wherein:
the multiple super cells are distributed among the multiple node devices;
each super cell comprises multiple data cells;
each data cell of the multiple data cells comprises multiple data records; and
each data record of the multiple data records comprises a set of data fields at which data values of the data set are stored;
storing, for each data cell within the first super cell, a cell index that corresponds to the data cell, wherein the cell index comprises a first hash values vector that corresponds to a first data field of the set of data fields, and that comprises hash values generated from each unique value among the data values stored within the first data field;
receiving, at the first node device, from a control device, and at least partially in parallel with other node devices of the multiple node devices, query instructions specifying search criteria of a search to be performed of the data set for data records that meet the search criteria, wherein the search criteria comprises at least one data value to be searched for within the first data field;
in response to the receipt of the query instructions, generating a first hash value from a first data value of the at least one data value of the search criteria, and for each data cell within the first super cell, performing operations of the search, the operations comprising:
comparing the first hash value to the hash values within the first hash values vector in the corresponding cell index to determine whether the data cell includes at least one data record that meets the search criteria for at least the first data value; and
in response to a determination that the data cell includes at least one data record that meets the search criteria, searching the data records of the data cell to identify one or more data records that meet the search criteria; and
in response to identifying at least one data record within at least one data cell of the first super cell that meets the search criteria for at least the first data value, performing operations comprising:
generating results data indicative of the first super cell including at least one data record that meets the search criteria for at least the first data value; and
providing the results data to the control device.
Dependent claims: 22, 23, 24, 25, 26, 27, 28, 29, 30
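The independent claims each recite node devices that receive query instructions from a control device at least partially in parallel and return results data to it. A rough sketch of that exchange follows. It makes several assumptions that do not come from the claims: threads stand in for separate networked node devices, `NodeDevice`, `handle_query`, and `control_device_search` are hypothetical names, a single `city` field is indexed, and a node with no matching record simply returns `None`.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor


def hash_value(value) -> int:
    """Hypothetical hash: first 8 bytes of SHA-256 over the value's string form."""
    digest = hashlib.sha256(str(value).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


class NodeDevice:
    """Holds one super cell (a list of data cells) plus one cell index per data cell."""

    def __init__(self, name, cells):
        self.name = name
        self.cells = cells  # list of data cells; each data cell is a list of record dicts
        self.cell_indexes = [{hash_value(rec["city"]) for rec in cell} for cell in cells]

    def handle_query(self, wanted):
        wanted_hash = hash_value(wanted)
        matches = []
        for cell, index in zip(self.cells, self.cell_indexes):
            if wanted_hash in index:  # candidate cell: confirm by scanning its records
                matches.extend(rec for rec in cell if rec["city"] == wanted)
        if not matches:
            return None  # assumption: no results data when nothing matches
        return {"node": self.name, "match_count": len(matches)}


def control_device_search(nodes, wanted):
    """Send the same query to every node in parallel and collect the results data."""
    with ThreadPoolExecutor(max_workers=len(nodes)) as pool:
        replies = list(pool.map(lambda node: node.handle_query(wanted), nodes))
    return [reply for reply in replies if reply is not None]


nodes = [
    NodeDevice("node-a", [[{"city": "Oslo"}, {"city": "Lima"}], [{"city": "Cary"}]]),
    NodeDevice("node-b", [[{"city": "Pune"}], [{"city": "Lima"}]]),
    NodeDevice("node-c", [[{"city": "Oslo"}]]),
]
print(control_device_search(nodes, "Oslo"))
```

Because `pool.map` returns replies in submission order, the collected results follow the order of `nodes` regardless of which node finishes first.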
Specification