Inline tree data structure for high-speed searching and filtering of large datasets
First Claim
1. A computer system comprising at least one processor structured and programmed to perform a method comprising:
- receiving from a computer-readable storage medium first electronic indicia of a dataset comprising a multitude of alphanumeric data records, each data record including data strings for multiple corresponding defined data fields; and
using one or more computer processors programmed therefor and operatively coupled to the first storage medium, generating second electronic indicia of the data set comprising (i) an alphanumeric or binary clump header table comprising a plurality of clump data records and (ii) an inline tree data structure; and
storing the inline tree data structure and the clump header table on a computer-readable storage medium operatively coupled to the one or more computer processors,wherein;
for a first set of one or more data fields among the defined data fields, each range of data strings for the first set of data fields is divided into multiple corresponding subranges, and the multitude of data records comprises multiple first-level subsets of the data records, wherein each first-level subset includes only those data records for which each data string of the first set of data fields falls within a corresponding one of the subranges;
for a second set of one or more data fields among the defined data fields, each range of data strings for the second set of data fields is divided into multiple corresponding subranges, and each one of the multiple first-level subsets of the data records comprises multiple corresponding second-level subsets of the data records, wherein each second-level subset includes only those data records for which each data string of the second set of data fields falls within a corresponding one of the subranges;
the inline tree data structure comprises an alternating sequence of (i) multiple first-level binary string segments, each followed by (ii) a subset of one or more corresponding second-level binary string segments;
each first-level binary string segment encodes the range of data strings in a selected filterable subset of the first set of data fields of a corresponding one of the first-level subsets of the data records, and excludes a non-filterable subset of the first set of data fields;
each second-level binary string segment encodes the range of data strings in a selected filterable subset of the second set of data fields of a corresponding one of the second-level subsets of the data records, and excludes a non-filterable subset of the second set of data fields;
for a selected subset of the defined data fields, each combination of specific data strings that occurs in the dataset is indicated by a corresponding one of the plurality of clump data records of the clump header table; and
each clump data record in the clump header table includes an indicator of a location in the inline tree data structure of a corresponding first-level binary string segment.
0 Assignments
0 Petitions
Accused Products
Abstract
A data structure comprises a clump header table and an inline tree data structure. The inline tree, representing filterable data fields of hierarchically organized data records, comprises an alternating sequence of first-level binary string segments, each followed by one or more corresponding second-level binary string segments. Each clump header record includes an indicator of a location in the inline tree of corresponding binary string segments. A dedicated, specifically adapted conversion program generates the clump header file and the inline tree for storage on any computer-readable medium, and the inline tree can be read entirely into RAM to be searched or filtered. A dedicated, specifically adapted search and filter program is employed to list or enumerate retrieved data records. Run-time computer code generation can reduce time required for searching and filtering. One example includes spatial searching and filtering of data records that include spatial coordinates as data fields.
70 Citations
27 Claims
-
1. A computer system comprising at least one processor structured and programmed to perform a method comprising:
-
receiving from a computer-readable storage medium first electronic indicia of a dataset comprising a multitude of alphanumeric data records, each data record including data strings for multiple corresponding defined data fields; and using one or more computer processors programmed therefor and operatively coupled to the first storage medium, generating second electronic indicia of the data set comprising (i) an alphanumeric or binary clump header table comprising a plurality of clump data records and (ii) an inline tree data structure; and storing the inline tree data structure and the clump header table on a computer-readable storage medium operatively coupled to the one or more computer processors, wherein; for a first set of one or more data fields among the defined data fields, each range of data strings for the first set of data fields is divided into multiple corresponding subranges, and the multitude of data records comprises multiple first-level subsets of the data records, wherein each first-level subset includes only those data records for which each data string of the first set of data fields falls within a corresponding one of the subranges; for a second set of one or more data fields among the defined data fields, each range of data strings for the second set of data fields is divided into multiple corresponding subranges, and each one of the multiple first-level subsets of the data records comprises multiple corresponding second-level subsets of the data records, wherein each second-level subset includes only those data records for which each data string of the second set of data fields falls within a corresponding one of the subranges; the inline tree data structure comprises an alternating sequence of (i) multiple first-level binary string segments, each followed by (ii) a subset of one or more corresponding second-level binary string segments; each first-level binary string segment encodes the range of data strings in a selected filterable subset of the first set of data fields of a corresponding one of the first-level subsets of the data records, and excludes a non-filterable subset of the first set of data fields; each second-level binary string segment encodes the range of data strings in a selected filterable subset of the second set of data fields of a corresponding one of the second-level subsets of the data records, and excludes a non-filterable subset of the second set of data fields; for a selected subset of the defined data fields, each combination of specific data strings that occurs in the dataset is indicated by a corresponding one of the plurality of clump data records of the clump header table; and each clump data record in the clump header table includes an indicator of a location in the inline tree data structure of a corresponding first-level binary string segment.
-
-
2. An article comprising a tangible, non-transitory medium encoding computer-readable instructions that, when applied to a computer system, instruct the computer system to perform a method comprising:
-
receiving from a computer-readable storage medium first electronic indicia of a dataset comprising a multitude of alphanumeric data records, each data record including data strings for multiple corresponding defined data fields; and using one or more computer processors programmed therefor and operatively coupled to the first storage medium, generating second electronic indicia of the data set comprising (i) an alphanumeric or binary clump header table comprising a plurality of clump data records and (ii) an inline tree data structure; and storing the inline tree data structure and the clump header table on a computer-readable storage medium operatively coupled to the one or more computer processors, wherein; for a first set of one or more data fields among the defined data fields, each range of data strings for the first set of data fields is divided into multiple corresponding subranges, and the multitude of data records comprises multiple first-level subsets of the data records, wherein each first-level subset includes only those data records for which each data string of the first set of data fields falls within a corresponding one of the subranges; for a second set of one or more data fields among the defined data fields, each range of data strings for the second set of data fields is divided into multiple corresponding subranges, and each one of the multiple first-level subsets of the data records comprises multiple corresponding second-level subsets of the data records, wherein each second-level subset includes only those data records for which each data string of the second set of data fields falls within a corresponding one of the subranges; the inline tree data structure comprises an alternating sequence of (i) multiple first-level binary string segments, each followed by (ii) a subset of one or more corresponding second-level binary string segments; each first-level binary string segment encodes the range of data strings in a selected filterable subset of the first set of data fields of a corresponding one of the first-level subsets of the data records, and excludes a non-filterable subset of the first set of data fields; each second-level binary string segment encodes the range of data strings in a selected filterable subset of the second set of data fields of a corresponding one of the second-level subsets of the data records, and excludes a non-filterable subset of the second set of data fields; for a selected subset of the defined data fields, each combination of specific data strings that occurs in the dataset is indicated by a corresponding one of the plurality of clump data records of the clump header table; and each clump data record in the clump header table includes an indicator of a location in the inline tree data structure of a corresponding first-level binary string segment.
-
-
3. An article comprising a tangible, non-transitory computer-readable medium encoded to store an inline tree data structure and an alphanumeric or binary clump header table generated by a method comprising:
-
receiving from a computer-readable storage medium first electronic indicia of a dataset comprising a multitude of alphanumeric data records, each data record including data strings for multiple corresponding defined data fields; and using one or more computer processors programmed therefor and operatively coupled to the first storage medium, generating second electronic indicia of the data set comprising (i) the clump header table comprising a plurality of clump data records and (ii) the inline tree data structure; and storing the inline tree data structure and the clump header table on a computer-readable storage medium operatively coupled to the one or more computer processors, wherein; for a first set of one or more data fields among the defined data fields, each range of data strings for the first set of data fields is divided into multiple corresponding subranges, and the multitude of data records comprises multiple first-level subsets of the data records, wherein each first-level subset includes only those data records for which each data string of the first set of data fields falls within a corresponding one of the subranges; for a second set of one or more data fields among the defined data fields, each range of data strings for the second set of data fields is divided into multiple corresponding subranges, and each one of the multiple first-level subsets of the data records comprises multiple corresponding second-level subsets of the data records, wherein each second-level subset includes only those data records for which each data string of the second set of data fields falls within a corresponding one of the subranges; the inline tree data structure comprises an alternating sequence of (i) multiple first-level binary string segments, each followed by (ii) a subset of one or more corresponding second-level binary string segments; each first-level binary string segment encodes the range of data strings in a selected filterable subset of the first set of data fields of a corresponding one of the first-level subsets of the data records, and excludes a non-filterable subset of the first set of data fields; each second-level binary string segment encodes the range of data strings in a selected filterable subset of the second set of data fields of a corresponding one of the second-level subsets of the data records, and excludes a non-filterable subset of the second set of data fields; for a selected subset of the defined data fields, each combination of specific data strings that occurs in the dataset is indicated by a corresponding one of the plurality of clump data records of the clump header table; and each clump data record in the clump header table includes an indicator of a location in the inline tree data structure of a corresponding first-level binary string segment. - View Dependent Claims (4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22)
-
-
23. A computer-implemented method for searching or filtering electronic indicia of a dataset stored on a computer-readable medium, the method comprising:
-
(a) receiving an electronic query for data records or the dataset, or an enumeration thereof, having data strings in one or more specified data fields that fall within corresponding specified filter subranges for those data fields; (b) in response to the query of part (a), with a computer processor programmed therefor, automatically electronically generating at least a portion of computer code for automatically electronically interrogating the electronic indicia of the dataset; (c) in response to the query of part (a), with a computer processor linked to the computer-readable medium and programmed at least partly according to the code generated in part (b), automatically electronically interrogating the electronic indicia of the dataset to identify one or more data records that correspond to data strings that fall within the specified filter subranges according to the query of part (a); and (d) automatically generating, with a computer processor programmed therefor, and storing, on a computer-readable medium coupled to that processor, a list or an enumeration of one or more data records that correspond to the data records identified in part (c). - View Dependent Claims (24, 25)
-
-
26. A computer system comprising one or more computer processors and one or more computer-readable storage media operatively coupled to one or more of the computer processors, wherein the computer system is structured, connected, and programmed to perform a method for searching or filtering electronic indicia of a dataset stored on a computer-readable medium, wherein the method comprises:
-
(a) receiving an electronic query for data records or the dataset, or an enumeration thereof, having data strings in one or more specified data fields that fall within corresponding specified filter subranges for those data fields; (b) in response to the query of part (a), with a computer processor programmed therefor, automatically electronically generating at least a portion of computer code for automatically electronically interrogating the electronic indicia of the dataset; (c) in response to the query of part (a), with a computer processor linked to the computer-readable medium and programmed at least partly according to the code generated in part (b), automatically electronically interrogating the electronic indicia of the dataset to identify one or more data records that correspond to data strings that fall within the specified filter subranges according to the query of part (a); and (d) automatically generating, with a computer processor programmed therefor, and storing, on a computer-readable medium coupled to that processor, a list or an enumeration of one or more data records that correspond to the data records identified in part (c).
-
-
27. An article comprising a tangible, non-transitory medium encoding computer-readable instructions that, when applied to a computer system comprising at least one processor, instruct the computer system to perform a method for searching or filtering electronic indicia of a dataset stored on a computer-readable medium, wherein the method comprises:
-
(a) receiving an electronic query for data records or the dataset, or an enumeration thereof, having data strings in one or more specified data fields that fall within corresponding specified filter subranges for those data fields; (b) in response to the query of part (a), with a computer processor programmed therefor, automatically electronically generating at least a portion of computer code for automatically electronically interrogating the electronic indicia of the dataset; (c) in response to the query of part (a), with a computer processor linked to the computer-readable medium and programmed at least partly according to the code generated in part (b), automatically electronically interrogating the electronic indicia of the dataset to identify one or more data records that correspond to data strings that fall within the specified filter subranges according to the query of part (a); and (d) automatically generating, with a computer processor programmed therefor, and storing, on a computer-readable medium coupled to that processor, a list or an enumeration of one or more data records that correspond to the data records identified in part (c).
-
Specification