×

Inline tree data structure for high-speed searching and filtering of large datasets

  • US 8,977,656 B2
  • Filed: 01/10/2012
  • Issued: 03/10/2015
  • Est. Priority Date: 01/10/2011
  • Status: Active Grant
First Claim
Patent Images

1. A computer-implemented method comprising:

  • receiving from a computer-readable storage medium first electronic indicia of a dataset comprising a multitude of alphanumeric data records, each data record including data strings for multiple corresponding defined data fields; and

    using one or more computer processors programmed therefor and operatively coupled to the first storage medium, generating second electronic indicia of the data set comprising (i) an alphanumeric or binary clump header table comprising a plurality of clump data records and (ii) an inline tree data structure; and

    storing the inline tree data structure and the clump header table on a computer-readable storage medium operatively coupled to the one or more computer processors,wherein;

    for a first set of one or more data fields among the defined data fields, each range of data strings for the first set of data fields is divided into multiple corresponding subranges, and the multitude of data records comprises multiple first-level subsets of the data records, wherein each first-level subset includes only those data records for which each data string of the first set of data fields falls within a corresponding one of the subranges;

    for a second set of one or more data fields among the defined data fields, each range of data strings for the second set of data fields is divided into multiple corresponding subranges, and each one of the multiple first-level subsets of the data records comprises multiple corresponding second-level subsets of the data records, wherein each second-level subset includes only those data records for which each data string of the second set of data fields falls within a corresponding one of the subranges;

    the inline tree data structure comprises an alternating sequence of (i) multiple first-level binary string segments, each followed by (ii) a subset of one or more corresponding second-level binary string segments;

    each first-level binary string segment encodes the range of data strings in a selected filterable subset of the first set of data fields of a corresponding one of the first-level subsets of the data records, and excludes a non-filterable subset of the first set of data fields;

    each second-level binary string segment encodes the range of data strings in a selected filterable subset of the second set of data fields of a corresponding one of the second-level subsets of the data records, and excludes a non-filterable subset of the second set of data fields;

    for a selected subset of the defined data fields, each combination of specific data strings that occurs in the dataset is indicated by a corresponding one of the plurality of clump data records of the clump header table; and

    each clump data record in the clump header table includes an indicator of a location in the inline tree data structure of a corresponding first-level binary string segment.

View all claims
  • 2 Assignments
Timeline View
Assignment View
    ×
    ×