Inline tree data structure for high-speed searching and filtering of large datasets

US 8,977,656 B2
Filed: 01/10/2012
Issued: 03/10/2015
Est. Priority Date: 01/10/2011
Status: Active Grant

First Claim

1. A computer-implemented method comprising:

receiving from a computer-readable storage medium first electronic indicia of a dataset comprising a multitude of alphanumeric data records, each data record including data strings for multiple corresponding defined data fields; and

using one or more computer processors programmed therefor and operatively coupled to the first storage medium, generating second electronic indicia of the data set comprising (i) an alphanumeric or binary clump header table comprising a plurality of clump data records and (ii) an inline tree data structure; and

storing the inline tree data structure and the clump header table on a computer-readable storage medium operatively coupled to the one or more computer processors, wherein:

for a first set of one or more data fields among the defined data fields, each range of data strings for the first set of data fields is divided into multiple corresponding subranges, and the multitude of data records comprises multiple first-level subsets of the data records, wherein each first-level subset includes only those data records for which each data string of the first set of data fields falls within a corresponding one of the subranges;

for a second set of one or more data fields among the defined data fields, each range of data strings for the second set of data fields is divided into multiple corresponding subranges, and each one of the multiple first-level subsets of the data records comprises multiple corresponding second-level subsets of the data records, wherein each second-level subset includes only those data records for which each data string of the second set of data fields falls within a corresponding one of the subranges;

the inline tree data structure comprises an alternating sequence of (i) multiple first-level binary string segments, each followed by (ii) a subset of one or more corresponding second-level binary string segments;

each first-level binary string segment encodes the range of data strings in a selected filterable subset of the first set of data fields of a corresponding one of the first-level subsets of the data records, and excludes a non-filterable subset of the first set of data fields;

each second-level binary string segment encodes the range of data strings in a selected filterable subset of the second set of data fields of a corresponding one of the second-level subsets of the data records, and excludes a non-filterable subset of the second set of data fields;

for a selected subset of the defined data fields, each combination of specific data strings that occurs in the dataset is indicated by a corresponding one of the plurality of clump data records of the clump header table; and

each clump data record in the clump header table includes an indicator of a location in the inline tree data structure of a corresponding first-level binary string segment.
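The layout recited in the claim can be illustrated with a minimal sketch. Everything here is an assumption made for illustration, not the patent's implementation: the field names (`county`, `age`, `score`, `name`), the fixed-width subranges, integer values standing in for data strings, and tuples standing in for binary string segments. The `name` field plays the role of a non-filterable field and is left out of the tree.

```python
from itertools import groupby


def subrange(value, width):
    """Return the inclusive (lo, hi) subrange of the given width containing value."""
    lo = (value // width) * width
    return lo, lo + width - 1


def build(records, clump_field, l1_field, l1_width, l2_field, l2_width):
    """Return (clump_table, inline_tree) for a two-level hierarchy (illustrative only)."""
    def l1_key(r):
        return subrange(r[l1_field], l1_width)

    def l2_key(r):
        return subrange(r[l2_field], l2_width)

    # Sort so each clump, and each first-level subset within it, is contiguous.
    ordered = sorted(records, key=lambda r: (r[clump_field], l1_key(r), l2_key(r)))

    clump_table = []  # rows of (clump value, index of its first level-1 segment)
    inline_tree = []  # flat sequence: a level-1 segment, then its level-2 segments
    for clump, in_clump in groupby(ordered, key=lambda r: r[clump_field]):
        clump_table.append((clump, len(inline_tree)))
        for l1_range, in_l1 in groupby(in_clump, key=l1_key):
            inline_tree.append((1, *l1_range))
            for l2_range, in_l2 in groupby(in_l1, key=l2_key):
                count = sum(1 for _ in in_l2)  # records in this second-level subset
                inline_tree.append((2, *l2_range, count))
    return clump_table, inline_tree


# Hypothetical records; "name" is treated as non-filterable and is not encoded.
records = [
    {"county": "Lane", "age": 34, "score": 71, "name": "A"},
    {"county": "Lane", "age": 37, "score": 78, "name": "B"},
    {"county": "Lane", "age": 52, "score": 12, "name": "C"},
    {"county": "Polk", "age": 31, "score": 45, "name": "D"},
]
clump_table, inline_tree = build(records, "county", "age", 10, "score", 10)
```

Each clump row holds the position of its first first-level segment, so a reader can jump straight to the relevant stretch of the flat sequence.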

2 Assignments

0 Petitions

Abstract

A data structure comprises a clump header table and an inline tree data structure. The inline tree, representing filterable data fields of hierarchically organized data records, comprises an alternating sequence of first-level binary string segments, each followed by one or more corresponding second-level binary string segments. Each clump header record includes an indicator of a location in the inline tree of corresponding binary string segments. A dedicated, specifically adapted conversion program generates the clump header file and the inline tree for storage on any computer-readable medium, and the inline tree can be read entirely into RAM to be searched or filtered. A dedicated, specifically adapted search and filter program is employed to list or enumerate retrieved data records. Run-time computer code generation can reduce time required for searching and filtering. One example includes spatial searching and filtering of data records that include spatial coordinates as data fields.
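The remark on run-time code generation can be illustrated with a small sketch. It assumes, purely for illustration, that a segment is a tuple of integer fields and that a query names only some field positions; the function `generate_filter` and its argument format are hypothetical. The generated source tests only the specified fields and carries the filter bounds as literal constants, so unspecified fields never appear in the test at all.

```python
def generate_filter(filters):
    """Build a segment test at run time (illustrative only).

    filters maps a field position to an inclusive (lo, hi) range and
    lists only the fields the query actually specifies.
    Returns (function, generated source text).
    """
    tests = [
        f"{lo!r} <= seg[{idx}] <= {hi!r}"
        for idx, (lo, hi) in sorted(filters.items())
    ]
    body = " and ".join(tests) if tests else "True"
    source = f"def matches(seg):\n    return {body}\n"
    namespace = {}
    exec(source, namespace)  # compile the generated source into a function
    return namespace["matches"], source


# A query that filters only on field position 1.
matches, source = generate_filter({1: (30, 39)})
```

In this sketch `source` contains a single comparison on `seg[1]`; the other fields of a segment are ignored without any per-field branch.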

71 Citations

31 Claims

1. A computer-implemented method comprising:
- receiving from a computer-readable storage medium first electronic indicia of a dataset comprising a multitude of alphanumeric data records, each data record including data strings for multiple corresponding defined data fields; and
  
  using one or more computer processors programmed therefor and operatively coupled to the first storage medium, generating second electronic indicia of the data set comprising (i) an alphanumeric or binary clump header table comprising a plurality of clump data records and (ii) an inline tree data structure; and
  
  storing the inline tree data structure and the clump header table on a computer-readable storage medium operatively coupled to the one or more computer processors, wherein:
  
  for a first set of one or more data fields among the defined data fields, each range of data strings for the first set of data fields is divided into multiple corresponding subranges, and the multitude of data records comprises multiple first-level subsets of the data records, wherein each first-level subset includes only those data records for which each data string of the first set of data fields falls within a corresponding one of the subranges;
  
  for a second set of one or more data fields among the defined data fields, each range of data strings for the second set of data fields is divided into multiple corresponding subranges, and each one of the multiple first-level subsets of the data records comprises multiple corresponding second-level subsets of the data records, wherein each second-level subset includes only those data records for which each data string of the second set of data fields falls within a corresponding one of the subranges;
  
  the inline tree data structure comprises an alternating sequence of (i) multiple first-level binary string segments, each followed by (ii) a subset of one or more corresponding second-level binary string segments;
  
  each first-level binary string segment encodes the range of data strings in a selected filterable subset of the first set of data fields of a corresponding one of the first-level subsets of the data records, and excludes a non-filterable subset of the first set of data fields;
  
  each second-level binary string segment encodes the range of data strings in a selected filterable subset of the second set of data fields of a corresponding one of the second-level subsets of the data records, and excludes a non-filterable subset of the second set of data fields;
  
  for a selected subset of the defined data fields, each combination of specific data strings that occurs in the dataset is indicated by a corresponding one of the plurality of clump data records of the clump header table; and
  
  each clump data record in the clump header table includes an indicator of a location in the inline tree data structure of a corresponding first-level binary string segment.
- - 2. The method of claim 1 wherein:
    - for a third set of data fields among the defined data fields, each range of data strings for the third set of data fields is divided into multiple corresponding subranges, and each one of the multiple second-level subsets of the data records comprises multiple corresponding third-level subsets of the data records, wherein each third-level subset includes only those data records for which the data string of the third data field falls within a corresponding one of the subranges; and
      
      the inline tree data structure further comprises a subset of one or more corresponding third-level binary string segments following each second-level binary string segment; and
      
      each third-level binary string segment encodes the range of data strings in the third set of data fields of a corresponding one of the third-level subsets of the data records.
  - 3. The method of claim 2 wherein each second-level binary string segment and one or more corresponding third-level binary string segments form a substantially contiguous portion within the inline tree data structure.
  - 4. The method of claim 2 wherein each lowest-level binary string segment includes a process control field that indicates that the next binary string segment of the inline tree data structure is (i) at the same level, (ii) one level higher, (iii) two levels higher, or (iv) at a higher level that corresponds to a different data clump record.
  - 5. The method of claim 2 wherein each of the first- or second-level binary string segments includes an indicator of (i) a length of the corresponding binary string segment, (ii) a location of a next binary string segment of the same level, or (iii) a location of a last lowest-level binary string that follows the corresponding binary string segment.
  - 6. The method of claim 1 wherein each first-level binary string segment and one or more corresponding second-level binary string segments form a substantially contiguous portion within the inline tree data structure.
  - 7. The method of claim 1 wherein each lowest-level binary string segment includes a process control field that indicates that the next binary string segment of the inline tree data structure is (i) at the same level, (ii) one level higher, (iii) two levels higher, or (iv) at a higher level that corresponds to a different data clump record.
  - 8. The method of claim 1 wherein each of the first- or second-level binary string segments includes an indicator of (i) a length of the corresponding binary string segment, (ii) a location of a next binary string segment of the same level, or (iii) a location of a last lowest-level binary string that follows the corresponding binary string segment.
  - 9. The method of claim 1 wherein (i) the dataset comprises a set of data records that each include geographic coordinates, and (ii) the selected subset of the defined data fields are linked to the geographic coordinates.
  - 10. The method of claim 9 wherein the dataset comprises a multitude of voter registration records.
  - 11. The method of claim 9 wherein the dataset comprises a multitude of census data records.
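The process control field of claims 4 and 7 can be pictured with a short sketch for a three-level tree. The names `Next` and `control_fields`, the numeric codes, and the choice to mark the end of the stream the same way as a clump change are all assumptions made for illustration.

```python
from enum import IntEnum


class Next(IntEnum):
    """What follows a lowest-level segment in the inline tree (hypothetical codes)."""
    SAME_LEVEL = 0
    UP_ONE = 1
    UP_TWO = 2
    NEW_CLUMP = 3


def control_fields(levels, clump_starts):
    """Compute a control field for every lowest-level (level 3) segment.

    levels: the level (1, 2 or 3) of each segment, in stream order.
    clump_starts: indices at which a new clump begins.
    Returns {segment index: Next}.
    """
    fields = {}
    for i, level in enumerate(levels):
        if level != 3:
            continue
        nxt = i + 1
        if nxt >= len(levels) or nxt in clump_starts:
            # Assumption: end of stream is flagged like a clump change.
            fields[i] = Next.NEW_CLUMP
        else:
            # 3 -> 3 is the same level, 3 -> 2 is one up, 3 -> 1 is two up.
            fields[i] = Next(level - levels[nxt])
    return fields


levels = [1, 2, 3, 3, 2, 3, 1, 2, 3, 1, 2, 3]
fields = control_fields(levels, clump_starts={0, 9})
```

With such a field on each lowest-level segment, a reader scanning the sequence knows the level of the next segment without inspecting it first.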

12. A computer-implemented method for searching or filtering an inline tree data structure and a clump header table stored on a computer-readable medium, wherein:
- the clump header table and the inline tree data structure comprise at least a portion of electronic indicia generated from a dataset comprising a multitude of alphanumeric data records, each data record including data strings for multiple corresponding defined data fields;
  
  for a first set of one or more data fields among the defined data fields, each range of data strings for the first set of data fields is divided into multiple corresponding subranges, and the multitude of data records comprises multiple first-level subsets of the data records, wherein each first-level subset includes only those data records for which each data string of the first set of data fields falls within a corresponding one of the subranges;
  
  for a second set of one or more data fields among the defined data fields, each range of data strings for the second set of data fields is divided into multiple corresponding subranges, and each one of the multiple first-level subsets of the data records comprises multiple corresponding second-level subsets of the data records, wherein each second-level subset includes only those data records for which each data string of the second set of data fields falls within a corresponding one of the subranges;
  
  the inline tree data structure comprises an alternating sequence of (i) multiple first-level binary string segments, each followed by (ii) a subset of one or more corresponding second-level binary string segments;
  
  each first-level binary string segment encodes the range of data strings in a selected filterable subset of the first set of data fields of a corresponding one of the first-level subsets of the data records, and excludes a non-filterable subset of the first set of data fields;
  
  each second-level binary string segment encodes the range of data strings in a selected filterable subset of the second set of data fields of a corresponding one of the second-level subsets of the data records, and excludes a non-filterable subset of the second set of data fields;
  
  for a selected subset of the defined data fields, each combination of specific data strings that occurs in the dataset is indicated by a corresponding one of the plurality of clump data records of the clump header table; and
  
  each clump data record in the clump header table includes an indicator of a location in the inline tree data structure of a corresponding first-level binary string segment, wherein the method comprises:
  
  (a) receiving an electronic query for data records, or an enumeration thereof, having data strings in one or more specified clumped or filterable data fields that fall within corresponding specified filter subranges for those data fields;
  
  (b) in response to the query of part (a), with a computer processor programmed therefor and linked to the computer-readable medium, automatically electronically interrogating the clump header table to identify one or more clump data records that correspond to data strings in specified clump data fields that fall within the specified filter subranges according to the query of part (a);
  
  (c) automatically electronically interrogating, with a computer processor programmed therefor and linked to the computer-readable medium, those first-level binary string segments indicated by the clump data records identified in part (b), to identify one or more first-level binary string segments that indicate one or more data records that have data strings in specified filterable data fields within the specified filter subranges according to the query of part (a);
  
  (d) automatically electronically interrogating, with a computer processor programmed therefor and linked to the computer-readable medium, those second-level binary string segments corresponding to the first-level binary string segments identified in part (c), to identify one or more second-level binary string segments that indicate one or more data records in specified filterable data fields that have data strings within the specified filter subranges according to the query of part (a); and
  
  (e) automatically generating, with a computer processor programmed therefor, and storing, on a computer-readable medium coupled to that processor, a list or an enumeration of one or more data records that correspond to the clump data records identified in part (b), the first-level binary string segments identified in part (c), or the second-level binary string segments identified in part (d).
- - 13. The method of claim 12 wherein the inline tree data structure is stored in one or more computer-readable media that are directly accessible to the computer processor of part (c) or (d).
  - 14. The method of claim 13 wherein the directly accessible computer media include random access memory.
  - 15. The method of claim 13 further comprising loading sequentially, according to location within the inline tree data structure, the binary string segments into a processor cache computer memory.
  - 16. The method of claim 12 wherein the clump header table is stored in a computer-readable medium that is directly accessible to the computer processor of part (b).
  - 17. The method of claim 12 further comprising, in response to the query of part (a), with a computer processor programmed therefor and linked to the computer-readable medium, automatically electronically generating computer code for performing parts (c) and (d).
  - 18. The method of claim 17, wherein the generated computer code causes binary string segments corresponding to non-specified filterable data fields to be skipped over without requiring a processor decision in parts (c) and (d).
  - 19. The method of claim 17 wherein the generated computer code encodes one or more of the corresponding filter subranges specified in the query of part (a).
  - 20. The method of claim 12 wherein the interrogation of part (c) or (d) includes evaluating a corresponding process control field of the interrogated binary string segment to determine that the next binary string segment of the inline tree data structure is (i) at the same level, (ii) one level higher, (iii) two levels higher, or (iv) at a higher level that corresponds to a different data clump record.
  - 21. The method of claim 12 wherein the one or more binary data files indicate at least 1,000,000 data records, and the interrogations of parts (a), (b), and (c) are performed in less than 150 nanoseconds per data record per processor core.
  - 22. The method of claim 12 wherein the one or more binary data files indicate at least 10,000,000 data records, and the interrogations of parts (b) and (c) are performed in less than 150 nanoseconds per data record per processor core.
  - 23. The method of claim 12 wherein one or more computer-readable media directly accessible to the computer processor of part (c) or (d) are encoded to store the inline tree data structure.
  - 24. The method of claim 23 wherein one or more of the computer-readable media comprise random access memory.
  - 25. The method of claim 23 further comprising loading sequentially, according to location within the inline tree data structure, the binary string segments into a processor cache computer memory.
  - 26. The method of claim 12 wherein (i) the dataset comprises a set of data records that each include geographic coordinates, and (ii) the selected subset of the defined data fields are linked to the geographic coordinates.
  - 27. The method of claim 26 wherein the dataset comprises a multitude of voter registration records.
  - 28. The method of claim 26 wherein the dataset comprises a multitude of census data records.
  - 29. The method of claim 26 further comprising:
    - generating a graphical representation of the list or enumeration generated in part (e); and
      
      generating an image or animation of the graphical representation overlaid on a map.
  - 30. The method of claim 29 wherein the dataset comprises a multitude of voter registration records.
  - 31. The method of claim 29 wherein the dataset comprises a multitude of census data records.
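Parts (a) through (e) of claim 12 can be traced in a minimal sketch. The layout is an assumption made for illustration: clump rows are `(value, start index)` pairs, first-level segments are `(1, lo, hi)` tuples, second-level segments are `(2, lo, hi, record count)` tuples, and filter ranges are assumed to align with subrange boundaries. The function names and sample data are hypothetical.

```python
def within(lo, hi, query):
    """True if the segment subrange [lo, hi] lies inside the query range, or no filter is set."""
    return query is None or (query[0] <= lo and hi <= query[1])


def search(clump_table, inline_tree, clumps, l1_query=None, l2_query=None):
    """Return an enumeration (a count) of matching records (illustrative only)."""
    total = 0
    # (b) interrogate the clump header table for the requested clumps
    for row, (clump, start) in enumerate(clump_table):
        if clump not in clumps:
            continue
        # A clump's segments run up to the start of the next clump.
        end = clump_table[row + 1][1] if row + 1 < len(clump_table) else len(inline_tree)
        l1_ok = False
        for segment in inline_tree[start:end]:
            if segment[0] == 1:
                # (c) test the first-level segment against its filter range
                l1_ok = within(segment[1], segment[2], l1_query)
            elif l1_ok and within(segment[1], segment[2], l2_query):
                # (d) second-level segment passes; (e) add its records to the tally
                total += segment[3]
    return total


# Hypothetical sample data in the assumed layout.
clump_table = [("Lane", 0), ("Polk", 4)]
inline_tree = [
    (1, 30, 39), (2, 70, 79, 2),
    (1, 50, 59), (2, 10, 19, 1),
    (1, 30, 39), (2, 40, 49, 1),
]

# (a) a query: clump "Lane", first-level range 30 to 39, no second-level filter
lane_thirties = search(clump_table, inline_tree, {"Lane"}, l1_query=(30, 39))
```

Because the segments of one clump sit next to each other in the flat sequence, the scan reads memory in order, which is the access pattern that claims 15 and 25 describe for loading segments into processor cache.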

Current Assignee: Moonshadow Mobile, Inc.
Original Assignee: Moonshadow Mobile, Inc.
Inventors: Ward, Roy W.
Primary Examiner(s): Jalil, Neveen Abel
Assistant Examiner(s): Biskeborn, Kristofer M

Application Number: US13/347,646
Publication Number: US 20120179699A1
Time in Patent Office: 1,155 Days
Field of Search: 707/754, 707/797, 707/811
US Class Current: 707/811
CPC Class Codes

G06F 16/1734   Details of monitoring file ...

G06F 16/2246   Trees, e.g. B+trees

G06F 16/282   Hierarchical databases, e.g...

G06F 16/285   Clustering or classification

G06F 16/29   Geographical information da...

G06F 16/9535   Search customisation based ...

G06F 40/146   Coding or compression of tr...

G06F 40/149   Adaptation of the text data...

G06Q 2230/00   Voting or election arrangem...
