Efficient data infrastructure for high dimensional data analysis
First Claim
1. A method for generating an inverted index for retrieval of data from a data structure, the method comprising:
- generating a file model structure comprising a raw value data file, a mapped value file, and a dimension table, the generating comprising:
  populating the dimension table with mapped values and ranges of raw values of a dimension of a data structure, the ranges of raw values mapped to corresponding mapped values; and
  for raw values within the dimension of the data structure:
    populating the raw value data file with a raw value of the dimension; and
    populating the mapped value file with a mapped value corresponding to the raw value within the raw value data file based upon a mapping within the dimension table; and
- generating an inverted index comprising a recordID file and a Count-Offset file, the generating comprising:
  for mapped values within the dimension table:
    populating the recordID file with one or more record identifiers associated with a raw value mapped to a mapped value by falling within a range of raw values within the dimension table; and
    populating the Count-Offset file with a count value and an offset value of the mapped value based upon the populating of the recordID file.
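The steps of claim 1 can be illustrated with a minimal in-memory sketch. It is not the patented implementation: the "files" are plain Python lists and a dict, the record identifier is assumed to be the position of a value in the raw value file, the offset is assumed to count entries rather than bytes, and the dimension table contents (`DIMENSION_TABLE`) and all function names are hypothetical.

```python
from bisect import bisect_right

# Hypothetical dimension table: (inclusive lower bound of a raw-value range,
# mapped value), sorted by lower bound. Each range ends where the next begins,
# so [0, 18) -> 0, [18, 65) -> 1, and 65 and above -> 2.
DIMENSION_TABLE = [(0, 0), (18, 1), (65, 2)]


def map_raw_value(raw, table=DIMENSION_TABLE):
    """Return the mapped value whose raw-value range contains `raw`."""
    lower_bounds = [low for low, _ in table]
    pos = bisect_right(lower_bounds, raw) - 1
    if pos < 0:
        raise ValueError(f"raw value {raw} is below every range")
    return table[pos][1]


def build_file_model(raw_values, table=DIMENSION_TABLE):
    """Return (raw value file, mapped value file); list index = record ID (assumed)."""
    raw_file = list(raw_values)
    mapped_file = [map_raw_value(value, table) for value in raw_file]
    return raw_file, mapped_file


def build_inverted_index(mapped_file, table=DIMENSION_TABLE):
    """Return (recordID file, Count-Offset file) for one dimension."""
    # Group record identifiers by the mapped value of their raw value.
    groups = {mapped: [] for _, mapped in table}
    for record_id, mapped in enumerate(mapped_file):
        groups[mapped].append(record_id)

    # Concatenate the groups; remember where each one starts and how long it is.
    record_id_file = []
    count_offset_file = {}
    for _, mapped in table:
        ids = groups[mapped]
        count_offset_file[mapped] = (len(ids), len(record_id_file))
        record_id_file.extend(ids)
    return record_id_file, count_offset_file


raw_file, mapped_file = build_file_model([42, 7, 70, 18, 3])
record_id_file, count_offset_file = build_inverted_index(mapped_file)
```

With this sample input, the record identifiers for mapped value 1 are found by reading `count_offset_file[1]` and slicing `record_id_file[offset:offset + count]`, without scanning the other subgroups.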
Abstract
Described is a technology by which high dimensional source data corresponding to rows of records with identifiers, and columns comprising dimensions of data values, are processed into a file model for efficient access. An inverted index corresponding to any dimension is built by mapping data from raw dimension values to mapped values based on mapping entries in a dimension table. The record identifiers are arranged into subgroups based on their mapped value; a count and/or an offset may be maintained for locating each of the subgroups. The raw values for a dimension are maintained within a raw value file. For sparse data, the raw value file may be compressed, e.g., by excluding nulls and associating a record identifier with each non-null. A data manager provides access to data in the data files, such as by offering various functions, using caching for efficiency.
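The sparse-data compression mentioned in the abstract (excluding nulls and associating a record identifier with each non-null) might look like the following sketch. The two parallel lists, the use of `None` for a null, and the function names `compress_sparse` and `lookup_sparse` are assumptions made for illustration; the abstract does not specify a layout.

```python
from bisect import bisect_left


def compress_sparse(raw_values):
    """Drop nulls; keep each non-null value alongside its record identifier."""
    record_ids, values = [], []
    for record_id, value in enumerate(raw_values):
        if value is not None:
            record_ids.append(record_id)
            values.append(value)
    return record_ids, values


def lookup_sparse(record_ids, values, record_id):
    """Binary-search the sorted record IDs; return None if the record was null."""
    pos = bisect_left(record_ids, record_id)
    if pos < len(record_ids) and record_ids[pos] == record_id:
        return values[pos]
    return None


dense = [None, 3.5, None, None, 9.0, None]
record_ids, values = compress_sparse(dense)
```

Because record identifiers are appended in increasing order, the compressed file stays sorted and a lookup costs a binary search rather than a full scan.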
20 Claims
9. A system for generating an inverted index for retrieval of data from a data structure, the system comprising:
- a data importer and processing mechanism coupled to database and configured to:
  generate a file model structure comprising:
    a dimension table comprising mapped values and ranges of raw values of a dimension of a data structure, the ranges of raw values mapped to corresponding mapped value;
    a raw value file comprising raw values of the dimension; and
    a mapped value file comprising mapped values corresponding to the raw values within the raw value data file based upon a mapping within the dimension table; and
  generate an inverted index comprising:
    a recordID file comprising record identifiers associated with raw values mapped to mapped values by falling within a range of raw values within the dimension table; and
    a Count-Offset file comprising count values and offset values defining the location of particular record identifiers within the recordID file.
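To show how count and offset values can define the location of record identifiers within the recordID file, here is a sketch that packs both files as bytes. The binary layout is an assumption, not taken from the claims: 4-byte little-endian unsigned integers, one (count, offset) pair per mapped value stored in mapped-value order, and offsets counted in entries rather than bytes. The names `pack_index` and `record_ids_for` are hypothetical.

```python
import struct

RECORD_ID = struct.Struct("<I")      # assumed: one 4-byte record identifier
COUNT_OFFSET = struct.Struct("<II")  # assumed: (count, offset) per mapped value


def pack_index(groups):
    """Pack groups[m], the record IDs for mapped value m, into two byte strings."""
    record_id_bytes = bytearray()
    count_offset_bytes = bytearray()
    offset = 0  # position, in entries, of the next subgroup in the recordID file
    for ids in groups:
        count_offset_bytes += COUNT_OFFSET.pack(len(ids), offset)
        for record_id in ids:
            record_id_bytes += RECORD_ID.pack(record_id)
        offset += len(ids)
    return bytes(record_id_bytes), bytes(count_offset_bytes)


def record_ids_for(mapped_value, record_id_bytes, count_offset_bytes):
    """Read one Count-Offset entry, then only that subgroup of record IDs."""
    count, offset = COUNT_OFFSET.unpack_from(
        count_offset_bytes, mapped_value * COUNT_OFFSET.size
    )
    start = offset * RECORD_ID.size
    chunk = record_id_bytes[start:start + count * RECORD_ID.size]
    return [record_id for (record_id,) in RECORD_ID.iter_unpack(chunk)]


record_id_bytes, count_offset_bytes = pack_index([[1, 4], [0, 3], [2]])
```

Fixed-width entries let a reader seek directly to the Count-Offset entry for a mapped value and then to its subgroup, so retrieval cost depends on the subgroup size rather than on the size of the whole recordID file.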
15. A method for generating an inverted index for retrieval of data from a data structure comprising record identifiers and one or more dimensions, a dimension comprising one or more raw values, the method comprising:
- generating an inverted index comprising a recordID file and a Count-Offset file, the generating comprising:
  for mapped values within a dimension table comprising mapped values mapped to ranges of raw values of a dimension:
    populating the recordID file with one or more record identifiers associated with a raw value mapped to a mapped value by falling within a range of raw values within the dimension table; and
    populating the Count-Offset file with a count value and an offset value of the mapped value based upon the populating of the recordID file.
Specification