Efficient data infrastructure for high dimensional data analysis
Abstract
Described is a technology by which high dimensional source data corresponding to rows of records with identifiers, and columns comprising dimensions of data values, are processed into a file model for efficient access. An inverted index corresponding to any dimension is built by mapping data from raw dimension values to mapped values based on mapping entries in a dimension table. The record identifiers are arranged into subgroups based on their mapped value; a count and/or an offset may be maintained for locating each of the subgroups. The raw values for a dimension are maintained within a raw value file. For sparse data, the raw value file may be compressed, e.g., by excluding nulls and associating a record identifier with each non-null. A data manager provides access to data in the data files, such as by offering various functions, using caching for efficiency.
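The inverted index construction summarized above can be illustrated with a minimal Python sketch. It is not the patented implementation. The names `build_inverted_index`, `lookup`, and `directory` are hypothetical, the dimension table is assumed to be a plain dictionary from raw values to integer mapped values, and skipping null values during indexing is an assumption, since the abstract does not say how nulls are treated in the index.

```python
from collections import defaultdict


def build_inverted_index(raw_values, dimension_table):
    """Build an inverted index for one dimension.

    raw_values: list indexed by record identifier; None marks a null.
    dimension_table: dict mapping each raw value to its mapped value
                     (assumed here to be a small integer).
    Returns (record_ids, directory), where record_ids is one flat list of
    record identifiers grouped by mapped value, and directory maps each
    mapped value to the (offset, count) of its subgroup in that list.
    """
    groups = defaultdict(list)
    for record_id, raw in enumerate(raw_values):
        if raw is None:
            continue  # assumption: null values are left out of the index
        groups[dimension_table[raw]].append(record_id)

    record_ids = []
    directory = {}
    for mapped in sorted(groups):
        subgroup = groups[mapped]
        # The offset is where this subgroup starts in the flat list.
        directory[mapped] = (len(record_ids), len(subgroup))
        record_ids.extend(subgroup)
    return record_ids, directory


def lookup(record_ids, directory, mapped):
    """Return the record identifiers whose mapped value equals `mapped`."""
    offset, count = directory.get(mapped, (0, 0))
    return record_ids[offset:offset + count]


# Hypothetical sample data: a "color" dimension over seven records.
dimension_table = {"blue": 0, "green": 1, "red": 2}
raw_values = ["red", None, "blue", "red", "green", None, "blue"]
record_ids, directory = build_inverted_index(raw_values, dimension_table)
red_records = lookup(record_ids, directory, dimension_table["red"])
```

Storing the subgroups in one flat list with an offset and a count per mapped value means a subgroup can be read as a single contiguous slice, which is one plausible reason for maintaining the counts and offsets the abstract mentions.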
20 Claims
1. In a computing environment in which source data is arranged as a data structure comprising record identifiers and dimensions, each dimension having a data value, which may be non-null or null for each record identifier, a method comprising, constructing an inverted index corresponding to a dimension, including by mapping data from raw dimension values to mapped values based on mapping entries in a dimension table, and arranging the record identifiers into subgroups within a record identifier data structure based on each record identifier's corresponding mapped value in the dimension table.
8. At least one computer-readable medium having computer-executable instructions, which when executed perform steps, comprising:
processing high dimensional data corresponding to rows of record identifiers by columns of dimensions, including constructing a raw value file by which raw values for a dimension can be located, and constructing an inverted index containing subgroups of one or more record identifiers, each subgroup defined by a mapping value based on the raw value associated with each record identifier of that subgroup; and
providing access to the raw value file and inverted index for use in analyzing the high dimensional data.
Dependent claims: 9, 10, 11, 12, 13.
14. In a computing environment, a system comprising:
a data importer and processing mechanism coupled to a data source containing data corresponding to rows of record identifiers by columns of dimensions, the data importer and processing mechanism writing files containing information corresponding to the data, including a raw value file by which raw values for a dimension can be located, and constructing an inverted index file containing subgroups of one or more record identifiers, each subgroup defined by a mapping value based on the raw value associated with each record identifier of that subgroup; and
a data manager that provides access to data in the data files.
Dependent claims: 15, 16, 17, 18, 19, 20.
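The raw value file and the data manager recited in claims 8 and 14 can be pictured with the following Python sketch, offered only as an illustration under stated assumptions. The names `compress_sparse`, `DataManager`, and `get_raw_value` are hypothetical, an in-memory dictionary stands in for the data files, and the binary search over sorted record identifiers is one possible way to locate a raw value in a sparse column, not necessarily the one the patent describes.

```python
from bisect import bisect_left


def compress_sparse(raw_values):
    """Drop nulls and pair each remaining value with its record identifier.

    The result is sorted by record identifier because enumerate() visits
    the records in order.
    """
    return [(rid, value) for rid, value in enumerate(raw_values)
            if value is not None]


class DataManager:
    """Serves raw values per dimension and caches each loaded column."""

    def __init__(self, storage):
        # storage: dict of dimension name -> compressed (id, value) pairs.
        # Assumption: this dict stands in for raw value files on disk.
        self._storage = storage
        self._cache = {}
        self.loads = 0  # counts simulated file reads, for illustration

    def _column(self, dimension):
        if dimension not in self._cache:
            self.loads += 1
            pairs = self._storage[dimension]
            ids = [rid for rid, _ in pairs]
            values = [value for _, value in pairs]
            self._cache[dimension] = (ids, values)
        return self._cache[dimension]

    def get_raw_value(self, dimension, record_id):
        """Return the raw value for a record, or None if it is null."""
        ids, values = self._column(dimension)
        i = bisect_left(ids, record_id)
        if i < len(ids) and ids[i] == record_id:
            return values[i]
        return None


# Hypothetical sample data: a sparse "color" dimension over six records.
storage = {"color": compress_sparse(["red", None, "blue", None, None, "green"])}
manager = DataManager(storage)
```

In this sketch a column is read from storage once and then served from the cache, so repeated lookups on the same dimension do not trigger further loads.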
Specification