Data organization and indexing related technology
First Claim
1. A computer-implemented method comprising:
accessing, from an electronic data storage, data stored in the electronic data storage for multiple attribute classes, the accessed data comprising data values stored in the electronic data storage for each of at least two of the multiple attribute classes;
identifying redundancy characteristics of the accessed data within each of the at least two attribute classes by:
analyzing the accessed data values stored in the electronic data storage for each of the at least two attribute classes, and
based on the analysis of the accessed data values stored in the electronic data storage for each of the at least two attribute classes, determining a measure of redundancy within the accessed data values for each of the at least two attribute classes;
accessing a rule that indicates a preference to maintain an order of attribute classes within a dimension of attribute classes despite redundancy characteristics, the dimension of attribute classes defining a subset of attribute classes that have a parent-child relationship;
identifying a dimension of attribute classes included in the multiple attribute classes, the dimension of attribute classes being arranged in a particular order in the electronic data storage based on a parent-child relationship;
determining a relative order among the multiple attribute classes based on the determined measure of redundancy within the accessed data values for each of the at least two attribute classes, including maintaining the particular order of the attribute classes included in the dimension despite the determined measure of redundancy within the accessed data values for each of the at least two attribute classes indicating that a different order of the attribute classes included in the dimension is preferred;
organizing the accessed data based on the determined relative order among the multiple attribute classes;
compressing, using run length encoding, the organized data in the determined relative order among the multiple attribute classes;
generating an index that is descriptive of the compressed data; and
storing, in electronic storage, the encoded data and the generated index to enable subsequent searching of the encoded data using the generated index,
wherein determining the measure of redundancy within the accessed data values for each of the at least two attribute classes comprises:
determining a number of distinct data values within an attribute class;
determining a parameter for a distinct data value within the attribute class, the parameter reflecting contribution of the distinct data value to the entirety of data values within the attribute class; and
determining a redundancy measure for the attribute class based on the number of distinct data values within the attribute class and the determined parameter; and
wherein determining the relative order among the multiple attribute classes based on the determined measure of redundancy within the accessed data values for each of the at least two attribute classes comprises determining the relative order among the multiple attribute classes based on the determined redundancy measure for the attribute class that is based on the number of distinct data values within the attribute class and the determined parameter.
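The ordering steps recited in claim 1 can be illustrated with a short Python sketch. It is not the patented implementation. The claim does not give a formula, so the redundancy score used here (the sum of squared value shares, with the distinct-value count as a tie-breaker) is an assumption, as are the function names and the sample columns. The sketch ranks attribute classes by redundancy and then restores the fixed parent-child order of a dimension inside the slots its members occupy.

```python
from collections import Counter

def redundancy_measure(values):
    """Return (distinct count, score). The score formula is an assumption."""
    counts = Counter(values)
    total = len(values)
    n_distinct = len(counts)
    # Parameter per distinct value: its share of all values in the class.
    shares = [c / total for c in counts.values()]
    # Hypothetical score: chance that two random rows hold the same value.
    score = sum(s * s for s in shares)
    return n_distinct, score

def relative_order(columns, dimension):
    """Rank classes by redundancy but keep `dimension` in its given order."""
    stats = {name: redundancy_measure(vals) for name, vals in columns.items()}
    # Most redundant first; fewer distinct values, then name, break ties.
    ranked = sorted(columns, key=lambda n: (-stats[n][1], stats[n][0], n))
    # Positions taken by dimension members are refilled in parent-child order.
    slots = [i for i, n in enumerate(ranked) if n in dimension]
    for slot, name in zip(slots, dimension):
        ranked[slot] = name
    return ranked

# Hypothetical sample data: "year" is the parent of "quarter".
columns = {
    "order_id": [1, 2, 3, 4, 5, 6, 7, 8],
    "year": [2019, 2019, 2020, 2020, 2021, 2021, 2022, 2022],
    "quarter": ["Q1", "Q2", "Q1", "Q2", "Q1", "Q2", "Q1", "Q2"],
    "region": ["EU", "EU", "EU", "EU", "EU", "EU", "US", "US"],
}
order = relative_order(columns, ["year", "quarter"])
```

In this sample, "quarter" scores as more redundant than "year", so redundancy alone would place it first. The dimension rule keeps "year" ahead of "quarter" all the same.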
Abstract
Data organization and indexing, in which data that includes information for multiple attribute classes is accessed and redundancy characteristics of the accessed data within each of at least two of the multiple attribute classes are identified. Based on the identified redundancy characteristics, a relative order among the multiple attribute classes of the accessed data is determined and the accessed data is organized based on the determined relative order. The organized data is compressed using run length encoding and an index that is descriptive of the compressed data is generated. The encoded data and the generated index are stored to enable subsequent searching of the encoded data using the generated index.
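As a rough illustration of the organize, compress, and index steps in the abstract, the Python sketch below sorts rows by a given attribute order, run-length encodes each column, and builds an index of where each value's runs start. The row layout, the function names, and the index shape (value mapped to a list of start-row and run-length pairs) are assumptions made for the example; the abstract does not specify them.

```python
from itertools import groupby

def organize(rows, order):
    """Sort rows by the attribute classes in the determined relative order."""
    return sorted(rows, key=lambda r: tuple(r[a] for a in order))

def rle(values):
    """Run-length encode a sequence into (value, run length) pairs."""
    return [(v, len(list(g))) for v, g in groupby(values)]

def build_index(runs):
    """Map each value to the (start row, run length) of every run holding it."""
    index = {}
    start = 0
    for value, length in runs:
        index.setdefault(value, []).append((start, length))
        start += length
    return index

# Hypothetical rows and attribute order.
rows = [
    {"region": "EU", "quarter": "Q2"},
    {"region": "US", "quarter": "Q1"},
    {"region": "EU", "quarter": "Q1"},
    {"region": "EU", "quarter": "Q1"},
    {"region": "US", "quarter": "Q1"},
    {"region": "EU", "quarter": "Q2"},
]
order = ["region", "quarter"]

organized = organize(rows, order)
compressed = {a: rle([r[a] for r in organized]) for a in order}
index = {a: build_index(compressed[a]) for a in order}
```

Sorting on the most redundant attribute first tends to produce long runs in the leading columns, which is what makes run-length encoding effective. A search for a value can then read the matching row ranges from the index without decoding the whole column.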
17 Claims
1. A computer-implemented method, recited in full under First Claim above. Dependent claims: 2-13.
14. An electronic system comprising:
at least one electronic data storage device; and
at least one processor configured to perform operations comprising:
accessing, from an electronic data storage, data stored in the electronic data storage for multiple attribute classes, the accessed data comprising data values stored in the electronic data storage for each of at least two of the multiple attribute classes;
identifying redundancy characteristics of the accessed data within each of the at least two attribute classes by:
analyzing the accessed data values stored in the electronic data storage for each of the at least two attribute classes, and
based on the analysis of the accessed data values stored in the electronic data storage for each of the at least two attribute classes, determining a measure of redundancy within the accessed data values for each of the at least two attribute classes;
accessing a rule that indicates a preference to maintain an order of attribute classes within a dimension of attribute classes despite redundancy characteristics, the dimension of attribute classes defining a subset of attribute classes that have a parent-child relationship;
identifying a dimension of attribute classes included in the multiple attribute classes, the dimension of attribute classes being arranged in a particular order in the electronic data storage based on a parent-child relationship;
determining a relative order among the multiple attribute classes based on the determined measure of redundancy within the accessed data values for each of the at least two attribute classes, including maintaining the particular order of the attribute classes included in the dimension despite the determined measure of redundancy within the accessed data values for each of the at least two attribute classes indicating that a different order of the attribute classes included in the dimension is preferred;
organizing the accessed data based on the determined relative order among the multiple attribute classes;
compressing, using run length encoding, the organized data in the determined relative order among the multiple attribute classes;
generating an index that is descriptive of the compressed data; and
storing, in the at least one electronic data storage device, the compressed data and the generated index to enable subsequent searching of the compressed data using the generated index,
wherein determining the measure of redundancy within the accessed data values for each of the at least two attribute classes comprises:
determining a number of distinct data values within an attribute class;
determining a parameter for a distinct data value within the attribute class, the parameter reflecting contribution of the distinct data value to the entirety of data values within the attribute class; and
determining a redundancy measure for the attribute class based on the number of distinct data values within the attribute class and the determined parameter; and
wherein determining the relative order among the multiple attribute classes based on the determined measure of redundancy within the accessed data values for each of the at least two attribute classes comprises determining the relative order among the multiple attribute classes based on the determined redundancy measure for the attribute class that is based on the number of distinct data values within the attribute class and the determined parameter.
15. A computer-implemented method comprising:
accessing, from an electronic data storage, data stored in the electronic data storage for multiple attribute classes, the accessed data comprising data values stored in the electronic data storage for each of at least two of the multiple attribute classes;
identifying redundancy characteristics of the accessed data within each of the at least two attribute classes by:
analyzing the accessed data values stored in the electronic data storage for each of the at least two attribute classes, and
based on the analysis of the accessed data values stored in the electronic data storage for each of the at least two attribute classes, determining a measure of redundancy within the accessed data values for each of the at least two attribute classes;
accessing a rule that indicates a preference to maintain an order of attribute classes within a dimension of attribute classes despite redundancy characteristics, the dimension of attribute classes defining a subset of attribute classes that have a parent-child relationship;
identifying a dimension of attribute classes included in the multiple attribute classes, the dimension of attribute classes being arranged in a particular order in the electronic data storage based on a parent-child relationship;
determining a relative order among the multiple attribute classes based on the determined measure of redundancy within the accessed data values for each of the at least two attribute classes, including maintaining the particular order of the attribute classes included in the dimension despite the determined measure of redundancy within the accessed data values for each of the at least two attribute classes indicating that a different order of the attribute classes included in the dimension is preferred;
ordering the multiple attribute classes within the accessed data based on the determined relative order among the multiple attribute classes;
encoding the accessed data within the ordered multiple attribute classes, with the encoding reflecting redundancies and uniqueness within the accessed data and accounting for the determined relative order among the multiple attribute classes; and
storing, in electronic storage, the encoded data to enable subsequent searching of the encoded data,
wherein determining the measure of redundancy within the accessed data values for each of the at least two attribute classes comprises:
determining a number of distinct data values within an attribute class;
determining a parameter for a distinct data value within the attribute class, the parameter reflecting contribution of the distinct data value to the entirety of data values within the attribute class; and
determining a redundancy measure for the attribute class based on the number of distinct data values within the attribute class and the determined parameter; and
wherein determining the relative order among the multiple attribute classes based on the determined measure of redundancy within the accessed data values for each of the at least two attribute classes comprises determining the relative order among the multiple attribute classes based on the determined redundancy measure for the attribute class that is based on the number of distinct data values within the attribute class and the determined parameter.
16. A computer-implemented method comprising:
accessing, from an electronic data storage, data that includes information for multiple attribute classes;
identifying redundancy characteristics of the accessed data within each of at least two of the multiple attribute classes;
identifying a search frequency for each of the at least two attribute classes;
determining a relative order among the multiple attribute classes based on:
the identified redundancy characteristics of the accessed data within each of at least two attribute classes, and
the identified search frequency for each of the at least two attribute classes;
organizing the accessed data based on the determined relative order among the multiple attribute classes;
compressing, using run length encoding, the organized data in the determined relative order among the multiple attribute classes;
generating an index that is descriptive of the compressed data; and
storing, in electronic storage, the encoded data and the generated index to enable subsequent searching of the encoded data using the generated index,
wherein identifying redundancy characteristics of the accessed data within each of at least two of the multiple attribute classes comprises:
determining a number of distinct data values within an attribute class;
determining a parameter for a distinct data value within the attribute class, the parameter reflecting contribution of the distinct data value to the entirety of data values within the attribute class; and
determining a redundancy measure for the attribute class based on the number of distinct data values within the attribute class and the determined parameter; and
wherein determining the relative order among the multiple attribute classes comprises determining the relative order among the multiple attribute classes based on the determined redundancy measure for the attribute class that is based on the number of distinct data values within the attribute class and the determined parameter, and
wherein determining the relative order among the multiple attribute classes further comprises:
accessing a rule that indicates attribute classes that are searched at a higher frequency than other attribute classes are prioritized in comparison to the other attribute classes in determining an order of attribute classes;
identifying a first attribute class that is included in the multiple attribute classes and that is searched at a higher frequency than a second attribute class that is included in the multiple attribute classes; and
based on the identification that the first attribute class is searched at a higher frequency than the second attribute class, determining to order the first attribute class prior to the second attribute class despite the identified redundancy characteristics indicating that ordering the second attribute class prior to the first attribute class is preferred.
Dependent claims: 17
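The search-frequency rule in claim 16 can be sketched in Python as a two-level sort. This is one possible reading, not the claimed method itself: the claim does not say how the two criteria are combined, so applying frequency first and redundancy only as a tie-breaker is an assumption. The function name, the redundancy scores, and the frequency counts are all hypothetical.

```python
def order_with_search_priority(redundancy, search_frequency):
    """Order classes by search frequency, then by redundancy score.

    `redundancy` maps class name to a score (higher means more redundant).
    `search_frequency` maps class name to how often it is searched.
    Both mappings are hypothetical inputs for this sketch.
    """
    return sorted(
        redundancy,
        key=lambda name: (
            -search_frequency.get(name, 0),  # more searches sorts first
            -redundancy[name],               # then more redundant first
            name,                            # stable, deterministic tie-break
        ),
    )

# Hypothetical scores: "customer" is the least redundant class
# but is searched far more often than the others.
redundancy = {"region": 0.625, "quarter": 0.5, "customer": 0.125}
search_frequency = {"customer": 40, "region": 10, "quarter": 10}

by_redundancy_only = sorted(redundancy, key=lambda n: -redundancy[n])
with_priority = order_with_search_priority(redundancy, search_frequency)
```

Redundancy alone would put "customer" last. Under the frequency rule it moves to the front, and the remaining classes keep their redundancy-based order.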
Specification