Data organization and indexing related technology

  • US 8,577,902 B1
  • Filed: 05/11/2010
  • Issued: 11/05/2013
  • Est. Priority Date: 05/12/2009
  • Status: Active Grant
First Claim

1. A computer-implemented method comprising:

    accessing, from an electronic data storage, data stored in the electronic data storage for multiple attribute classes, the accessed data comprising data values stored in the electronic data storage for each of at least two of the multiple attribute classes;

    identifying redundancy characteristics of the accessed data within each of the at least two attribute classes by:

    analyzing the accessed data values stored in the electronic data storage for each of the at least two attribute classes, and

    based on the analysis of the accessed data values stored in the electronic data storage for each of the at least two attribute classes, determining a measure of redundancy within the accessed data values for each of the at least two attribute classes;

    accessing a rule that indicates a preference to maintain an order of attribute classes within a dimension of attribute classes despite redundancy characteristics, the dimension of attribute classes defining a subset of attribute classes that have a parent-child relationship;

    identifying a dimension of attribute classes included in the multiple attribute classes, the dimension of attribute classes being arranged in a particular order in the electronic data storage based on a parent-child relationship;

    determining a relative order among the multiple attribute classes based on the determined measure of redundancy within the accessed data values for each of the at least two attribute classes, including maintaining the particular order of the attribute classes included in the dimension despite the determined measure of redundancy within the accessed data values for each of the at least two attribute classes indicating that a different order of the attribute classes included in the dimension is preferred;

    organizing the accessed data based on the determined relative order among the multiple attribute classes;

    compressing, using run length encoding, the organized data in the determined relative order among the multiple attribute classes;

    generating an index that is descriptive of the compressed data; and

    storing, in electronic storage, the encoded data and the generated index to enable subsequent searching of the encoded data using the generated index,

    wherein determining the measure of redundancy within the accessed data values for each of the at least two attribute classes comprises:

    determining a number of distinct data values within an attribute class;

    determining a parameter for a distinct data value within the attribute class, the parameter reflecting contribution of the distinct data value to the entirety of data values within the attribute class; and

    determining a redundancy measure for the attribute class based on the number of distinct data values within the attribute class and the determined parameter; and

    wherein determining the relative order among the multiple attribute classes based on the determined measure of redundancy within the accessed data values for each of the at least two attribute classes comprises determining the relative order among the multiple attribute classes based on the determined redundancy measure for the attribute class that is based on the number of distinct data values within the attribute class and the determined parameter.
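
The steps recited in the claim can be illustrated with a minimal Python sketch. It is not the patented implementation, and every name in it (redundancy, order_classes, run_length_encode, build_index, organize_and_compress, the sample table) is hypothetical. Two assumptions go beyond the claim text: the "parameter" is taken to be the share of rows held by the most frequent distinct value, and the redundancy measure is that share divided by the number of distinct values. The dimension rule is modeled by letting dimension members keep their parent-child order in whatever slots the redundancy ranking assigns to them.

```python
from collections import Counter
from itertools import groupby


def redundancy(values):
    """Assumed measure: share of the most frequent value / number of distinct values."""
    counts = Counter(values)
    distinct = len(counts)                                   # number of distinct data values
    top_share = counts.most_common(1)[0][1] / len(values)    # assumed "parameter"
    return top_share / distinct                              # higher means more redundant


def order_classes(table, dimension):
    """Rank attribute classes by redundancy, keeping the dimension's own order."""
    ranked = sorted(table, key=lambda name: redundancy(table[name]), reverse=True)
    members = iter(dimension)
    # Slots won by dimension members are refilled in parent-child order.
    return [next(members) if name in dimension else name for name in ranked]


def run_length_encode(values):
    """Collapse consecutive equal values into (value, run_length) pairs."""
    return [(value, len(list(run))) for value, run in groupby(values)]


def build_index(runs):
    """Assumed index layout: value -> list of (start_row, run_length)."""
    index, start = {}, 0
    for value, length in runs:
        index.setdefault(value, []).append((start, length))
        start += length
    return index


def organize_and_compress(table, dimension):
    order = order_classes(table, dimension)
    # Organize: sort rows lexicographically by the determined class order.
    rows = sorted(zip(*(table[name] for name in order)))
    columns = {name: [row[i] for row in rows] for i, name in enumerate(order)}
    encoded = {name: run_length_encode(col) for name, col in columns.items()}
    index = {name: build_index(runs) for name, runs in encoded.items()}
    return order, encoded, index


# Sample data: "quarter" is more redundant than "year", yet the
# year -> quarter dimension order is maintained.
table = {
    "year": [2009, 2010, 2011, 2009, 2010, 2011],
    "quarter": ["Q1", "Q1", "Q1", "Q2", "Q1", "Q1"],
    "status": ["ok", "ok", "ok", "ok", "fail", "fail"],
}
order, encoded, index = organize_and_compress(table, dimension=["year", "quarter"])
```

In this sample the redundancy ranking alone would place quarter first, but the dimension rule yields the order year, status, quarter. A lookup such as index["quarter"]["Q1"] then returns the row ranges in the sorted data where that value occurs, without decoding the runs.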
