Managing large scale association sets using optimized bit map representations
First Claim
1. A method of processing a database query for sets of data comprising:
assigning a unique identifier from an integer space to each entity within data and creating one or more sets of entities each pertaining to a corresponding entity within the data;
partitioning a set of entities represented by entity identifiers into a plurality of segments, wherein content of each segment and metadata for each segment is stored in separate data objects, wherein each segment is one of an insert type to indicate association and a delete type to indicate dissociation, and a segment size is based on a request size, compression type, and run-time optimizations, and wherein the plurality of segments are chronologically ordered and used to generate content of the set of entities by merging, inserting, and deleting segments;
generating a representation on disk for each set of entities, wherein each representation encompasses and is suited for a range of the unique identifiers of entities within a corresponding set and indicates a presence of an entity within that corresponding set; and
processing a query based on the representation for each set of entities to retrieve data satisfying the query, wherein the representation provides a constant time for association and dissociation operations that are append-only operations with deferred merge and automatic filtering of deleted and duplicate entities at query time, and wherein operations are performed on the plurality of segments concurrently.
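The segment mechanism recited in the claim can be illustrated with a minimal sketch. It assumes an in-memory list stands in for the separately stored segment objects, and the names `Segment`, `AssociationSet`, `associate`, `dissociate`, and `members` are hypothetical, not taken from the patent. Association and dissociation only append a typed segment; the set content is produced at query time by replaying the segments in chronological order, which also filters deleted and duplicate identifiers. Segment sizing, compression, and concurrent processing are left out.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set, Tuple


@dataclass(frozen=True)
class Segment:
    seq: int               # chronological position of the segment
    kind: str              # "insert" (association) or "delete" (dissociation)
    ids: Tuple[int, ...]   # entity identifiers carried by this segment


class AssociationSet:
    """Hypothetical append-only set built from insert and delete segments."""

    def __init__(self) -> None:
        self.segments: List[Segment] = []

    def _append(self, kind: str, ids: Iterable[int]) -> None:
        # Writes never modify earlier segments; they only append a new one.
        self.segments.append(Segment(len(self.segments), kind, tuple(ids)))

    def associate(self, ids: Iterable[int]) -> None:
        self._append("insert", ids)

    def dissociate(self, ids: Iterable[int]) -> None:
        self._append("delete", ids)

    def members(self) -> Set[int]:
        # Deferred merge: replay segments oldest to newest at query time.
        result: Set[int] = set()
        for seg in sorted(self.segments, key=lambda s: s.seq):
            if seg.kind == "insert":
                result.update(seg.ids)             # duplicates collapse here
            else:
                result.difference_update(seg.ids)  # deleted ids drop out here
        return result


s = AssociationSet()
s.associate([1, 2, 3])
s.associate([3, 4])   # 3 is a duplicate
s.dissociate([2])
print(sorted(s.members()))
```

Because a later insert segment can reintroduce an identifier that an earlier delete segment removed, the replay order matters; this is why the sketch keeps an explicit sequence number on each segment.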
Abstract
Processing a database query for sets of data includes assigning a unique identifier from an integer space to each entity within data and creating one or more sets of entities each pertaining to a corresponding entity within the data. A representation is then generated on disk for each set of entities, wherein each representation encompasses and is suited for a range of the unique identifiers of entities within a corresponding set and indicates a presence of an entity within that corresponding set. Finally, a query is processed based on the representation for each set of entities to retrieve data satisfying the query, wherein the representation provides a constant time for association and dissociation operations that are append-only operations with deferred merge and automatic filtering of deleted and duplicate entities at query time.
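As an illustration of a representation that "encompasses and is suited for a range of the unique identifiers" and "indicates a presence of an entity", the sketch below uses a plain bitmap offset by the smallest identifier in the set. This is an assumption made for the example: the abstract does not specify the on-disk layout, and the class name `RangeBitmap`, its methods, and the 8-byte base header are hypothetical.

```python
from typing import Iterable


class RangeBitmap:
    """Hypothetical bitmap covering only the id range [base, base + span)."""

    def __init__(self, ids: Iterable[int]) -> None:
        ids = list(ids)
        if not ids:
            raise ValueError("at least one identifier is required")
        self.base = min(ids)                    # lowest id in the set
        span = max(ids) - self.base + 1         # number of ids the bitmap covers
        self.bits = bytearray((span + 7) // 8)  # one bit per id in the range
        for entity_id in ids:
            off = entity_id - self.base
            self.bits[off >> 3] |= 1 << (off & 7)

    def __contains__(self, entity_id: int) -> bool:
        off = entity_id - self.base
        if off < 0 or off >= len(self.bits) * 8:
            return False                        # outside the covered range
        return bool(self.bits[off >> 3] & (1 << (off & 7)))

    def to_bytes(self) -> bytes:
        # Assumed layout: 8-byte big-endian base followed by the raw bits.
        return self.base.to_bytes(8, "big") + bytes(self.bits)


bm = RangeBitmap([1000, 1003, 1010])
print(1003 in bm, 1001 in bm, len(bm.to_bytes()))
```

Offsetting by the base keeps the bitmap proportional to the spread of identifiers in the set rather than to the whole integer space; a sparse set with a wide spread would likely call for a different or compressed encoding.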
21 Claims
1. A method of processing a database query for sets of data, as recited in full under First Claim above.
Dependent claims: 2, 3, 4, 5, 6, 7.
8. A system for processing a database query for sets of data comprising:
a processor configured to:
assign a unique identifier from an integer space to each entity within data and create one or more sets of entities each pertaining to a corresponding entity within the data;
partition a set of entities represented by entity identifiers into a plurality of segments, wherein content of each segment and metadata for each segment is stored in separate data objects, wherein each segment is one of an insert type to indicate association and a delete type to indicate dissociation, and a segment size is based on a request size, compression type, and run-time optimizations, and wherein the plurality of segments are chronologically ordered and used to generate content of the set of entities by merging, inserting, and deleting segments;
generate a representation on disk for each set of entities, wherein each representation encompasses and is suited for a range of the unique identifiers of entities within a corresponding set and indicates a presence of an entity within that corresponding set; and
process a query based on the representation for each set of entities to retrieve data satisfying the query, wherein the representation provides a constant time for association and dissociation operations that are append-only operations with deferred merge and automatic filtering of deleted and duplicate entities at query time, and wherein operations are performed on the plurality of segments concurrently.
Dependent claims: 9, 10, 11, 12, 13, 14.
15. A computer program product for processing a database query for sets of data, the computer program product comprising a computer readable storage medium having program instructions embodied therewith, the program instructions executable by a processor to cause the processor to:
assign a unique identifier from an integer space to each entity within data and create one or more sets of entities each pertaining to a corresponding entity within the data;
partition a set of entities represented by entity identifiers into a plurality of segments, wherein content of each segment and metadata for each segment is stored in separate data objects, wherein each segment is one of an insert type to indicate association and a delete type to indicate dissociation, and a segment size is based on a request size, compression type, and run-time optimizations, and wherein the plurality of segments are chronologically ordered and used to generate content of the set of entities by merging, inserting, and deleting segments;
generate a representation on disk for each set of entities, wherein each representation encompasses and is suited for a range of the unique identifiers of entities within a corresponding set and indicates a presence of an entity within that corresponding set; and
process a query based on the representation for each set of entities to retrieve data satisfying the query, wherein the representation provides a constant time for association and dissociation operations that are append-only operations with deferred merge and automatic filtering of deleted and duplicate entities at query time, and wherein operations are performed on the plurality of segments concurrently.
Dependent claims: 16, 17, 18, 19, 20, 21.
Specification