Managing large scale association sets using optimized bit map representations

US 10,452,631 B2
Filed: 03/15/2017
Issued: 10/22/2019
Est. Priority Date: 03/15/2017
Status: Active Grant

Abstract

Processing a database query for sets of data includes assigning a unique identifier from an integer space to each entity within data and creating one or more sets of entities each pertaining to a corresponding entity within the data. A representation is then generated on disk for each set of entities, wherein each representation encompasses and is suited for a range of the unique identifiers of entities within a corresponding set and indicates a presence of an entity within that corresponding set. Finally, a query is processed based on the representation for each set of entities to retrieve data satisfying the query, wherein the representation provides a constant time for association and dissociation operations that are append-only operations with deferred merge and automatic filtering of deleted and duplicate entities at query time.
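
The append-only behavior described in the abstract can be illustrated with a minimal sketch. It assumes an in-memory list of segments and plain Python sets standing in for the on-disk bitmap representation; the names `Segment`, `AssociationSet`, `associate`, `dissociate`, `members`, and `merge` are hypothetical and do not come from the patent. Association and dissociation only append a new segment, deleted and duplicate identifiers are filtered when the set is read, and the merge is deferred until explicitly requested.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List, Set

INSERT = "insert"   # segment type indicating association
DELETE = "delete"   # segment type indicating dissociation


@dataclass(frozen=True)
class Segment:
    kind: str             # INSERT or DELETE
    ids: FrozenSet[int]   # entity identifiers carried by this segment


class AssociationSet:
    """Hypothetical append-only log of insert/delete segments (illustration only)."""

    def __init__(self) -> None:
        self.segments: List[Segment] = []   # kept in chronological order

    def associate(self, ids: Iterable[int]) -> None:
        # Append only: existing segments are never rewritten.
        self.segments.append(Segment(INSERT, frozenset(ids)))

    def dissociate(self, ids: Iterable[int]) -> None:
        # A delete is also just an appended segment.
        self.segments.append(Segment(DELETE, frozenset(ids)))

    def members(self) -> Set[int]:
        """Replay segments oldest first; deletes and duplicates are filtered here."""
        live: Set[int] = set()
        for seg in self.segments:
            if seg.kind == INSERT:
                live |= seg.ids
            else:
                live -= seg.ids
        return live

    def merge(self) -> None:
        """Deferred merge: collapse the whole log into one insert segment."""
        self.segments = [Segment(INSERT, frozenset(self.members()))]


if __name__ == "__main__":
    s = AssociationSet()
    s.associate([1, 2, 3])
    s.associate([3, 4])      # 3 is a duplicate, filtered at query time
    s.dissociate([2, 4])
    print(sorted(s.members()))
```

In this sketch the cost of a write does not depend on how many segments already exist, which is one plausible reading of the constant-time claim; the cost of reading grows with the number of unmerged segments until `merge` is run.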

21 Claims

1. A method of processing a database query for sets of data comprising:
- assigning a unique identifier from an integer space to each entity within data and creating one or more sets of entities each pertaining to a corresponding entity within the data;
  
  partitioning a set of entities represented by entity identifiers into a plurality of segments, wherein content of each segment and metadata for each segment is stored in separate data objects, wherein each segment is one of an insert type to indicate association and a delete type to indicate dissociation, and a segment size is based on a request size, compression type, and run-time optimizations, and wherein the plurality of segments are chronologically ordered and used to generate content of the set of entities by merging, inserting, and deleting segments;
  
  generating a representation on disk for each set of entities, wherein each representation encompasses and is suited for a range of the unique identifiers of entities within a corresponding set and indicates a presence of an entity within that corresponding set; and
  
  processing a query based on the representation for each set of entities to retrieve data satisfying the query, wherein the representation provides a constant time for association and dissociation operations that are append-only operations with deferred merge and automatic filtering of deleted and duplicate entities at query time, and wherein operations are performed on the plurality of segments concurrently.
- Dependent Claims (2, 3, 4, 5, 6, 7)
  - 2. The method of claim 1, wherein at least one set of entities includes entities associated with a specific entity.
  - 3. The method of claim 1, wherein at least one set of entities includes entities dissociated with a specific entity.
  - 4. The method of claim 1, wherein an entity represents one or more instances from a group of a person, a document, an event, and an object.
  - 5. The method of claim 1, wherein metadata for an entity identifier of a segment of a set of entities is stored inline with the segment as a payload.
  - 6. The method of claim 1, wherein processing the query further comprises:
    - evaluating the query for the plurality of segments and combining results from each of the evaluated segments.
  - 7. The method of claim 1, wherein a set of entities includes a multi-set containing non-unique entities with duplicate entity identifiers preserved in a physical representation of the set of entities, and wherein the query requests the set or multi-set representation.
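
Claims 6 and 7 can be read together as: evaluate the query on each segment, combine the per-segment results, and return either a set or a multi-set. The following is a rough sketch under several assumptions that are not in the claims: segments are `(kind, ids)` tuples, the query is a simple predicate on identifiers, threads provide the concurrency, and a delete segment removes every occurrence of an identifier inserted before it. The function names `evaluate_segment` and `query` are hypothetical.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence, Tuple

# Hypothetical segment shape: ("insert" | "delete", identifiers, duplicates allowed).
Segment = Tuple[str, Sequence[int]]


def evaluate_segment(segment: Segment, predicate: Callable[[int], bool]) -> Counter:
    """Evaluate the query on one segment, keeping duplicate identifiers."""
    _, ids = segment
    return Counter(entity_id for entity_id in ids if predicate(entity_id))


def query(
    segments: Sequence[Segment],
    predicate: Callable[[int], bool],
    multiset: bool = False,
    max_workers: int = 4,
) -> List[int]:
    # Segments are evaluated concurrently; pool.map returns results in input
    # order, so chronological order is preserved for the combine step.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        partials = list(pool.map(lambda seg: evaluate_segment(seg, predicate), segments))

    combined: Counter = Counter()
    for (kind, _), partial in zip(segments, partials):
        if kind == "insert":
            combined.update(partial)            # add occurrences
        else:
            for entity_id in partial:           # assumption: delete removes all
                combined.pop(entity_id, None)   # earlier occurrences

    if multiset:
        return sorted(combined.elements())      # duplicates preserved
    return sorted(combined)                     # duplicates filtered
```

The same stored segments serve both result shapes: the caller decides at query time whether duplicates are kept, which matches the idea of preserving duplicates in the physical representation.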

8. A system for processing a database query for sets of data comprising:
- a processor configured to:
  
  assign a unique identifier from an integer space to each entity within data and create one or more sets of entities each pertaining to a corresponding entity within the data;
  
  partition a set of entities represented by entity identifiers into a plurality of segments, wherein content of each segment and metadata for each segment is stored in separate data objects, wherein each segment is one of an insert type to indicate association and a delete type to indicate dissociation, and a segment size is based on a request size, compression type, and run-time optimizations, and wherein the plurality of segments are chronologically ordered and used to generate content of the set of entities by merging, inserting, and deleting segments;
  
  generate a representation on disk for each set of entities, wherein each representation encompasses and is suited for a range of the unique identifiers of entities within a corresponding set and indicates a presence of an entity within that corresponding set; and
  
  process a query based on the representation for each set of entities to retrieve data satisfying the query, wherein the representation provides a constant time for association and dissociation operations that are append-only operations with deferred merge and automatic filtering of deleted and duplicate entities at query time, and wherein operations are performed on the plurality of segments concurrently.
- Dependent Claims (9, 10, 11, 12, 13, 14)
  - 9. The system of claim 8, wherein at least one set of entities includes entities associated with a specific entity.
  - 10. The system of claim 8, wherein at least one set of entities includes entities dissociated with a specific entity.
  - 11. The system of claim 8, wherein an entity represents one or more instances from a group of a person, a document, an event, and an object.
  - 12. The system of claim 8, wherein metadata for an entity identifier of a segment of a set of entities is stored inline with the segment as a payload.
  - 13. The system of claim 8, wherein processing the query further comprises:
    - evaluating the query for the plurality of segments and combining results from each of the evaluated segments.
  - 14. The system of claim 8, wherein a set of entities includes a multi-set containing non-unique entities with duplicate entity identifiers preserved in a physical representation of the set of entities, and wherein the query requests the set or multi-set representation.
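
The independent claims recite a representation that "encompasses and is suited for a range of the unique identifiers" and "indicates a presence of an entity". One simple way to picture this is a bitmap with a base offset, so that only the identifier range actually used by a set consumes space. The sketch below is an assumption-laden illustration, not the patented format: the class name `RangeBitmap`, the byte-array layout, and the fixed `width` are all hypothetical, and no compression is applied.

```python
from typing import List, Tuple


class RangeBitmap:
    """Hypothetical bitmap covering identifiers in [base, base + width)."""

    def __init__(self, base: int, width: int) -> None:
        self.base = base
        self.width = width
        self.bits = bytearray((width + 7) // 8)   # one bit per identifier

    def _locate(self, entity_id: int) -> Tuple[int, int]:
        offset = entity_id - self.base
        if not 0 <= offset < self.width:
            raise ValueError(f"{entity_id} is outside this bitmap's range")
        return offset >> 3, 1 << (offset & 7)     # byte index, bit mask

    def add(self, entity_id: int) -> None:
        byte, mask = self._locate(entity_id)
        self.bits[byte] |= mask

    def discard(self, entity_id: int) -> None:
        byte, mask = self._locate(entity_id)
        self.bits[byte] &= ~mask & 0xFF

    def __contains__(self, entity_id: int) -> bool:
        offset = entity_id - self.base
        if not 0 <= offset < self.width:
            return False
        return bool(self.bits[offset >> 3] & (1 << (offset & 7)))

    def ids(self) -> List[int]:
        """Return the identifiers present, in ascending order."""
        return [
            self.base + i
            for i in range(self.width)
            if self.bits[i >> 3] & (1 << (i & 7))
        ]
```

With `base=1_000_000` and `width=64`, for example, the bitmap occupies 8 bytes regardless of how large the identifiers themselves are; membership tests and single-bit updates touch exactly one byte.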

15. A computer program product for processing a database query for sets of data, the computer program product comprising a computer readable storage medium having program instructions embodied therewith, the program instructions executable by a processor to cause the processor to:
- assign a unique identifier from an integer space to each entity within data and create one or more sets of entities each pertaining to a corresponding entity within the data;
  
  partition a set of entities represented by entity identifiers into a plurality of segments, wherein content of each segment and metadata for each segment is stored in separate data objects, wherein each segment is one of an insert type to indicate association and a delete type to indicate dissociation, and a segment size is based on a request size, compression type, and run-time optimizations, and wherein the plurality of segments are chronologically ordered and used to generate content of the set of entities by merging, inserting, and deleting segments;
  
  generate a representation on disk for each set of entities, wherein each representation encompasses and is suited for a range of the unique identifiers of entities within a corresponding set and indicates a presence of an entity within that corresponding set; and
  
  process a query based on the representation for each set of entities to retrieve data satisfying the query, wherein the representation provides a constant time for association and dissociation operations that are append-only operations with deferred merge and automatic filtering of deleted and duplicate entities at query time, and wherein operations are performed on the plurality of segments concurrently.
- Dependent Claims (16, 17, 18, 19, 20, 21)
  - 16. The computer program product of claim 15, wherein at least one set of entities includes entities associated with a specific entity.
  - 17. The computer program product of claim 15, wherein at least one set of entities includes entities dissociated with a specific entity.
  - 18. The computer program product of claim 15, wherein an entity represents one or more instances from a group of a person, a document, an event, and an object.
  - 19. The computer program product of claim 15, wherein metadata for an entity identifier of a segment of a set of entities is stored inline with the segment as a payload.
  - 20. The computer program product of claim 15, wherein processing the query further comprises:
    - evaluating the query for the plurality of segments and combining results from each of the evaluated segments.
  - 21. The computer program product of claim 15, wherein a set of entities includes a multi-set containing non-unique entities with duplicate entity identifiers preserved in a physical representation of the set of entities, and wherein the query requests the set or multi-set representation.

Current Assignee: International Business Machines Corporation
Original Assignee: International Business Machines Corporation
Inventors: Desai, Rajesh M.; Jayapandian, Magesh; Leong, Iun V.; Perez, Justo L.; Raphael, Roger C.; Valencia, Gabriel
Primary Examiner(s): Trujillo, James
Assistant Examiner(s): Curran, J Mitchell
Application Number: US15/459,372
Publication Number: US 20180268009A1
Time in Patent Office: 951 Days
CPC Class Codes:
G06F 16/2237 Vectors, bitmaps or matrices
G06F 16/2455 Query execution
