Method and system for inserting data records into files
Abstract
Methods and systems for adding a data record to a file comprise maintaining a data structure that tracks the locations of data records within the file, wherein each entry in the data structure is a Bloom filter corresponding to a different portion of the file that contains a plurality of data records. When an instruction to write data to the file is received, a data record is generated for appending to the file, wherein the data record comprises a randomized unique id and the data from the received instruction. A Bloom filter bit pattern is extracted from the randomized unique id, and the bits in a current Bloom filter entry of the data structure that occupy the same bit positions as the Bloom filter bit pattern are set. The data record is then appended to the file.
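The write path in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation, and it rests on several assumptions. Each Bloom filter entry is a Python integer used as a bitmask. The constants `SECTION_BITS`, `NUM_PROBES` and `RECORDS_PER_SEGMENT`, the function `bit_pattern` and the class `RecordFile` are all hypothetical. The abstract says only that the bit pattern is "extracted" from the id. This sketch assumes that, because the id is already random, consecutive slices of its low-order bits can serve directly as bit positions, with no separate hash function.

```python
import uuid

SECTION_BITS = 1024      # hypothetical size m of each Bloom filter entry (power of two)
NUM_PROBES = 4           # hypothetical number k of bit positions per record
RECORDS_PER_SEGMENT = 3  # hypothetical number of records per file portion

def bit_pattern(uid: int, m: int = SECTION_BITS, k: int = NUM_PROBES) -> int:
    """Extract a Bloom filter bit pattern directly from a randomized id.

    Assumption: slice k consecutive log2(m)-bit fields from the low end of
    the id. The low 48 bits of a version-4 UUID are random, so the 40 bits
    used here with the default m and k are uniformly distributed.
    """
    width = m.bit_length() - 1  # log2(m) bits per position
    pattern = 0
    for i in range(k):
        pos = (uid >> (i * width)) & (m - 1)
        pattern |= 1 << pos
    return pattern

class RecordFile:
    """Hypothetical in-memory stand-in for the file and its Bloom filter entries."""

    def __init__(self) -> None:
        self.records: list[tuple[int, bytes]] = []  # appended (id, data) records
        self.filters: list[int] = [0]               # one Bloom filter per portion

    def write(self, data: bytes) -> int:
        # Open a new Bloom filter entry once the current portion is full.
        if self.records and len(self.records) % RECORDS_PER_SEGMENT == 0:
            self.filters.append(0)
        uid = uuid.uuid4().int                 # randomized unique id (128 bits)
        self.records.append((uid, data))       # append the data record
        self.filters[-1] |= bit_pattern(uid)   # set the pattern's bits in the current entry
        return uid
```

Because a filter entry covers a whole portion of the file, setting bits is a single OR into the current entry. Starting a new portion only requires appending a fresh zeroed entry.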
51 Citations
28 Claims
1. A method for locating a data record in a log file organized as a plurality of log segments, wherein each log segment comprises a plurality of data records and data records are indexed in the log file according to randomized unique identifiers (ids), the method comprising:
providing a Bloom filter data structure for the plurality of log segments, wherein the Bloom filter data structure includes a section in a plurality of sections for each of the plurality of log segments;
setting bits in respective sections of the Bloom filter data structure for the plurality of log segments that correspond to the values for randomized unique ids associated with data records stored in the plurality of log segments, where the set bits for each log segment in the plurality of log segments form a log segment Bloom filter bit pattern in the respective section for each log segment;
receiving a randomized unique identifier (id) associated with the data record;
generating a data record Bloom filter bit pattern of a plurality of bits from the randomized unique id;
searching the plurality of sections of the Bloom filter data structure for the plurality of log segments to find a section that includes a stored bit pattern of a plurality of bits in the log segment Bloom filter pattern that has bits set in the same bit positions as bits in the plurality of bits of the data record Bloom filter bit pattern;
searching for the data record in the log segment of the log file associated with the log segment Bloom filter pattern; and
outputting the data record if the data record is found at the log segment.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10.
11. A non-transitory computer-readable storage medium including instructions that, when executed by a processing unit of a computer system, cause the processing unit to locate a data record in a log file organized as a plurality of log segments, wherein each log segment comprises a plurality of data records and data records are indexed in the log file according to randomized unique identifiers (ids), by performing the steps of:
providing a Bloom filter data structure for the plurality of log segments, wherein the Bloom filter data structure includes a section in a plurality of sections for each of the plurality of log segments;
setting bits in respective sections of the Bloom filter data structure for the plurality of log segments that correspond to the values for randomized unique ids associated with data records stored in the plurality of log segments, where the set bits for each log segment in the plurality of log segments form a log segment Bloom filter bit pattern in the respective section for each log segment;
receiving a randomized unique identifier (id) associated with the data record;
generating a data record Bloom filter bit pattern of a plurality of bits from the randomized unique id;
searching the plurality of sections of the Bloom filter data structure for the plurality of log segments to find a section that includes a stored bit pattern of a plurality of bits in the log segment Bloom filter pattern that has bits set in the same bit positions as bits in the plurality of bits of the data record Bloom filter bit pattern;
searching for the data record in the log segment of the log file associated with the log segment Bloom filter pattern; and
outputting the data record if the data record is found at the log segment.
Dependent claims: 12, 13, 14, 15, 16, 17, 18, 19, 20, 24.
21. A computer system configured to locate a data record in a log file organized as a plurality of log segments, wherein each log segment comprises a plurality of data records and data records are indexed in the log file according to randomized unique identifiers (ids), the computer system comprising:
a computer processor; and
a non-transitory computer-readable storage medium including instructions that, when executed by the computer processor, cause the computer processor to perform the steps of:
providing a Bloom filter data structure for the plurality of log segments, wherein the Bloom filter data structure includes a section in a plurality of sections for each of the plurality of log segments;
setting bits in respective sections of the Bloom filter data structure for the plurality of log segments that correspond to the values for randomized unique ids associated with data records stored in the plurality of log segments, where the set bits for each log segment in the plurality of log segments form a log segment Bloom filter bit pattern in the respective section for each log segment;
receiving a randomized unique identifier (id) associated with the data record;
generating a data record Bloom filter bit pattern of a plurality of bits from the randomized unique id;
searching the plurality of sections of the Bloom filter data structure for the plurality of log segments to find a section that includes a stored bit pattern of a plurality of bits in the log segment Bloom filter pattern that has bits set in the same bit positions as bits in the plurality of bits of the data record Bloom filter bit pattern;
searching for the data record in the log segment of the log file associated with the log segment Bloom filter pattern; and
outputting the data record if the data record is found at the log segment.
Dependent claims: 22, 23, 25, 26, 27, 28.
Specification