System and method for unorchestrated determination of data sequences using sticky factoring to determine breakpoints in digital sequences
First Claim
1. A method for partitioning a digital sequence comprising:
- performing a hash function on at least a portion of said digital sequence;
monitoring hash values produced by said hash function for a first predetermined numeric pattern found in a range of numeric values; and
marking a breakpoint in said digital sequence when said first predetermined numeric pattern occurs;
wherein said step of performing said hash function comprises a rolling hash function adapted to scan portions of said digital sequence and to progressively reduce a contribution of more distant bits in said digital sequence.
10 Assignments
0 Petitions
Accused Products
Abstract
A system and method for unorchestrated determination of data sequences using “sticky byte” factoring to determine breakpoints in digital sequences such that common sequences can be identified. Sticky byte factoring provides an efficient method of dividing a data set into pieces that generally yields near optimal commonality. This is effectuated by employing a rolling hashsum and, in an exemplary embodiment disclosed herein, a threshold function to deterministically set divisions in a sequence of data. Both the rolling hash and the threshold function are designed to require minimal computation. This low overhead makes it possible to rapidly partition a data sequence for presentation to a factoring engine or other applications that prefer subsequent synchronization across the data set.
-
Citations
61 Claims
-
1. A method for partitioning a digital sequence comprising:
-
performing a hash function on at least a portion of said digital sequence;
monitoring hash values produced by said hash function for a first predetermined numeric pattern found in a range of numeric values; and
marking a breakpoint in said digital sequence when said first predetermined numeric pattern occurs;
wherein said step of performing said hash function comprises a rolling hash function adapted to scan portions of said digital sequence and to progressively reduce a contribution of more distant bits in said digital sequence. - View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11)
-
-
12. A method for determining a first breakpoint in a first digital sequence comprising:
-
determining a subset group of said first digital sequence;
performing a hash function on said subset group of said first digital sequence beginning at a starting position in said first digital sequence until a first predetermined numeric pattern, which is found in a range of numeric values, in said hash value is obtained; and
marking said first breakpoint when said first predetermined numeric pattern in said hash value is obtained. - View Dependent Claims (13, 14, 15, 16, 17, 18)
-
-
19. A computer program product comprising:
-
a computer usable medium having computer readable code embodied therein for determining a first breakpoint in a first digital sequence comprising;
computer readable program code devices configured to cause a computer to effect determining a subset group of said first digital sequence;
computer readable program code devices configured to cause a computer to effect performing a hash function on said subset group of said first digital sequence beginning at a starting position in said first digital sequence until a first predetermined numeric pattern in said hash value is obtained;
computer readable program code devices configured to cause a computer to effect marking said first breakpoint when said first predetermined numeric pattern in said hash value is obtained;
computer readable program code devices configured to cause a computer to effect performing a hash function on said subset group beginning at a starting position in a second digital sequence until said first predetermined numeric pattern in said hash value is obtained;
computer readable program code devices configured to cause a computer to effect marking a second breakpoint in said second digital sequence when said first predetermined numeric pattern in said hash value is obtained; and
computer readable program code devices configured to cause a computer to effect comparing said predetermined hash value at said first breakpoint with that at said second breakpoint. - View Dependent Claims (20, 21, 22, 23, 24)
-
-
25. A method for determining a first breakpoint in a first digital sequence comprising:
-
determining a subset group of said first digital sequence;
performing a hash function on said subset group of said first digital sequence beginning at a starting position in said first digital sequence until a first predetermined numeric pattern is found in the hash values from the said hash function is obtained, wherein the hash function is a 32-bit rolling hash function and the subset group is a 32-bit pattern derived directly or indirectly from the said first digital sequence, and the performing comprises shifting the 32-bit pattern over one bit, reading a character from the first digital sequence and deriving directly or indirectly another 32-bit pattern, and repeating the shifting and the reading, until there are 32-bits in sequence that can be hashed together to provide one said hash value;
marking said first breakpoint when said first predetermined numeric pattern in said hash value is obtained;
further performing a hash function on a subset group of said first digital sequence from said first breakpoint until said first predetermined numeric pattern in said hash value is again obtained; and
marking another breakpoint in said first digital sequence when said first predetermined numeric pattern in said hash value is again obtained. - View Dependent Claims (26, 27)
-
-
28. A computer program product comprising:
-
a computer usable medium having computer readable code embodied therein for determining a first breakpoint in a first digital sequence comprising;
computer readable program code devices configured to cause a computer to effect determining a subset group of said first digital sequence;
computer readable program code devices configured to cause a computer to effect performing a hash function on said subset group of said first digital sequence beginning at a starting position in said first digital sequence until a first predetermined numeric pattern in said hash value is obtained, wherein the hash function is a 32-bit rolling hash function and the subset group is a 32-bit pattern, and the performing comprises shifting the 32-bit pattern over one bit, reading a character from the first digital sequence, and repeating the shifting and the reading; and
computer readable program code devices configured to cause a computer to effect marking said first breakpoint when said first predetermined numeric pattern in said hash value is obtained. - View Dependent Claims (29, 30, 31, 32)
-
-
33. A method for partitioning a digital sequence comprising:
-
performing a hash function on at least a portion of a digital sequence;
monitoring binary sequences produced by said hash function for a predetermined pattern of bits; and
marking a breakpoint in said digital sequence when said predetermined pattern occurs. - View Dependent Claims (34, 35, 36)
-
-
37. A computer program product comprising:
-
computer readable program code devices configured to cause a computer to effect performing a hash function on at least a portion of a digital sequence;
computer readable program code devices configured to cause a computer to effect monitoring binary sequences produced by said hash function for a predetermined pattern of bits; and
computer readable program code devices configured to cause a computer to effect marking a breakpoint in said digital sequence when said predetermined pattern occurs.
-
-
38. A method for partitioning a digital sequence comprising:
-
performing an indirect hash function on at least a portion of a digital sequence;
monitoring binary sequences produced by the indirect hash function for a first predetermined pattern of bits; and
marking a breakpoint in the digital sequence when the first predetermined pattern is identified as occurring during the monitoring. - View Dependent Claims (39, 40, 41, 42, 43)
-
-
44. A computer program product comprising:
-
computer readable program code devices configured to cause a computer to effect performing an indirect hash function on at least a portion of a digital sequence;
computer readable program code devices configured to cause a computer to effect monitoring binary sequences produced by the indirect hash function for a first predetermined pattern of bits; and
computer readable program code devices configured to cause a computer to effect marking a breakpoint in the digital sequence when the first predetermined pattern is identified as occurring during the monitoring. - View Dependent Claims (45, 46, 47, 48, 49)
-
-
50. A method for partitioning a digital sequence comprising:
-
performing an indirect hash function on at least a portion of a digital sequence;
monitoring hash values produced by the indirect hash function for a numeric pattern selected from a range of numeric values; and
marking a breakpoint in the digital sequence when the numeric pattern occurs according to the monitoring.
-
-
51. A computer program product comprising:
-
computer readable program code devices configured to cause a computer to effect performing an indirect hash function on at least a portion of a digital sequence;
computer readable program code devices configured to cause a computer to effect monitoring hash values produced by the indirect hash function for a numeric pattern selected from a range of numeric values; and
computer readable program code devices configured to cause a computer to effect marking a breakpoint in the digital sequence when the numeric pattern occurs according to the monitoring.
-
-
52. A method for partitioning a digital sequence comprising:
-
performing an indirect function on at least a portion of said digital sequence, the performing comprising;
indexing bytes from the digital sequence into alternative pre-determined bit sequences;
performing a hash function on the pre-determined bit sequences;
monitoring the bit sequences produced by the said hash function for a first predetermined bit pattern; and
marking a breakpoint in said digital sequence when said first predetermined bit pattern occurs. - View Dependent Claims (53, 54)
-
-
55. A computer program product comprising:
-
computer readable program code devices configured to cause a computer to effect performing an indirect function on at least a portion of said digital sequence, the computer readable program code devices comprising additional computer readable program code devices configured to cause a computer to effect;
indexing bytes from the digital sequence into alternative pre-determined bit sequences;
performing a hash function on the pre-determined bit sequences;
monitoring the bit sequences produced by the said hash function for a first predetermined bit pattern; and
marking a breakpoint in said digital sequence when said first predetermined bit pattern occurs.
-
-
56. A method for partitioning a digital sequence comprising:
-
processing at least a portion of a digital sequence to produce a transformation;
performing a hash function on the transformation;
monitoring binary sequences produced by the hash function for a pattern of bits; and
marking a breakpoint in the digital sequence when said first predetermined pattern occurs as determined by the monitoring. - View Dependent Claims (57, 58)
-
-
59. A computer program product comprising:
-
computer readable program code devices configured to cause a computer to effect processing at least a portion of a digital sequence to produce a transformation;
computer readable program code devices configured to cause a computer to effect performing a hash function on the transformation;
computer readable program code devices configured to cause a computer to effect monitoring binary sequences produced by the hash function for a pattern of bits; and
computer readable program code devices configured to cause a computer to effect marking a breakpoint in the digital sequence when said first predetermined pattern occurs as determined by the monitoring. - View Dependent Claims (60, 61)
-
Specification