Native data signatures in a file system
First Claim
1. In a file system of a computer operating system, a method for comparing data streams, comprising the steps of:
- generating, for a first data stream, a first data signature including a plurality of first unique change identifiers, said first data stream being separated into a plurality of first allocation units, with each of said first unique change identifiers being associated with a different one of said first allocation units;
generating, for a second data stream, a second data signature including a plurality of second unique change identifiers, said second data stream being separated into a plurality of second allocation units, with each of said second unique change identifiers being associated with a different one of said second allocation units; and
comparing said first data signature and said second data signature to determine whether said first data stream and said second data stream are the same.
2 Assignments
0 Petitions
Accused Products
Abstract
The present invention provides reliable systems and methods for rapidly determining whether file data streams are the same, or different, without having to make a comparison between the actual data streams. If the determination is made that the data streams are different, the present invention can rapidly determine where the changes between the data streams occur, again, without having to actually compare the entire size of the data streams. Such methods and systems are accomplished by generating a native data signature (NDS) for each data stream stored by the file system, wherein each NDS is substantially smaller in size as compared to the data stream corresponding thereto. The NDS is generated by separating the data stream into a plurality of allocation units, ranging in size from 4K bytes to 256K bytes; by generating a unique change identifier (UCI), one per each allocation unit; and by assembling the UCIs into a series of bits, typically ranging up to, and including, 64 in number. Thus, even with a UCI having 64 bits, time efficiency for comparing data streams by comparing between NDSs is improved on the order of 500:1 or 32,000:1, depending upon the size of the allocation unit.
-
Citations
23 Claims
-
1. In a file system of a computer operating system, a method for comparing data streams, comprising the steps of:
-
generating, for a first data stream, a first data signature including a plurality of first unique change identifiers, said first data stream being separated into a plurality of first allocation units, with each of said first unique change identifiers being associated with a different one of said first allocation units; generating, for a second data stream, a second data signature including a plurality of second unique change identifiers, said second data stream being separated into a plurality of second allocation units, with each of said second unique change identifiers being associated with a different one of said second allocation units; and comparing said first data signature and said second data signature to determine whether said first data stream and said second data stream are the same. - View Dependent Claims (2, 3, 4, 5, 6, 7, 8)
-
-
9. A method for comparing data streams in a file system of a computer operating system, comprising:
-
generating a first signature for a first data stream by; (i) separating the first data stream into a plurality of first allocation units; (ii) tagging each of the first allocation units with a first unique change identifier; and (iii) assembling each of the first unique change identifiers into the first signature; generating a second signature for a second data stream, the second data signature including a plurality of second unique change identifiers, the second data stream being separated into a plurality of second allocation units, with each of the second unique change identifiers being associated with a different one of the second allocation units; and comparing the first and second signatures to determine whether the first and second data streams are identical, including the step of comparing the first unique change identifiers and the second unique change identifiers. - View Dependent Claims (10, 11, 12, 13, 14)
-
-
15. A computer readable medium having computer executable instructions, said computer-executable instructions comprising:
-
means for separating said first data stream into a plurality of first allocation units; means for generating a plurality of first unique change identifiers, one first unique change identifier per first allocation unit; means for assembling said plurality of first unique change identifiers into a first signature corresponding to said first data stream, said first signature including said plurality of first unique change identifiers; means for generating a second signature corresponding to a second data stream, said second signature including a plurality of second change identifiers, each of said second change identifiers being associated with a different one of a plurality of second allocation units into which said second data stream is separated; and means for comparing said first signature and said second signature in order to determine whether said first and second data streams are the same. - View Dependent Claims (16, 17, 18)
-
-
19. A computer program product comprising computer-readable medium having stored thereon computer-executable instructions, said computer-executable instructions including:
-
means for generating; for a first data stream, a first data signature including a plurality of first unique change identifiers, said first data stream being separated into a plurality of first allocation units, with each of said first unique change identifiers being associated with a different one of said first allocation units; and for a second data stream, a second data signature including a plurality of second unique change identifiers, said second data stream being separated into a plurality of second allocation units, with each of said second unique change identifiers being associated with a different one of said second allocation units; and means for comparing said first data signature and said second data signature to determine whether said first data stream and said second data stream are the same. - View Dependent Claims (20, 21, 22, 23)
-
Specification