Techniques for global single instance segment-based indexing for backup data
Abstract
Techniques for global single instance segment-based indexing are disclosed. In one particular exemplary embodiment, the techniques may be realized as a method for global single instance segment-based indexing for backup data. The method may comprise dividing an item being backed up into segments, generating a fingerprint for each segment, and saving an entry for each segment in an index database. Each entry may comprise the fingerprint for the segment.
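The abstract's three steps (divide an item into segments, fingerprint each segment, save one index entry per segment) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the patented implementation. Blank-line-separated paragraphs stand in for segments, SHA-256 stands in for the fingerprint function, and a plain dictionary keyed by fingerprint stands in for the index database. The names `divide_into_segments`, `fingerprint`, and `index_item` are hypothetical.

```python
import hashlib


def divide_into_segments(text: str) -> list[str]:
    # Assumption: blank-line-separated paragraphs serve as segments.
    return [p.strip() for p in text.split("\n\n") if p.strip()]


def fingerprint(segment: str) -> str:
    # Assumption: SHA-256 of the UTF-8 bytes serves as the fingerprint.
    return hashlib.sha256(segment.encode("utf-8")).hexdigest()


def index_item(index: dict[str, list[str]], name: str, text: str) -> None:
    # Keying the index by fingerprint stores each distinct segment once
    # ("single instance"); the value records which items contain it.
    for segment in divide_into_segments(text):
        index.setdefault(fingerprint(segment), []).append(name)


index: dict[str, list[str]] = {}
index_item(index, "a.txt", "Quarterly report.\n\nShared disclaimer.")
index_item(index, "b.txt", "Shared disclaimer.\n\nMeeting notes.")
print(len(index))  # 3 distinct segments, not 4
```

Because the shared paragraph hashes to the same fingerprint in both items, it occupies a single entry that lists both item names.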
17 Claims
1. A method for performing global single instance segment-based indexing for backup data comprising:
parsing a non-executable data item being backed up to detect one or more syntactic boundaries within the non-executable data item being backed up between adjacent sections;
dividing, using at least one computer processor, the non-executable data item being backed up into segments using the one or more detected syntactic boundaries, wherein dividing the non-executable data item being backed up into segments comprises padding at least one of the segments to make the at least one of the segments syntactically correct by completing the at least one of the segments according to a type of file to be indexed, and wherein the at least one of the segments comprises at least one of: an XML node, a sentence, a paragraph, and a page, wherein segmentation is performed for a plurality of different types of syntactical boundaries including paragraphs and at least one of: an XML node, a sentence, and a page, and wherein padding is based at least in part on a format, received from an index engine, for a type of file to be indexed;
generating a fingerprint for each segment; and
saving an entry for each segment in an index database, wherein each entry comprises a resource list and the fingerprint for the segment, the resource list comprising a resource name and a reference count, wherein the reference count is configured to allow counting of a plurality of references to the resource name.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 14.
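One possible reading of the segmentation and padding steps of claim 1 is sketched below in Python. It is illustrative only. The entries in `PADDING_FORMATS` are hypothetical stand-ins for the per-file-type format the claim says is received from an index engine; the regular-expression paragraph and sentence boundaries are deliberately naive; and "padding" is modeled as wrapping each XML node in the original document's root element so that every segment parses as a standalone document of the same type. SHA-256 stands in for the fingerprint, and `segment_item` and its helpers are hypothetical names.

```python
import hashlib
import re
import xml.etree.ElementTree as ET

# Hypothetical per-file-type padding formats, standing in for the format an
# index engine would supply. "{body}" is the segment; "{root}" is the root tag
# of the original document (XML namespaces are not handled in this sketch).
PADDING_FORMATS = {
    "xml": "<{root}>{body}</{root}>",
    "text": "{body}",
}


def paragraphs(text: str) -> list[str]:
    # Paragraph boundary: one or more blank lines.
    return [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]


def sentences(paragraph: str) -> list[str]:
    # Naive sentence boundary: whitespace following ".", "!" or "?".
    return [s for s in re.split(r"(?<=[.!?])\s+", paragraph) if s]


def xml_nodes(root: ET.Element) -> list[str]:
    # Node boundary: each direct child of the document root.
    nodes = []
    for child in root:
        child.tail = None  # keep inter-node whitespace out of the segment
        nodes.append(ET.tostring(child, encoding="unicode"))
    return nodes


def segment_item(data: str, file_type: str) -> list[tuple[str, str]]:
    """Split an item at syntactic boundaries, pad each segment, and
    return (padded segment, SHA-256 fingerprint) pairs."""
    fmt = PADDING_FORMATS[file_type]
    if file_type == "xml":
        root = ET.fromstring(data)
        raw, root_tag = xml_nodes(root), root.tag
    else:
        # Two boundary types: paragraphs, then sentences inside each one.
        raw = [s for p in paragraphs(data) for s in sentences(p)]
        root_tag = ""
    result = []
    for seg in raw:
        padded = fmt.format(root=root_tag, body=seg)
        digest = hashlib.sha256(padded.encode("utf-8")).hexdigest()
        result.append((padded, digest))
    return result
```

For plain text the sketch applies two boundary types (paragraphs, then sentences), in line with the claim's requirement that paragraphs be one of several boundary types used. For XML, each padded node such as `<catalog><item>A</item></catalog>` can be handed to an XML-aware indexer on its own.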
11. A system for performing global single instance segment-based indexing for backup data comprising:
a boundary detector module for parsing a non-executable data item being backed up to detect one or more syntactic boundaries within the non-executable data item being backed up between adjacent sections, the boundary detector module further configured to divide the non-executable data item being backed up into segments using the one or more detected syntactic boundaries and to generate a fingerprint for each segment, wherein dividing the non-executable data item being backed up into segments comprises padding at least one of the segments to make the at least one of the segments syntactically correct by completing the at least one of the segments according to a type of file to be indexed, and wherein the at least one of the segments comprises at least one of: an XML node, a sentence, a paragraph, and a page, wherein segmentation is performed for a plurality of different types of syntactical boundaries including paragraphs and at least one of: an XML node, a sentence, and a page, and wherein padding is based at least in part on a format, received from an index engine, for a type of file to be indexed; and
electronic storage for saving an entry for each segment in an index database, wherein each entry comprises a resource list and the fingerprint for the segment, the resource list comprising a resource name and a reference count, wherein the reference count is configured to allow counting of a plurality of references to the resource name.
Dependent claims: 12, 13.
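Claim 11 splits the work between a boundary detector module and electronic storage holding the index database. The sketch below models that split with two hypothetical classes, `BoundaryDetector` and `IndexDatabase`, and uses Python's built-in `sqlite3` module as the index database. The schema is an assumption chosen to mirror the claim's wording: a `segment` table holds one row per unique fingerprint, and a `resource` table holds the resource list (resource name plus reference count) for each fingerprint. Segmentation is reduced to paragraphs so the focus stays on the stored entries.

```python
import hashlib
import sqlite3


class BoundaryDetector:
    """Hypothetical boundary detector module: splits text at paragraph
    boundaries and fingerprints each segment with SHA-256."""

    def segments(self, text: str) -> list[str]:
        return [p.strip() for p in text.split("\n\n") if p.strip()]

    def fingerprint(self, segment: str) -> str:
        return hashlib.sha256(segment.encode("utf-8")).hexdigest()


class IndexDatabase:
    """Hypothetical electronic storage: one row per unique fingerprint plus
    a resource list of (resource name, reference count) per fingerprint."""

    def __init__(self, path: str = ":memory:") -> None:
        self.conn = sqlite3.connect(path)
        self.conn.executescript(
            """
            CREATE TABLE IF NOT EXISTS segment (fingerprint TEXT PRIMARY KEY);
            CREATE TABLE IF NOT EXISTS resource (
                fingerprint TEXT NOT NULL REFERENCES segment(fingerprint),
                name TEXT NOT NULL,
                ref_count INTEGER NOT NULL DEFAULT 0,
                PRIMARY KEY (fingerprint, name)
            );
            """
        )

    def save(self, fingerprint: str, resource_name: str) -> None:
        with self.conn:  # commit the three statements as one transaction
            # Single instance: the fingerprint row is created only once.
            self.conn.execute(
                "INSERT OR IGNORE INTO segment VALUES (?)", (fingerprint,))
            # Add the resource name to the list if absent, then count it.
            self.conn.execute(
                "INSERT OR IGNORE INTO resource (fingerprint, name) VALUES (?, ?)",
                (fingerprint, resource_name))
            self.conn.execute(
                "UPDATE resource SET ref_count = ref_count + 1 "
                "WHERE fingerprint = ? AND name = ?",
                (fingerprint, resource_name))

    def resource_list(self, fingerprint: str) -> list[tuple[str, int]]:
        rows = self.conn.execute(
            "SELECT name, ref_count FROM resource "
            "WHERE fingerprint = ? ORDER BY name",
            (fingerprint,))
        return rows.fetchall()


def back_up(detector: BoundaryDetector, db: IndexDatabase,
            name: str, text: str) -> None:
    # Segment and fingerprint the item, then save one entry per segment.
    for seg in detector.segments(text):
        db.save(detector.fingerprint(seg), name)
```

Keeping the resource list in its own table lets a single fingerprint row be shared by any number of backed-up items, while the per-name reference count records repeated occurrences of the same segment within one item.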
15. A non-transitory article of manufacture for performing global single instance segment-based indexing for backup data, the article of manufacture comprising:
at least one non-transitory processor readable medium; and
instructions stored on the at least one medium;
wherein the instructions are configured to be readable from the at least one medium by at least one processor and thereby cause the at least one processor to operate so as to:
parse a non-executable data item being backed up to detect one or more syntactic boundaries within the non-executable data item being backed up between adjacent sections;
divide the non-executable data item being backed up into segments using the one or more detected syntactic boundaries, wherein dividing the non-executable data item being backed up into segments comprises padding at least one of the segments to make the at least one of the segments syntactically correct by completing the at least one of the segments according to a type of file to be indexed, and wherein the at least one of the segments comprises at least one of: an XML node, a sentence, a paragraph, and a page, wherein segmentation is performed for a plurality of different types of syntactical boundaries including paragraphs and at least one of: an XML node, a sentence, and a page, and wherein padding is based at least in part on a format, received from an index engine, for a type of file to be indexed;
generate a fingerprint for each segment; and
save an entry for each segment in an index database, wherein each entry comprises a resource list and the fingerprint for the segment, the resource list comprising a resource name and a reference count, wherein the reference count is configured to allow counting of a plurality of references to the resource name.
Dependent claims: 16, 17.
Specification