System and method for chunk-based indexing of file system content
9 Assignments
Abstract
A system and method for chunk-based indexing of file system content. In one embodiment, the system may include a storage device configured to store data and a file system configured to manage access to the storage device and to store file system content including a plurality of files. The system may further include a search engine configured to construct an index of the file system content. The file system may be further configured to partition a given one of the plurality of files into a plurality of logical chunks, and constructing an index may include generating respective index information associated with each of the plurality of logical chunks.
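The per-chunk indexing described in the abstract can be illustrated with a minimal Python sketch. It assumes fixed-size chunks and uses the set of lowercase alphanumeric terms in each chunk as that chunk's "index information". The chunk size, the tokenizer, and the function names `partition`, `chunk_terms`, `build_index`, and `chunks_containing` are hypothetical illustrations, not the patented implementation.

```python
import re


def partition(data: bytes, chunk_size: int) -> list[bytes]:
    """Split file content into fixed-size logical chunks; the last one may be shorter."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


def chunk_terms(chunk: bytes) -> set[str]:
    """Index information for one chunk: the lowercase alphanumeric terms it contains."""
    return {t.decode("ascii").lower() for t in re.findall(rb"[A-Za-z0-9]+", chunk)}


def build_index(data: bytes, chunk_size: int) -> list[set[str]]:
    """Per-chunk index: entry i describes only chunk i, so index boundaries match chunk boundaries."""
    return [chunk_terms(c) for c in partition(data, chunk_size)]


def chunks_containing(index: list[set[str]], term: str) -> list[int]:
    """Return the numbers of the chunks whose index information contains the term."""
    key = term.lower()
    return [i for i, terms in enumerate(index) if key in terms]


index = build_index(b"alpha beta gamma delta", 11)
print(index)                             # one term set per 11-byte chunk
print(chunks_containing(index, "Gamma"))
```

Because entry i of the index describes only chunk i, a lookup reports which chunk of the file holds a match, not merely which file. Fixed-size cuts can also split a token across two chunks (with 3-byte chunks, `alpha` becomes `alp` and `ha`), which is one reason to move chunk boundaries onto record edges as the claims describe.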
95 Citations
24 Claims
1. A system, comprising:
a storage device configured to store data; and
a file system configured to manage access to said storage device and to store file system content including a plurality of files to said storage device; and
a search engine configured to construct an index of said file system content;
wherein said file system is further configured to partition a given one of said plurality of files into a plurality of logical chunks, wherein given ones of said logical chunks include structured data records formatted according to a self-describing data format, wherein each of said structured data records includes one or more data elements delimited by respective tag fields, wherein said tag fields are defined according to said self-describing data format;
wherein to partition said given file, said file system is further configured to adjust a chunk boundary between two adjacent given ones of said logical chunks such that said chunk boundary falls between boundaries of said structured data records;
wherein to construct said index, said search engine is further configured to generate respective index information associated with each of said plurality of logical chunks, such that boundaries of said respective index information correspond to boundaries of said logical chunks;
wherein for each given one of said plurality of logical chunks, said respective index information is indicative of one or more data patterns occurring within said given logical chunk of said given file; and
wherein in response to detecting an operation to modify said given file, said file system is further configured to identify one or more modified logical chunks of said given file, and wherein said search engine is further configured to regenerate respective index information associated with each of said one or more modified logical chunks without regenerating respective index information for one or more logical chunks of said given file that are unmodified by said operation.
Dependent claims (2, 3, 4, 5, 6, 19, 20)
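The boundary adjustment recited in claim 1 can be sketched for an XML-like self-describing format. This minimal sketch assumes records are non-nested elements closed by a tag such as a hypothetical `</record>`, and it moves each nominal cut (every `chunk_size` bytes) forward to the next record end. The function names and this forward-snapping policy are illustrative assumptions, not the claimed implementation; a real system might instead snap back to the previous record end or use a proper parser for the format.

```python
import re


def record_ends(data: bytes, tag: bytes = b"record") -> list[int]:
    """Byte offsets just past each closing </tag>; these are the legal chunk boundaries."""
    closing = rb"</" + re.escape(tag) + rb">"
    return [m.end() for m in re.finditer(closing, data)]


def partition_on_records(data: bytes, chunk_size: int, tag: bytes = b"record") -> list[bytes]:
    """Cut roughly every chunk_size bytes, pushing each cut forward to the next record end."""
    ends = record_ends(data, tag)
    chunks, start = [], 0
    while start < len(data):
        target = start + chunk_size
        # First record end at or past the nominal boundary; fall back to the end of the data.
        cut = next((e for e in ends if e >= target), len(data))
        chunks.append(data[start:cut])
        start = cut
    return chunks


data = b"<r>aaaa</r><r>bbbb</r><r>cccc</r>"
print(partition_on_records(data, 8, tag=b"r"))   # each cut lands after a closing </r>
print(partition_on_records(data, 15, tag=b"r"))
```

Pushing the cut forward keeps every record whole, at the cost of chunks somewhat larger than `chunk_size`. No chunk is ever empty, because each cut lies strictly past the previous one.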
7. A computer implemented method, comprising:
a file system storing file system content including a plurality of files to a storage device, wherein said file system is configured to manage access to said storage device;
said file system partitioning a given one of said plurality of files into a plurality of logical chunks, wherein given ones of said logical chunks include structured data records formatted according to a self-describing data format, wherein each of said structured data records includes one or more data elements delimited by respective tag fields, wherein said tag fields are defined according to said self-describing data format;
wherein said file system partitioning said given file comprises said file system adjusting a chunk boundary between two adjacent given ones of said logical chunks such that said chunk boundary falls between boundaries of said structured data records;
a search engine constructing an index of said file system content, wherein said constructing includes generating respective index information associated with each of said plurality of logical chunks such that boundaries of said respective index information correspond to boundaries of said logical chunks, and wherein for each given one of said plurality of logical chunks, said respective index information is indicative of one or more data patterns occurring within said given logical chunk of said given file;
in response to detecting an operation to modify said given file, said file system identifying one or more modified logical chunks of said given file; and
said search engine regenerating respective index information associated with each of said one or more modified logical chunks without regenerating respective index information for one or more logical chunks of said given file that are unmodified by said operation.
Dependent claims (8, 9, 10, 11, 12, 21, 22)
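The incremental re-indexing step of claim 7 can be sketched by having the file object record which chunks each write touches. In this minimal sketch the `ChunkedFile` class, its `write` and `reindex` methods, and the term-set index are hypothetical. It handles only in-place overwrites and writes that extend the file; an insertion that shifts later bytes would also change every following fixed-size chunk and is outside this sketch.

```python
import re


def chunk_terms(chunk: bytes) -> set[str]:
    """Index information for one chunk: the lowercase alphanumeric terms it contains."""
    return {t.decode("ascii").lower() for t in re.findall(rb"[A-Za-z0-9]+", chunk)}


class ChunkedFile:
    """Hypothetical file with fixed-size logical chunks and chunk-level dirty tracking."""

    def __init__(self, data: bytes, chunk_size: int):
        self.data = bytearray(data)
        self.chunk_size = chunk_size
        self.index: list[set[str]] = []
        self.dirty: set[int] = set(range(self._num_chunks()))  # everything starts unindexed
        self.reindexed: list[int] = []  # chunks regenerated by the most recent reindex()

    def _num_chunks(self) -> int:
        return (len(self.data) + self.chunk_size - 1) // self.chunk_size

    def write(self, offset: int, buf: bytes) -> None:
        """Overwrite bytes in place (zero-extending if needed) and mark touched chunks dirty."""
        if not buf:
            return
        end = offset + len(buf)
        if end > len(self.data):
            self.data.extend(b"\0" * (end - len(self.data)))
        self.data[offset:end] = buf
        first, last = offset // self.chunk_size, (end - 1) // self.chunk_size
        self.dirty.update(range(first, last + 1))

    def reindex(self) -> None:
        """Regenerate index entries for dirty chunks only; clean entries are reused as-is."""
        self.index.extend(set() for _ in range(self._num_chunks() - len(self.index)))
        self.reindexed = sorted(self.dirty)
        for i in self.reindexed:
            start = i * self.chunk_size
            self.index[i] = chunk_terms(bytes(self.data[start:start + self.chunk_size]))
        self.dirty.clear()


f = ChunkedFile(b"alpha beta gamma delta", 11)
f.reindex()               # initial build touches chunks 0 and 1
f.write(11, b"omega")     # overwrites "gamma", which lies entirely in chunk 1
f.reindex()
print(f.reindexed, f.index)
```

After the overwrite only chunk 1 is regenerated; chunk 0's index entry is reused unchanged, which is the saving the claim describes.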
13. A computer-accessible storage medium comprising program instructions, wherein the program instructions are executable to implement:
a file system storing file system content including a plurality of files to a storage device, wherein said file system is configured to manage access to said storage device;
said file system partitioning a given one of said plurality of files into a plurality of logical chunks, wherein given ones of said logical chunks include structured data records formatted according to a self-describing data format, wherein each of said structured data records includes one or more data elements delimited by respective tag fields, wherein said tag fields are defined according to said self-describing data format; and
wherein said file system partitioning said given file comprises said file system adjusting a chunk boundary between two adjacent given ones of said logical chunks such that said chunk boundary falls between boundaries of said structured data records;
a search engine constructing an index of said file system content, wherein said constructing includes generating respective index information associated with each of said plurality of logical chunks such that boundaries of said respective index information correspond to boundaries of said logical chunks, and wherein for each given one of said plurality of logical chunks, said respective index information is indicative of one or more data patterns occurring within said given logical chunk of said given file;
in response to detecting an operation to modify said given file, said file system identifying one or more modified logical chunks of said given file; and
said search engine regenerating respective index information associated with each of said one or more modified logical chunks without regenerating respective index information for one or more logical chunks of said given file that are unmodified by said operation.
Dependent claims (14, 15, 16, 17, 18, 23, 24)
Specification