Data processing apparatus and method of processing data
Abstract
Data processing apparatus comprising: a chunk store having a plurality of chunk sections, each operable to store specimen data chunks, the apparatus being operable to: process an input data set into one or more input data chunks; identify a specimen data chunk in one of said chunk sections which corresponds to a first input data chunk; identify a second input data chunk not corresponding to a specimen data chunk in the chunk store; and store the second input data chunk as a specimen data chunk in proximity to the identified specimen data chunk corresponding to the first input data chunk.
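The abstract's store-in-proximity behaviour can be sketched in Python as follows. This is a minimal sketch under several assumptions: fixed-size chunking, SHA-256 chunk hashes, an in-memory hash-to-section index, and a policy of placing each new chunk in the section of the most recently matched specimen chunk. The abstract does not specify a chunking algorithm or a lookup structure, and the names `ChunkStore`, `split_into_chunks` and `store_data_set` are hypothetical.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Dict, List, Optional


def split_into_chunks(data: bytes, size: int = 4) -> List[bytes]:
    """Divide an input data set into fixed-size input data chunks (assumed chunking scheme)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


@dataclass
class ChunkSection:
    section_id: int
    chunks: Dict[str, bytes] = field(default_factory=dict)  # chunk hash -> specimen data chunk


class ChunkStore:
    def __init__(self, num_sections: int) -> None:
        self.sections = [ChunkSection(i) for i in range(num_sections)]
        self.index: Dict[str, int] = {}  # hypothetical global index: chunk hash -> section id

    def find(self, digest: str) -> Optional[int]:
        """Return the section holding a matching specimen chunk, or None."""
        return self.index.get(digest)

    def add(self, digest: str, chunk: bytes, section_id: int) -> None:
        """Store a chunk as a specimen data chunk in the given section."""
        self.sections[section_id].chunks[digest] = chunk
        self.index[digest] = section_id


def store_data_set(store: ChunkStore, data: bytes, default_section: int = 0,
                   chunk_size: int = 4) -> List[int]:
    """Deduplicate `data` and return the section id used for each input chunk."""
    last_matched: Optional[int] = None
    placements: List[int] = []
    for chunk in split_into_chunks(data, chunk_size):
        digest = hashlib.sha256(chunk).hexdigest()
        section_id = store.find(digest)
        if section_id is not None:
            # Matches an existing specimen chunk: remember which section it lives in.
            last_matched = section_id
        else:
            # No match: store it beside the last matched specimen chunk, if there was one.
            section_id = last_matched if last_matched is not None else default_section
            store.add(digest, chunk, section_id)
        placements.append(section_id)
    return placements
```

For example, if `b"AAAA"` is already a specimen chunk in section 2, `store_data_set(store, b"AAAABBBB")` returns `[2, 2]`: the unmatched chunk `b"BBBB"` is written to section 2 alongside its matched neighbour, so a later read of the same data set is likely to touch fewer sections.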
21 Claims
1. An apparatus comprising:
a chunk store having a plurality of chunk sections each storing specimen data chunks;
a manifest store for containing a manifest representing at least part of a data set and having references to said chunk sections;
at least one processor configured to:
process an input data set into input data chunks;
identify, using the manifest, a specimen data chunk in a given one of said chunk sections which corresponds to a first of the input data chunks;
identify a second of the input data chunks not corresponding to a specimen data chunk in the chunk store;
store the second input data chunk as a specimen data chunk in deliberate proximity to the identified specimen data chunk, wherein the storing in deliberate proximity results in selecting the given chunk section rather than another of said chunk sections to store the second input data chunk as a specimen data chunk;
associate a specimen data chunk in at least one chunk section with a back-reference to a manifest referencing that specimen data chunk;
determine when a given specimen data chunk is not associated with a back-reference to a manifest;
delete the given specimen data chunk from a particular chunk section after a predetermined time period or number of iterations in response to determining that the given specimen data chunk is not associated with a back-reference to a manifest; and
after the deleting, reduce fragmentation of the particular chunk section by rearranging chunks remaining in the particular chunk section.
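The back-reference, deferred-deletion and compaction steps of claim 1 can be sketched as below. The sketch assumes that a chunk section is a contiguous byte buffer, that the "number of iterations" is a count of garbage-collection passes (`grace_iterations`), and that compaction simply slides surviving chunks toward the start of the buffer. `Section`, `drop_manifest`, `collect` and `compact` are illustrative names, not taken from the patent.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass
class Section:
    data: bytearray = field(default_factory=bytearray)
    offsets: Dict[str, Tuple[int, int]] = field(default_factory=dict)  # chunk id -> (start, length)
    back_refs: Dict[str, Set[str]] = field(default_factory=dict)       # chunk id -> manifest ids
    orphan_age: Dict[str, int] = field(default_factory=dict)           # consecutive passes with no back-reference

    def append(self, chunk_id: str, chunk: bytes, manifest_id: str) -> None:
        """Store a specimen chunk and back-reference the manifest that references it."""
        self.offsets[chunk_id] = (len(self.data), len(chunk))
        self.data.extend(chunk)
        self.back_refs.setdefault(chunk_id, set()).add(manifest_id)

    def drop_manifest(self, manifest_id: str) -> None:
        """Remove a deleted manifest's back-references from every chunk."""
        for refs in self.back_refs.values():
            refs.discard(manifest_id)

    def read(self, chunk_id: str) -> bytes:
        start, length = self.offsets[chunk_id]
        return bytes(self.data[start:start + length])

    def collect(self, grace_iterations: int) -> List[str]:
        """One garbage-collection pass; returns the ids of chunks deleted in this pass."""
        deleted: List[str] = []
        for chunk_id, refs in list(self.back_refs.items()):
            if refs:
                self.orphan_age.pop(chunk_id, None)  # referenced again: reset its age
                continue
            self.orphan_age[chunk_id] = self.orphan_age.get(chunk_id, 0) + 1
            if self.orphan_age[chunk_id] >= grace_iterations:
                # Unreferenced for the whole grace period: delete the chunk.
                del self.offsets[chunk_id]
                del self.back_refs[chunk_id]
                del self.orphan_age[chunk_id]
                deleted.append(chunk_id)
        if deleted:
            self.compact()
        return deleted

    def compact(self) -> None:
        """Reduce fragmentation by sliding surviving chunks to the front, preserving their order."""
        packed = bytearray()
        new_offsets: Dict[str, Tuple[int, int]] = {}
        for chunk_id, (start, length) in sorted(self.offsets.items(), key=lambda kv: kv[1][0]):
            new_offsets[chunk_id] = (len(packed), length)
            packed.extend(self.data[start:start + length])
        self.data, self.offsets = packed, new_offsets
```

One plausible reason for the delay is that a chunk orphaned only briefly (for example, while a new manifest that references it is still being written) gets a chance to regain a back-reference before it is removed. Compacting only after a deletion keeps the rewrite cost proportional to how often chunks are actually freed.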
15. A data storage apparatus comprising:
a data storage medium provisioned with a plurality of chunk sections, at least one of said chunk sections storing specimen data chunks;
a read/write head to read information from, or write information to, the data storage medium, the read/write head being configured to read/write a predetermined maximum extent of data in a single operation;
a manifest store to store manifests, each of the manifests representing a corresponding input data set and containing references to respective specimen data chunks or chunk sections;
at least one processor configured to:
divide a particular input data set into input data chunks;
identify a specimen data chunk in one of the chunk sections that corresponds to a first of the input data chunks, where the identifying is based on use of one of the manifests;
store a further one of the input data chunks of the particular input data set as a specimen data chunk in the data storage medium, such that both the specimen data chunks may be accessed by the read/write head in a single operation;
associate a given specimen data chunk with back-references to plural ones of the manifests, each of the plural manifests containing a reference to the given specimen data chunk;
determine when a particular specimen data chunk is not associated with a back-reference to a manifest;
delete the particular specimen data chunk from a particular one of the chunk sections after a predetermined time period or number of iterations; and
after the deleting, reduce fragmentation of the particular chunk section by rearranging chunks remaining in the particular chunk section.
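Claim 15 ties proximity to the read/write head: both specimen data chunks must be accessible in one operation of at most a predetermined maximum extent. A minimal check, assuming the head can read any contiguous byte range of up to `max_extent` bytes from an arbitrary offset (real devices may impose alignment) and that new chunks are appended at the end of the chosen section. `span_fits` and `choose_offset` are hypothetical helpers.

```python
from typing import Optional


def span_fits(a_start: int, a_len: int, b_start: int, b_len: int, max_extent: int) -> bool:
    """True if one contiguous read of at most max_extent bytes covers both chunks."""
    lo = min(a_start, b_start)
    hi = max(a_start + a_len, b_start + b_len)
    return hi - lo <= max_extent


def choose_offset(section_end: int, match_start: int, match_len: int,
                  new_len: int, max_extent: int) -> Optional[int]:
    """Offset at which to append the new chunk, or None if the single-read condition would fail."""
    if span_fits(match_start, match_len, section_end, new_len, max_extent):
        return section_end
    return None  # caller must place the chunk elsewhere (fallback policy not specified here)
```

With a 64 KiB maximum extent, a 4 KiB matched chunk at offset 4096 and a section ending at offset 8192, `choose_offset(8192, 4096, 4096, 4096, 65536)` returns `8192`; if the section already ended at offset 100000, the call returns `None`.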
18. A method comprising:
storing a chunk store containing a plurality of chunk sections each storing specimen data chunks;
storing a manifest store that contains manifests, each of the manifests representing a corresponding input data set and containing references to respective specimen data chunks or chunk sections;
processing an input data set into input data chunks;
identifying, using one of the manifests, a specimen data chunk in a given one of the chunk sections which corresponds to a first of the input data chunks;
identifying a second of the input data chunks not corresponding to a specimen data chunk in the chunk store;
storing the second input data chunk as a specimen data chunk in deliberate proximity to the identified specimen data chunk, wherein the storing in deliberate proximity results in selecting the given chunk section rather than another of the chunk sections to store the second input data chunk as a specimen data chunk;
associating a given specimen data chunk with back-references to plural ones of the manifests, each of the plural manifests containing a reference to the given specimen data chunk;
determining when a particular specimen data chunk is not associated with a back-reference to a manifest;
deleting the particular specimen data chunk from a particular one of the chunk sections after a predetermined time period or number of iterations; and
after the deleting, reducing fragmentation of the particular chunk section by rearranging chunks remaining in the particular chunk section.
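The manifest-driven matching in claim 18 can be sketched as follows. The sketch assumes that a manifest is an ordered list of (chunk hash, chunk section id) pairs, that one earlier manifest is chosen as the lookup source, and that a linear scan of the chunk store serves as the fallback check before a chunk is treated as new. Every chunk of the input data set receives a back-reference to the new manifest, so a specimen chunk shared by several data sets ends up with back-references to plural manifests. `Manifest`, `find_in_store` and `ingest` are hypothetical names.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set, Tuple


@dataclass
class Manifest:
    manifest_id: str
    entries: List[Tuple[str, int]] = field(default_factory=list)  # (chunk hash, chunk section id)


def find_in_store(digest: str, sections: Dict[int, Dict[str, bytes]]) -> Optional[int]:
    """Fallback check: does any chunk section already hold this specimen chunk?"""
    for section_id, chunks in sections.items():
        if digest in chunks:
            return section_id
    return None


def ingest(manifest_id: str, chunks: List[bytes], prior: Manifest,
           sections: Dict[int, Dict[str, bytes]], back_refs: Dict[str, Set[str]],
           default_section: int = 0) -> Manifest:
    """Build a manifest for `chunks`, using `prior` to identify matching specimen chunks."""
    known: Dict[str, int] = dict(prior.entries)  # chunk hash -> section id, from the chosen manifest
    new = Manifest(manifest_id)
    current = default_section
    for chunk in chunks:
        digest = hashlib.sha256(chunk).hexdigest()
        section_id = known.get(digest)
        if section_id is None:
            section_id = find_in_store(digest, sections)
        if section_id is not None:
            current = section_id  # matched: later new chunks go to this section
        else:
            # Not in the chunk store: store it beside the last matched chunk.
            sections.setdefault(current, {})[digest] = chunk
            section_id = current
        known[digest] = section_id
        new.entries.append((digest, section_id))
        back_refs.setdefault(digest, set()).add(manifest_id)
    return new
```

Consulting a single prior manifest first keeps most lookups cheap; the linear `find_in_store` scan is only a stand-in for whatever store-wide check a real implementation would use.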
Specification