Data de-duplication in a distributed network
Abstract
A computer-implemented method for efficient data storage is provided. A first storage medium associates data stored on one or more data storage media with a unique identification value (ID) for the purpose of determining de-duplication status of the data. In response to receiving a request to read the data from a logical address, the first storage medium retrieves the data from a second storage medium based on the unique ID. In response to receiving a request to write the data to a logical address, the one or more data storage media store at least one copy of the data based on the de-duplication status of the data.
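The read and write paths described in the abstract can be illustrated with a minimal single-process sketch. It is not the patented implementation. The class name DedupStore, the use of a SHA-256 content hash as the unique ID, and the in-memory dictionaries standing in for storage media are all assumptions made for illustration.

```python
import hashlib


class DedupStore:
    """Toy model of the abstract's read/write paths (hypothetical names)."""

    def __init__(self):
        self.logical_to_id = {}  # logical address -> unique ID
        self.id_to_data = {}     # unique ID -> the single stored copy

    @staticmethod
    def unique_id(data: bytes) -> str:
        # Assumption: a content hash serves as the unique ID.
        return hashlib.sha256(data).hexdigest()

    def write(self, logical_address: str, data: bytes) -> bool:
        """Map the address to the data's ID; return True if a new copy was stored."""
        uid = self.unique_id(data)
        is_new = uid not in self.id_to_data  # de-duplication status
        if is_new:
            self.id_to_data[uid] = data
        self.logical_to_id[logical_address] = uid
        return is_new

    def read(self, logical_address: str) -> bytes:
        """Resolve the logical address to a unique ID, then fetch the data."""
        uid = self.logical_to_id[logical_address]
        return self.id_to_data[uid]


store = DedupStore()
first = store.write("/vol1/a.txt", b"hello")   # new content, stored
second = store.write("/vol2/b.txt", b"hello")  # duplicate, only mapped
```

In this sketch both logical addresses resolve to the same stored copy, so the second write adds a mapping but no new data.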
12 Citations
24 Claims
1. A computer-implemented method for de-duplication of data in a distributed network, the method comprising:
receiving, by a first de-duplication manager (DDM) in the distributed network, at least a unique identification (ID) of the data and a network address of a first storage medium in which the data is stored;
locating one or more storage media in the distributed network in which the data is stored using an association of the unique ID of the data and one or more physical addresses where the data is stored, wherein a logical address of the data is associated with network addresses of the one or more storage media;
determining, via the association, if there is more than a predetermined threshold number of copies of the data; and
if there is more than the predetermined threshold number of copies of the data:
selecting one or more copies of the data for removal, and
removing the selected one or more copies of the data from a second storage medium selected from among the one or more storage media, wherein selecting the one or more copies comprises selecting the one or more copies of the data that are furthest from a client that frequently accesses the data.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9
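The threshold check and selection step of claim 1 can be sketched as follows. This is an illustrative sketch under stated assumptions, not the claimed implementation: the function names, the dictionary mapping a unique ID to storage network addresses, and the use of a precomputed hop count as the measure of distance from the frequently accessing client are all hypothetical. The claim requires only that one or more copies be selected; removing every copy above the threshold is a choice made here for simplicity.

```python
def select_copies_for_removal(locations, distance_to_client, threshold):
    """Return the addresses of the copies furthest from the client.

    locations: storage addresses that hold a copy of the data.
    distance_to_client: address -> distance (assumed here to be hop count).
    threshold: number of copies allowed to remain.
    """
    excess = len(locations) - threshold
    if excess <= 0:
        return []  # not more than the threshold, so nothing is removed
    ranked = sorted(locations, key=lambda addr: distance_to_client[addr],
                    reverse=True)
    return ranked[:excess]


def deduplicate(association, uid, distance_to_client, threshold):
    """Drop the selected copies from the ID-to-location association."""
    locations = association.get(uid, [])
    selected = select_copies_for_removal(locations, distance_to_client,
                                         threshold)
    # A real system would also delete the data on each selected medium.
    association[uid] = [addr for addr in locations if addr not in selected]
    return selected


association = {"id-42": ["10.0.0.1", "10.0.0.2", "10.0.0.3", "10.0.0.4"]}
hops = {"10.0.0.1": 1, "10.0.0.2": 5, "10.0.0.3": 2, "10.0.0.4": 9}
removed = deduplicate(association, "id-42", hops, threshold=2)
# removed == ["10.0.0.4", "10.0.0.2"]; the two nearest copies remain
```

With four copies and a threshold of two, the two copies with the largest hop counts are selected, leaving the copies closest to the client in place.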
10. A system for de-duplication of data in a distributed network, comprising:
a processor; and
a memory coupled to the processor, wherein the memory stores computer code that, when executed by the processor, causes the processor to:
receive, by a first de-duplication manager (DDM) in the distributed network, at least a unique identification (ID) of the data and a network address of a first storage medium in which the data is stored;
locate one or more storage media in the distributed network in which the data is stored using an association of the unique ID of the data and one or more physical addresses where the data is stored, wherein a logical address of the data is associated with network addresses of the one or more storage media;
determine, via the association, if there is more than a predetermined threshold number of copies of the data; and
if there is more than the predetermined threshold number of copies of the data:
select one or more copies of the data for removal, and
remove the selected one or more copies of the data from a second storage medium selected from among the one or more storage media, wherein selecting the one or more copies comprises selecting the one or more copies of the data that are furthest from a client that frequently accesses the data.
Dependent claims: 11, 12, 13, 14, 15, 16, 17, 18
19. A physical computer storage memory comprising a computer program product method for de-duplication of data in a distributed network, the physical computer storage memory comprising:
computer code for receiving, by a first de-duplication manager (DDM) in the distributed network, at least a unique identification (ID) of the data and a network address of a first storage medium in which the data is stored;
computer code for locating one or more storage media in the distributed network in which the data is stored using an association of the unique ID of the data and one or more physical addresses where the data is stored, wherein a logical address of the data is associated with network addresses of the one or more storage media;
computer code for determining, via the association, if there is more than a predetermined threshold number of copies of the data; and
if there is more than the predetermined threshold number of copies of the data:
computer code for selecting one or more copies of the data for removal, and
computer code for removing the selected one or more copies of the data from a second storage medium selected from among the one or more storage media, wherein selecting the one or more copies comprises selecting the one or more copies of the data that are furthest from a client that frequently accesses the data.
Dependent claims: 20, 21, 22, 23, 24
Specification