Data de-duplication in a distributed network

US 8,572,137 B2
Filed: 09/08/2009
Issued: 10/29/2013
Est. Priority Date: 09/08/2009
Status: Expired due to Fees

Abstract

A computer-implemented method for efficient data storage is provided. A first storage medium associates data stored on one or more data storage media with a unique identification value (ID) for the purpose of determining de-duplication status of the data. In response to receiving a request to read the data from a logical address, the first storage medium retrieves the data from a second storage medium based on the unique ID. In response to receiving a request to write the data to a logical address, the one or more data storage media store at least one copy of the data based on the de-duplication status of the data.
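
The read and write paths described in the abstract can be sketched in a few lines. This is a minimal single-process illustration, not the patented implementation: the class `DedupStore`, its method names, and the choice of a SHA-256 content hash as the unique ID are all assumptions made for the example (the abstract only says that a unique ID is associated with the data).

```python
import hashlib


class DedupStore:
    """Toy model of the abstract's read and write paths (hypothetical names)."""

    def __init__(self):
        self.logical_to_id = {}  # logical address -> unique ID
        self.blocks = {}         # unique ID -> stored bytes (a single copy)

    @staticmethod
    def unique_id(data: bytes) -> str:
        # Assumption: the unique ID is a content hash of the data.
        return hashlib.sha256(data).hexdigest()

    def write(self, logical_address: str, data: bytes) -> bool:
        """Record the write; return True only if a new copy was stored."""
        uid = self.unique_id(data)
        is_new = uid not in self.blocks  # the de-duplication status
        if is_new:
            self.blocks[uid] = data
        self.logical_to_id[logical_address] = uid
        return is_new

    def read(self, logical_address: str) -> bytes:
        """Resolve the logical address to a unique ID, then fetch the data."""
        uid = self.logical_to_id[logical_address]
        return self.blocks[uid]


store = DedupStore()
first = store.write("/vol1/a", b"hello")   # new content, stored
second = store.write("/vol2/b", b"hello")  # same content, only mapped
```

Both logical addresses resolve to the same stored block, so the second write adds a mapping but no second copy.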

24 Claims

1. A computer-implemented method for de-duplication of data in a distributed network, the method comprising:
- receiving, by a first de-duplication manager (DDM) in the distributed network, at least a unique identification (ID) of the data and a network address of a first storage medium in which the data is stored;
  
  locating one or more storage media in the distributed network in which the data is stored using an association of the unique ID of the data and one or more physical addresses where the data is stored, wherein a logical address of the data is associated with network addresses of the one or more storage media;
  
  determining, via the association, if there is more than a predetermined threshold number of copies of the data; and
  
  if there is more than the predetermined threshold number of copies of the data:
  
  selecting one or more copies of the data for removal, and
  
  removing the selected one or more copies of the data from a second storage medium selected from among the one or more storage media, wherein selecting the one or more copies comprises selecting the one or more copies of the data that are furthest from a client that frequently accesses the data.
- Dependent claims (2, 3, 4, 5, 6, 7, 8, 9):
  - 2. The method of claim 1, wherein the second storage medium is selected according to a dynamic evaluation based on at least one of affinity, frequency of use, price of storage, or estimated risk of failure.
  - 3. The method of claim 1, wherein the removing comprises providing the unique ID of the data to the second storage medium and requesting the second storage medium to remove the data.
  - 4. The method of claim 1, wherein the second storage medium uses the unique ID of the data to locate the data in memory and removes the data.
  - 5. The method of claim 1, further comprising adding a first association comprising at least the unique ID of the data and the network address of the first storage medium to a first database.
  - 6. The method of claim 5, further comprising removing a second association comprising at least the unique ID of the data and the network address of the second storage medium from the first database.
  - 7. The method of claim 1, wherein the second storage medium removes, from a second database, an association comprising the unique ID of the data and a physical address of the data in the second storage medium.
  - 8. The method of claim 7, wherein the second storage medium removes an association comprising the unique ID of the data, a logical address of the data, and the network address of the second storage medium from the second database.
  - 9. The method of claim 1, further comprising designating the data for background processing by a second DDM in the distributed network.
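
The steps of claim 1 (receive an ID and address, locate the copies, compare against a threshold, remove the farthest copies) can be sketched as follows. This is an illustrative sketch under stated assumptions, not the claimed implementation: the class `DedupManager`, the in-memory `locations` table standing in for the association database of claims 5 and 6, and the numeric `distance_to_client` metric are all hypothetical, and actual deletion on the remote storage medium is reduced to dropping the address from the table.

```python
from dataclasses import dataclass, field


@dataclass
class DedupManager:
    threshold: int            # number of copies allowed before pruning starts
    distance_to_client: dict  # network address -> distance from the frequent client (assumed metric)
    locations: dict = field(default_factory=dict)  # unique ID -> set of network addresses

    def register(self, unique_id, network_address):
        """Receive a (unique ID, network address) pair, then prune extra copies."""
        self.locations.setdefault(unique_id, set()).add(network_address)
        return self.prune(unique_id)

    def prune(self, unique_id):
        """Remove the copies farthest from the client while over the threshold."""
        holders = self.locations[unique_id]
        excess = len(holders) - self.threshold
        if excess <= 0:
            return []
        # Farthest first; ties are broken by address so the result is deterministic.
        ranked = sorted(holders, key=lambda a: (-self.distance_to_client[a], a))
        removed = ranked[:excess]
        for address in removed:
            # Stand-in for asking the storage medium to delete its copy.
            holders.discard(address)
        return removed


ddm = DedupManager(
    threshold=2,
    distance_to_client={"10.0.0.1": 1, "10.0.0.2": 5, "10.0.0.3": 9},
)
r1 = ddm.register("abc", "10.0.0.1")
r2 = ddm.register("abc", "10.0.0.3")
r3 = ddm.register("abc", "10.0.0.2")
```

With a threshold of two, the third registration pushes the copy count over the limit, and the copy at the address with the largest distance is the one selected for removal.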

10. A system for de-duplication of data in a distributed network, comprising:
- a processor; and
  
  a memory coupled to the processor, wherein the memory stores computer code that, when executed by the processor, causes the processor to:
  
  receive, by a first de-duplication manager (DDM) in the distributed network, at least a unique identification (ID) of the data and a network address of a first storage medium in which the data is stored,
  
  locate one or more storage media in the distributed network in which the data is stored using an association of the unique ID of the data and one or more physical addresses where the data is stored, wherein a logical address of the data is associated with network addresses of the one or more storage media;
  
  determine, via the association, if there is more than a predetermined threshold number of copies of the data, and
  
  if there is more than the predetermined threshold number of copies of the data:
  
  select one or more copies of the data for removal, and
  
  remove the selected one or more copies of the data from a second storage medium selected from among the one or more storage media, wherein selecting the one or more copies comprises selecting the one or more copies of the data that are furthest from a client that frequently accesses the data.
- Dependent claims (11, 12, 13, 14, 15, 16, 17, 18):
  - 11. The system of claim 10, wherein the second storage medium is selected according to a dynamic evaluation based on at least one of affinity, frequency of use, price of storage, or estimated risk of failure.
  - 12. The system of claim 10, wherein, when removing the selected one or more copies of the data, the processor is configured to provide the unique ID of the data to the second storage medium and to request the second storage medium to remove the data.
  - 13. The system of claim 10, wherein the second storage medium uses the unique ID of the data to locate the data in the memory and remove the data.
  - 14. The system of claim 10, wherein the processor is further configured to add a first association comprising at least the unique ID of the data and the network address of the first storage medium to a first database.
  - 15. The system of claim 14, wherein the processor is further configured to remove a second association comprising at least the unique ID of the data and the network address of the second storage medium from the first database.
  - 16. The system of claim 10, wherein the second storage medium removes, from a second database, an association comprising the unique ID of the data and a physical address of the data in the second storage medium.
  - 17. The system of claim 16, wherein the second storage medium removes an association comprising the unique ID of the data, a logical address of the data, and the network address of the second storage medium from the second database.
  - 18. The system of claim 10, wherein the processor is further configured to designate the data for background processing by a second DDM in the distributed network.

19. A physical computer storage memory comprising a computer program product method for de-duplication of data in a distributed network, the physical computer storage memory comprising:
- computer code for receiving, by a first de-duplication manager (DDM) in the distributed network, at least a unique identification (ID) of the data and a network address of a first storage medium in which the data is stored;
  
  computer code for locating one or more storage media in the distributed network in which the data is stored using an association of the unique ID of the data and one or more physical addresses where the data is stored, wherein a logical address of the data is associated with network addresses of the one or more storage media;
  
  computer code for determining, via the association, if there is more than a predetermined threshold number of copies of the data; and
  
  if there is more than the predetermined threshold number of copies of the data:
  
  computer code for selecting one or more copies of the data for removal, and
  
  computer code for removing the selected one or more copies of the data from a second storage medium selected from among the one or more storage media, wherein selecting the one or more copies comprises selecting the one or more copies of the data that are furthest from a client that frequently accesses the data.
- Dependent claims (20, 21, 22, 23, 24):
  - 20. The physical computer storage memory of claim 19, wherein the second storage medium is selected according to a dynamic evaluation based on at least one of affinity, frequency of use, price of storage, or estimated risk of failure.
  - 21. The physical computer storage memory of claim 19, wherein the removing comprises providing the unique ID of the data to the second storage medium and requesting the second storage medium to remove the data.
  - 22. The physical computer storage memory of claim 19, wherein the second storage medium uses the unique ID of the data to locate the data in memory and removes the data.
  - 23. The physical computer storage memory of claim 19, further comprising computer code for adding a first association comprising at least the unique ID of the data and the network address of the first storage medium to a first database.
  - 24. The physical computer storage memory of claim 19, wherein the second storage medium removes, from a second database, an association comprising the unique ID of the data and a physical address of the data in the second storage medium.
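
Claims 2, 11, and 20 select the second storage medium by a dynamic evaluation of affinity, frequency of use, price of storage, or estimated risk of failure. One possible reading is a weighted score per candidate medium, sketched below. Everything here is an assumption for illustration: the claims do not specify a formula, and the `MediumStats` fields, their normalization to the range 0 to 1, the equal default weights, and the function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MediumStats:
    address: str
    affinity: float       # 0..1, higher means closer ties to the data's clients
    use_frequency: float  # 0..1, higher means the copy is read more often
    price: float          # 0..1, normalized cost of keeping the copy
    failure_risk: float   # 0..1, estimated risk that the medium fails


def removal_score(medium, weights=(1.0, 1.0, 1.0, 1.0)):
    """Higher score means a better candidate for losing its copy.

    Assumed rule: expensive or risky media are preferred for removal,
    while high affinity and frequent use argue for keeping the copy.
    """
    w_aff, w_freq, w_price, w_risk = weights
    return (w_price * medium.price
            + w_risk * medium.failure_risk
            - w_aff * medium.affinity
            - w_freq * medium.use_frequency)


def pick_medium_for_removal(media):
    """Return the candidate with the highest removal score."""
    return max(media, key=removal_score)


media = [
    MediumStats("10.0.0.1", affinity=0.9, use_frequency=0.8, price=0.2, failure_risk=0.1),
    MediumStats("10.0.0.2", affinity=0.1, use_frequency=0.2, price=0.7, failure_risk=0.4),
]
chosen = pick_medium_for_removal(media)
```

Because the evaluation is dynamic, a real system would presumably recompute these statistics as access patterns and prices change; the sketch evaluates a single snapshot.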

Current Assignee: International Business Machines Corporation
Original Assignee: International Business Machines Corporation
Inventors: Fienblit, Shachar; Goldberg, Itzhack; Schmeilin, Evgeny; Zlotnick, Aviad
Primary Examiner(s): TRAN, ANHTAI V
Application Number: US12/555,703
Publication Number: US 20110060759A1
Time in Patent Office: 1,512 Days
Field of Search: None
US Class Current: 707/827
CPC Class Codes: G06F 16/1748 De-duplication implemented ...; G06F 16/27 Replication, distribution o...
