Methods and systems for eliminating data redundancies

US 20020169934A1
Filed: 03/22/2002
Published: 11/14/2002
Est. Priority Date: 03/23/2001
Status: Active Grant

Abstract

Methods, systems, and articles of manufacture consistent with the present invention eliminate data redundancies. A first data block identifier is obtained for a first data block, the first data block identifier being calculated based on data of the first data block. It is determined whether a second data block identifier matching the first data block identifier exists, the second data block identifier being calculated based on data of a second data block. When it is determined that the second data block identifier matching the first data block identifier exists, the first data block identifier is indicated as being redundant.
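The flow described in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the function names `block_identifier` and `find_redundant` are hypothetical, and SHA-256 is assumed as the identifier calculation, which the abstract does not specify.

```python
import hashlib


def block_identifier(data: bytes) -> str:
    """Identifier calculated from the block's data (SHA-256 is an assumption)."""
    return hashlib.sha256(data).hexdigest()


def find_redundant(blocks: list[bytes]) -> list[int]:
    """Return the indices of blocks whose identifier matches an earlier block's."""
    seen: set[str] = set()        # identifiers of blocks already encountered
    redundant: list[int] = []
    for index, data in enumerate(blocks):
        identifier = block_identifier(data)
        if identifier in seen:
            # A matching "second data block identifier" exists:
            # indicate this block as redundant.
            redundant.append(index)
        else:
            seen.add(identifier)
    return redundant


print(find_redundant([b"alpha", b"beta", b"alpha"]))  # prints [2]
```

A set is used here so that the existence check takes constant time on average; the abstract itself only requires that a matching identifier can be looked up.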

54 Claims

1. A method for eliminating data redundancies in a data processing system, the method comprising the steps of:
- obtaining a first data block identifier for a first data block, the first data block identifier being calculated based on data of the first data block;
  
  determining whether a second data block identifier matching the first data block identifier exists, the second data block identifier being calculated based on data of a second data block; and
  
  when it is determined that the second data block identifier matching the first data block identifier exists, indicating that the first data block identifier is redundant.
- Dependent claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 14, 16, 17, 18, 19, 20, 21, 22, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 37, 39, 40, 41, 42, 43, 44, 45):
  - 2. The method of claim 1, further comprising the step of:
    - adding the first data block identifier to a list of other data block identifiers when it is determined that the second data block identifier does not exist.
  - 3. The method of claim 2, wherein the first data block identifier is added to the list of other data block identifiers with an address of the first data block.
  - 4. The method of claim 1, further comprising the step of:
    - when it is determined that the second data block identifier matching the first data block identifier exists, deleting the first data block.
  - 5. The method of claim 1, further comprising the step of:
    - when it is determined that the second data block identifier matching the first data block identifier exists, replacing the first data block with a reference to the second data block.
  - 6. The method of claim 1, further comprising the step of:
    - when it is determined that the second data block identifier matching the first data block identifier does not exist, storing the first data block.
  - 7. The method of claim 1, further comprising the step of:
    - when it is determined that the second data block identifier matching the first data block identifier does not exist, transmitting the first data block to a client.
  - 8. The method of claim 1, wherein the first and second data block identifiers comprise checksums.
  - 9. The method of claim 1, further comprising the steps of:
    - when it is determined that the second data block identifier matching the first data block identifier exists, retrieving the second data block associated with the second data block identifier; and
      
      determining whether the first data block and the retrieved second data block correspond to one another.
  - 10. The method of claim 9, wherein determining whether the first data block and the retrieved second data block correspond to one another comprises:
    - obtaining a revised first data block identifier having more bits than the first data block identifier; and
      
      obtaining a revised second data block identifier having more bits than the second data block identifier.
  - 11. The method of claim 9, wherein determining whether the first data block and the retrieved second data block correspond to one another comprises:
    - comparing at least one of a plurality of bits of the first data block with at least one of a plurality of bits of the second data block.
  - 12. The method of claim 9, wherein the step of determination whether the first data block and the second data block correspond to one another comprises comparing at least one of a data block name of the first and a data block name of the second data block, a date indicating a time of creation of the first data block and a date indicating a time of creation of the second data block, and a date indicating a time of alteration of the first data block and a date indicating a time of alteration of the second data block.
  - 14. The method of claim 13 further comprising the steps of:
    - when it is determined that the new identifier is not equivalent to one of the associated identifiers, allocating memory for the data;
      
      storing the data in the allocated memory; and
      
      returning a reference to the allocated memory.
  - 16. The method of claim 15, wherein the allocation request includes at least one of an address of the first data block and the first data block identifier;
    - and wherein the allocation response includes a reference to the second data block when the second data block identifier exists; and
      
      wherein the data processing unit uses the reference to the second data block to access the second data block.
  - 17. The method of claim 15, wherein the allocation response includes an allocation instruction to allocate storage space for the first data block when the second data block identifier does not exist and to transmit an address of the allocated storage space to the redundancy handler for instructing the redundancy handler to store the address of the first data block in association with the first data block identifier in a list of data block identifiers.
  - 18. The method of claim 15, wherein the first and second data block identifiers comprise checksums.
  - 19. The method of claim 15, further comprising the steps of:
    - retrieving the second data block associated with the second data block identifier when the second data block identifier exists; and
      
      determining whether the first data block and the retrieved second data block correspond to one another.
  - 20. The method of claim 19, wherein determining whether the first data block and the retrieved second data block correspond to one another comprises:
    - obtaining a revised first data block identifier having more bits than the first data block identifier; and
      
      obtaining a revised second data block identifier having more bits than the second data block identifier.
  - 21. The method of claim 19, wherein determining whether the first data block and the retrieved second data block correspond to one another comprises:
    - comparing at least one of a plurality of bits of the first data block with at least one of a plurality of bits of the second data block.
  - 22. The method of claim 19, wherein the step of determination whether the first data block and the second data block correspond to one another comprises comparing at least one of a data block name of the first and a data block name of the second data block, a date indicating a time of creation of the first data block and a date indicating a time of creation of the second data block, and a date indicating a time of alteration of the first data block and a date indicating a time of alteration of the second data block.
  - 25. The computer-readable medium of claim 24, further comprising the step of:
    - adding the first data block identifier to a list of other data block identifiers when it is determined that the second data block identifier does not exist.
  - 26. The computer-readable medium of claim 25, wherein the first data block identifier is added to the list of other data block identifiers with an address of the first data block.
  - 27. The computer-readable medium of claim 24, further comprising the step of:
    - when it is determined that the second data block identifier matching the first data block identifier exists, deleting the first data block.
  - 28. The computer-readable medium of claim 24, further comprising the step of:
    - when it is determined that the second data block identifier matching the first data block identifier exists, replacing the first data block with a reference to the second data block.
  - 29. The computer-readable medium of claim 24, further comprising the step of:
    - when it is determined that the second data block identifier matching the first data block identifier does not exist, storing the first data block.
  - 30. The computer-readable medium of claim 24, further comprising the step of:
    - when it is determined that the second data block identifier matching the first data block identifier does not exist, transmitting the first data block to a client.
  - 31. The computer-readable medium of claim 24, wherein the first and second data block identifiers comprise checksums.
  - 32. The computer-readable medium of claim 24, further comprising the steps of:
    - when it is determined that the second data block identifier matching the first data block identifier exists, retrieving the second data block associated with the second data block identifier; and
      
      determining whether the first data block and the retrieved second data block correspond to one another.
  - 33. The computer-readable medium of claim 32, wherein determining whether the first data block and the retrieved second data block correspond to one another comprises:
    - obtaining a revised first data block identifier having more bits than the first data block identifier; and
      
      obtaining a revised second data block identifier having more bits than the second data block identifier.
  - 34. The computer-readable medium of claim 32, wherein determining whether the first data block and the retrieved second data block correspond to one another comprises:
    - comparing at least one of a plurality of bits of the first data block with at least one of a plurality of bits of the second data block.
  - 35. The computer-readable medium of claim 32, wherein the step of determination whether the first data block and the second data block correspond to one another comprises comparing at least one of a data block name of the first and a data block name of the second data block, a date indicating a time of creation of the first data block and a date indicating a time of creation of the second data block, and a date indicating a time of alteration of the first data block and a date indicating a time of alteration of the second data block.
  - 37. The computer-readable medium of claim 36 further comprising the steps of:
    - when it is determined that the new identifier is not equivalent to one of the associated identifiers, allocating memory for the data;
      
      storing the data in the allocated memory; and
      
      returning a reference to the allocated memory.
  - 39. The computer-readable medium of claim 38, wherein the allocation request includes at least one of an address of the first data block and the first data block identifier;
    - and wherein the allocation response includes a reference to the second data block when the second data block identifier exists; and
      
      wherein the data processing unit uses the reference to the second data block to access the second data block.
  - 40. The computer-readable medium of claim 38, wherein the allocation response includes an allocation instruction to allocate storage space for the first data block when the second data block identifier does not exist and to transmit an address of the allocated storage space to the redundancy handler for instructing the redundancy handler to store the address of the first data block in association with the first data block identifier in a list of data block identifiers.
  - 41. The computer-readable medium of claim 38, wherein the first and second data block identifiers comprise checksums.
  - 42. The computer-readable medium of claim 38, further comprising the steps of:
    - retrieving the second data block associated with the second data block identifier when the second data block identifier exists; and
      
      determining whether the first data block and the retrieved second data block correspond to one another.
  - 43. The computer-readable medium of claim 42, wherein determining whether the first data block and the retrieved second data block correspond to one another comprises:
    - obtaining a revised first data block identifier having more bits than the first data block identifier; and
      
      obtaining a revised second data block identifier having more bits than the second data block identifier.
  - 44. The computer-readable medium of claim 42, wherein determining whether the first data block and the retrieved second data block correspond to one another comprises:
    - comparing at least one of a plurality of bits of the first data block with at least one of a plurality of bits of the second data block.
  - 45. The computer-readable medium of claim 42, wherein the step of determination whether the first data block and the second data block correspond to one another comprises comparing at least one of a data block name of the first and a data block name of the second data block, a date indicating a time of creation of the first data block and a date indicating a time of creation of the second data block, and a date indicating a time of alteration of the first data block and a date indicating a time of alteration of the second data block.
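Claims 8 through 11 (and their counterparts) describe checksum identifiers and a follow-up check that two blocks with matching identifiers really correspond. The sketch below is one possible reading under stated assumptions: CRC-32 stands in for the short checksum, SHA-256 for the "revised" identifier having more bits, and a full byte comparison for the bit comparison. The function names are hypothetical.

```python
import hashlib
import zlib


def short_identifier(data: bytes) -> int:
    """Compact first-pass identifier: a 32-bit checksum (CRC-32 assumed)."""
    return zlib.crc32(data)


def long_identifier(data: bytes) -> bytes:
    """Revised identifier with more bits (256-bit SHA-256 assumed)."""
    return hashlib.sha256(data).digest()


def blocks_correspond(first: bytes, second: bytes, bitwise: bool = False) -> bool:
    """Check whether two blocks correspond after their short identifiers match."""
    if short_identifier(first) != short_identifier(second):
        return False  # identifiers differ, so the blocks cannot be identical
    if bitwise:
        # Direct comparison of the block contents.
        return first == second
    # Otherwise compare the longer, revised identifiers.
    return long_identifier(first) == long_identifier(second)
```

The second stage matters because a short checksum can collide for different data; the longer identifier or the direct comparison guards against treating such blocks as redundant.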

13. A method in a data processing system having data blocks with associated identifiers, the method comprising the steps of:
- receiving a request for a reference to a memory location that stores data, the request comprising the data;
  
  creating a new identifier that is based on the data;
  
  determining whether the new identifier is equivalent to one of the associated identifiers;
  
  when it is determined that the new identifier is equivalent to one of the associated identifiers, returning a reference to the data block that is associated with the one associated identifier.
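As an illustration of claim 13, together with the allocation fallback of claim 14, the following sketch returns an existing reference when the data is already stored and allocates otherwise. The class name `BlockStore`, the use of list indices as references, and SHA-256 as the identifier are assumptions made for the example.

```python
import hashlib


class BlockStore:
    """Toy store in which a list index plays the role of a memory reference."""

    def __init__(self) -> None:
        self._blocks: list[bytes] = []             # stand-in for allocated memory
        self._by_identifier: dict[str, int] = {}   # identifier -> reference

    def get_reference(self, data: bytes) -> int:
        """Handle a request for a reference; the request carries the data."""
        identifier = hashlib.sha256(data).hexdigest()  # new identifier
        reference = self._by_identifier.get(identifier)
        if reference is not None:
            # Equivalent identifier found: return the existing block's reference.
            return reference
        # No equivalent identifier: allocate, store, and return a new reference.
        self._blocks.append(data)
        reference = len(self._blocks) - 1
        self._by_identifier[identifier] = reference
        return reference

    def read(self, reference: int) -> bytes:
        """Return the data stored at the given reference."""
        return self._blocks[reference]
```

Two requests carrying the same data receive the same reference, so the data is held only once.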

15. A method for avoiding data redundancies in a data processing system, the method comprising the steps of:
- obtaining a first data block identifier for a first data block, the first data block identifier being calculated based on data of the first data block;
  
  generating a memory allocation request for the first data block;
  
  transmitting the memory allocation request to a redundancy handler, the memory allocation request instructing the redundancy handler to determine whether a second data block identifier matching the first data block identifier exists, wherein the second data block identifier is calculated based on data of a second data block; and
  
  receiving an allocation response indicating whether the second data block identifier of the second data block exists.
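The request and response exchange of claim 15, with the reference and registration details of claims 16 and 17, might be sketched as follows. All names (`AllocationRequest`, `AllocationResponse`, `RedundancyHandler`, `allocate`) are hypothetical, the handler runs in-process rather than as a separate component, and SHA-256 is assumed as the identifier.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional


@dataclass
class AllocationRequest:
    identifier: str  # first data block identifier


@dataclass
class AllocationResponse:
    exists: bool                     # does a matching identifier exist?
    reference: Optional[int] = None  # address of the matching block, if any


class RedundancyHandler:
    """Keeps a list of known identifiers and their block addresses."""

    def __init__(self) -> None:
        self._addresses: dict[str, int] = {}

    def handle(self, request: AllocationRequest) -> AllocationResponse:
        address = self._addresses.get(request.identifier)
        if address is not None:
            return AllocationResponse(exists=True, reference=address)
        return AllocationResponse(exists=False)

    def register(self, identifier: str, address: int) -> None:
        """Store the address in association with the identifier."""
        self._addresses[identifier] = address


def allocate(handler: RedundancyHandler, memory: list[bytes], data: bytes) -> int:
    """Requesting side: ask the handler before allocating space for the block."""
    identifier = hashlib.sha256(data).hexdigest()
    response = handler.handle(AllocationRequest(identifier))
    if response.exists:
        return response.reference   # reuse the existing block
    memory.append(data)             # allocate storage for the new block
    address = len(memory) - 1
    handler.register(identifier, address)
    return address
```

Separating the handler from the requesting side mirrors the claim wording, in which the data processing unit only learns from the allocation response whether a matching block already exists.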

23. A method for eliminating data redundancies in a data processing system, the method comprising the steps of:
- receiving a first data block;
  
  calculating a first data block identifier based on data of the first data block;
  
  determining whether a second data block identifier matching the first data block identifier exists in a list of other data block identifiers, the second data block identifier being calculated based on data of a second data block;
  
  when it is determined that the second data block identifier matching the first data block identifier exists, deleting the first data block; and
  
  when it is determined that the second data block identifier matching the first data block identifier does not exist, adding the first data block identifier to the list.

24. A computer-readable medium containing instructions that cause a data processing system to perform a method comprising the steps of:
- obtaining a first data block identifier for a first data block, the first data block identifier being calculated based on data of the first data block;
  
  determining whether a second data block identifier matching the first data block identifier exists, the second data block identifier being calculated based on data of a second data block; and
  
  when it is determined that the second data block identifier matching the first data block identifier exists, indicating that the first data block identifier is redundant.

36. A computer-readable medium containing instructions that cause a data processing system having blocks associated with identifiers to perform a method comprising the steps of:
- receiving a request for a reference to a memory location that stores data, the request comprising the data;
  
  creating a new identifier that is based on the data;
  
  determining whether the new identifier is equivalent to one of the associated identifiers;
  
  when it is determined that the new identifier is equivalent to one of the associated identifiers, returning a reference to the data block that is associated with the one associated identifier.

38. A computer-readable medium containing instructions that cause a data processing system to perform a method comprising the steps of:
- obtaining a first data block identifier for a first data block, the first data block identifier being calculated based on data of the first data block;
  
  generating a memory allocation request for the first data block;
  
  transmitting the memory allocation request to a redundancy handler, the memory allocation request instructing the redundancy handler to determine whether a second data block identifier matching the first data block identifier exists, wherein the second data block identifier is calculated based on data of a second data block; and
  
  receiving an allocation response indicating whether the second data block identifier of the second data block exists.

46. A computer-readable medium containing instructions that cause a data processing system to perform a method comprising the steps of:
- receiving a first data block;
  
  calculating a first data block identifier based on data of the first data block;
  
  determining whether a second data block identifier matching the first data block identifier exists in a list of other data block identifiers, the second data block identifier being calculated based on data of a second data block;
  
  when it is determined that the second data block identifier matching the first data block identifier exists, deleting the first data block; and
  
  when it is determined that the second data block identifier matching the first data block identifier does not exist, adding the first data block identifier to the list.

47. A data processing system comprising:
- a secondary storage device having a stored data block with data;
  
  a memory comprising a computer program that obtains a first data block identifier for a first data block, the first data block identifier being calculated based on data of the first data block, determines whether a second data block identifier matching the first data block identifier exists, the second data block identifier being calculated based on data of a second data block, and when it is determined that the second data block identifier matching the first data block identifier exists, indicates that the first data block identifier is redundant; and
  
  a processing unit that runs the computer program.

48. A data processing system comprising:
- a secondary storage device having a stored data block with data;
  
  a memory comprising a computer program that receives a request for a reference to a memory location that stores data, the request comprising the data, creates a new identifier that is based on the data, determines whether the new identifier is equivalent to one of the associated identifiers, and when it is determined that the new identifier is equivalent to one of the associated identifiers, returns a reference to the data block that is associated with the one associated identifier; and
  
  a processing unit that runs the computer program.

49. A data processing system comprising:
- a secondary storage device having a stored data block with data;
  
  a memory comprising a computer program that obtains a first data block identifier for a first data block, the first data block identifier being calculated based on data of the first data block, generates a memory allocation request for the first data block, transmits the memory allocation request to a redundancy handler, the memory allocation request instructing the redundancy handler to determine whether a second data block identifier matching the first data block identifier exists, wherein the second data block identifier is calculated based on data of a second data block, and receives an allocation response indicating whether the second data block identifier of the second data block exists; and
  
  a processing unit that runs the computer program.

50. A data processing system for eliminating data redundancies, the data processing system comprising:
- means for obtaining a first data block identifier for a first data block, the first data block identifier being calculated based on data of the first data block;
  
  means for determining whether a second data block identifier matching the first data block identifier exists, the second data block identifier being calculated based on data of a second data block; and
  
  means for, when it is determined that the second data block identifier matching the first data block identifier exists, indicating that the first data block identifier is redundant.

51. A data processing system for eliminating data redundancies, the data processing system having data blocks with associated identifiers, the data processing system comprising:
- means for receiving a request for a reference to a memory location that stores data, the request comprising the data;
  
  means for creating a new identifier that is based on the data;
  
  means for determining whether the new identifier is equivalent to one of the associated identifiers;
  
  means for, when it is determined that the new identifier is equivalent to one of the associated identifiers, means for returning a reference to the data block that is associated with the one associated identifier.

52. A data processing system for eliminating data redundancies, the data processing system comprising:
- means for obtaining a first data block identifier for a first data block, the first data block identifier being calculated based on data of the first data block;
  
  means for generating a memory allocation request for the first data block;
  
  means for transmitting the memory allocation request to a redundancy handler, the memory allocation request instructing the redundancy handler to determine whether a second data block identifier matching the first data block identifier exists, wherein the second data block identifier is calculated based on data of a second data block; and
  
  means for receiving an allocation response indicating whether the second data block identifier of the second data block exists.

53. A data processing system for eliminating data redundancies, the data processing system comprising:
- means for receiving a first data block;
  
  means for calculating a first data block identifier based on data of the first data block;
  
  means for determining whether a second data block identifier matching the first data block identifier exists in a list of other data block identifiers, the second data block identifier being calculated based on data of a second data block;
  
  means for, when it is determined that the second data block identifier matching the first data block identifier exists, deleting the first data block; and
  
  means for, when it is determined that the second data block identifier matching the first data block identifier does not exist, adding the first data block identifier to the list.

54. A computer-readable memory device encoded with a data structure and a program that accesses the data structure, the program is run by a processor in a data processing system, the data structure having a plurality of entries, each entry comprising:
- a reference to a data block that contains data and an identifier that is based on the data using a calculation, wherein when the program receives a request to create a new data block containing new data, the program creates a new identifier based on the new data using the calculation and compares the new identifier to the identifiers in the entries to prevent a data block redundancy.


Current Assignee: Oracle America, Inc. (Oracle Corporation)
Original Assignee: Sun Microsystems Incorporated (Oracle Corporation)
Inventors: Heilig, Joerg; Laux, Thorsten; Krapp, Oliver

Granted Patent: US 6,889,297 B2
US Class Current: 711/159
CPC Class Codes

G06F 16/10   File systems; File servers

H03M 7/30   Compression speech analysis...

Y10S 707/99953   Recoverability

Y10S 707/99955   Archiving or backup
