Method and apparatus for storing information in a data processing system

  • US 6,374,266 B1
  • Filed: 07/24/1999
  • Issued: 04/16/2002
  • Est. Priority Date: 07/28/1998
  • Status: Expired due to Term
First Claim

1. In a computer system including at least one data source wherein data is stored in source allocation units and a data repository having access to the data source and including a storage device for storing data in repository allocation units, a method for storing data from the data source in the storage device of the data repository, comprising the steps of:

  • (a) reading data from the source allocation units and restructuring the data into data units having a size corresponding to the repository allocation units;

    (b) for each data unit read from the data source, generating a hash value for the data of each data unit;

    (c) for each data unit read from the data source, searching a data table for a table entry having a hash value matching a hash value of the data unit read from the data source, wherein each table entry contains the hash value of a data unit stored in a repository allocation unit and a repository allocation unit pointer to the corresponding repository allocation unit;

    (d) when the hash value of a data unit does not match any hash value of any table entry in the data table, writing the data of the data unit into a newly allocated repository allocation unit, generating a new table entry containing the hash value of the data unit and a repository allocation unit pointer to the newly allocated repository allocation unit, and writing the new table entry containing the hash value and a repository allocation unit pointer to the newly allocated repository allocation unit to the data table;

    (e) when the hash value of a data unit matches the hash value of a data entry in the data table, accessing the table entry having a matching hash value and using the repository allocation unit pointer therein to read the data of the corresponding repository allocation unit, and comparing the data of the data unit and the data of the corresponding repository allocation unit, if the data of the data unit matches the data of the corresponding repository allocation unit, discarding the data unit, and if the data of the data unit does not match the data of the corresponding repository allocation unit, writing the data of the data unit into a newly allocated repository allocation unit, generating a new table entry containing the hash value of the data unit and a repository allocation unit pointer to the newly allocated repository allocation unit, and inserting the new table entry into the data table; and

    (f) repeating steps (a) through (e) until all source allocation units have been read.
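
The flow of steps (a) through (f) can be illustrated with a minimal Python sketch. It is not the patented implementation. The names `restructure`, `Repository`, `store`, and `store_all` are hypothetical, SHA-256 stands in for a hash function the claim does not specify, an in-memory list stands in for the storage device, and a list index stands in for the repository allocation unit pointer. Returning the pointer of the matching unit when a duplicate is discarded is also an assumption, added so a caller could record where each data unit ended up.

```python
import hashlib
from collections import defaultdict


def restructure(source_units, unit_size):
    """Step (a): re-cut source allocation units into repository-sized data units.

    Simplification: all source data is joined in memory before being re-cut.
    """
    buf = b"".join(source_units)
    return [buf[i:i + unit_size] for i in range(0, len(buf), unit_size)]


class Repository:
    """Hypothetical in-memory stand-in for the data repository."""

    def __init__(self, hash_fn=None):
        # Repository allocation units; the list index plays the role of the pointer.
        self.units = []
        # Data table: hash value -> pointers of stored units having that hash.
        self.table = defaultdict(list)
        # Assumption: SHA-256. The claim only requires "a hash value".
        self.hash_fn = hash_fn or (lambda data: hashlib.sha256(data).digest())

    def store(self, data_unit):
        h = self.hash_fn(data_unit)              # step (b): hash the data unit
        for ptr in self.table.get(h, []):        # step (c): search the data table
            if self.units[ptr] == data_unit:     # step (e): compare the actual data
                return ptr                       # identical data: discard the unit
        # Step (d), or step (e) when the hash matched but the data differed:
        # write to a newly allocated unit and add a new table entry.
        self.units.append(data_unit)
        ptr = len(self.units) - 1
        self.table[h].append(ptr)
        return ptr


def store_all(source_units, unit_size, repo):
    """Step (f): repeat until every source allocation unit has been processed."""
    return [repo.store(unit) for unit in restructure(source_units, unit_size)]


repo = Repository()
pointers = store_all([b"aaaabb", b"bbaaaa", b"cc"], 4, repo)
print(pointers, len(repo.units))  # [0, 1, 0, 2] 3
```

The third data unit (`b"aaaa"`) hashes to an existing table entry and matches its stored data, so it is discarded and only three repository units are written. The byte comparison in step (e) is what keeps two different data units with the same hash value from being merged.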
