Hash file system and method for use in a commonality factoring system

US 20040148306A1
Filed: 01/14/2004
Published: 07/29/2004
Est. Priority Date: 02/18/2000
Status: Abandoned Application

First Claim

1. A method for managing data comprising:

producing a probabilistically unique identifier for a digital sequence; and

comparing said probabilistically unique identifier to a list of other identifiers with their corresponding digital sequences.
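
The standard birthday bound gives a rough sense of why a digest can reasonably be treated as "probabilistically unique." The figures here are illustrative assumptions, not taken from the application: a b-bit digest with b = 160 (the output size of SHA-1) and a list holding n = 2^40 (about 1.1 × 10^12) distinct sequences:

$$
P_{\text{collision}} \approx 1 - e^{-n(n-1)/2^{b+1}} \approx \frac{n^2}{2^{b+1}} = \frac{2^{80}}{2^{161}} = 2^{-81} \approx 4.1 \times 10^{-25}
$$

The estimate treats the digest as a uniformly random function of its input, so it covers accidental collisions only.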

0 Assignments

0 Petitions

Abstract

A system and method for a computer file system that is organized around hashes and/or strings of digits of specific, differing, or changing lengths, and that can eliminate or screen out redundant copies of aggregate blocks of data (or parts of data blocks) from the system. The hash file system of the present invention uses hash values for computer files or file pieces, which may be produced by a checksum-generating program, engine or algorithm such as the industry-standard MD4, MD5, SHA or SHA-1 algorithms. Alternatively, the hash values may be generated by a checksum program, engine, algorithm or other means that produces an effectively unique hash value for a block of data of indeterminate size based upon a non-linear probabilistic mathematical algorithm.
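
Here is a minimal sketch of the idea in the abstract. It assumes fixed 4 KiB blocks, SHA-1 digests from Python's `hashlib`, and in-memory dictionaries. The class and method names are hypothetical and do not come from the application. Each distinct block is stored once under its digest, and a file is recorded as the ordered list of its block digests, so a repeated block costs only one more digest entry.

```python
import hashlib


class HashFileSystem:
    """Hypothetical toy store: each distinct block is kept once, keyed by its digest."""

    def __init__(self, block_size: int = 4096, algorithm: str = "sha1") -> None:
        self.block_size = block_size  # assumed fixed block size
        self.algorithm = algorithm    # any name accepted by hashlib.new
        self.blocks: dict[str, bytes] = {}      # digest -> block contents
        self.files: dict[str, list[str]] = {}   # file name -> ordered block digests

    def _digest(self, block: bytes) -> str:
        return hashlib.new(self.algorithm, block).hexdigest()

    def write(self, name: str, data: bytes) -> int:
        """Store a file and return how many previously unseen blocks were added."""
        digests: list[str] = []
        new_blocks = 0
        for start in range(0, len(data), self.block_size):
            block = data[start:start + self.block_size]
            digest = self._digest(block)
            if digest not in self.blocks:  # redundant copies are screened out here
                self.blocks[digest] = block
                new_blocks += 1
            digests.append(digest)
        self.files[name] = digests
        return new_blocks

    def read(self, name: str) -> bytes:
        """Reassemble a file from its list of block digests."""
        return b"".join(self.blocks[d] for d in self.files[name])
```

With `block_size=4`, writing `b"abcdabcdxyz"` stores only two blocks, because the second `abcd` block hashes to a digest that is already present.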

40 Claims

1. A method for managing data comprising:
- producing a probabilistically unique identifier for a digital sequence; and
  
  comparing said probabilistically unique identifier to a list of other identifiers with their corresponding digital sequences.
- Dependent claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 24):
  - 2. The method of claim 1 further comprising:
    - adding said probabilistically unique identifier to said list if said probabilistically unique identifier is not previously in said list.
  - 3. The method of claim 1 further comprising:
    - removing said probabilistically unique identifier from said list if said probabilistically unique identifier is previously in said list.
  - 4. The method of claim 2 further comprising:
    - adding said digital sequence corresponding to said probabilistically unique identifier to said list.
  - 5. The method of claim 3 further comprising:
    - removing said digital sequence corresponding to said probabilistically unique identifier from said list.
  - 6. The method of claim 4 further comprising:
    - adding a correspondence between said digital sequence and said probabilistically unique identifier for that sequence.
  - 7. The method of claim 1 wherein said step of producing comprises:
    - hashing said digital sequence to produce said probabilistically unique identifier.
  - 8. The method of claim 7 wherein said step of hashing is carried out by means of an industry standard digest algorithm.
  - 9. The method of claim 8 wherein said step of hashing is carried out by one of an MD4, MD5, SHA or SHA-1 algorithm.
  - 10. The method of claim 1 wherein said step of producing comprises:
    - generating a checksum for said digital sequence to produce said probabilistically unique identifier.
  - 11. The method of claim 1 wherein said digital sequence is descriptive meta data of at least one other digital sequence.
  - 12. The method of claim 1 wherein said digital sequence is descriptive meta data of at least one probabilistically unique identifier.
  - 13. The method of claim 1 wherein said digital sequence describes a method that represents at least one digital sequence.
  - 24. The method of claim 9 further comprising the step of:
    - utilizing at least a portion of said probabilistically unique identifier as an indicator to a location in said list for said step of comparing.
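
The following sketch covers claims 1 through 9 and claim 24. It assumes SHA-1 from Python's `hashlib` as the digest and uses the identifier's first byte to select one of 256 buckets, which is one possible reading of "at least a portion ... as an indicator to a location." The `identifier` function and the `IdentifierList` class are hypothetical names.

```python
import hashlib


def identifier(sequence: bytes) -> bytes:
    """Claims 7-9: hash the sequence (SHA-1 assumed) to get its identifier."""
    return hashlib.sha1(sequence).digest()


class IdentifierList:
    """Hypothetical list of identifiers and their sequences, bucketed by first byte."""

    def __init__(self) -> None:
        # Claim 24: a portion of the identifier (its first byte) picks the bucket.
        self.buckets: list[dict[bytes, bytes]] = [{} for _ in range(256)]

    def _bucket(self, ident: bytes) -> dict[bytes, bytes]:
        return self.buckets[ident[0]]

    def contains(self, ident: bytes) -> bool:
        """Claim 1: compare the identifier against the list."""
        return ident in self._bucket(ident)

    def add(self, sequence: bytes) -> bool:
        """Claims 2, 4, 6: add identifier and sequence only if not already listed."""
        ident = identifier(sequence)
        bucket = self._bucket(ident)
        if ident in bucket:
            return False
        bucket[ident] = sequence
        return True

    def remove(self, sequence: bytes) -> bool:
        """Claims 3, 5: remove identifier and sequence if they are listed."""
        ident = identifier(sequence)
        return self._bucket(ident).pop(ident, None) is not None
```

A 32-bit checksum such as `zlib.crc32` would also satisfy the wording of claim 10. However, it has only 2^32 possible values, so accidental collisions become likely once the list holds tens of thousands of sequences.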

14. A method for managing data comprising:
- dividing a digital sequence into a plurality of shorter digital sequences; and
  
  producing probabilistically unique identifiers for each said plurality of shorter digital sequences; and
  
  comparing said probabilistically unique identifiers to a list of other identifiers.
- Dependent claims (15, 16, 17, 18, 19, 20, 21, 22, 23):
  - 15. The method of claim 14 further comprising the step of:
    - dividing said digital sequence into a plurality of shorter digital sequences; and
      
      producing a like plurality of probabilistically unique identifiers corresponding to each of said plurality of shorter digital sequences.
  - 16. The method of claim 14 further comprising:
    - comparing each plurality of identifiers to said list.
  - 17. The method of claim 14 wherein said step of dividing produces said shorter digital sequences having individually variable lengths.
  - 18. The method of claim 14 wherein said step of dividing is based on the content of said digital sequence.
  - 19. The method of claim 14 wherein said step of dividing is based on meta data describing said digital sequence.
  - 20. The method of claim 14 wherein said step of dividing produces said shorter digital sequences having substantially invariable lengths.
  - 21. The method of claim 14 wherein said step of producing said like plurality of probabilistically unique identifiers comprises:
    - individually hashing said shorter digital sequences to produce said like plurality of probabilistically unique identifiers.
  - 22. The method of claim 14 further comprising the step of:
    - adding said plurality of shorter digital sequences and said corresponding like plurality of probabilistically unique identifiers to said list.
  - 23. The method of claim 14 further comprising the step of:
    - removing said plurality of shorter digital sequences and said corresponding like plurality of probabilistically unique identifiers from said list.
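
The claims leave the dividing rule open. One common way to get content-dependent, variable-length pieces (claims 17 and 18) is content-defined chunking with a rolling hash. The sketch below substitutes a polynomial (Rabin-Karp style) rolling hash for that rule and also shows plain fixed-size splitting (claim 20) and per-piece SHA-1 identifiers (claim 21). The window, mask, and minimum and maximum sizes are illustrative assumptions, not values from the application, and the function names are hypothetical.

```python
import hashlib


def fixed_chunks(data: bytes, size: int = 4096) -> list[bytes]:
    """Claim 20: pieces of substantially invariable length (the last may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def content_defined_chunks(data: bytes, window: int = 16, mask: int = 0x3F,
                           min_size: int = 32, max_size: int = 1024) -> list[bytes]:
    """Claims 17-18: variable-length pieces whose boundaries depend on content.

    A polynomial rolling hash covers the last `window` bytes of the current
    chunk. A boundary is declared where (hash & mask) == 0, never before
    `min_size` bytes and always by `max_size` bytes. Keeping min_size >= window
    means the hash spans a full window whenever a boundary is tested.
    """
    base, modulus = 257, (1 << 31) - 1
    drop = pow(base, window - 1, modulus)  # weight of the byte leaving the window
    chunks: list[bytes] = []
    start, h = 0, 0
    for i, byte in enumerate(data):
        if i - start >= window:  # window is full: remove its oldest byte
            h = (h - data[i - window] * drop) % modulus
        h = (h * base + byte) % modulus
        length = i - start + 1
        if (length >= min_size and (h & mask) == 0) or length >= max_size:
            chunks.append(data[start:i + 1])
            start, h = i + 1, 0
    if start < len(data):
        chunks.append(data[start:])  # trailing piece
    return chunks


def chunk_identifiers(chunks: list[bytes]) -> list[str]:
    """Claim 21: hash each shorter sequence individually (SHA-1 assumed)."""
    return [hashlib.sha1(c).hexdigest() for c in chunks]
```

A boundary depends only on the bytes inside the window. As a result, inserting a byte near the start of a sequence typically changes only the first chunk or two. After that, the boundaries fall in the same places, and the remaining chunks and their identifiers are unchanged. Fixed-size splitting lacks this property, because the same insertion shifts every later block.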

25. A computer program product comprising:
- a computer usable medium having computer readable code embodied therein for managing data, said computer program product comprising:
  
  computer readable program code devices configured to cause a computer to effect producing a probabilistically unique identifier for a digital sequence; and
  
  computer readable program code devices configured to cause a computer to effect comparing said probabilistically unique identifier to a list of other identifiers corresponding to other digital sequences.
- Dependent claims (26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38):
  - 26. The computer program product of claim 25 further comprising:
    - computer readable program code devices configured to cause a computer to effect adding said probabilistically unique identifier to said list if said probabilistically unique identifier is not previously in said list.
  - 27. The computer program product of claim 26 further comprising:
    - computer readable program code devices configured to cause a computer to effect adding said corresponding digital sequence to said list.
  - 28. The computer program product of claim 25 wherein said computer readable program code devices configured to cause said computer to effect producing comprises:
    - computer readable program code devices configured to cause a computer to effect hashing said digital sequence to produce said probabilistically unique identifier.
  - 29. The computer program product of claim 28 wherein said computer readable program code devices configured to cause a computer to effect hashing is carried out by means of an industry standard digest algorithm.
  - 30. The computer program product of claim 29 wherein said computer readable program code devices configured to cause a computer to effect hashing is carried out by one of an MD4, MD5, SHA or SHA-1 algorithm.
  - 31. The computer program product of claim 25 wherein said computer readable program code devices configured to cause a computer to effect producing comprises:
    - computer readable program code devices configured to cause a computer to effect generating a checksum for said digital sequence to produce said probabilistically unique identifier.
  - 32. The computer program product of claim 25 further comprising:
    - computer readable program code devices configured to cause a computer to effect creating a directory list containing said probabilistically unique identifier for said digital sequence.
  - 33. The computer program product of claim 25 further comprising:
    - computer readable program code devices configured to cause a computer to effect dividing said digital sequence into a plurality of shorter digital sequences; and
      
      computer readable program code devices configured to cause a computer to effect producing a like plurality of probabilistically unique identifiers corresponding to each of said plurality of shorter digital sequences.
  - 34. The computer program product of claim 33 wherein said computer readable program code devices configured to cause a computer to effect dividing produces said shorter digital sequences having individually variable length.
  - 35. The computer program product of claim 33 wherein said computer readable program code devices configured to cause a computer to effect dividing produces said shorter digital sequences having substantially invariable length.
  - 36. The computer program product of claim 33 wherein said computer readable program code devices configured to cause a computer to effect producing said like plurality of probabilistically unique identifiers comprises:
    - computer readable program code devices configured to cause a computer to effect individually hashing said shorter digital sequences to produce said like plurality of probabilistically unique identifiers.
  - 37. The computer program product of claim 33 further comprising:
    - computer readable program code devices configured to cause a computer to effect adding said plurality of shorter digital sequences and said corresponding like plurality of probabilistically unique identifiers to said list.
  - 38. The computer program product of claim 25 further comprising:
    - computer readable program code devices configured to cause a computer to effect utilizing at least a portion of said probabilistically unique identifier as an index into a table of locations for said list for said step of comparing.
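
Claims 11, 12 and 32 describe sequences that are meta data about other sequences or identifiers, such as a directory list of identifiers. This minimal sketch assumes a JSON encoding of (name, identifier) pairs sorted by name, with SHA-1 as the digest; both are illustrative choices, and the function names are hypothetical. The directory list is hashed like any other sequence, so two directories with identical contents resolve to the same stored entry.

```python
import hashlib
import json


def identifier(data: bytes) -> str:
    """Hash a sequence (SHA-1 assumed) to get its identifier."""
    return hashlib.sha1(data).hexdigest()


def store_directory(entries: dict[str, bytes], store: dict[str, bytes]) -> str:
    """Store each file once, build a directory list of identifiers, then hash that list."""
    listing = []
    for name in sorted(entries):
        file_id = identifier(entries[name])
        store.setdefault(file_id, entries[name])  # add only if not already present
        listing.append([name, file_id])
    # The listing is itself a digital sequence: meta data describing identifiers.
    encoded = json.dumps(listing, separators=(",", ":")).encode("utf-8")
    dir_id = identifier(encoded)
    store.setdefault(dir_id, encoded)
    return dir_id


def load_directory(dir_id: str, store: dict[str, bytes]) -> dict[str, bytes]:
    """Resolve a directory identifier back to its files."""
    listing = json.loads(store[dir_id].decode("utf-8"))
    return {name: store[file_id] for name, file_id in listing}
```

The listing is sorted by name before hashing, so the order in which entries are supplied does not affect the directory's identifier.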

39. A method for managing data comprising:
- producing a probabilistically unique identifier for a digital sequence; and
  
  comparing said probabilistically unique identifier to a list of other identifiers corresponding to other digital sequences.
- Dependent claims (40):
  - 40. The method of claim 39 further comprising:
    - adding said probabilistically unique identifier to said list if said probabilistically unique identifier is not previously in said list.


Current Assignee
Gregory Hagan Moulton, Stephen B. Whitehill
Original Assignee
Gregory Hagan Moulton, Stephen B. Whitehill
Inventors
Moulton, Gregory Hagan, Whitehill, Stephen B.

Application Number

US10/757,753
Publication Number

US 20040148306A1
US Class Current

707/101
CPC Class Codes

G06F 11/1453   using de-duplication of the...

G06F 16/137   Hash-based content-based in...

Y10S 707/99936   Pattern matching access

Y10S 707/99937   Sorting

Y10S 707/99953   Recoverability
