System and method for identifying substantially similar files

US 8,185,507 B1
Filed: 04/05/2007
Issued: 05/22/2012
Est. Priority Date: 04/20/2006
Status: Expired due to Fees

8 Assignments

0 Petitions

Abstract

Surrogate hashing is described, including a database configured to store data associated with a first file and a second file, and a processor configured to run a first hashing algorithm against a first portion of a first file to generate a first hash value, and running a second hashing algorithm against the first portion of the first file to generate a second hash value, to determine whether the first hash value and the second hash value are substantially similar to one or more stored hash values associated with a second portion of a second file, wherein the second portion is identified by one or more attributes that are substantially similar to one or more corresponding attributes associated with the first portion, and to identify a location of the second file if the first hash value and the second hash value are substantially similar to the one or more stored hash values associated with the second portion of the second file.
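The core step in the abstract, hashing one portion of a file with two different algorithms, can be sketched as follows. This is a minimal illustration, not the patented implementation: the excerpt names neither the hashing algorithms nor the bytes that make up the portion, so SHA-1, SHA-256, the offset, and the length below are all assumptions, as are the function names. Exact equality of both hash values is used as the simplest possible reading of "substantially similar".

```python
import hashlib

# Hypothetical parameters: the excerpt does not say which bytes form the
# "predetermined subset" of binary data.
PORTION_OFFSET = 0
PORTION_LENGTH = 4096


def surrogate_hashes(data: bytes,
                     offset: int = PORTION_OFFSET,
                     length: int = PORTION_LENGTH) -> tuple[str, str]:
    """Hash one fixed portion of a file's bytes with two different algorithms."""
    portion = data[offset:offset + length]
    first = hashlib.sha1(portion).hexdigest()     # assumed "first hashing algorithm"
    second = hashlib.sha256(portion).hexdigest()  # assumed "second hashing algorithm"
    return first, second


def portions_match(a: bytes, b: bytes) -> bool:
    """Assumption: "substantially similar" is modeled as equality of both hashes."""
    return surrogate_hashes(a) == surrogate_hashes(b)
```

Because only the selected portion is hashed, two files that differ outside that portion still produce the same pair of hash values, which is presumably why the hash pair acts as a surrogate for the whole file.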

114 Citations

8 Claims

1. A system, comprising:
- a database configured to store data associated with a first file and a second file; and
- a processor configured to determine if binary data associated with a first file and a second file are substantially similar, the processor configured to run a first hashing algorithm against a first portion of a first file to generate a first hash value, the first portion being a first predetermined subset of binary data in the first file, and running a second hashing algorithm against the first portion of the first file to generate a second hash value, to determine whether the first hash value and the second hash value are substantially similar to a third hash value and a fourth hash value associated with a second portion of a second file, the third hash value generated using the first hashing algorithm and the fourth hash value generated using the second hashing algorithm, the second portion being a second predetermined subset of binary data in the second file, the second file further having one or more attributes that are substantially similar to one or more corresponding attributes associated with the first file, the processor further configured to identify a uniform resource locator (URL) of the second file if the first hash value and the second hash value are substantially similar to the third hash value and the fourth hash value associated with the second portion of the second file.

2. The system of claim 1, further comprising a hash module.
3. The system of claim 2, wherein the hash module is configured to generate the first hashing algorithm.
4. The system of claim 2, wherein the hash module is configured to generate the second hashing algorithm.
5. The system of claim 2, wherein the hash module is configured to generate the first hashing algorithm and the second hashing algorithm.
6. The system of claim 1, wherein the processor is further configured to select a standardized portion.
7. The system of claim 6, wherein the standardized portion is the first portion.
8. The system of claim 6, wherein the standardized portion is the second portion.
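The database lookup recited in claim 1 might look roughly like the sketch below. It is a hedged illustration only: the table layout, the helper names (`make_index`, `add_file`, `find_similar_urls`), the use of file size as the compared attribute, and the 5% tolerance are all hypothetical choices that do not come from the patent text. Hash pairs are compared for equality, and the attribute check stands in for "one or more attributes that are substantially similar".

```python
import sqlite3


def make_index() -> sqlite3.Connection:
    """Create an in-memory table of stored hash pairs (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE files (url TEXT, size INTEGER, hash_a TEXT, hash_b TEXT)"
    )
    return conn


def add_file(conn: sqlite3.Connection, url: str, size: int,
             hash_a: str, hash_b: str) -> None:
    """Store the URL, one attribute (size), and both hash values of a known file."""
    conn.execute("INSERT INTO files VALUES (?, ?, ?, ?)",
                 (url, size, hash_a, hash_b))


def find_similar_urls(conn: sqlite3.Connection, size: int,
                      hash_a: str, hash_b: str,
                      size_tolerance: float = 0.05) -> list[str]:
    """Return URLs of stored files whose hash pair matches and whose size is close."""
    rows = conn.execute(
        "SELECT url, size FROM files WHERE hash_a = ? AND hash_b = ? ORDER BY url",
        (hash_a, hash_b),
    ).fetchall()
    # Assumed attribute test: stored size within a relative tolerance of the query size.
    return [url for url, stored in rows
            if abs(stored - size) <= size_tolerance * size]
```

A caller would compute the two hash values for the first file's portion, then pass them with the file's size to `find_similar_urls` to obtain candidate URLs for the second file.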

Current Assignee: Concert Technology Corporation
Original Assignee: Pinehill Technology, LLC (Concert Technology Corporation)
Inventors: Kaminski, Charles Jr.
Primary Examiner(s): Cottingham, John R.
Assistant Examiner(s): Reyes, Mariela
Application Number: US11/732,832
Time in Patent Office: 1,874 Days
Field of Search: 707/687, 707/698, 707/705, 707/747
US Class Current: 707/698
CPC Class Codes:
G06F 16/50   of still image data
G06F 16/532   Query formulation, e.g. gra...
G06F 16/951   Indexing; Web crawling tech...
