Apparatus and methods of identifying potentially similar content for data reduction

  • US 7,836,053 B2
  • Filed: 12/28/2007
  • Issued: 11/16/2010
  • Est. Priority Date: 12/28/2007
  • Status: Active Grant
First Claim

1. A computer-implemented method of identifying potentially similar content for data reduction, comprising:

  • receiving content workflow metadata corresponding to content to be processed, wherein the content to be processed includes a data component, and wherein the content workflow metadata comprises a workflow processing characteristic of the data component;

    wherein the content workflow metadata corresponding to the content to be processed further comprises a plurality of content workflow metadata corresponding to a plurality of content to be processed, wherein each of the plurality of content to be processed includes a respective data component, and wherein each respective content workflow metadata comprises a respective workflow processing characteristic corresponding to a respective data component;

    receiving known content workflow metadata corresponding to a plurality of known content, wherein each known content includes a known data component, and wherein each known content workflow metadata comprises a workflow processing characteristic of the corresponding known data component;

    comparing the content workflow metadata of the content to be processed and the known content workflow metadata of the plurality of known content according to a similarity rule to identify a first subset of the plurality of known content having potentially similar content relative to the content to be processed;

    identifying a first subset of the plurality of content to be processed based on determining a potential similarity between respective data components based on the respective content workflow metadata; and

    outputting an identification of the first subset of the plurality of known content and the first subset of the plurality of content to be processed to use in reducing data in the content to be processed;

    wherein each workflow processing characteristic relates to a workflow process applicable to the corresponding content; and

    wherein the similarity rule comprises a workflow-specific similarity rule, wherein the workflow-specific similarity rule depends on a type of the workflow associated with the content to be processed.
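The claim does not specify which metadata fields exist or what the similarity rules look like. The Python sketch below shows one possible reading of the claimed steps and is not the patentee's implementation. `WorkflowMetadata`, its fields, the workflow types `"backup"` and `"archive"`, and both rules are hypothetical. It also assumes that "potential similarity between respective data components" means similarity between pairs of items that are still to be processed. Only metadata is compared and the data components are never read. That is what would let such a step narrow down the candidates passed to a later data-reduction stage, for example deduplication. Deduplication is an assumption here, since the claim only says "reducing data".

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class WorkflowMetadata:
    """Hypothetical metadata record describing one item's data component."""
    content_id: str
    workflow_type: str   # type of workflow the content belongs to (assumed field)
    characteristic: str  # a workflow processing characteristic (assumed field)
    size_bytes: int      # assumed extra field used by one of the rules


# A similarity rule decides, from metadata alone, whether two items
# potentially hold similar data components.
SimilarityRule = Callable[[WorkflowMetadata, WorkflowMetadata], bool]


def same_characteristic(a: WorkflowMetadata, b: WorkflowMetadata) -> bool:
    return a.characteristic == b.characteristic


def same_characteristic_and_size(a: WorkflowMetadata, b: WorkflowMetadata) -> bool:
    # Stricter hypothetical rule: sizes must also be within 10% of each other.
    if a.characteristic != b.characteristic:
        return False
    larger = max(a.size_bytes, b.size_bytes)
    return larger == 0 or abs(a.size_bytes - b.size_bytes) <= 0.1 * larger


# Workflow-specific rules keyed by workflow type (hypothetical names).
RULES: Dict[str, SimilarityRule] = {
    "backup": same_characteristic_and_size,
    "archive": same_characteristic,
}


def rule_for(item: WorkflowMetadata) -> SimilarityRule:
    # The rule applied depends on the workflow type of the content to be processed.
    return RULES.get(item.workflow_type, same_characteristic)


def identify_similar(
    to_process: List[WorkflowMetadata], known: List[WorkflowMetadata]
) -> Tuple[List[str], List[str]]:
    """Return (known IDs, to-process IDs) flagged as potentially similar."""
    known_subset = set()
    process_subset = set()
    for i, item in enumerate(to_process):
        rule = rule_for(item)
        # Compare this item's metadata against every known item.
        for k in known:
            if rule(item, k):
                known_subset.add(k.content_id)
        # Compare it against the other items still to be processed.
        for j, other in enumerate(to_process):
            if i != j and rule(item, other):
                process_subset.add(item.content_id)
    return sorted(known_subset), sorted(process_subset)


known = [
    WorkflowMetadata("k1", "backup", "nightly-db-dump", 1000),
    WorkflowMetadata("k2", "archive", "quarterly-report", 500),
]
incoming = [
    WorkflowMetadata("p1", "backup", "nightly-db-dump", 1050),
    WorkflowMetadata("p2", "backup", "nightly-db-dump", 1020),
    WorkflowMetadata("p3", "archive", "meeting-notes", 200),
]
print(identify_similar(incoming, known))  # (['k1'], ['p1', 'p2'])
```

The rule lookup is keyed on the incoming item's workflow type, so a rule can be asymmetric. In this sketch the two backup dumps are compared under the stricter size-aware rule, while archive items only need a matching characteristic.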

  • 12 Assignments