Apparatus and methods of identifying potentially similar content for data reduction
Abstract
Apparatus and methods of identifying potentially similar content include utilizing workflow metadata to identify potential similarities in content to be processed, or between content to be processed and known content. As a result, a subset of potentially similar content is identified, and the subset can be used in data reduction operations to reduce data in the content to be processed.
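To illustrate why a metadata-based pre-filter can cut the work of data reduction, the following minimal sketch groups content items by workflow metadata and counts how many pairwise comparisons remain. The field names (`workflow_type`, `template_id`) and the sample items are hypothetical and do not come from the patent; exact-match bucketing is only one possible way to read "potentially similar".

```python
from collections import defaultdict
from itertools import combinations


def bucket_by_metadata(items, key_fields):
    """Group item ids by the tuple of selected (hypothetical) metadata fields."""
    buckets = defaultdict(list)
    for item_id, metadata in items.items():
        key = tuple(metadata.get(field) for field in key_fields)
        buckets[key].append(item_id)
    return dict(buckets)


def candidate_pairs(buckets):
    """Return only the pairs that share a bucket, i.e. the potentially similar pairs."""
    pairs = []
    for members in buckets.values():
        pairs.extend(combinations(sorted(members), 2))
    return sorted(pairs)


# Hypothetical workflow metadata for four content items to be processed.
items = {
    "doc1": {"workflow_type": "print", "template_id": "T1"},
    "doc2": {"workflow_type": "print", "template_id": "T1"},
    "doc3": {"workflow_type": "print", "template_id": "T2"},
    "doc4": {"workflow_type": "scan", "template_id": "T1"},
}

buckets = bucket_by_metadata(items, ["workflow_type", "template_id"])
pairs = candidate_pairs(buckets)
all_pairs = len(list(combinations(items, 2)))
print(f"{len(pairs)} candidate pair(s) instead of {all_pairs}")
```

Only the pairs that survive the metadata filter would then be handed to a byte-level data reduction step.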
44 Claims
1. A computer-implemented method of identifying potentially similar content for data reduction, comprising:
receiving content workflow metadata corresponding to content to be processed, wherein the content to be processed includes a data component, and wherein the content workflow metadata comprises a workflow processing characteristic of the data component;
wherein the content workflow metadata corresponding to the content to be processed further comprises a plurality of content workflow metadata corresponding to a plurality of content to be processed, wherein each of the plurality of content to be processed includes a respective data component, and wherein each respective content workflow metadata comprises a respective workflow processing characteristic corresponding to a respective data component;
receiving known content workflow metadata corresponding to a plurality of known content, wherein each known content includes a known data component, and wherein each known content workflow metadata comprises a workflow processing characteristic of the corresponding known data component;
comparing the content workflow metadata of the content to be processed and the known content workflow metadata of the plurality of known content according to a similarity rule to identify a first subset of the plurality of known content having potentially similar content relative to the content to be processed;
identifying a first subset of the plurality of content to be processed based on determining a potential similarity between respective data components based on the respective content workflow metadata; and
outputting an identification of the first subset of the plurality of known content and the first subset of the plurality of content to be processed to use in reducing data in the content to be processed;
wherein each workflow processing characteristic relates to a workflow process applicable to the corresponding content; and
wherein the similarity rule comprises a workflow-specific similarity rule, wherein the workflow-specific similarity rule depends on a type of the workflow associated with the content to be processed.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18
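The steps of claim 1 can be sketched roughly as follows. This is an illustration under stated assumptions, not the patented implementation: the `WorkflowMetadata` class, the `print` and `email` workflow types, the `template` and `thread` characteristics, and the equality-based rules are all hypothetical. The point is only to show a similarity rule being selected by workflow type and two subsets being output.

```python
from dataclasses import dataclass, field


@dataclass
class WorkflowMetadata:
    content_id: str
    workflow_type: str                      # hypothetical, e.g. "print" or "email"
    characteristics: dict = field(default_factory=dict)


def _same(a, b, key):
    """True when both items carry the same non-missing value for `key`."""
    value = a.characteristics.get(key)
    return value is not None and value == b.characteristics.get(key)


# Workflow-specific similarity rules (assumed): the rule depends on workflow type.
RULES = {
    "print": lambda a, b: _same(a, b, "template"),
    "email": lambda a, b: _same(a, b, "thread"),
}


def identify_similar(to_process, known):
    """Return (ids of similar known content, ids of mutually similar new content)."""
    known_subset, process_subset = set(), set()
    for i, item in enumerate(to_process):
        rule = RULES.get(item.workflow_type)
        if rule is None:
            continue  # no rule defined for this workflow type
        for k in known:
            if k.workflow_type == item.workflow_type and rule(item, k):
                known_subset.add(k.content_id)
        for other in to_process[i + 1:]:
            if other.workflow_type == item.workflow_type and rule(item, other):
                process_subset.update({item.content_id, other.content_id})
    return sorted(known_subset), sorted(process_subset)


to_process = [
    WorkflowMetadata("p1", "print", {"template": "invoice"}),
    WorkflowMetadata("p2", "print", {"template": "invoice"}),
    WorkflowMetadata("p3", "email", {"thread": "t9"}),
]
known = [
    WorkflowMetadata("k1", "print", {"template": "invoice"}),
    WorkflowMetadata("k2", "print", {"template": "letter"}),
    WorkflowMetadata("k3", "email", {"thread": "t9"}),
]

known_subset, process_subset = identify_similar(to_process, known)
print(known_subset, process_subset)
```

Keeping the rules in a dictionary keyed by workflow type makes the "workflow-specific" aspect explicit: adding a new workflow only requires registering another rule.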
19. A computer program product configured to identify potentially similar content for data reduction, comprising:
a computer-readable medium comprising:
at least one set of instructions operable to cause a computer to receive content workflow metadata corresponding to content to be processed, wherein the content to be processed includes a data component, and wherein the content workflow metadata comprises a workflow processing characteristic of the data component;
wherein the content workflow metadata corresponding to the content to be processed further comprises a plurality of content workflow metadata corresponding to a plurality of content to be processed, wherein each of the plurality of content to be processed includes a respective data component, and wherein each respective content workflow metadata comprises a respective workflow processing characteristic corresponding to a respective data component;
at least one set of instructions operable to cause the computer to receive known content workflow metadata corresponding to a plurality of known contents, wherein each known content includes a known data component, and wherein each known content workflow metadata comprises a workflow processing characteristic of the corresponding known data component;
at least one set of instructions operable to cause the computer to compare the content workflow metadata of the content to be processed and the known content workflow metadata of the plurality of known content according to a similarity rule to identify a first subset of the plurality of known content having potentially similar content relative to the content to be processed;
at least one set of instructions operable to cause the computer to identify a first subset of the plurality of content to be processed based on determining a potential similarity between respective data components based on the respective content workflow metadata; and
at least one set of instructions operable to cause the computer to output an identification of the first subset of the plurality of known content and the first subset of the plurality of content to be processed to use in reducing data in the content to be processed;
wherein each workflow processing characteristic relates to a workflow process applicable to the corresponding content; and
wherein the similarity rule comprises a workflow-specific similarity rule, wherein the workflow-specific similarity rule depends on a type of the workflow associated with the content to be processed.
20. At least one processor configured to identify potentially similar content for data reduction, comprising:
a first hardware module for receiving content workflow metadata corresponding to content to be processed, wherein the content to be processed includes a data component, and wherein the content workflow metadata comprises a workflow processing characteristic of the data component;
wherein the content workflow metadata corresponding to the content to be processed further comprises a plurality of content workflow metadata corresponding to a plurality of content to be processed, wherein each of the plurality of content to be processed includes a respective data component, and wherein each respective content workflow metadata comprises a respective workflow processing characteristic corresponding to a respective data component;
a second module for receiving known content workflow metadata corresponding to a plurality of known contents, wherein each known content workflow metadata comprises a workflow processing characteristic of the corresponding known data component;
a third module for comparing the content workflow metadata of the content to be processed and the known content workflow metadata of the plurality of known content according to a similarity rule to identify a first subset of the plurality of known content having potentially similar content relative to the content to be processed;
a fourth module for identifying a first subset of the plurality of content to be processed based on determining a potential similarity between respective data components based on the respective content workflow metadata; and
a fifth module for outputting an identification of the first subset of the plurality of known content and the first subset of the plurality of content to be processed to use in reducing data in the content to be processed;
wherein each workflow processing characteristic relates to a workflow process applicable to the corresponding content; and
wherein the similarity rule comprises a workflow-specific similarity rule, wherein the workflow-specific similarity rule depends on a type of the workflow associated with the content to be processed.
21. A computing device for identifying potentially similar content for data reduction, comprising:
means for receiving content workflow metadata corresponding to content to be processed, wherein the content to be processed includes a data component, and wherein the content workflow metadata comprises a workflow processing characteristic of the data component;
wherein the content workflow metadata corresponding to the content to be processed further comprises a plurality of content workflow metadata corresponding to a plurality of content to be processed, wherein each of the plurality of content to be processed includes a respective data component, and wherein each respective content workflow metadata comprises a respective workflow processing characteristic corresponding to a respective data component;
means for receiving known content workflow metadata corresponding to a plurality of known contents, wherein each known content includes a known data component, and wherein each known content workflow metadata comprises a workflow processing characteristic of the corresponding known data component;
means for comparing the content workflow metadata of the content to be processed and the known content workflow metadata of the plurality of known content according to a similarity rule to identify a first subset of the plurality of known content having potentially similar content relative to the content to be processed;
means for identifying a first subset of the plurality of content to be processed based on determining a potential similarity between respective data components based on the respective content workflow metadata; and
means for outputting an identification of the first subset of the plurality of known content and the first subset of the plurality of content to be processed to use in reducing data in the content to be processed;
wherein each workflow processing characteristic relates to a workflow process applicable to the corresponding content; and
wherein the similarity rule comprises a workflow-specific similarity rule, wherein the workflow-specific similarity rule depends on a type of the workflow associated with the content to be processed.
22. A computing device for identifying potentially similar content for data reduction, comprising:
a communications hardware module operable to receive content workflow metadata corresponding to content to be processed, wherein the content to be processed includes a data component, and wherein the content workflow metadata comprises a workflow processing characteristic of the data component;
wherein the content workflow metadata corresponding to the content to be processed further comprises a plurality of content workflow metadata corresponding to a plurality of content to be processed, wherein each of the plurality of content to be processed includes a respective data component, and wherein each respective content workflow metadata comprises a respective workflow processing characteristic corresponding to a respective data component;
wherein the communications module is further operable to receive known content workflow metadata corresponding to a plurality of known content, wherein each known content includes a known data component, and wherein each known content workflow metadata comprises a workflow processing characteristic of the corresponding known data component;
a similarity identifier module having one or more similarity rules operable to compare the content workflow metadata of the content to be processed and the known content workflow metadata of the plurality of known content according to a similarity rule to identify a first subset of the plurality of known content having potentially similar content relative to the content to be processed;
wherein the similarity identifier component is further operable to identify a first subset of the plurality of content to be processed based on determining a potential similarity between respective data components based on the respective content workflow metadata; and
wherein the similarity identifier component is further operable to output an identification of the first subset of the plurality of known content and the first subset of the plurality of content to be processed to use in reducing data in the content to be processed;
wherein each workflow processing characteristic relates to a workflow process applicable to the corresponding content; and
wherein the similarity rule comprises a workflow-specific similarity rule, wherein the workflow-specific similarity rule depends on a type of the workflow associated with the content to be processed.
Dependent claims: 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39
40. A computer-implemented method of identifying potentially similar content for data reduction, comprising:
receiving content workflow metadata corresponding to content to be processed, wherein the content to be processed includes a data component, and wherein the content workflow metadata represents workflow processing information corresponding to the data component;
receiving known content workflow metadata corresponding to a first plurality of known content, wherein each known content includes a known data component, and wherein the known content workflow metadata represents workflow processing information corresponding to each respective known data component;
determining a potential similarity between the data component of the content to be processed and at least one known data component of at least one of the first plurality of known content based on a similarity between the respective content workflow metadata and the respective known content workflow metadata;
outputting an identification of potentially similar content, based on the determined potential similarity, for use in reducing data in the content to be processed;
wherein receiving content workflow metadata corresponding to a content to be processed further comprises receiving a plurality of content workflow metadata corresponding to a plurality of content to be processed, wherein each of the plurality of content to be processed includes a respective data component, and wherein each respective content workflow metadata represents workflow processing information corresponding to a respective data component;
identifying potentially similar ones of the plurality of content to be processed based on determining a potential similarity between respective data components based on the respective content workflow metadata;
identifying a proper subset of the plurality of content to be processed based on performing a data compression technique on the identified potentially similar ones of the plurality of content to be processed;
wherein determining the potential similarity with the first plurality of known content further comprises determining a potential similarity between a respective data component of a respective one of the proper subset of the plurality of content to be processed and a respective known data component of a respective one of the first plurality of known content based on a similarity between the respective content workflow metadata and the respective known content metadata;
identifying a second plurality of known content that represent content potentially similar to the proper subset of the plurality of content to be processed based on the determined potential similarity, wherein the second plurality of known content is a proper subset of the first plurality of known content;
performing a data compression technique on the proper subset of the plurality of content to be processed and the second plurality of known content to identify a reduced data representation of the plurality of content to be processed; and
wherein outputting comprises outputting the reduced data representation.
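The two-stage reduction recited in claim 40 might look roughly like the sketch below. The claim does not name a specific data compression technique, so two stand-ins are assumed here: exact-duplicate elimination by SHA-256 digest for the first pass, and DEFLATE with a preset dictionary (Python's `zlib` with `zdict`) for the second. The payloads, ids, and the single `tag` used as workflow metadata are hypothetical.

```python
import hashlib
import zlib

# Hypothetical sample data; none of these names or values come from the patent.
LETTER = (b"Dear customer, thank you for your recent order. Your invoice is "
          b"attached and payment is due within thirty days of the invoice date. "
          b"Please contact our support desk with any questions.")

known_payloads = {"k1": LETTER, "k2": b"2024-01-01 audit entry: user login succeeded"}
known_tags = {"k1": "billing-letter", "k2": "audit-log"}

# Both items share the same tag, so metadata marks them as potentially similar.
to_process = {
    "p1": LETTER.replace(b"thirty", b"sixty"),
    "p2": LETTER.replace(b"thirty", b"sixty"),  # exact duplicate of p1
}
process_tags = {"p1": "billing-letter", "p2": "billing-letter"}


def drop_exact_duplicates(payloads):
    """First pass (assumed technique): keep one item per distinct payload."""
    seen, kept = set(), {}
    for cid, data in payloads.items():
        digest = hashlib.sha256(data).digest()
        if digest not in seen:
            seen.add(digest)
            kept[cid] = data
    return kept


def select_known(kept_ids, process_tags, known_tags):
    """Pick known content whose metadata tag matches any remaining item."""
    wanted = {process_tags[cid] for cid in kept_ids}
    return [kid for kid, tag in known_tags.items() if tag in wanted]


def compress_against(data, dictionary):
    """Second pass (assumed technique): DEFLATE primed with known content."""
    comp = zlib.compressobj(level=9, zdict=dictionary)
    return comp.compress(data) + comp.flush()


def decompress_against(blob, dictionary):
    decomp = zlib.decompressobj(zdict=dictionary)
    return decomp.decompress(blob) + decomp.flush()


kept = drop_exact_duplicates(to_process)                      # proper subset
second_known = select_known(kept, process_tags, known_tags)   # second plurality
dictionary = b"".join(known_payloads[kid] for kid in second_known)
reduced = {cid: compress_against(data, dictionary) for cid, data in kept.items()}
```

Because the dictionary is built only from the metadata-selected known content, the compressor is primed with material likely to overlap the new content, and unrelated known items such as `k2` never enter the comparison.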
41. A computer program product configured to identify potentially similar content for data reduction, comprising:
a computer-readable medium comprising:
at least one set of instructions operable to cause a computer to receive content workflow metadata corresponding to content to be processed, wherein the content to be processed includes a data component, and wherein the content workflow metadata represents workflow processing information corresponding to the data component;
at least one set of instructions operable to cause the computer to receive known content workflow metadata corresponding to a first plurality of known contents, wherein each known content includes a known data component, and wherein the known content workflow metadata represents workflow processing information corresponding to each respective known data component;
at least one set of instructions operable to cause the computer to determine a potential similarity between the data component of the content to be processed and at least one known data component of at least one of the first plurality of known contents based on a potential similarity between the respective content workflow metadata and the respective known content workflow metadata; and
at least one set of instructions operable to cause the computer to output an identification of potentially similar content, based on the determined potential similarity, for use in reducing data in the content to be processed;
wherein receiving content workflow metadata corresponding to a content to be processed further comprises receiving a plurality of content workflow metadata corresponding to a plurality of content to be processed, wherein each of the plurality of content to be processed includes a respective data component, and wherein each respective content workflow metadata represents workflow processing information corresponding to a respective data component;
identifying potentially similar ones of the plurality of content to be processed based on determining a potential similarity between respective data components based on the respective content workflow metadata;
identifying a proper subset of the plurality of content to be processed based on performing a data compression technique on the identified potentially similar ones of the plurality of content to be processed;
wherein determining the potential similarity with the first plurality of known content further comprises determining a potential similarity between a respective data component of a respective one of the proper subset of the plurality of content to be processed and a respective known data component of a respective one of the first plurality of known content based on a similarity between the respective content workflow metadata and the respective known content metadata;
identifying a second plurality of known content that represent content potentially similar to the proper subset of the plurality of content to be processed based on the determined potential similarity, wherein the second plurality of known content is a proper subset of the first plurality of known content;
performing a data compression technique on the proper subset of the plurality of content to be processed and the second plurality of known content to identify a reduced data representation of the plurality of content to be processed; and
wherein outputting comprises outputting the reduced data representation.
42. At least one processor configured to identify potentially similar content for data reduction, comprising:
a first hardware module for receiving content workflow metadata corresponding to content to be processed, wherein the content to be processed includes a data component, and wherein the content workflow metadata represents workflow processing information corresponding to the data component;
a second module for receiving known content workflow metadata corresponding to a first plurality of known contents, wherein each known content includes a known data component, and wherein the known content workflow metadata represents workflow processing information corresponding to each respective known data component;
a third module for determining a potential similarity between the data component of the content to be processed and at least one known data component of at least one of the first plurality of known contents based on a potential similarity between the respective content workflow metadata and the respective known content workflow metadata; and
a fourth module for outputting an identification of potentially similar content, based on the determined potential similarity, for use in reducing data in the content to be processed;
wherein receiving content workflow metadata corresponding to a content to be processed further comprises receiving a plurality of content workflow metadata corresponding to a plurality of content to be processed, wherein each of the plurality of content to be processed includes a respective data component, and wherein each respective content workflow metadata represents workflow processing information corresponding to a respective data component;
identifying potentially similar ones of the plurality of content to be processed based on determining a potential similarity between respective data components based on the respective content workflow metadata;
identifying a proper subset of the plurality of content to be processed based on performing a data compression technique on the identified potentially similar ones of the plurality of content to be processed;
wherein determining the potential similarity with the first plurality of known content further comprises determining a potential similarity between a respective data component of a respective one of the proper subset of the plurality of content to be processed and a respective known data component of a respective one of the first plurality of known content based on a similarity between the respective content workflow metadata and the respective known content metadata;
identifying a second plurality of known content that represent content potentially similar to the proper subset of the plurality of content to be processed based on the determined potential similarity, wherein the second plurality of known content is a proper subset of the first plurality of known content;
performing a data compression technique on the proper subset of the plurality of content to be processed and the second plurality of known content to identify a reduced data representation of the plurality of content to be processed; and
wherein outputting comprises outputting the reduced data representation.
43. A computing device for identifying potentially similar content for data reduction, comprising:
means for receiving content workflow metadata corresponding to content to be processed, wherein the content to be processed includes a data component, and wherein the content workflow metadata represents workflow processing information corresponding to the data component;
means for receiving known content workflow metadata corresponding to a first plurality of known contents, wherein each known content includes a known data component, and wherein the known content workflow metadata represents workflow processing information corresponding to each respective known data component;
means for determining a potential similarity between the data component of the content to be processed and at least one known data component of at least one of the first plurality of known contents based on a potential similarity between the respective content workflow metadata and the respective known content workflow metadata; and
means for outputting an identification of potentially similar content, based on the determined potential similarity, for use in reducing data in the content to be processed;
wherein receiving content workflow metadata corresponding to a content to be processed further comprises receiving a plurality of content workflow metadata corresponding to a plurality of content to be processed, wherein each of the plurality of content to be processed includes a respective data component, and wherein each respective content workflow metadata represents workflow processing information corresponding to a respective data component;
identifying potentially similar ones of the plurality of content to be processed based on determining a potential similarity between respective data components based on the respective content workflow metadata;
identifying a proper subset of the plurality of content to be processed based on performing a data compression technique on the identified potentially similar ones of the plurality of content to be processed;
wherein determining the potential similarity with the first plurality of known content further comprises determining a potential similarity between a respective data component of a respective one of the proper subset of the plurality of content to be processed and a respective known data component of a respective one of the first plurality of known content based on a similarity between the respective content workflow metadata and the respective known content metadata;
identifying a second plurality of known content that represent content potentially similar to the proper subset of the plurality of content to be processed based on the determined potential similarity, wherein the second plurality of known content is a proper subset of the first plurality of known content;
performing a data compression technique on the proper subset of the plurality of content to be processed and the second plurality of known content to identify a reduced data representation of the plurality of content to be processed; and
wherein outputting comprises outputting the reduced data representation.
44. A computing device for identifying potentially similar content for data reduction, comprising:
a communications hardware module operable to receive content workflow metadata corresponding to content to be processed, wherein the content to be processed includes a data component, and wherein the content workflow metadata represents workflow processing information corresponding to the data component;
wherein the communications module is further operable to receive known content workflow metadata corresponding to a first plurality of known content, wherein each known content includes a known data component, and wherein the known content workflow metadata represents workflow processing information corresponding to each respective known data component;
a similarity identifier module having one or more similarity rules operable to determine a potential similarity between the data component of the content to be processed and at least one known data component of at least one of the first plurality of known content based on a potential similarity between the respective content workflow metadata and the respective known content workflow metadata;
wherein the similarity identifier component is further operable to output an identification of potentially similar content, based on the determined potential similarity, for use in reducing data in the content to be processed;
wherein the content workflow metadata corresponding to the content to be processed further comprises a plurality of content workflow metadata corresponding to a plurality of content to be processed, wherein each of the plurality of contents to be processed includes a respective data component, and wherein each respective content workflow metadata represents workflow processing information corresponding to a respective data component;
wherein the similarity identifier component is further operable to identify potentially similar ones of the plurality of content to be processed based on determining a potential similarity between respective data components based on the respective content workflow metadata;
a data reduction component having a data compression protocol operable to identify a proper subset of the plurality of content to be processed based on performing a data compression technique on the identified potentially similar ones of the plurality of content to be processed;
wherein the similarity identifier component is further operable to determine a potential similarity between a respective data component of a respective one of the proper subset of the plurality of content to be processed and a respective known data component of a respective one of the first plurality of known content based on a similarity between the respective content workflow metadata and the respective known content metadata;
wherein the similarity identifier component is further operable to identify a second plurality of known content that represent content potentially similar to the proper subset of the plurality of contents to be processed based on the determined potential similarity, wherein the second plurality of known content is a proper subset of the first plurality of known content;
wherein the data reduction component is further operable to execute the data compression protocol on the proper subset of the plurality of content to be processed and the second plurality of known content to identify a reduced data representation of the plurality of content to be processed; and
wherein the data reduction component is further operable to output the reduced data representation.
Specification