Stopping functions for grouping and differentiating files based on content
First Claim
1. In a computing system environment, a method of differentiating files stored on one or more computing devices, comprising:
- over a sequence of rounds, grouping together files compressed according to an original relationship of highly occurring patterns in all bits of binary data of the uncompressed data of the files without any prior classification scheme nor metadata analysis of said files, the original relationship including a distance value between said compressed files in a multi-dimensional space;
- determining a tolerance value for the sequence of rounds based on a minimum and a maximum said distance value; and
- applying a stopping function to the grouping together.
Abstract
Methods and apparatus teach a digital spectrum of a data file. The digital spectrum is used to map a file's position in multi-dimensional space. This position relative to another file's position reveals closest neighbors. Certain of the closest neighbors are grouped together, while others are differentiated. Grouping ceases upon application of a stopping function so that rightly sized, optimum numbers of file groups are obtained. Embodiments of stopping functions relate to curve types in a mapping of numbers of groups per sequential rounds of grouping, recognizing whether groups have overlapping file members or not, and/or determining whether groups meet predetermined numbers of members, to name a few. Properly grouped files can then be further acted upon.
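The grouping idea in the abstract can be illustrated with a minimal Python sketch. It is not the patented method: it assumes each file has already been mapped to a point in multi-dimensional space, that the tolerance grows linearly from the minimum to the maximum pairwise distance over the rounds, and that the stopping function is a cap on group size. The names `group_files`, `single_linkage`, `rounds`, and `max_members` are hypothetical and chosen only for this illustration.

```python
from itertools import combinations
from math import dist


def group_files(positions, rounds=10, max_members=3):
    """Group points over several rounds; stop before any group exceeds max_members.

    positions: dict mapping a file name to its coordinates (assumed precomputed).
    """
    names = sorted(positions)
    pair_dist = {
        (a, b): dist(positions[a], positions[b])
        for a, b in combinations(names, 2)
    }
    if not pair_dist:
        return [set(names)] if names else []
    lo, hi = min(pair_dist.values()), max(pair_dist.values())

    groups = [{n} for n in names]  # before round 1, every file stands alone
    for r in range(1, rounds + 1):
        # Assumption: tolerance steps linearly from the min to the max distance.
        tolerance = lo + (hi - lo) * r / rounds
        candidate = single_linkage(names, pair_dist, tolerance)
        # Assumed stopping function: keep the previous round's groups
        # as soon as any group would grow past the size cap.
        if any(len(g) > max_members for g in candidate):
            break
        groups = candidate
    return groups


def single_linkage(names, pair_dist, tolerance):
    """Merge every pair of files whose distance is within the tolerance."""
    parent = {n: n for n in names}

    def find(n):
        # Union-find lookup with path halving.
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    for (a, b), d in pair_dist.items():
        if d <= tolerance:
            parent[find(a)] = find(b)

    groups = {}
    for n in names:
        groups.setdefault(find(n), set()).add(n)
    return sorted(groups.values(), key=min)
```

Called with five points where two pairs sit close together and one point lies far away, `group_files(positions, rounds=10, max_members=2)` keeps the two tight pairs and leaves the outlier alone, because the next round would merge the pairs into a group of four. Other stopping functions named in the abstract, such as the shape of the groups-per-round curve or overlap between groups, would replace the size check inside the loop.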
18 Claims
1. In a computing system environment, a method of differentiating files stored on one or more computing devices, comprising:
- over a sequence of rounds, grouping together files compressed according to an original relationship of highly occurring patterns in all bits of binary data of the uncompressed data of the files without any prior classification scheme nor metadata analysis of said files, the original relationship including a distance value between said compressed files in a multi-dimensional space;
- determining a tolerance value for the sequence of rounds based on a minimum and a maximum said distance value; and
- applying a stopping function to the grouping together.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9
10. In a computing system environment, a method of differentiating files stored on one or more computing devices, comprising:
- over a sequence of rounds, grouping together files compressed according to an original relationship of highly occurring patterns in all bits of binary data of uncompressed data of the files without any prior classification scheme nor metadata analysis of said files, the original relationship including a distance value between said compressed files in a multi-dimensional space;
- determining a tolerance value for the sequence of rounds based on a minimum and a maximum said distance value; and
- applying a stopping function to the grouping together to obtain rightly sized pluralities of file groups.
Dependent claims: 11, 12, 13
14. In a computing system environment, a method of differentiating files stored on one or more computing devices, comprising:
- over a sequence of rounds, grouping together files compressed according to an original relationship of highly occurring patterns in all bits of binary data of uncompressed data of the files without any prior classification scheme nor metadata analysis of said files, the original relationship including a distance value between said compressed files in a multi-dimensional space;
- determining a tolerance value for the sequence of rounds based on a minimum and a maximum said distance value; and
- applying a stopping function to the grouping together to obtain rightly sized pluralities of file groups having members of files below a predetermined numerical threshold.
Dependent claims: 15, 16, 17, 18
Specification