Grouping and Differentiating Files Based on Content
Abstract
Methods and apparatus teach a digital spectrum of a file. The digital spectrum is used to map a file's position. This position relative to another file's position reveals distances between the files. Representatively, files have a plurality of symbols representing an underlying data stream of original bits of data. The number of occurrences of each symbol in each file is compared to like symbols in other files. This can occur via algorithms, mapping, or both. In certain instances, comparison reveals a difference in counts between the symbols of the files. This difference is then squared, added together, and a square root taken. Comparing “distance values” reveals file adjacency, grouping, or the like. Also, normalizing, weighting, filtering functions and/or other statistical computations are applied in certain instances.
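The distance computation described in the abstract (difference in per-symbol counts, squared, summed, square root taken) can be illustrated with a minimal sketch. It assumes, purely for illustration, that each byte of a file is one "symbol"; the function names `symbol_counts` and `spectrum_distance` are hypothetical and do not come from the patent, and no normalizing, weighting, or filtering is applied.

```python
from collections import Counter
from math import sqrt


def symbol_counts(data: bytes) -> Counter:
    """Count occurrences of each symbol (assumption: one byte = one symbol)."""
    return Counter(data)


def spectrum_distance(a: bytes, b: bytes) -> float:
    """Euclidean distance between the symbol-count vectors of two files."""
    counts_a = symbol_counts(a)
    counts_b = symbol_counts(b)
    # A Counter returns 0 for a symbol it has never seen, so symbols that
    # appear in only one file still contribute their full count.
    symbols = set(counts_a) | set(counts_b)
    return sqrt(sum((counts_a[s] - counts_b[s]) ** 2 for s in symbols))


if __name__ == "__main__":
    print(spectrum_distance(b"aab", b"abb"))
```

Under this sketch, two files with identical symbol counts are at distance zero even when their symbols appear in a different order, because only the counts enter the calculation.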
20 Claims
1. In a computing system environment, a method of differentiating files stored on one or more computing devices, each file having a plurality of symbols representing an underlying data stream of original bits of data, comprising:
- determining a number of occurrences of each said symbol in said each file; and
- computing a distance between said each file and every other file based on the determined number of occurrences.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9
10. In a computing system environment, a method of differentiating files stored on one or more computing devices, each file having a plurality of symbols representing an underlying data stream of original bits of data, comprising computing a distance between said each file and every other file based on said symbols.
16. In a computing system environment, a method of determining closest files stored on one or more computing devices, each file having a plurality of symbols representing an underlying data stream of original bits of data, comprising:
- computing a distance value between said each file and every other file; and
- concluding a closest two files based on the computed distance value.
Dependent claims: 17, 18, 19, 20
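The two steps of claim 16 (compute a distance value between each file and every other file, then conclude the closest two) might be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: bytes stand in for symbols, the distance is the Euclidean distance between symbol counts, and the names `distance` and `closest_pair` are hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def distance(a: bytes, b: bytes) -> float:
    """Euclidean distance between byte-count vectors (assumed symbol = byte)."""
    ca, cb = Counter(a), Counter(b)
    return sqrt(sum((ca[s] - cb[s]) ** 2 for s in set(ca) | set(cb)))


def closest_pair(files: dict[str, bytes]) -> tuple[str, str, float]:
    """Return (name_a, name_b, distance) for the two closest files."""
    best = None
    # Compare every unordered pair of files exactly once.
    for (name_a, data_a), (name_b, data_b) in combinations(files.items(), 2):
        d = distance(data_a, data_b)
        if best is None or d < best[2]:
            best = (name_a, name_b, d)
    if best is None:
        raise ValueError("need at least two files to compare")
    return best


if __name__ == "__main__":
    sample = {"x": b"aaaa", "y": b"aaab", "z": b"bbbb"}
    print(closest_pair(sample))
```

The pairwise loop performs n(n-1)/2 distance computations for n files, which is adequate for a small illustration; larger collections would likely need the counts cached per file rather than recomputed for every pair.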
Specification