Method and system for authoritative name analysis of true origin of a file
First Claim
1. A method for providing an authoritative name source for files within an ecosystem, comprising the following performed by at least one processor:
clustering the files in the ecosystem into a plurality of superclusters, in which the files in each supercluster of the plurality of superclusters have identical contents;
then determining, of the files in the ecosystem which are clustered into the plurality of superclusters, which of the files have similar contents to each other, and merging the files which have similar contents to each other into the same supercluster, to capture possibly incremental changes to the files over time in one of the superclusters which has the files with identical contents and similar contents;
for each supercluster which has the files with identical and similar contents;
breaking the each supercluster down into package clusters, based on packages to which the files belong, each of the package clusters has the files from a same package; and
determining which of the package clusters has most change frequency across versions of the files within the same package, as the authoritative package, wherein change frequency refers to how frequently the version is changed in relation to how frequently the package is released;
then resolving an authoritative name for the files, based on the authoritative packages that are determined, across the plurality of superclusters which have files with identical and similar contents, and generating the authoritative name; and
resolving any authoritative name collision.
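The two clustering steps of the claim can be illustrated with a minimal sketch. The claim does not say how identical or similar contents are detected, so everything specific here is an assumption made for illustration: SHA-256 digests stand in for the identical-contents test, Jaccard similarity over line sets with a 0.5 threshold stands in for the similar-contents test, and the file names are hypothetical.

```python
import hashlib
from collections import defaultdict
from itertools import combinations


def cluster_identical(files):
    """Group file names by the SHA-256 digest of their contents."""
    clusters = defaultdict(list)
    for name, content in files.items():
        digest = hashlib.sha256(content.encode("utf-8")).hexdigest()
        clusters[digest].append(name)
    return list(clusters.values())


def jaccard(a, b):
    """Jaccard similarity of the line sets of two texts (assumed metric)."""
    lines_a, lines_b = set(a.splitlines()), set(b.splitlines())
    if not lines_a and not lines_b:
        return 1.0
    return len(lines_a & lines_b) / len(lines_a | lines_b)


def merge_similar(files, clusters, threshold=0.5):
    """Merge identical-content clusters whose representatives are similar."""
    parent = list(range(len(clusters)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(clusters)), 2):
        rep_i, rep_j = files[clusters[i][0]], files[clusters[j][0]]
        if jaccard(rep_i, rep_j) >= threshold:
            parent[find(j)] = find(i)

    merged = defaultdict(list)
    for idx, members in enumerate(clusters):
        merged[find(idx)].extend(members)
    return [sorted(members) for members in merged.values()]


# Hypothetical ecosystem: file name -> file contents.
files = {
    "pkgA-1.0/util.js": "a\nb\nc\n",
    "pkgB-2.3/vendor/util.js": "a\nb\nc\n",
    "pkgA-1.1/util.js": "a\nb\nc\nd\n",
    "pkgC-0.1/other.js": "x\ny\n",
}
superclusters = merge_similar(files, cluster_identical(files))
```

In this toy input the two byte-identical copies of `util.js` form one cluster first, and the slightly changed `pkgA-1.1/util.js` is then merged into it, which is the "incremental changes over time" case the claim describes. The unrelated file stays in its own supercluster.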
Abstract
A computer system, method, or non-transitory computer-readable medium provides an authoritative name source for files within an ecosystem. Files in the ecosystem which have identical contents and similar contents to each other are merged into the same supercluster, to capture possibly incremental changes to the files over time in one of the superclusters. For each supercluster which has files with identical and similar contents, the supercluster is broken down into package clusters, based on packages to which the files belong, such that each of the package clusters has the files from a same package. The package cluster which has the most change frequency across versions is identified as the authoritative package. The authoritative name for the files is resolved, based on the authoritative packages that are determined, across the plurality of superclusters which have files with identical and similar contents, and the authoritative name is generated. Any authoritative name collision is resolved.
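The package-cluster step can be sketched as follows. This is one possible reading, not the patented implementation: change frequency is taken here as the number of distinct file contents a package contributes to the supercluster, divided by the number of releases of that package. The package names, the `(package, version, digest)` tuple layout, and the `releases` mapping are hypothetical, and neither tie-breaking nor name-collision resolution is attempted.

```python
from collections import defaultdict


def package_clusters(supercluster):
    """Split one supercluster into per-package clusters."""
    by_package = defaultdict(list)
    for package, version, digest in supercluster:
        by_package[package].append((version, digest))
    return dict(by_package)


def change_frequency(entries, release_count):
    """Distinct file contents in a package cluster, per release of the package."""
    distinct = len({digest for _, digest in entries})
    return distinct / release_count


def authoritative_package(supercluster, releases):
    """Pick the package whose cluster has the highest change frequency."""
    clusters = package_clusters(supercluster)
    return max(clusters, key=lambda p: change_frequency(clusters[p], releases[p]))


# Hypothetical supercluster: (package, package version, content digest).
supercluster = [
    ("upstream-lib", "1.0", "h1"),
    ("upstream-lib", "1.1", "h2"),
    ("upstream-lib", "1.2", "h3"),
    ("bundler-app", "4.0", "h1"),
    ("bundler-app", "4.1", "h1"),
    ("bundler-app", "4.2", "h3"),
]
# Hypothetical number of releases per package.
releases = {"upstream-lib": 3, "bundler-app": 3}

authoritative = authoritative_package(supercluster, releases)
```

The intuition under this reading is that the package where the file originates tends to change it with nearly every release, while a package that merely bundles a copy updates it less often, so the originating package scores higher.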
222 Citations
20 Claims
1. A method for providing an authoritative name source for files within an ecosystem, comprising the following performed by at least one processor:
clustering the files in the ecosystem into a plurality of superclusters, in which the files in each supercluster of the plurality of superclusters have identical contents;
then determining, of the files in the ecosystem which are clustered into the plurality of superclusters, which of the files have similar contents to each other, and merging the files which have similar contents to each other into the same supercluster, to capture possibly incremental changes to the files over time in one of the superclusters which has the files with identical contents and similar contents;
for each supercluster which has the files with identical and similar contents;
breaking the each supercluster down into package clusters, based on packages to which the files belong, each of the package clusters has the files from a same package; and
determining which of the package clusters has most change frequency across versions of the files within the same package, as the authoritative package, wherein change frequency refers to how frequently the version is changed in relation to how frequently the package is released;
then resolving an authoritative name for the files, based on the authoritative packages that are determined, across the plurality of superclusters which have files with identical and similar contents, and generating the authoritative name; and
resolving any authoritative name collision.
Dependent claims: 2, 3, 4, 5, 6, 7
8. A non-transitory computer readable medium comprising instructions for execution by a computer, the instructions including a computer-implemented method for providing an authoritative name source for files within an ecosystem, the instructions for implementing:
clustering the files in the ecosystem into a plurality of superclusters, in which the files in each supercluster of the plurality of superclusters have identical contents;
then determining, of the files in the ecosystem which are clustered into the plurality of superclusters, which of the files have similar contents to each other, and merging the files which have similar contents to each other into the same supercluster, to capture possibly incremental changes to the files over time in one of the superclusters which has the files with identical contents and similar contents;
for each supercluster which has the files with identical and similar contents;
breaking the each supercluster down into package clusters, based on packages to which the files belong, each of the package clusters has the files from a same package; and
determining which of the package clusters has most change frequency across versions of the files within the same package, as the authoritative package, wherein change frequency refers to how frequently the version is changed in relation to how frequently the package is released;
then resolving an authoritative name for the files, based on the authoritative packages that are determined, across the plurality of superclusters which have files with identical and similar contents, and generating the authoritative name; and
resolving any authoritative name collision.
Dependent claims: 9, 10, 11, 12, 13, 14
15. A computer system that provides an authoritative name source for files within an ecosystem, comprising:
at least one processor, the at least one processor is configured to:
identify the ecosystem;
cluster the files in the ecosystem into a plurality of superclusters, in which the files in each supercluster of the plurality of superclusters have identical contents;
then determine, of the files in the ecosystem which are clustered into the plurality of superclusters, which of the files have similar contents to each other, and merge the files which have similar contents to each other into the same supercluster, to capture possibly incremental changes to the files over time in one of the superclusters which has the files with identical contents and similar contents;
for each supercluster which has the files with identical and similar contents;
break the each supercluster down into package clusters, based on packages to which the files belong, each of the package clusters has the files from a same package; and
determine which of the package clusters has most change frequency across versions of the files within the same package, as the authoritative package, wherein change frequency refers to how frequently the version is changed in relation to how frequently the package is released;
then resolve an authoritative name for the files, based on the authoritative packages that are determined, across the plurality of superclusters which have files with identical and similar contents, and generate the authoritative name; and
resolve any authoritative name collision.
Dependent claims: 16, 17, 18, 19, 20
Specification