System and method for clustering host inventories
First Claim
1. One or more non-transitory media including code for execution that, when executed by a processor, is operable to:
obtain a plurality of host file inventories corresponding respectively to a plurality of hosts in a network environment, wherein each of the plurality of host file inventories includes one or more file identifiers, each of the file identifiers of a particular host file inventory representing a different executable file on one of the plurality of hosts corresponding to the particular host file inventory;
calculate input data by transforming the plurality of host file inventories into a similarity matrix for the plurality of hosts, wherein for at least each unique pair of host file inventories of the plurality of host file inventories, the transforming includes:
determining a normalized compression distance (NCD) between the unique pair of host file inventories;
determining a numerical value representing a similarity distance between the unique pair of host file inventories, the numerical value being determined based on the NCD; and
updating the similarity matrix to include the numerical value representing the similarity distance between the unique pair of host file inventories; and
provide the input data to a clustering procedure to group the plurality of hosts into one or more clusters of hosts, wherein the one or more clusters of hosts are grouped using a predetermined similarity criteria.
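The transforming step of claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation. It assumes the commonly used NCD definition, (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), with zlib as the compressor, and it uses the NCD itself as the "numerical value representing a similarity distance"; the claim fixes none of these choices. The helper names (`serialize`, `similarity_matrix`, `fake_id`) and the SHA-256 strings standing in for file identifiers are hypothetical.

```python
import hashlib
import zlib
from itertools import combinations


def compressed_size(data: bytes) -> int:
    """Length in bytes of the zlib-compressed data (compressor is an assumption)."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance, using the commonly cited formula."""
    cx, cy = compressed_size(x), compressed_size(y)
    cxy = compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


def serialize(file_ids) -> bytes:
    """One identifier per line, sorted and de-duplicated so ordering does not matter."""
    return "".join(f"{fid}\n" for fid in sorted(set(file_ids))).encode("utf-8")


def similarity_matrix(inventories):
    """Return (hosts, matrix) where matrix[i][j] is the NCD between hosts i and j."""
    hosts = sorted(inventories)
    blobs = [serialize(inventories[h]) for h in hosts]
    n = len(hosts)
    # The diagonal is set to 0.0 by convention; a real compressor would give
    # a small positive NCD(x, x).
    matrix = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):  # each unique pair exactly once
        matrix[i][j] = matrix[j][i] = ncd(blobs[i], blobs[j])
    return hosts, matrix


def fake_id(name: str) -> str:
    """Hypothetical file identifier: a SHA-256 hex digest of a made-up name."""
    return hashlib.sha256(name.encode("utf-8")).hexdigest()


# Illustrative data: host-a and host-b share most files, host-c shares none.
shared = [fake_id(f"common-{i}") for i in range(30)]
example_inventories = {
    "host-a": shared + [fake_id("a-only")],
    "host-b": shared + [fake_id("b-only")],
    "host-c": [fake_id(f"other-{i}") for i in range(30)],
}
hosts, matrix = similarity_matrix(example_inventories)
```

In this sketch the entry for host-a and host-b should come out well below the entries involving host-c, which is the property a clustering procedure would rely on.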
Abstract
A method in one example implementation includes obtaining a plurality of host file inventories corresponding respectively to a plurality of hosts, calculating input data using the plurality of host file inventories, and providing the input data to a clustering procedure to group the plurality of hosts into one or more clusters of hosts. Each cluster of hosts is grouped using predetermined similarity criteria. In more specific embodiments, each of the host file inventories includes a set of one or more file identifiers, with each file identifier representing a different executable software file on a corresponding one of the plurality of hosts. In other more specific embodiments, calculating the input data includes transforming the host file inventories into a matrix of keyword vectors in Euclidean space. In further embodiments, calculating the input data includes transforming the host file inventories into a similarity matrix.
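The keyword-vector embodiment mentioned in the abstract can be pictured with a short Python sketch. It assumes a simple binary encoding: each distinct file identifier becomes one dimension, and a host's vector holds 1 where the host has that file and 0 otherwise. The abstract does not specify the encoding, so the function name `keyword_matrix` and the sample file names are illustrative only.

```python
import math


def keyword_matrix(inventories):
    """Build one binary keyword vector per host (the encoding is an assumption)."""
    hosts = sorted(inventories)
    # Every distinct file identifier across all hosts becomes one dimension.
    vocabulary = sorted({fid for ids in inventories.values() for fid in ids})
    column = {fid: k for k, fid in enumerate(vocabulary)}
    matrix = []
    for host in hosts:
        row = [0] * len(vocabulary)
        for fid in inventories[host]:
            row[column[fid]] = 1
        matrix.append(row)
    return hosts, vocabulary, matrix


# Hypothetical inventories; real identifiers could be paths or checksums.
sample = {
    "host-a": ["calc.exe", "notepad.exe"],
    "host-b": ["calc.exe", "notepad.exe", "putty.exe"],
    "host-c": ["nginx", "sshd"],
}
hosts, vocabulary, vectors = keyword_matrix(sample)

# Euclidean distance between host vectors: small for similar inventories.
dist_ab = math.dist(vectors[0], vectors[1])
dist_ac = math.dist(vectors[0], vectors[2])
```

With this encoding, the squared Euclidean distance between two hosts equals the number of files present on exactly one of them, so `dist_ab` is smaller than `dist_ac` here.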
24 Claims
1. One or more non-transitory media including code for execution (set out in full under First Claim above).
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9
10. An apparatus, comprising:
at least one processor coupled to at least one memory element;
a host inventory preparation module that when executed by the at least one processor, is configured to:
obtain a plurality of host file inventories corresponding respectively to a plurality of hosts in a network environment, wherein each of the plurality of host file inventories includes one or more file identifiers, each of the file identifiers of a particular host file inventory representing a different executable file on one of the plurality of hosts corresponding to the particular host file inventory; and
calculate input data by transforming the plurality of host file inventories into a similarity matrix for the plurality of hosts, wherein for at least each unique pair of host file inventories of the plurality of host file inventories, the transforming includes:
determining a normalized compression distance (NCD) between the pair of host file inventories;
determining a numerical value representing a similarity distance between the pair of host file inventories, the numerical value being determined based on the NCD; and
updating the similarity matrix to include the numerical value representing the similarity distance between the pair of host file inventories; and
a clustering module that when executed by the at least one processor, is configured to:
receive the input data; and
group the plurality of hosts into one or more clusters of hosts, wherein the one or more clusters of hosts are grouped using a predetermined similarity criteria.
Dependent claims: 11, 12, 13, 14, 15, 16, 17, 18
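The role of the clustering module in claim 10 can be sketched in Python. The claim names no particular clustering procedure, so this sketch assumes a distance threshold as the "predetermined similarity criteria" and uses single-linkage grouping via union-find. The function `cluster_hosts`, the parameter `max_distance`, and the sample matrix values are all hypothetical.

```python
def cluster_hosts(hosts, matrix, max_distance):
    """Group hosts whose pairwise distance is at most max_distance.

    Single-linkage: two hosts land in the same cluster if a chain of
    sufficiently close pairs connects them. The threshold stands in for
    the claim's predetermined similarity criteria (an assumption).
    """
    parent = list(range(len(hosts)))

    def find(i):
        # Follow parent links to the cluster representative, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(hosts)):
        for j in range(i + 1, len(hosts)):
            if matrix[i][j] <= max_distance:
                parent[find(i)] = find(j)  # merge the two clusters

    groups = {}
    for i, host in enumerate(hosts):
        groups.setdefault(find(i), []).append(host)
    return sorted(groups.values())


# Made-up similarity matrix: a/b are close, c/d are close, the rest are far apart.
sample_hosts = ["a", "b", "c", "d"]
sample_matrix = [
    [0.0, 0.1, 0.9, 0.9],
    [0.1, 0.0, 0.9, 0.9],
    [0.9, 0.9, 0.0, 0.2],
    [0.9, 0.9, 0.2, 0.0],
]
clusters = cluster_hosts(sample_hosts, sample_matrix, max_distance=0.3)
# clusters == [["a", "b"], ["c", "d"]]
```

A looser threshold merges more hosts into fewer clusters and a tighter one splits them, which is one way to read "predetermined" in the claim.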
19. A computer implemented method executed by one or more processors, comprising:
obtaining a plurality of host file inventories corresponding respectively to a plurality of hosts in a network environment;
calculating input data by transforming the plurality of host file inventories into a similarity matrix for the plurality of hosts, wherein, for at least each unique pair of host file inventories of the plurality of host file inventories, the transforming includes:
storing, in a first file, one or more file identifiers of a first host file inventory of the pair of host file inventories;
storing, in a second file, one or more file identifiers of a second host file inventory of the pair of host file inventories;
concatenating the first and second files in a concatenated file;
compressing the first file into a compressed first file;
compressing the second file into a compressed second file; and
compressing the concatenated file into a compressed concatenated file;
determining a normalized compression distance (NCD) between the first and second host file inventories based on the compressed first file, the compressed second file, and the compressed concatenated file;
determining a numerical value representing a similarity distance between the pair of host file inventories, the numerical value being determined based on the NCD; and
updating the similarity matrix to include the numerical value representing the similarity distance between the pair of host file inventories; and
providing the input data to a clustering procedure to group the plurality of hosts into one or more clusters of hosts, wherein the one or more clusters of hosts are grouped using a predetermined similarity criteria.
Dependent claims: 20, 21, 22, 23, 24
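The file-based steps of claim 19 (store, concatenate, compress, compare sizes) can be followed literally in a short Python sketch. It assumes gzip as the compressor and the commonly used NCD formula applied to the three compressed file sizes; the claim specifies neither. All function and file names are hypothetical, and the gzip header adds a small constant overhead to each size.

```python
import gzip
import os
import shutil
import tempfile


def write_inventory(path, file_ids):
    """Store the file identifiers of one inventory, one per line."""
    with open(path, "w", encoding="utf-8", newline="\n") as f:
        for fid in sorted(set(file_ids)):
            f.write(fid + "\n")


def concatenate(first, second, out):
    """Concatenate the first and second files into a third file."""
    with open(out, "wb") as dst:
        for src_path in (first, second):
            with open(src_path, "rb") as src:
                shutil.copyfileobj(src, dst)


def compress(path):
    """Compress a file with gzip (an assumed compressor) and return the new path."""
    gz_path = path + ".gz"
    with open(path, "rb") as src, gzip.open(gz_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return gz_path


def ncd_from_files(first_ids, second_ids):
    """NCD of two inventories computed from the sizes of three compressed files."""
    with tempfile.TemporaryDirectory() as tmp:
        first = os.path.join(tmp, "first.txt")
        second = os.path.join(tmp, "second.txt")
        both = os.path.join(tmp, "concatenated.txt")
        write_inventory(first, first_ids)
        write_inventory(second, second_ids)
        concatenate(first, second, both)
        c1 = os.path.getsize(compress(first))
        c2 = os.path.getsize(compress(second))
        c12 = os.path.getsize(compress(both))
    return (c12 - min(c1, c2)) / max(c1, c2)


# Made-up inventories: two identical Unix-style hosts and one unrelated host.
unix_ids = [f"/usr/bin/tool-{i}" for i in range(200)]
windows_ids = [f"C:/Program Files/App/module{i}.dll" for i in range(200)]
same = ncd_from_files(unix_ids, list(reversed(unix_ids)))
different = ncd_from_files(unix_ids, windows_ids)
```

Identical inventories compress almost as well concatenated as alone, so `same` should be close to zero, while `different` should be much larger.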
Specification