Normalizing non-numeric features of files
First Claim
1. An apparatus for normalizing non-numeric features of files, comprising:
- a token segmenting module configured to segment at least one pair of positive instances of a non-numeric feature of a file into a number of tokens, wherein the non-numeric feature of the file comprises a file storage path of a configuration file stored in a networked computer environment;
a token matching module configured to compare the tokens in the at least one pair of positive instances to obtain matching tokens, wherein the token matching module comprises:
a token matching score calculating sub-module configured to calculate a maximum matching score between each token in a positive instance with the tokens in another positive instance;
a token selecting sub-module configured to select the tokens of which the maximum matching scores are greater than a given threshold, to get the matching tokens; and
a token base constructing module configured to, for the matching tokens, calculate weights of their matching the file, and store the tokens and their weights in a token base, wherein the matching tokens identify similar configuration files stored in different locations in the networked computer environment.
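The three modules recited above can be illustrated with a minimal sketch. It is not the patented implementation: the claim does not name a matching-score function or a weight formula, so this sketch assumes a string-similarity ratio from Python's `difflib` as the score and the share of positive instances containing a token as its weight. The function names, the default threshold of 0.8, and the example paths are all hypothetical.

```python
import re
from difflib import SequenceMatcher
from itertools import combinations


def segment(path):
    """Split a file storage path into tokens on slash or backslash separators."""
    return [t for t in re.split(r"[\\/]+", path) if t]


def match_score(a, b):
    """Assumed similarity measure in [0, 1]; the claim does not specify one."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def matching_tokens(tokens_a, tokens_b, threshold):
    """Keep tokens of tokens_a whose maximum score against tokens_b exceeds threshold."""
    matched = []
    for tok in tokens_a:
        best = max((match_score(tok, other) for other in tokens_b), default=0.0)
        if best > threshold:
            matched.append(tok)
    return matched


def build_token_base(paths, threshold=0.8):
    """Map each matching token to an assumed weight: the share of paths containing it."""
    segmented = [segment(p) for p in paths]
    selected = set()
    # Compare every pair of positive instances in both directions.
    for a, b in combinations(segmented, 2):
        selected.update(matching_tokens(a, b, threshold))
        selected.update(matching_tokens(b, a, threshold))
    return {
        tok: sum(tok in toks for toks in segmented) / len(segmented)
        for tok in selected
    }


if __name__ == "__main__":
    # Hypothetical positive instances: the same configuration file in two locations.
    positives = ["/opt/app/conf/httpd.conf", "/usr/local/app/conf/httpd.conf"]
    print(build_token_base(positives))
```

With these two hypothetical paths, only the tokens shared by both locations (`app`, `conf`, `httpd.conf`) pass the threshold, while location-specific tokens such as `opt` or `usr` are dropped. That is the sense in which the retained tokens identify similar configuration files stored in different places.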
Abstract
Embodiments include methods, computer program products, and apparatuses for normalizing non-numeric features of files. Aspects include segmenting at least one pair of positive instances of a non-numeric feature of a file into a number of tokens and comparing the tokens in the at least one pair of positive instances to obtain matching tokens. Aspects also include calculating, for the matching tokens, weights of their matching the file, and storing the tokens and their weights in a token base.
16 Citations
10 Claims
1. An apparatus for normalizing non-numeric features of files, comprising:
a token segmenting module configured to segment at least one pair of positive instances of a non-numeric feature of a file into a number of tokens, wherein the non-numeric feature of the file comprises a file storage path of a configuration file stored in a networked computer environment;
a token matching module configured to compare the tokens in the at least one pair of positive instances to obtain matching tokens, wherein the token matching module comprises:
a token matching score calculating sub-module configured to calculate a maximum matching score between each token in a positive instance with the tokens in another positive instance;
a token selecting sub-module configured to select the tokens of which the maximum matching scores are greater than a given threshold, to get the matching tokens; and
a token base constructing module configured to, for the matching tokens, calculate weights of their matching the file, and store the tokens and their weights in a token base, wherein the matching tokens identify similar configuration files stored in different locations in the networked computer environment.
Dependent claims: 2, 3, 4, 5, 6, 7
8. A computer program product for normalizing non-numeric features of files, the computer program product comprising:
a non-transitory storage medium readable by a processing circuit and storing instructions for execution by the processing circuit for performing a method comprising:
segmenting at least one pair of positive instances of a non-numeric feature of a file into a number of tokens, wherein the non-numeric feature of the file comprises a file storage path of a configuration file stored in a networked computer environment;
comparing the tokens in the at least one pair of positive instances to obtain matching tokens, wherein the comparing comprises:
calculating a maximum matching score between each token in a positive instance with the tokens in another positive instance;
selecting the tokens of which the maximum matching scores are greater than a given threshold, to get the matching tokens; and
for each of the matching tokens, calculating weights of their matching the file, and storing the tokens and their weights in a token base, wherein the matching tokens identify similar configuration files stored in different locations in the networked computer environment.
Dependent claims: 9, 10
Specification