Normalizing non-numeric features of files
First Claim
1. A computer-implemented method for normalizing non-numeric features of files, comprising:
segmenting at least one pair of positive instances of a non-numeric feature of a file into a number of tokens, wherein the non-numeric feature of the file comprises a file storage path of a configuration file stored in a networked computer environment;
comparing the tokens in the at least one pair of positive instances to obtain matching tokens by:
calculating the maximum matching score between each token in a positive instance with the tokens in another positive instance;
selecting the tokens of which the maximum matching scores are greater than a given threshold, to get the matching tokens; and
for each of the matching tokens, calculating weights of their matching the file, and storing the tokens and their weights in a token base, wherein the matching tokens identify similar configuration files stored in different locations in the networked computer environment.
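The steps of claim 1 can be sketched in Python as follows. This is only an illustrative sketch, not the patented implementation: the claim does not say how paths are segmented, how a matching score is computed, or how weights are calculated. The sketch assumes splitting on path separators, uses `difflib.SequenceMatcher` as a stand-in matching score, and takes as a token's weight the share of positive paths that contain it. The names `segment`, `matching_score`, `matching_tokens`, `build_token_base`, and the 0.8 threshold are hypothetical.

```python
import re
from difflib import SequenceMatcher


def segment(path):
    """Split a file storage path into non-empty tokens on forward or back slashes."""
    return [t for t in re.split(r"[\\/]+", path) if t]


def matching_score(a, b):
    """Similarity in [0, 1]. SequenceMatcher is an assumed stand-in, not the claimed score."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def matching_tokens(path_a, path_b, threshold=0.8):
    """Return tokens of path_a whose maximum score against path_b's tokens exceeds threshold."""
    tokens_b = segment(path_b)
    matched = []
    for token in segment(path_a):
        best = max((matching_score(token, other) for other in tokens_b), default=0.0)
        if best > threshold:
            matched.append(token)
    return matched


def build_token_base(positive_pairs, threshold=0.8):
    """Map each matching token to an assumed weight: the share of positive paths containing it."""
    paths = sorted({p for pair in positive_pairs for p in pair})
    token_base = {}
    for path_a, path_b in positive_pairs:
        both_directions = (matching_tokens(path_a, path_b, threshold)
                           + matching_tokens(path_b, path_a, threshold))
        for token in both_directions:
            count = sum(token in segment(p) for p in paths)
            token_base[token] = count / len(paths)
    return token_base


# Hypothetical pair of positive instances: the same configuration file in two locations.
pairs = [("/etc/httpd/conf/httpd.conf", "/opt/apache/conf/httpd.conf")]
token_base = build_token_base(pairs)
print(token_base)
```

In this example the tokens `conf` and `httpd.conf` appear in both paths, so they clear the threshold and are stored, while location-specific tokens such as `etc` and `opt` are dropped. That is the sense in which the stored tokens could identify similar configuration files kept in different locations.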
Abstract
Embodiments include methods, computer program products, and apparatuses for normalizing non-numeric features of files. Aspects include segmenting at least one pair of positive instances of a non-numeric feature of a file into a number of tokens and comparing the tokens in the at least one pair of positive instances to obtain matching tokens. Aspects also include calculating, for each of the matching tokens, a weight of its matching the file, and storing the tokens and their weights in a token base.
12 Citations
8 Claims
Specification