DOCUMENT SIMILARITY EVALUATION SYSTEM, DOCUMENT SIMILARITY EVALUATION METHOD, AND COMPUTER PROGRAM
First Claim
1. A document similarity evaluation system comprising:
a segment search unit which finds common segments in both a first segment string and a second segment string, counts the number of the common segments that are found, and identifies an appearance range within which the common segments appear; and
a similarity index calculation unit which calculates a first sum that is a sum of the numbers of characters of each segment included in the appearance range identified by the segment search unit, calculates a second sum that is a sum of the numbers of characters of each segment identified as the common segments, and calculates the similarity index indicating the similarity between the first segment string and the second segment string by using the following equation,
similarity index = F(NTC)/G(NCC) × NS
(where, in the above-mentioned equation, NTC is the first sum, NCC is the second sum, NS is the number of the common segments, and a function F and a function G are monotonically increasing functions by which a certain integer value is associated with a positive real value).
Abstract
Disclosed is a document similarity evaluation system or the like which can evaluate the degree of concentration and dispersion of highly similar parts in at least two kinds of documents. The system includes a segment search unit which finds common segments (CS) in first and second segment strings, counts the number of CS, and identifies an appearance range (AR) within which the CS appear; and a similarity index (SI) calculation unit which calculates a first sum that is a sum of the numbers of characters of each segment (NCS) in the AR and a second sum that is a sum of the NCS of the CS, and calculates the SI between the first and second segment strings by the following equation: SI = F(NTC)/G(NCC) × NS (where NTC is the first sum, NCC is the second sum, NS is the number of the CS, and the functions F and G are monotonically increasing and take values larger than 0).
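As an illustration of how the equation in the abstract responds to concentration and dispersion, the following is a minimal sketch rather than the disclosed implementation. The function name `similarity_index` and its parameters are hypothetical, the expression is read left to right as (F(NTC)/G(NCC)) × NS, and F and G default to the identity mapping, which is only one possible monotonically increasing choice.

```python
from typing import Callable


def similarity_index(
    ntc: int,
    ncc: int,
    ns: int,
    f: Callable[[int], float] = float,  # assumed F: identity (monotonically increasing)
    g: Callable[[int], float] = float,  # assumed G: identity (monotonically increasing)
) -> float:
    """Evaluate F(NTC) / G(NCC) * NS, read left to right (an assumption)."""
    if ncc <= 0:
        # With no common segments there is no character sum to divide by.
        raise ValueError("NCC must be positive")
    return f(ntc) / g(ncc) * ns


# Concentrated case: three common segments of 10 characters each, adjacent,
# so the appearance range contains nothing but the common segments.
concentrated = similarity_index(ntc=30, ncc=30, ns=3)

# Dispersed case: the same three common segments, but 60 characters of
# non-common segments lie between them inside the appearance range.
dispersed = similarity_index(ntc=90, ncc=30, ns=3)

print(concentrated, dispersed)
```

With these assumed identity functions, the ratio F(NTC)/G(NCC) equals 1 when the common segments are contiguous and grows as other segments are interleaved with them, so the two cases yield different index values for the same NS.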
10 Claims
1. A document similarity evaluation system comprising:
a segment search unit which finds common segments in both a first segment string and a second segment string, counts the number of the common segments that are found, and identifies an appearance range within which the common segments appear; and
a similarity index calculation unit which calculates a first sum that is a sum of the numbers of characters of each segment included in the appearance range identified by the segment search unit, calculates a second sum that is a sum of the numbers of characters of each segment identified as the common segments, and calculates the similarity index indicating the similarity between the first segment string and the second segment string by using the following equation,
similarity index = F(NTC)/G(NCC) × NS
(where, in the above-mentioned equation, NTC is the first sum, NCC is the second sum, NS is the number of the common segments, and a function F and a function G are monotonically increasing functions by which a certain integer value is associated with a positive real value).
Dependent claims: 2, 3, 4
5. A document similarity evaluation method calculating a similarity index indicating a similarity between a first segment string and a second segment string comprising:
finding common segments in both the first segment string and the second segment string;
counting the number of the common segments that are found;
identifying an appearance range within which the common segments appear;
calculating a first sum that is a sum of the numbers of characters of each segment included in the appearance range;
calculating a second sum that is a sum of the numbers of characters of each segment identified as the common segments; and
calculating the similarity index by the following equation,
similarity index = F(NTC)/G(NCC) × NS
(where, in the above-mentioned equation, NTC is the first sum, NCC is the second sum, NS is the number of the common segments, and a function F and a function G are monotonically increasing functions by which a certain integer value is associated with a positive real value).
Dependent claims: 6, 7
8. A non-transitory computer-readable storage medium storing a computer program which causes a computer to realize:
a segment search function to find common segments in both a first segment string and a second segment string, count the number of the common segments that are found, and identify an appearance range within which the common segments appear; and
a similarity index calculation function to calculate a first sum that is a sum of the numbers of characters of each segment included in the appearance range identified in the segment search function, calculate a second sum that is a sum of the numbers of characters of each segment identified as the common segments, and calculate a similarity index indicating the similarity between the first segment string and the second segment string by using the following equation,
similarity index = F(NTC)/G(NCC) × NS
(where, in the above-mentioned equation, NTC is the first sum, NCC is the second sum, NS is the number of the common segments, and a function F and a function G are monotonically increasing functions by which a certain integer value is associated with a positive real value).
Dependent claims: 9, 10
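The segment search steps recited in the claims (finding common segments, counting them, identifying the appearance range, and forming the two character sums) might be sketched as follows. This is an illustrative reading, not the claimed implementation: the helper name `search_segments` is hypothetical, segments are assumed to be plain strings compared by exact match, and the appearance range is assumed to be measured on the first segment string, from its first common segment to its last.

```python
from typing import List, Tuple


def search_segments(first: List[str], second: List[str]) -> Tuple[int, int, int]:
    """Return (NS, NTC, NCC), measured on the first segment string (an assumption)."""
    second_set = set(second)
    # Positions in `first` whose segment also occurs somewhere in `second`
    # (exact string match is an assumption about what "common" means).
    common_positions = [i for i, seg in enumerate(first) if seg in second_set]
    ns = len(common_positions)
    if ns == 0:
        return 0, 0, 0
    # Assumed appearance range: first common segment through last common segment.
    start, end = common_positions[0], common_positions[-1]
    # First sum (NTC): characters of every segment inside the appearance range.
    ntc = sum(len(seg) for seg in first[start:end + 1])
    # Second sum (NCC): characters of the common segments only.
    ncc = sum(len(first[i]) for i in common_positions)
    return ns, ntc, ncc


first = ["intro", "alpha", "filler", "beta", "outro"]
second = ["beta", "other", "alpha"]

ns, ntc, ncc = search_segments(first, second)
# F and G are taken to be the identity here, purely for illustration.
index = ntc / ncc * ns
print(ns, ntc, ncc, index)
```

In this example "alpha" and "beta" are the common segments, and "filler" falls inside the appearance range without being common, so it contributes to NTC but not to NCC.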
Specification