Method and apparatus for measuring similarity between documents
First Claim
1. A dynamic programming method for computing a measure of similarity between a first sequence of symbols and a second sequence of symbols, comprising:
- allocating memory for a computational unit for storing intermediate values that are computed using a recursive formulation that computes the measure of similarity based on matching subsequences of symbols between the first sequence of symbols and the second sequence of symbols;
computing with a processor for the computational unit the intermediate values for the measure of similarity using the recursive formulation within which functions are computed following a computation order defined using nested loops that include;
an outer loop with a first index that ranges over increasing sums of prefix lengths of the first sequence of symbols and the second sequence of symbols, a middle loop with a second index that ranges over increasing prefixes of the first sequence of symbols, for each sum of prefix lengths of the outer loop, and an inner loop with a third index that ranges over increasing subsequence lengths, for each prefix of the first sequence of symbols of the middle loop; and
outputting the measure of similarity computed using the intermediate values.
6 Assignments
0 Petitions
Accused Products
Abstract
A measure of similarity between a first sequence of symbols and a second sequence of symbols is computed. Memory is allocated for a computational unit for storing values that are computed using a recursive formulation that computes the measure of similarity based on matching subsequences of symbols between the first sequence of symbols and the second sequence of symbols. A processor computes for the computational unit the values for the measure of similarity using the recursive formulation within which functions are computed using nested loops. The measure of similarity is output by the computational unit to an information processing application.
64 Citations
21 Claims
-
1. A dynamic programming method for computing a measure of similarity between a first sequence of symbols and a second sequence of symbols, comprising:
-
allocating memory for a computational unit for storing intermediate values that are computed using a recursive formulation that computes the measure of similarity based on matching subsequences of symbols between the first sequence of symbols and the second sequence of symbols;
computing with a processor for the computational unit the intermediate values for the measure of similarity using the recursive formulation within which functions are computed following a computation order defined using nested loops that include;
an outer loop with a first index that ranges over increasing sums of prefix lengths of the first sequence of symbols and the second sequence of symbols, a middle loop with a second index that ranges over increasing prefixes of the first sequence of symbols, for each sum of prefix lengths of the outer loop, and an inner loop with a third index that ranges over increasing subsequence lengths, for each prefix of the first sequence of symbols of the middle loop; and
outputting the measure of similarity computed using the intermediate values. - View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9)
-
-
10. An apparatus for computing a measure of similarity between a first sequence of symbols and a second sequence of symbols, comprising:
-
a memory for storing;
(a) intermediate values that are computed using a recursive formulation that computes the measure of similarity based on matching subsequences of symbols between the first sequence of symbols and the second sequence of symbols, and (b) processing instructions of a computational unit for carrying out the recursive formulation; and
a processor coupled to the memory for executing the processing instructions of the computational unit;
the processor in executing the processing instructions;
computing the intermediate values for the measure of similarity using the recursive formulation within which functions are computed following a computation order defined using nested loops that include;
an outer loop with a first index that ranges over increasing sums of prefix lengths of the first sequence of symbols and the second sequence of symbols, a middle loop with a second index that ranges over increasing prefixes of the first sequence of symbols, for each sum of prefix lengths of the outer loop, and an inner loop with a third index that ranges over increasing subsequence lengths, for each prefix of the first sequence of symbols of the middle loop; and
outputting the meansure of similarity computed using the intermediate values. - View Dependent Claims (11, 12, 13, 14, 15, 16, 17)
-
-
18. An article of manufacture for use in a machine comprising:
-
a) a memory;
b) instructions stored in the memory for computing a measure of similarity between a first sequence of symbols and a second sequence of symbols comprising;
allocating memory for a computational unit for storing intermediate values that are computed using a recursive formulation that computes the measure of similarity based on matching subsequences of symbols between the first sequence of symbols and the second sequence of symbols;
computing for the computational unit with a processor the intermediate values for the measure of similarity using the recursive formulation within which functions are computed following a computation order defined using nested loops that include;
an outer loop with a first index that ranges over increasing sums of prefix lengths of the first sequence of symbols and the second sequence of symbols, a middle loop with a second index that ranges over increasing prefixes of the first sequence of symbols, for each sum of prefix lengths of the outer loop, and an inner loop with a third index that ranges over increasing subsequence lengths, for each prefix of the first sequence of symbols of the middle loop; and
outputting the measure of similarity computed using the intermediate values. - View Dependent Claims (19, 20, 21)
-
Specification