Longest-common-subsequence detection for common synonyms
First Claim
1. A method for identifying synonym candidates, the method comprising:
receiving a first term and a second term;
identifying, using one or more processors, a longest subsequence that is common to the first term and the second term;
determining which of the first term and the second term are longer;
determining a ratio between a length of the longest subsequence and a length of the longer of the first term and the second term; and
determining that the ratio between the length of the longest subsequence and the length of the longer of the first term and the second term meets a first threshold;
computing an edit distance between the first term and the second term;
comparing the edit distance to a second threshold;
determining that the edit distance meets the second threshold; and
designating the first term and the second term as synonym candidates based on determining that the ratio between the length of the longest subsequence and the length of the longer of the first term and the second term meets the first threshold and based on determining that the edit distance meets the second threshold.
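The steps of claim 1 can be sketched in Python roughly as follows. This is an illustrative reading of the claim language, not the patented implementation. The function names, the default thresholds (`0.8` and `2`), the use of Levenshtein distance as the edit distance, and the interpretation of "meets" (ratio at or above the first threshold, edit distance at or below the second) are all assumptions that do not appear in the claim.

```python
def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence of a and b (row-by-row DP)."""
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: insert, delete, and substitute each cost 1.

    Assumption: the claim does not say which edit distance is meant.
    """
    prev = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        curr = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            curr.append(min(prev[j] + 1,          # delete from a
                            curr[j - 1] + 1,      # insert into a
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def are_synonym_candidates(first: str, second: str,
                           ratio_threshold: float = 0.8,
                           max_edit_distance: int = 2) -> bool:
    """Apply both tests from the claim; threshold values are hypothetical."""
    longer_len = max(len(first), len(second))
    if longer_len == 0:
        return False  # two empty terms: no meaningful ratio
    ratio = lcs_length(first, second) / longer_len
    if ratio < ratio_threshold:
        return False  # first threshold not met
    return edit_distance(first, second) <= max_edit_distance
```

Under these assumed thresholds, `are_synonym_candidates("color", "colour")` returns `True` (LCS length 5, longer term length 6, edit distance 1), while `are_synonym_candidates("cat", "dog")` returns `False` because the two terms share no characters.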
Abstract
One embodiment of the present invention provides a system for identifying synonym candidates. During operation, the system receives a first term and a second term. The system then determines a length of the longer one of the first and second terms, and determines a longest common subsequence of the two terms. The system further produces a result to indicate whether the two terms are synonym candidates based on the length of the longer term and a length of the longest common subsequence of the two terms.
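As a worked illustration of the quantity the abstract relies on, the ratio recited in the claims can be written as below. The symbols $t_1$, $t_2$, and $r$ and the example pair "color"/"colour" are notation and data chosen here for illustration; they do not come from the patent text.

```latex
r = \frac{\lvert \mathrm{LCS}(t_1, t_2) \rvert}{\max\bigl(\lvert t_1 \rvert, \lvert t_2 \rvert\bigr)}
\qquad\text{e.g.}\qquad
t_1 = \text{color},\; t_2 = \text{colour}:\quad
r = \frac{\lvert \text{color} \rvert}{\lvert \text{colour} \rvert} = \frac{5}{6} \approx 0.83
```

Dividing by the length of the longer term keeps $r$ between 0 and 1 and penalizes pairs in which a short term merely happens to be a subsequence of a much longer one.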
15 Claims
1. A method for identifying synonym candidates (independent claim, recited in full under First Claim).
Dependent claims: 2, 3, 10, 11.
4. A computer system for identifying synonym candidates, the computer system comprising:
one or more processors; and
a computer-readable storage device storing instructions that, when executed by the one or more processors, cause the one or more processors to perform operations comprising:
receiving a first term and a second term;
identifying a longest subsequence that is common to the first term and the second term;
determining which of the first term and the second term are longer;
determining a ratio between a length of the longest subsequence and a length of the longer of the first term and the second term; and
determining that the ratio between the length of the longest subsequence and the length of the longer of the first term and the second term meets a first threshold;
computing an edit distance between the first term and the second term;
comparing the edit distance to a second threshold;
determining that the edit distance meets the second threshold; and
designating the first term and the second term as synonym candidates based on determining that the ratio between the length of the longest subsequence and the length of the longer of the first term and the second term meets the first threshold and based on determining that the edit distance meets the second threshold.
Dependent claims: 5, 6, 12, 13.
7. A computer readable storage device storing instructions that, when executed by a computer system, cause the computer system to perform operations comprising:
receiving a first term and a second term;
identifying a longest subsequence that is common to the first term and the second term;
determining which of the first term and the second term are longer;
determining a ratio between a length of the longest subsequence and a length of the longer of the first term and the second term; and
determining that the ratio between the length of the longest subsequence and the length of the longer of the first term and the second term meets a first threshold;
computing an edit distance between the first term and the second term;
comparing the edit distance to a second threshold;
determining that the edit distance meets the second threshold; and
designating the first term and the second term as synonym candidates based on determining that the ratio between the length of the longest subsequence and the length of the longer of the first term and the second term meets the first threshold and based on determining that the edit distance meets the second threshold.
Dependent claims: 8, 9, 14, 15.
Specification