Identifying common co-occurring elements in lists
First Claim
1. A computer-implemented method comprising:
obtaining, at one or more computers, a pair of terms in a first language, the pair of terms being commonly co-occurring non-synonyms in a corpus of documents, the corpus of documents being in the first language;
determining a set of variations for each term in the pair of terms;
generating a set of known related input pairs based on the sets of variations for each term in the pair of terms;
for each input pair of terms in the set of known related input pairs, translating, by an automatic translation system, each term in the pair of terms into a plurality of languages to generate a set of translated terms;
adding, at the one or more computers, the set of translated terms to a blacklist of known non-synonym pairs for at least one of the plurality of languages; and
determining, based on the blacklist of known non-synonym pairs, whether a pair of candidate terms in at least one of the plurality of languages are synonyms.
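The steps recited in the claim can be illustrated with a minimal Python sketch. It is not the patented implementation: the variation rule (lowercasing plus a naive plural), the `translate` callback, and the toy French dictionary are all hypothetical stand-ins, since the claim does not specify how variations are produced or which translation system is used.

```python
from itertools import product

def variations(term):
    # Hypothetical variation rule: the term as given, lowercased, and a naive plural.
    base = term.lower()
    return {term, base, base + "s"}

def known_related_pairs(term_a, term_b):
    # Every combination of a variation of the first term with a variation of the second.
    return set(product(variations(term_a), variations(term_b)))

def build_blacklist(seed_pair, translate, languages):
    # translate(term, lang) stands in for the automatic translation system.
    blacklist = {lang: set() for lang in languages}
    for a, b in known_related_pairs(*seed_pair):
        for lang in languages:
            # frozenset makes the stored pair order-independent.
            blacklist[lang].add(frozenset((translate(a, lang), translate(b, lang))))
    return blacklist

def may_be_synonyms(a, b, lang, blacklist):
    # A candidate pair found on the blacklist is rejected as a synonym pair.
    return frozenset((a, b)) not in blacklist.get(lang, set())

# Toy dictionary used only for this illustration.
TOY = {("cat", "fr"): "chat", ("cats", "fr"): "chats",
       ("dog", "fr"): "chien", ("dogs", "fr"): "chiens"}

def toy_translate(term, lang):
    return TOY.get((term, lang), term)

blacklist = build_blacklist(("cat", "dog"), toy_translate, ["fr"])
print(may_be_synonyms("chat", "chien", "fr", blacklist))  # False
print(may_be_synonyms("chat", "minou", "fr", blacklist))  # True
```

In this sketch a pair that is absent from the blacklist is only "not ruled out"; a real system would presumably combine the blacklist check with other synonym evidence.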
Abstract
One embodiment of the present invention provides a system for detecting correlations between terms. During operation, the system identifies one or more lists contained in one or more documents and identifies two terms co-occurring in the lists. The system further determines a correlation between the co-occurring terms, and places the co-occurring terms in a correlated-pair list based on the correlation.
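As a rough illustration of the flow described in the abstract, the following Python sketch counts how often two terms appear in the same list and keeps the pairs whose correlation clears a threshold. The abstract does not say how the correlation is computed; the Jaccard-style score, the `min_score` threshold, and the assumption that lists have already been extracted from the documents are choices made only for this example.

```python
from collections import Counter
from itertools import combinations

def correlated_pairs(lists, min_score=0.5):
    term_counts = Counter()   # number of lists containing each term
    pair_counts = Counter()   # number of lists containing both terms of a pair
    for items in lists:
        # Normalize and deduplicate so each term counts once per list.
        terms = sorted({item.strip().lower() for item in items})
        term_counts.update(terms)
        pair_counts.update(combinations(terms, 2))

    result = {}
    for (a, b), both in pair_counts.items():
        # Assumed correlation: lists with both terms / lists with either term.
        either = term_counts[a] + term_counts[b] - both
        score = both / either
        if score >= min_score:
            result[(a, b)] = score
    return result

# Lists as they might be extracted from documents (illustrative data).
lists = [["Red", "Green", "Blue"], ["red", "blue"], ["green", "yellow"]]
print(correlated_pairs(lists))
# {('blue', 'red'): 1.0, ('green', 'yellow'): 0.5}
```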
11 Claims
1. A computer-implemented method comprising:
obtaining, at one or more computers, a pair of terms in a first language, the pair of terms being commonly co-occurring non-synonyms in a corpus of documents, the corpus of documents being in the first language;
determining a set of variations for each term in the pair of terms;
generating a set of known related input pairs based on the sets of variations for each term in the pair of terms;
for each input pair of terms in the set of known related input pairs, translating, by an automatic translation system, each term in the pair of terms into a plurality of languages to generate a set of translated terms;
adding, at the one or more computers, the set of translated terms to a blacklist of known non-synonym pairs for at least one of the plurality of languages; and
determining, based on the blacklist of known non-synonym pairs, whether a pair of candidate terms in at least one of the plurality of languages are synonyms.
Dependent claims: 2, 3, 4.
5. A system comprising:
one or more computers and one or more storage devices storing instructions that are operable, when executed by the one or more computers, to cause the one or more computers to perform operations comprising:
obtaining, at one or more computers, a pair of terms in a first language, the pair of terms being commonly co-occurring non-synonyms in a corpus of documents, the corpus of documents being in the first language;
determining a set of variations for each term in the pair of terms;
generating a set of known related input pairs based on the sets of variations for each term in the pair of terms;
for each input pair of terms in the set of known related input pairs, translating, by an automatic translation system, each term in the pair of terms into a plurality of languages to generate a set of translated terms;
adding, at the one or more computers, the set of translated terms to a blacklist of known non-synonym pairs for at least one of the plurality of languages; and
determining, based on the blacklist of known non-synonym pairs, whether a pair of candidate terms in at least one of the plurality of languages are synonyms.
Dependent claims: 6, 7, 8.
9. A non-transitory computer-readable medium storing software comprising instructions executable by one or more computers which, upon such execution, cause the one or more computers to perform operations comprising:
obtaining, at one or more computers, a pair of terms in a first language, the pair of terms being commonly co-occurring non-synonyms in a corpus of documents, the corpus of documents being in the first language;
determining a set of variations for each term in the pair of terms;
generating a set of known related input pairs based on the sets of variations for each term in the pair of terms;
for each input pair of terms in the set of known related input pairs, translating, by an automatic translation system, each term in the pair of terms into a plurality of languages to generate a set of translated terms;
adding, at the one or more computers, the set of translated terms to a blacklist of known non-synonym pairs for at least one of the plurality of languages; and
determining, based on the blacklist of known non-synonym pairs, whether a pair of candidate terms in at least one of the plurality of languages are synonyms.
Dependent claims: 10, 11.
Specification