Document-based synonym generation
First Claim
1. A computer-implemented method comprising:
- selecting a pair of words;
- receiving a document having an associated title or anchor;
- determining that a first word of the pair of words occurs in the title or anchor of the document and that a different, second word of the pair of words occurs within the document;
- determining, by one or more computers, that the first word and the different, second word are synonyms based at least upon determining that the first word of the pair of words occurs in the title or anchor of the document and that the different, second word of the pair of words occurs within the document; and
- generating an alternative search query for a search query that includes the first word or the different, second word using the first word as a substitute for the second word in the alternative search query or using the second word as a substitute for the first word in the alternative search query.
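The steps of the first claim can be illustrated with a minimal sketch. It is not the patented implementation: the `Document` class, the `tokens` tokenizer, and the function names are hypothetical, and the example text is invented for illustration. The claim says the synonym determination is based "at least" on the title/anchor and body occurrence, so a real system would presumably combine this test with other signals; the sketch uses it alone.

```python
import re
from dataclasses import dataclass


@dataclass
class Document:
    title_or_anchor: str  # title or anchor text associated with the document
    body: str             # text within the document


def tokens(text):
    # Hypothetical tokenizer (an assumption): lowercase alphanumeric runs.
    return re.findall(r"[a-z0-9]+", text.lower())


def title_body_evidence(first, second, doc):
    """True if `first` occurs in the title/anchor and the different
    word `second` occurs within the document body."""
    if first == second:
        return False
    return first in tokens(doc.title_or_anchor) and second in tokens(doc.body)


def alternative_queries(query, first, second):
    """Build alternative queries by substituting one word of the pair
    for the other wherever it appears in the query."""
    words = tokens(query)
    alternatives = []
    for old, new in ((first, second), (second, first)):
        if old in words:
            alternatives.append(" ".join(new if w == old else w for w in words))
    return alternatives


# Invented example data, for illustration only.
doc = Document(
    title_or_anchor="Used car listings",
    body="Browse thousands of used automobile offers.",
)
if title_body_evidence("car", "automobile", doc):
    print(alternative_queries("cheap car rental", "car", "automobile"))
```

Here the pair ("car", "automobile") passes the title/body test, so a query containing either word yields an alternative query with the other word substituted.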
Abstract
One embodiment of the present invention provides a system that automatically generates synonyms for words from documents. During operation, this system determines co-occurrence frequencies for pairs of words in the documents. The system also determines closeness scores for pairs of words in the documents, wherein a closeness score indicates whether a pair of words are located so close to each other that the words are likely to occur in the same sentence or phrase. Finally, the system determines whether pairs of words are synonyms based on the determined co-occurrence frequencies and the determined closeness scores. While making this determination, the system can additionally consider correlations between words in a title or an anchor of a document and words in the document as well as word-form scores for pairs of words in the documents.
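The two signals named in the abstract can be sketched as follows. This is an illustration under stated assumptions, not the patent's method. The abstract does not define the formulas, so the sketch assumes that co-occurrence is counted per document, that "close" means within a fixed token window (`window=5`), and that frequent co-occurrence combined with a low closeness score counts as evidence of synonymy. The thresholds and all function names are hypothetical.

```python
import re
from collections import Counter
from itertools import combinations


def tokens(text):
    # Hypothetical tokenizer (an assumption): lowercase alphanumeric runs.
    return re.findall(r"[a-z0-9]+", text.lower())


def pair_statistics(documents, window=5):
    """Return (cooccurrence, close) counters keyed by sorted word pairs.

    cooccurrence[pair]: number of documents containing both words.
    close[pair]: number of those documents in which the two words
    appear within `window` tokens of each other.
    """
    cooccurrence, close = Counter(), Counter()
    for text in documents:
        positions = {}
        for i, word in enumerate(tokens(text)):
            positions.setdefault(word, []).append(i)
        for a, b in combinations(sorted(positions), 2):
            cooccurrence[(a, b)] += 1
            if any(abs(i - j) <= window
                   for i in positions[a] for j in positions[b]):
                close[(a, b)] += 1
    return cooccurrence, close


def closeness_score(pair, cooccurrence, close):
    # Fraction of co-occurring documents in which the pair is close.
    return close[pair] / cooccurrence[pair] if cooccurrence[pair] else 0.0


def likely_synonyms(pair, cooccurrence, close,
                    min_cooccurrence=2, max_closeness=0.5):
    # Assumed decision rule: the pair co-occurs in enough documents
    # but seldom within the same short window.
    return (cooccurrence[pair] >= min_cooccurrence
            and closeness_score(pair, cooccurrence, close) <= max_closeness)


# Invented example documents, for illustration only.
docs = [
    "car dealers sell many models and every automobile is inspected before delivery",
    "an automobile needs regular service so bring the car in each year",
]
cooccurrence, close = pair_statistics(docs)
print(likely_synonyms(("automobile", "car"), cooccurrence, close))
```

Pairs are stored with their words in sorted order, so lookups must use the same order, for example `("automobile", "car")`. The additional signals the abstract mentions, title or anchor correlations and word-form scores, are left out of this sketch.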
21 Claims
1. A computer-implemented method comprising:
- selecting a pair of words;
- receiving a document having an associated title or anchor;
- determining that a first word of the pair of words occurs in the title or anchor of the document and that a different, second word of the pair of words occurs within the document;
- determining, by one or more computers, that the first word and the different, second word are synonyms based at least upon determining that the first word of the pair of words occurs in the title or anchor of the document and that the different, second word of the pair of words occurs within the document; and
- generating an alternative search query for a search query that includes the first word or the different, second word using the first word as a substitute for the second word in the alternative search query or using the second word as a substitute for the first word in the alternative search query.
Dependent claims: 2, 3, 4, 5, 6, 7
8. A system comprising:
one or more computers and one or more storage devices storing instructions that are operable, when executed by the one or more computers, to cause the one or more computers to perform operations comprising:
- selecting a pair of words;
- receiving a document having an associated title or anchor;
- determining that a first word of the pair of words occurs in the title or anchor of the document and that a different, second word of the pair of words occurs within the document;
- determining that the first word and the different, second word are synonyms based at least upon determining that the first word of the pair of words occurs in the title or anchor of the document and that the different, second word of the pair of words occurs within the document; and
- generating an alternative search query for a search query that includes the first word or the different, second word using the first word as a substitute for the second word in the alternative search query or using the second word as a substitute for the first word in the alternative search query.
Dependent claims: 9, 10, 11, 12, 13, 14
15. A computer program product, encoded on one or more non-transitory computer storage media, comprising instructions that when executed by one or more computers cause the one or more computers to perform operations comprising:
- selecting a pair of words;
- receiving a document having an associated title or anchor;
- determining that a first word of the pair of words occurs in the title or anchor of the document and that a different, second word of the pair of words occurs within the document;
- determining that the first word and the different, second word are synonyms based at least upon determining that the first word of the pair of words occurs in the title or anchor of the document and that the different, second word of the pair of words occurs within the document; and
- generating an alternative search query for a search query that includes the first word or the different, second word using the first word as a substitute for the second word in the alternative search query or using the second word as a substitute for the first word in the alternative search query.
Dependent claims: 16, 17, 18, 19, 20, 21
Specification