Augmenting queries with synonyms from synonyms map
2 Assignments
0 Petitions
Abstract
Methods, systems, and apparatus, including computer program products, operable to perform operations including: receiving, through a user interface with an interface language, a search query having query terms; using the interface language to select one or more mappings and using the selected mappings to simplify each query term; and applying each simplified query term to a synonyms map to identify possible synonyms with which to augment the search query. In alternative embodiments, the operations include generating a synonyms map from a corpus of documents, where the synonyms map maps each of multiple keys to one or more corresponding variants and each variant is associated with one or more document languages. In alternative embodiments, the operations include generating a synonyms map from documents by applying document-language-dependent mappings to words in the documents to generate keys for the map.
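The query-side operations described in the abstract can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the per-language table `CONVERSION_MAPS`, the helper names `simplify` and `augment_query`, and the fallback of stripping leftover diacritical marks via Unicode NFD decomposition are all hypothetical choices made for the example. The synonyms map is assumed to be a plain dictionary from a diacritic-free key to a list of variants.

```python
import unicodedata

# Hypothetical conversion maps keyed by interface language (an assumption for
# illustration): each maps an input character to its output characters.
CONVERSION_MAPS = {
    "de": {"ä": "ae", "ö": "oe", "ü": "ue", "ß": "ss"},
}


def simplify(term, interface_language):
    """Simplify a query term with the mappings selected by the interface language."""
    mapping = CONVERSION_MAPS.get(interface_language, {})
    term = unicodedata.normalize("NFC", term.lower())
    converted = "".join(mapping.get(ch, ch) for ch in term)
    # Assumed fallback: strip any diacritical marks the selected map did not cover.
    decomposed = unicodedata.normalize("NFD", converted)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def augment_query(query, interface_language, synonyms_map):
    """Return one group per query term: the term plus any synonyms from the map."""
    groups = []
    for term in query.split():
        key = simplify(term, interface_language)
        original = unicodedata.normalize("NFC", term.lower())
        # Skip a variant identical to what the user already typed.
        synonyms = [v for v in synonyms_map.get(key, []) if v != original]
        groups.append([term] + synonyms)
    return groups


print(augment_query("cafe prices", "en", {"cafe": ["café"]}))
# → [['cafe', 'café'], ['prices']]
```

The interface language changes the key that is looked up: with the hypothetical German map, "über" simplifies to "ueber", whereas the generic fallback yields "uber", so the same term can reach different entries in the synonyms map. How each term group is finally combined into the search query (for example as an OR clause) is left to the search backend in this sketch.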
86 Citations
12 Claims
1. A computer implemented method, comprising:
generating, in a system comprising one or more computers, a synonyms map from a corpus of documents, wherein:
each document is associated with a document language, the document language representing a natural language of the document;
the synonyms map maps each of a plurality of keys to one or more corresponding variants;
each key is a common form word whose characters do not include diacritical marks;
each of the one or more corresponding variants is a variant of the common form word, each variant (i) being found in the corpus of documents, (ii) being determined based on one or more character conversion maps that each specify one or more output characters to which one or more corresponding input characters are mapped, and (iii) including one or more characters that include diacritical marks; and
the synonyms map associates each variant of a key with: (i) two or more document languages, and (ii) a respective score for each of the two or more document languages, the score indicating a relative frequency of the variant in documents associated with the document language among all variants of the key for the document language.
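The map-generation step of claim 1 can be illustrated with a short Python sketch. It assumes a corpus given as (language, text) pairs, whitespace tokenization, lowercasing, and hypothetical per-language `conversion_maps` followed by NFD diacritic stripping to derive each key; none of these details come from the claim, and the helper names are illustrative only. Following the claim's definition that variants include diacritical marks, the score for a variant v of key k in language L is computed here as count(v, L) divided by the sum of count(v', L) over all diacritic-bearing variants v' of k observed in L. The sketch records a score only for languages in which the variant actually occurs, so a variant meets the claim's "two or more document languages" condition only if it appears in documents of at least two languages.

```python
import unicodedata
from collections import Counter, defaultdict


def common_form(word, conversion_map):
    """Derive a diacritic-free key (hypothetical helper)."""
    word = unicodedata.normalize("NFC", word.lower())
    # Apply the language-specific character conversion map first.
    converted = "".join(conversion_map.get(ch, ch) for ch in word)
    # Then drop any combining marks left after canonical decomposition.
    decomposed = unicodedata.normalize("NFD", converted)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def has_diacritics(word):
    """True if the word contains a character that decomposes to a combining mark."""
    return any(unicodedata.combining(ch) for ch in unicodedata.normalize("NFD", word))


def build_synonyms_map(corpus, conversion_maps):
    """Return {key: {variant: {language: score}}} from (language, text) pairs."""
    # counts[key][language][variant] -> occurrences in documents of that language
    counts = defaultdict(lambda: defaultdict(Counter))
    for language, text in corpus:
        cmap = conversion_maps.get(language, {})
        for word in unicodedata.normalize("NFC", text.lower()).split():
            counts[common_form(word, cmap)][language][word] += 1

    synonyms = {}
    for key, by_language in counts.items():
        for language, variant_counts in by_language.items():
            # Only diacritic-bearing words count as variants; the bare key is excluded.
            marked = {v: n for v, n in variant_counts.items() if has_diacritics(v)}
            total = sum(marked.values())
            for variant, n in marked.items():
                # Relative frequency of this variant among the key's variants in `language`.
                synonyms.setdefault(key, {}).setdefault(variant, {})[language] = n / total
    return synonyms


corpus = [("fr", "café crème café"), ("fr", "cafe"), ("es", "café cafè")]
print(build_synonyms_map(corpus, {})["cafe"])
# → {'café': {'fr': 1.0, 'es': 0.5}, 'cafè': {'es': 0.5}}
```

In this example "café" is the only marked variant of "cafe" in the French documents, so its French score is 1.0, while in the Spanish documents it shares the key evenly with "cafè". Characters such as "ø" that have no canonical decomposition would need an explicit entry in a conversion map to be folded into a key.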
5. A non-transitory computer-readable storage device storing instructions that, when executed by a computer system, cause the computer system to perform operations comprising:
generating, in a system comprising one or more computers, a synonyms map from a corpus of documents, wherein:
each document is associated with a document language, the document language representing a natural language of the document;
each document language is a natural language;
the synonyms map maps each of a plurality of keys to one or more corresponding variants;
each key is a common form word whose characters do not include diacritical marks;
each of the one or more corresponding variants is a variant of the common form word, each variant (i) being found in the corpus of documents, (ii) being determined based on one or more character conversion maps that each specify one or more output characters to which one or more corresponding input characters are mapped, and (iii) including one or more characters that include diacritical marks; and
the synonyms map associates each variant of a key with: (i) two or more document languages, and (ii) a respective score for each of the two or more document languages, the score indicating a relative frequency of the variant in documents associated with the document language among all variants of the key for the document language.
9. A system comprising:
one or more processors; and
a computer-readable storage device storing instructions that, when executed by the one or more processors, cause the one or more processors to perform operations comprising:
generating, in a system comprising one or more computers, a synonyms map from a corpus of documents, wherein:
each document is associated with a document language, the document language representing a natural language of the document;
each document language is a natural language;
the synonyms map maps each of a plurality of keys to one or more corresponding variants;
each key is a common form word whose characters do not include diacritical marks;
each of the one or more corresponding variants is a variant of the common form word, each variant (i) being found in the corpus of documents, (ii) being determined based on one or more character conversion maps that each specify one or more output characters to which one or more corresponding input characters are mapped, and (iii) including one or more characters that include diacritical marks; and
the synonyms map associates each variant of a key with: (i) two or more document languages, and (ii) a respective score for each of the two or more document languages, the score indicating a relative frequency of the variant in documents associated with the document language among all variants of the key for the document language.
Specification