Cross-lingual indexing and information retrieval
First Claim
1. A method performed by one or more computers, the method comprising:
- retrieving, by an information retrieval system comprising one or more computers, a group of documents over a network, each document in the group of documents being located at a respective network location;
identifying links occurring in the group of documents, wherein each link occurring in the group of documents has associated anchor text and points to a network location of another document in the group of documents;
translating, by one or more translation engines installed on one or more computers of the information retrieval system, each document in the group of documents into each of a plurality of target languages using a respective context-specific translation model for each of the target languages to generate a respective translated document in each target language for each of the documents in the group, wherein the context of each respective context-specific translation model depends at least in part on the anchor text of one or more of the identified links pointing to the document;
translating, by one or more translation engines, into each of the target languages the anchor text of the links pointing to documents in the group to generate translated anchor texts;
indexing the translated anchor texts;
indexing the documents in the group and the translated documents;
receiving, over the network by a search engine installed on the one or more computers of the information retrieval system, a query in a first language, the first language being one of the target languages, after the translating and indexing of the documents and the anchor texts; and
searching, by the search engine, the documents in the group that are in the first language, the translated documents that are in the first language, and the translated anchor texts that are in the first language to identify documents responsive to the query.
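The steps recited in the claim can be illustrated with a toy sketch. It is only an illustration under stated assumptions, not the patented implementation: the Document class, the GLOSSARIES and CONTEXT_CUES tables, the word-by-word "translation", the default context, and the reuse of the document's context for its anchor texts are all hypothetical stand-ins for a real crawler and real context-specific translation models.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Document:
    url: str
    language: str
    text: str
    links: list = field(default_factory=list)  # (anchor_text, target_url) pairs


# Hypothetical context-specific "models": one word map per (language, context).
GLOSSARIES = {
    ("es", "finance"): {"bank": "banco", "open": "abierto"},
    ("es", "river"): {"bank": "orilla", "open": "abierto"},
}
# Hypothetical cue words that map anchor text to a translation context.
CONTEXT_CUES = {"loans": "finance", "money": "finance", "fishing": "river"}


def collect_anchor_texts(docs):
    """Group anchor texts by the in-group document each link points to."""
    known = {d.url for d in docs}
    anchors = defaultdict(list)
    for d in docs:
        for anchor, target in d.links:
            if target in known:
                anchors[target].append(anchor)
    return anchors


def pick_context(anchor_texts, default="finance"):
    """Choose a context from inbound anchor text (the default is an assumption)."""
    for text in anchor_texts:
        for word in text.lower().split():
            if word in CONTEXT_CUES:
                return CONTEXT_CUES[word]
    return default


def translate(text, target_lang, context):
    """Toy word-by-word translation; unknown words pass through unchanged."""
    glossary = GLOSSARIES[(target_lang, context)]
    return " ".join(glossary.get(w, w) for w in text.lower().split())


def build_index(docs, target_langs):
    """Index originals, translated documents, and translated anchor texts."""
    index = defaultdict(lambda: defaultdict(set))  # language -> term -> urls
    anchors = collect_anchor_texts(docs)

    def add(lang, text, url):
        for term in text.lower().split():
            index[lang][term].add(url)

    for d in docs:
        add(d.language, d.text, d.url)
        context = pick_context(anchors[d.url])
        for lang in target_langs:
            if lang == d.language:
                continue  # simplification: skip translating into the source language
            add(lang, translate(d.text, lang, context), d.url)
            for anchor in anchors[d.url]:
                add(lang, translate(anchor, lang, context), d.url)
    return index


def search(index, query, lang):
    """Return urls whose indexed text in `lang` contains every query term."""
    terms = query.lower().split()
    if not terms:
        return set()
    return set.intersection(*(index[lang].get(t, set()) for t in terms))


docs = [
    Document("a", "en", "fishing trips", links=[("fishing spots", "b")]),
    Document("b", "en", "the bank is open"),
    Document("c", "en", "cheap loans", links=[("loans and money", "d")]),
    Document("d", "en", "the bank is open"),
]
index = build_index(docs, ["es"])
print(search(index, "orilla", "es"))  # {'b'}
print(search(index, "banco", "es"))   # {'d'}
```

Documents "b" and "d" have identical text, but the anchor text pointing at each one selects a different context, so "bank" is rendered differently in the two translated documents. All translation and indexing happens before any query arrives, as the claim requires.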
Abstract
Systems and methods are disclosed for searching across multi-lingual information. A user makes a query in a first language, and a group of documents that were previously machine-translated into the first language are searched for information responsive to the query. Contextual information derived can be used to improve the accuracy of the machine translation. Responsive documents are returned to the user. Alternatively, a query provided in a user's language may be translated into one or more other languages. Documents written in these languages can then be searched for information responsive to the appropriate translated query. Responsive documents can be translated into the user's language prior to providing them to the user.
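The alternative described in the abstract, translating the query rather than the documents, might be sketched as follows. This is a hedged illustration only: the GLOSSARY table, the corpus tuples, and the word-by-word translate helper are hypothetical and stand in for real machine translation and a real search index.

```python
# Hypothetical word maps keyed by (source language, target language).
GLOSSARY = {
    ("en", "es"): {"bank": "banco"},
    ("en", "fr"): {"bank": "banque"},
    ("es", "en"): {"el": "the", "banco": "bank", "abierto": "open"},
    ("fr", "en"): {"la": "the", "banque": "bank", "ferme": "closed"},
}


def translate(text, src, dst):
    """Toy word-by-word translation; unknown words pass through unchanged."""
    words = GLOSSARY.get((src, dst), {})
    return " ".join(words.get(w, w) for w in text.lower().split())


def cross_lingual_search(query, query_lang, corpus):
    """Translate the query into each corpus language, search the documents
    written in that language, and translate hits back for the user."""
    results = []
    for lang in sorted({doc_lang for _, doc_lang, _ in corpus}):
        q = query if lang == query_lang else translate(query, query_lang, lang)
        terms = set(q.lower().split())
        for url, doc_lang, text in corpus:
            if doc_lang == lang and terms <= set(text.lower().split()):
                shown = text if lang == query_lang else translate(text, lang, query_lang)
                results.append((url, shown))
    return results


corpus = [
    ("es/1", "es", "el banco abierto"),
    ("fr/1", "fr", "la banque ferme"),
    ("en/1", "en", "river bank"),
    ("es/2", "es", "la playa"),
]
hits = cross_lingual_search("bank", "en", corpus)
for url, text in hits:
    print(url, text)
```

Unlike the pre-translation approach, this variant does its translation work at query time: once per target language for the query, and once per responsive document for the results.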
29 Citations
12 Claims
1. A method performed by one or more computers, the method comprising:
- retrieving, by an information retrieval system comprising one or more computers, a group of documents over a network, each document in the group of documents being located at a respective network location;
identifying links occurring in the group of documents, wherein each link occurring in the group of documents has associated anchor text and points to a network location of another document in the group of documents;
translating, by one or more translation engines installed on one or more computers of the information retrieval system, each document in the group of documents into each of a plurality of target languages using a respective context-specific translation model for each of the target languages to generate a respective translated document in each target language for each of the documents in the group, wherein the context of each respective context-specific translation model depends at least in part on the anchor text of one or more of the identified links pointing to the document;
translating, by one or more translation engines, into each of the target languages the anchor text of the links pointing to documents in the group to generate translated anchor texts;
indexing the translated anchor texts;
indexing the documents in the group and the translated documents;
receiving, over the network by a search engine installed on the one or more computers of the information retrieval system, a query in a first language, the first language being one of the target languages, after the translating and indexing of the documents and the anchor texts; and
searching, by the search engine, the documents in the group that are in the first language, the translated documents that are in the first language, and the translated anchor texts that are in the first language to identify documents responsive to the query.
Dependent claims: 2, 3, 4
5. A non-transitory computer readable media encoded with instructions that are operable, when executed by one or more computers, to cause the one or more computers to perform operations comprising:
- retrieving, by an information retrieval system comprising one or more computers, a group of documents over a network, each document in the group of documents being located at a respective network location;
identifying links occurring in the group of documents, wherein each link occurring in the group of documents has associated anchor text and points to a network location of another document in the group of documents;
translating, by one or more translation engines installed on one or more computers of the information retrieval system, each document in the group of documents into each of a plurality of target languages using a respective context-specific translation model for each of the target languages to generate a respective translated document in each target language for each of the documents in the group, wherein the context of each respective context-specific translation model depends at least in part on the anchor text of one or more of the identified links pointing to the document;
translating, by one or more translation engines, into each of the target languages the anchor text of the links pointing to documents in the group to generate translated anchor texts;
indexing the translated anchor texts;
indexing the documents in the group and the translated documents;
receiving, over the network by a search engine installed on the one or more computers of the information retrieval system, a query in a first language, the first language being one of the target languages, after the translating and indexing of the documents and the anchor texts; and
searching, by the search engine, the documents in the group that are in the first language, the translated documents that are in the first language, and the translated anchor texts that are in the first language to identify documents responsive to the query.
Dependent claims: 6, 7, 8
9. An information retrieval system comprising one or more computers and one or more non-transitory memories storing instructions that when executed causes the information retrieval system to perform operations comprising:
- retrieving, by the information retrieval system, a group of documents over a network, each document in the group of documents being located at a respective network location;
identifying links occurring in the group of documents, wherein each link occurring in the group of documents has associated anchor text and points to a network location of another document in the group of documents;
translating, by one or more translation engines installed on one or more computers of the information retrieval system, each document in the group of documents into each of a plurality of target languages using a respective context-specific translation model for each of the target languages to generate a respective translated document in each target language for each of the documents in the group, wherein the context of each respective context-specific translation model depends at least in part on the anchor text of one or more of the identified links pointing to the document;
translating, by one or more translation engines, into each of the target languages the anchor text of the links pointing to documents in the group to generate translated anchor texts;
indexing the translated anchor texts;
indexing the documents in the group and the translated documents;
receiving, over the network by a search engine installed on one or more computers of the information retrieval system, a query in a first language, the first language being one of the target languages, after the translating and indexing of the documents and the anchor texts; and
searching, by the search engine, the documents in the group that are in the first language, the translated documents that are in the first language, and the translated anchor texts that are in the first language to identify documents responsive to the query.
Dependent claims: 10, 11, 12
Specification