Systems, methods, software, and interfaces for multilingual information retrieval
Abstract
The present inventors have devised one or more novel methods, systems, and interfaces for facilitating multi-lingual searches. One exemplary method entails creating multiple language-specific indices for a collection of documents, with each index including stemmed and non-stemmed versions of terms from the documents. Users submit queries that are associated with a set of one or more target languages. Query processing entails translating original and stemmed versions of each term in a query into each of the target languages, using one or more techniques that each yield a set of potentially equivalent query terms. Each set of potentially equivalent query terms is then processed against the corresponding language-specific index, using a conventional monolingual search technique, such as a Boolean or natural language query, to identify documents from the collection. The resultant documents are presented to the user in language groupings or by computed relevance.
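As an illustration only, the indexing side of the abstract (language-specific indices that hold both stemmed and non-stemmed terms, searched with a conventional monolingual Boolean query) might be sketched as follows. The function names, the whitespace tokenizer, and the suffix-stripping `toy_stem` are assumptions made for this sketch; the abstract does not specify a stemmer, a tokenizer, or an index layout.

```python
from collections import defaultdict


def toy_stem(term):
    """Hypothetical stand-in for a real, language-specific stemmer."""
    for suffix in ("ing", "es", "s"):
        if term.endswith(suffix) and len(term) - len(suffix) >= 3:
            return term[: -len(suffix)]
    return term


def build_indices(docs):
    """Build one inverted index per language.

    docs is an iterable of (doc_id, language, text) tuples.
    Returns {language: {term: set of doc_ids}}.
    """
    indices = defaultdict(lambda: defaultdict(set))
    for doc_id, lang, text in docs:
        for raw in text.lower().split():
            indices[lang][raw].add(doc_id)            # non-stemmed version
            indices[lang][toy_stem(raw)].add(doc_id)  # stemmed version
    return indices


def boolean_or_search(index, terms):
    """Monolingual Boolean OR: documents containing any of the terms."""
    hits = set()
    for term in terms:
        hits |= index.get(term, set())
    return hits


docs = [
    (1, "en", "patent filings"),
    (2, "fr", "brevets déposés"),
    (3, "en", "court ruling"),
]
indices = build_indices(docs)
english_hits = boolean_or_search(indices["en"], ["filing"])
french_hits = boolean_or_search(indices["fr"], ["brevet"])
```

Storing both forms of each term lets a query match either the exact surface form or its stem without re-stemming the collection at query time, at the cost of a larger index.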
20 Claims
1. A method comprising:
defining a set of one or more language-specific indices for a collection of documents, with each index including stemmed and non-stemmed versions of terms contained in the documents;
receiving a query from a user, with the query associated with a set of one or more target languages;
parsing the query into one or more terms, with each term associated with a corresponding language identifier and a stemmed version of the term;
translating the original and stemmed versions of each term into each of the target languages to define respective sets of one or more equivalent query terms; and
identifying a set of documents from the collection of documents for each of the target languages, with each set identified based on the equivalent query terms for the corresponding target language.
Dependent claims: 2, 3, 4, 5, 6
7. A system comprising:
a collection of documents;
a set of one or more language-specific indices for the collection of documents, with each index including stemmed and non-stemmed versions of terms contained in the documents; and
a server for interacting with the collection of documents and the set of language-specific indices, with the server configured:
to receive a query from a user, with the query associated with a set of one or more target languages;
to parse the query into one or more terms, with each term associated with a corresponding language identifier and a stemmed version of the term;
to translate the original and stemmed versions of each term into each of the target languages and thus define respective sets of one or more equivalent query terms; and
to identify a set of documents from the collection of documents for each of the target languages, with each set identified based on the equivalent query terms for the corresponding target language.
Dependent claims: 8, 9, 10, 11
12. A server for interacting with a collection of documents and a set of language-specific indices, with the server configured:
to receive a query from a user, with the query associated with a set of one or more target languages;
to parse the query into one or more terms, with each term associated with a corresponding language identifier and a stemmed version of the term;
to translate the original and stemmed versions of each term into each of the target languages and thus define respective sets of one or more equivalent query terms; and
to identify a set of documents from the collection of documents for each of the target languages, with each set identified based on the equivalent query terms for the corresponding target language.
Dependent claims: 13, 14, 15, 16
17. A machine-readable medium for causing a server to interact with a collection of documents and a set of language-specific indices, with the medium comprising instructions for causing the server:
to receive a query from a user, with the query associated with a set of one or more target languages;
to parse the query into one or more terms, with each term associated with a corresponding language identifier and a stemmed version of the term;
to translate the original and stemmed versions of each term into each of the target languages and thus define respective sets of one or more equivalent query terms; and
to identify a set of documents from the collection of documents for each of the target languages, with each set identified based on the equivalent query terms for the corresponding target language.
Dependent claims: 18, 19, 20
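The query-side steps that the independent claims share (receive a query with target languages, parse it into terms carrying a language identifier and a stemmed version, translate both versions into each target language, and identify documents per language) can be sketched roughly as below. This is a minimal sketch under stated assumptions, not the claimed implementation: `QueryTerm`, `TOY_DICTIONARY`, `toy_stem`, and the plain-dictionary index layout are all hypothetical, and a fixed lookup table stands in for whatever translation techniques an actual system would use.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QueryTerm:
    original: str   # term as typed by the user
    stemmed: str    # stemmed version of the term
    language: str   # language identifier of the term


# Hypothetical bilingual lookup table keyed by (source, target) language.
TOY_DICTIONARY = {
    ("en", "fr"): {"patents": ["brevets"], "patent": ["brevet"]},
    ("en", "de"): {"patents": ["patente"], "patent": ["patent"]},
}


def toy_stem(term):
    """Hypothetical stemmer: drops a trailing 's' from longer words."""
    return term[:-1] if term.endswith("s") and len(term) > 3 else term


def parse_query(query, language):
    """Split the query into terms, each with a language id and a stem."""
    return [QueryTerm(w, toy_stem(w), language) for w in query.lower().split()]


def translate_terms(terms, target_languages):
    """Map each target language to its set of equivalent query terms."""
    equivalents = {}
    for target in target_languages:
        found = set()
        for term in terms:
            for form in (term.original, term.stemmed):
                if target == term.language:
                    found.add(form)  # no translation needed
                else:
                    table = TOY_DICTIONARY.get((term.language, target), {})
                    found.update(table.get(form, []))
        equivalents[target] = found
    return equivalents


def identify_documents(indices, equivalents):
    """Search each language-specific index with that language's terms."""
    results = {}
    for lang, terms in equivalents.items():
        index = indices.get(lang, {})
        hits = set()
        for term in terms:
            hits |= index.get(term, set())
        results[lang] = hits
    return results


# Assumed index layout: {language: {term: set of doc_ids}}.
indices = {
    "en": {"patent": {1}},
    "fr": {"brevet": {2}, "brevets": {2}},
}
terms = parse_query("patents", "en")
equivalents = translate_terms(terms, ["en", "fr"])
results = identify_documents(indices, equivalents)
```

Keeping the result sets separate per target language matches the claim wording, which identifies one set of documents for each target language instead of a single merged list.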
Specification