Systems, methods, software, and interfaces for multilingual information retrieval
Abstract
The present inventors have devised one or more novel methods, systems, and interfaces for facilitating multilingual searches. One exemplary method entails creating multiple language-specific indices for a collection of documents, with each index including stemmed and non-stemmed versions of terms from the documents. Users submit queries that are associated with a set of one or more target languages. Query processing entails translating original and stemmed versions of each term in a query into each of the target languages, using one or more techniques that each yield a set of potentially equivalent query terms. Each set of potentially equivalent query terms is then processed against the corresponding language-specific index, using a conventional monolingual search technique, such as a Boolean or natural language query, to identify documents from the collection. The resultant documents are presented to the user in language groupings or by computed relevance.
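The flow described in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the suffix-stripping `stem` function, the `DICTIONARY` lookup table, the function names, and the sample documents are all hypothetical stand-ins, and a plain Boolean OR match is assumed as the "conventional monolingual search technique".

```python
from collections import defaultdict

# Hypothetical toy stemmer data: a real system would use a proper
# language-specific stemmer for each index.
SUFFIXES = {"en": ("ing", "es", "s"), "fr": ("es", "s")}

# Hypothetical bilingual dictionary keyed by (source language, target language).
DICTIONARY = {
    ("en", "fr"): {
        "contract": ["contrat"],
        "contracts": ["contrats"],
        "law": ["loi", "droit"],
    },
}


def stem(term, lang):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES.get(lang, ()):
        if term.endswith(suffix) and len(term) - len(suffix) >= 3:
            return term[: -len(suffix)]
    return term


def build_indices(docs):
    """Build one inverted index per language from (doc_id, lang, text) tuples.

    Each index stores both the non-stemmed and the stemmed form of every term.
    """
    indices = defaultdict(lambda: defaultdict(set))
    for doc_id, lang, text in docs:
        for word in text.lower().split():
            indices[lang][word].add(doc_id)
            indices[lang][stem(word, lang)].add(doc_id)
    return indices


def parse_query(query, lang):
    """Split the query into (term, language identifier, stemmed term) triples."""
    return [(word, lang, stem(word, lang)) for word in query.lower().split()]


def translate(term, src, tgt):
    """Return candidate equivalents of a term in the target language."""
    if src == tgt:
        return [term]
    return DICTIONARY.get((src, tgt), {}).get(term, [])


def search(query, query_lang, target_langs, indices):
    """Return {target language: sorted doc ids} using a Boolean OR match."""
    parsed = parse_query(query, query_lang)
    results = {}
    for tgt in target_langs:
        # Translate both the original and the stemmed form of each term.
        equivalents = set()
        for original, lang, stemmed in parsed:
            for form in (original, stemmed):
                equivalents.update(translate(form, lang, tgt))
        # Run the equivalent terms against that language's own index.
        index = indices.get(tgt, {})
        hits = set()
        for term in equivalents:
            hits |= index.get(term, set())
        results[tgt] = sorted(hits)
    return results


docs = [
    ("d1", "en", "contracts and law"),
    ("d2", "fr", "les contrats de vente"),
    ("d3", "fr", "le droit civil"),
    ("d4", "en", "weather report"),
]
indices = build_indices(docs)
results = search("contracts law", "en", ["en", "fr"], indices)
```

Here `results` groups the matching documents by target language, which corresponds to the "language groupings" presentation; ranking by computed relevance is left out of the sketch.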
25 Claims
1. A computer-implemented method comprising:
defining a set of one or more language-specific indices, in at least one data-storage device, for a collection of documents, with each index including stemmed and non-stemmed versions of terms contained in the documents;
receiving a query from a user, with the query associated with a set of one or more target languages;
parsing the query into one or more terms, using at least one processor, with each term associated with a corresponding language identifier and a stemmed version of the term;
translating the original and stemmed versions of each term, using at least one processor, into each of the target languages to define respective sets of one or more equivalent query terms; and
identifying a set of documents from the collection of documents for each of the target languages, with each set identified based on the equivalent query terms for the corresponding target language.
Dependent claims: 2, 3, 4, 5, 6.

7. A computer-implemented system comprising:
a collection of documents;
a set of one or more language-specific indices for the collection of documents, with each index including stemmed and non-stemmed versions of terms contained in the documents; and
a server for interacting with the collection of documents and the set of language-specific indices, with the server configured:
to receive a query from a user, with the query associated with a set of one or more target languages;
to parse the query into one or more terms, with each term associated with a corresponding language identifier and a stemmed version of the term;
to translate the original and stemmed versions of each term into each of the target languages and thus define respective sets of one or more equivalent query terms; and
to identify a set of documents from the collection of documents for each of the target languages, with each set identified based on the equivalent query terms for the corresponding target language.
Dependent claims: 8, 9, 10, 11.

12. A server for interacting with a collection of documents and a set of language-specific indices, with the server configured:
to receive a query from a user, with the query associated with a set of one or more target languages;
to parse the query into one or more terms, using at least one processor, with each term associated with a corresponding language identifier and a stemmed version of the term;
to translate the original and stemmed versions of each term, using at least one processor, into each of the target languages and thus define respective sets of one or more equivalent query terms; and
to identify a set of documents from the collection of documents for each of the target languages, with each set identified based on the equivalent query terms for the corresponding target language.
Dependent claims: 13, 14, 15, 16.

17. A non-transitory machine-readable medium for causing a server to interact with a collection of documents and a set of language-specific indices, with the medium comprising instructions for causing the server:
to receive a query from a user, with the query associated with a set of one or more target languages;
to parse the query into one or more terms, with each term associated with a corresponding language identifier and a stemmed version of the term;
to translate the original and stemmed versions of each term into each of the target languages and thus define respective sets of one or more equivalent query terms; and
to identify a set of documents from the collection of documents for each of the target languages, with each set identified based on the equivalent query terms for the corresponding target language.
Dependent claims: 18, 19, 20.

21. A system comprising:
a set of one or more language-specific indices for a collection of documents, with each index including stemmed and non-stemmed versions of terms contained in the documents;
a computer comprising a processor and a non-transitory memory, the memory comprising instructions, when executed by the processor, configured to:
receive a query from a client access device, with the query associated with a set of one or more target languages;
parse the query into one or more original terms, with each term associated with a corresponding language identifier and a stemmed version of the term;
translate the original and stemmed versions of each term into each of the target languages to define respective sets of one or more equivalent query terms; and
identify a set of documents from a collection of documents for each of the target languages, with each set identified based on the equivalent query terms for the corresponding target language.
Dependent claims: 22, 23, 24, 25.

Specification