Cross-lingual information retrieval
Abstract
Multi-lingual search and retrieval of digital content. Embodiments are generally directed to methods and systems for creating an English language database that associates non-English terms with English terms in multiple categories of metadata. Language experts use an interface to create equivalencies between non-English terms and English terms, Boolean expressions, synonyms, and other forms of search terms. Language dictionaries and other sources also create equivalencies. The database is used to evaluate non-English search terms submitted by a user, and to determine English search terms that can be used to perform a search for content. The multiple categories of metadata may comprise structured data, such as keywords of a structured vocabulary, and/or unstructured data, such as captions, titles, descriptions, etc. Weighting and/or prioritization can be applied to the search terms, to the process of searching the multiple categories, and/or to the search results, to rank the search results.
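The weighting across metadata categories described in the abstract can be pictured with a small sketch. It is illustrative only: the `Asset` type, the `CATEGORY_WEIGHTS` values, and the scoring rule are assumptions made for this example and are not taken from the patent.

```python
from dataclasses import dataclass, field


@dataclass
class Asset:
    asset_id: str
    keywords: set[str] = field(default_factory=set)  # structured metadata
    caption: str = ""                                # unstructured metadata


# Hypothetical weights: a structured keyword hit counts more than a caption hit.
CATEGORY_WEIGHTS = {"keywords": 2.0, "caption": 1.0}


def score(asset: Asset, english_terms: list[str]) -> float:
    """Sum the category weights for every English term found in the asset."""
    caption_words = set(asset.caption.lower().split())
    total = 0.0
    for term in english_terms:
        if term in asset.keywords:
            total += CATEGORY_WEIGHTS["keywords"]
        if term in caption_words:
            total += CATEGORY_WEIGHTS["caption"]
    return total


def rank(assets: list[Asset], english_terms: list[str]) -> list[str]:
    """Return ids of matching assets, best score first (ties broken by id)."""
    scored = [(score(a, english_terms), a.asset_id) for a in assets]
    ordered = sorted(scored, key=lambda pair: (-pair[0], pair[1]))
    return [asset_id for value, asset_id in ordered if value > 0]
```

Here `english_terms` stands for the English search terms already determined from the user's non-English query. Other weighting schemes (per-term weights, weighting of the result list itself) would fit the abstract equally well.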
96 Citations
29 Claims
1. A method for identifying digital content with a client computer, the method enabling operations, comprising:
- generating an equivalency list with a translations generator in communication with the client computer, wherein the list is based on a secondary-language query term associated with at least one primary-language query term, and wherein each of the at least one primary-language query term is in a pre-selected language and the secondary-language term is in a language that is different from the pre-selected primary language;
- receiving the secondary-language query term in a search request for a search engine that is in communication with the client computer;
- selecting the at least one primary-language query term from the equivalency list, based on the received secondary-language query term;
- identifying digital content that is associated with structured text metadata, if the at least one primary-language query term is included in the structured text metadata; and
- identifying digital content that corresponds to unstructured free-text metadata, if the at least one primary-language query term is included in the corresponding unstructured free-text metadata and is not a unique identifier of a defined term in a controlled vocabulary.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13
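One possible reading of the steps in claim 1 is sketched below. The data layout (`EQUIVALENCY_LIST`, `CONTENT`, the `KW-0001` identifier) and the routing rule (terms carrying a controlled-vocabulary identifier are matched against structured keywords, all other terms against free text) are assumptions made for illustration, not the patented implementation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PrimaryTerm:
    text: str
    # Unique identifier when the term is a defined controlled-vocabulary term.
    vocab_id: Optional[str] = None


# Hypothetical equivalency list: secondary-language term -> primary-language terms.
EQUIVALENCY_LIST = {
    "chien": [PrimaryTerm("dog", vocab_id="KW-0001")],
    "plage déserte": [PrimaryTerm("empty beach")],
}

# Hypothetical content records with structured and unstructured metadata.
CONTENT = [
    {"id": "img-1", "keyword_ids": {"KW-0001"}, "caption": "Puppy running"},
    {"id": "img-2", "keyword_ids": set(), "caption": "An empty beach at dawn"},
    {"id": "img-3", "keyword_ids": set(), "caption": "Hot dog stand"},
]


def search(secondary_term: str) -> list[str]:
    """Select primary-language terms for the query, then match the metadata."""
    primary_terms = EQUIVALENCY_LIST.get(secondary_term.lower(), [])
    hits = []
    for item in CONTENT:
        for term in primary_terms:
            if term.vocab_id is not None:
                # Structured match: compare controlled-vocabulary identifiers.
                matched = term.vocab_id in item["keyword_ids"]
            else:
                # Unstructured match: look for the term in the free text.
                matched = term.text in item["caption"].lower()
            if matched:
                hits.append(item["id"])
                break
    return hits
```

With this routing, a query for "chien" finds only the record tagged with the controlled-vocabulary keyword and skips the "Hot dog stand" caption, which shows why the claim treats identifier terms differently from free-text terms.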
14. A system for identifying digital content with a client computer, comprising:
- a translations generator that is in communication with the client computer, the translations generator is arranged to perform a plurality of operations including;
  - generating an equivalency list based on a secondary-language query term associated with at least one primary-language query term, wherein each of the at least one primary-language query term is in a pre-selected language and the secondary-language term is in a language that is different from the pre-selected primary language;
- a translating machine that is in communication with the client computer, the translations generator, and a search engine, the translating machine is arranged to perform a plurality of operations including;
  - receiving the secondary-language query term in a search request for the search engine; and
  - selecting the at least one primary-language query term from the equivalency list, based on the secondary-language query term; and
- the search engine that performs a plurality of operations, including;
  - identifying digital content that is associated with structured text metadata, if the at least one primary-language query term is included in the structured text metadata; and
  - identifying digital content that corresponds to unstructured free-text metadata, if the at least one primary-language query term is included in the corresponding unstructured free-text metadata and is not a unique identifier of a defined term in a controlled vocabulary.
Dependent claims: 15, 16
17. A method for associating terms in an equivalency list for identifying digital content with a client computer in communication with a translations generator, the translations generator is arranged to perform a plurality of operations, comprising:
- associating a secondary-language term with a controlled vocabulary keyword in a primary-language, if the secondary-language term has a unique meaning depending on a context;
- indicating that the secondary-language term exists in the primary language, if the secondary-language term is identical in the primary language;
- associating the secondary-language term with a synonym in the primary language, if the secondary-language term is synonymous with the synonym;
- designating the secondary-language term as a primary translation based on a primary-language term; and
- associating the secondary-language term with a Boolean expression, if a meaning of the secondary-language term can be expressed by a combination of primary-language terms.
Dependent claims: 18, 19
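The last step of claim 17, associating a secondary-language term with a Boolean expression over primary-language terms, might look like the following sketch. The nested-tuple representation, the `matches` helper, and the example entry are hypothetical choices made for this illustration.

```python
# An expression is either a keyword string or a tuple (operator, operand, ...).
# Hypothetical entry: German "Schimmel", in the sense of a white horse,
# expressed as a combination of two English keywords.
BOOLEAN_EQUIVALENCIES = {
    "schimmel": ("AND", "horse", "white"),
}


def matches(expr, keywords: set) -> bool:
    """Evaluate a Boolean expression against a set of primary-language keywords."""
    if isinstance(expr, str):
        return expr in keywords
    operator, *operands = expr
    if operator == "AND":
        return all(matches(operand, keywords) for operand in operands)
    if operator == "OR":
        return any(matches(operand, keywords) for operand in operands)
    if operator == "NOT":
        (operand,) = operands  # NOT takes exactly one operand
        return not matches(operand, keywords)
    raise ValueError(f"unknown operator: {operator}")
```

A term that has no single-word counterpart in the primary language can then still be searched, because the expression is evaluated against each item's keywords.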
20. A method for generating a list for identifying digital content with a client computer in communication with a translations generator, the translations generator is arranged to perform a plurality of operations, comprising:
- receiving a subset of equivalencies comprising a plurality of secondary-language terms that are associated with a primary-language term;
- parsing the subset into a list of equivalencies, wherein each equivalency comprises an association of at least one of the plurality of secondary-language terms with the primary-language term;
- associating a unique identifier with the primary-language term in at least one equivalency of the list, if at least one of the plurality of secondary-language terms has a limited meaning that is associated with the unique identifier;
- adding at least one of the secondary-language terms to at least one equivalency in the list, if the primary-language term is identical to the at least one secondary-language term, wherein the at least one secondary-language term is one of the plurality of secondary-language terms;
- adding a primary-language lead-in term to at least one equivalency in the list, if at least one of the plurality of secondary-language terms is synonymous with the primary-language lead-in term;
- designating one of the plurality of secondary-language terms as a primary translation based on the primary-language term; and
- adding a Boolean expression to at least one equivalency in the list, if the least one of the plurality of secondary-language terms is associated with a combination of terms in the primary language.
Dependent claims: 21, 22
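The list-building steps of claim 20 can be sketched as a function that turns one primary-language term and its secondary-language terms into a list of equivalency records. The `Equivalency` fields, the `parse_subset` signature, and the rule that the first listed term becomes the primary translation are all assumptions for this example; the claim does not prescribe them.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Equivalency:
    primary: str
    secondary: str
    vocab_id: Optional[str] = None       # unique identifier for a limited meaning
    same_in_primary: bool = False        # term is identical in both languages
    lead_in: Optional[str] = None        # synonymous primary-language lead-in term
    boolean_expr: Optional[str] = None   # combination of primary-language terms
    is_primary_translation: bool = False


def parse_subset(primary, secondary_terms, vocab_ids=None, lead_ins=None,
                 boolean_exprs=None):
    """Build one equivalency per secondary-language term.

    The optional dicts map a secondary-language term to its annotation.
    Assumption: the first listed term is designated the primary translation.
    """
    vocab_ids = vocab_ids or {}
    lead_ins = lead_ins or {}
    boolean_exprs = boolean_exprs or {}
    equivalencies = []
    for index, secondary in enumerate(secondary_terms):
        equivalencies.append(Equivalency(
            primary=primary,
            secondary=secondary,
            vocab_id=vocab_ids.get(secondary),
            same_in_primary=secondary.lower() == primary.lower(),
            lead_in=lead_ins.get(secondary),
            boolean_expr=boolean_exprs.get(secondary),
            is_primary_translation=index == 0,
        ))
    return equivalencies
```

For instance, `parse_subset("taxi", ["taxi", "voiture de place"], vocab_ids={"taxi": "KW-0042"}, lead_ins={"voiture de place": "cab"})` yields two records, the first flagged as identical in both languages and as the primary translation.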
23. A system for generating a list for identifying digital content with a client computer, comprising:
- a parser that is in communication with the client computer, the parser is arranged to perform a plurality of operations, including;
  - receiving a subset of equivalencies comprising a plurality of secondary-language terms that are associated with a primary-language term; and
  - parsing the subset into a list of equivalencies based on each equivalency comprising an association of at least one of the plurality of secondary-language terms with the primary-language term; and
- a list generator that is in communication with the client computer and the parser, the list generator is arranged to perform a plurality of operations, including;
  - associating a unique identifier with the primary-language term in at least one equivalency of the list, if at least one of the plurality of secondary-language terms has a limited meaning that is associated with the unique identifier;
  - adding a nonprimary-language term to at least one equivalency in the list, if the primary-language term is identical to the nonprimary-language term, wherein the nonprimary-language term is one of the plurality of secondary-language terms;
  - adding a primary-language lead-in term to at least one equivalency in the list, if at least one of the plurality of secondary-language terms is synonymous with the primary-language lead-in term;
  - designating one of the plurality of secondary-language terms as a primary translation based on the primary-language term; and
  - adding a Boolean expression to at least one equivalency in the list, if the least one of the plurality of secondary-language terms is associated with a combination of terms in the primary language.
Dependent claims: 24
25. A method for determining a query to identify digital content with a client computer, the method enabling operations, comprising:
- receiving a first equivalency between;
  - a primary-language query term in a primary language; and
  - a user-specified secondary-language query term in a secondary language;
- receiving a second equivalency between the primary-language query term and an alternate secondary-language query term in the secondary language;
- determining whether to apply a unique identifier to either of the user-specified secondary-language query term or the alternate secondary-language query term with a translations generator in communication with the client computer, wherein the unique identifier refines the meaning of a query term and indicates a structured query term;
- designating a primary translation as one of the user-specified secondary-language query term and the alternate secondary-language query term based on the primary-language query term;
- receiving a search query in the secondary language; and
- determining the primary-language query term with a translating machine in communication with the client computer and the translations generator, wherein the determination is based at least in part on the search query, the user-specified secondary-language query term, and the alternate secondary-language query term.
Dependent claims: 26, 27, 28, 29
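A rough sketch of the query determination in claim 25 follows, assuming an exact, case-insensitive match between the search query and the stored secondary-language terms. `TermEntry`, `build_entry`, and `resolve` are hypothetical names, and the unique-identifier step is left out to keep the example short.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TermEntry:
    primary: str                 # primary-language query term
    secondary_terms: list[str]   # user-specified and alternate terms
    primary_translation: str     # the secondary term designated as primary


def build_entry(primary: str, user_specified: str, alternate: str,
                prefer_alternate: bool = False) -> TermEntry:
    """Combine two equivalencies for one primary term and pick a primary translation."""
    chosen = alternate if prefer_alternate else user_specified
    return TermEntry(primary, [user_specified, alternate], chosen)


def resolve(query: str, entries: list[TermEntry]) -> Optional[str]:
    """Return the primary-language term for a secondary-language search query."""
    normalized = query.strip().lower()
    for entry in entries:
        if normalized in (term.lower() for term in entry.secondary_terms):
            return entry.primary
    return None
```

Either the user-specified term or the alternate term leads to the same primary-language query term, while only one of them is recorded as the primary translation.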
Specification