CROSS-LINGUAL INFORMATION RETRIEVAL
First Claim
1. A method for identifying content, comprising:
generating an equivalency list that includes a secondary-language query term associated with at least one primary-language query term, wherein each of the at least one primary-language query term is in a pre-selected language and the secondary-language term is in a language that is different from the pre-selected primary language;
receiving the secondary-language query term in a search request;
selecting the at least one primary-language query term from the equivalency list, based on the secondary-language query term;
identifying digital content that is associated with structured metadata, if the at least one primary-language query term is included in the structured metadata; and
identifying digital content that is associated with unstructured metadata, if the at least one primary-language query term is included in the unstructured metadata and is not a unique identifier of a defined term in a controlled vocabulary.
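The steps of this claim can be sketched in Python as follows. This is one illustrative reading of the claim language, not the patentee's implementation: the dictionary-based equivalency list, the `Asset` record, the sample terms and assets, and the `kw:` prefix used to mark controlled-vocabulary identifiers are all assumptions made for the example.

```python
from dataclasses import dataclass, field

# Hypothetical equivalency list: secondary-language term -> primary-language terms.
# A term starting with "kw:" stands in for a unique identifier of a defined
# term in a controlled vocabulary (an assumption of this sketch).
EQUIVALENCY_LIST = {
    "chien": ["kw:1001", "dog"],
    "plage": ["beach"],
}


@dataclass
class Asset:
    asset_id: str
    keywords: set = field(default_factory=set)  # structured metadata
    caption: str = ""                           # unstructured metadata


def is_unique_identifier(term):
    return term.startswith("kw:")


def identify_content(secondary_term, assets):
    """Return ids of assets matched through structured or unstructured metadata."""
    primary_terms = EQUIVALENCY_LIST.get(secondary_term.lower(), [])
    hits = []
    for asset in assets:
        caption_words = set(asset.caption.lower().split())
        for term in primary_terms:
            in_structured = term in asset.keywords
            # Unique identifiers are never matched against free text.
            in_unstructured = (not is_unique_identifier(term)) and term in caption_words
            if in_structured or in_unstructured:
                hits.append(asset.asset_id)
                break
    return hits


ASSETS = [
    Asset("a1", keywords={"kw:1001"}, caption="A puppy in the park"),
    Asset("a2", caption="My dog on the beach"),
    Asset("a3", keywords={"kw:2002"}, caption="City skyline"),
]
```

In this sketch, a search for "chien" finds `a1` through its structured keyword and `a2` through its caption, while the identifier `kw:1001` is deliberately kept out of the free-text comparison.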
Abstract
Multi-lingual search and retrieval of digital content. Embodiments are generally directed to methods and systems for creating an English language database that associates non-English terms with English terms in multiple categories of metadata. Language experts use an interface to create equivalencies between non-English terms and English terms, Boolean expressions, synonyms, and other forms of search terms. Language dictionaries and other sources also create equivalencies. The database is used to evaluate non-English search terms submitted by a user, and to determine English search terms that can be used to perform a search for content. The multiple categories of metadata may comprise structured data, such as keywords of a structured vocabulary, and/or unstructured data, such as captions, titles, descriptions, etc. Weighting and/or prioritization can be applied to the search terms, to the process of searching the multiple categories, and/or to the search results, to rank the search results.
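The weighting and ranking mentioned at the end of the abstract could look roughly like the sketch below. The abstract does not give a scoring formula, so the category names, the weight values, and the simple weighted term count are all assumptions chosen only to illustrate ranking across multiple metadata categories.

```python
# Hypothetical weights per metadata category; the source gives no actual values.
CATEGORY_WEIGHTS = {"keyword": 3.0, "title": 2.0, "caption": 1.0}


def score(asset, terms):
    """Sum, over categories, the category weight times the number of matching terms."""
    total = 0.0
    for category, weight in CATEGORY_WEIGHTS.items():
        tokens = set(asset.get(category, "").lower().split())
        total += weight * sum(1 for term in terms if term in tokens)
    return total


def rank(assets, terms):
    """Return ids of matching assets, highest score first (ties broken by id)."""
    scored = [(score(asset, terms), asset["id"]) for asset in assets]
    scored = [pair for pair in scored if pair[0] > 0]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [asset_id for _, asset_id in scored]


assets = [
    {"id": "a1", "keyword": "dog"},
    {"id": "a2", "caption": "a dog on a beach"},
    {"id": "a3", "title": "dog show", "caption": "dog running"},
]
ranking = rank(assets, ["dog"])
```

Here a match in the structured keyword category counts for more than a match in a caption, which is one plausible way to prioritize categories; other schemes, such as weighting the search terms themselves, fit the abstract equally well.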
30 Claims
1. A method for identifying content, comprising:
generating an equivalency list that includes a secondary-language query term associated with at least one primary-language query term, wherein each of the at least one primary-language query term is in a pre-selected language and the secondary-language term is in a language that is different from the pre-selected primary language;
receiving the secondary-language query term in a search request;
selecting the at least one primary-language query term from the equivalency list, based on the secondary-language query term;
identifying digital content that is associated with structured metadata, if the at least one primary-language query term is included in the structured metadata; and
identifying digital content that is associated with unstructured metadata, if the at least one primary-language query term is included in the unstructured metadata and is not a unique identifier of a defined term in a controlled vocabulary.
Dependent claims: 2-13.
14. A system for identifying content, comprising:
a translations generator that performs a plurality of operations, including:
generating an equivalency list that includes a secondary-language query term associated with at least one primary-language query term, wherein each of the at least one primary-language query term is in a pre-selected language and the secondary-language term is in a language that is different from the pre-selected primary language;
receiving the secondary-language query term in a search request; and
selecting the at least one primary-language query term from the equivalency list, based on the secondary-language query term; and
a search engine that performs a plurality of operations, including:
identifying digital content that is associated with structured metadata, if the at least one primary-language query term is included in the structured metadata; and
identifying digital content that is associated with unstructured metadata, if the at least one primary-language query term is included in the unstructured metadata and is not a unique identifier of a defined term in a controlled vocabulary.
Dependent claims: 15, 16.
17. A method for associating terms for identifying content, comprising:
associating a secondary-language term with a controlled vocabulary keyword in a primary language, if the secondary-language term has a unique meaning depending on a context;
indicating that the secondary-language term exists in the primary language, if the secondary-language term is identical in the primary language;
associating the secondary-language term with a synonym in the primary language, if the secondary-language term is synonymous with the synonym; and
associating the secondary-language term with a Boolean expression, if a meaning of the secondary-language term can be expressed by a combination of primary-language terms.
Dependent claims: 18, 19.
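The four kinds of association in claim 17 can be modeled as a small tagged record. The sketch below is an assumption-laden illustration, not the claimed method itself: the `Kind` enum, the `Association` record, the `associate` helper, the keyword identifier `kw:3051`, and the sample French terms are all hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Kind(Enum):
    KEYWORD = "controlled vocabulary keyword"
    IDENTICAL = "identical in the primary language"
    SYNONYM = "synonym"
    BOOLEAN = "Boolean expression"


@dataclass(frozen=True)
class Association:
    secondary_term: str
    kind: Kind
    target: str


def associate(term, keyword_id=None, same_in_primary=False, synonym=None, boolean_expr=None):
    """Build one association per condition that holds for the secondary-language term."""
    links = []
    if keyword_id is not None:      # meaning depends on context: pin it to a keyword
        links.append(Association(term, Kind.KEYWORD, keyword_id))
    if same_in_primary:             # the same spelling exists in the primary language
        links.append(Association(term, Kind.IDENTICAL, term))
    if synonym is not None:         # synonymous with a primary-language term
        links.append(Association(term, Kind.SYNONYM, synonym))
    if boolean_expr is not None:    # expressible only as a combination of terms
        links.append(Association(term, Kind.BOOLEAN, boolean_expr))
    return links


# "avocat" can mean lawyer or avocado in French, so it is tied to a keyword id.
avocat = associate("avocat", keyword_id="kw:3051")
taxi = associate("taxi", same_in_primary=True)
etudiante = associate("étudiante", boolean_expr="student AND female")
```

Keeping the kind explicit on each record lets a later search step decide whether a target should be matched against structured keywords, free text, or evaluated as an expression.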
20. A method for generating a list for identifying content, comprising:
receiving a subset of equivalencies comprising a plurality of secondary-language terms that are associated with a primary-language term;
parsing the subset into a list of equivalencies, wherein each equivalency comprises an association of at least one of the plurality of secondary-language terms with the primary-language term;
associating a unique identifier with the primary-language term in at least one equivalency of the list, if at least one of the plurality of secondary-language terms has a limited meaning that is associated with the unique identifier;
adding at least one of the secondary-language terms to at least one equivalency in the list, if the primary-language term is identical to the at least one secondary-language term, wherein the at least one secondary-language term is one of the plurality of secondary-language terms;
adding a primary-language lead-in term to at least one equivalency in the list, if at least one of the plurality of secondary-language terms is synonymous with the primary-language lead-in term; and
adding a Boolean expression to at least one equivalency in the list, if the at least one of the plurality of secondary-language terms is associated with a combination of terms in the primary language.
Dependent claims: 21, 22, 23.
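A rough sketch of the list-generation steps in claim 20 follows. The claim does not specify an input format, so the `primary: secondary, secondary` entry syntax, the dictionary layout of each equivalency, the `identical` flag used to record a term shared by both languages, and every sample term and identifier are assumptions made purely for illustration.

```python
def parse_subset(entry):
    """Split 'primary: sec1, sec2' into one equivalency per secondary-language term."""
    primary, _, rest = entry.partition(":")
    primary = primary.strip()
    return [
        {"secondary": term.strip(), "primary_terms": [primary]}
        for term in rest.split(",")
        if term.strip()
    ]


def enrich(equivalencies, unique_ids=None, lead_ins=None, booleans=None):
    """Return new equivalencies annotated per the conditional steps of the claim."""
    unique_ids = unique_ids or {}
    lead_ins = lead_ins or {}
    booleans = booleans or {}
    result = []
    for eq in equivalencies:
        sec = eq["secondary"]
        item = {"secondary": sec, "primary_terms": list(eq["primary_terms"])}
        if sec in unique_ids:             # limited meaning: attach a unique identifier
            item["unique_id"] = unique_ids[sec]
        if sec in item["primary_terms"]:  # same term in both languages
            item["identical"] = True
        if sec in lead_ins:               # synonymous primary-language lead-in term
            item["primary_terms"].append(lead_ins[sec])
        if sec in booleans:               # meaning needs a combination of terms
            item["boolean"] = booleans[sec]
        result.append(item)
    return result


parsed = parse_subset("taxi: taxi, tacot")
enriched = enrich(
    parsed,
    unique_ids={"tacot": "kw:77"},
    lead_ins={"tacot": "cab"},
    booleans={"tacot": "taxi AND old"},
)
```

`enrich` copies each equivalency rather than mutating it, so the parsed list can be reused with different annotation sources, such as dictionaries or expert input.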
24. A system for generating a list for identifying content, comprising:
a parser that performs a plurality of operations, including:
receiving a subset of equivalencies comprising a plurality of secondary-language terms that are associated with a primary-language term; and
parsing the subset into a list of equivalencies, wherein each equivalency comprises an association of at least one of the plurality of secondary-language terms with the primary-language term; and
a list generator that performs a plurality of operations, including:
associating a unique identifier with the primary-language term in at least one equivalency of the list, if at least one of the plurality of secondary-language terms has a limited meaning that is associated with the unique identifier;
adding a nonprimary-language term to at least one equivalency in the list, if the primary-language term is identical to the nonprimary-language term, wherein the nonprimary-language term is one of the plurality of secondary-language terms;
adding a primary-language lead-in term to at least one equivalency in the list, if at least one of the plurality of secondary-language terms is synonymous with the primary-language lead-in term; and
adding a Boolean expression to at least one equivalency in the list, if the at least one of the plurality of secondary-language terms is associated with a combination of terms in the primary language.
Dependent claims: 25.
26. A method for determining a query to identify content, comprising:
receiving a first equivalency between: a primary-language query term in a primary language; and a user-specified secondary-language query term in a secondary language;
receiving a second equivalency between the primary-language query term and an alternate secondary-language query term in the secondary language;
determining whether to apply a unique identifier to either of the user-specified secondary-language query term or the alternate secondary-language query term, wherein the unique identifier refines the meaning of a query term and indicates a structured query term;
receiving a search query in the secondary language; and
determining the primary-language query term based at least in part on the search query, the user-specified secondary-language query term, and the alternate secondary-language query term.
Dependent claims: 27, 28, 29, 30.
Specification