System and method for cross-language knowledge searching
First Claim
1. A computer-based method for cross-language knowledge searching, the method implemented by at least one computer processor accessing at least one knowledge base comprising sources in a first language and sources in a second language and a bilingual dictionary stored in at least one storage device, the method comprising:
- building the bilingual dictionary using parallel corpora, including:
for each sentence in a first source in the first language, generating a first source semantic index in the first language;
for each sentence in a second source in the second language, generating a second source semantic index in the second language, where the second source is a translation of the first source and each first source semantic index and corresponding second source semantic index form parallel semantic indexes having parallel eSAO component pairs; and
recognizing semantic components in an input expression in the first language;
generating a first semantic index in the first language from the semantic components, wherein the first semantic index includes first lexical units, at least one first lexical unit comprising a word with a part of speech (POS) tag;
translating the first semantic index into a second semantic index in the second language using a bilingual dictionary of actions and concepts, including translating the first lexical units into second lexical units in the second language, and translating a first word from the first semantic index into corresponding words in the second language and tagging each of the corresponding words with a POS tag of the first word; and
retrieving information relevant to the input expression from a knowledge base, which includes semantically indexed information in the second language, when the first and second semantic indexes match a subset of semantic indexes of the knowledge base associated with the information.
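The steps recited in claim 1 can be illustrated with a minimal sketch. It is not the patented implementation. It assumes that an eSAO is an extended subject-action-object record (the claim does not define the term) and reduces it to three fields; the names `Lexeme`, `ESAO`, `build_dictionary`, `translate_index` and `retrieve` are hypothetical. Components of parallel records are paired by position, each translated word inherits the POS tag of the source word, and retrieval is simplified to matching the second-language index only.

```python
from collections import defaultdict
from typing import NamedTuple


class Lexeme(NamedTuple):
    word: str
    pos: str  # part-of-speech tag, e.g. "NN" or "VB"


class ESAO(NamedTuple):
    # Hypothetical, reduced eSAO record: only three components are modeled.
    subject: Lexeme
    action: Lexeme
    obj: Lexeme


def build_dictionary(first_indexes, second_indexes):
    """Pair components of parallel eSAO records into word -> translations."""
    dictionary = defaultdict(set)
    for first, second in zip(first_indexes, second_indexes):
        # Components at the same position form a parallel component pair.
        for src, tgt in zip(first, second):
            dictionary[src.word].add(tgt.word)
    return dictionary


def translate_index(index, dictionary):
    """Translate every lexical unit; each candidate keeps the source POS tag."""
    translated = []
    for esao in index:
        translated.append(tuple(
            [Lexeme(word, lex.pos) for word in sorted(dictionary.get(lex.word, ()))]
            for lex in esao
        ))
    return translated


def retrieve(translated, knowledge_base):
    """Return information whose eSAO matches a translated request eSAO."""
    hits = []
    for kb_esao, info in knowledge_base:
        for candidates in translated:
            # Every component of the stored record must be among the
            # translation candidates for the same position.
            if all(lex in slot for lex, slot in zip(kb_esao, candidates)):
                hits.append(info)
                break
    return hits
```

With one parallel sentence pair as the corpus, a request index built from the first-language record translates into candidate lists per component, and `retrieve` returns only the knowledge base entries whose stored record agrees with those candidates in both word and POS tag.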
Abstract
A system and method for cross-language knowledge searching. The system has a Semantic Analyzer, a Generator of search patterns/semantic indexes for natural language user requests and documents, a Translator of user request search patterns, and a Knowledge Base Searcher. The system also performs automatic semantic analysis and semantic indexing of natural language user requests and documents for knowledge recognition, and cross-language extraction/searching of knowledge relevant to the user request. System functionality is supported by a Linguistic Knowledge Base and by a number of unique bilingual dictionaries of concepts/objects and actions.
28 Claims
1. Recited in full under First Claim above. Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14.
15. A computer-readable medium having computer-executable instructions for performing a method for cross-language knowledge searching when executed by at least one processor, the method comprising:
accessing at least one knowledge base comprising sources in a source language and sources in a target language and a bilingual dictionary stored in at least one storage device;
building the bilingual dictionary using parallel corpora, including:
for each sentence in a first source in the source language, generating a first source semantic index in the source language;
for each sentence in a second source in the target language, generating a second source semantic index in the target language, where the second source is a translation of the first source and each first source semantic index and corresponding second source semantic index form parallel semantic indexes having parallel eSAO component pairs;
recognizing semantic components in a user request received in a source language;
generating a first semantic index in the source language from the semantic components, the first semantic index including first lexical units, at least one lexical unit comprising a word with a part of speech (POS) tag;
translating the first semantic index into a second semantic index in a target language using a bilingual dictionary of actions and concepts, including translating the first lexical units into second lexical units in the target language, and translating a first word from the first semantic index into corresponding words in the target language and tagging each of the corresponding words with a POS tag of the first word; and
retrieving information relevant to the user request from the knowledge base that includes semantically indexed information in the target language, when the first and second semantic indexes match a subset of semantic indexes of the knowledge base associated with the information.
16. A computerized cross-language knowledge search system, comprising:
a bilingual dictionary builder that uses parallel corpora, comprising:
a first semantic analyzer configured to generate, for each sentence in a first document in a first language, a first document semantic index including eSAO components in the first language;
a second semantic analyzer configured to generate, for each sentence in a second document in a second language, a second document semantic index including eSAO components in the second language, where the second document is a translation of the first document and each first document semantic index and corresponding second document semantic index form parallel semantic indexes having parallel eSAO component pairs;
the first semantic analyzer also configured to recognize semantic components in a user request received in the first language;
a request pattern index generator configured to generate a first semantic index in the first language from the semantic components of the user request, the first semantic index including first lexical units, at least one lexical unit comprising a word with a part of speech (POS) tag;
a request pattern translator that accesses a bilingual dictionary of actions and concepts to translate the first semantic index into a second semantic index in the second language, including translating the first lexical units into second lexical units in the second language, and translating a first word from the first semantic index into corresponding words in the second language and tagging each of the corresponding words with a POS tag of the first word; and
a knowledge base searcher configured to retrieve information from a knowledge base that includes semantically indexed information in the second language, when the first and second semantic indexes match a subset of semantic indexes of the knowledge base associated with the information.
Dependent claims: 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28.
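The way the components named in claim 16 hand data to one another can be sketched as follows. This is an illustration under stated assumptions, not the claimed system: the class names mirror the claim language, but every method name is hypothetical, the analyzer is a toy lexicon lookup over whitespace tokens, the bilingual dictionary is supplied ready-made (the dictionary builder is omitted), and a knowledge base entry is taken to match when each translated lexical unit has at least one candidate present in the entry's index.

```python
from dataclasses import dataclass


class SemanticAnalyzer:
    """Toy analyzer: tags whitespace tokens found in a small lexicon."""

    def __init__(self, lexicon):
        self.lexicon = lexicon  # word -> POS tag

    def analyze(self, text):
        tokens = text.lower().split()
        return [(w, self.lexicon[w]) for w in tokens if w in self.lexicon]


class RequestPatternIndexGenerator:
    def generate(self, components):
        # Drop repeated lexical units while keeping their order.
        return list(dict.fromkeys(components))


class RequestPatternTranslator:
    def __init__(self, dictionary):
        self.dictionary = dictionary  # first-language word -> translations

    def translate(self, index):
        # One candidate set per source unit; candidates keep the source POS.
        return [{(t, pos) for t in self.dictionary.get(word, ())}
                for word, pos in index]


class KnowledgeBaseSearcher:
    def __init__(self, entries):
        self.entries = entries  # list of (frozenset of (word, POS), info)

    def search(self, candidate_sets):
        return [info for entry_index, info in self.entries
                if candidate_sets
                and all(c & entry_index for c in candidate_sets)]


@dataclass
class CrossLanguageSearchSystem:
    analyzer: SemanticAnalyzer
    generator: RequestPatternIndexGenerator
    translator: RequestPatternTranslator
    searcher: KnowledgeBaseSearcher

    def search(self, request):
        components = self.analyzer.analyze(request)
        first_index = self.generator.generate(components)
        second_index = self.translator.translate(first_index)
        return self.searcher.search(second_index)
```

Keeping each component behind its own small interface reflects the claim's separation of analysis, index generation, translation and search; any one of them could be replaced without touching the others.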
Specification