Machine translation using vector space representations
Abstract
An embodiment of the present invention provides a method for automatically translating text. First, a conceptual representation space is generated based on source-language documents and target-language documents, wherein respective terms from the source-language and target-language documents have a representation in the conceptual representation space. Second, a new source-language document is represented in the conceptual representation space, wherein a subset of terms in the new source-language document is represented in the conceptual representation space, such that each term in the subset has a representation in the conceptual representation space. Then, a term in the new source-language document is automatically translated into a corresponding target-language term based on a similarity between the representation of the term and the representation of the corresponding target-language term.
40 Claims
1. A method for automatically translating text, comprising:
(a) generating a conceptual representation space based on source-language documents and target-language documents, wherein respective terms from the source-language documents and the target-language documents have a representation in the conceptual representation space;
(b) representing a new source-language document in the conceptual representation space, wherein a subset of terms in the new source-language document is represented in the conceptual representation space, such that each term in the subset has a representation in the conceptual representation space; and
(c) automatically translating a term in the new source-language document into a corresponding target-language term based on a similarity between the representation of the term and the representation of the corresponding target-language term.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11
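Steps (a) through (c) of claim 1 can be illustrated with a minimal sketch. It is not the patented implementation: the claim does not fix how the conceptual representation space is built, so the sketch assumes a very simple space in which each term is a vector of its occurrence counts across aligned document pairs, and it assumes cosine similarity as the similarity measure. The toy corpus and the names `PAIRS`, `build_space`, `cosine` and `translate_document` are hypothetical.

```python
import math

# Hypothetical toy corpus: each entry pairs a source-language document
# with its target-language translation.
PAIRS = [
    ("el gato negro", "the black cat"),
    ("el perro negro", "the black dog"),
    ("el gato come", "the cat eats"),
    ("el perro come", "the dog eats"),
]


def build_space(pairs):
    """Step (a): give every term a vector with one dimension per document pair.

    Assumption: raw occurrence counts stand in for whatever reduced
    conceptual space a real system would compute.
    """
    source, target = {}, {}
    for i, (src_doc, tgt_doc) in enumerate(pairs):
        for term in src_doc.split():
            source.setdefault(term, [0.0] * len(pairs))[i] += 1.0
        for term in tgt_doc.split():
            target.setdefault(term, [0.0] * len(pairs))[i] += 1.0
    return source, target


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def translate_document(document, source, target):
    """Steps (b) and (c): translate each term that has a representation."""
    output = []
    for term in document.split():
        if term in source:
            # Step (c): pick the target-language term whose vector is most similar.
            output.append(max(target, key=lambda t: cosine(source[term], target[t])))
        else:
            # Terms outside the represented subset are passed through unchanged.
            output.append(term)
    return output


SOURCE, TARGET = build_space(PAIRS)
print(translate_document("el gato azul", SOURCE, TARGET))  # ['the', 'cat', 'azul']
```

In this sketch "azul" never occurs in the training pairs, so it falls outside the subset of represented terms mentioned in step (b) and is left untranslated.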
12. A method for automatically translating text based on a disambiguation of text at a dictionary-level, comprising:
generating a conceptual representation space based on source-language documents and target-language documents;
providing a plurality of dictionaries;
generating a representation of each dictionary in the conceptual representation space;
representing a new source-language document in the conceptual representation space;
selecting a first dictionary from the collection of dictionaries based on a similarity between the representation of the first dictionary and the representation of the new source-language document; and
automatically translating at least one term in the new source-language document into a corresponding target-language term based on the first dictionary.
Dependent claims: 13, 14
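The dictionary-level disambiguation of claim 12 can be sketched as follows. This is an illustration under stated assumptions, not the claimed implementation: terms are represented by occurrence counts across aligned document pairs, a document or dictionary is represented by the sum of the vectors of its known terms, and cosine similarity selects the dictionary. The corpus, the two dictionaries and all function names are hypothetical.

```python
import math

# Hypothetical aligned training pairs covering two subject areas.
PAIRS = [
    ("el banco presta dinero", "the bank lends money"),
    ("dinero en el banco", "money in the bank"),
    ("un banco en el parque", "a bench in the park"),
    ("el parque tiene un banco", "the park has a bench"),
]

# Hypothetical domain dictionaries that disagree on how to translate "banco".
DICTIONARIES = {
    "finance": {"banco": "bank", "dinero": "money"},
    "outdoors": {"banco": "bench", "parque": "park"},
}


def build_space(pairs):
    """One shared space: each term gets one dimension per aligned pair."""
    vectors = {}
    for i, pair in enumerate(pairs):
        for doc in pair:
            for term in doc.split():
                vectors.setdefault(term, [0.0] * len(pairs))[i] += 1.0
    return vectors


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def represent(terms, vectors):
    """Assumption: a text is the sum of the vectors of its known terms."""
    total = [0.0] * len(next(iter(vectors.values())))
    for term in terms:
        for i, x in enumerate(vectors.get(term, [])):
            total[i] += x
    return total


def select_dictionary(document, dictionaries, vectors):
    """Pick the dictionary whose representation is closest to the document."""
    doc_vec = represent(document.split(), vectors)

    def score(name):
        entries = dictionaries[name]
        dict_vec = represent(list(entries) + list(entries.values()), vectors)
        return cosine(doc_vec, dict_vec)

    return max(dictionaries, key=score)


def translate(document, dictionaries, vectors):
    """Translate terms using only the selected dictionary."""
    chosen = dictionaries[select_dictionary(document, dictionaries, vectors)]
    return [chosen.get(term, term) for term in document.split()]


SPACE = build_space(PAIRS)
print(translate("dinero del banco", DICTIONARIES, SPACE))  # ['money', 'del', 'bank']
print(translate("banco del parque", DICTIONARIES, SPACE))  # ['bench', 'del', 'park']
```

The same source term "banco" is rendered differently in the two calls because the surrounding terms pull the document representation toward a different dictionary.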
15. A method for producing a machine translation of a text passage based on a combination of a plurality of translations of the text passage, comprising:
(a) generating a conceptual representation space based on a collection of source-language documents and a collection of target-language documents;
(b) providing a plurality of translations of a text passage;
(c) generating a representation of each translation in the conceptual representation space; and
(d) automatically translating the text passage based on similarity comparisons among the representations of the translations.
Dependent claims: 16
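One possible reading of steps (a) through (d) of claim 15 is a consensus selection: each candidate translation is represented in the space, and the candidate most similar on average to the other candidates is chosen. The claim does not say that this particular comparison is used, so the selection rule below is an assumption, as are the count-based space, the cosine measure, the toy data and every function name.

```python
import math

# Hypothetical aligned training pairs used to build the space.
PAIRS = [
    ("el gato come pescado", "the cat eats fish"),
    ("el gato duerme", "the cat sleeps"),
    ("el banco presta dinero", "the bank lends money"),
    ("el banco cierra", "the bank closes"),
]

# Hypothetical outputs of several translation engines for one passage.
CANDIDATES = ["the cat eats fish", "cat eats fish", "the bank eats money"]


def build_space(pairs):
    """Step (a): one dimension per aligned pair, shared by both languages."""
    vectors = {}
    for i, pair in enumerate(pairs):
        for doc in pair:
            for term in doc.split():
                vectors.setdefault(term, [0.0] * len(pairs))[i] += 1.0
    return vectors


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def represent(text, vectors):
    """Step (c): sum the vectors of the terms that appear in the space."""
    total = [0.0] * len(next(iter(vectors.values())))
    for term in text.split():
        for i, x in enumerate(vectors.get(term, [])):
            total[i] += x
    return total


def select_consensus(candidates, vectors):
    """Step (d), as assumed here: keep the candidate closest to the others."""
    reps = [represent(c, vectors) for c in candidates]

    def mean_similarity(i):
        others = [cosine(reps[i], reps[j]) for j in range(len(reps)) if j != i]
        return sum(others) / len(others)

    best = max(range(len(candidates)), key=mean_similarity)
    return candidates[best]


SPACE = build_space(PAIRS)
print(select_consensus(CANDIDATES, SPACE))  # the cat eats fish
```

The candidate that mixes banking vocabulary into the sentence sits farthest from the other two in the space, so it is not selected.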
17. A method for generating a parallel corpus of documents, comprising:
(a) generating a conceptual representation space based on a collection of source-language documents and a collection of target-language documents, wherein each target-language document in the collection of target-language documents comprises a translation of a source-language document in the collection of source-language documents;
(b) providing a new collection of documents, including both source-language documents and target-language documents;
(c) generating a representation of each document in the new collection of documents in the conceptual representation space;
(d) identifying a collection of parallel documents based on similarity comparisons among the representations in the conceptual representation space; and
(e) combining the collection of source-language documents and the collection of target-language documents with the collection of parallel documents identified in step (d) resulting in a combined collection of documents, and generating a new conceptual representation space based on the combined collection of documents, wherein the new conceptual representation space is stored in an electronic format.
Dependent claims: 18, 19
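Steps (a) through (e) of claim 17 can be sketched as a small mining loop. The sketch rests on several assumptions that the claim does not state: a count-based space, cosine similarity, a best-match rule with a hypothetical threshold of 0.9, new documents already separated by language, and JSON as the electronic storage format. All data and names are illustrative.

```python
import json
import math

# Hypothetical aligned training collection (step (a)).
TRAINING = [
    ("gato come", "cat eats"),
    ("perro duerme", "dog sleeps"),
    ("gato negro", "black cat"),
    ("perro blanco", "white dog"),
]

# Hypothetical new, unaligned collection (step (b)); the language of each
# document is assumed to be known already.
NEW_SOURCE = ["gato duerme", "perro come"]
NEW_TARGET = ["cat sleeps", "dog eats", "white cat"]


def build_space(pairs):
    """One dimension per aligned pair, shared by both languages."""
    vectors = {}
    for i, pair in enumerate(pairs):
        for doc in pair:
            for term in doc.split():
                vectors.setdefault(term, [0.0] * len(pairs))[i] += 1.0
    return vectors


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def represent(text, vectors):
    """Step (c): sum the vectors of the terms that appear in the space."""
    total = [0.0] * len(next(iter(vectors.values())))
    for term in text.split():
        for i, x in enumerate(vectors.get(term, [])):
            total[i] += x
    return total


def mine_parallel(new_source, new_target, vectors, threshold=0.9):
    """Step (d): pair each source document with its best target match."""
    found = []
    for src in new_source:
        src_vec = represent(src, vectors)
        scored = [(cosine(src_vec, represent(tgt, vectors)), tgt) for tgt in new_target]
        score, best = max(scored)
        if score >= threshold:  # hypothetical cut-off, not taken from the claim
            found.append((src, best))
    return found


space = build_space(TRAINING)
mined = mine_parallel(NEW_SOURCE, NEW_TARGET, space)

# Step (e): rebuild the space from the combined collection and serialize it.
new_space = build_space(TRAINING + mined)
stored = json.dumps(new_space)
print(mined)
```

After step (e) every term vector has six dimensions instead of four, because the two mined pairs are treated as additional aligned documents.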
20. A method for automatically translating text, comprising:
(a) generating a conceptual representation space based on source-language documents and target-language documents, wherein respective terms from the source-language documents and the target-language documents have a representation in the conceptual representation space;
(b) measuring a similarity between at least one pair of terms based on the representations of terms included in the at least one pair of terms, wherein the at least one pair of terms includes a term from at least one of the source-language documents and a term from at least one of the target-language documents;
(c) converting the similarity to an association probability; and
(d) using the association probability as an estimate of a parameter in a statistical translation algorithm.
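Steps (b) through (d) of claim 20 can be sketched as follows. The claim does not specify how a similarity becomes an association probability, so the sketch assumes the simplest conversion: negative similarities are clipped to zero and the remaining values are normalized over the target vocabulary. The resulting table is then used as a hypothetical starting estimate for a lexical translation parameter of the kind used by statistical word-alignment models. The count-based space, the toy corpus and all names are assumptions.

```python
import math

# Hypothetical aligned training pairs.
PAIRS = [
    ("el gato negro", "the black cat"),
    ("el perro negro", "the black dog"),
    ("el gato come", "the cat eats"),
    ("el perro come", "the dog eats"),
]


def build_space(pairs):
    """Step (a): one dimension per aligned pair for every term."""
    source, target = {}, {}
    for i, (src_doc, tgt_doc) in enumerate(pairs):
        for term in src_doc.split():
            source.setdefault(term, [0.0] * len(pairs))[i] += 1.0
        for term in tgt_doc.split():
            target.setdefault(term, [0.0] * len(pairs))[i] += 1.0
    return source, target


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def association_probabilities(term, source, target):
    """Steps (b) and (c): similarities to every target term, normalized.

    Assumption: clipping at zero and dividing by the sum is only one of many
    ways to map similarities onto a probability distribution.
    """
    sims = {t: max(cosine(source[term], vec), 0.0) for t, vec in target.items()}
    total = sum(sims.values())
    return {t: s / total for t, s in sims.items()} if total else {}


SOURCE, TARGET = build_space(PAIRS)

# Step (d): seed a table of lexical translation parameters, keyed by
# (source term, target term), that a statistical algorithm could refine.
t_table = {
    (s, t): p
    for s in SOURCE
    for t, p in association_probabilities(s, SOURCE, TARGET).items()
}

best = max(TARGET, key=lambda t: t_table[("gato", t)])
print(best)  # cat
```

Seeding the table this way gives the statistical algorithm a non-uniform starting point; whether that helps in practice depends on the algorithm and the data, which this sketch does not evaluate.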
21. A computer program product comprising a computer usable medium having control logic stored therein for automatically translating text, the control logic comprising:
computer readable first program code that causes the computer to generate a conceptual representation space based on source-language documents and target-language documents, wherein respective terms from the source-language documents and the target-language documents have a representation in the conceptual representation space;
computer readable second program code that causes the computer to represent a new source-language document in the conceptual representation space, wherein a subset of terms in the new source-language document is represented in the conceptual representation space, such that each term in the subset has a representation in the conceptual representation space; and
computer readable third program code that causes the computer to automatically translate a term in the new source-language document into a corresponding target-language term based on a similarity between the representation of the term and the representation of the corresponding target-language term.
Dependent claims: 22, 23, 24, 25, 26, 27, 28, 29, 30, 31
32. A computer program product comprising a computer usable medium having control logic stored therein for automatically translating text based on a disambiguation of text at a dictionary-level, the control logic comprising:
computer readable first program code that causes the computer to generate a conceptual representation space based on source-language documents and target-language documents;
computer readable second program code that causes the computer to provide a plurality of dictionaries;
computer readable third program code that causes the computer to generate a representation of each dictionary in the conceptual representation space;
computer readable fourth program code that causes the computer to represent a new source-language document in the conceptual representation space;
computer readable fifth program code that causes the computer to select a first dictionary from the collection of dictionaries based on a similarity between the representation of the first dictionary and the representation of the new source-language document; and
computer readable sixth program code that causes the computer to automatically translate a term in the new source-language document into a corresponding target-language term based on the first dictionary.
Dependent claims: 33, 34
35. A computer program product comprising a computer usable medium having control logic stored therein for producing a machine translation of a text passage based on a combination of a plurality of translations of the text passage, the control logic comprising:
computer readable first program code that causes the computer to generate a conceptual representation space based on a collection of source-language documents and a collection of target-language documents;
computer readable second program code that causes the computer to provide a plurality of translations of a text passage;
computer readable third program code that causes the computer to generate a representation of each translation in the conceptual representation space; and
computer readable fourth program code that causes the computer to automatically translate the text passage based on similarity comparisons among the representations of the translations.
Dependent claims: 36
37. A computer program product comprising a computer usable medium having control logic stored therein for generating a parallel corpus of documents, the control logic comprising:
computer readable first program code that causes the computer to generate a conceptual representation space based on a collection of source-language documents and a collection of target-language documents, wherein each target-language document in the collection of target-language documents comprises a translation of a source-language document in the collection of source-language documents;
computer readable second program code that causes the computer to provide a new collection of documents, including both source-language documents and target-language documents;
computer readable third program code that causes the computer to generate a representation of each document in the new collection of documents in the conceptual representation space;
computer readable fourth program code that causes the computer to identify a collection of parallel documents based on similarity comparisons among the representations in the conceptual representation space; and
computer readable fifth program code that causes the computer to combine the collection of source-language documents and the collection of target-language documents with the collection of parallel documents identified by the computer readable fourth program code resulting in a combined collection of documents, and that causes the computer to generate a new conceptual representation space based on the combined collection of documents, wherein the new conceptual representation space is stored in an electronic format.
Dependent claims: 38, 39
40. A computer program product comprising a computer usable medium having control logic stored therein for automatically translating text, the control logic comprising:
computer readable first program code that causes the computer to generate a conceptual representation space based on source-language documents and target-language documents, wherein respective terms from the source-language documents and the target-language documents have a representation in the conceptual representation space;
computer readable second program code that causes the computer to measure a similarity between at least one pair of terms based on the representations of terms included in the at least one pair of terms, wherein the at least one pair of terms includes a term from at least one of the source-language documents and a term from at least one of the target-language documents;
computer readable third program code that causes the computer to convert the similarity to an association probability; and
computer readable fourth program code that causes the computer to use the association probability as an estimate of a parameter in a statistical translation algorithm.
Specification