Translating terms using numeric representations
First Claim
1. A method comprising:
maintaining data that associates each term in a vocabulary of terms in a first language with a respective high-dimensional representation of the term, wherein the high-dimensional representation of the term is a numeric representation of the term in a high-dimensional space, and wherein positions of high-dimensional representations of terms from the vocabulary of terms in the first language in the high-dimensional space reflect syntactic similarities, semantic similarities, or both between the terms from the vocabulary of terms in the first language;
maintaining data that associates each term in a vocabulary of terms in a second language with a respective high-dimensional representation of the term, wherein the high-dimensional representation of the term is a numeric representation of the term in the high-dimensional space, and wherein positions of high-dimensional representations of terms from the vocabulary of terms in the second language in the high-dimensional space reflect syntactic similarities, semantic similarities, or both between the terms from the vocabulary of terms in the second language;
receiving a first language term, wherein the first language term is a term from the vocabulary of terms in the first language; and
determining a translation into the second language of the first language term from the high-dimensional representation of the first language term and the high-dimensional representations of terms in the vocabulary of terms in the second language, wherein determining the translation into the second language of the first language term comprises:
identifying a high-dimensional representation of the first language term;
applying a transformation to the high-dimensional representation of the first language term to generate a transformed representation, wherein applying the transformation to the high-dimensional representation of the first language term comprises applying the transformation in accordance with trained values of a set of parameters, the trained values of the set of parameters having been determined through applying a machine learning training procedure on training terms in the first language and a respective translation of each of the training terms into the second language;
selecting, from the high-dimensional representations of the terms in the vocabulary of terms in the second language, a closest high-dimensional representation to the transformed representation; and
selecting the term in the second language that is associated with the closest high-dimensional representation as the translation into the second language of the first language term.
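The determining steps of claim 1 can be sketched in a few lines of Python with NumPy. This is a minimal illustration, not the patented implementation: the three-dimensional vectors, the toy English and Spanish vocabularies, the matrix `W`, and the function name `translate` are all hypothetical. A linear map stands in for the claimed transformation and cosine similarity is assumed as the measure of closeness; the claim specifies neither.

```python
import numpy as np

# Hypothetical first-language (English) vocabulary with made-up vectors.
EN_VECTORS = {
    "cat": np.array([1.0, 0.0, 0.0]),
    "dog": np.array([0.0, 1.0, 0.0]),
}

# Hypothetical second-language (Spanish) vocabulary; row i of ES_MATRIX
# is the representation of ES_TERMS[i].
ES_TERMS = ["gato", "perro", "casa"]
ES_MATRIX = np.array([
    [0.0, 1.0, 0.0],  # gato
    [1.0, 0.0, 0.0],  # perro
    [0.0, 0.0, 1.0],  # casa
])

# Assumed already-trained parameter values: a linear map that swaps
# the first two coordinates of a vector.
W = np.array([
    [0.0, 1.0, 0.0],
    [1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0],
])


def translate(term, src_vectors, weights, tgt_terms, tgt_matrix):
    """Return the second-language term closest to the transformed vector."""
    x = src_vectors[term]   # identify the representation of the term
    z = weights @ x         # apply the transformation
    # Cosine similarity between z and every second-language vector.
    sims = (tgt_matrix @ z) / (
        np.linalg.norm(tgt_matrix, axis=1) * np.linalg.norm(z)
    )
    # Select the term associated with the closest representation.
    return tgt_terms[int(np.argmax(sims))]
```

Under these toy values, `translate("cat", EN_VECTORS, W, ES_TERMS, ES_MATRIX)` maps the vector for "cat" onto the vector stored for "gato". Real vocabularies would use far more dimensions and an approximate nearest-neighbour index rather than a full scan.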
Abstract
Methods, systems, and apparatus, including computer programs encoded on computer storage media, for translating terms using numeric representations. One of the methods includes obtaining data that associates each term in a vocabulary of terms in a first language with a respective high-dimensional representation of the term; obtaining data that associates each term in a vocabulary of terms in a second language with a respective high-dimensional representation of the term; receiving a first language term; and determining a translation into the second language of the first language term from the high-dimensional representation of the first language term and the high-dimensional representations of terms in the vocabulary of terms in the second language.
58 Citations
16 Claims
1. A method comprising:
maintaining data that associates each term in a vocabulary of terms in a first language with a respective high-dimensional representation of the term, wherein the high-dimensional representation of the term is a numeric representation of the term in a high-dimensional space, and wherein positions of high-dimensional representations of terms from the vocabulary of terms in the first language in the high-dimensional space reflect syntactic similarities, semantic similarities, or both between the terms from the vocabulary of terms in the first language;
maintaining data that associates each term in a vocabulary of terms in a second language with a respective high-dimensional representation of the term, wherein the high-dimensional representation of the term is a numeric representation of the term in the high-dimensional space, and wherein positions of high-dimensional representations of terms from the vocabulary of terms in the second language in the high-dimensional space reflect syntactic similarities, semantic similarities, or both between the terms from the vocabulary of terms in the second language;
receiving a first language term, wherein the first language term is a term from the vocabulary of terms in the first language; and
determining a translation into the second language of the first language term from the high-dimensional representation of the first language term and the high-dimensional representations of terms in the vocabulary of terms in the second language, wherein determining the translation into the second language of the first language term comprises:
identifying a high-dimensional representation of the first language term;
applying a transformation to the high-dimensional representation of the first language term to generate a transformed representation, wherein applying the transformation to the high-dimensional representation of the first language term comprises applying the transformation in accordance with trained values of a set of parameters, the trained values of the set of parameters having been determined through applying a machine learning training procedure on training terms in the first language and a respective translation of each of the training terms into the second language;
selecting, from the high-dimensional representations of the terms in the vocabulary of terms in the second language, a closest high-dimensional representation to the transformed representation; and
selecting the term in the second language that is associated with the closest high-dimensional representation as the translation into the second language of the first language term.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11
12. A system comprising one or more computers and one or more storage devices storing instructions that when executed by the one or more computers cause the one or more computers to perform operations comprising:
maintaining data that associates each term in a vocabulary of terms in a first language with a respective high-dimensional representation of the term, wherein the high-dimensional representation of the term is a numeric representation of the term in a high-dimensional space, and wherein positions of high-dimensional representations of terms from the vocabulary of terms in the first language in the high-dimensional space reflect syntactic similarities, semantic similarities, or both between the terms from the vocabulary of terms in the first language;
maintaining data that associates each term in a vocabulary of terms in a second language with a respective high-dimensional representation of the term, wherein the high-dimensional representation of the term is a numeric representation of the term in the high-dimensional space, and wherein positions of high-dimensional representations of terms from the vocabulary of terms in the second language in the high-dimensional space reflect syntactic similarities, semantic similarities, or both between the terms from the vocabulary of terms in the second language;
receiving a first language term, wherein the first language term is a term from the vocabulary of terms in the first language; and
determining a translation into the second language of the first language term from the high-dimensional representation of the first language term and the high-dimensional representations of terms in the vocabulary of terms in the second language, wherein determining the translation into the second language of the first language term comprises:
identifying a high-dimensional representation of the first language term;
applying a transformation to the high-dimensional representation of the first language term to generate a transformed representation, wherein applying the transformation to the high-dimensional representation of the first language term comprises applying the transformation in accordance with trained values of a set of parameters, the trained values of the set of parameters having been determined through applying a machine learning training procedure on training terms in the first language and a respective translation of each of the training terms into the second language;
selecting, from the high-dimensional representations of the terms in the vocabulary of terms in the second language, a closest high-dimensional representation to the transformed representation; and
selecting the term in the second language that is associated with the closest high-dimensional representation as the translation into the second language of the first language term.
Dependent claims: 13, 14, 15
16. A computer program product encoded on one or more non-transitory storage media, the computer program product comprising instructions that when executed by one or more computers cause the one or more computers to perform operations comprising:
maintaining data that associates each term in a vocabulary of terms in a first language with a respective high-dimensional representation of the term, wherein the high-dimensional representation of the term is a numeric representation of the term in a high-dimensional space, and wherein positions of high-dimensional representations of terms from the vocabulary of terms in the first language in the high-dimensional space reflect syntactic similarities, semantic similarities, or both between the terms from the vocabulary of terms in the first language;
maintaining data that associates each term in a vocabulary of terms in a second language with a respective high-dimensional representation of the term, wherein the high-dimensional representation of the term is a numeric representation of the term in the high-dimensional space, and wherein positions of high-dimensional representations of terms from the vocabulary of terms in the second language in the high-dimensional space reflect syntactic similarities, semantic similarities, or both between the terms from the vocabulary of terms in the second language;
receiving a first language term, wherein the first language term is a term from the vocabulary of terms in the first language; and
determining a translation into the second language of the first language term from the high-dimensional representation of the first language term and the high-dimensional representations of terms in the vocabulary of terms in the second language, wherein determining the translation into the second language of the first language term comprises:
identifying a high-dimensional representation of the first language term;
applying a transformation to the high-dimensional representation of the first language term to generate a transformed representation, wherein applying the transformation to the high-dimensional representation of the first language term comprises applying the transformation in accordance with trained values of a set of parameters, the trained values of the set of parameters having been determined through applying a machine learning training procedure on training terms in the first language and a respective translation of each of the training terms into the second language;
selecting, from the high-dimensional representations of the terms in the vocabulary of terms in the second language, a closest high-dimensional representation to the transformed representation; and
selecting the term in the second language that is associated with the closest high-dimensional representation as the translation into the second language of the first language term.
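The claims require only that the parameter values come from a machine learning training procedure applied to training terms in the first language and their translations. One simple possibility, assumed here purely for illustration, is a least-squares fit of a linear map from first-language vectors to second-language vectors. The function name `fit_linear_map`, the random stand-in vectors, and the choice of least squares are assumptions, not details taken from the claims.

```python
import numpy as np


def fit_linear_map(src_vecs, tgt_vecs):
    """Fit W so that W @ src_vecs[i] is close to tgt_vecs[i] in the
    least-squares sense. Rows of src_vecs are training-term vectors in
    the first language; rows of tgt_vecs are the vectors of their
    translations in the second language."""
    # lstsq solves src_vecs @ A ~= tgt_vecs for A; W is the transpose of A.
    A, *_ = np.linalg.lstsq(src_vecs, tgt_vecs, rcond=None)
    return A.T


# Synthetic stand-in data: 20 training pairs in 4 dimensions, generated
# from a known map so that the fit can be checked against it.
rng = np.random.default_rng(0)
W_true = rng.normal(size=(4, 4))
X = rng.normal(size=(20, 4))   # first-language training vectors
Z = X @ W_true.T               # vectors of their translations
W_fit = fit_linear_map(X, Z)
```

Because the synthetic targets are an exact linear function of the inputs, `W_fit` recovers `W_true` up to floating-point error. With real embeddings the fit would only be approximate, and an iterative optimizer such as stochastic gradient descent could be used in place of the closed-form solve.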
Specification