Bootstrapping named entity canonicalizers from English using alignment models
Abstract
Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for training recognition of canonical representations corresponding to named-entity phrases in a second natural language. The training is based on translating, from a first natural language, a set of allowable expressions that have canonical representations. These allowable expressions may be generated by expanding a context-free grammar for the allowable expressions in the first natural language.
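The abstract says the allowable expressions may be generated by expanding a context-free grammar. The following is a minimal sketch of that step. It assumes a toy, non-recursive English date grammar in which each terminal carries a canonical fragment and each rule combines its children's fragments. The grammar format, the symbol names (`DATE`, `MONTH`, `DAY`, `DAY_OF`), the `--MM-DD` canonical format, and the `expand` helper are all hypothetical illustrations, not the patent's own representation.

```python
from itertools import product
from typing import Callable, Dict, List, Tuple, Union

# Hypothetical grammar format (an assumption, not taken from the patent):
# - a terminal alternative is a (surface text, canonical value) pair;
# - a rule alternative is (list of nonterminals, combine function), where the
#   combine function builds a canonical value from the children's values.
Terminal = Tuple[str, str]
Rule = Tuple[List[str], Callable[..., str]]
Grammar = Dict[str, List[Union[Terminal, Rule]]]

# Toy English date grammar; canonical values use an assumed "--MM-DD" format.
GRAMMAR: Grammar = {
    "DATE": [
        (["MONTH", "DAY"], lambda m, d: f"--{m}-{d}"),     # "january first"
        (["DAY_OF", "MONTH"], lambda d, m: f"--{m}-{d}"),  # "the first of january"
    ],
    "MONTH": [("january", "01"), ("february", "02")],
    "DAY": [("first", "01"), ("second", "02")],
    "DAY_OF": [("the first of", "01"), ("the second of", "02")],
}


def expand(grammar: Grammar, symbol: str) -> List[Tuple[str, str]]:
    """Enumerate every (expression, canonical representation) pair derivable
    from `symbol`. Assumes the grammar is non-recursive, so expansion ends."""
    results: List[Tuple[str, str]] = []
    for alternative in grammar[symbol]:
        if isinstance(alternative[0], str):
            # Terminal: already a (surface text, canonical value) pair.
            results.append(alternative)
        else:
            children, combine = alternative
            # Expand each child, then take every combination of child expansions.
            child_expansions = [expand(grammar, child) for child in children]
            for combo in product(*child_expansions):
                surface = " ".join(text for text, _ in combo)
                canonical = combine(*(value for _, value in combo))
                results.append((surface, canonical))
    return results


if __name__ == "__main__":
    for expression, canonical in expand(GRAMMAR, "DATE"):
        print(f"{expression!r} -> {canonical}")
```

Expanding `DATE` yields eight pairs, starting with `'january first' -> --01-01`. Two different surface forms ("january first" and "the first of january") share one canonical representation, which is the property the later translation step relies on. A recursive grammar would need a depth limit, which this sketch omits.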
20 Claims
1. A computer-implemented method, the method comprising:
receiving a set of acceptable expressions, each acceptable expression being a string that identifies a value of a variable entity in a first natural language, each acceptable expression being associated with a canonical representation of the value identified by that expression;
performing, using a first machine translator that translates expressions from the first natural language to a second natural language, machine translation on each acceptable expression in the first natural language to obtain a translated expression of the acceptable expression in the second natural language;
associating the canonical representation associated with each acceptable expression with the corresponding translated expression in the second natural language;
providing a set of training data for training a second machine translator that translates expressions in the second natural language that each include a respective translated expression to expressions in the second natural language that each include a respective canonical representation, the set of training data comprising the translated expressions and the canonical representations that are associated with the translated expressions; and
using the second machine translator to translate a particular expression that includes a particular translated expression into a particular translated expression that includes a particular canonical representation.
Dependent claims: 2, 3, 4, 5, 6, 7.
8. A system comprising:
a data processing apparatus; and
a data store storing instructions executable by the data processing apparatus that upon execution by the data processing apparatus cause the data processing apparatus to perform operations comprising:
receiving a set of acceptable expressions, each acceptable expression being a string that identifies a value of a variable entity in a first natural language, each acceptable expression being associated with a canonical representation of the value identified by that expression;
performing, using a first machine translator that translates expressions from the first natural language to a second natural language, machine translation on each acceptable expression in the first natural language to obtain a translated expression of the acceptable expression in the second natural language;
associating the canonical representation associated with each acceptable expression with the corresponding translated expression in the second natural language;
providing a set of training data for training a second machine translator that translates expressions in the second natural language that each include a respective translated expression to expressions in the second natural language that each include a respective canonical representation, the set of training data comprising the translated expressions and the canonical representations that are associated with the translated expressions; and
using the second machine translator to translate a particular expression that includes a particular translated expression into a particular translated expression that includes a particular canonical representation.
Dependent claims: 9, 10, 11, 12, 13, 14.
15. A non-transitory computer storage medium encoded with a computer program, the program comprising instructions that when executed by a data processing apparatus cause the data processing apparatus to perform operations comprising:
receiving a set of acceptable expressions, each acceptable expression being a string that identifies a value of a variable entity in a first natural language, each acceptable expression being associated with a canonical representation of the value identified by that expression;
performing, using a first machine translator that translates expressions from the first natural language to a second natural language, machine translation on each acceptable expression in the first natural language to obtain a translated expression of the acceptable expression in the second natural language;
associating the canonical representation associated with each acceptable expression with the corresponding translated expression in the second natural language;
providing a set of training data for training a second machine translator that translates expressions in the second natural language that each include a respective translated expression to expressions in the second natural language that each include a respective canonical representation, the set of training data comprising the translated expressions and the canonical representations that are associated with the translated expressions; and
using the second machine translator to translate a particular expression that includes a particular translated expression into a particular translated expression that includes a particular canonical representation.
Dependent claims: 16, 17, 18, 19, 20.
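The steps recited in claim 1 (restated in system form in claim 8 and in medium form in claim 15) can be sketched as a small pipeline, under clearly labeled assumptions. The claims do not specify either translator. Here, a hypothetical toy English-to-Spanish dictionary (`toy_translate`) stands in for the first machine translator. An exact phrase table applied by whole-word, longest-phrase-first substitution (`PhraseCanonicalizer`) stands in for the second machine translator. This is a deliberate simplification, not the alignment-model-based system the title refers to. The data, the `--MM-DD` canonical format, and all function and class names are illustrative only.

```python
import re
from typing import Callable, Dict, List, Tuple

# Acceptable expressions in the first language with their canonical
# representations (toy data; the "--MM-DD" format is an assumption).
ACCEPTABLE: List[Tuple[str, str]] = [
    ("january first", "--01-01"),
    ("january second", "--01-02"),
    ("february first", "--02-01"),
]

# Hypothetical stand-in for the first machine translator (English -> Spanish).
# A real system would call an MT engine here.
_TOY_EN_ES: Dict[str, str] = {
    "january first": "primero de enero",
    "january second": "dos de enero",
    "february first": "primero de febrero",
}


def toy_translate(text: str) -> str:
    return _TOY_EN_ES.get(text, text)


def build_training_data(
    acceptable: List[Tuple[str, str]],
    translate: Callable[[str], str],
) -> List[Tuple[str, str]]:
    """Translate each acceptable expression and attach its canonical form,
    yielding (translated expression, canonical representation) pairs."""
    return [(translate(expression), canonical) for expression, canonical in acceptable]


class PhraseCanonicalizer:
    """Toy stand-in for the second machine translator: an exact phrase table
    built from the training pairs and applied by whole-word substitution.
    This is a simplification, not an alignment-model-based MT system."""

    def __init__(self, training_data: List[Tuple[str, str]]) -> None:
        self.table: Dict[str, str] = {}
        for translated, canonical in training_data:
            self.table.setdefault(translated, canonical)  # first pair wins
        # Try longer phrases first so a shorter phrase cannot pre-empt a longer one.
        self.phrases = sorted(self.table, key=len, reverse=True)

    def translate(self, sentence: str) -> str:
        """Replace the first known translated expression (whole words only)
        with its canonical representation; return other sentences unchanged."""
        for phrase in self.phrases:
            pattern = re.compile(r"\b" + re.escape(phrase) + r"\b")
            canonical = self.table[phrase]
            rewritten, count = pattern.subn(lambda _m: canonical, sentence, count=1)
            if count:
                return rewritten
        return sentence


if __name__ == "__main__":
    training = build_training_data(ACCEPTABLE, toy_translate)
    canonicalizer = PhraseCanonicalizer(training)
    print(canonicalizer.translate("la reunión es el dos de enero"))
```

Running the script prints `la reunión es el --01-02`. The translated expression "dos de enero", embedded in a longer second-language sentence, is rewritten to its canonical representation, mirroring the input/output relationship in the final step of the claim. A trained translation model could generalize to phrasings that are absent from the training pairs, whereas this lookup table cannot.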
Specification