Named entity translation
Abstract
Translating named entities from a source language to a target language. In general, in one implementation, the technique includes: generating potential translations of a named entity from a source language to a target language using a pronunciation-based and spelling-based transliteration model, searching a monolingual resource in the target language for information relating to usage frequency, and providing output including at least one of the potential translations based on the usage frequency information.
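The re-ranking idea in the abstract can be illustrated with a minimal sketch. It assumes each candidate already carries a transliteration score and that usage frequency is available as a plain count per candidate. The function name `rerank_by_usage`, the dictionary standing in for the monolingual resource, the multiplicative weighting, and the sample names and numbers are all illustrative assumptions, not details taken from the patent.

```python
from typing import Dict, List, Tuple


def rerank_by_usage(
    candidates: Dict[str, float],
    usage_counts: Dict[str, int],
) -> List[Tuple[str, float]]:
    """Re-rank candidates by transliteration score weighted by relative usage.

    candidates: candidate translation -> transliteration score (hypothetical).
    usage_counts: candidate -> occurrence count in a target-language resource
        (a stand-in for a monolingual corpus or search hit counts).
    """
    total = sum(usage_counts.get(c, 0) for c in candidates)
    if total == 0:
        # No usage evidence: fall back to the transliteration scores alone.
        return sorted(candidates.items(), key=lambda kv: kv[1], reverse=True)
    rescored = {
        c: score * (usage_counts.get(c, 0) / total)
        for c, score in candidates.items()
    }
    return sorted(rescored.items(), key=lambda kv: kv[1], reverse=True)


# Made-up candidates and counts, for illustration only.
candidates = {"Bill Clinton": 0.30, "Bell Clinton": 0.35, "Bill Klinton": 0.25}
usage_counts = {"Bill Clinton": 9000, "Bell Clinton": 10, "Bill Klinton": 40}
ranked = rerank_by_usage(candidates, usage_counts)
best = ranked[0][0]
```

In this sketch the candidate with the highest raw transliteration score loses to the spelling that is actually attested in the target-language resource, which is the effect the usage-frequency step is meant to achieve.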
116 Citations
27 Claims
1. A method comprising:
obtaining a named entity from text input of a source language;
generating potential translations of the named entity from the source language to a target language using a pronunciation-based and spelling-based transliteration model using a first probabilistic model to generate words in the target language and first transliteration scores for the words based on language pronunciation characteristics, using a second probabilistic model to generate second transliteration scores for the words based on a mapping of letter sequences from the target language into the source language, and combining the first transliteration scores and the second transliteration scores into third transliteration scores for the words;
searching a monolingual resource in the target language for information relating to usage frequency; and
providing output comprising at least one of the potential translations based on the usage frequency information.
Dependent claims: 2, 3
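The score combination recited in claim 1 can be sketched as follows. The claim does not say how the first and second transliteration scores are combined, so the linear interpolation and the weight `alpha` here are assumptions, as are the function name `combine_scores` and the sample scores.

```python
from typing import Callable, Dict


def combine_scores(
    phonetic_scores: Dict[str, float],
    spelling_score: Callable[[str], float],
    alpha: float = 0.5,
) -> Dict[str, float]:
    """Combine two transliteration scores per word into a third score.

    phonetic_scores: words generated by the first (pronunciation-based)
        model, with their first scores.
    spelling_score: second (spelling-based) model, scoring the same words.
    alpha: interpolation weight for the spelling score (assumed, not from
        the claim).
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    return {
        word: alpha * spelling_score(word) + (1.0 - alpha) * p
        for word, p in phonetic_scores.items()
    }


# Made-up scores, for illustration only.
phonetic = {"Clinton": 0.6, "Klinton": 0.4}
spelling_table = {"Clinton": 0.8, "Klinton": 0.1}
combined = combine_scores(
    phonetic, lambda w: spelling_table.get(w, 0.0), alpha=0.5
)
```

Note that only the first model generates words here; the second model rescores those same words, which mirrors the division of labor in the claim.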
4. A method comprising:
obtaining a named entity from text input of a source language by obtaining phrase boundaries of the named entity and by obtaining a category of the named entity;
generating potential translations of the named entity from the source language to a target language using a pronunciation-based and spelling-based transliteration model, and selectively using a bilingual resource based on the category of the named entity;
searching a monolingual resource in the target language for information relating to usage frequency; and
providing output comprising at least one of the potential translations based on the usage frequency information.
Dependent claims: 5, 6, 7
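The selective use of a bilingual resource in claim 4 might look like the sketch below. Which categories trigger a dictionary lookup is not stated in the claim; the set `DICTIONARY_CATEGORIES`, the category labels, the function `generate_candidates`, and the placeholder entity strings are all hypothetical.

```python
from typing import Callable, Dict, List

# Assumed policy: only these categories consult the bilingual resource.
DICTIONARY_CATEGORIES = {"ORGANIZATION", "LOCATION"}


def generate_candidates(
    entity: str,
    category: str,
    transliterate: Callable[[str], List[str]],
    bilingual_dict: Dict[str, List[str]],
) -> List[str]:
    """Generate candidates, adding dictionary entries only for some categories."""
    candidates = list(transliterate(entity))
    if category in DICTIONARY_CATEGORIES:
        for translation in bilingual_dict.get(entity, []):
            if translation not in candidates:
                candidates.append(translation)
    return candidates


# Placeholder source strings and a stub transliterator, for illustration only.
stub_transliterate = lambda e: [e.lower()]
stub_dict = {"SRC_ORG": ["United Nations"], "SRC_PERSON": ["Some Word"]}

org = generate_candidates("SRC_ORG", "ORGANIZATION", stub_transliterate, stub_dict)
person = generate_candidates("SRC_PERSON", "PERSON", stub_transliterate, stub_dict)
```

With this assumed policy, the person name is transliterated only, while the organization name also picks up its dictionary translation.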
8. A method comprising:
obtaining a named entity from text input of a source language;
generating potential translations of the named entity from the source language to a target language using a pronunciation-based and spelling-based transliteration model;
searching a monolingual resource in the target language for information relating to usage frequency; and
providing output comprising at least one of the potential translations based on the usage frequency information and adjusting probability scores of the potential translations based on the usage frequency, wherein adjusting the probability scores comprises comparing the named entity with other named entities of a common type in the text input and, if the named entity is a sub-phrase of one of the other named entities, adjusting the probability scores based on normalized full-phrase hit counts corresponding to the one other named entity.
Dependent claims: 9, 10, 11, 12, 13, 14, 15, 16
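The sub-phrase adjustment in claim 8 can be sketched under some simplifying assumptions: entities are token lists, hit counts for the full-phrase candidates are already known, and a candidate for the short entity is weighted by the normalized hit counts of the full-phrase candidates that contain it. The helper names, the weighting rule, and the sample data are hypothetical and are not taken from the claim.

```python
from typing import Dict, Sequence


def is_sub_phrase(short: Sequence[str], full: Sequence[str]) -> bool:
    """True if `short` occurs as a contiguous, strictly shorter run in `full`."""
    n = len(short)
    return 0 < n < len(full) and any(
        list(full[i:i + n]) == list(short) for i in range(len(full) - n + 1)
    )


def normalize(hit_counts: Dict[str, int]) -> Dict[str, float]:
    """Scale hit counts so they sum to 1 (all zeros if there are no hits)."""
    total = sum(hit_counts.values())
    return {k: (v / total if total else 0.0) for k, v in hit_counts.items()}


def adjust_with_full_phrase(
    sub_candidates: Dict[str, float],
    full_hit_counts: Dict[str, int],
) -> Dict[str, float]:
    """Weight each short-entity candidate by the normalized hit counts of
    the full-phrase candidates that contain it (assumed weighting rule)."""
    norm = normalize(full_hit_counts)
    adjusted = {}
    for cand, score in sub_candidates.items():
        support = sum(
            weight
            for phrase, weight in norm.items()
            if is_sub_phrase(cand.split(), phrase.split())
        )
        adjusted[cand] = score * support
    return adjusted


# Placeholder source tokens and made-up counts, for illustration only.
short_entity = ["S1"]
other_entity = ["S0", "S1"]
adjusted = {}
if is_sub_phrase(short_entity, other_entity):
    adjusted = adjust_with_full_phrase(
        {"Clinton": 0.5, "Klinton": 0.5},
        {"Bill Clinton": 900, "Bill Klinton": 100},
    )
```

Two candidates that tie on their own are separated once the evidence for the longer mention of the same entity is taken into account.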
17. A method comprising:
obtaining a named entity from text input of a source language;
identifying contextual information in the text input;
generating potential translations of the named entity from the source language to a target language using a pronunciation-based and spelling-based transliteration model by discovering documents in the target language that include the contextual information, identifying named entities in the documents, generating transliteration scores for the named entities in the documents, in relation to the named entity in the text input, using a probabilistic model that uses language pronunciation characteristics and a mapping of letter sequences from the target language into the source language, and adding the scored named entities to the potential translations;
searching a monolingual resource in the target language for information relating to usage frequency; and
providing output comprising at least one of the potential translations based on the usage frequency information.
18. A method comprising:
obtaining a named entity from text input of a source language;
generating potential translations of the named entity from the source language to a target language using a pronunciation-based and spelling-based transliteration model by generating phrases in the target language and corresponding transliteration scores with a probabilistic model that uses language pronunciation characteristics and a mapping of letter sequences from the target language into the source language, the potential translations comprising the scored phrases, identifying sub-phrases in the generated phrases, discovering documents in the target language using the sub-phrases, identifying, in the discovered documents, named entities that include one or more of the sub-phrases, generating transliteration scores for the identified named entities in the discovered documents using the probabilistic model, and adding the scored named entities to the potential translations;
searching a monolingual resource in the target language for information relating to usage frequency; and
providing output comprising at least one of the potential translations based on the usage frequency information.
19. A system comprising:
an input/output (I/O) system comprising a network interface configured to provide access to a monolingual resource;
a potential translations generator coupled with the I/O system, the potential translations generator incorporating a combined pronunciation-based and spelling-based transliteration model used to generate translation candidates for a named entity;
a re-ranker module configured to adjust scores of the translation candidates based on usage frequency information discovered in the monolingual resource using the network interface; and
a bilingual resource, wherein the potential translations generator selectively uses the bilingual resource based on a category of the named entity.
Dependent claims: 20, 21, 22, 23
24. A system comprising:
an input/output (I/O) system; and
a potential translations generator coupled with the I/O system, the potential translations generator incorporating a combined pronunciation-based and spelling-based transliteration model used to generate translation candidates for a named entity based at least in part on sub-phrases identified in an initial set of translation candidates.
Dependent claims: 25
26. A system comprising:
means for generating potential translations of a named entity from a source language to a target language using spelling-based transliteration, the means for generating comprising means for selectively using a bilingual dictionary and a news corpus; and
means for adjusting probability scores of the generated potential translations based on usage frequency information discovered in a monolingual resource.
Dependent claims: 27
Specification