Universal translation
Abstract
A likely source language of a media item can be identified by attempting an initial language identification of the media item based on intrinsic or extrinsic factors, such as words in the media item and languages known by the media item author. This initial identification can generate a list of most likely source languages with corresponding likelihood factors. Translations can then be performed presuming each of the most likely source languages. The translations can be performed for multiple output languages. Each resulting translation can receive a corresponding score based on a number of factors. The scores can be combined where they have a common source language. These combined scores can be used to weight the previously identified likelihood factors for the source languages of the media item.
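The combination and weighting steps described in the abstract can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not say how scores are combined or how they weight the likelihood factors, so the sketch assumes a simple mean per source language and a multiplicative weighting followed by normalization. The function name `reweight_source_languages` and the input shapes are hypothetical.

```python
from collections import defaultdict


def reweight_source_languages(likelihoods, translation_scores):
    """Weight initial likelihood factors by combined translation scores.

    likelihoods: {source_lang: initial likelihood factor}
    translation_scores: {(source_lang, output_lang): score in [0, 1]}
    Both shapes are assumptions made for this sketch.
    """
    # Group the translation scores that share a common source language.
    by_source = defaultdict(list)
    for (source, _output), score in translation_scores.items():
        by_source[source].append(score)

    weighted = {}
    for source, prior in likelihoods.items():
        scores = by_source.get(source)
        # Assumption: combine scores with a plain mean; no score means 0.
        combined = sum(scores) / len(scores) if scores else 0.0
        # Assumption: the combined score scales the initial likelihood.
        weighted[source] = prior * combined

    total = sum(weighted.values())
    if total == 0:
        # No usable scores: fall back to the initial likelihood factors.
        return dict(likelihoods)
    return {source: w / total for source, w in weighted.items()}


initial = {"es": 0.5, "pt": 0.4, "it": 0.1}
scores = {
    ("es", "en"): 0.6, ("es", "fr"): 0.4,
    ("pt", "en"): 0.9, ("pt", "fr"): 0.7,
    ("it", "en"): 0.2,
}
updated = reweight_source_languages(initial, scores)
best = max(updated, key=updated.get)
```

In this made-up example the initial identification favors Spanish, but the translations that presume Portuguese score higher across both output languages, so Portuguese ends up with the largest weighted factor.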
20 Claims
1. A method for identifying a most likely source language of a snippet, the method comprising:
receiving an indication of the snippet, wherein the snippet is a digital representation of words or character groups;
determining two or more possible source languages for the snippet;
generating, by one or more machine translation engines, two or more translations of the snippet, each translation of the snippet corresponding to one source language in the two or more possible source languages;
computing, by one or more translation scoring models trained using one or more neural networks, accuracy scores for at least two of the generated two or more translations of the snippet;
based on one or more of the computed accuracy scores, producing a confidence factor for each of at least two selected possible source languages, of the two or more possible source languages, for the snippet; and
selecting, as the most likely source language, the possible source language for the snippet that is associated with a highest confidence factor.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10.
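The flow of claim 1 can be illustrated with a small sketch in which the translation engine and the scoring model are passed in as plain callables. Everything named here is hypothetical: `most_likely_source_language`, `toy_translate`, and `toy_score` are stand-ins invented for illustration, the toy scorer is a word-coverage heuristic rather than a model trained with a neural network, and the accuracy score is used directly as the confidence factor, which the claim does not require.

```python
from typing import Callable, Dict, Iterable


def most_likely_source_language(
    snippet: str,
    candidates: Iterable[str],
    translate: Callable[[str, str], str],
    score: Callable[[str, str], float],
) -> str:
    """Pick the candidate source language with the highest confidence."""
    confidence: Dict[str, float] = {}
    for lang in candidates:
        # One translation per presumed source language.
        translation = translate(snippet, lang)
        # Assumption: the accuracy score doubles as the confidence factor.
        confidence[lang] = score(snippet, translation)
    if len(confidence) < 2:
        raise ValueError("need at least two possible source languages")
    return max(confidence, key=confidence.get)


# Hypothetical stand-ins for a translation engine and a scoring model.
TOY_LEXICONS = {
    "es": {"hola": "hello", "mundo": "world"},
    "it": {"ciao": "hello", "mondo": "world"},
}


def toy_translate(snippet: str, source_lang: str) -> str:
    # Word-by-word lookup; unknown words become "<unk>".
    lexicon = TOY_LEXICONS[source_lang]
    return " ".join(lexicon.get(w, "<unk>") for w in snippet.lower().split())


def toy_score(snippet: str, translation: str) -> float:
    # Fraction of words the toy engine managed to translate.
    words = translation.split()
    if not words:
        return 0.0
    return 1.0 - words.count("<unk>") / len(words)


result = most_likely_source_language(
    "hola mundo", ["es", "it"], toy_translate, toy_score
)
```

Passing the engine and scorer as callables keeps the selection logic independent of any particular translation system, so a real engine and a trained scoring model could be substituted without changing the selection step.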
11. A non-transitory computer-readable storage medium storing instructions that, when executed by a computing system, cause the computing system to perform operations for identifying a most likely source language of a snippet, the operations comprising:
receiving an indication of the snippet, wherein the snippet is a digital representation of words or character groups;
determining two or more possible source languages for the snippet;
generating, by one or more machine translation engines, two or more translations of the snippet, each translation of the snippet corresponding to one source language in the two or more possible source languages;
computing, by one or more trained translation scoring models, accuracy scores for at least two of the generated two or more translations of the snippet;
based on one or more of the computed accuracy scores, producing a confidence factor for each of at least two selected possible source languages, of the two or more possible source languages, for the snippet; and
selecting, based on the confidence factor, one of the possible source languages for the snippet as the most likely source language.
Dependent claims: 12, 13, 14, 15.
16. A system for identifying a most likely source language of a snippet, the system comprising:
one or more processors;
an interface configured to receive an indication of the snippet, wherein the snippet is a digital representation of words or character groups; and
a memory storing instructions that, when executed by the one or more processors, cause the system to perform operations comprising:
determining two or more possible source languages for the snippet;
generating, by one or more machine translation engines, two or more translations of the snippet, each translation of the snippet corresponding to one source language in the two or more possible source languages;
computing, by one or more trained translation scoring models, accuracy scores for at least two of the generated two or more translations of the snippet;
based on one or more of the computed accuracy scores, producing a confidence factor for each of at least two selected possible source languages, of the two or more possible source languages, for the snippet; and
selecting, based on the confidence factor, one of the possible source languages for the snippet as the most likely source language.
Dependent claims: 17, 18, 19, 20.
Specification