LANGUAGE TRANSLATION OF VISUAL AND AUDIO INPUT
First Claim
1. A system configured to translate, comprising:
a visual capture component configured to receive visual input of a target scene;
a visual analysis component configured to analyze the visual input to identify one or more locations within the visual input that comprise a textual element associated with a first language;
a text translator component configured to translate the textual element into a translated textual element associated with a second language based at least in part on a first contextual hint;
a visual rendering component configured to:
substitute the translated textual element for the textual element in an image of the target scene or add the translated textual element to the textual element in the image; and
display the image of the target scene comprising at least one of the substituted or added translated textual element; and
an audio capture component that is voice-activated and is configured to:
receive audio input associated with the first language;
translate the audio input into translated audio associated with the second language based at least in part on:
hidden Markov model based speech synthesis;
one or more pauses comprised within the audio input;
a sentence structure associated with at least some of the audio input;
a number of syllables of the audio input; and
a second contextual hint determined based at least in part on the visual input; and
play the translated audio via a speaker.
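The visual path of claim 1 (locate a textual element, translate it using a contextual hint, then substitute or add the translation) can be pictured as a small pipeline. This is a minimal sketch under stated assumptions, not the patented implementation: the `TextRegion` type, the `translate_text` and `render` helpers, and the tiny phrase table are hypothetical stand-ins for a text-detection engine and a machine-translation service, and the "image" is modeled as a list of located text regions rather than pixels.

```python
from dataclasses import dataclass, replace
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class TextRegion:
    """A located textual element: bounding box (x, y, w, h) plus its text.
    Hypothetical stand-in for the output of a text-detection step."""
    box: Tuple[int, int, int, int]
    text: str


# Hypothetical phrase table keyed by (source text, contextual hint).
# A real system would call a translation service instead.
PHRASES: Dict[Tuple[str, str], str] = {
    ("salida", "transit"): "exit",
    ("salida", "sports"): "start",
    ("salida", ""): "exit",  # default when no hint-specific entry exists
}


def translate_text(text: str, hint: str) -> str:
    """Translate one textual element, preferring a hint-specific entry.
    Unknown text is returned unchanged."""
    key = text.lower()
    return PHRASES.get((key, hint), PHRASES.get((key, ""), text))


def render(regions: List[TextRegion], hint: str,
           mode: str = "substitute") -> List[TextRegion]:
    """Substitute the translation for the original text, or add it alongside."""
    out = []
    for region in regions:
        translated = translate_text(region.text, hint)
        if mode == "substitute":
            out.append(replace(region, text=translated))
        elif mode == "add":
            out.append(replace(region, text=f"{region.text} ({translated})"))
        else:
            raise ValueError(f"unknown mode: {mode!r}")
    return out
```

For example, `render([TextRegion((10, 20, 80, 30), "Salida")], hint="transit")` keeps the bounding box and replaces the text with `"exit"`, while `mode="add"` yields `"Salida (exit)"`, mirroring the claim's "substitute ... or add" alternative. Drawing the result onto actual pixels for display is left out of the sketch.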
Abstract
The present translation system translates visual input and/or audio input from one language into another language. Some implementations incorporate a context-based translation that uses information obtained from visual input or audio input to aid in the translation of the other input. Other implementations combine the visual and audio translations. The translation system includes visual components and/or audio components. The visual components analyze visual input, which represents a captured image of a target scene, to identify a textual element and translate the textual element into a translated textual element. The visual components may further substitute the translated textual element for the textual element in the captured image. The audio components convert audio input into translated audio.
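The context-based translation described in the abstract, where information from one input aids translation of the other, might look like this in a minimal form. Assumptions: the sense inventory, its keyword sets, and the function names are hypothetical, and scoring candidate translations by keyword overlap with words recognized in the other modality (for example, a transcribed utterance) is a simple stand-in technique, not the method disclosed in the patent.

```python
from typing import Dict, Iterable, Set

# Hypothetical sense inventory: source word -> {translation: context keywords}.
SENSES: Dict[str, Dict[str, Set[str]]] = {
    "banco": {
        "bank": {"money", "account", "cash", "deposit"},
        "bench": {"park", "sit", "garden", "seat"},
    },
}


def contextual_hint(other_modality_words: Iterable[str]) -> Set[str]:
    """Normalize words recognized in the other input into a hint set."""
    return {w.strip(".,!?").lower() for w in other_modality_words}


def translate_with_hint(word: str, hint: Set[str]) -> str:
    """Pick the translation whose keywords overlap the hint the most.
    Ties (including no overlap at all) fall back to the first listed sense."""
    senses = SENSES.get(word.lower())
    if not senses:
        return word  # unknown word: leave untranslated
    return max(senses, key=lambda t: len(senses[t] & hint))
```

Here a sign reading "Banco" seen while the user says "I need to sit in the park." resolves to `"bench"`, because the hint shares "sit" and "park" with that sense; with an empty hint the first sense, `"bank"`, is returned.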
20 Claims
11. A computer-readable media comprising computer-executable instructions that when executed, perform a method comprising:
receiving visual input of a target scene;
analyzing the visual input to identify one or more locations within the visual input that comprise a textual element associated with a first language;
translating the textual element into a translated textual element associated with a second language based at least in part on a first contextual hint determined based at least in part on an audio input; and
translating the audio input into translated audio associated with the second language based at least in part on:
hidden Markov model based speech synthesis;
one or more pauses comprised within the audio input;
a sentence structure associated with at least some of the audio input;
a number of syllables of the audio input; and
a second contextual hint determined based at least in part on the visual input.
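Claims 11 and 18 list pauses within the audio input and the number of syllables among the cues used for the audio translation. One way such cues could be extracted is sketched below, assuming a speech recognizer that emits time-ordered words with start and end times. The `Word` type, the 0.5-second pause threshold, and the vowel-group syllable heuristic are hypothetical choices for illustration, not taken from the patent.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass
class Word:
    """A recognized word with start/end times in seconds (hypothetical ASR output)."""
    text: str
    start: float
    end: float


def split_on_pauses(words: List[Word], min_pause: float = 0.5) -> List[List[Word]]:
    """Group time-ordered words into segments, starting a new segment
    whenever the silent gap before a word is at least min_pause seconds."""
    segments: List[List[Word]] = []
    for word in words:
        if segments and word.start - segments[-1][-1].end < min_pause:
            segments[-1].append(word)
        else:
            segments.append([word])
    return segments


def count_syllables(text: str) -> int:
    """Rough syllable estimate: vowel groups per word, at least 1 per word."""
    total = 0
    for token in re.findall(r"[a-záéíóúüñ]+", text.lower()):
        total += max(1, len(re.findall(r"[aeiouyáéíóúü]+", token)))
    return total
```

With words "donde" (0.0 to 0.4 s), "esta" (0.45 to 0.8 s), "la" (1.6 to 1.7 s) and "salida" (1.75 to 2.2 s), the 0.8-second gap splits the input into two segments, and `count_syllables("donde esta la salida")` estimates 8 syllables. A vowel-group count is only an approximation; a production system would more likely derive syllables from the recognizer's phone sequence.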
18. A method, comprising:
receiving visual input of a target scene, the visual input representative of a captured image of the target scene;
analyzing the visual input to identify one or more locations within the visual input that comprise a textual element associated with a first language;
translating the textual element into a translated textual element associated with a second language based at least in part on a first contextual hint determined based at least in part on an audio input; and
translating the audio input into a transcribed audio segment based at least in part on at least one of:
hidden Markov model based speech synthesis;
one or more pauses comprised within the audio input;
a sentence structure associated with at least some of the audio input;
a number of syllables of the audio input; or
a second contextual hint determined based at least in part on the visual input.
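All three independent claims cite a sentence structure associated with the audio input as one basis for processing it. As a hedged illustration of why structure matters, the sketch below applies one structural rule when moving from Spanish to English word order: a noun followed by an adjective is swapped to adjective-noun order. It assumes the input has already been glossed word by word and tagged with parts of speech; the `Tagged` alias, the tag names, and the single rule are hypothetical simplifications, not the patent's method.

```python
from typing import List, Tuple

Tagged = List[Tuple[str, str]]  # (word, part-of-speech tag); hypothetical format


def reorder_noun_adjective(tokens: Tagged) -> Tagged:
    """Swap each NOUN ADJ pair into ADJ NOUN order (Spanish -> English style)."""
    out = list(tokens)
    i = 0
    while i < len(out) - 1:
        if out[i][1] == "NOUN" and out[i + 1][1] == "ADJ":
            out[i], out[i + 1] = out[i + 1], out[i]
            i += 2  # skip past the swapped pair
        else:
            i += 1
    return out
```

A word-by-word gloss of "el coche rojo", `[("the", "DET"), ("car", "NOUN"), ("red", "ADJ")]`, becomes `[("the", "DET"), ("red", "ADJ"), ("car", "NOUN")]`, that is, "the red car".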
Specification