LANGUAGE TRANSLATION OF VISUAL AND AUDIO INPUT

US 20130338997A1
Filed: 08/20/2013
Published: 12/19/2013
Est. Priority Date: 03/29/2007
Status: Active Grant

Abstract

The present translation system translates visual input and/or audio input from one language into another language. Some implementations incorporate a context-based translation that uses information obtained from visual input or audio input to aid in the translation of the other input. Other implementations combine the visual and audio translation. The translation system includes visual components and/or audio components. The visual components analyze visual input to identify a textual element and translate the textual element into a translated textual element. The visual input represents a captured image of a target scene. The visual components may further substitute the translated textual element for the textual element in the captured image. The audio components convert audio input into translated audio.
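
The context-based translation described in the abstract can be illustrated with a minimal sketch, in which words recognized in one input (for example, the audio) select the sense used when translating an ambiguous word from the other input (for example, text in the image). The lexicon, the cue table, and the function names below are hypothetical and are not taken from the patent; a real system would presumably use a full translation engine rather than a lookup table.

```python
from typing import Dict, Iterable, Optional

# Hypothetical toy lexicon: source word -> {context domain: translation}.
# The "default" entry is used when no contextual hint applies.
LEXICON: Dict[str, Dict[str, str]] = {
    "banco": {"default": "bench", "finance": "bank"},
    "cuenta": {"default": "account", "restaurant": "bill"},
}

# Hypothetical cue words, observed in the other input, mapped to a domain.
DOMAIN_CUES: Dict[str, str] = {
    "dinero": "finance",
    "restaurante": "restaurant",
}


def contextual_hint(other_input_words: Iterable[str]) -> Optional[str]:
    """Return the first domain cued by words taken from the other input."""
    for word in other_input_words:
        domain = DOMAIN_CUES.get(word.lower())
        if domain is not None:
            return domain
    return None


def translate_word(word: str, hint: Optional[str]) -> str:
    """Translate one word, preferring the sense that matches the hint."""
    senses = LEXICON.get(word.lower())
    if senses is None:
        return word  # unknown words pass through unchanged
    if hint is not None and hint in senses:
        return senses[hint]
    return senses["default"]
```

In this sketch, a sign reading "banco" would be rendered as "bank" when the accompanying speech mentions "dinero", and as "bench" when no hint is available.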

20 Claims

1. A system configured to translate, comprising:
- a visual capture component configured to receive visual input of a target scene;
- a visual analysis component configured to analyze the visual input to identify one or more locations within the visual input that comprise a textual element associated with a first language;
- a text translator component configured to translate the textual element into a translated textual element associated with a second language based at least in part on a first contextual hint;
- a visual rendering component configured to:
  - substitute the translated textual element for the textual element in an image of the target scene or add the translated textual element to the textual element in the image; and
  - display the image of the target scene comprising at least one of the substituted or added translated textual element; and
- an audio capture component that is voice-activated and is configured to:
  - receive audio input associated with the first language;
  - translate the audio input into translated audio associated with the second language based at least in part on:
    - hidden Markov model based speech synthesis;
    - one or more pauses comprised within the audio input;
    - a sentence structure associated with at least some of the audio input;
    - a number of syllables of the audio input; and
    - a second contextual hint determined based at least in part on the visual input; and
  - play the translated audio via a speaker.

2. The system of claim 1, the audio capture component comprising a microphone.

3. The system of claim 1, the first contextual hint based at least in part on the translated audio.

4. The system of claim 1, the visual rendering component configured to display the translated textual element at a location corresponding to the textual element in the received visual input.

5. The system of claim 1, at least some of at least one of the visual capture component, the visual analysis component, the text translator component, the visual rendering component, or the audio capture component implemented at least in part via a remote computing component operatively coupled to the system.

6. The system of claim 5, the remote computing component coupled to the system via a network.

7. The system of claim 1, the visual capture component configured to receive the visual input from streaming media.

8. The system of claim 1, the audio capture component configured to play the translated audio concurrently with the display of the image of the target scene.

9. The system of claim 1, the audio capture component configured to capture audio input from a spoken communication.

10. The system of claim 1, the visual rendering component configured to display the translated textual element on a bottom of a display associated with the system.
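
The substitute-or-add behavior of the visual rendering component in claim 1, together with the placement at the location of the original text in claim 4, can be sketched as follows. This is an illustration only: the TextRegion type, the render_translations function, and the "substitute" and "add" mode names are hypothetical, and the text-location step (which a real system would perform with some form of text detection) is assumed to have already produced the regions.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class TextRegion:
    """A located textual element: bounding box in pixels plus its text."""
    x: int
    y: int
    width: int
    height: int
    text: str


def render_translations(
    regions: List[TextRegion],
    translate: Callable[[str], str],
    mode: str = "substitute",
) -> List[TextRegion]:
    """Return the text regions to draw on the displayed image.

    "substitute" replaces each region's text in place; "add" keeps the
    original region and places the translation directly below it.
    """
    if mode not in ("substitute", "add"):
        raise ValueError(f"unknown mode: {mode!r}")
    rendered: List[TextRegion] = []
    for region in regions:
        translated = translate(region.text)
        if mode == "substitute":
            # Same location and size as the original textual element.
            rendered.append(TextRegion(region.x, region.y, region.width,
                                       region.height, translated))
        else:
            rendered.append(region)
            # Assumed layout choice: stack the translation under the original.
            rendered.append(TextRegion(region.x, region.y + region.height,
                                       region.width, region.height, translated))
    return rendered
```

Returning a list of regions rather than drawing pixels keeps the sketch independent of any imaging library; an actual renderer would paint each returned region onto the captured image before display.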

11. A computer-readable media comprising computer-executable instructions that when executed, perform a method comprising:
- receiving visual input of a target scene;
- analyzing the visual input to identify one or more locations within the visual input that comprise a textual element associated with a first language;
- translating the textual element into a translated textual element associated with a second language based at least in part on a first contextual hint determined based at least in part on an audio input; and
- translating the audio input into translated audio associated with the second language based at least in part on:
  - hidden Markov model based speech synthesis;
  - one or more pauses comprised within the audio input;
  - a sentence structure associated with at least some of the audio input;
  - a number of syllables of the audio input; and
  - a second contextual hint determined based at least in part on the visual input.

12. The computer-readable media of claim 11, comprising translating the audio input based at least in part on a transcribed audio segment.

13. The computer-readable media of claim 12, at least one of the transcribed audio segment or the translated audio providing the first contextual hint.

14. The computer-readable media of claim 11, the visual input associated with one or more file types.

15. The computer-readable media of claim 11, the method performed by a kiosk-type device.

16. The computer-readable media of claim 15, the kiosk-type device configured to scan a printed version of a digital photograph to obtain the visual input.

17. The computer-readable media of claim 11, comprising:
- substituting the translated textual element for the textual element in an image of the visual input.
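
One of the factors recited in claim 11 is "one or more pauses comprised within the audio input". A minimal sketch of how pauses might be used is shown below: recognized words carrying start and end times are grouped into segments wherever the silence between two words reaches a threshold, so that each segment can be transcribed and translated as a unit. The timed-word format, the split_on_pauses function, and the 0.5 second default are assumptions made for illustration and do not come from the patent.

```python
from typing import List, Optional, Tuple

# Hypothetical input format: (word, start_seconds, end_seconds).
TimedWord = Tuple[str, float, float]


def split_on_pauses(words: List[TimedWord],
                    min_pause: float = 0.5) -> List[List[str]]:
    """Group words into segments, breaking where silence >= min_pause."""
    segments: List[List[str]] = []
    current: List[str] = []
    previous_end: Optional[float] = None
    for word, start, end in words:
        gap_reached = (previous_end is not None
                       and start - previous_end >= min_pause)
        if current and gap_reached:
            segments.append(current)  # close the segment before the pause
            current = []
        current.append(word)
        previous_end = end
    if current:
        segments.append(current)
    return segments
```

For example, four words spoken 0.05 seconds apart followed by a fifth word after a 0.8 second silence would yield two segments, the first with four words and the second with one.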

18. A method, comprising:
- receiving visual input of a target scene, the visual input representative of a captured image of the target scene;
- analyzing the visual input to identify one or more locations within the visual input that comprise a textual element associated with a first language;
- translating the textual element into a translated textual element associated with a second language based at least in part on a first contextual hint determined based at least in part on an audio input; and
- translating the audio input into a transcribed audio segment based at least in part on at least one of:
  - hidden Markov model based speech synthesis;
  - one or more pauses comprised within the audio input;
  - a sentence structure associated with at least some of the audio input;
  - a number of syllables of the audio input; or
  - a second contextual hint determined based at least in part on the visual input.

19. The method of claim 18, the visual input originating from a printed picture that is scanned.

20. The method of claim 18 performed by a kiosk-type device.

Current Assignee: Microsoft Technology Licensing LLC (Microsoft Corporation)
Original Assignee: Microsoft Corporation
Inventors: Boyd, Jonathan J.; Pathak, Binay K.

Granted Patent: US 9,298,704 B2
US Class Current: 704/3
CPC Class Codes:
G06F 40/40 Processing or translation o...
G06F 40/58 Use of machine translation,...
