
Language translation of visual and audio input

  • US 8,515,728 B2
  • Filed: 03/29/2007
  • Issued: 08/20/2013
  • Est. Priority Date: 03/29/2007
  • Status: Active Grant
First Claim

1. Glasses configured to translate, comprising:

  • a language selection component configured to determine at least one of a first language or a second language based at least in part on at least one of a selection by a user or a logic for automatically determining a language;

    a visual capture component configured to receive visual input of a target scene, the visual input originating from at least one of streaming video or captured video;

    a visual analysis component configured to analyze, using neural network based optical character recognition, the visual input to identify one or more locations within the visual input that comprise a textual element associated with the first language;

    a text translator component configured to translate the textual element into a translated textual element associated with the second language based at least in part on a first contextual hint determined based at least in part on an audio input associated with the first language;

    a visual rendering component configured to:

    at least one of substitute the translated textual element for the textual element in an image of the target scene or add the translated textual element to the textual element in the image; and

    display the image of the target scene comprising at least one of the substituted or added translated textual element to a user wearing the glasses; and

    an audio capture component that is voice-activated and is configured to:

    receive the audio input associated with the first language;

    translate the audio input into translated audio associated with the second language based at least in part on:

    hidden Markov model based speech synthesis;

    one or more pauses comprised within the audio input;

    a sentence structure associated with at least some of the audio input;

    a number of syllables of the audio input; and

    a second contextual hint determined based at least in part on the visual input; and

    play the translated audio to the user via a speaker.
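
The data flow recited in the claim (language selection, text regions found in the visual input, translation guided by a contextual hint derived from the audio input, then substitution or addition in the rendered image) can be pictured with a minimal Python sketch. Everything named here is an illustrative assumption and not part of the patent: the `TextRegion` type, the toy `LEXICON`, the keyword-based `hint_from_audio`, and the Spanish example words are hypothetical stand-ins for the claimed optical character recognition, speech, and translation components.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# Hypothetical toy lexicon: (source word, context hint) -> translation.
# A stand-in for a real translator; not taken from the patent.
LEXICON: Dict[Tuple[str, Optional[str]], str] = {
    ("banco", "finance"): "bank",
    ("banco", "park"): "bench",
    ("banco", None): "bank",
}


@dataclass
class TextRegion:
    """A located textual element: bounding box plus the recognized text."""
    box: Tuple[int, int, int, int]  # x, y, width, height within the frame
    text: str


def select_languages(user_choice: Optional[Tuple[str, str]],
                     detected: str,
                     default_target: str = "en") -> Tuple[str, str]:
    """Use the user's selection if given, else an automatically detected source."""
    if user_choice is not None:
        return user_choice
    return (detected, default_target)


def hint_from_audio(transcript: str) -> Optional[str]:
    """Derive a contextual hint from the audio transcript (keyword stand-in)."""
    words = transcript.lower().split()
    if "dinero" in words:
        return "finance"
    if "parque" in words:
        return "park"
    return None


def translate_text(text: str, hint: Optional[str]) -> str:
    """Translate one element, preferring the hinted sense, then the default sense."""
    key = text.lower()
    return LEXICON.get((key, hint), LEXICON.get((key, None), text))


def render(regions: List[TextRegion], hint: Optional[str],
           mode: str = "substitute") -> List[TextRegion]:
    """Substitute the translation for the original text, or add it alongside."""
    rendered = []
    for region in regions:
        translated = translate_text(region.text, hint)
        label = translated if mode == "substitute" else f"{region.text} ({translated})"
        rendered.append(TextRegion(region.box, label))
    return rendered
```

In this sketch the same sign text, "Banco", renders as "bench" when the accompanying speech mentions a park and as "bank" when it mentions money, which is the role the claim assigns to the first contextual hint. The claimed audio path (pauses, sentence structure, syllable count, and a second hint from the visual input) is not modeled here.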
