System and method for providing optimal braille output based on spoken and sign language
First Claim
1. A system for determining text for an optimal text stream corresponding to a series of words presented in spoken language and sign language, the system comprising:
  a camera configured to detect image data corresponding to the series of words in sign language;
  a microphone configured to detect audio data corresponding to the series of words in spoken language; and
  a processor configured to:
    determine, for each word in the series of words in sign language, an image-based text word and a confidence value for the image-based text word based on the image data,
    determine, for each word in the series of words in spoken language, an audio-based text word and a confidence value for the audio-based text word based on the audio data,
    when a corresponding image-based text word and a corresponding audio-based text word do not match, for a particular word in the series of words:
      select the image-based text word or the audio-based text word to be included in the optimal text stream based on an image stream confidence value and an audio stream confidence value, the image stream confidence value having an initial value corresponding to a confidence value of a first determined image-based text word, and the audio stream confidence value having an initial value corresponding to a confidence value of a first determined audio-based text word,
      update the image stream confidence value by an amount proportional to a confidence value of the corresponding image-based text word and whether the corresponding image-based text word was selected, and
      update the audio stream confidence value by an amount proportional to a confidence value of the corresponding audio-based text word and whether the corresponding audio-based text word was selected, and
    provide the optimal text stream to an output device.
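The selection and update steps recited in claim 1 can be illustrated with a minimal Python sketch. The claim does not give a formula, so several details here are assumptions made only for illustration: the names `Candidate` and `build_text_stream`, the `rate` parameter, the rule that a selected word raises its stream confidence by `rate * confidence` while an unselected word lowers it by the same amount, and the tie-break in favor of the image stream.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """One recognized word plus the recognizer's confidence for it."""
    word: str
    confidence: float  # assumed to lie in [0, 1]


def build_text_stream(image_words, audio_words, rate=0.1):
    """Return (selected words, final image stream conf, final audio stream conf).

    Hypothetical update rule: a stream confidence moves by rate * confidence,
    upward if that stream's word was selected and downward otherwise.
    """
    if not image_words or not audio_words:
        return [], 0.0, 0.0

    # Initial values come from the first determined word of each stream.
    image_stream = image_words[0].confidence
    audio_stream = audio_words[0].confidence

    selected = []
    for img, aud in zip(image_words, audio_words):
        if img.word == aud.word:
            # Both recognizers agree, so there is nothing to arbitrate.
            selected.append(img.word)
            continue

        # Mismatch: choose the stream with the higher running confidence
        # (ties go to the image stream, an arbitrary choice for this sketch).
        pick_image = image_stream >= audio_stream
        selected.append(img.word if pick_image else aud.word)

        # Update both stream confidences in proportion to the per-word
        # confidence, with the sign depending on whether the word was chosen.
        image_stream += rate * img.confidence * (1 if pick_image else -1)
        audio_stream += rate * aud.confidence * (-1 if pick_image else 1)

    return selected, image_stream, audio_stream


image = [Candidate("hello", 0.9), Candidate("word", 0.4), Candidate("today", 0.8)]
audio = [Candidate("hello", 0.7), Candidate("world", 0.95), Candidate("two", 0.6)]
words, image_conf, audio_conf = build_text_stream(image, audio)
print(words)
```

In this example the image stream starts with the higher confidence, so its words win both mismatches even though the audio recognizer reports a higher per-word confidence for "world". That behavior follows from the claim's wording, which bases selection on the stream confidence values rather than on the per-word values.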
Abstract
A system for determining output text based on spoken language and sign language includes a camera configured to detect image data corresponding to a word in sign language. The system also includes a microphone configured to detect audio data corresponding to the word in spoken language. The system also includes a processor configured to receive the image data from the camera and convert the image data into an image-based text word. The processor is also configured to receive the audio data from the microphone and convert the audio data into an audio-based text word. The processor is also configured to determine an optimal word by selecting either the image-based text word or the audio-based text word based on a comparison of the image-based text word and the audio-based text word.
19 Claims
1. A system for determining text for an optimal text stream corresponding to a series of words presented in spoken language and sign language, the system comprising:
  a camera configured to detect image data corresponding to the series of words in sign language;
  a microphone configured to detect audio data corresponding to the series of words in spoken language; and
  a processor configured to:
    determine, for each word in the series of words in sign language, an image-based text word and a confidence value for the image-based text word based on the image data,
    determine, for each word in the series of words in spoken language, an audio-based text word and a confidence value for the audio-based text word based on the audio data,
    when a corresponding image-based text word and a corresponding audio-based text word do not match, for a particular word in the series of words:
      select the image-based text word or the audio-based text word to be included in the optimal text stream based on an image stream confidence value and an audio stream confidence value, the image stream confidence value having an initial value corresponding to a confidence value of a first determined image-based text word, and the audio stream confidence value having an initial value corresponding to a confidence value of a first determined audio-based text word,
      update the image stream confidence value by an amount proportional to a confidence value of the corresponding image-based text word and whether the corresponding image-based text word was selected, and
      update the audio stream confidence value by an amount proportional to a confidence value of the corresponding audio-based text word and whether the corresponding audio-based text word was selected, and
    provide the optimal text stream to an output device.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9
10. A system for outputting text based on spoken language and sign language into an optimal text stream, the system comprising:
  a camera configured to detect image data corresponding to words in sign language;
  a microphone configured to detect audio data corresponding to the words in spoken language; and
  a processor configured to:
    receive the image data from the camera,
    receive the audio data from the microphone,
    convert the image data into multiple image-based text words and the audio data into multiple audio-based text words,
    determine, for each word in the multiple image-based text words, an image confidence value based on the image data,
    create an image stream confidence value having an initial value corresponding to a confidence value of a first image-based text word,
    update the image stream confidence value by an amount proportional to a confidence value of each subsequent image-based text word of the multiple image-based text words and whether image-based text words were selected to form the optimal stream of words,
    determine, for each word in the multiple audio-based text words, an audio confidence value based on the audio data,
    create an audio stream confidence value having an initial value corresponding to a confidence value of a first audio-based text word,
    update the audio stream confidence value by an amount proportional to a confidence value of each subsequent audio-based text word of the multiple audio-based text words and whether audio-based text words were selected to form the optimal stream of words,
    select a combination of one or more image-based text words of the multiple image-based text words or one or more audio-based text words of the multiple audio-based text words to form the optimal stream of words based on the comparison of the image stream confidence value with the audio stream confidence value, and
    provide the optimal stream of words to an output device.
Dependent claims: 11, 12, 13, 14
15. A method of determining text for an optimal text stream corresponding to a series of words presented in spoken language and sign language, the method comprising:
  detecting, using a camera, image data corresponding to the series of words in sign language;
  detecting, using a microphone, audio data corresponding to the series of words in spoken language;
  determining, by a processor for each word in the series of words in sign language, an image-based text word and a confidence value for the image-based text word based on the image data;
  determining, by the processor for each word in the series of words in spoken language, an audio-based text word and a confidence value for the audio-based text word based on the audio data;
  for a particular word in the series of words, the particular word having a corresponding image-based text word and a corresponding audio-based text word:
    selecting, by the processor, the image-based text word or the audio-based text word to be included in the optimal text stream based on an image stream confidence value and an audio stream confidence value, the image stream confidence value having an initial value corresponding to a confidence value of a first determined image-based text word, and the audio stream confidence value having an initial value corresponding to a confidence value of a first determined audio-based text word,
    updating, by the processor, the image stream confidence value by an amount proportional to a confidence value of the corresponding image-based text word and whether the corresponding image-based text word was selected, and
    updating, by the processor, the audio stream confidence value by an amount proportional to a confidence value of the corresponding audio-based text word and whether the corresponding audio-based text word was selected; and
  providing, by the processor, the optimal text stream to an output device as each word in the optimal text stream is selected.
Dependent claims: 16, 17, 18, 19
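Claim 15 differs from the system claims in that each word is provided to the output device as it is selected, rather than after the whole series has been processed. One way to picture that is a Python generator, sketched below under stated assumptions: the function name `stream_words`, the `(word, confidence)` tuple layout, the `rate` parameter, and the signed update rule are all hypothetical and are not taken from the patent.

```python
from typing import Iterable, Iterator, Optional, Tuple

Recognized = Tuple[str, float]  # (text word, per-word confidence), assumed layout


def stream_words(
    pairs: Iterable[Tuple[Recognized, Recognized]],
    rate: float = 0.1,
) -> Iterator[str]:
    """Yield one selected word per (image, audio) pair as soon as it is chosen.

    Hypothetical update rule: the selected stream's confidence rises by
    rate * confidence and the other stream's confidence falls likewise.
    """
    image_stream: Optional[float] = None
    audio_stream: Optional[float] = None

    for (img_word, img_conf), (aud_word, aud_conf) in pairs:
        if image_stream is None or audio_stream is None:
            # Initial stream values come from the first determined words.
            image_stream, audio_stream = img_conf, aud_conf

        # Select on the running stream confidences (ties go to image,
        # an arbitrary choice for this sketch).
        pick_image = image_stream >= audio_stream

        # Hand the word to the consumer immediately, before the next
        # pair is even read from the input.
        yield img_word if pick_image else aud_word

        # Update both stream confidences for the word just handled.
        sign = 1.0 if pick_image else -1.0
        image_stream += sign * rate * img_conf
        audio_stream -= sign * rate * aud_conf


pairs = [
    (("good", 0.5), ("good", 0.8)),
    (("warning", 0.6), ("morning", 0.9)),
]
for word in stream_words(pairs):
    print(word)  # a real output device would receive each word here
```

Because the generator is lazy, a consumer such as a braille display driver could pull words one at a time while the camera and microphone are still producing input.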
Specification