System and method for providing optimal braille output based on spoken and sign language

US 10,395,555 B2
Filed: 03/30/2015
Issued: 08/27/2019
Est. Priority Date: 03/30/2015
Status: Active Grant

Abstract

A system for determining output text based on spoken language and sign language includes a camera configured to detect image data corresponding to a word in sign language. The system also includes a microphone configured to detect audio data corresponding to the word in spoken language. The system also includes a processor configured to receive the image data from the camera and convert the image data into an image based text word. The processor is also configured to receive the audio data from the microphone and convert the audio data into an audio based text word. The processor is also configured to determine an optimal word by selecting one of the image based text word or the audio based text word based on a comparison of the image based text word and the audio based text word.
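
The comparison step described in the abstract can be illustrated with a minimal sketch. The abstract says only that the optimal word is selected "based on a comparison" of the two candidate words; breaking a mismatch by the higher per-word confidence is an assumption made here for illustration, and the names `Candidate` and `choose_word` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """A recognized text word plus a confidence score (assumed to lie in [0, 1])."""
    word: str
    confidence: float


def choose_word(image: Candidate, audio: Candidate) -> str:
    """Pick one word from the image-based and audio-based candidates."""
    # If both recognizers agree, there is nothing to resolve.
    if image.word == audio.word:
        return image.word
    # Assumption: on a mismatch, prefer the candidate with the higher
    # per-word confidence (ties go to the image-based word).
    return image.word if image.confidence >= audio.confidence else audio.word


print(choose_word(Candidate("cat", 0.4), Candidate("hat", 0.7)))
```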

19 Claims

1. A system for determining text for an optimal text stream corresponding to a series of words presented in spoken language and sign language, the system comprising:
- a camera configured to detect image data corresponding to the series of words in sign language;
  
  a microphone configured to detect audio data corresponding to the series of words in spoken language; and
  
  a processor configured to:
  
  determine, for each word in the series of words in sign language, an image-based text word and a confidence value for the image-based text word based on the image data,
  
  determine, for each word in the series of words in spoken language, an audio-based text word and a confidence value for the audio-based text word based on the audio data,
  
  when a corresponding image-based text word and a corresponding audio-based text word do not match, for a particular word in the series of words:
  
  select the image-based text word or the audio-based text word to be included in the optimal text stream based on an image stream confidence value and an audio stream confidence value, the image stream confidence value having an initial value corresponding to a confidence value of a first determined image-based text word, and the audio stream confidence value having an initial value corresponding to a confidence value of a first determined audio-based text word,
  
  update the image stream confidence value by an amount proportional to a confidence value of the corresponding image-based text word and whether the corresponding image-based text word was selected, and
  
  update the audio stream confidence value by an amount proportional to a confidence value of the corresponding audio-based text word and whether the corresponding audio-based text word was selected, and
  
  provide the optimal text stream to an output device.
- - 2. The system of claim 1 wherein the confidence value for the audio-based text word is further based on at least one of a commonality of potential matches or a quality of the audio data.
  - 3. The system of claim 1 further comprising a refreshable braille display configured to receive the optimal text stream and to output the optimal text stream in braille.
  - 4. The system of claim 3 wherein the processor is further configured to determine that an optimal word of the optimal text stream that is incorrect and to determine a correct word, and wherein the refreshable braille display is further configured to output the correct word.
  - 5. The system of claim 3 wherein the refreshable braille display is further configured to output the image-based text word and the audio-based text word.
  - 6. The system of claim 1 wherein the processor is further configured to determine that the image-based text word is the same as the audio-based text word.
  - 7. The system of claim 1 wherein the image data corresponds to sign language of each letter of the words.
  - 8. The system of claim 1 wherein the image stream confidence value is updated by increasing the image stream confidence value when the confidence value of the corresponding image-based text word is greater than the image stream confidence value and by increasing the image stream confidence value when the corresponding image-based text word was selected, and wherein the audio stream confidence value is updated by increasing the audio stream confidence value when the confidence value of the corresponding audio-based text word is greater than the audio stream confidence value and by increasing the audio stream confidence value when the corresponding audio-based text word was selected.
  - 9. The system of claim 1 wherein the image stream confidence value is updated by decreasing the image stream confidence value when the confidence value of the corresponding image-based text word is less than the image stream confidence value and by decreasing the image stream confidence value when the corresponding image-based text word was not selected, and wherein the audio stream confidence value is updated by decreasing the audio stream confidence value when the confidence value of the corresponding audio-based text word is less than the audio stream confidence value and by decreasing the audio stream confidence value when the corresponding audio-based text word was not selected.
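
The update behavior recited in claims 8 and 9 can be sketched as a single function. The claims state only the direction of each adjustment (up when the word confidence exceeds the stream value or the word was selected, down in the opposite cases) and that the amount is proportional to the word confidence. The step size `rate`, the exact form of the two terms, and the clamping to [0, 1] are assumptions made for illustration, and `update_stream_confidence` is a hypothetical name.

```python
def update_stream_confidence(stream_conf: float,
                             word_conf: float,
                             selected: bool,
                             rate: float = 0.1) -> float:
    """Return an updated stream confidence value for one modality.

    stream_conf: running confidence of the image or audio stream
    word_conf:   confidence of the current word from that stream
    selected:    whether that word was chosen for the optimal text stream
    rate:        assumed step size; not specified in the claims
    """
    # Term 1: raise the stream value when the word confidence is above it,
    # lower it when the word confidence is below it.
    delta = rate * (word_conf - stream_conf)
    # Term 2: raise the stream value when the word was selected, lower it
    # when it was not, scaled by the word confidence.
    delta += rate * word_conf * (1.0 if selected else -1.0)
    # Assumption: keep the result in [0, 1].
    return min(1.0, max(0.0, stream_conf + delta))


# A confident, selected word raises the stream value; a weak, rejected
# word lowers it.
print(update_stream_confidence(0.5, 0.9, selected=True))
print(update_stream_confidence(0.5, 0.2, selected=False))
```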

10. A system for outputting text based on spoken language and sign language into an optimal text stream, the system comprising:
- a camera configured to detect image data corresponding to words in sign language;
  
  a microphone configured to detect audio data corresponding to the words in spoken language; and
  
  a processor configured to:
  
  receive the image data from the camera,
  
  receive the audio data from the microphone,
  
  convert the image data into multiple image-based text words and the audio data into multiple audio-based text words,
  
  determine, for each word in the multiple image-based text words, an image confidence value based on the image data,
  
  create an image stream confidence value having an initial value corresponding to a confidence value of a first image-based text word,
  
  update the image stream confidence value by an amount proportional to a confidence value of each subsequent image-based text word of the multiple image-based text words and whether image-based text words were selected to form the optimal stream of words,
  
  determine, for each word in the multiple audio-based text words, an audio confidence value based on the audio data,
  
  create an audio stream confidence value having an initial value corresponding to a confidence value of a first audio-based text word,
  
  update the audio stream confidence value by an amount proportional to a confidence value of each subsequent audio-based text word of the multiple audio-based text words and whether audio-based text words were selected to form the optimal stream of words,
  
  select a combination of one or more image-based text words of the multiple image-based text words or one or more audio-based text words of the multiple audio-based text words to form the optimal stream of words based on the comparison of the image stream confidence value with the audio stream confidence value, and
  
  provide the optimal stream of words to an output device.
- - 11. The system of claim 10 wherein the audio confidence value is further based on at least one of a commonality of potential matches or a quality of the audio data.
  - 12. The system of claim 10 further comprising a refreshable braille display configured to receive the optimal text stream and to output the optimal text stream in braille.
  - 13. The system of claim 12 wherein the processor is further configured to determine that an optimal word of the optimal text stream that is incorrect and to determine a correct word, and wherein the refreshable braille display is further configured to output the correct word.
  - 14. The system of claim 12 wherein the refreshable braille display is further configured to output the image-based text word and the audio-based text word.

15. A method of determining text for an optimal text stream corresponding to a series of words presented in spoken language and sign language, the method comprising:
- detecting, using a camera, image data corresponding to the series of words in sign language;
  
  detecting, using a microphone, audio data corresponding to the series of words in spoken language;
  
  determining, by a processor for each word in the series of words in sign language, an image-based text word and a confidence value for the image-based text word based on the image data;
  
  determining, by the processor for each word in the series of words in spoken language, an audio-based text word and a confidence value for the audio-based text word based on the audio data;
  
  for a particular word in the series of words, the particular word having a corresponding image-based text word and a corresponding audio-based text word:
  
  selecting, by the processor, the image-based text word or the audio-based text word to be included in the optimal text stream based on an image stream confidence value and an audio stream confidence value, the image stream confidence value having an initial value corresponding to a confidence value of a first determined image-based text word, and the audio stream confidence value having an initial value corresponding to a confidence value of a first determined audio-based text word,
  
  updating, by the processor, the image stream confidence value by an amount proportional to a confidence value of the corresponding image-based text word and whether the corresponding image-based text word was selected, and
  
  updating, by the processor, the audio stream confidence value by an amount proportional to a confidence value of the corresponding audio-based text word and whether the corresponding audio-based text word was selected; and
  
  providing, by the processor, the optimal text stream to an output device as each word in the optimal text stream is selected.
- - 16. The method of claim 15 wherein the confidence value for the audio-based text word is further based on at least one of a commonality of potential matches or a quality of the audio data.
  - 17. The method of claim 15 further comprising outputting, by a refreshable braille display, the optimal text stream in braille.
  - 18. The method of claim 15 further comprising outputting, by a refreshable braille display, the audio based text word and the image based text word.
  - 19. The method of claim 15 further comprising determining, using the processor, that the image based text word is the same as the audio based text word.
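
The method of claim 15 emits each word as soon as it is selected, which maps naturally onto a generator. The following is a sketch under stated assumptions, not the patented implementation: the recognizers are replaced by pre-computed `(word, confidence)` pairs, the update is a simple signed step proportional to the word confidence, matching words count as selected for both streams, and ties favor the image stream. The names `optimal_text_stream`, `_nudge`, and `rate` are hypothetical.

```python
from typing import Iterable, Iterator, Tuple

Candidate = Tuple[str, float]  # (text word, per-word confidence in [0, 1])


def _nudge(stream: float, word_conf: float, selected: bool, rate: float) -> float:
    """Move a stream confidence up (selected) or down (not selected)."""
    step = rate * word_conf  # amount proportional to the word confidence
    return min(1.0, max(0.0, stream + (step if selected else -step)))


def optimal_text_stream(pairs: Iterable[Tuple[Candidate, Candidate]],
                        rate: float = 0.1) -> Iterator[str]:
    """Yield one selected word per (image candidate, audio candidate) pair."""
    image_stream = audio_stream = None
    for (img_word, img_conf), (aud_word, aud_conf) in pairs:
        if image_stream is None:
            # Initial stream values come from the first determined words.
            image_stream, audio_stream = img_conf, aud_conf
        match = img_word == aud_word
        # Selection compares the running stream confidences, not the
        # per-word confidences.
        use_image = match or image_stream >= audio_stream
        # Provide the word immediately, before processing the next pair.
        yield img_word if use_image else aud_word
        image_stream = _nudge(image_stream, img_conf, use_image, rate)
        audio_stream = _nudge(audio_stream, aud_conf, match or not use_image, rate)


pairs = [(("a", 0.3), ("a", 0.9)), (("bee", 0.9), ("be", 0.2))]
print(list(optimal_text_stream(pairs)))  # ['a', 'be']
```

In the second pair the audio-based word wins despite its low per-word confidence, because the audio stream confidence built up from the first word is higher than the image stream confidence.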

Current Assignee: Toyota Motor Engineering & Manufacturing North America Incorporated (Toyota Motor Corporation)
Original Assignee: Toyota Motor Engineering & Manufacturing North America Incorporated (Toyota Motor Corporation)
Inventors: Djugash, Joseph M. A.; Dayal, Rajiv
Primary Examiner(s): Saint-Vil, Eddy
Assistant Examiner(s): Ermlick, William D
Application Number: US14/673,303
Publication Number: US 20160293051A1
Time in Patent Office: 1,611 Days

CPC Class Codes

G09B 21/006   using audible presentation ...

G09B 21/009   Teaching or communicating w...

G09B 21/02   Devices for Braille writing...
