Method and apparatus for recognizing speech by lip reading
First Claim
1. A dictation device comprising:
an audio input device that receives an audio signal representing the voice utterance;
a video input device that receives a video signal representative of movement of a user; and
a controller configured according to instructions stored in a memory to:
generate first dictation based on the audio signal;
generate a feature signal parameter sequence based on the video signal;
generate configured dictation based on the first dictation and the feature signal parameter sequence;
determine a location associated with the dictation device; and
assign machine code for controlling a home appliance based on the first dictation or the configured dictation based upon the location associated with the dictation device.
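The final two controller steps (determining a location and assigning machine code from whichever dictation is available) can be illustrated with a minimal sketch. The claim does not say how the mapping is stored or what the codes look like, so the lookup table, the location labels, the numeric codes, and the function name `assign_machine_code` below are all hypothetical and chosen only for illustration.

```python
from typing import Optional

# Hypothetical table: (location label, normalized command text) -> machine code.
# The claim does not define these values; they are placeholders.
COMMAND_TABLE = {
    ("home", "lights on"): 0x01,
    ("home", "lights off"): 0x02,
    ("vehicle", "lights on"): 0x11,
}


def assign_machine_code(
    location: str,
    first_dictation: str,
    configured_dictation: Optional[str] = None,
) -> Optional[int]:
    """Pick a machine code from the dictation text and the device location.

    Uses the configured dictation when one is supplied, otherwise falls back
    to the first (audio-only) dictation. Returns None when the table has no
    entry for this location and command.
    """
    text = configured_dictation if configured_dictation is not None else first_dictation
    key = (location, text.strip().lower())
    return COMMAND_TABLE.get(key)
```

In this sketch the same spoken command resolves to different codes at different locations, which is the behavior the claim attributes to the location step; a real device might use a very different representation.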
Abstract
A dictation device includes: an audio input device configured to receive a voice utterance including a plurality of words; a video input device configured to receive video of lip motion during the voice utterance; a memory portion; a controller configured according to instructions in the memory portion to generate first data packets including an audio stream representative of the voice utterance and a video stream representative of the lip motion; and a transceiver for sending the first data packets to a server end device and receiving second data packets including combined dictation based upon the audio stream and the video stream from the server end device. In the combined dictation, first dictation generated based upon the audio stream has been corrected by second dictation generated based upon the video stream.
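The abstract says the audio-based first dictation is corrected by the video-based second dictation but does not describe the correction rule. The following is a minimal sketch of one possible rule, assuming both dictations are already aligned word by word and that the audio recognizer reports a per-word confidence. The alignment, the confidence scores, the threshold value, and the function name `combine_dictation` are assumptions made for illustration and are not taken from the patent.

```python
from typing import List, Optional, Tuple


def combine_dictation(
    audio_words: List[Tuple[str, float]],
    video_words: List[Optional[str]],
    threshold: float = 0.6,
) -> List[str]:
    """Correct low-confidence audio words with lip-reading words.

    audio_words: (word, confidence) pairs from the audio stream.
    video_words: the lip-reading word at each position, or None if the
                 video stream gave no result there.
    threshold:   assumed cutoff below which an audio word is distrusted.
    """
    if len(audio_words) != len(video_words):
        raise ValueError("this sketch assumes word-aligned dictations")

    combined = []
    for (word, confidence), lip_word in zip(audio_words, video_words):
        # Replace the audio word only when it is uncertain and a
        # lip-reading alternative exists for the same position.
        if confidence < threshold and lip_word is not None:
            combined.append(lip_word)
        else:
            combined.append(word)
    return combined
```

For example, with audio words `[("turn", 0.9), ("of", 0.3), ("lights", 0.8)]` and video words `["turn", "off", "lights"]`, the low-confidence "of" is replaced by "off" while the other words are kept.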
32 Citations
19 Claims
1. A dictation device comprising:
an audio input device that receives an audio signal representing the voice utterance;
a video input device that receives a video signal representative of movement of a user; and
a controller configured according to instructions stored in a memory to:
generate first dictation based on the audio signal;
generate a feature signal parameter sequence based on the video signal;
generate configured dictation based on the first dictation and the feature signal parameter sequence;
determine a location associated with the dictation device; and
assign machine code for controlling a home appliance based on the first dictation or the configured dictation based upon the location associated with the dictation device.
Dependent claims: 2, 3, 4, 5, 6, 7
8. A dictation device comprising:
an audio input device that receives an audio signal representing a voice utterance;
a video input device that receives a video signal representative of movement of a user; and
a controller configured according to instructions stored in a memory to:
generate first dictation based on the audio signal;
generate a feature signal parameter sequence based on the video signal;
generate configured dictation based on the first dictation and the feature signal parameter sequence;
determine a location associated with the dictation device based on positional data obtained by the dictation device; and
assign machine code for controlling a vehicle component based on the first dictation or the configured dictation based upon the location associated with the dictation device.
Dependent claims: 9, 10, 11, 12, 13, 14
15. A dictation device comprising:
an audio input device that receives an audio signal representing a voice utterance;
a video input device that receives a video signal representative of movement of a user; and
a controller configured according to instructions stored in a memory to:
generate first dictation based on the audio signal;
generate a feature signal parameter sequence based on the video signal;
generate configured dictation based on the first dictation and the feature signal parameter sequence;
determine a location associated with the dictation device based on geographical data obtained by the dictation device; and
assign machine code for controlling an external device based on the first dictation or the configured dictation based upon the location associated with the dictation device determined based on the geographical data obtained by the dictation device.
Dependent claims: 16, 17, 18, 19
Specification