Correlating video images of lip movements with audio signals to improve speech recognition
First Claim
1. A method of speech recognition, comprising:
determining if video images of a speech source are detected;
indicating if the video images are not detected;
receiving audio signals from the speech source;
receiving video signals from the speech source;
detecting if the audio signals can be processed;
processing the audio signals if it is detected that the audio signals can be processed;
processing the video signals based on a detection that at least a portion of the audio signal cannot be processed;
converting at least one of the audio signals and the video signals into recognizable information; and
implementing a task based on the recognizable information.
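The control flow recited in claim 1 can be illustrated with a minimal sketch. It is not the patented implementation: the `Segment` structure, the `recognize` and `implement_task` functions, and the idea of representing unprocessable audio as `None` are all assumptions made for illustration. Real audio decoding and lip reading are replaced by pre-filled text fields so that only the audio-first, video-fallback logic is shown.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Sequence


@dataclass
class Segment:
    """One time slice of an utterance (hypothetical structure)."""
    audio_text: Optional[str]  # None models audio that cannot be processed
    lip_text: Optional[str]    # None models a slice with no usable lip image


def recognize(segments: Sequence[Segment],
              notify: Callable[[str], None]) -> Optional[str]:
    """Convert segments to recognizable information, preferring audio."""
    # Determine whether video images of the speech source are detected,
    # and indicate to the caller if they are not.
    video_detected = any(s.lip_text is not None for s in segments)
    if not video_detected:
        notify("video image not detected")

    words: List[str] = []
    for seg in segments:
        if seg.audio_text is not None:
            # The audio can be processed, so use it.
            words.append(seg.audio_text)
        elif seg.lip_text is not None:
            # This portion of the audio cannot be processed,
            # so fall back to the video signal.
            words.append(seg.lip_text)
        # If neither signal is usable, the slice is skipped.
    return " ".join(words) if words else None


def implement_task(info: Optional[str],
                   tasks: Dict[str, Callable[[], None]]) -> bool:
    """Run the task registered for the recognized information, if any."""
    handler = tasks.get(info) if info is not None else None
    if handler is None:
        return False
    handler()
    return True
```

In this sketch the video signal is consulted only for slices whose audio is unusable, which mirrors the claim's "based on a detection that at least a portion of the audio signal cannot be processed" condition.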
Abstract
A speech recognition device can include an audio signal receiver configured to receive audio signals from a speech source, a video signal receiver configured to receive video signals from the speech source, and a processing unit configured to process the audio signals and the video signals. In addition, the speech recognition device can include a conversion unit configured to convert the audio signals and the video signals to recognizable speech, and an implementation unit configured to implement a task based on the recognizable speech.
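As a rough illustration of how the units named in the abstract relate to one another, the sketch below wires five callables into one device object. The class name `SpeechRecognitionDevice`, its field names, the `run_once` method, and the use of `bytes` and `dict` as signal and feature types are hypothetical choices, not details taken from the patent.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class SpeechRecognitionDevice:
    """Hypothetical composition of the units listed in the abstract."""
    receive_audio: Callable[[], bytes]        # audio signal receiver
    receive_video: Callable[[], bytes]        # video signal receiver
    process: Callable[[bytes, bytes], dict]   # processing unit
    convert: Callable[[dict], str]            # conversion unit
    implement: Callable[[str], None]          # implementation unit

    def run_once(self) -> str:
        """Pass one utterance through every unit in order."""
        audio = self.receive_audio()
        video = self.receive_video()
        features = self.process(audio, video)
        speech = self.convert(features)
        self.implement(speech)
        return speech
```

Keeping each unit as an injected callable lets a stub stand in for any stage, for example a fake microphone during testing.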
40 Claims
1. A method of speech recognition, comprising the steps recited under First Claim above.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14
15. A speech recognition device, comprising:
an audio signal receiver configured to receive audio signals from a speech source;
a video signal receiver configured to receive video signals from the speech source;
a processing unit configured to detect if the audio signals can be processed and if so, to process the audio signals and process the video signals based on the detection that at least a portion of the audio signals cannot be processed;
a conversion unit configured to convert at least one of the audio signals and the video signals to recognizable information; and
an implementation unit configured to implement a task based on the recognizable information,
wherein the processing unit is configured to determine if the video image of a user is detected and, if the video image of the user is not detected, to indicate to the user that the video image is not detected.
Dependent claims: 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27
28. A system for speech recognition, comprising:
a first receiver that receives audio signals from a speech source;
a second receiver that receives video signals from the speech source;
a processor that detects if the audio signals can be processed and that processes the audio signals if the audio signals can be processed, the processor processing the video signals based on the detection that at least a portion of the audio signals cannot be processed;
a converter that converts at least one of the audio signals and the video signals to recognizable information; and
an implementor that implements a task based on the recognizable information,
wherein the processor determines if the video image of a user is detected and, if the user's video image is not detected, indicates to the user that the video image is not detected.
Dependent claims: 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40
Specification