Correlating video images of lip movements with audio signals to improve speech recognition
Abstract
A speech recognition device can include an audio signal receiver configured to receive audio signals from a speech source, a video signal receiver configured to receive video signals from the speech source, and a processing unit configured to process the audio signals and the video signals. In addition, the speech recognition device can include a conversion unit configured to convert the audio signals and the video signals to recognizable speech, and an implementation unit configured to implement a task based on the recognizable speech.
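The abstract does not say how the conversion unit combines the two signals. As one possible illustration, the sketch below uses a weighted late fusion of per-command scores, which is a technique swapped in here and not taken from the patent. The names `fuse`, `implement_task`, `audio_weight`, and the sample scores are all hypothetical.

```python
from typing import Callable, Dict

Scores = Dict[str, float]  # candidate command -> confidence in [0, 1]


def fuse(audio_scores: Scores, video_scores: Scores,
         audio_weight: float = 0.7) -> str:
    """Conversion step (assumed rule): weighted sum of both modalities."""
    candidates = set(audio_scores) | set(video_scores)
    if not candidates:
        return ""

    def combined(word: str) -> float:
        return (audio_weight * audio_scores.get(word, 0.0)
                + (1.0 - audio_weight) * video_scores.get(word, 0.0))

    # Sorting first keeps the result deterministic when scores tie.
    return max(sorted(candidates), key=combined)


def implement_task(command: str, tasks: Dict[str, Callable[[], str]]) -> str:
    """Implementation step: run the task mapped to the recognized command."""
    handler = tasks.get(command)
    return handler() if handler is not None else "unrecognized"


# Audio alone slightly prefers "fall"; the lip shape clearly indicates "call".
audio_scores = {"call": 0.48, "fall": 0.52}
video_scores = {"call": 0.90, "fall": 0.10}
tasks = {"call": lambda: "dialing", "fall": lambda: "alerting"}

command = fuse(audio_scores, video_scores)
result = implement_task(command, tasks)
```

With these made-up scores the video evidence overrides the ambiguous audio, so `command` is `"call"` and the mapped task runs. A real system would obtain the scores from acoustic and visual models, which this sketch leaves out.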
15 Claims
1. A method of speech recognition, said method comprising the steps of:
receiving audio signals from a speech source;
receiving video signals from the speech source;
processing the audio signals and the video signals;
converting the audio signals and the video signals into recognizable information;
implementing a task based on the recognizable information.
Dependent claims: 2, 3, 4

5. A speech recognition device, said device comprising:
an audio signal receiver configured to receive audio signals from a speech source;
a video signal receiver configured to receive video signals from the speech source;
a processing unit configured to process the audio signals and the video signals;
a conversion unit configured to convert the audio signals and the video signals to recognizable information;
an implementation unit configured to implement a task based on the recognizable information.
Dependent claims: 6, 7, 8

9. A system for speech recognition, said system comprising:
a first receiving means for receiving audio signals from a speech source;
a second receiving means for receiving video signals from the speech source;
a processing means for processing the audio signals and the video signals;
a converting means for converting the audio signals and the video signals to recognizable information;
an implementing means for implementing a task based on the recognizable information.
Dependent claims: 10, 11, 12

13. A method of speech recognition, said method comprising the steps of:
receiving audio signals from a speech source;
receiving video signals from the speech source;
processing the audio signals;
converting the audio signals into recognizable information;
processing the video signals when a segment of the audio signals cannot be converted into the recognizable information, wherein the video signals coincide with the segment of the audio signals that cannot be converted into the recognizable information;
converting the processed video signals into the recognizable information; and
implementing a task based on the recognizable information.
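
The audio-first flow of claim 13 can be sketched as follows. This is a toy illustration under stated assumptions, not the patented implementation: segments are plain strings, audio and video are assumed to be already aligned index by index, and `convert_audio`, `convert_video`, `recognize`, and the `<noise>` marker are hypothetical names.

```python
from typing import List, Optional


def convert_audio(segment: str) -> Optional[str]:
    """Toy audio converter: None means the segment could not be converted."""
    return None if segment == "<noise>" else segment


def convert_video(frames: str) -> Optional[str]:
    """Toy lip-reading converter for frames coinciding with one segment."""
    return frames or None


def recognize(audio_segments: List[str],
              video_segments: List[str]) -> List[str]:
    """Convert audio first; use video only where the audio step fails."""
    words = []
    for index, segment in enumerate(audio_segments):
        word = convert_audio(segment)
        if word is None:
            # Assumption: video_segments[index] coincides in time with
            # the audio segment that could not be converted.
            word = convert_video(video_segments[index])
        words.append(word if word is not None else "<unknown>")
    return words


audio = ["open", "<noise>", "door"]
video = ["open", "the", "door"]
recognized = recognize(audio, video)
```

Here only the second segment triggers video processing, and `recognized` becomes `["open", "the", "door"]`. Skipping video work for segments the audio step already handled is the main difference from processing both signals for every segment, as claims 1, 5, and 9 describe.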

14. A speech recognition device, said device comprising:
an audio signal receiver configured to receive audio signals from a speech source;
a video signal receiver configured to receive video signals from the speech source;
a first processing unit configured to process the audio signals;
a first conversion unit configured to convert the audio signals to recognizable information;
a second processing unit configured to process the video signals when the audio signals cannot be converted into the recognizable information, wherein the video signals coincide with the segment of the audio signals that cannot be converted into the recognizable information;
a second conversion unit configured to convert the processed video signals into the recognizable information; and
an implementation unit configured to implement a task based on the recognizable information.

15. A system for speech recognition, said system comprising:
a first receiving means for receiving audio signals from a speech source;
a second receiving means for receiving video signals from the speech source;
a first processing means for processing the audio signals;
a first converting means for converting the audio signals into recognizable information;
a second processing means for processing the video signals when a segment of the audio signals cannot be converted into the recognizable information, wherein the video signals coincide with the segment of the audio signals that cannot be converted into the recognizable information;
a second converting means for converting the processed video signals into the recognizable information; and
an implementing means for implementing a task based on the recognizable information.
Specification