Methods and apparatus for audio-visual speaker recognition and utterance verification
Abstract
Methods and apparatus for performing speaker recognition comprise processing a video signal associated with an arbitrary content video source and processing an audio signal associated with the video signal. Then, an identification and/or verification decision is made based on the processed audio signal and the processed video signal. Various decision making embodiments may be employed including, but not limited to, a score combination approach, a feature combination approach, and a re-scoring approach. In another aspect of the invention, a method of verifying a speech utterance comprises processing a video signal associated with a video source and processing an audio signal associated with the video signal. Then, the processed audio signal is compared with the processed video signal to determine a level of correlation between the signals. This is referred to as unsupervised utterance verification. In a supervised utterance verification embodiment, the processed video signal is compared with a script representing an audio signal associated with the video signal to determine a level of correlation between the signals.
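As a rough illustration of the score combination approach named in the abstract, the sketch below fuses per-speaker audio and video scores with a weighted sum, then makes an identification or verification decision. The abstract does not specify the fusion rule, so the weighted sum, the function names, the `audio_weight` parameter, and the threshold are all assumptions made for this example.

```python
from typing import Dict


def fuse_scores(audio_scores: Dict[str, float],
                video_scores: Dict[str, float],
                audio_weight: float = 0.5) -> Dict[str, float]:
    """Combine per-speaker scores with a weighted sum (assumed fusion rule)."""
    if not 0.0 <= audio_weight <= 1.0:
        raise ValueError("audio_weight must lie in [0, 1]")
    # Only speakers scored by both modalities can be fused.
    common = audio_scores.keys() & video_scores.keys()
    return {
        speaker: audio_weight * audio_scores[speaker]
        + (1.0 - audio_weight) * video_scores[speaker]
        for speaker in common
    }


def identify(audio_scores: Dict[str, float],
             video_scores: Dict[str, float],
             audio_weight: float = 0.5) -> str:
    """Identification: return the enrolled speaker with the highest fused score."""
    fused = fuse_scores(audio_scores, video_scores, audio_weight)
    if not fused:
        raise ValueError("no speaker has both an audio and a video score")
    return max(fused, key=fused.get)


def verify(claimed: str,
           audio_scores: Dict[str, float],
           video_scores: Dict[str, float],
           threshold: float,
           audio_weight: float = 0.5) -> bool:
    """Verification: accept the claimed identity if its fused score meets the threshold."""
    fused = fuse_scores(audio_scores, video_scores, audio_weight)
    return claimed in fused and fused[claimed] >= threshold


# Hypothetical scores for two enrolled speakers.
audio = {"alice": 0.9, "bob": 0.4}
video = {"alice": 0.2, "bob": 0.8}
best_equal_weight = identify(audio, video, audio_weight=0.5)
best_audio_heavy = identify(audio, video, audio_weight=0.9)
```

With these made-up scores the decision depends on the weight: equal weighting favors the speaker with the stronger video score, while an audio-heavy weighting favors the speaker with the stronger audio score. In practice the weight would presumably be tuned to the relative reliability of each modality.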
61 Claims
1. A method of performing speaker recognition, the method comprising the steps of:
processing a video signal associated with an arbitrary content video source;
processing an audio signal associated with the video signal; and
making at least one of an identification and verification decision based on the processed audio signal and the processed video signal.
Dependent claims: 2-25
26. A method of verifying a speech utterance, the method comprising the steps of:
processing a video signal associated with a video source;
processing an audio signal associated with the video signal; and
comparing the processed audio signal with the processed video signal to determine a level of correlation between the signals.
Dependent claims: 27-31
32. A method of verifying a speech utterance, the method comprising the steps of:
processing a video signal associated with a video source; and
comparing the processed video signal with a script representing an audio signal associated with the video signal to determine a level of correlation between the signals.
33. Apparatus for performing speaker recognition, the apparatus comprising:
at least one processor operable to:
(i) process a video signal associated with an arbitrary content video source, (ii) process an audio signal associated with the video signal, and (iii) make at least one of an identification and verification decision based on the processed audio signal and the processed video signal.
Dependent claims: 34-57
58. Apparatus for verifying a speech utterance, the apparatus comprising:
at least one processor operable to:
(i) process a video signal associated with a video source, (ii) process an audio signal associated with the video signal, and (iii) compare the processed audio signal with the processed video signal to determine a level of correlation between the signals.
59. Apparatus for verifying a speech utterance, the apparatus comprising:
at least one processor operable to:
(i) process a video signal associated with a video source, and (ii) compare the processed video signal with a script representing an audio signal associated with the video signal to determine a level of correlation between the signals.
60. A method of performing speaker recognition, the method comprising the steps of:
processing an image signal associated with an arbitrary content image source;
processing an audio signal associated with the image signal; and
making at least one of an identification and verification decision based on the processed audio signal and the processed image signal.
61. Apparatus for performing speaker recognition, the apparatus comprising:
at least one processor operable to:
(i) process an image signal associated with an arbitrary content image source, (ii) process an audio signal associated with the image signal, and (iii) make at least one of an identification and verification decision based on the processed audio signal and the processed image signal.
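The comparing step recited in claims 26 and 58 (determining a level of correlation between the processed audio and video signals) can be pictured with a deliberately simple stand-in. The sketch below computes the Pearson correlation between a per-frame audio energy track and a per-frame mouth-opening measure, then thresholds it. The claims do not prescribe this measure; the feature choices, function names, and the 0.5 threshold are assumptions for illustration only.

```python
import math
from typing import Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation of two equal-length tracks; 0.0 if either is constant."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("tracks must have equal length of at least 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0.0 or var_y == 0.0:
        # A constant track carries no evidence of audio-visual synchrony.
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def utterance_verified(audio_energy: Sequence[float],
                       mouth_opening: Sequence[float],
                       threshold: float = 0.5) -> bool:
    """Accept the utterance if audio and lip activity correlate strongly enough."""
    return pearson(audio_energy, mouth_opening) >= threshold


# Hypothetical per-frame tracks, already aligned to the same frame rate.
energy = [0.1, 0.8, 0.9, 0.2, 0.7]
moving_lips = [0.0, 0.6, 0.7, 0.1, 0.5]   # lips move with the audio
still_face = [0.3, 0.3, 0.3, 0.3, 0.3]    # e.g. a still photograph

live_result = utterance_verified(energy, moving_lips)
replay_result = utterance_verified(energy, still_face)
```

A low correlation, as with the still face, suggests the audio did not originate from the speaker in the video, for example when a recording is played back in front of a photograph.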
Specification