User authentication using audiovisual synchrony detection
First Claim
1. A method for preventing a replay attack, comprising:
receiving, at a first time, first video and first audio signals generated in response to a user uttering a passphrase;
receiving, at a second time subsequent to the first time, second video and second audio signals generated respectively by a camera and a microphone in response to the user uttering the passphrase;
extracting, from the received audio signals, speech-based features;
extracting, from the received video signals, visual-based features;
computing, by a processor, an audio temporal alignment between the first and the second audio signals, by computing a dynamic time warping on the audio-based features extracted from the first and second audio signals, the audio temporal alignment comprising a first registration that synchronizes the first and the second audio signals;
computing, by the processor, a video temporal alignment between the first and the second video signals, by computing a dynamic time warping on the video-based features extracted from the first and second video signals, the video temporal alignment comprising a second registration that synchronizes the first and the second video signals;
comparing the audio temporal alignment between the first and the second audio signals to the video temporal alignment between the first and the second video signals; and
successfully authenticating the user upon detecting, as a result of the comparing, that the audio and the video temporal alignments are synchronized; and
failing the authentication of the user upon detecting, as a result of the comparison, that the audio and the video temporal alignments are not synchronized.
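The dynamic time warping step recited in the claim can be illustrated with a minimal sketch. This is not the patented implementation: the function name `dtw_path`, the scalar features, and the absolute-difference distance are all assumptions made for illustration. It shows only the textbook DTW recurrence and the backtracking that yields a registration (a list of matched frame indices) between two feature sequences.

```python
from math import inf


def dtw_path(a, b, dist=lambda x, y: abs(x - y)):
    """Classic DTW between feature sequences a and b (hypothetical helper).

    Returns (total_cost, path), where path is a list of (i, j) index pairs
    matching frame i of the first recording to frame j of the second.
    """
    n, m = len(a), len(b)
    # cost[i][j] = minimal cumulative distance aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = dist(a[i - 1], b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j - 1],  # advance both sequences
                                 cost[i - 1][j],      # advance a only
                                 cost[i][j - 1])      # advance b only
    # Backtrack from the end to recover the warping path (the registration).
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min((cost[i - 1][j - 1], i - 1, j - 1),
                      (cost[i - 1][j], i - 1, j),
                      (cost[i][j - 1], i, j - 1))
    path.reverse()
    return cost[n][m], path


# Toy example: the second utterance holds one feature value a frame longer.
first = [0, 1, 2, 3]
second = [0, 1, 1, 2, 3]
total_cost, registration = dtw_path(first, second)
print(total_cost, registration)
```

In practice the features would be vectors (for example spectral features for audio and mouth-region features for video, both assumptions here), and `dist` would be a vector distance; the same routine would be run once per modality to obtain the two registrations that the claim compares.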
1 Assignment
0 Petitions
Abstract
Methods, computing systems and computer program products implement embodiments of the present invention that include receiving, at a first time, first video and first audio signals generated in response to a user uttering a passphrase, and receiving, at a second time subsequent to the first time, second video and second audio signals generated in response to the user uttering the passphrase. Upon computing an audio temporal alignment between the first and the second audio signals and computing a video temporal alignment between the first and the second video signals, the user can be authenticated by comparing the audio temporal alignment to the video temporal alignment.
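One way to picture the comparison of the two temporal alignments is sketched below. The text does not specify how the comparison is carried out, so everything here is an assumption: the helper names, the conversion of frame indices to seconds, the mean-absolute-deviation measure, and the 0.1 s tolerance are hypothetical choices for illustration only. Each alignment is treated as a list of (first-recording index, second-recording index) pairs, as a DTW routine would produce.

```python
def warp_times(path, rate_hz):
    """Turn an index-pair alignment into {time in first: time in second} (seconds)."""
    matched = {}
    for i, j in path:
        matched.setdefault(i, []).append(j)
    # If one frame matches several frames, use their mean position.
    return {i / rate_hz: (sum(js) / len(js)) / rate_hz
            for i, js in sorted(matched.items())}


def interp(times, t):
    """Linearly interpolate the warp function at time t, clamped at both ends."""
    xs = sorted(times)
    if t <= xs[0]:
        return times[xs[0]]
    if t >= xs[-1]:
        return times[xs[-1]]
    for x0, x1 in zip(xs, xs[1:]):
        if x0 <= t <= x1:
            w = (t - x0) / (x1 - x0)
            return times[x0] + w * (times[x1] - times[x0])


def alignments_synchronized(audio_path, audio_rate, video_path, video_rate,
                            tol_s=0.1):
    """Assumed criterion: the audio and video warps agree within tol_s on average."""
    audio = warp_times(audio_path, audio_rate)
    video = warp_times(video_path, video_rate)
    # Evaluate the audio warp at each video frame time and compare.
    diffs = [abs(interp(audio, t) - v) for t, v in video.items()]
    return sum(diffs) / len(diffs) <= tol_s


# Toy data: audio features at 100 Hz, video frames at 25 fps (assumed rates).
audio_path = [(k, k) for k in range(100)]        # audio warp: t -> t
video_ok = [(k, k) for k in range(25)]           # video warp: t -> t
video_bad = [(k, 2 * k) for k in range(25)]      # video warp: t -> 2t
print(alignments_synchronized(audio_path, 100, video_ok, 25))
print(alignments_synchronized(audio_path, 100, video_bad, 25))
```

The intuition behind this sketch is that when a live user repeats the passphrase, speech and lip motion speed up and slow down together, so the two warps should track each other; when the audio and video do not come from the same live utterance, the warps would be expected to diverge.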
30 Citations
13 Claims
1. A method for preventing a replay attack, as recited in full under First Claim above.
Dependent claims: 2, 3, 4, 5, 6, 7
8. An apparatus for preventing a replay attack, comprising:
a microphone;
a display configured to present a passphrase; and
a processor configured:
to receive, at a first time, first video and first audio signals generated in response to a user uttering the passphrase,
to receive, at a second time subsequent to the first time, second video and second audio signals generated respectively by a camera and the microphone in response to the user uttering the passphrase,
to extract, from the received audio signals, speech-based features,
to extract, from the received video signals, visual-based features,
to compute an audio temporal alignment between the first and the second audio signals, by computing a dynamic time warping on the audio-based features extracted from the first and second audio signals, the audio temporal alignment comprising a first registration that synchronizes the first and the second audio signals,
to compute a video temporal alignment between the first and the second video signals, by computing a dynamic time warping on the video-based features extracted from the first and second video signals, the video temporal alignment comprising a second registration that synchronizes the first and the second video signals,
to compare the audio temporal alignment between the first and the second audio signals to the video temporal alignment between the first and the second video signals, and
to successfully authenticate the user upon detecting, as a result of the comparing, that the audio and the video temporal alignments are synchronized, and
to fail the authentication of the user upon detecting, as a result of the comparison, that the audio and the video temporal alignments are not synchronized.
Dependent claims: 9, 10, 11
12. A computer program product for preventing a replay attack, the computer program product comprising:
a non-transitory computer readable storage medium having computer readable program code embodied therewith, the computer readable program code comprising:
computer readable program code configured to receive, at a first time, first video and first audio signals generated in response to a user uttering a passphrase;
computer readable program code configured to receive, at a second time subsequent to the first time, second video and second audio signals generated respectively by a camera and a microphone in response to the user uttering the passphrase;
computer readable program code configured to extract, from the received audio signals, speech-based features;
computer readable program code configured to extract, from the received video signals, visual-based features;
computer readable program code configured to compute an audio temporal alignment between the first and the second audio signals, by computing a dynamic time warping on the audio-based features extracted from the first and second audio signals, the audio temporal alignment comprising a first registration that synchronizes the first and the second audio signals;
computer readable program code configured to compute, by the processor, a video temporal alignment between the first and the second video signals, by computing a dynamic time warping on the video-based features extracted from the first and second video signals, the video temporal alignment comprising a second registration that synchronizes the first and the second video signals; and
computer readable program code configured to compare the audio temporal alignment between the first and the second audio signals to the video temporal alignment between the first and the second video signals; and
computer readable program code configured to successfully authenticate the user upon detecting, as a result of the comparing, that the audio and the video temporal alignments are synchronized; and
computer readable program code configured to fail the authentication of the user upon detecting, as a result of the comparison, that the audio and the video temporal alignments are not synchronized.
Dependent claims: 13
Specification