Speaker recognition with assessment of audio frame contribution
First Claim
1. An apparatus for use in biometric speaker recognition, wherein the apparatus is configured to receive digital audio data derived from an audio signal output by a microphone, the apparatus comprising:
an analyzer for analyzing each frame of a sequence of frames of digital audio data which correspond to speech sounds uttered by a user to determine at least one characteristic of the speech sound of that frame; and
an assessment module for determining for each frame of audio data a contribution indicator of the extent to which that frame of audio data should be used for speaker recognition processing based on the determined at least one characteristic of the speech sound;
wherein the at least one characteristic of the speech sound comprises identification of the speech sound as a specific phoneme or as one of a plurality of predefined classes of phonemes, and wherein the contribution indicator varies based on the number of previous instances of the same phoneme or class of phoneme in previous frames of audio data.
Abstract
This application describes methods and apparatus for speaker recognition. An apparatus according to an embodiment has an analyzer (202) for analyzing each frame of a sequence of frames of audio data (AIN) which correspond to speech sounds uttered by a user to determine at least one characteristic of the speech sound of that frame. An assessment module (203) determines, for each frame of audio data, a contribution indicator of the extent to which the frame of audio data should be used for speaker recognition processing based on the determined characteristic of the speech sound. In this way frames which correspond to speech sounds that are of most use for speaker discrimination may be emphasized and/or frames which correspond to speech sounds that are of least use for speaker discrimination may be de-emphasized.
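As a rough illustration of the emphasis and de-emphasis described in the abstract, the sketch below weights hypothetical per-frame speaker scores by a contribution indicator looked up from each frame's phoneme class. The class names, the weight values, and the function names are assumptions made for illustration only; none of them come from the application.

```python
from typing import Sequence

# Hypothetical weights per phoneme class (assumed values, not from the application).
CLASS_WEIGHTS = {"vowel": 1.0, "nasal": 0.8, "fricative": 0.3, "plosive": 0.1}


def contribution_indicator(phoneme_class: str) -> float:
    """Return an assumed weight in [0, 1] for a frame's phoneme class."""
    return CLASS_WEIGHTS.get(phoneme_class, 0.0)


def weighted_score(frame_scores: Sequence[float], frame_classes: Sequence[str]) -> float:
    """Combine per-frame scores so that frames with higher indicators count more."""
    weights = [contribution_indicator(c) for c in frame_classes]
    total = sum(weights)
    if total == 0.0:
        return 0.0  # no frame was judged usable
    return sum(w * s for w, s in zip(weights, frame_scores)) / total
```

A weighted average is only one possible way to apply the indicator; a hard threshold that discards low-indicator frames would fit the same description.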
25 Claims
1. An apparatus for use in biometric speaker recognition, wherein the apparatus is configured to receive digital audio data derived from an audio signal output by a microphone, the apparatus comprising:
an analyzer for analyzing each frame of a sequence of frames of digital audio data which correspond to speech sounds uttered by a user to determine at least one characteristic of the speech sound of that frame; and
an assessment module for determining for each frame of audio data a contribution indicator of the extent to which that frame of audio data should be used for speaker recognition processing based on the determined at least one characteristic of the speech sound;
wherein the at least one characteristic of the speech sound comprises identification of the speech sound as a specific phoneme or as one of a plurality of predefined classes of phonemes, and wherein the contribution indicator varies based on the number of previous instances of the same phoneme or class of phoneme in previous frames of audio data.
Dependent claims: 2 to 22.
23. An apparatus for use in biometric speaker recognition, wherein the apparatus is configured to receive digital audio data derived from an audio signal output by a microphone, the apparatus comprising:
an assessment module for determining for a sequence of frames of digital audio data which correspond to speech sounds uttered by a user a contribution indicator of the extent to which a frame of audio data should be used for speaker recognition processing based on at least one characteristic of the speech sound to which the frame relates;
wherein the at least one characteristic of the speech sound comprises identification of the speech sound as a specific phoneme or as one of a plurality of predefined classes of phonemes, and wherein the contribution indicator varies based on the number of previous instances of the same phoneme or class of phoneme in previous frames of audio data.
24. A method of speaker recognition, comprising:
analyzing each frame of a sequence of frames of digital audio data which correspond to speech sounds uttered by a user to determine at least one characteristic of the speech sound of that frame, wherein the digital audio data is derived from an audio signal output by a microphone; and
determining for each frame of audio data a contribution indicator of the extent to which that frame of audio data should be used for speaker recognition processing based on the determined at least one characteristic of the speech sound;
wherein the at least one characteristic of the speech sound comprises identification of the speech sound as a specific phoneme or as one of a plurality of predefined classes of phonemes, and wherein the contribution indicator varies based on the number of previous instances of the same phoneme or class of phoneme in previous frames of audio data.
Dependent claims: 25.
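The independent claims all recite a contribution indicator that varies with the number of previous instances of the same phoneme or phoneme class. A minimal sketch of one such scheme follows. It assumes that repeated instances become less useful, so the indicator shrinks geometrically with each earlier occurrence. The claims do not specify the direction or form of the variation, and the function name and the `decay` parameter are hypothetical.

```python
from collections import Counter
from typing import Iterable, List


def count_based_indicators(frame_classes: Iterable[str], decay: float = 0.5) -> List[float]:
    """Return one contribution indicator per frame.

    Assumption: each earlier frame with the same phoneme class multiplies
    the indicator by `decay`, so the first instance of a class gets 1.0.
    """
    seen = Counter()  # previous instances per phoneme class
    indicators = []
    for cls in frame_classes:
        indicators.append(decay ** seen[cls])
        seen[cls] += 1
    return indicators


# The third "vowel" frame has two earlier instances, so it gets 0.5 ** 2.
print(count_based_indicators(["vowel", "vowel", "nasal", "vowel"]))
# prints [1.0, 0.5, 1.0, 0.25]
```

Other monotonic forms, such as a reciprocal of the count or a hard cap on the number of frames used per class, would equally make the indicator depend on the count of previous instances.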
Specification