Hierarchical real-time speaker recognition for biometric VoIP verification and targeting
Abstract
A method for real-time speaker recognition including obtaining speech data of a speaker, extracting, using a processor of a computer, a coarse feature of the speaker from the speech data, identifying the speaker as belonging to a pre-determined speaker cluster based on the coarse feature of the speaker, extracting, using the processor of the computer, a plurality of Mel-Frequency Cepstral Coefficients (MFCC) and a plurality of Gaussian Mixture Model (GMM) components from the speech data, determining a biometric signature of the speaker based on the plurality of MFCC and the plurality of GMM components, and determining in real time, using the processor of the computer, an identity of the speaker by comparing the biometric signature of the speaker to one of a plurality of biometric signature libraries associated with the pre-determined speaker cluster.
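The abstract names MFCC extraction as the fine-grained front end but fixes no parameters. The following is a minimal NumPy sketch of a standard MFCC pipeline (framing, Hamming window, power spectrum, triangular mel filterbank, log, DCT-II). It assumes 16 kHz audio, 25 ms frames with a 10 ms hop, 26 mel filters and 13 coefficients. These are common defaults rather than values taken from the patent, and the function names are hypothetical.

```python
import numpy as np

def hz_to_mel(f):
    """HTK-style mel scale (one common convention; assumed here)."""
    return 2595.0 * np.log10(1.0 + np.asarray(f) / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m) / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters evenly spaced on the mel scale, shape (n_filters, n_fft // 2 + 1)."""
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sample_rate).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):   # rising edge
            fbank[i - 1, k] = (k - left) / (center - left)
        for k in range(center, right):  # falling edge (peak of 1 at the center bin)
            fbank[i - 1, k] = (right - k) / (right - center)
    return fbank

def dct_matrix(n_out, n_in):
    """Orthonormal DCT-II basis, shape (n_out, n_in)."""
    n = np.arange(n_in)
    k = np.arange(n_out)[:, None]
    mat = np.cos(np.pi * k * (2 * n + 1) / (2 * n_in)) * np.sqrt(2.0 / n_in)
    mat[0] /= np.sqrt(2.0)
    return mat

def mfcc(signal, sample_rate=16000, frame_ms=25, hop_ms=10,
         n_fft=512, n_filters=26, n_ceps=13):
    """Return an (n_frames, n_ceps) MFCC matrix (no pre-emphasis or liftering)."""
    signal = np.asarray(signal, dtype=float)
    frame_len = int(sample_rate * frame_ms / 1000)
    hop = int(sample_rate * hop_ms / 1000)
    if len(signal) < frame_len:
        signal = np.pad(signal, (0, frame_len - len(signal)))
    n_frames = 1 + (len(signal) - frame_len) // hop
    # Index matrix that slices overlapping frames out of the signal.
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hamming(frame_len)
    power = np.abs(np.fft.rfft(frames, n=n_fft)) ** 2 / n_fft
    energies = power @ mel_filterbank(n_filters, n_fft, sample_rate).T
    log_energies = np.log(np.maximum(energies, 1e-10))  # floor avoids log(0)
    return log_energies @ dct_matrix(n_ceps, n_filters).T
```

With these defaults, one second of 16 kHz audio yields 98 frames of 13 coefficients each. Such frame-level vectors are the kind of input a GMM would then be fitted to.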
22 Claims
1. A method for real-time speaker recognition, comprising:
obtaining speech data of a speaker to identify the speaker from a plurality of speakers;
extracting, using a processor of a computer, a coarse feature of the speaker from the speech data;
identifying the speaker as belonging to a pre-determined speaker cluster that is one of a plurality of partitions of the plurality of speakers and corresponds to a subset of a plurality of biometric signatures of the plurality of speakers, wherein identifying the speaker as belonging to the pre-determined speaker cluster is based on comparing the coarse feature of the speaker to a speaker independent parameter representing the subset of the plurality of biometric signatures;
further identifying, in response to identifying the speaker as belonging to the pre-determined speaker cluster, the speaker as belonging to a second level pre-determined speaker cluster that is one of a plurality of second level partitions of the pre-determined speaker cluster and corresponds to a second level subset of the subset of the plurality of biometric signatures, wherein identifying the speaker as belonging to the second level pre-determined speaker cluster is based on comparing the coarse feature of the speaker to a second level speaker independent parameter representing the second level subset of the subset of the plurality of biometric signatures;
extracting, using the processor of the computer, a plurality of Mel-Frequency Cepstral Coefficients (MFCC) and a plurality of Gaussian Mixture Model (GMM) components from the speech data;
determining a biometric signature of the speaker based on the plurality of MFCC and the plurality of GMM components; and
determining in real time, using the processor of the computer, an identity of the speaker by comparing the biometric signature of the speaker to the second level subset of the subset of the plurality of biometric signatures, wherein each of the plurality of biometric signatures is specific to one of the plurality of speakers.
Dependent claims: 2, 3, 4, 5, 6, 7, 8
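Claim 1 narrows the search in two stages before the final signature comparison. The sketch below shows that routing under several assumptions that the claim itself does not make. Each cluster's speaker independent parameter is taken to be a centroid in coarse-feature space, "comparing" is taken to mean Euclidean distance, and each biometric signature is a vector scored by cosine similarity. The `Cluster` class, `nearest` and `identify` are hypothetical names.

```python
from dataclasses import dataclass, field
import numpy as np

@dataclass
class Cluster:
    # Hypothetical speaker independent parameter: a centroid in coarse-feature space.
    centroid: np.ndarray
    # Second level partitions of this cluster (empty for a second level cluster).
    children: list = field(default_factory=list)
    # Signatures held by a second level cluster: speaker id -> signature vector.
    signatures: dict = field(default_factory=dict)

def nearest(clusters, coarse):
    """Pick the cluster whose parameter is closest to the coarse feature."""
    dists = [np.linalg.norm(coarse - c.centroid) for c in clusters]
    return clusters[int(np.argmin(dists))]

def identify(top_level, coarse, signature):
    """Route by the coarse feature twice, then search only the second level subset."""
    first = nearest(top_level, coarse)         # pre-determined speaker cluster
    second = nearest(first.children, coarse)   # second level cluster (assumes children exist)
    best_id, best_score = None, -np.inf
    for speaker_id, ref in second.signatures.items():
        score = float(np.dot(signature, ref)
                      / (np.linalg.norm(signature) * np.linalg.norm(ref)))
        if score > best_score:
            best_id, best_score = speaker_id, score
    return best_id, best_score
```

The point of the hierarchy is that only the signatures in one second level subset are scored. The cost of the final comparison therefore depends on the size of that subset rather than on the whole enrolled population.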
9. A non-transitory computer readable medium, embodying instructions when executed by the computer to perform real-time speaker recognition, the instructions comprising functionality for:
obtaining speech data of a speaker to identify the speaker from a plurality of speakers;
extracting a coarse feature of the speaker from the speech data;
identifying the speaker as belonging to a pre-determined speaker cluster that is one of a plurality of partitions of the plurality of speakers and corresponds to a subset of a plurality of biometric signatures of the plurality of speakers, wherein identifying the speaker as belonging to the pre-determined speaker cluster is based on comparing the coarse feature of the speaker to a speaker independent parameter representing the subset of the plurality of biometric signatures;
further identifying, in response to identifying the speaker as belonging to the pre-determined speaker cluster, the speaker as belonging to a second level pre-determined speaker cluster that is one of a plurality of second level partitions of the pre-determined speaker cluster and corresponds to a second level subset of the subset of the plurality of biometric signatures, wherein identifying the speaker as belonging to the second level pre-determined speaker cluster is based on comparing the coarse feature of the speaker to a second level speaker independent parameter representing the second level subset of the subset of the plurality of biometric signatures;
extracting a plurality of Mel-Frequency Cepstral Coefficients (MFCC) and a plurality of Gaussian Mixture Model (GMM) components from the speech data;
determining a biometric signature of the speaker based on the plurality of MFCC and the plurality of GMM components; and
determining in real time, using the processor of the computer, an identity of the speaker by comparing the biometric signature of the speaker to the second level subset of the subset of the plurality of biometric signatures, wherein each of the plurality of biometric signatures is specific to one of the plurality of speakers.
Dependent claims: 10, 11, 12, 13, 14, 15
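The claims leave open how the MFCC and GMM components combine into a biometric signature and how two signatures are compared. One classic option, used here purely as an assumption, is to fit a diagonal-covariance GMM to each enrolled speaker's MFCC frames with EM and treat the fitted weights, means and variances as that speaker's signature. Test frames are then scored by average log-likelihood under each candidate GMM. The function names and parameters below are hypothetical, and the code is a plain NumPy sketch rather than the patent's stated method.

```python
import numpy as np

def _log_gauss_diag(X, means, variances):
    """Log-density of each frame under each diagonal Gaussian, shape (n_frames, K)."""
    d = X.shape[1]
    diff = X[:, None, :] - means[None, :, :]
    return -0.5 * (d * np.log(2 * np.pi)
                   + np.sum(np.log(variances), axis=1)[None, :]
                   + np.sum(diff ** 2 / variances[None, :, :], axis=2))

def fit_gmm(X, n_components=4, n_iter=50, seed=0, var_floor=1e-3):
    """Fit a diagonal-covariance GMM by EM; return (weights, means, variances)."""
    rng = np.random.default_rng(seed)
    n, _ = X.shape
    means = X[rng.choice(n, n_components, replace=False)].copy()
    variances = np.tile(X.var(axis=0) + var_floor, (n_components, 1))
    weights = np.full(n_components, 1.0 / n_components)
    for _ in range(n_iter):
        # E-step: responsibilities, normalized with log-sum-exp for stability.
        log_p = np.log(weights)[None, :] + _log_gauss_diag(X, means, variances)
        resp = np.exp(log_p - np.logaddexp.reduce(log_p, axis=1, keepdims=True))
        # M-step: re-estimate weights, means and floored diagonal variances.
        nk = resp.sum(axis=0) + 1e-10
        weights = nk / n
        means = (resp.T @ X) / nk[:, None]
        variances = np.maximum((resp.T @ X ** 2) / nk[:, None] - means ** 2, 0.0) + var_floor
    return weights, means, variances

def avg_log_likelihood(X, gmm):
    """Mean per-frame log-likelihood of X under a fitted GMM."""
    weights, means, variances = gmm
    log_p = np.log(weights)[None, :] + _log_gauss_diag(X, means, variances)
    return float(np.mean(np.logaddexp.reduce(log_p, axis=1)))

def identify_by_gmm(X, library):
    """Return the enrolled speaker whose GMM best explains the test frames."""
    scores = {spk: avg_log_likelihood(X, gmm) for spk, gmm in library.items()}
    return max(scores, key=scores.get)
```

In the hierarchical scheme, `library` would hold only the speakers of the selected second level cluster, so only a small number of GMMs are evaluated per query.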
16. A system for speaker recognition, comprising:
a repository storing a plurality of biometric signature libraries;
a processor; and
memory storing instructions when executed by the processor comprising functionalities for:
obtaining speech data of a speaker to identify the speaker from a plurality of speakers;
extracting a coarse feature of the speaker from the speech data;
identifying the speaker as belonging to a pre-determined speaker cluster that is one of a plurality of partitions of the plurality of speakers and corresponds to a subset of a plurality of biometric signatures of the plurality of speakers, wherein identifying the speaker as belonging to the pre-determined speaker cluster is based on comparing the coarse feature of the speaker to a speaker independent parameter representing the subset of the plurality of biometric signatures;
further identifying, in response to identifying the speaker as belonging to the pre-determined speaker cluster, the speaker as belonging to a second level pre-determined speaker cluster that is one of a plurality of second level partitions of the pre-determined speaker cluster and corresponds to a second level subset of the subset of the plurality of biometric signatures, wherein identifying the speaker as belonging to the second level pre-determined speaker cluster is based on comparing the coarse feature of the speaker to a second level speaker independent parameter representing the second level subset of the subset of the plurality of biometric signatures;
extracting a plurality of Mel-Frequency Cepstral Coefficients (MFCC) for a Gaussian Mixture Model (GMM) from the speech data;
determining a biometric signature of the speaker based on the plurality of MFCC and the GMM; and
determining in real time, using the processor of the computer, an identity of the speaker by comparing the biometric signature of the speaker to the second level subset of the subset of the plurality of biometric signatures, wherein each of the plurality of biometric signatures is specific to one of the plurality of speakers.
Dependent claims: 17, 18, 19, 20, 21, 22
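Claim 16 adds a repository of biometric signature libraries but does not say how the enrolled speakers are partitioned into first and second level clusters. The sketch below shows one possible offline enrollment step under stated assumptions. It runs two-level k-means over each speaker's coarse feature vector, uses each cluster centroid as that cluster's speaker independent parameter, and stores one signature library per second level cluster. The choice of k-means, the cluster counts, and the names `kmeans` and `build_repository` are illustrative assumptions, not the patent's stated method.

```python
import numpy as np

def assign(X, centroids):
    """Index of the nearest centroid (squared Euclidean distance) for each row of X."""
    return np.argmin(((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2), axis=1)

def kmeans(X, k, n_iter=20, seed=0):
    """Plain k-means on the rows of X; returns (centroids, labels)."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), k, replace=False)].astype(float)
    for _ in range(n_iter):
        labels = assign(X, centroids)
        # Keep the old centroid if a cluster loses all of its members.
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
                        for j in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    return centroids, assign(X, centroids)

def build_repository(speaker_ids, coarse, signatures, k1=2, k2=2):
    """Two-level partition of enrolled speakers.

    params maps a cluster path, (i,) or (i, j), to its centroid (the assumed
    speaker independent parameter); repo maps each second level path (i, j)
    to its signature library {speaker_id: signature}.
    """
    params, repo = {}, {}
    top_centroids, top_labels = kmeans(coarse, k1)
    for i in range(k1):
        params[(i,)] = top_centroids[i]
        members = np.flatnonzero(top_labels == i)
        if len(members) == 0:
            continue
        k = min(k2, len(members))
        sub_centroids, sub_labels = kmeans(coarse[members], k)
        for j in range(k):
            params[(i, j)] = sub_centroids[j]
            repo[(i, j)] = {speaker_ids[m]: signatures[m] for m in members[sub_labels == j]}
    return params, repo
```

At query time, the centroids in `params` are what a coarse feature would be compared against, and only the library at the selected path in `repo` would be searched.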
Specification