Detection of target and non-target users using multi-session information
First Claim
1. A method for maintaining speaker recognition performance, comprising:
training a plurality of models respectively corresponding to speaker recognition scores from a plurality of speakers over a plurality of sessions;
receiving a voice signal of a speaker seeking access to an environment via at least one network;
extracting one or more speech statistics of the voice signal for determining a speaker recognition score of the speaker seeking access;
using the plurality of models to conclude whether the speaker seeking access is a non-ideal target speaker that is authorized to access the environment, but provides a voice signal which yields a speaker recognition score that results in a failure to recognize the non-ideal target speaker as being authorized to access the environment, and prevents access to the environment, or a non-ideal non-target speaker that is not authorized to access the environment, but provides a voice signal which yields a speaker recognition score that results in a misidentification of the non-ideal non-target speaker as being authorized to access the environment, and allows access to the environment, wherein using the plurality of models to conclude comprises:
calculating a first probability that the speaker seeking access is the non-ideal target speaker;
calculating a second probability that the speaker seeking access is the non-ideal non-target speaker; and
determining whether the first probability, the second probability or a sum of the first probability and the second probability is above a probability threshold; and
restricting the speaker seeking access from accessing the environment upon determining that the first probability, second probability or the sum of the first probability and the second probability is above the probability threshold;
wherein the plurality of speakers comprise known non-ideal target speakers and known non-ideal non-target speakers;
wherein the known non-ideal target speakers comprise authorized speakers each having a right to access the environment and yielding respective first speaker recognition scores within a predetermined value below a speaker recognition threshold that prevent access to the environment;
wherein the known non-ideal non-target speakers comprise unauthorized speakers each not having a right to access the environment and yielding respective second speaker recognition scores within a predetermined value above the speaker recognition threshold that allow access to the environment;
wherein the plurality of speakers further comprise ideal target speakers and ideal non-target speakers;
wherein the ideal target speakers comprise authorized speakers each having a right to access the environment and yielding respective third speaker recognition scores greater than the predetermined value above the speaker recognition threshold that allow access to the environment;
wherein the ideal non-target speakers comprise unauthorized speakers each not having a right to access the environment and yielding respective fourth speaker recognition scores less than the predetermined value below the speaker recognition threshold that prevent access to the environment; and
wherein the training, receiving, extracting, using and determining steps are performed by a computer system comprising a memory and at least one processor coupled to the memory.
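The four speaker categories recited in the wherein clauses can be read as score bands around the speaker recognition threshold. The following is a minimal Python sketch of that reading, not the patented implementation. The names `label_speaker` and `margin` (standing in for the "predetermined value"), the numeric values, and the handling of scores that fall exactly on a band edge are assumptions made for illustration.

```python
from typing import Optional


def label_speaker(score: float, authorized: bool,
                  threshold: float, margin: float) -> Optional[str]:
    """Assign one of the four categories named in the claim.

    `margin` is a hypothetical name for the claim's "predetermined value".
    Treating a score equal to `threshold` as accepted is an assumption.
    Returns None for combinations the claim does not describe, e.g. an
    authorized speaker who scores just above the threshold.
    """
    if authorized:
        if threshold - margin <= score < threshold:
            return "non-ideal target"      # authorized, narrowly rejected
        if score > threshold + margin:
            return "ideal target"          # authorized, clearly accepted
    else:
        if threshold <= score <= threshold + margin:
            return "non-ideal non-target"  # unauthorized, narrowly accepted
        if score < threshold - margin:
            return "ideal non-target"      # unauthorized, clearly rejected
    return None


if __name__ == "__main__":
    # Illustrative values only: threshold 0.5, margin 0.25.
    print(label_speaker(0.40, True, 0.5, 0.25))   # non-ideal target
    print(label_speaker(0.60, False, 0.5, 0.25))  # non-ideal non-target
```

Labels of this kind would only be available for the known training speakers, whose authorization status is given; for a speaker seeking access, the claim instead calls for probabilities computed from the trained models.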
Abstract
Systems and methods for maintaining speaker recognition performance are provided. A method for maintaining speaker recognition performance comprises training a plurality of models respectively corresponding to speaker recognition scores from a plurality of speakers over a plurality of sessions, and using the plurality of models to conclude whether a speaker seeking access to an environment is a non-ideal target speaker or a non-ideal non-target speaker. Using the plurality of models to conclude comprises calculating a first probability that the speaker seeking access is the non-ideal target speaker, calculating a second probability that the speaker seeking access is the non-ideal non-target speaker, and determining whether the first probability, the second probability or a sum of the first probability and the second probability is above a probability threshold.
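One way to picture the probability step is sketched below in Python. The abstract does not say what form the models take or how the probabilities are computed, so this sketch assumes a single Gaussian per category fitted to multi-session scores, equal priors, and Bayes' rule. The function names `train_models`, `category_posteriors`, and `should_restrict`, and all numeric values, are hypothetical.

```python
from statistics import NormalDist
from typing import Dict, List


def train_models(scores: Dict[str, List[float]]) -> Dict[str, NormalDist]:
    """Fit one Gaussian per category (an assumed model form) to training scores."""
    return {label: NormalDist.from_samples(vals) for label, vals in scores.items()}


def category_posteriors(score: float,
                        models: Dict[str, NormalDist]) -> Dict[str, float]:
    """Posterior of each category given the score, assuming equal priors."""
    likelihoods = {label: m.pdf(score) for label, m in models.items()}
    total = sum(likelihoods.values())
    return {label: lk / total for label, lk in likelihoods.items()}


def should_restrict(posteriors: Dict[str, float], prob_threshold: float) -> bool:
    """True when the first probability, the second probability, or their sum
    is above the probability threshold."""
    p1 = posteriors["non-ideal target"]      # first probability
    p2 = posteriors["non-ideal non-target"]  # second probability
    return p1 > prob_threshold or p2 > prob_threshold or (p1 + p2) > prob_threshold


if __name__ == "__main__":
    # Made-up multi-session scores per category, for illustration only.
    training = {
        "ideal non-target": [0.05, 0.10, 0.15],
        "non-ideal target": [0.40, 0.45, 0.48],
        "non-ideal non-target": [0.52, 0.55, 0.60],
        "ideal target": [0.85, 0.90, 0.95],
    }
    models = train_models(training)
    print(should_restrict(category_posteriors(0.50, models), 0.5))
    print(should_restrict(category_posteriors(0.90, models), 0.5))
```

Because probabilities are non-negative, the three-way test in `should_restrict` is satisfied exactly when the sum alone exceeds the threshold; the sketch keeps all three comparisons only to mirror the wording of the abstract.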
20 Claims
9. A system for maintaining speaker recognition performance, comprising:
a training module capable of training a plurality of models respectively corresponding to speaker recognition scores from a plurality of speakers over a plurality of sessions;
an analysis module capable of:
receiving a voice signal of a speaker seeking access to an environment via at least one network;
extracting one or more speech statistics of the voice signal for determining a speaker recognition score of the speaker seeking access;
using the plurality of models to conclude whether the speaker seeking access is a non-ideal target speaker that is authorized to access the environment, but provides a voice signal which yields a speaker recognition score that results in a failure to recognize the non-ideal target speaker as being authorized to access the environment, and prevents access to the environment, or a non-ideal non-target speaker that is not authorized to access the environment, but provides a voice signal which yields a speaker recognition score that results in a misidentification of the non-ideal non-target speaker as being authorized to access the environment, and allows access to the environment;
calculating a first probability that the speaker seeking access is the non-ideal target speaker;
calculating a second probability that the speaker seeking access is the non-ideal non-target speaker; and
determining whether the first probability, the second probability or a sum of the first probability and the second probability is above a probability threshold; and
an access module capable of restricting the speaker seeking access from accessing the environment upon determining by the analysis module that the first probability, second probability or the sum of the first probability and the second probability is above the probability threshold;
wherein the plurality of speakers comprise known non-ideal target speakers and known non-ideal non-target speakers;
wherein the known non-ideal target speakers comprise authorized speakers each having a right to access the environment and yielding respective first speaker recognition scores within a predetermined value below a speaker recognition threshold that prevent access to the environment;
wherein the known non-ideal non-target speakers comprise unauthorized speakers each not having a right to access the environment and yielding respective second speaker recognition scores within a predetermined value above the speaker recognition threshold that allow access to the environment;
wherein the plurality of speakers further comprise ideal target speakers and ideal non-target speakers;
wherein the ideal target speakers comprise authorized speakers each having a right to access the environment and yielding respective third speaker recognition scores greater than the predetermined value above the speaker recognition threshold that allow access to the environment; and
wherein the ideal non-target speakers comprise unauthorized speakers each not having a right to access the environment and yielding respective fourth speaker recognition scores less than the predetermined value below the speaker recognition threshold that prevent access to the environment.
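The division of labour among the three claimed modules can be outlined as follows. This Python sketch is illustrative only: the class and method names are hypothetical, and the trained models are abstracted as a caller-supplied function returning the first and second probabilities, since the claim does not fix how the models or probabilities are computed. What happens when the speaker is not restricted is also left open by the claim; the sketch simply reports "not restricted".

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# Hypothetical signature: maps a speaker recognition score to
# (first probability, second probability).
ProbabilityFn = Callable[[float], Tuple[float, float]]


@dataclass
class TrainingModule:
    """Turns multi-session scores into a probability function."""
    fit: Callable[[Dict[str, List[float]]], ProbabilityFn]

    def train(self, scores_by_category: Dict[str, List[float]]) -> ProbabilityFn:
        return self.fit(scores_by_category)


@dataclass
class AnalysisModule:
    """Scores an incoming voice signal and tests the probability threshold."""
    extract_score: Callable[[bytes], float]  # voice signal -> recognition score
    probabilities: ProbabilityFn
    prob_threshold: float

    def is_non_ideal(self, voice_signal: bytes) -> bool:
        p1, p2 = self.probabilities(self.extract_score(voice_signal))
        t = self.prob_threshold
        return p1 > t or p2 > t or (p1 + p2) > t


@dataclass
class AccessModule:
    """Restricts access when the analysis module flags the speaker."""
    analysis: AnalysisModule

    def handle(self, voice_signal: bytes) -> str:
        if self.analysis.is_non_ideal(voice_signal):
            return "restricted"
        return "not restricted"


if __name__ == "__main__":
    # Stubs for illustration: a fixed score and fixed probabilities.
    trainer = TrainingModule(fit=lambda scores: (lambda s: (0.4, 0.3)))
    analysis = AnalysisModule(extract_score=lambda sig: 0.5,
                              probabilities=trainer.train({}),
                              prob_threshold=0.6)
    print(AccessModule(analysis).handle(b"\x00\x01"))  # restricted
```

Passing the scoring and probability functions in as parameters keeps the outline independent of any particular feature extractor or model family, neither of which the claim specifies.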
16. A computer program product for maintaining speaker recognition performance, the computer program product comprising a computer readable storage medium having program instructions embodied therewith, the program instructions executable by a processor to cause the processor to perform a method comprising:
training a plurality of models respectively corresponding to speaker recognition scores from a plurality of speakers over a plurality of sessions;
receiving a voice signal of a speaker seeking access to an environment via at least one network;
extracting one or more speech statistics of the voice signal for determining a speaker recognition score of the speaker seeking access;
using the plurality of models to detect whether the speaker seeking access is a non-ideal target speaker that is authorized to access the environment, but provides a voice signal which yields a speaker recognition score that results in a failure to recognize the non-ideal target speaker as being authorized to access the environment, and prevents access to the environment, or a non-ideal non-target speaker that is not authorized to access the environment, but provides a voice signal which yields a speaker recognition score that results in a misidentification of the non-ideal non-target speaker as being authorized to access the environment, and allows access to the environment, wherein using the plurality of models comprises:
calculating a first probability that the speaker seeking access is the non-ideal target speaker;
calculating a second probability that the speaker seeking access is the non-ideal non-target speaker; and
determining whether the first probability, the second probability or a sum of the first probability and the second probability is above a probability threshold; and
restricting the speaker seeking access from accessing the environment upon determining that the first probability, second probability or the sum of the first probability and the second probability is above the probability threshold;
wherein the plurality of speakers comprise known non-ideal target speakers and known non-ideal non-target speakers;
wherein the known non-ideal target speakers comprise authorized speakers each having a right to access the environment and yielding respective first speaker recognition scores within a predetermined value below a speaker recognition threshold that prevent access to the environment;
wherein the known non-ideal non-target speakers comprise unauthorized speakers each not having a right to access the environment and yielding respective second speaker recognition scores within a predetermined value above the speaker recognition threshold that allow access to the environment;
wherein the plurality of speakers further comprise ideal target speakers and ideal non-target speakers;
wherein the ideal target speakers comprise authorized speakers each having a right to access the environment and yielding respective third speaker recognition scores greater than the predetermined value above the speaker recognition threshold that allow access to the environment; and
wherein the ideal non-target speakers comprise unauthorized speakers each not having a right to access the environment and yielding respective fourth speaker recognition scores less than the predetermined value below the speaker recognition threshold that prevent access to the environment.
Specification