Detection of target and non-target users using multi-session information

US 9,837,080 B2
Filed: 08/21/2014
Issued: 12/05/2017
Est. Priority Date: 08/21/2014
Status: Active Grant



Abstract

Systems and methods for maintaining speaker recognition performance are provided. A method for maintaining speaker recognition performance comprises training a plurality of models respectively corresponding to speaker recognition scores from a plurality of speakers over a plurality of sessions, and using the plurality of models to conclude whether a speaker seeking access to an environment is a non-ideal target speaker or a non-ideal non-target speaker. Using the plurality of models to conclude comprises calculating a first probability that the speaker seeking access is the non-ideal target speaker, calculating a second probability that the speaker seeking access is the non-ideal non-target speaker, and determining whether the first probability, the second probability or a sum of the first probability and the second probability is above a probability threshold.


20 Claims

1. A method for maintaining speaker recognition performance, comprising:
- training a plurality of models respectively corresponding to speaker recognition scores from a plurality of speakers over a plurality of sessions;
  
  receiving a voice signal of a speaker seeking access to an environment via at least one network;
  
  extracting one or more speech statistics of the voice signal for determining a speaker recognition score of the speaker seeking access;
  
  using the plurality of models to conclude whether the speaker seeking access is a non-ideal target speaker that is authorized to access the environment, but provides a voice signal which yields a speaker recognition score that results in a failure to recognize the non-ideal target speaker as being authorized to access the environment, and prevents access to the environment, or a non-ideal non-target speaker that is not authorized to access the environment, but provides a voice signal which yields a speaker recognition score that results in a misidentification of the non-ideal non-target speaker as being authorized to access the environment, and allows access to the environment, wherein using the plurality of models to conclude comprises:
  
  calculating a first probability that the speaker seeking access is the non-ideal target speaker;
  
  calculating a second probability that the speaker seeking access is the non-ideal non-target speaker; and
  
  determining whether the first probability, the second probability or a sum of the first probability and the second probability is above a probability threshold; and
  
  restricting the speaker seeking access from accessing the environment upon determining that the first probability, second probability or the sum of the first probability and the second probability is above the probability threshold;
  
  wherein the plurality of speakers comprise known non-ideal target speakers and known non-ideal non-target speakers;
  
  wherein the known non-ideal target speakers comprise authorized speakers each having a right to access the environment and yielding respective first speaker recognition scores within a predetermined value below a speaker recognition threshold that prevent access to the environment;
  
  wherein the known non-ideal non-target speakers comprise unauthorized speakers each not having a right to access the environment and yielding respective second speaker recognition scores within a predetermined value above the speaker recognition threshold that allow access to the environment;
  
  wherein the plurality of speakers further comprise ideal target speakers and ideal non-target speakers;
  
  wherein the ideal target speakers comprise authorized speakers each having a right to access the environment and yielding respective third speaker recognition scores greater than the predetermined value above the speaker recognition threshold that allow access to the environment;
  
  wherein the ideal non-target speakers comprise unauthorized speakers each not having a right to access the environment and yielding respective fourth speaker recognition scores less than the predetermined value below the speaker recognition threshold that prevent access to the environment; and
  
  wherein the training, receiving, extracting, using and determining steps are performed by a computer system comprising a memory and at least one processor coupled to the memory.
- View Dependent Claims (2, 3, 4, 5, 6, 7, 8)
  - 2. The method according to claim 1, wherein calculating the first and second probabilities comprises using speaker recognition scores for the models corresponding to the known non-ideal target speakers and the known non-ideal non-target speakers.
  - 3. The method according to claim 1, wherein calculating the first probability is performed using the following equation:
  - 4. The method according to claim 3, wherein p(S|goat) is calculated as
  - 5. The method according to claim 1, wherein calculating the second probability is performed using the following equation:
  - 6. The method according to claim 5, wherein p(S|wolf) is calculated as
  - 7. The method according to claim 1, further comprising routing the speaker seeking access to a human operator to perform person to person verification upon determining that the first probability, second probability or the sum of the first probability and the second probability are above the probability threshold.
  - 8. The method according to claim 1, further comprising checking the speaker recognition score of the speaker seeking access against the speaker recognition threshold upon determining that none of the first probability, second probability and the sum of the first probability and the second probability are above the probability threshold, and permitting access to the environment if the speaker recognition score of the speaker seeking access is above the speaker recognition threshold.
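The four speaker groups named in the wherein clauses of claim 1 can be sketched as a labeling rule over a known speaker's score. This is a minimal illustration, not the patented implementation: the names `SpeakerGroup`, `label_known_speaker`, `threshold` and `margin` are hypothetical, `margin` stands in for the "predetermined value", and the handling of scores that fall exactly on a boundary is an assumption. The comments tie the groups to the goat and wolf notation used in claims 4 and 6.

```python
from enum import Enum
from typing import Optional


class SpeakerGroup(Enum):
    IDEAL_TARGET = "ideal target"
    NON_IDEAL_TARGET = "non-ideal target"          # "goat": authorized, narrowly rejected
    IDEAL_NON_TARGET = "ideal non-target"
    NON_IDEAL_NON_TARGET = "non-ideal non-target"  # "wolf": unauthorized, narrowly accepted


def label_known_speaker(score: float, authorized: bool,
                        threshold: float, margin: float) -> Optional[SpeakerGroup]:
    """Label a known speaker from one score, or return None if no group applies.

    Assumption: a score grants access only when it is strictly above `threshold`.
    """
    if authorized:
        if score > threshold + margin:
            return SpeakerGroup.IDEAL_TARGET
        if threshold - margin <= score < threshold:
            return SpeakerGroup.NON_IDEAL_TARGET
    else:
        if score < threshold - margin:
            return SpeakerGroup.IDEAL_NON_TARGET
        if threshold < score <= threshold + margin:
            return SpeakerGroup.NON_IDEAL_NON_TARGET
    return None  # e.g. an authorized speaker accepted by a small margin
```

With a threshold of 0.0 and a margin of 1.0, an authorized speaker scoring -0.5 is labeled a non-ideal target, while an unauthorized speaker scoring 0.5 is labeled a non-ideal non-target.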

9. A system for maintaining speaker recognition performance, comprising:
- a training module capable of training a plurality of models respectively corresponding to speaker recognition scores from a plurality of speakers over a plurality of sessions;
  
  an analysis module capable of:
  
  receiving a voice signal of a speaker seeking access to an environment via at least one network;
  
  extracting one or more speech statistics of the voice signal for determining a speaker recognition score of the speaker seeking access;
  
  using the plurality of models to conclude whether the speaker seeking access is a non-ideal target speaker that is authorized to access the environment, but provides a voice signal which yields a speaker recognition score that results in a failure to recognize the non-ideal target speaker as being authorized to access the environment, and prevents access to the environment, or a non-ideal non-target speaker that is not authorized to access the environment, but provides a voice signal which yields a speaker recognition score that results in a misidentification of the non-ideal non-target speaker as being authorized to access the environment, and allows access to the environment;
  
  calculating a first probability that the speaker seeking access is the non-ideal target speaker;
  
  calculating a second probability that the speaker seeking access is the non-ideal non-target speaker; and
  
  determining whether the first probability, the second probability or a sum of the first probability and the second probability is above a probability threshold; and
  
  an access module capable of restricting the speaker seeking access from accessing the environment upon determining by the analysis module that the first probability, second probability or the sum of the first probability and the second probability is above the probability threshold;
  
  wherein the plurality of speakers comprise known non-ideal target speakers and known non-ideal non-target speakers;
  
  wherein the known non-ideal target speakers comprise authorized speakers each having a right to access the environment and yielding respective first speaker recognition scores within a predetermined value below a speaker recognition threshold that prevent access to the environment;
  
  wherein the known non-ideal non-target speakers comprise unauthorized speakers each not having a right to access the environment and yielding respective second speaker recognition scores within a predetermined value above the speaker recognition threshold that allow access to the environment;
  
  wherein the plurality of speakers further comprise ideal target speakers and ideal non-target speakers;
  
  wherein the ideal target speakers comprise authorized speakers each having a right to access the environment and yielding respective third speaker recognition scores greater than the predetermined value above the speaker recognition threshold that allow access to the environment; and
  
  wherein the ideal non-target speakers comprise unauthorized speakers each not having a right to access the environment and yielding respective fourth speaker recognition scores less than the predetermined value below the speaker recognition threshold that prevent access to the environment.
- View Dependent Claims (10, 11, 12, 13, 14, 15)
  - 10. The system according to claim 9, wherein the analysis module is further capable of calculating the first and second probabilities by using speaker recognition scores for the models corresponding to the known non-ideal target speakers and the known non-ideal non-target speakers.
  - 11. The system according to claim 9, wherein the analysis module is further capable of calculating the first probability using the following equation:
  - 12. The system according to claim 11, wherein p(S|goat) is calculated as
  - 13. The system according to claim 9, wherein the analysis module is further capable of calculating the second probability using the following equation:
  - 14. The system according to claim 9, wherein the access module is further capable of routing the speaker seeking access to a human operator to perform person to person verification upon determining by the analysis module that the first probability, second probability or the sum of the first probability and the second probability are above the probability threshold.
  - 15. The system according to claim 9, wherein:
    - the analysis module is further capable of checking the speaker recognition score of the speaker seeking access against the speaker recognition threshold upon determining that none of the first probability, second probability and the sum of the first probability and the second probability are above the probability threshold; and
      
      the access module is further capable of permitting access to the environment if the speaker recognition score of the speaker seeking access is above the speaker recognition threshold.
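The division of work between the analysis module and the access module in claims 9, 14 and 15 can be sketched as a two-stage decision. This is an illustration under stated assumptions rather than the claimed system: the function names, the `Decision` type and the action strings are hypothetical, the two probabilities are taken as already computed, and denying access when the score is not above the speaker recognition threshold is implied by the claims rather than stated in claim 15.

```python
from dataclasses import dataclass


@dataclass
class Decision:
    action: str  # "restrict", "permit", or "deny"
    reason: str


def exceeds_probability_threshold(p_goat: float, p_wolf: float,
                                  prob_threshold: float) -> bool:
    """Analysis-module check: is the first probability, the second
    probability, or their sum above the probability threshold?"""
    return any(p > prob_threshold for p in (p_goat, p_wolf, p_goat + p_wolf))


def decide_access(p_goat: float, p_wolf: float, prob_threshold: float,
                  score: float, score_threshold: float) -> Decision:
    """Access-module decision built on the analysis-module check."""
    if exceeds_probability_threshold(p_goat, p_wolf, prob_threshold):
        # Claim 14: restricted speakers may be routed to a human operator.
        return Decision("restrict", "route to human operator for verification")
    if score > score_threshold:
        # Claim 15: fall back to the ordinary speaker recognition threshold.
        return Decision("permit", "score above speaker recognition threshold")
    return Decision("deny", "score not above speaker recognition threshold")
```

For example, `decide_access(0.4, 0.3, 0.6, 2.0, 0.0)` restricts the speaker because the sum of the two probabilities is above 0.6, even though the score alone would have granted access.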

16. A computer program product for maintaining speaker recognition performance, the computer program product comprising a computer readable storage medium having program instructions embodied therewith, the program instructions executable by a processor to cause the processor to perform a method comprising:
- training a plurality of models respectively corresponding to speaker recognition scores from a plurality of speakers over a plurality of sessions;
  
  receiving a voice signal of a speaker seeking access to an environment via at least one network;
  
  extracting one or more speech statistics of the voice signal for determining a speaker recognition score of the speaker seeking access;
  
  using the plurality of models to detect whether the speaker seeking access is a non-ideal target speaker that is authorized to access the environment, but provides a voice signal which yields a speaker recognition score that results in a failure to recognize the non-ideal target speaker as being authorized to access the environment, and prevents access to the environment, or a non-ideal non-target speaker that is not authorized to access the environment, but provides a voice signal which yields a speaker recognition score that results in a misidentification of the non-ideal non-target speaker as being authorized to access the environment, and allows access to the environment, wherein using the plurality of models comprises:
  
  calculating a first probability that the speaker seeking access is the non-ideal target speaker;
  
  calculating a second probability that the speaker seeking access is the non-ideal non-target speaker; and
  
  determining whether the first probability, the second probability or a sum of the first probability and the second probability is above a probability threshold; and
  
  restricting the speaker seeking access from accessing the environment upon determining that the first probability, second probability or the sum of the first probability and the second probability is above the probability threshold;
  
  wherein the plurality of speakers comprise known non-ideal target speakers and known non-ideal non-target speakers;
  
  wherein the known non-ideal target speakers comprise authorized speakers each having a right to access the environment and yielding respective first speaker recognition scores within a predetermined value below a speaker recognition threshold that prevent access to the environment;
  
  wherein the known non-ideal non-target speakers comprise unauthorized speakers each not having a right to access the environment and yielding respective second speaker recognition scores within a predetermined value above the speaker recognition threshold that allow access to the environment;
  
  wherein the plurality of speakers further comprise ideal target speakers and ideal non-target speakers;
  
  wherein the ideal target speakers comprise authorized speakers each having a right to access the environment and yielding respective third speaker recognition scores greater than the predetermined value above the speaker recognition threshold that allow access to the environment; and
  
  wherein the ideal non-target speakers comprise unauthorized speakers each not having a right to access the environment and yielding respective fourth speaker recognition scores less than the predetermined value below the speaker recognition threshold that prevent access to the environment.
- View Dependent Claims (17, 18, 19, 20)
  - 17. The computer program product according to claim 16, wherein calculating the first probability is performed using the following equation:
  - 18. The computer program product according to claim 17, wherein p(S|goat) is calculated as
  - 19. The computer program product according to claim 16, wherein calculating the second probability is performed using the following equation:
  - 20. The computer program product according to claim 19, wherein p(S|wolf) is calculated as
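The equations referred to in claims 17 to 20 are not reproduced in this text, so the following is only a generic sketch of how class likelihoods such as p(S|goat) and p(S|wolf) might be turned into probabilities. Everything here is an assumption made for illustration: S is treated as a list of per-session scores, each class is modeled by a one-dimensional Gaussian fitted to training scores, sessions are treated as independent, and the priors, the training scores and the catch-all "other" class are invented. It should not be read as the claimed formulas.

```python
import math
from statistics import mean, stdev


def fit_gaussian(scores):
    """Fit a 1-D Gaussian (mean, standard deviation) to training scores."""
    return mean(scores), stdev(scores)


def log_likelihood(session_scores, model):
    """log p(S | class), treating session scores as independent draws."""
    mu, sigma = model
    return sum(
        -0.5 * math.log(2 * math.pi * sigma ** 2) - (s - mu) ** 2 / (2 * sigma ** 2)
        for s in session_scores
    )


def posteriors(session_scores, models, priors):
    """Bayes' rule over classes; returns {class_name: p(class | S)}."""
    log_joint = {c: log_likelihood(session_scores, m) + math.log(priors[c])
                 for c, m in models.items()}
    peak = max(log_joint.values())  # subtract the maximum for numerical stability
    unnorm = {c: math.exp(v - peak) for c, v in log_joint.items()}
    total = sum(unnorm.values())
    return {c: v / total for c, v in unnorm.items()}


# Hypothetical multi-session training scores for each class.
models = {
    "goat": fit_gaussian([-0.8, -0.5, -0.3, -0.6]),
    "wolf": fit_gaussian([0.2, 0.5, 0.7, 0.4]),
    "other": fit_gaussian([-3.0, 3.0, -2.5, 2.8]),
}
priors = {"goat": 0.1, "wolf": 0.1, "other": 0.8}

# Two sessions from the speaker seeking access, both slightly below a threshold of 0.
post = posteriors([-0.6, -0.4], models, priors)
```

Here `post["goat"]` and `post["wolf"]` would play the roles of the first and second probabilities that the claims compare against the probability threshold.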


Current Assignee: International Business Machines Corporation
Original Assignee: International Business Machines Corporation
Inventors: Aronowitz, Hagai; Ben-David, Shay; Nahamoo, David; Pelecanos, Jason W.; Toledo-Ronen, Orith
Primary Examiner(s): Santiago Cordero, Marivelisse
Assistant Examiner(s): Harris, Keara S
Application Number: US14/465,415
Publication Number: US 20160055844A1
Time in Patent Office: 1,202 Days
Field of Search: None
CPC Class Codes: G10L 17/06 Decision making techniques;...
