Text independent speaker recognition with simultaneous speech recognition for transparent command ambiguity resolution and continuous access control

US 6,477,500 B2
Filed: 04/12/2000
Issued: 11/05/2002
Est. Priority Date: 02/02/1996
Status: Expired due to Term

First Claim

1. A method of performing speaker recognition in connection with access control, comprising:

performing text-independent speaker recognition on a speech signal to identify a speaker;

recognizing a spoken utterance from said speech signal, said spoken utterance being uttered in connection with an access request;

determining if said identified speaker is authorized to obtain access;

granting said access if said identified speaker is indeed authorized; and

continuously verifying the recognition of said speaker in said speaker recognition step as said speaker inputs additional speech signals.

1 Assignment

0 Petitions

Abstract

Feature vectors representing each of a plurality of overlapping frames of an arbitrary, text independent speech signal are computed and compared to vector parameters and variances stored as codewords in one or more codebooks corresponding to each of one or more enrolled users to provide speaker dependent information for speech recognition and/or ambiguity resolution. Other information such as aliases and preferences of each enrolled user may also be enrolled and stored, for example, in a database. Correspondence of the feature vectors may be ranked by closeness of correspondence to a codeword entry and the number of frames corresponding to each codebook are accumulated or counted to identify a potential enrolled speaker. The differences between the parameters of the feature vectors and codewords in the codebooks can be used to identify a new speaker and an enrollment procedure can be initiated. Continuous authorization and access control can be carried out based on any utterance either by verification of the authorization of a speaker of a recognized command or comparison with authorized commands for the recognized speaker. Text independence also permits coherence checks to be carried out for commands to validate the recognition process.
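
The frame-by-frame codebook comparison described in the abstract can be illustrated with a minimal Python sketch. This is not the patented implementation: it assumes a variance-weighted squared distance as the closeness measure and a simple per-frame vote, and the names `Codeword`, `closest_distance`, `identify_speaker`, and the toy two-dimensional vectors are all hypothetical.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass
class Codeword:
    """One codebook entry: a mean vector and per-dimension variances."""
    mean: Sequence[float]
    variance: Sequence[float]


def weighted_distance(frame: Sequence[float], cw: Codeword) -> float:
    """Variance-weighted squared distance (an assumed closeness measure)."""
    return sum((x - m) ** 2 / v for x, m, v in zip(frame, cw.mean, cw.variance))


def closest_distance(frame: Sequence[float], codebook: List[Codeword]) -> float:
    """Distance from a frame to the nearest codeword in one speaker's codebook."""
    return min(weighted_distance(frame, cw) for cw in codebook)


def identify_speaker(frames, codebooks: Dict[str, List[Codeword]]) -> str:
    """Count, per enrolled speaker, the frames that match that speaker's codebook best."""
    votes = Counter()
    for frame in frames:
        best = min(codebooks, key=lambda spk: closest_distance(frame, codebooks[spk]))
        votes[best] += 1
    # The speaker whose codebook accumulated the most frames is the candidate.
    return votes.most_common(1)[0][0]


# Toy data (hypothetical): two enrolled speakers, four feature vectors.
codebooks = {
    "alice": [Codeword([0.0, 0.0], [1.0, 1.0]), Codeword([1.0, 1.0], [1.0, 1.0])],
    "bob": [Codeword([5.0, 5.0], [1.0, 1.0]), Codeword([6.0, 4.0], [1.0, 1.0])],
}
frames = [[0.2, 0.1], [0.9, 1.2], [5.1, 4.8], [0.4, 0.3]]
print(identify_speaker(frames, codebooks))  # alice (3 of 4 frames)
```

Because the vote is taken over whatever frames arrive, nothing in the sketch depends on the words spoken, which is the sense in which the recognition is text independent.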

115 Citations

32 Claims

1. A method of performing speaker recognition in connection with access control, comprising:
- performing text-independent speaker recognition on a speech signal to identify a speaker;
  
  recognizing a spoken utterance from said speech signal, said spoken utterance being uttered in connection with an access request;
  
  determining if said identified speaker is authorized to obtain access;
  
  granting said access if said identified speaker is indeed authorized; and
  
  continuously verifying the recognition of said speaker in said speaker recognition step as said speaker inputs additional speech signals.
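
One way to picture the continuous verification step of claim 1 is a session object that re-checks the speaker on every later utterance. The sketch below is illustrative only: the claim does not say what happens when verification fails, so revoking access on a mismatch is an assumption, and `AccessSession`, `toy_identify`, and the `"speaker:words"` signal format are hypothetical stand-ins for a real recognizer.

```python
from typing import Callable, Optional, Set


class AccessSession:
    """Grants access once, then re-checks the speaker on every later utterance."""

    def __init__(self, identify: Callable[[str], Optional[str]], authorized: Set[str]):
        self.identify = identify      # hypothetical stand-in for the speaker recognizer
        self.authorized = authorized  # hypothetical set of authorized speaker IDs
        self.speaker: Optional[str] = None

    def on_utterance(self, signal: str) -> bool:
        """Return True while access is held; revoke it if the speaker changes."""
        who = self.identify(signal)
        if self.speaker is None:
            # First utterance: the access request itself.
            if who in self.authorized:
                self.speaker = who
            return self.speaker is not None
        if who != self.speaker:
            # A later utterance no longer matches the speaker who was granted access.
            self.speaker = None
            return False
        return True


def toy_identify(signal: str) -> str:
    """Toy recognizer: the 'signal' is text tagged with the speaker's name."""
    return signal.split(":", 1)[0]


session = AccessSession(toy_identify, {"alice"})
results = [session.on_utterance(s)
           for s in ["alice:log in", "alice:read mail", "bob:delete files"]]
print(results)  # [True, True, False]
```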

2. A method of performing speaker recognition, comprising:
- simultaneously performing text-independent speaker recognition on a speech signal to simultaneously identify a speaker;
  
  recognizing a command from said speech signal;
  
  interpreting said command based on said speaker identified in said text-independent speaker recognition step;
  
  retrieving enrolled information from a database corresponding to interpretation of said command in said interpreting step; and
  
  performing said command based on said enrolled information.
- View Dependent Claims (3)
- - 3. The method of claim 2, wherein said enrolled information includes at least one of information corresponding to a preference of said speaker, a word corresponding to a macro of said speaker, a procedural short-cut, and a speaker-specific vocabulary word.
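
Claims 2 and 3 describe interpreting a recognized command through information enrolled for the identified speaker. As a rough sketch under stated assumptions, the enrolled information can be modeled as a per-speaker table of aliases and macros. The table `ENROLLED`, the function `interpret`, and every entry in the table are hypothetical and chosen only for illustration.

```python
from typing import Dict, List

# Hypothetical enrolled information: per-speaker aliases mapping a spoken
# phrase to the concrete steps it stands for.
ENROLLED: Dict[str, Dict[str, List[str]]] = {
    "alice": {"call home": ["dial 555-0100"], "my usual": ["open editor", "open mail"]},
    "bob": {"call home": ["dial 555-0199"]},
}


def interpret(speaker: str, command: str) -> List[str]:
    """Expand a recognized command using the identified speaker's enrolled aliases."""
    aliases = ENROLLED.get(speaker, {})
    # Fall back to the literal command when the speaker has no matching alias.
    return aliases.get(command, [command])


print(interpret("alice", "call home"))  # ['dial 555-0100']
print(interpret("bob", "call home"))    # ['dial 555-0199']
print(interpret("bob", "my usual"))     # ['my usual']
```

The same spoken words resolve to different actions depending on who is speaking, and a macro such as "my usual" expands to several steps.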

4. A method of performing speaker recognition, comprising:
- performing text-independent speaker recognition on a speech signal to identify a speaker;
  
  recognizing a command from said speech signal;
  
  simultaneously interpreting said command based on said speaker identified in said text-independent speaker recognition step;
  
  performing coherence checking for said command;
  
  retrieving enrolled information corresponding to said speaker identified in said text-independent speaker recognition step; and
  
  interpreting said command in accordance with said enrolled information retrieved corresponding to said speaker identified in said text-independent speaker recognition step.

5. A method of performing speaker recognition, comprising:
- performing text-independent speaker recognition on a speech signal to identify a speaker;
  
  simultaneously recognizing a command from said speech signal;
  
  interpreting said command based on said speaker identified in said text-independent speaker recognition step, said interpreting of said command being based on alias information corresponding to said speaker identified in said text-independent speaker recognition step, said alias information including at least one of information corresponding to a preference of the speaker, a word corresponding to a macro of the speaker, a procedural short-cut, and a speaker-specific vocabulary word.

6. A method of performing speaker recognition, comprising:
- performing text-independent speaker recognition on a speech signal to identify a speaker;
  
  simultaneously recognizing a command from said speech signal;
  
  interpreting said command based on said speaker identified in said text-independent speaker recognition step, said interpreting step in turn comprising the sub-steps of;
  
  determining if said recognized speaker is authorized to issue said command; and
  
  performing said command if said identified speaker is indeed authorized.
- View Dependent Claims (7, 8)
- - 7. The method of claim 6, wherein:
8. The method of claim 6, wherein:
- said determining sub-step comprises comparing said command with a list of commands which are authorized to be performed by said identified speaker; and
  
  said performing sub-step comprises performing said command if a match exists between said command and said list.
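
The authorization check in claims 6 and 8 amounts to looking the recognized command up in a list tied to the identified speaker. The following is a minimal sketch, assuming the lists are held as in-memory sets; `AUTHORIZED_COMMANDS`, `perform_if_authorized`, and the sample commands are hypothetical.

```python
from typing import Dict, Set

# Hypothetical per-speaker lists of authorized commands.
AUTHORIZED_COMMANDS: Dict[str, Set[str]] = {
    "alice": {"read mail", "delete file"},
    "bob": {"read mail"},
}


def perform_if_authorized(speaker: str, command: str) -> str:
    """Perform the command only if it appears in the identified speaker's list."""
    if command in AUTHORIZED_COMMANDS.get(speaker, set()):
        return f"performed: {command}"
    # Unknown speakers have an empty list, so every command is refused.
    return f"refused: {command}"


print(perform_if_authorized("alice", "delete file"))  # performed: delete file
print(perform_if_authorized("bob", "delete file"))    # refused: delete file
```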

9. A method of performing speaker recognition in connection with access control, comprising:
- performing text-independent speaker recognition on a speech signal to identify a speaker;
  
  simultaneously recognizing a spoken utterance from said speech signal, said spoken utterance being uttered in connection with an access request;
  
  determining if said identified speaker is authorized to obtain access; and
  
  granting said access if said identified speaker is indeed authorized.
- View Dependent Claims (10, 11, 12, 13)
- - 10. The method of claim 9, wherein:
11. The method of claim 10, wherein:
- said access request is directed to a call center;
  
  said determining step comprises verifying said identified speaker as a customer having a verified customer identity; and
  
  said granting step comprises granting said access if said identified speaker is indeed a customer having a verified customer identity.
12. The method of claim 9, wherein said step of performing text-independent speaker recognition includes:
- (a) sampling overlapping frames of said speech signal;
  
  (b) computing a feature vector for each said frame of said speech signal;
  
  (c) comparing each said feature vector with vector parameters and variances stored in a plurality of codebooks corresponding to enrolled speakers;
  
  (d) accumulating a number of frames for which the corresponding feature vector corresponds to vector parameters and variances in said codebooks; and
  
  (e) identifying said speaker in response to results of at least one of said accumulating step and said comparing step.
13. The method of claim 12, wherein said spoken utterance is a command, and wherein said recognizing step includes:
- recognizing said command using a speech processing model formed from information independent from information used to form said codebooks.
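
Steps (a) and (b) of claim 12, sampling overlapping frames and computing one feature vector per frame, can be sketched as follows. The claim does not name the features, so the log energy and zero-crossing count used here are placeholders only; `overlapping_frames`, `toy_feature`, and the frame length and hop values are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def overlapping_frames(samples: Sequence[float], frame_len: int, hop: int) -> List[Sequence[float]]:
    """Split a signal into frames of frame_len samples that start every hop samples."""
    if not 0 < hop <= frame_len:
        raise ValueError("hop must be in 1..frame_len")
    # A hop smaller than frame_len makes consecutive frames overlap.
    return [samples[i:i + frame_len] for i in range(0, len(samples) - frame_len + 1, hop)]


def toy_feature(frame: Sequence[float]) -> Tuple[float, int]:
    """Placeholder feature vector: log energy and zero-crossing count."""
    energy = sum(x * x for x in frame)
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if a * b < 0)
    return (math.log(energy + 1e-12), crossings)


# Toy signal (hypothetical): 32 samples of a sine wave, 16-sample frames, 50% overlap.
signal = [math.sin(2 * math.pi * n / 8) for n in range(32)]
frames = overlapping_frames(signal, frame_len=16, hop=8)
features = [toy_feature(f) for f in frames]
print(len(frames))  # 3
```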

14. A method of performing speaker recognition, comprising:
- performing text-independent speaker recognition on a speech signal to identify a speaker;
  
  simultaneously recognizing a command from said speech signal;
  
  interpreting said command based on said speaker identified in said text-independent speaker recognition step, said interpreting step in turn comprising the sub-steps of;
  
  comparing said command with a list of authorized commands; and
  
  performing said command based on results obtained from said comparing step.

15. A method of performing speaker recognition, comprising:
- performing text-independent speaker recognition on a speech signal to identify a speaker;
  
  simultaneously recognizing a spoken utterance from said speech signal to obtain a recognized spoken utterance; and
  
  resolving ambiguity in said recognized spoken utterance based on said identity of said speaker identified in said text-independent speaker recognition step.
- View Dependent Claims (16)
- - 16. The method of claim 15, wherein said spoken utterance is a command which has a meaning that varies depending on who utters said utterance.

17. A method of performing speaker recognition, comprising:
- performing text-independent speaker recognition on a speech signal to identify a speaker, said performing in turn comprising identification of said speaker as a new user;
  
  recognizing a spoken utterance from said speech signal; and
  
  enrolling said new user in real time in response to said identification of said speaker as said new user.
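
Claim 17 leaves open how a new user is detected. The abstract suggests using the differences between feature vectors and codewords, so one possible reading is a distance threshold: if no enrolled codebook is close enough, the speaker is treated as new and enrolled immediately. The sketch below assumes that reading. The threshold, the unweighted squared distance, the generated user IDs, and the use of the raw frames as the new codebook are all illustrative assumptions.

```python
from typing import Dict, List, Sequence

Vector = Sequence[float]


def avg_min_distance(frames: List[Vector], codebook: List[Vector]) -> float:
    """Mean squared distance from each frame to its nearest codeword."""
    def d2(a: Vector, b: Vector) -> float:
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return sum(min(d2(f, c) for c in codebook) for f in frames) / len(frames)


def identify_or_enroll(frames: List[Vector], codebooks: Dict[str, List[Vector]],
                       threshold: float) -> str:
    """Return the best-matching enrolled speaker, or enroll the speaker as new."""
    scores = {spk: avg_min_distance(frames, cb) for spk, cb in codebooks.items()}
    if scores and min(scores.values()) <= threshold:
        return min(scores, key=scores.get)
    # No codebook is close enough: treat as a new user and enroll on the spot.
    new_id = f"user{len(codebooks) + 1}"
    codebooks[new_id] = [list(f) for f in frames]  # crude codebook: the frames themselves
    return new_id


codebooks = {"alice": [[0.0, 0.0], [1.0, 1.0]]}
print(identify_or_enroll([[0.1, 0.1]], codebooks, threshold=1.0))  # alice
print(identify_or_enroll([[9.0, 9.0]], codebooks, threshold=1.0))  # user2
print(identify_or_enroll([[9.1, 9.0]], codebooks, threshold=1.0))  # user2
```

The third call shows the effect of enrolling in real time: the speaker added by the second call is recognized on the next utterance.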

18. A method of performing speaker recognition in connection with access control, comprising:
- performing text-independent speaker recognition on a speech signal to identify a speaker, said performing in turn comprising the sub-steps of;
  
  (a) sampling overlapping frames of said speech signal;
  
  (b) computing a feature vector for each said frame of said speech signal;
  
  (c) comparing each said feature vector with vector parameters and variances stored in a plurality of codebooks corresponding to enrolled speakers;
  
  (d) accumulating a number of frames for which the corresponding feature vector corresponds to vector parameters and variances in said codebooks; and
  
  (e) identifying said speaker in response to results of at least one of said accumulating step and said comparing step;
  
  recognizing a spoken utterance from said speech signal, said spoken utterance being uttered in connection with an access request;
  
  determining if said identified speaker is authorized to obtain access;
  
  granting said access if said identified speaker is indeed authorized; and
  
  continuously verifying the recognition of said speaker in said identifying step as said speaker inputs additional speech signals.

19. An apparatus for speaker recognition, comprising:
- a speaker recognition processor which performs text-independent speaker recognition on a speech signal to identify a speaker;
  
  a speech recognizer which simultaneously recognizes a command from said speech signal;
  
  a controller-interpreter which interprets said command based on said speaker identified by said speaker recognition processor, said controller-interpreter being configured to determine if said identified speaker is authorized to issue said command, and to authorize performance of said command if said identified speaker is indeed authorized.
- View Dependent Claims (20, 21)
- - 20. The apparatus of claim 19, further comprising a speaker database containing a list of speakers who are authorized to perform said command, wherein said controller-interpreter and said speaker recognition processor are configured to compare said identified speaker with said list of speakers and to authorize performance of said command if a match exists between said identified speaker and said list.
  - 21. The apparatus of claim 19, wherein said controller-interpreter stores a list of commands which are authorized to be performed by said identified speaker, and is configured to compare said command with said list of commands and to authorize performance of said command if a match exists between said command and said list.

22. An apparatus for speaker recognition in connection with access control, comprising:
- a speaker recognition processor which performs text-independent speaker recognition on a speech signal to identify a speaker, said speaker recognition processor being configured to determine if said identified speaker is authorized to obtain access;
  
  a speech recognizer which simultaneously recognizes a spoken utterance from said speech signal, said spoken utterance being uttered in connection with an access request; and
  
  a controller-interpreter which is configured to authorize granting of said access if said identified speaker is indeed authorized.
- View Dependent Claims (23, 24, 25)
- - 23. The apparatus of claim 22, further comprising a speaker database containing a list of speakers who are authorized to obtain access, wherein:
24. The apparatus of claim 23, wherein said access request is directed to a call center and said speaker database includes customer identity information.
25. The apparatus of claim 22, further comprising:
- an acoustic front end which is configured to sample overlapping frames of said speech signal and to compute a feature vector for each of said frames of said speech signal; and
  
  a speaker database which stores vector parameters and variances in a plurality of codebooks corresponding to enrolled speakers;
  
  wherein said speaker recognition processor is configured to;
  
  compare each said feature vector with said vector parameters and variances to obtain a comparison;
  
  accumulate a number of frames for which the corresponding feature vector corresponds to said vector parameters and said variances in said codebooks to obtain an accumulation; and
  
  identify said speaker in response to at least one of said accumulation and said comparison.

26. An apparatus for speaker recognition, comprising:
- a speaker recognition processor which performs text-independent speaker recognition on a speech signal to identify a speaker;
  
  a speech recognizer which simultaneously recognizes a spoken utterance from said speech signal to obtain a recognized spoken utterance; and
  
  a controller-interpreter which is configured to resolve ambiguity in said recognized spoken utterance based on said identity of said speaker identified by said speaker recognition processor.
- View Dependent Claims (27)
- - 27. The apparatus of claim 26, wherein said spoken utterance is a command which has a meaning that varies depending on who utters said utterance.

28. An apparatus for speaker recognition, comprising:
- a speaker recognition processor which performs text-independent speaker recognition on a speech signal to identify a speaker, said speaker recognition processor being configured to identify said speaker as a new user;
  
  a speech recognizer which simultaneously recognizes a spoken utterance from said speech signal; and
  
  a controller-interpreter, which, together with said speaker recognition processor, is configured to enroll said new user in real time in response to said identification of said speaker as said new user.

29. A program storage device readable by machine, tangibly embodying a program of instructions executable by the machine to perform method steps for performing speaker recognition in connection with access control, said method steps comprising:
- performing text-independent speaker recognition on a speech signal to identify a speaker;
  
  recognizing a spoken utterance from said speech signal, said spoken utterance being uttered in connection with an access request;
  
  determining if said identified speaker is authorized to obtain access;
  
  granting said access if said identified speaker is indeed authorized; and
  
  continuously verifying the recognition of said speaker in said speaker recognition step as said speaker inputs additional speech signals.

30. A program storage device readable by machine, tangibly embodying a program of instructions executable by the machine to perform method steps for performing speaker recognition, said method steps comprising:
- performing text-independent speaker recognition on a speech signal to identify a speaker, said performing in turn comprising identification of said speaker as a new user;
  
  simultaneously recognizing a spoken utterance from said speech signal; and
  
  enrolling said new user in real time in response to said identification of said speaker as said new user.

31. A program storage device readable by machine, tangibly embodying a program of instructions executable by the machine to perform method steps for performing speaker recognition, said method steps comprising:
- performing text-independent speaker recognition on a speech signal to identify a speaker;
  
  simultaneously recognizing a command from said speech signal; and
  
  interpreting said command based on said speaker identified in said text-independent speaker recognition step, said interpreting method step in turn comprising the sub-steps of;
  
  determining if said identified speaker is authorized to issue said command; and
  
  performing said command if said identified speaker is indeed authorized.

32. A program storage device readable by machine, tangibly embodying a program of instructions executable by the machine to perform method steps for performing speaker recognition in connection with access control, said method steps comprising:
- performing text-independent speaker recognition on a speech signal to identify a speaker;
  
  simultaneously recognizing a spoken utterance from said speech signal, said spoken utterance being uttered in connection with an access request;
  
  determining if said identified speaker is authorized to obtain access; and
  
  granting said access if said identified speaker is indeed authorized.

Current Assignee
Nuance Communications, Inc. (Microsoft Corporation)
Original Assignee
International Business Machines Corporation
Inventors
Maes, Stephane Herman
Primary Examiner(s)
SMITS, TALIVALDIS IVARS

Application Number
US09/548,016
Publication Number
US 20020002465A1
Time in Patent Office
937 Days
Field of Search
704/246, 704/251, 704/275, 704/273
US Class Current
704/275
CPC Class Codes
G10L 15/065   Adaptation
G10L 17/04   Training, enrolment or mode...
G10L 17/14   Use of phonemic categorisat...
