Combining results from first and second speaker recognition processes

US 10,379,810 B2
Filed: 06/02/2017
Issued: 08/13/2019
Est. Priority Date: 06/06/2016
Status: Active Grant



Abstract

A received signal represents a user's speech. A first speaker recognition process is performed on a first portion of the received signal, to obtain a first output result. A second speaker recognition process is performed on a second portion of the received signal that is different from the first portion of the received signal, to obtain a second output result. The second speaker recognition process is different from the first speaker recognition process. The first and second output results are combined to obtain a combined output result indicating a likelihood that the user is a registered user.
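
As a rough illustration of the flow the abstract describes, the sketch below splits a signal into two portions, runs a different scoring function on each, and combines the two results. The scoring functions are hypothetical placeholders rather than real speaker recognition, and the plain average used for the combination is an assumption; the abstract does not say how the results are combined.

```python
from typing import Callable, Sequence

Signal = Sequence[float]
Process = Callable[[Signal], float]


def combined_result(signal: Signal, split_index: int,
                    first_process: Process, second_process: Process) -> float:
    """Score two different portions with two different processes and combine them."""
    first_portion = signal[:split_index]    # e.g. the start of the utterance
    second_portion = signal[split_index:]   # the remainder, a different portion
    first_output = first_process(first_portion)
    second_output = second_process(second_portion)
    # Assumption: an unweighted average stands in for the combination step.
    return 0.5 * (first_output + second_output)


# Hypothetical stand-ins for two different recognition processes.
def toy_first_process(portion: Signal) -> float:
    """Placeholder score: mean absolute amplitude of the portion."""
    return sum(abs(x) for x in portion) / max(len(portion), 1)


def toy_second_process(portion: Signal) -> float:
    """Placeholder score: peak absolute amplitude of the portion."""
    return max((abs(x) for x in portion), default=0.0)
```

A call such as `combined_result(samples, split_index, toy_first_process, toy_second_process)` returns a single number; in a real system the two processes would be actual speaker models and the output would be interpreted as a likelihood that the speaker is the registered user.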

18 Claims

1. A method of processing a received signal representing a user's speech, the method comprising:
- performing a first speaker recognition process on a first portion of the received signal, to obtain a first output result;
  
  performing a second speaker recognition process on a second portion of the received signal that is different from the first portion of the received signal, to obtain a second output result, wherein the second speaker recognition process is different from the first speaker recognition process;
  
  applying respective weighting values to the first and second output results to form first and second weighted results respectively;
  
  combining the first and second weighted results to obtain a combined output result indicating a likelihood that the user is a registered user; and
  
  performing an antispoofing process on at least one of the first and second portions of the received signal to obtain an antispoofing score;
  
  wherein the weighting value applied to the second output result is determined by:
  
  excluding fragments of the second portion of the received signal that do not contain speech, and determining a total length of fragments of the second portion of the received signal that do contain speech; and
  
  setting the weighting value applied to the second output result based on the total length of fragments of the second portion of the received signal that do contain speech; and
  
  wherein at least one of the respective weighting values applied to the first and second output results is based on the respective antispoofing score obtained from the respective portion of the received signal.
  - 2. A method according to claim 1, wherein the method is performed in response to determining that the received signal represents a predetermined trigger phrase.
  - 3. A method according to claim 1, wherein said first portion of the received signal represents the predetermined trigger phrase.
  - 4. A method according to claim 1, wherein at least one of the respective weighting values applied to the first and second output results is based on one or more of:
    - a measure of a degree of fit of the respective portion of the received signal to a background model of the respective speaker recognition process;
      
      a measure of a signal-to-noise ratio of the respective portion of the received signal;
      
      a measure of a signal-to-interference ratio of the respective portion of the received signal;
      
      a measure of a direct-to-reflected ratio of the respective portion of the received signal;
      
      a measure of a direction from which the respective portion of the received signal was received; and
      
      a measure of a range from which the respective portion of the received signal was received.
  - 5. A method according to claim 1, wherein the first and second speaker recognition processes use different models of the user's speech.
  - 6. A method according to claim 1, wherein the first and second speaker recognition processes use different background models.
  - 7. A method according to claim 1, wherein the first portion of the received signal comprises a trigger phrase and the second portion of the received signal comprises a command.
  - 8. A method according to claim 1, wherein the first portion of the received signal corresponds to a first time window and the second portion of the received signal corresponds to a second time window, and wherein the first time window does not overlap the second time window.
  - 9. A method according to claim 1, wherein the first portion of the received signal corresponds to a first time window and the second portion of the received signal corresponds to a second time window, and wherein the first time window at least partially overlaps the second time window.
  - 10. A method according to claim 1, comprising performing at least one further speaker recognition process on at least one further portion of the received signal to obtain at least one respective further output result; and
    - combining the at least one further output result with the first and second output results to obtain the combined output result indicating a likelihood that the user is a registered user.
  - 11. A method according to claim 1, further comprising performing speech recognition on at least the first portion of the received signal.
  - 12. A method according to claim 1, comprising allowing or preventing a further action by the user based on the combined output result.
  - 13. A method according to claim 1, wherein the weighting value applied to the second output result is increased relative to the weighting value applied to the first output result, as an amount of net speech present in the second portion of the received signal increases.
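
The weighting steps recited in claims 1, 12 and 13 can be sketched as follows. This is a minimal sketch under stated assumptions, not the patented implementation: the `Fragment` type, the linear mapping from net speech to weight, the 4-second saturation point, the convention that an antispoofing score of 1.0 means "no sign of spoofing", and the 0.5 decision threshold are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Fragment:
    """A piece of the second portion, labelled by a hypothetical voice activity detector."""
    duration_s: float
    is_speech: bool


def net_speech_seconds(fragments: Iterable[Fragment]) -> float:
    """Total length of fragments that contain speech; non-speech fragments are excluded."""
    return sum(f.duration_s for f in fragments if f.is_speech)


def second_weight(net_speech_s: float, saturation_s: float = 4.0) -> float:
    """Weight in [0, 1] that grows linearly with net speech (assumed mapping)."""
    if saturation_s <= 0:
        raise ValueError("saturation_s must be positive")
    return min(max(net_speech_s, 0.0) / saturation_s, 1.0)


def combine(first_output: float, second_output: float,
            fragments: Iterable[Fragment], antispoof_score: float = 1.0) -> float:
    """Weighted sum of the two results.

    antispoof_score is assumed to lie in [0, 1] and to describe the second
    portion; a low score shrinks the weight given to the second result.
    """
    w2 = second_weight(net_speech_seconds(fragments))
    w1 = 1.0 - w2              # more net speech shifts weight towards the second result
    w2 *= antispoof_score      # antispoofing score scales the second weight
    return w1 * first_output + w2 * second_output


def allow_action(combined_output: float, threshold: float = 0.5) -> bool:
    """Allow or prevent a further action based on the combined result (threshold assumed)."""
    return combined_output >= threshold
```

With two seconds of net speech the two results are weighted equally in this sketch; a longer command moves the weight towards the second process, which is the behaviour claim 13 describes.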

14. A device for processing a received signal representing a user's speech, for performing speaker recognition, wherein the device is configured to:
- perform a first speaker recognition process on a first portion of the received signal, to obtain a first output result;
  
  perform a second speaker recognition process on a second portion of the received signal that is different from the first portion of the received signal, to obtain a second output result, wherein the second speaker recognition process is different from the first speaker recognition process;
  
  apply respective weighting values to the first and second output results to form first and second weighted results respectively;
  
  combine the first and second weighted results to obtain a combined output result indicating a likelihood that the user is a registered user; and
  
  perform an antispoofing process on at least one of the first and second portions of the received signal to obtain an antispoofing score;
  
  wherein the weighting value applied to the second output result is determined by:
  
  excluding fragments of the second portion of the received signal that do not contain speech, and determining a total length of fragments of the second portion of the received signal that do contain speech; and
  
  setting the weighting value applied to the second output result based on the total length of fragments of the second portion of the received signal that do contain speech; and
  
  wherein at least one of the respective weighting values applied to the first and second output results is based on the respective antispoofing score obtained from the respective portion of the received signal.
  - 15. A device as claimed in claim 14, wherein the device comprises a mobile telephone, an audio player, a video player, a PDA, a mobile computing platform, a games device, a remote controller device, a toy, a machine, or a home automation controller.

16. An integrated circuit device for processing a received signal representing a user's speech, for performing speaker recognition, wherein the integrated circuit device is configured to:
- perform a first speaker recognition process on a first portion of the received signal, to obtain a first output result;
  
  perform a second speaker recognition process on a second portion of the received signal that is different from the first portion of the received signal, to obtain a second output result, wherein the second speaker recognition process is different from the first speaker recognition process;
  
  apply respective weighting values to the first and second output results to form first and second weighted results respectively;
  
  combine the first and second weighted results to obtain a combined output result indicating a likelihood that the user is a registered user; and
  
  perform an antispoofing process on at least one of the first and second portions of the received signal to obtain an antispoofing score;
  
  wherein the weighting value applied to the second output result is determined by:
  
  excluding fragments of the second portion of the received signal that do not contain speech, and determining a total length of fragments of the second portion of the received signal that do contain speech; and
  
  setting the weighting value applied to the second output result based on the total length of fragments of the second portion of the received signal that do contain speech; and
  
  wherein at least one of the respective weighting values applied to the first and second output results is based on the respective antispoofing score obtained from the respective portion of the received signal.
  - 17. An integrated circuit device as claimed in claim 16, wherein the first and second speaker recognition processes use at least one user or background model stored in said device.

18. A non-transitory computer readable storage medium having computer-executable instructions stored thereon that, when executed by processor circuitry, cause the processor circuitry to perform a method comprising:
- performing a first speaker recognition process on a first portion of the received signal, to obtain a first output result;
  
  performing a second speaker recognition process on a second portion of the received signal that is different from the first portion of the received signal, to obtain a second output result, wherein the second speaker recognition process is different from the first speaker recognition process;
  
  apply respective weighting values to the first and second output results to form first and second weighted results respectively;
  
  combining the first and second weighted results to obtain a combined output result indicating a likelihood that the user is a registered user; and
  
  performing an antispoofing process on at least one of the first and second portions of the received signal to obtain an antispoofing score;
  
  wherein the weighting value applied to the second output result is determined by:
  
  excluding fragments of the second portion of the received signal that do not contain speech, and determining a total length of fragments of the second portion of the received signal that do contain speech; and
  
  setting the weighting value applied to the second output result based on the total length of fragments of the second portion of the received signal that do contain speech; and
  
  wherein at least one of the respective weighting values applied to the first and second output results is based on the respective antispoofing score obtained from the respective portion of the received signal.


Current Assignee: Cirrus Logic Incorporated
Original Assignee: Cirrus Logic Incorporated
Inventors: Vaquero Aviles-Casco, Carlos; Garcia Gomar, Marta; Martinez Gonzalez, David
Primary Examiner(s): Lerner, Martin
Application Number: US15/612,606
Publication Number: US 20170351487A1
Time in Patent Office: 802 Days
Field of Search: 704246, 704247, 704249, 704250, 704273

CPC Class Codes

G06F 3/167   Audio in a user interface, ...
G10L 15/22   Procedures used during a sp...
G10L 17/00   Speaker identification or v...
G10L 17/10   Multimodal systems, i.e. ba...
G10L 17/22   Interactive procedures; Man...
