Speaker recognition with assessment of audio frame contribution

US 10,726,849 B2
Filed: 08/01/2017
Issued: 07/28/2020
Est. Priority Date: 08/03/2016
Status: Active Grant

First Claim

1. An apparatus for use in biometric speaker recognition, wherein the apparatus is configured to receive digital audio data derived from an audio signal output by a microphone, the apparatus comprising:

an analyzer for analyzing each frame of a sequence of frames of digital audio data which correspond to speech sounds uttered by a user to determine at least one characteristic of the speech sound of that frame; and

an assessment module for determining for each frame of audio data a contribution indicator of the extent to which that frame of audio data should be used for speaker recognition processing based on the determined at least one characteristic of the speech sound;

wherein the at least one characteristic of the speech sound comprises identification of the speech sound as a specific phoneme or as one of a plurality of predefined classes of phonemes, and wherein the contribution indicator varies based on the number of previous instances of the same phoneme or class of phoneme in previous frames of audio data.

Abstract

This application describes methods and apparatus for speaker recognition. An apparatus according to an embodiment has an analyzer (202) for analyzing each frame of a sequence of frames of audio data (A_IN) which correspond to speech sounds uttered by a user to determine at least one characteristic of the speech sound of that frame. An assessment module (203) determines, for each frame of audio data, a contribution indicator of the extent to which the frame of audio data should be used for speaker recognition processing based on the determined characteristic of the speech sound. In this way frames which correspond to speech sounds that are of most use for speaker discrimination may be emphasized and/or frames which correspond to speech sounds that are of least use for speaker discrimination may be de-emphasized.
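
As a rough illustration of how a contribution indicator could emphasize or de-emphasize frames, the following Python sketch combines per-frame speaker-match scores into a single utterance-level score. The function name `weighted_score`, the use of a weighted mean, and the notion of one scalar score per frame are assumptions made for illustration; the patent text above does not prescribe this scoring scheme. A contribution of zero corresponds to leaving a frame out of the speaker recognition processing.

```python
from typing import Sequence


def weighted_score(frame_scores: Sequence[float],
                   contributions: Sequence[float]) -> float:
    """Combine per-frame scores using per-frame contribution indicators.

    A contribution of 0.0 drops the frame entirely; larger values
    emphasize the frame relative to the others (hypothetical scheme).
    """
    if len(frame_scores) != len(contributions):
        raise ValueError("one contribution indicator is needed per frame")
    if any(c < 0 for c in contributions):
        raise ValueError("contribution indicators must be non-negative")
    total = sum(contributions)
    if total == 0:
        raise ValueError("no frame was selected for speaker recognition")
    # Weighted mean: each frame counts in proportion to its indicator.
    return sum(s * c for s, c in zip(frame_scores, contributions)) / total
```

With equal contributions this reduces to a plain average; setting one frame's contribution to zero makes the result depend only on the remaining frames.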

25 Claims

1. An apparatus for use in biometric speaker recognition, wherein the apparatus is configured to receive digital audio data derived from an audio signal output by a microphone, the apparatus comprising:
- an analyzer for analyzing each frame of a sequence of frames of digital audio data which correspond to speech sounds uttered by a user to determine at least one characteristic of the speech sound of that frame; and
  
  an assessment module for determining for each frame of audio data a contribution indicator of the extent to which that frame of audio data should be used for speaker recognition processing based on the determined at least one characteristic of the speech sound;
  
  wherein the at least one characteristic of the speech sound comprises identification of the speech sound as a specific phoneme or as one of a plurality of predefined classes of phonemes, and wherein the contribution indicator varies based on the number of previous instances of the same phoneme or class of phoneme in previous frames of audio data.
- - 2. The apparatus as claimed in claim 1 comprising a speaker recognition module configured to apply speaker recognition processing to said frames of audio data, wherein the speaker recognition module is configured to process the frames of audio data according to the contribution indicator for each frame.
  - 3. The apparatus as claimed in claim 1 wherein said contribution indicator comprises a weighting to be applied to the each frame in the speaker recognition processing.
  - 4. The apparatus as claimed in claim 1 wherein said contribution indicator comprises a selection of frames of audio data not to be used in the speaker recognition processing.
  - 5. The apparatus as claimed in claim 1 where the speaker recognition processing comprises processing the frames of audio data for speaker enrollment.
  - 6. The apparatus as claimed in claim 1 where the speaker recognition processing comprises processing the frames of audio data for speaker verification.
  - 7. The apparatus as claimed in claim 1 where the speaker recognition processing comprises processing the frames of audio data for generation of a generalized model of a population of speakers.
  - 8. The apparatus as claimed in claim 1 wherein the at least one characteristic of the speech sound comprises identification of the speech sound as a voiced sound or an unvoiced sound.
  - 9. The apparatus as claimed in claim 1 where the at least one characteristic of the speech sound comprises at least one characteristic of one or more formants in the speech sound.
  - 10. The apparatus as claimed in claim 9 wherein said at least one characteristic comprises an indication of at least one formant peak or at least one formant null.
  - 11. The apparatus as claimed in claim 1 wherein the assessment module is configured to receive an indication of acoustic environment in which the speech sound was uttered by the user and wherein the contribution indicator is also based on the indication of acoustic environment.
  - 12. The apparatus as claimed in claim 11 wherein the indication of acoustic environment comprises an indication of noise in the audio data.
  - 13. The apparatus as claimed in claim 12 wherein the at least one characteristic of the speech sound comprises identification of the speech sound as one of a plurality of predefined categories of phonemes and wherein, for at least one of said predefined categories of phonemes, the assessment modules applies a transfer function between a value of contribution indicator and noise level.
  - 14. The apparatus as claimed in claim 12 wherein the analyzer is configured to analyze the audio data to determine said indication of noise.
  - 15. The apparatus as claimed in claim 11 wherein the indication of acoustic environment comprises an indication of reverberation in the audio data.
  - 16. The apparatus as claimed in claim 1 wherein the assessment module is configured to receive an indication of a parameter of an acoustic channel for generating the audio data and wherein the contribution indicator is also based on said indication of the parameter of the acoustic channel.
  - 17. The apparatus as claimed in claim 16 wherein the indication of a parameter of the acoustic channel comprises an indication of a parameter of a microphone used to receive the speech sound uttered by a user.
  - 18. The apparatus as claimed in claim 1 wherein the assessment module is configured to receive an indication of a speech characteristic derived from speech sounds previously uttered by the user and wherein the contribution indicator is also based on the indication of the speech characteristic.
  - 19. The apparatus as claimed in claim 18 wherein the indication of the speech characteristic comprises an indication of the pitch of the user or an indication of the nasality of the user.
  - 20. The apparatus as claimed in claim 1 wherein the assessment module is configured to receive an indication of at least one enrolled user profile and wherein the contribution indicator is also based on said indication of the enrolled user profile.
  - 21. The apparatus as claimed in claim 1 wherein the assessment module is configured such that the contribution indicator for a frame of audio data is based on the determined at least one characteristic of the speech sound and on the number of previous frames of audio data where the determined at least one characteristic was similar.
  - 22. The electronic device comprising an apparatus as claimed in claim 1 wherein the electronic device is at least one of:
    - a portable device;
    - a communication device;
    - a mobile telephone;
    - a computing device;
    - a laptop, notebook or tablet computer;
    - a gaming device;
    - a wearable device;
    - a voice controllable device;
    - an identity verification device;
    - a wearable device;
    - or a domestic appliance.
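
The count-dependent contribution indicator of claim 1 and the noise transfer function of claim 13 can be sketched in Python as follows. Everything concrete here is an assumption for illustration: the class names, the numeric values in `BASE_CONTRIBUTION` and `NOISE_SENSITIVITY`, the geometric `decay` applied to repeated classes, the linear shape of `noise_transfer`, and the normalization of noise level to the range 0 to 1. The claims only require that the indicator vary with the number of previous instances of the same phoneme or class and, in claim 13, that a transfer function relate the indicator to noise level.

```python
from collections import Counter
from typing import Dict, Iterable, List

# Illustrative base usefulness of each phoneme class (assumed values).
BASE_CONTRIBUTION: Dict[str, float] = {
    "vowel": 1.0,
    "nasal": 0.9,
    "fricative": 0.5,
    "plosive": 0.2,
}

# Illustrative noise sensitivity of each phoneme class (assumed values).
NOISE_SENSITIVITY: Dict[str, float] = {
    "vowel": 0.5,
    "nasal": 0.5,
    "fricative": 1.0,
    "plosive": 1.0,
}


def noise_transfer(phoneme_class: str, noise_level: float) -> float:
    """Hypothetical transfer function from noise level to a scale factor.

    noise_level is assumed to be normalized to [0, 1]; the result is a
    factor in [0, 1] that shrinks faster for noise-sensitive classes.
    """
    if not 0.0 <= noise_level <= 1.0:
        raise ValueError("noise_level must lie in [0, 1]")
    sensitivity = NOISE_SENSITIVITY.get(phoneme_class, 1.0)
    return max(0.0, 1.0 - sensitivity * noise_level)


def contribution_indicators(frame_classes: Iterable[str],
                            noise_level: float = 0.0,
                            decay: float = 0.5) -> List[float]:
    """Return one contribution indicator per frame.

    Each repeat of a phoneme class multiplies its indicator by `decay`,
    so later instances of an already-seen class contribute less.
    Unknown classes get an indicator of 0.0 (frame not used).
    """
    seen: Counter = Counter()
    indicators: List[float] = []
    for cls in frame_classes:
        base = BASE_CONTRIBUTION.get(cls, 0.0)
        repeat_factor = decay ** seen[cls]  # depends on previous instances
        indicators.append(base * repeat_factor * noise_transfer(cls, noise_level))
        seen[cls] += 1
    return indicators
```

In this sketch the direction of the count dependence (later repeats count for less) is a design choice, not something stated in the claims; a scheme that treats repeats differently would only need a different `repeat_factor`.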

23. An apparatus for use in biometric speaker recognition, wherein the apparatus is configured to receive digital audio data derived from an audio signal output by a microphone, the apparatus comprising:
- an assessment module for determining for a sequence of frames of digital audio data which correspond to speech sounds uttered by a user a contribution indicator of the extent to which a frame of audio data should be used for speaker recognition processing based on at least one characteristic of the speech sound to which the frame relates;
  
  wherein the at least one characteristic of the speech sound comprises identification of the speech sound as a specific phoneme or as one of a plurality of predefined classes of phonemes, and wherein the contribution indicator varies based on the number of previous instances of the same phoneme or class of phoneme in previous frames of audio data.

24. A method of speaker recognition, comprising:
- analyzing each frame of a sequence of frames of digital audio data which correspond to speech sounds uttered by a user to determine at least one characteristic of the speech sound of that frame, wherein the digital audio data derived from an audio signal output by a microphone; and
  
  determining for the each frame of audio data a contribution indicator of the extent to which that frame of audio data should be used for speaker recognition processing based on the determined at least one characteristic of the speech sound;
  
  wherein the at least one characteristic of the speech sound comprises identification of the speech sound as a specific phoneme or as one of a plurality of predefined classes of phonemes, and wherein the contribution indicator varies based on the number of previous instances of the same phoneme or class of phoneme in previous frames of audio data.
- - 25. A non-transitory computer-readable storage medium having machine readable instructions stored thereon that when executed by a processor, cause the processor to perform the method of claim 24.

Current Assignee: Cirrus Logic Incorporated
Original Assignee: Cirrus Logic Incorporated
Inventors: Lesso, John Paul; Melanson, John Laurence
Primary Examiner(s): Washburn, Daniel C
Assistant Examiner(s): Ogunbiyi, Oluwadamilola M
Application Number: US15/666,280
Publication Number: US 20180040323A1
Time in Patent Office: 1,092 Days

CPC Class Codes

G10L 17/02   Preprocessing operations, e...

G10L 17/04   Training, enrolment or mode...

G10L 17/20   Pattern transformations or ...

G10L 17/22   Interactive procedures; Man...

G10L 25/15   the extracted parameters be...

G10L 25/84   for discriminating voice fr...

G10L 25/90   Pitch determination of spee...
