Estimating fractional chirp rate with multiple frequency representations

US 9,922,668 B2
Filed: 12/15/2015
Issued: 03/20/2018
Est. Priority Date: 02/06/2015
Status: Expired due to Fees

Abstract

An estimate of a fractional chirp rate of a signal may be computed by using multiple frequency representations of the signal. A first frequency representation may be computed using a first fractional chirp rate and a first score may be computed using the first frequency representation that indicates a match between the first fractional chirp rate and a fractional chirp rate of the signal. A second frequency representation may be computed using a second fractional chirp rate and a second score may be computed using the second frequency representation that indicates a match between the second fractional chirp rate and the fractional chirp rate of the signal. The fractional chirp rate of the signal may be estimated using the first score and the second score, for example, by selecting a fractional chirp rate corresponding to a highest score.
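
The candidate-scoring idea in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the frequency representation is taken to be the power of inner products with chirplets whose chirp rate is the fractional chirp rate times the frequency, the score is the largest unnormalized auto-correlation value of that representation over a plausible range of pitch lags, and the pitch is read off as the lag of that peak. All function names, the Hann window, and the 80 to 400 Hz pitch range are hypothetical choices made for the example.

```python
import numpy as np


def chirplet_power_spectrum(frame, fs, freqs, chi):
    """Power of the inner products of a windowed frame with chirplets.

    Each chirplet has frequency f (Hz) at the frame center and chirp rate
    chi * f, so chi acts as a fractional chirp rate (units of 1/s).
    """
    n = len(frame)
    t = (np.arange(n) - n / 2) / fs            # time axis centered on the frame
    warped = t + 0.5 * chi * t**2              # linear-chirp phase per Hz
    kernels = np.exp(-2j * np.pi * np.outer(freqs, warped))
    return np.abs(kernels @ (np.hanning(n) * frame)) ** 2


def score_and_pitch(power, freq_step, min_pitch=80.0, max_pitch=400.0):
    """Score a representation by its auto-correlation peak over pitch lags."""
    acf = np.correlate(power, power, mode="full")[len(power) - 1:]  # lags >= 0
    lo = int(round(min_pitch / freq_step))
    hi = int(round(max_pitch / freq_step))
    best_lag = lo + int(np.argmax(acf[lo:hi + 1]))
    return acf[best_lag], best_lag * freq_step


def estimate_chirp_rate_and_pitch(frame, fs, candidates, freqs):
    """Return (fractional chirp rate, pitch) for the highest-scoring candidate."""
    freq_step = freqs[1] - freqs[0]
    results = []
    for chi in candidates:
        power = chirplet_power_spectrum(frame, fs, freqs, chi)
        score, pitch = score_and_pitch(power, freq_step)
        results.append((score, chi, pitch))
    _, best_chi, best_pitch = max(results, key=lambda r: r[0])
    return best_chi, best_pitch


def synthetic_harmonic_frame(fs, n, f0, chi, harmonics=8):
    """Hypothetical test signal: harmonics of f0 sharing one fractional chirp rate."""
    t = (np.arange(n) - n / 2) / fs
    warped = t + 0.5 * chi * t**2
    return sum(np.cos(2 * np.pi * k * f0 * warped) for k in range(1, harmonics + 1))
```

When the candidate matches the fractional chirp rate of the signal, every harmonic collapses to a narrow peak, so the power representation is strongly periodic in frequency and its auto-correlation peak is large. A mismatched candidate smears the higher harmonics and lowers the score. For example, `estimate_chirp_rate_and_pitch(synthetic_harmonic_frame(8000, 400, 200.0, 4.0), 8000, [-4.0, 0.0, 4.0], np.arange(50.0, 2000.0, 10.0))` compares three candidates on a 50 ms frame.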

14 Claims

1. A computer-implemented method for automatic speaker recognition, the method comprising:
- obtaining a first portion of a speech signal;
  
  computing a first frequency representation from the first portion of the speech signal using a first fractional chirp rate;
  
  computing a first score using an auto-correlation of the first frequency representation;
  
  computing a second frequency representation from the first portion of the speech signal using a second fractional chirp rate;
  
  computing a second score using an auto-correlation of the second frequency representation;
  
  comparing the first score and the second score;
  
  determining a first estimated fractional chirp rate of the first portion of the speech signal corresponding to a highest score of the first score and the second score;
  
  determining a first estimated pitch of the first portion of the speech signal using the first estimated fractional chirp rate;
  
  obtaining a second portion of the speech signal, the second portion of the speech signal being at least partially non-overlapping with the first portion of the speech signal;
  
  computing a third frequency representation from the second portion of the speech signal using a third fractional chirp rate;
  
  computing a third score using an auto-correlation of the third frequency representation;
  
  computing a fourth frequency representation from the second portion of the speech signal using a fourth fractional chirp rate;
  
  computing a fourth score using an auto-correlation of the fourth frequency representation;
  
  comparing the third score and the fourth score;
  
  determining a second estimated fractional chirp rate of the second portion of the speech signal corresponding to a highest score of the third score and the fourth score;
  
  determining a second estimated pitch of the second portion of the speech signal using the second estimated fractional chirp rate;
  
  computing a sequence of pitch estimates, the sequence of pitch estimates comprising the first estimated pitch and the second estimated pitch; and
  
  applying the sequence of pitch estimates to recognize a speaker as a source of the speech signal.
- Dependent claims (2, 3, 4, 5, 6, 7):
  - 2. The method of claim 1, wherein the first frequency representation is computed using a frequency chirp distribution, a pitch-velocity transform, or an inner product of the portion of the signal with a chirplet.
  - 3. The method of claim 1, wherein the method further comprises computing a log-likelihood ratio for a plurality of frequencies of the first frequency representation, and wherein the log-likelihood ratio is a ratio of a log-likelihood that a harmonic is present at a frequency and a log-likelihood that a harmonic is not present at the frequency.
  - 4. The method of claim 1, wherein the first score is computed using the Fisher information of the auto-correlation of the first frequency representation.
  - 5. The method of claim 1, wherein computing the first estimated fractional chirp rate comprises selecting a fractional chirp rate corresponding to a highest score.
  - 6. The method of claim 1, wherein the third fractional chirp rate is substantially equal to the first fractional chirp rate.
  - 7. The method of claim 1, wherein the fourth fractional chirp rate is substantially equal to the second fractional chirp rate.
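
The step of claim 1 that collects per-portion estimates into a sequence can be sketched as below. This is only an illustration under assumptions: the frame length, the hop size, and the `estimate_pitch` callable are hypothetical, and any per-frame estimator (for instance one that scores candidate fractional chirp rates) could be plugged in. A positive hop means each portion is at least partially non-overlapping with the previous one.

```python
from typing import Callable, Iterator, List, Tuple

import numpy as np


def split_into_portions(
    signal: np.ndarray, frame_len: int, hop: int
) -> Iterator[Tuple[int, np.ndarray]]:
    """Yield (start index, portion) pairs of a signal.

    With hop > 0, successive portions start at different samples, so each one
    is at least partially non-overlapping with the one before it.
    """
    if frame_len <= 0 or hop <= 0:
        raise ValueError("frame_len and hop must be positive")
    for start in range(0, len(signal) - frame_len + 1, hop):
        yield start, signal[start:start + frame_len]


def pitch_sequence(
    signal: np.ndarray,
    frame_len: int,
    hop: int,
    estimate_pitch: Callable[[np.ndarray], float],
) -> List[float]:
    """Apply a per-portion pitch estimator and collect the estimates in order."""
    return [
        estimate_pitch(portion)
        for _, portion in split_into_portions(signal, frame_len, hop)
    ]
```

The resulting list is the kind of pitch sequence that a downstream recognizer would consume. How that sequence is applied to speaker recognition is not specified in the claim text and is left out of the sketch.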

8. A system for automatic speech recognition, the system comprising one or more computing devices comprising at least one processor and at least one memory, the one or more computing devices configured to:
- obtain a first portion of a speech signal;
  
  compute a first frequency representation from the first portion of the speech signal using a first fractional chirp rate;
  
  compute a first score using an auto-correlation of the first frequency representation;
  
  compute a second frequency representation from the first portion of the speech signal using a second fractional chirp rate;
  
  compute a second score using an autocorrelation of the second frequency representation;
  
  compare the first score and the second score;
  
  determine a first estimated fractional chirp rate of the first portion of the speech signal corresponding to a highest score of the first score and the second score;
  
  determine a first estimated pitch of the first portion of the speech signal using the first estimated fractional chirp rate;
  
  obtain a second portion of the speech signal, the second portion of the speech signal being at least partially non-overlapping with the first portion of the speech signal;
  
  compute a third frequency representation from the second portion of the speech signal using a third fractional chirp rate;
  
  compute a third score using an auto-correlation of the third frequency representation;
  
  compute a fourth frequency representation from the second portion of the speech signal using a fourth fractional chirp rate;
  
  compute a fourth score using an auto-correlation of the fourth frequency representation;
  
  compare the third score and the fourth score;
  
  determine a second estimated fractional chirp rate of the second portion of the speech signal corresponding to a highest score of the third score and the fourth score;
  
  determine a second estimated pitch of the second portion of the speech signal using the second estimated fractional chirp rate;
  
  compute a sequence of pitch estimates, the sequence of pitch estimates comprising the first estimated pitch and the second estimated pitch;
  
  apply the sequence of pitch estimates to perform automatic speech recognition on the speech signal.
- Dependent claims (9, 10, 11):
  - 9. The system of claim 8, wherein the one or more computing devices are further configured to compute a log-likelihood ratio for a plurality of frequencies of the first frequency representation, and wherein the log-likelihood ratio is a ratio of a log-likelihood that a harmonic is present at a frequency and a log-likelihood that a harmonic is not present at the frequency.
  - 10. The system of claim 8, wherein the first score is computed using the Fisher information of the auto-correlation of the first frequency representation.
  - 11. The system of claim 8, wherein the first score indicates a match between the first fractional chirp rate and a fractional chirp rate of the first portion of the speech signal.

12. One or more non-transitory computer-readable media comprising computer executable instructions that, when executed, cause at least one processor to perform actions comprising:
- obtaining a first portion of a speech signal;
  
  computing a first frequency representation from the first portion of the speech signal using a first fractional chirp rate;
  
  computing a first score using an auto-correlation of the first frequency representation;
  
  computing a second frequency representation from the first portion of the speech signal using a second fractional chirp rate;
  
  computing a second score using an auto-correlation of the second frequency representation;
  
  comparing the first score and the second score;
  
  determining a first estimated fractional chirp rate of the first portion of the speech signal corresponding to a highest score of the first score and the second score;
  
  determining a first estimated pitch of the first portion of the speech signal using the first estimated fractional chirp rate;
  
  obtaining a second portion of the speech signal, the second portion of the speech signal being at least partially non-overlapping with the first portion of the speech signal;
  
  computing a third frequency representation from the second portion of the speech signal using a third fractional chirp rate;
  
  computing a third score using an auto-correlation of the third frequency representation;
  
  computing a fourth frequency representation from the second portion of the speech signal using a fourth fractional chirp rate;
  
  computing a fourth score using an auto-correlation of the fourth frequency representation;
  
  comparing the third score and the fourth score;
  
  determining a second estimated fractional chirp rate of the second portion of the speech signal corresponding to a highest score of the third score and the fourth score;
  
  determining a second estimated pitch of the second portion of the speech signal using the second estimated fractional chirp rate;
  
  computing a sequence of pitch estimates, the sequence of pitch estimates comprising the first estimated pitch and the second estimated pitch; and
  
  applying the sequence of pitch estimates to recognize a speaker to perform signal reconstruction on the speech signal.
- Dependent claims (13, 14):
  - 13. The one or more non-transitory computer-readable media of claim 12, wherein:
    - the first frequency representation is created by modifying a fifth frequency representation using the first fractional chirp rate; and
      
      the second frequency representation is created by modifying the fifth frequency representation using the second fractional chirp rate.
  - 14. The one or more non-transitory computer-readable media of claim 13, wherein the fifth frequency representation corresponds to a Fourier transform of the first portion of the signal.

Current Assignee: Friday Harbor LLC
Original Assignee: Knuedge, Inc.
Inventors: Bradley, David C.; Morin, Yao Huang; Intoy, Janis; O'Connor, Sean; Hilton, Nick; Mascaro, Massimo
Primary Examiner(s): AZAD, ABUL K
Application Number: US14/969,036
Publication Number: US 20160232924A1
Time in Patent Office: 826 Days
Field of Search: None

CPC Class Codes
- G10L 15/02   Feature extraction for spee...
- G10L 17/02   Preprocessing operations, e...
- G10L 21/0208   Noise filtering
- G10L 25/03   characterised by the type o...
- G10L 25/06   the extracted parameters be...
- G10L 25/18   the extracted parameters be...
- G10L 25/27   characterised by the analys...
- G10L 25/51   for comparison or discrimin...
- G10L 25/90   Pitch determination of spee...
