Estimating fractional chirp rate with multiple frequency representations
First Claim
1. A computer-implemented method for automatic speaker recognition, the method comprising:
obtaining a first portion of a speech signal;
computing a first frequency representation from the first portion of the speech signal using a first fractional chirp rate;
computing a first score using an auto-correlation of the first frequency representation;
computing a second frequency representation from the first portion of the speech signal using a second fractional chirp rate;
computing a second score using an auto-correlation of the second frequency representation;
comparing the first score and the second score;
determining a first estimated fractional chirp rate of the first portion of the speech signal corresponding to a highest score of the first score and the second score;
determining a first estimated pitch of the first portion of the speech signal using the first estimated fractional chirp rate;
obtaining a second portion of the speech signal, the second portion of the speech signal being at least partially non-overlapping with the first portion of the speech signal;
computing a third frequency representation from the second portion of the speech signal using a third fractional chirp rate;
computing a third score using an auto-correlation of the third frequency representation;
computing a fourth frequency representation from the second portion of the speech signal using a fourth fractional chirp rate;
computing a fourth score using an auto-correlation of the fourth frequency representation;
comparing the third score and the fourth score;
determining a second estimated fractional chirp rate of the second portion of the speech signal corresponding to a highest score of the third score and the fourth score;
determining a second estimated pitch of the second portion of the speech signal using the second estimated fractional chirp rate;
computing a sequence of pitch estimates, the sequence of pitch estimates comprising the first estimated pitch and the second estimated pitch; and
applying the sequence of pitch estimates to recognize a speaker as a source of the speech signal.
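The steps recited above can be sketched in Python. This is only an illustration under stated assumptions, not the patented implementation: the claim does not say how a frequency representation is computed from a fractional chirp rate, how the auto-correlation is turned into a score, or how pitch is read off. The sketch assumes a chirp-matched transform in which analysis frequency f is compared against a chirp with instantaneous frequency f * (1 + chi * t), a score equal to the largest normalized auto-correlation value over a hypothetical 80 to 400 Hz lag range, and a pitch equal to the lag of that peak. All function names, the Hann window, and the parameter values are hypothetical. The final recognition step is not modeled.

```python
import numpy as np

def centered_time(n, fs):
    """Sample times in seconds, centered on the middle of an n-sample frame."""
    return (np.arange(n) - (n - 1) / 2) / fs

def chirp_spectrum(frame, fs, chi, freqs):
    """Magnitude frequency representation for one candidate fractional chirp rate.

    Assumption (not from the claim): frequency f is matched against a chirp
    whose instantaneous frequency is f * (1 + chi * t).
    """
    t = centered_time(len(frame), fs)
    warped = t + 0.5 * chi * t**2  # chirp phase is 2*pi*f*warped
    kernel = np.exp(-2j * np.pi * np.outer(freqs, warped))
    return np.abs(kernel @ (np.hanning(len(frame)) * frame))

def autocorr_score(spec, df, pitch_range=(80.0, 400.0)):
    """Score a representation by its auto-correlation along the frequency axis.

    Returns (score, pitch_hz): the largest normalized auto-correlation value
    within the assumed pitch range, and the lag (in Hz) where it occurs.
    """
    s = spec - spec.mean()
    ac = np.correlate(s, s, mode="full")[len(s) - 1:]  # lags 0, 1, 2, ...
    ac = ac / ac[0]
    lo = int(np.ceil(pitch_range[0] / df))
    hi = int(np.floor(pitch_range[1] / df))
    lag = lo + int(np.argmax(ac[lo:hi + 1]))
    return float(ac[lag]), lag * df

def estimate_frame(frame, fs, candidates, freqs):
    """Compare the scores of all candidates; return (chi, score, pitch_hz) of the best."""
    df = freqs[1] - freqs[0]
    best = None
    for chi in candidates:
        score, pitch = autocorr_score(chirp_spectrum(frame, fs, chi, freqs), df)
        if best is None or score > best[1]:
            best = (chi, score, pitch)
    return best

def pitch_sequence(signal, fs, frame_len, hop, candidates, freqs):
    """Pitch estimate for each portion of the signal, taken every `hop` samples."""
    starts = range(0, len(signal) - frame_len + 1, hop)
    return [estimate_frame(signal[s:s + frame_len], fs, candidates, freqs)[2]
            for s in starts]

def synth_frame(f0, chi, n, fs, fmax=3000.0):
    """Synthetic test frame: equal-amplitude harmonics of f0 that chirp together."""
    t = centered_time(n, fs)
    warped = t + 0.5 * chi * t**2
    ks = np.arange(1, int(fmax // f0) + 1)
    return np.cos(2 * np.pi * np.outer(ks * f0, warped)).sum(axis=0)

if __name__ == "__main__":
    fs, n = 8000, 512
    freqs = np.arange(0.0, 3505.0, 5.0)  # 0 to 3500 Hz in 5 Hz steps
    signal = np.concatenate([synth_frame(200.0, 4.0, n, fs),
                             synth_frame(150.0, -4.0, n, fs)])
    print(pitch_sequence(signal, fs, n, n, [-4.0, 0.0, 4.0], freqs))
```

The example uses a hop equal to the frame length, so the two portions do not overlap at all; a smaller hop would give portions that are only partially non-overlapping, which the claim also covers. Only two or three candidate rates are tried here to mirror the claim; a practical search would likely use a denser grid.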
Abstract
An estimate of a fractional chirp rate of a signal may be computed by using multiple frequency representations of the signal. A first frequency representation may be computed using a first fractional chirp rate and a first score may be computed using the first frequency representation that indicates a match between the first fractional chirp rate and a fractional chirp rate of the signal. A second frequency representation may be computed using a second fractional chirp rate and a second score may be computed using the second frequency representation that indicates a match between the second fractional chirp rate and the fractional chirp rate of the signal. The fractional chirp rate of the signal may be estimated using the first score and the second score, for example, by selecting a fractional chirp rate corresponding to a highest score.
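The selection rule in the abstract can be written compactly. The notation below is an assumption introduced for illustration and does not appear in the source: chi denotes a fractional chirp rate, taken here in its usual sense of chirp rate divided by frequency, f_0 is the fundamental, f_k the k-th harmonic, and S(chi_i) the score computed from the frequency representation for candidate chi_i.

```latex
\[
\chi = \frac{1}{f}\,\frac{df}{dt},
\qquad
f_k(t) = k\,f_0(t)
\;\Longrightarrow\;
\frac{1}{f_k}\frac{df_k}{dt} = \frac{1}{f_0}\frac{df_0}{dt} = \chi
\]
\[
\hat{\chi} = \arg\max_{\chi_i \in \{\chi_1,\,\chi_2\}} S(\chi_i)
\]
```

Under this definition every harmonic of a voiced sound shares the same fractional chirp rate, which is presumably why a single candidate value can be applied across the whole frequency representation.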
14 Claims
1. A computer-implemented method for automatic speaker recognition, the method comprising:
obtaining a first portion of a speech signal;
computing a first frequency representation from the first portion of the speech signal using a first fractional chirp rate;
computing a first score using an auto-correlation of the first frequency representation;
computing a second frequency representation from the first portion of the speech signal using a second fractional chirp rate;
computing a second score using an auto-correlation of the second frequency representation;
comparing the first score and the second score;
determining a first estimated fractional chirp rate of the first portion of the speech signal corresponding to a highest score of the first score and the second score;
determining a first estimated pitch of the first portion of the speech signal using the first estimated fractional chirp rate;
obtaining a second portion of the speech signal, the second portion of the speech signal being at least partially non-overlapping with the first portion of the speech signal;
computing a third frequency representation from the second portion of the speech signal using a third fractional chirp rate;
computing a third score using an auto-correlation of the third frequency representation;
computing a fourth frequency representation from the second portion of the speech signal using a fourth fractional chirp rate;
computing a fourth score using an auto-correlation of the fourth frequency representation;
comparing the third score and the fourth score;
determining a second estimated fractional chirp rate of the second portion of the speech signal corresponding to a highest score of the third score and the fourth score;
determining a second estimated pitch of the second portion of the speech signal using the second estimated fractional chirp rate;
computing a sequence of pitch estimates, the sequence of pitch estimates comprising the first estimated pitch and the second estimated pitch; and
applying the sequence of pitch estimates to recognize a speaker as a source of the speech signal.
Dependent claims: 2, 3, 4, 5, 6, 7
8. A system for automatic speech recognition, the system comprising one or more computing devices comprising at least one processor and at least one memory, the one or more computing devices configured to:
obtain a first portion of a speech signal;
compute a first frequency representation from the first portion of the speech signal using a first fractional chirp rate;
compute a first score using an auto-correlation of the first frequency representation;
compute a second frequency representation from the first portion of the speech signal using a second fractional chirp rate;
compute a second score using an auto-correlation of the second frequency representation;
compare the first score and the second score;
determine a first estimated fractional chirp rate of the first portion of the speech signal corresponding to a highest score of the first score and the second score;
determine a first estimated pitch of the first portion of the speech signal using the first estimated fractional chirp rate;
obtain a second portion of the speech signal, the second portion of the speech signal being at least partially non-overlapping with the first portion of the speech signal;
compute a third frequency representation from the second portion of the speech signal using a third fractional chirp rate;
compute a third score using an auto-correlation of the third frequency representation;
compute a fourth frequency representation from the second portion of the speech signal using a fourth fractional chirp rate;
compute a fourth score using an auto-correlation of the fourth frequency representation;
compare the third score and the fourth score;
determine a second estimated fractional chirp rate of the second portion of the speech signal corresponding to a highest score of the third score and the fourth score;
determine a second estimated pitch of the second portion of the speech signal using the second estimated fractional chirp rate;
compute a sequence of pitch estimates, the sequence of pitch estimates comprising the first estimated pitch and the second estimated pitch; and
apply the sequence of pitch estimates to perform automatic speech recognition on the speech signal.
Dependent claims: 9, 10, 11
12. One or more non-transitory computer-readable media comprising computer executable instructions that, when executed, cause at least one processor to perform actions comprising:
obtaining a first portion of a speech signal;
computing a first frequency representation from the first portion of the speech signal using a first fractional chirp rate;
computing a first score using an auto-correlation of the first frequency representation;
computing a second frequency representation from the first portion of the speech signal using a second fractional chirp rate;
computing a second score using an auto-correlation of the second frequency representation;
comparing the first score and the second score;
determining a first estimated fractional chirp rate of the first portion of the speech signal corresponding to a highest score of the first score and the second score;
determining a first estimated pitch of the first portion of the speech signal using the first estimated fractional chirp rate;
obtaining a second portion of the speech signal, the second portion of the speech signal being at least partially non-overlapping with the first portion of the speech signal;
computing a third frequency representation from the second portion of the speech signal using a third fractional chirp rate;
computing a third score using an auto-correlation of the third frequency representation;
computing a fourth frequency representation from the second portion of the speech signal using a fourth fractional chirp rate;
computing a fourth score using an auto-correlation of the fourth frequency representation;
comparing the third score and the fourth score;
determining a second estimated fractional chirp rate of the second portion of the speech signal corresponding to a highest score of the third score and the fourth score;
determining a second estimated pitch of the second portion of the speech signal using the second estimated fractional chirp rate;
computing a sequence of pitch estimates, the sequence of pitch estimates comprising the first estimated pitch and the second estimated pitch; and
applying the sequence of pitch estimates to recognize a speaker to perform signal reconstruction on the speech signal.
Dependent claims: 13, 14
Specification