Estimating pitch using peak-to-peak distances
First Claim
1. A computer-implemented method for automatic speaker recognition, the method comprising:
obtaining a first portion of a speech signal;
computing, using one or more processing devices, a first frequency representation of the first portion of the speech signal;
obtaining a first threshold;
identifying a first plurality of peaks in the first frequency representation using the first threshold by identifying values of the first frequency representation larger than the first threshold;
computing, using the one or more processing devices, a first plurality of peak-to-peak distances using locations in frequency of the first plurality of peaks;
obtaining a second threshold;
identifying a second plurality of peaks in the first frequency representation using the second threshold by identifying values of the first frequency representation larger than the second threshold;
computing, using the one or more processing devices, a second plurality of peak-to-peak distances using locations in frequency of the second plurality of peaks;
computing, using the one or more processing devices, a first pitch estimate of the first portion of the speech signal using the first plurality of peak-to-peak distances and the second plurality of peak-to-peak distances;
obtaining a second portion of the speech signal;
computing, using the one or more processing devices, a second frequency representation of the second portion of the speech signal;
identifying a third plurality of peaks in the second frequency representation;
computing, using the one or more processing devices, a third plurality of peak-to-peak distances using locations in frequency of the third plurality of peaks;
computing, using the one or more processing devices, a second pitch estimate of the second portion of the speech signal using the third plurality of peak-to-peak distances;
generating, using the one or more processing devices, a sequence of pitch estimates, the sequence of pitch estimates comprising the first pitch estimate and the second pitch estimate; and
applying the sequence of pitch estimates to recognize a speaker as a source of the speech signal.
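The steps of claim 1 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the frequency representation is taken to be a magnitude spectrum from a naive DFT (an FFT library would normally be used), thresholds are expressed as fractions of the spectral maximum, a peak is additionally required to be a local maximum, the distances from both thresholds are pooled and combined with a median, and every frame is processed with the same two thresholds. All function and parameter names (`magnitude_spectrum`, `peak_bins`, `rel_thresholds`, and so on) are hypothetical. The final speaker-recognition step is not shown.

```python
import cmath
import math
from statistics import median


def magnitude_spectrum(frame):
    """Naive DFT magnitudes for bins 0..N/2 (assumed frequency representation)."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame))
        spectrum.append(abs(acc))
    return spectrum


def peak_bins(spectrum, threshold):
    """Bins larger than the threshold that are also local maxima (assumption)."""
    return [
        k
        for k in range(1, len(spectrum) - 1)
        if spectrum[k] > threshold
        and spectrum[k] > spectrum[k - 1]
        and spectrum[k] >= spectrum[k + 1]
    ]


def peak_to_peak_distances(bins, bin_hz):
    """Distances in Hz between adjacent peaks, from their locations in frequency."""
    return [(b - a) * bin_hz for a, b in zip(bins, bins[1:])]


def estimate_pitch(frame, sample_rate, rel_thresholds=(0.5, 0.1)):
    """Pool the distances found at each threshold; the median is an assumed combiner."""
    spectrum = magnitude_spectrum(frame)
    bin_hz = sample_rate / len(frame)
    top = max(spectrum)
    distances = []
    for rel in rel_thresholds:
        bins = peak_bins(spectrum, rel * top)
        distances += peak_to_peak_distances(bins, bin_hz)
    return median(distances) if distances else None


def pitch_track(signal, sample_rate, frame_len=256):
    """Sequence of pitch estimates, one per non-overlapping portion of the signal."""
    starts = range(0, len(signal) - frame_len + 1, frame_len)
    return [estimate_pitch(signal[s:s + frame_len], sample_rate) for s in starts]
```

For example, with a 256-sample frame at 8 kHz each bin is 31.25 Hz wide. If the first few harmonics of a 250 Hz voice exceed the lower threshold, the peaks fall 8 bins apart, every pooled distance is 250 Hz, and `estimate_pitch` returns 250.0. A frame with fewer than two peaks at every threshold returns `None`.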
Abstract
An estimate of the pitch of a signal may be computed using peak-to-peak distances in a frequency representation of the signal. A frequency representation of the signal may be computed, and peaks in the frequency representation may be identified, for example, by identifying peaks larger than a threshold value. Peak-to-peak distances may be determined using the locations in frequency of the peaks. The pitch of the signal may be estimated by, for example, estimating a cumulative distribution function of the peak-to-peak distances or computing a histogram of the peak-to-peak distances.
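The two estimators named in the abstract could look like the following sketch. How a pitch value is derived from the histogram or from the cumulative distribution function is not stated in this section, so both rules here are assumptions: the histogram variant returns the center of the most populated bin, and the CDF variant returns the smallest distance at which the empirical CDF reaches a chosen level (0.5, that is, the median). Function names and default values are hypothetical.

```python
from collections import Counter


def pitch_from_histogram(distances, bin_width=5.0):
    """Center (Hz) of the most populated histogram bin; ties go to the lower bin."""
    if not distances:
        return None
    counts = Counter(int(d // bin_width) for d in distances)
    best_bin, _ = max(counts.items(), key=lambda kv: (kv[1], -kv[0]))
    return (best_bin + 0.5) * bin_width


def pitch_from_cdf(distances, level=0.5):
    """Smallest distance whose empirical CDF value is at least `level`."""
    if not distances:
        return None
    ordered = sorted(distances)
    n = len(ordered)
    for i, d in enumerate(ordered):
        # (i + 1) / n is the fraction of distances at or below ordered[i].
        if (i + 1) / n >= level:
            return d
    return ordered[-1]


# Hypothetical distances (Hz): most cluster near 100, two span a missed peak.
example = [100.0, 102.0, 98.0, 101.0, 200.0, 99.0, 203.0]
print(pitch_from_histogram(example))  # 102.5
print(pitch_from_cdf(example))        # 101.0
```

Both rules ignore the two distances near 200 Hz, which is the kind of outlier produced when one harmonic peak falls below the threshold and the distance spans two harmonic intervals.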
18 Claims
1. A computer-implemented method for automatic speaker recognition, the method comprising the steps recited in full under First Claim above.
Dependent claims: 2, 3, 4, 5, 6, 7
8. A system for automatic speech recognition, the system comprising one or more computing devices comprising at least one processor and at least one memory, the one or more computing devices configured to:
obtain a first portion of a speech signal;
compute a first frequency representation of the first portion of the speech signal;
obtain a first threshold;
identify a first plurality of peaks in the first frequency representation using the first threshold by identifying values of the first frequency representation larger than the first threshold;
compute a first plurality of peak-to-peak distances using locations in frequency of the first plurality of peaks;
obtain a second threshold;
identify a second plurality of peaks in the first frequency representation using the second threshold by identifying values of the first frequency representation larger than the second threshold;
compute a second plurality of peak-to-peak distances using locations in frequency of the second plurality of peaks;
compute a first pitch estimate of the first portion of the speech signal using the first plurality of peak-to-peak distances and the second plurality of peak-to-peak distances;
obtain a second portion of the speech signal;
compute a second frequency representation of the second portion of the speech signal;
identify a third plurality of peaks in the second frequency representation;
compute a third plurality of peak-to-peak distances using locations in frequency of the third plurality of peaks;
compute a second pitch estimate of the second portion of the speech signal using the third plurality of peak-to-peak distances;
generate a sequence of pitch estimates, the sequence of pitch estimates comprising the first pitch estimate and the second pitch estimate; and
apply the sequence of pitch estimates to perform automatic speech recognition on the speech signal.
Dependent claims: 9, 10, 11, 12, 13, 14
15. One or more non-transitory computer-readable media comprising computer executable instructions that, when executed, cause at least one processor to perform actions comprising:
obtaining a first portion of a speech signal;
computing a first frequency representation of the first portion of the speech signal;
obtaining a first threshold;
identifying a first plurality of peaks in the first frequency representation using the first threshold by identifying values of the first frequency representation larger than the first threshold;
computing a first plurality of peak-to-peak distances using locations in frequency of the first plurality of peaks;
obtaining a second threshold;
identifying a second plurality of peaks in the first frequency representation using the second threshold by identifying values of the first frequency representation larger than the second threshold;
computing a second plurality of peak-to-peak distances using locations in frequency of the second plurality of peaks;
computing a first pitch estimate of the first portion of the speech signal using the first plurality of peak-to-peak distances and the second plurality of peak-to-peak distances;
obtaining a second portion of the speech signal;
computing a second frequency representation of the second portion of the speech signal;
identifying a third plurality of peaks in the second frequency representation;
computing a third plurality of peak-to-peak distances using locations in frequency of the third plurality of peaks;
computing a second pitch estimate of the second portion of the speech signal using the third plurality of peak-to-peak distances;
generating a sequence of pitch estimates, the sequence of pitch estimates comprising the first pitch estimate and the second pitch estimate; and
applying the sequence of pitch estimates to recognize a speaker as a source of the speech signal.
Dependent claims: 16, 17, 18
Specification