Estimating pitch using peak-to-peak distances

  • US 9,842,611 B2
  • Filed: 12/15/2015
  • Issued: 12/12/2017
  • Est. Priority Date: 02/06/2015
  • Status: Active Grant
First Claim

1. A computer-implemented method for automatic speaker recognition, the method comprising:

    obtaining a first portion of a speech signal;

    computing, using one or more processing devices, a first frequency representation of the first portion of the speech signal;

    obtaining a first threshold;

    identifying a first plurality of peaks in the first frequency representation using the first threshold by identifying values of the first frequency representation larger than the first threshold;

    computing, using the one or more processing devices, a first plurality of peak-to-peak distances using locations in frequency of the first plurality of peaks;

    obtaining a second threshold;

    identifying a second plurality of peaks in the first frequency representation using the second threshold by identifying values of the first frequency representation larger than the second threshold;

    computing, using the one or more processing devices, a second plurality of peak-to-peak distances using locations in frequency of the second plurality of peaks;

    computing, using the one or more processing devices, a first pitch estimate of the first portion of the speech signal using the first plurality of peak-to-peak distances and the second plurality of peak-to-peak distances;

    obtaining a second portion of the speech signal;

    computing, using the one or more processing devices, a second frequency representation of the second portion of the speech signal;

    identifying a third plurality of peaks in the second frequency representation;

    computing, using the one or more processing devices, a third plurality of peak-to-peak distances using locations in frequency of the third plurality of peaks;

    computing, using the one or more processing devices, a second pitch estimate of the second portion of the speech signal using the third plurality of peak-to-peak distances;

    generating, using the one or more processing devices, a sequence of pitch estimates, the sequence of pitch estimates comprising the first pitch estimate and the second pitch estimate; and

    applying the sequence of pitch estimates to recognize a speaker as a source of the speech signal.
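
The pitch-estimation steps recited above can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions, not the patented implementation: the claim does not specify the frequency representation, how peaks are isolated among values above a threshold, or how the peak-to-peak distances are combined. Here the frequency representation is assumed to be a normalized Hann-windowed FFT magnitude, peaks are assumed to be local maxima above the threshold, and the distances from all thresholds are assumed to be pooled and reduced with a median. The function names (`find_peaks_above`, `estimate_pitch`, `harmonic_frame`) and the threshold values are hypothetical. The final speaker-recognition step is not shown.

```python
import numpy as np

def find_peaks_above(spectrum, threshold):
    """Return bin indices of local maxima whose value exceeds `threshold`.
    (Assumption: a 'peak' is a local maximum above the threshold.)"""
    s = np.asarray(spectrum, dtype=float)
    is_peak = (s[1:-1] > s[:-2]) & (s[1:-1] >= s[2:]) & (s[1:-1] > threshold)
    return np.flatnonzero(is_peak) + 1

def estimate_pitch(frame, sample_rate, thresholds):
    """Estimate pitch (Hz) of one frame from peak-to-peak distances.
    One set of peaks and distances is computed per threshold; the sets are
    pooled and the median is taken (assumption: the claim leaves this open)."""
    frame = np.asarray(frame, dtype=float)
    spectrum = np.abs(np.fft.rfft(frame * np.hanning(len(frame))))
    top = spectrum.max()
    if top > 0:
        spectrum = spectrum / top          # normalize so thresholds are relative
    bin_hz = sample_rate / len(frame)      # frequency spacing of FFT bins
    distances = []
    for threshold in thresholds:
        peaks = find_peaks_above(spectrum, threshold)
        distances.append(np.diff(peaks) * bin_hz)  # distances in Hz
    pooled = np.concatenate(distances)
    return float(np.median(pooled)) if pooled.size else float("nan")

def harmonic_frame(f0, sample_rate, n_samples, amplitudes):
    """Synthetic test signal: harmonics of f0 with the given amplitudes."""
    t = np.arange(n_samples) / sample_rate
    return sum(a * np.sin(2 * np.pi * (k + 1) * f0 * t)
               for k, a in enumerate(amplitudes))

sample_rate, n_samples = 8000, 800
amplitudes = [1.0, 0.8, 0.6, 0.4, 0.2]
first_portion = harmonic_frame(200.0, sample_rate, n_samples, amplitudes)
second_portion = harmonic_frame(250.0, sample_rate, n_samples, amplitudes)

# First portion: two thresholds, two sets of peak-to-peak distances.
first_pitch = estimate_pitch(first_portion, sample_rate, thresholds=[0.1, 0.5])
# Second portion: a single set of peaks suffices per the claim.
second_pitch = estimate_pitch(second_portion, sample_rate, thresholds=[0.1])

pitch_sequence = [first_pitch, second_pitch]
```

Using a lower and a higher threshold yields overlapping evidence: the lower threshold captures weaker harmonics, while the higher threshold keeps only the strongest ones, so spurious peaks admitted by one threshold have less influence on the pooled estimate.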
