
Audio signal time offset estimation algorithm and measuring normalizing block algorithms for the perceptually-consistent comparison of speech signals

  • US 6,092,040 A
  • Filed: 11/21/1997
  • Issued: 07/18/2000
  • Est. Priority Date: 11/21/1997
  • Status: Expired due to Fees
First Claim

1. A method for measuring differences between two speech signals consistent with human auditory perception and judgement, said method comprising the steps of:

  • preparing, using a digital signal processor element programmed with a speech signal preparation algorithm, digital representations of two speech signals for further processing,
  • transforming the digital representations of the two speech signals using a digital signal processor element programmed with a frequency domain transformation algorithm to segment the digital representations of the two speech signals into respective groups of frames, and transforming the respective groups of frames into the frequency domain,
  • selecting frames using a digital signal processor element programmed with a frame selection algorithm to select frequency-domain frames for further processing,
  • measuring perceived loudness of selected frames using a digital signal processor element programmed with a perceived loudness approximation algorithm, and
  • comparing, using a digital signal processor element programmed with an auditory distance algorithm to compare measured loudness values for at least two selected frequency-domain frames each corresponding to a respective one of the two speech signals and generate a numerical result representing auditory distance;

    wherein the auditory distance value is directly proportional to human auditory perception of the difference between the two speech signals,

    wherein said step of preparing comprises the steps of;

      converting a first of the two speech signals from analog to digital form and storing the digital form as a first vector x, and
      converting a second of the two speech signals from analog to digital form and storing the digital form as a second vector y,

    wherein said transforming step comprises the steps of;

      generating a plurality of frames for each of the x and y vectors, respectively,
      transforming each frame to a frequency domain vector, and
      storing each frequency domain vector in respective matrices X and Y,

    wherein said step of selecting frames comprises the steps of;

      selecting only frames that meet or exceed predetermined energy thresholds, and

    wherein said step of selecting only frames that meet or exceed predetermined energy thresholds comprises the steps of;

      for matrix X, selecting only frames which meet or exceed an energy threshold xthreshold of substantially 15 dB below an energy level xenergy of a peak frame in matrix X;

      ##EQU59##

      for matrix Y, selecting only frames which meet or exceed an energy threshold ythreshold of substantially 35 dB below an energy level yenergy of a peak frame in matrix Y;

      ##EQU60##
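
As an illustration only, the transforming and frame-selection steps recited in the claim might be sketched in Python as follows. This is not the patented implementation. The equations referenced as ##EQU59## and ##EQU60## are not reproduced in this listing, so the energy measure (sum of squared spectral magnitudes) and the threshold formula (peak energy scaled by 10^(-dB/10)) are assumptions. The function names, frame length, and hop size are hypothetical, and a naive DFT stands in for whatever frequency domain transformation the patent actually uses.

```python
import cmath
import math


def split_into_frames(samples, frame_len, hop):
    """Cut a sample vector into fixed-length frames.

    frame_len and hop are illustrative parameters, not values from the claim.
    """
    return [samples[s:s + frame_len]
            for s in range(0, len(samples) - frame_len + 1, hop)]


def magnitude_spectrum(frame):
    """Naive DFT magnitudes for bins 0..N/2 (a stand-in transform)."""
    n = len(frame)
    return [abs(sum(v * cmath.exp(-2j * math.pi * k * t / n)
                    for t, v in enumerate(frame)))
            for k in range(n // 2 + 1)]


def frame_energy(spectrum):
    """Assumed energy measure: sum of squared spectral magnitudes."""
    return sum(m * m for m in spectrum)


def select_frames(matrix, db_below_peak):
    """Return indices of frames that meet or exceed the energy threshold.

    The threshold is assumed to be the peak frame energy lowered by
    db_below_peak decibels, i.e. peak * 10 ** (-db_below_peak / 10).
    """
    energies = [frame_energy(f) for f in matrix]
    threshold = max(energies) * 10.0 ** (-db_below_peak / 10.0)
    return [i for i, e in enumerate(energies) if e >= threshold]
```

Under these assumptions, a matrix X built with `[magnitude_spectrum(f) for f in split_into_frames(x, 64, 64)]` would be filtered with `select_frames(X, 15.0)`, and the corresponding matrix Y with `select_frames(Y, 35.0)`, matching the 15 dB and 35 dB figures in the claim. The sketch treats the two matrices independently; the claim excerpt does not say how the two selections are combined.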

  • 1 Assignment