Audio signal time offset estimation algorithm and measuring normalizing block algorithms for the perceptually-consistent comparison of speech signals
1 Assignment
0 Petitions
Accused Products
Abstract
An audio signal time offset estimation algorithm estimates the time offset between two audio signals. The algorithm can measure this delay even when the audio equipment causes severe distortion and the signal coming out of the equipment sounds very different from the signal going in. Measuring normalizing block algorithms provide perceptually consistent comparison of speech signals: they compare the sounds of two speech signals in a way that agrees with human auditory perception. This means, for example, that when these algorithms indicate that two speech signals sound identical, it is very likely that persons listening to those speech signals would describe them as identical, and when these algorithms indicate that two speech signals sound similar, it is very likely that those persons would describe them as similar.
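The claims below do not recite the offset-estimation details, so as a rough illustration only, the standard cross-correlation approach to estimating a time offset between two signals is sketched here. This is an illustrative stand-in, not the patented algorithm; the helper name `estimate_offset` and the use of NumPy are assumptions.

```python
import numpy as np

def estimate_offset(x, y):
    """Estimate the sample offset of y relative to x by locating the
    peak of their full cross-correlation."""
    corr = np.correlate(y, x, mode="full")
    # Shift the peak index so that 0 means the signals are aligned.
    return int(np.argmax(corr)) - (len(x) - 1)

# A zero-padded ramp, and a copy of it delayed by 25 samples:
x = np.concatenate([np.zeros(100), np.arange(50.0), np.zeros(100)])
y = np.roll(x, 25)
print(estimate_offset(x, y))  # → 25
```

Plain cross-correlation degrades under the severe distortion the abstract describes, which is presumably why a more robust estimator is claimed.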
66 Claims
1. A method for measuring differences between two speech signals consistent with human auditory perception and judgment, said method comprising the steps of:

preparing, using a digital signal processor element programmed with a speech signal preparation algorithm, digital representations of two speech signals for further processing,

transforming the digital representations of the two speech signals using a digital signal processor element programmed with a frequency domain transformation algorithm to segment the digital representations of the two speech signals into respective groups of frames, and transforming the respective groups of frames into the frequency domain,

selecting frames using a digital signal processor element programmed with a frame selection algorithm to select frequency-domain frames for further processing,

measuring perceived loudness of selected frames using a digital signal processor element programmed with a perceived loudness approximation algorithm, and

comparing, using a digital signal processor element programmed with an auditory distance algorithm to compare measured loudness values for at least two selected frequency-domain frames each corresponding to a respective one of the two speech signals and generate a numerical result representing auditory distance;

wherein the auditory distance value is directly proportional to human auditory perception of the difference between the two speech signals,

wherein said step of preparing comprises the steps of:

converting a first of the two speech signals from analog to digital form and storing the digital form as a first vector x, and

converting a second of the two speech signals from analog to digital form and storing the digital form as a second vector y,

wherein said transforming step comprises the steps of:

generating a plurality of frames for each of the x and y vectors, respectively,

transforming each frame to a frequency domain vector, and

storing each frequency domain vector in respective matrices X and Y,

wherein said step of selecting frames comprises the steps of:

selecting only frames that meet or exceed predetermined energy thresholds, and

wherein said step of selecting only frames that meet or exceed predetermined energy thresholds comprises the steps of:

for matrix X, selecting only frames which meet or exceed an energy threshold x_threshold of substantially 15 dB below an energy level x_energy of a peak frame in matrix X;

##EQU59##

for matrix Y, selecting only frames which meet or exceed an energy threshold y_threshold of substantially 35 dB below an energy level y_energy of a peak frame in matrix Y;

##EQU60##
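The frame-selection step of claim 1 (keep only frames within substantially 15 dB of the peak frame's energy for matrix X, and within 35 dB for matrix Y) can be sketched as below. The energy definition here (summed squared magnitudes per frequency-domain frame) is an assumption, since the patent's exact formulas sit behind the ##EQU59## and ##EQU60## placeholders.

```python
import numpy as np

def select_frames(F, threshold_db):
    """Return indices of frames whose energy is within threshold_db
    of the peak frame's energy. F holds one frequency-domain frame
    per row, as in matrices X and Y of claim 1."""
    energy = np.sum(np.abs(F) ** 2, axis=1)      # per-frame energy (assumed definition)
    energy_db = 10.0 * np.log10(energy + 1e-12)  # convert to dB, guarding log(0)
    peak_db = energy_db.max()
    return np.nonzero(energy_db >= peak_db - threshold_db)[0]

# Per claim 1: frames of X within 15 dB of the peak, frames of Y within 35 dB.
X = np.fft.rfft(np.random.randn(200, 256), axis=1)
Y = np.fft.rfft(np.random.randn(200, 256), axis=1)
x_selected = select_frames(X, 15.0)
y_selected = select_frames(Y, 35.0)
```

The looser 35 dB threshold on the second (degraded) signal keeps quiet frames that distortion may have attenuated, so fewer frames of Y than of X are discarded.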
2. A method for measuring differences between two speech signals consistent with human auditory perception and judgment, said method comprising the steps of:

preparing, using a digital signal processor element programmed with a speech signal preparation algorithm, digital representations of two speech signals for further processing,

transforming the digital representations of the two speech signals using a digital signal processor element programmed with a frequency domain transformation algorithm to segment the digital representations of the two speech signals into respective groups of frames, and transforming the respective groups of frames into the frequency domain,

selecting frames using a digital signal processor element programmed with a frame selection algorithm to select frequency-domain frames for further processing,

measuring perceived loudness of selected frames using a digital signal processor element programmed with a perceived loudness approximation algorithm, and

comparing, using a digital signal processor element programmed with an auditory distance algorithm to compare measured loudness values for at least two selected frequency-domain frames each corresponding to a respective one of the two speech signals and generate a numerical result representing auditory distance;

wherein the auditory distance value is directly proportional to human auditory perception of the difference between the two speech signals,

wherein said step of preparing comprises the steps of:

converting a first of the two speech signals from analog to digital form and storing the digital form as a first vector x; and

converting a second of the two speech signals from analog to digital form and storing the digital form as a second vector y,

wherein said transforming step comprises the steps of:

generating a plurality of frames for each of the x and y vectors, respectively,

transforming each frame to a frequency domain vector, and

storing each frequency domain vector in respective matrices X and Y, and

wherein said comparing step comprises the step of applying a frequency measuring normalizing block to matrices X and Y.

- View Dependent Claims (3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31)
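The transforming step shared by claims 1 and 2 (segment each digitized vector into frames, transform each frame to a frequency-domain vector, store the vectors as rows of matrices X and Y) can be sketched as follows. Frame length, hop size, and windowing are not recited in the claim text, so the values and the Hann window here are illustrative assumptions.

```python
import numpy as np

def to_frequency_matrix(signal, frame_len=256, hop=128):
    """Segment a digitized signal into overlapping frames and transform
    each frame to the frequency domain, one frame per matrix row."""
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop : i * hop + frame_len]
                       for i in range(n_frames)])
    window = np.hanning(frame_len)  # illustrative choice; not in the claim
    return np.fft.rfft(frames * window, axis=1)

# Vectors x and y from the preparation step become matrices X and Y:
x = np.random.randn(8000)
y = np.random.randn(8000)
X = to_frequency_matrix(x)
Y = to_frequency_matrix(y)
```

Each row of X and Y is then a candidate frame for the energy-threshold selection step.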
32. An apparatus for measuring differences between two speech signals consistent with human auditory perception and judgment, said apparatus comprising:

first means for preparing digital representations of two speech signals for further processing;

second means, coupled to the first means, for transforming the digital representations of the two speech signals to segment the digital representations of the two speech signals into respective groups of frames, and transforming the respective groups of frames into the frequency domain;

third means, coupled to the second means, for selecting frequency-domain frames for further processing; and

fourth means, coupled to the third means, for measuring perceived loudness of selected frames, and

fifth means, coupled to the fourth means, for comparing measured loudness values for at least two selected frequency-domain frames each corresponding to a respective one of the two speech signals and generate a numerical result representing auditory distance;

wherein the auditory distance value is directly proportional to human auditory perception of the difference between the two speech signals,

wherein said first means comprises:

means for converting a first of the two speech signals from analog to digital form and storing the digital form as a first vector x; and

means for converting a second of the two speech signals from analog to digital form and storing the digital form as a second vector y,

wherein said second means comprises:

means for generating a plurality of frames for each of the x and y vectors, respectively;

means for transforming each frame to a frequency domain vector; and

means for storing each frequency domain vector in respective matrices X and Y,

wherein said third means selects only frames that meet or exceed predetermined energy thresholds, and

wherein said third means selects only frames that meet or exceed predetermined energy thresholds determined as:

for matrix X, selecting only frames which meet or exceed an energy threshold x_threshold of substantially 15 dB below an energy level x_energy of a peak frame in matrix X;

##EQU97##

for matrix Y, selecting only frames which meet or exceed an energy threshold y_threshold of substantially 35 dB below an energy level y_energy of a peak frame in matrix Y;

##EQU98##
33. An apparatus for measuring differences between two speech signals consistent with human auditory perception and judgment, said apparatus comprising:

first means for preparing digital representations of two speech signals for further processing;

second means, coupled to the first means, for transforming the digital representations of the two speech signals to segment the digital representations of the two speech signals into respective groups of frames, and transforming the respective groups of frames into the frequency domain;

third means, coupled to the second means, for selecting frequency-domain frames for further processing; and

fourth means, coupled to the third means, for measuring perceived loudness of selected frames, and

fifth means, coupled to the fourth means, for comparing measured loudness values for at least two selected frequency-domain frames each corresponding to a respective one of the two speech signals and generate a numerical result representing auditory distance;

wherein the auditory distance value is directly proportional to human auditory perception of the difference between the two speech signals,

wherein said first means comprises:

means for converting a first of the two speech signals from analog to digital form and storing the digital form as a first vector x; and

means for converting a second of the two speech signals from analog to digital form and storing the digital form as a second vector y,

wherein said second means comprises:

means for generating a plurality of frames for each of the x and y vectors, respectively;

means for transforming each frame to a frequency domain vector; and

means for storing each frequency domain vector in respective matrices X and Y, and

wherein said fifth means comprises means for applying a frequency measuring normalizing block to matrices X and Y.

- View Dependent Claims (34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62)
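The perceived loudness approximation algorithm and the measuring normalizing block recited in these claims are not detailed in the claim text. As a loose illustration of the measure-then-compare structure only (emphatically not the patented algorithms), loudness can be crudely approximated by log magnitudes and compared by a mean absolute difference; the function name `auditory_distance` is a hypothetical stand-in.

```python
import numpy as np

def auditory_distance(X, Y):
    """Illustrative stand-in for the loudness-measurement and comparison
    steps: approximate perceived loudness as the log magnitude of each
    frequency-domain frame, then average the elementwise distances."""
    loud_x = np.log10(np.abs(X) + 1e-12)  # crude loudness approximation
    loud_y = np.log10(np.abs(Y) + 1e-12)
    return float(np.mean(np.abs(loud_x - loud_y)))

X = np.fft.rfft(np.random.randn(50, 256), axis=1)
print(auditory_distance(X, X))  # → 0.0 for identical signals
```

The key property the claims require, and which any real implementation must validate against listening tests, is that the resulting number grow in proportion to the difference human listeners perceive.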
63. A computer readable memory for directing a computer to measure differences between two speech signals consistent with human auditory perception and judgment, said computer readable memory comprising:

a first memory portion containing instructions for preparing digital representations of two speech signals for further processing,

a second memory portion containing instructions for transforming the digital representations of the two speech signals to segment the digital representations of the two speech signals into respective groups of frames, and transforming the respective groups of frames into the frequency domain,

a third memory portion containing instructions for selecting frequency-domain frames for further processing,

a fourth memory portion containing instructions for measuring perceived loudness of selected frames, and

a fifth memory portion containing instructions for comparing measured loudness values for at least two selected frequency-domain frames each corresponding to a respective one of the two speech signals and generate a numerical result representing auditory distance;

wherein the auditory distance value is directly proportional to human auditory perception of the difference between the two speech signals,

wherein said first memory portion comprises:

instructions for converting a first of the two speech signals from analog to digital form and storing the digital form as a first vector x, and

instructions for converting a second of the two speech signals from analog to digital form and storing the digital form as a second vector y,

wherein said transforming step comprises:

instructions for generating a plurality of frames for each of the x and y vectors, respectively,

instructions for transforming each frame to a frequency domain vector, and

instructions for storing each frequency domain vector in respective matrices X and Y,

wherein said instructions for selecting frames comprise:

instructions for selecting only frames that meet or exceed predetermined energy thresholds, and

wherein said instructions for selecting only frames that meet or exceed predetermined energy thresholds comprise instructions for:

for matrix X, selecting only frames which meet or exceed an energy threshold x_threshold of substantially 15 dB below an energy level x_energy of a peak frame in matrix X;

##EQU135##

for matrix Y, selecting only frames which meet or exceed an energy threshold y_threshold of substantially 35 dB below an energy level y_energy of a peak frame in matrix Y;

##EQU136##
64. A computer readable memory for directing a computer to measure differences between two speech signals consistent with human auditory perception and judgment, said computer readable memory comprising:

a first memory portion containing instructions for preparing digital representations of two speech signals for further processing,

a second memory portion containing instructions for transforming the digital representations of the two speech signals to segment the digital representations of the two speech signals into respective groups of frames, and transforming the respective groups of frames into the frequency domain,

a third memory portion containing instructions for selecting frequency-domain frames for further processing,

a fourth memory portion containing instructions for measuring perceived loudness of selected frames, and

a fifth memory portion containing instructions for comparing measured loudness values for at least two selected frequency-domain frames each corresponding to a respective one of the two speech signals and generate a numerical result representing auditory distance;

wherein the auditory distance value is directly proportional to human auditory perception of the difference between the two speech signals,

wherein said first memory portion comprises:

instructions for converting a first of the two speech signals from analog to digital form and storing the digital form as a first vector x, and

instructions for converting a second of the two speech signals from analog to digital form and storing the digital form as a second vector y,

wherein said transforming step comprises:

instructions for generating a plurality of frames for each of the x and y vectors, respectively,

instructions for transforming each frame to a frequency domain vector, and

instructions for storing each frequency domain vector in respective matrices X and Y, and

wherein said comparing step comprises instructions for applying a frequency measuring normalizing block to matrices X and Y.

- View Dependent Claims (65, 66)
Specification