Harmonicity estimation, audio classification, pitch determination and noise estimation

US 10,014,005 B2
Filed: 03/21/2013
Issued: 07/03/2018
Est. Priority Date: 03/23/2012
Status: Active Grant

Abstract

Embodiments are described for harmonicity estimation, audio classification, pitch determination and noise estimation. Measuring harmonicity of an audio signal includes calculating a log amplitude spectrum of the audio signal. A first spectrum is derived by calculating each component of the first spectrum as a sum of components of the log amplitude spectrum on frequencies which, in linear frequency scale, are odd multiples of the component's frequency of the first spectrum. A second spectrum is derived by calculating each component of the second spectrum as a sum of components of the log amplitude spectrum on frequencies which, in linear frequency scale, are even multiples of the component's frequency of the second spectrum. A difference spectrum is derived by subtracting the first spectrum from the second spectrum. A measure of harmonicity is generated as a monotonically increasing function of the maximum component of the difference spectrum within a predetermined frequency range.
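The steps in the abstract can be illustrated with a minimal sketch. It works on bin indices of a linear-frequency spectrum and is not taken from the patent: the function names, the `num_pairs` cap on how many multiples are summed, and the logistic curve used as the monotonically increasing function are all assumptions made for illustration.

```python
import numpy as np

def hsr_spectrum(log_amp, num_pairs=4):
    """Return HSR = LSH - LSS for each bin of a log amplitude spectrum.

    num_pairs is an assumed parameter capping how many odd/even
    multiples of each bin are summed.
    """
    n_bins = len(log_amp)
    lss = np.zeros(n_bins)  # sums over odd multiples of each bin
    lsh = np.zeros(n_bins)  # sums over even multiples of each bin
    for k in range(1, n_bins):
        for n in range(1, num_pairs + 1):
            odd, even = (2 * n - 1) * k, 2 * n * k
            if even >= n_bins:
                break  # stop in pairs so both sums hold equally many terms
            lss[k] += log_amp[odd]
            lsh[k] += log_amp[even]
    return lsh - lss

def harmonicity(hsr, lo_bin, hi_bin):
    """Map the largest HSR component in [lo_bin, hi_bin] through a logistic
    curve (an assumed choice of monotonically increasing function)."""
    peak = np.max(hsr[lo_bin:hi_bin + 1])
    return 1.0 / (1.0 + np.exp(-peak))

# Toy log spectrum: peaks at every 20th bin over a low floor.
log_amp = np.full(257, np.log(0.01))
log_amp[20::20] = 0.0

hsr = hsr_spectrum(log_amp)
peak_bin = 5 + int(np.argmax(hsr[5:26]))   # search bins 5..25 only
h_harmonic = harmonicity(hsr, 5, 25)
h_flat = harmonicity(hsr_spectrum(np.zeros(257)), 5, 25)
```

In this toy case the largest HSR component sits at bin 10, because the even multiples of bin 10 coincide with the spectral peaks while its odd multiples fall between them. A flat spectrum gives an HSR of zero everywhere, so its harmonicity under the assumed logistic mapping is 0.5.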

8 Claims

1. A method of processing an audio signal in a voice communication device, comprising:
- calculating, in a first spectrum generator circuit of the device, a log amplitude spectrum (LX) of the audio signal;
  
  deriving, in a second spectrum generator circuit, a first spectrum (LSS) by calculating each component of the first spectrum as a sum of components of the log amplitude spectrum on frequencies which, in linear frequency scale, are odd multiples of the component's frequency of the first spectrum;
  
  further deriving, in the second spectrum generator circuit coupled to the first spectrum generator circuit, a second spectrum (LSH) by calculating each component of the second spectrum as a sum of components of the log amplitude spectrum on frequencies which, in linear frequency scale, are even multiples of the component's frequency of the second spectrum;
  
  yet further deriving, in the second spectrum generator circuit, a harmonic-to-subharmonic ratio (HSR) spectrum in a linear amplitude domain by subtracting the LSS spectrum from the LSH spectrum (HSR = LSH − LSS);
  
  generating, in a harmonicity estimator circuit, a measure of harmonicity (H) as a monotonically increasing function of a maximum component of the HSR spectrum within a predetermined frequency range, wherein the maximum component has the most dominant harmonics; and
  
  using the harmonicity estimator circuit to generate at least two measures of harmonicity of the audio signal based on different frequency ranges defined by different expected maximum frequencies;
  
  providing an output of the harmonicity estimator circuit to a feature calculator to classify the audio signal into at least one of several defined audio types based on at least one of a difference and ratio between harmonicity measures obtained by the harmonicity estimator circuit based on the different frequency ranges as a portion of features extracted from the audio signal, to determine a bandwidth requirement of the voice communication device; and
  
  transmitting the determined bandwidth requirement to a backend process through a communication link to manage at least one of the bandwidth requirement and an application utilized by the voice communication device.
  - 2. The method according to claim 1, further comprising determining a degree of acoustic periodicity of the audio signal as the measure of H using the maximum component of the difference spectrum through a monotonically increasing function relation between the measure of harmonicity and the maximum component of the difference spectrum, wherein the monotonically increasing function relation means that if a first maximum component is less than or equal to a second maximum component then a first measure of harmonicity (H1) through the function on the first maximum component is less than or equal to a second measure of harmonicity (H2) through the function on the second maximum component.
  - 3. The method according to claim 2, wherein the defined audio types comprise clean speech, noisy signals, and music, and wherein the different frequency ranges comprise at least three separate frequency ranges within an overall frequency range of 75 Hz to 5000 Hz.
  - 4. The method according to claim 1, wherein the calculation of the log amplitude spectrum comprises:
    - calculating an amplitude spectrum of the audio signal;
      
      weighting the amplitude spectrum with a weighting vector to suppress an undesired component; and
      
      performing a logarithmic transform on the amplitude spectrum.
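The three steps of claim 4 (amplitude spectrum, weighting, logarithmic transform) might be sketched as follows. The claim does not say what the weighting vector contains or whether a window is applied, so the Hann window, the `eps` guard against `log(0)`, the choice to take the logarithm of the weighted spectrum, and the low-bin weighting used in the example are all assumptions.

```python
import numpy as np

def weighted_log_amplitude(frame, weights, eps=1e-10):
    """Amplitude spectrum -> per-bin weighting -> natural log.

    weights must have len(frame) // 2 + 1 entries (one per rfft bin).
    eps is an assumed guard that keeps log() finite where a bin is zero.
    """
    windowed = frame * np.hanning(len(frame))   # assumed Hann window
    amplitude = np.abs(np.fft.rfft(windowed))   # amplitude spectrum
    return np.log(amplitude * weights + eps)    # weight, then log

# Toy frame: a strong tone at bin 1 (the "undesired" component in this
# example) plus a weaker tone at bin 8.
n = 256
t = np.arange(n)
frame = 3.0 * np.sin(2 * np.pi * 1 * t / n) + np.sin(2 * np.pi * 8 * t / n)

flat = np.ones(n // 2 + 1)      # no suppression
suppress_low = flat.copy()
suppress_low[:4] = 0.0          # hypothetical weighting: zero bins 0..3

peak_unweighted = int(np.argmax(weighted_log_amplitude(frame, flat)))
peak_weighted = int(np.argmax(weighted_log_amplitude(frame, suppress_low)))
```

With flat weights the strongest bin is the low-frequency tone at bin 1. With the hypothetical weighting vector that tone is suppressed and the strongest bin becomes bin 8.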

5. An apparatus for processing an audio signal in a voice communication device, comprising:
- a first spectrum generator circuit of the device configured to calculate a log amplitude spectrum (LX) of the audio signal;
  
  a second spectrum generator circuit coupled to the first spectrum generator circuit to derive a first spectrum (LSS) by calculating each component of the first spectrum as a sum of components of the log amplitude spectrum on frequencies which, in linear frequency scale, are odd multiples of the component's frequency of the first spectrum; and
  
  to further derive a second spectrum (LSH) by calculating each component of the second spectrum as a sum of components of the log amplitude spectrum on frequencies which, in linear frequency scale, are even multiples of the component's frequency of the second spectrum; and
  
  yet to further derive a harmonic-to-subharmonic ratio (HSR) spectrum in a linear amplitude domain by subtracting the LSS spectrum from the LSH spectrum (HSR = LSH − LSS); and
  
  a harmonicity estimator circuit configured to determine a measure of harmonicity (H) as a monotonically increasing function of a maximum component of the HSR spectrum within a predetermined frequency range, wherein the maximum component has the most dominant harmonics;
  
  the harmonicity estimator circuit further generating at least two measures of harmonicity of the audio signal based on different frequency ranges defined by different expected maximum frequencies;
  
  a transmission link providing an output of the harmonicity estimator circuit to a feature calculator to classify the audio signal into at least one of several defined audio types based on at least one of a difference and ratio between harmonicity measures obtained by the harmonicity estimator circuit based on the different frequency ranges as a portion of features extracted from the audio signal, to determine a bandwidth requirement of the voice communication device; and
  
  a communication link transmitting the determined bandwidth requirement to a backend process to manage at least one of the bandwidth requirement and an application utilized by the voice communication device.
  - 6. The apparatus according to claim 5, wherein the harmonicity estimator circuit determines a degree of acoustic periodicity of the audio signal as a measure of harmonicity (H) using the maximum component of the difference spectrum through a monotonically increasing function relation between the measure of harmonicity and the maximum component of the difference spectrum, and wherein the monotonically increasing function relation means that if a first maximum component is less than or equal to a second maximum component then a first measure of harmonicity (H1) through the function on the first maximum component is less than or equal to a second measure of harmonicity (H2) through the function on the second maximum component.
  - 7. The apparatus according to claim 6, wherein the defined audio types comprise clean speech, noisy signals, and music, and wherein the different frequency ranges comprise at least three separate frequency ranges within an overall frequency range of 75 Hz to 5000 Hz.
  - 8. The apparatus according to claim 5, wherein the calculation of the log amplitude spectrum comprises:
    - calculating an amplitude spectrum of the audio signal;
      
      weighting the amplitude spectrum with a weighting vector to suppress an undesired component; and
      
      performing a logarithmic transform on the amplitude spectrum.
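The claims use differences and ratios between harmonicity measures taken over different frequency ranges as classification features. A rough sketch of that feature step follows. The three ranges (all within the 75 Hz to 5000 Hz span named in claims 3 and 7), the logistic mapping, the feature names, and the toy HSR values are assumptions for illustration. No classifier or bandwidth rule is shown because the text here does not specify one.

```python
import numpy as np

def harmonicity(hsr, freqs, f_lo, f_hi):
    """Logistic of the largest HSR component between f_lo and f_hi (Hz).

    The logistic is an assumed monotonically increasing function; it is
    strictly positive for moderate inputs, which keeps the ratios finite.
    """
    mask = (freqs >= f_lo) & (freqs <= f_hi)
    return 1.0 / (1.0 + np.exp(-np.max(hsr[mask])))

def range_features(hsr, freqs, ranges):
    """Return per-range harmonicity plus pairwise difference/ratio features."""
    h = [harmonicity(hsr, freqs, lo, hi) for lo, hi in ranges]
    feats = {}
    for i in range(len(h)):
        for j in range(i + 1, len(h)):
            feats[f"diff_{i}_{j}"] = h[i] - h[j]
            feats[f"ratio_{i}_{j}"] = h[i] / h[j]
    return h, feats

# Toy HSR spectrum on a 0..8000 Hz grid with one strong component at 187.5 Hz.
freqs = np.linspace(0.0, 8000.0, 257)
hsr = np.zeros(257)
hsr[6] = 6.0

# Hypothetical ranges in Hz.
ranges = [(75.0, 300.0), (300.0, 1000.0), (1000.0, 5000.0)]
h, feats = range_features(hsr, freqs, ranges)
```

Here the first range contains the strong HSR component, so its harmonicity is close to 1 while the other two ranges sit at 0.5. The difference between the first two ranges is therefore clearly positive and the ratio between the last two is 1.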

Current Assignee: Dolby Laboratories Licensing Corporation (Dolby Laboratories Incorporated)
Original Assignee: Dolby Laboratories Licensing Corporation (Dolby Laboratories Incorporated)
Inventors: Sun, Xuejing; Shuang, Zhiwei; Huang, Shen
Primary Examiner(s): Desir, Pierre-Louis
Assistant Examiner(s): Wang, Yi Sheng
Application Number: US14/384,356
Publication Number: US 20150081283A1
Time in Patent Office: 1,930 Days
Field of Search: None
CPC Class Codes:
G10L 25/18   the extracted parameters be...
G10L 25/78   Detection of presence or ab...
G10L 25/81   for discriminating voice fr...
G10L 25/84   for discriminating voice fr...
