Speech signal processing method and apparatus

US 10,586,551 B2
Filed: 08/30/2017
Issued: 03/10/2020
Est. Priority Date: 11/04/2015
Status: Active Grant

Abstract

A speech signal processing method is performed at a terminal device, including: obtaining a recorded signal and a to-be-output speech signal, the recorded signal including a noise signal and an echo signal; calculating a loop transfer function according to the recorded signal and the speech signal; calculating a power spectrum of the echo signal and a power spectrum of the noise signal according to the recorded signal, the speech signal, and the loop transfer function; calculating a frequency weighted coefficient according to the two power spectra of the echo signal and the noise signal; adjusting a frequency amplitude of the speech signal based on the frequency weighted coefficient; and outputting the adjusted speech signal to a speaker electrically coupled to the terminal device. As such, the frequency amplitude of the speech signal is automatically adjusted according to the relative frequency distribution of a noise signal and the speech signal.
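
The final adjustment step summarized in the abstract (scaling the frequency amplitude of the speech signal by a frequency weighted coefficient) might look roughly like the sketch below. It is only an illustration under stated assumptions: the function names `dft`, `idft`, and `apply_frequency_weights` are hypothetical, the naive DFT stands in for whatever transform a real implementation would use, and the patent text here does not specify frame length, windowing, or overlap.

```python
import cmath


def dft(x):
    """Naive discrete Fourier transform (fine for short illustrative frames)."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def idft(spectrum):
    """Naive inverse discrete Fourier transform."""
    n = len(spectrum)
    return [
        sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
        for t in range(n)
    ]


def apply_frequency_weights(frame, weights):
    """Scale each frequency bin of a real-valued frame by a per-bin weight.

    `weights` holds one gain for bins 0..N/2 (an assumption of this sketch);
    the gains are mirrored onto the negative-frequency bins so that the
    output stays real-valued.
    """
    n = len(frame)
    if len(weights) != n // 2 + 1:
        raise ValueError("expected one weight for each bin 0..N/2")
    spectrum = dft(frame)
    weighted = []
    for k in range(n):
        w = weights[k] if k <= n // 2 else weights[n - k]
        weighted.append(spectrum[k] * w)
    return [value.real for value in idft(weighted)]
```

With all weights equal to 1 the frame passes through unchanged; raising a single weight boosts only the corresponding frequency component.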

19 Claims

1. A speech signal processing method performed at a terminal device having one or more processors, a microphone, a speaker, and memory storing one or more programs to be executed by the one or more processors, the method comprising:
- receiving a to-be-output speech signal transmitted from another terminal device to the terminal device via a network;
  
  obtaining a signal recorded by the microphone, the recorded signal including a noise signal and an echo signal, wherein the noise signal is detected from a near-end environment and the echo signal is detected from the speaker;
  
  before outputting the speech signal via the speaker:
  
  calculating a loop transfer function according to the recorded signal and the speech signal, wherein the loop transfer function indicates a correlation between the recorded signal and the speech signal;
  
  calculating a power spectrum of the echo signal and a power spectrum of the noise signal according to the recorded signal, the speech signal, and the loop transfer function;
  
  calculating a frequency weighted coefficient according to the power spectrum of the echo signal and the power spectrum of the noise signal, wherein the frequency weighted coefficient corresponds to a weakest frequency at which the noise signal has lowest energy;
  
  adjusting a frequency amplitude of the speech signal based on the frequency weighted coefficient by increasing the frequency amplitude of the speech signal at the weakest frequency using the frequency weighted coefficient; and
  
  after adjusting the frequency amplitude of the speech signal:
  
  outputting the adjusted speech signal via the speaker.
  - 2. The method according to claim 1, wherein the operation of calculating a loop transfer function according to the recorded signal and the speech signal comprises:
    - calculating a frequency domain cross-correlation function between the recorded signal and the speech signal;
      
      calculating a frequency domain autocorrelation function of the speech signal; and
      
      calculating the loop transfer function according to the frequency domain cross-correlation function between the recorded signal and the speech signal and the frequency domain autocorrelation function of the speech signal.
  - 3. The method according to claim 1, wherein the operation of calculating a power spectrum of the echo signal and a power spectrum of the noise signal according to the recorded signal, the speech signal, and the loop transfer function comprises:
    - calculating a power spectrum of the recorded signal;
      
      calculating the power spectrum of the echo signal according to the loop transfer function and the speech signal; and
      
      subtracting the power spectrum of the echo signal from the power spectrum of the recorded signal, to obtain the power spectrum of the noise signal.
  - 4. The method according to claim 3, wherein the operation of calculating the power spectrum of the echo signal according to the loop transfer function and the speech signal comprises:
    - calculating a power of the recorded signal, a power of the speech signal, and a power of the echo signal; and
      
      determining at least one of a power feature value indicative of whether the power of the recorded signal is greater than a first threshold, a power feature value indicative of whether the power of the speech signal is greater than a second threshold, and a power feature value indicative of whether the power of the echo signal is greater than a third threshold.
  - 5. The method according to claim 4, wherein the operation of subtracting the power spectrum of the echo signal from the power spectrum of the recorded signal, to obtain the power spectrum of the noise signal comprises:
    - when the power of the recorded signal is less than the first threshold and the power of the echo signal is less than the third threshold, subtracting the power spectrum of the echo signal from the power spectrum of the recorded signal, to obtain the power spectrum of the noise signal.
  - 6. The method according to claim 1, wherein the operation of calculating a frequency weighted coefficient according to the power spectrum of the echo signal and the power spectrum of the noise signal comprises:
    - constructing a speech intelligibility index according to the power spectrum of the echo signal and the power spectrum of the noise signal; and
      
      under the condition that the power spectrum of the echo signal remains unchanged, obtaining the frequency weighted coefficient according to a maximum value of the speech intelligibility index.
  - 7. The method according to claim 1, wherein the terminal device comprises a frequency weighted filter, and the frequency weighted coefficient indicates a ratio of the speech signal that is detected by the microphone after the speech signal passes through the frequency weighted filter and the speaker.
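
Claims 2 and 3 above can be read as a per-frequency-bin estimate. The following is a minimal sketch under assumptions that go beyond the claim text: the loop transfer function is taken as the ratio of the cross-correlation to the autocorrelation, both accumulated over several frames; the echo power spectrum is taken as the squared magnitude of the transfer function times the speech spectrum; and the subtraction is floored at zero. The names `loop_transfer_function`, `echo_and_noise_power`, and `eps` are hypothetical.

```python
def loop_transfer_function(speech_frames, recorded_frames, eps=1e-12):
    """Estimate H(k) = S_xy(k) / S_xx(k) from paired frequency-domain frames.

    speech_frames / recorded_frames: lists of frames, each frame a list of
    complex spectrum values X(k) and Y(k). `eps` guards against division by
    zero and is an assumption of this sketch.
    """
    bins = len(speech_frames[0])
    cross = [0j] * bins   # frequency domain cross-correlation (recorded vs. speech)
    auto = [0.0] * bins   # frequency domain autocorrelation of the speech signal
    for x_frame, y_frame in zip(speech_frames, recorded_frames):
        for k in range(bins):
            cross[k] += y_frame[k] * x_frame[k].conjugate()
            auto[k] += abs(x_frame[k]) ** 2
    return [cross[k] / (auto[k] + eps) for k in range(bins)]


def echo_and_noise_power(speech_frame, recorded_frame, transfer):
    """Split the recorded power spectrum into echo and noise estimates.

    Echo power is modelled as |H(k) * X(k)|^2; noise power is the recorded
    power minus the echo power, floored at zero (the floor is an assumption).
    """
    echo = [abs(h * x) ** 2 for h, x in zip(transfer, speech_frame)]
    noise = [
        max(abs(y) ** 2 - e, 0.0) for y, e in zip(recorded_frame, echo)
    ]
    return echo, noise
```

Averaging over several frames is a design choice of this sketch: a single-frame ratio would simply reproduce the recorded spectrum divided by the speech spectrum, leaving no noise residual to estimate.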

8. A terminal device, comprising:
- at least one processor;
  
  a microphone;
  
  a speaker;
  
  memory; and
  
  a plurality of program instructions that, when executed by the at least one processor, cause the terminal device to perform the following operations:
  
  receiving a to-be-output speech signal transmitted from another terminal device to the terminal device via a network;
  
  obtaining a signal recorded by the microphone, the recorded signal including a noise signal and an echo signal, wherein the noise signal is detected from a near-end environment and the echo signal is detected from the speaker;
  
  before outputting the speech signal via the speaker:
  
  calculating a loop transfer function according to the recorded signal and the speech signal, wherein the loop transfer function indicates a correlation between the recorded signal and the speech signal;
  
  calculating a power spectrum of the echo signal and a power spectrum of the noise signal according to the recorded signal, the speech signal, and the loop transfer function;
  
  calculating a frequency weighted coefficient according to the power spectrum of the echo signal and the power spectrum of the noise signal, wherein the frequency weighted coefficient corresponds to a weakest frequency at which the noise signal has lowest energy;
  
  adjusting a frequency amplitude of the speech signal based on the frequency weighted coefficient by increasing the frequency amplitude of the speech signal at the weakest frequency using the frequency weighted coefficient; and
  
  after adjusting the frequency amplitude of the speech signal:
  
  outputting the adjusted speech signal via the speaker.
  - 9. The terminal device according to claim 8, wherein the operation of calculating a loop transfer function according to the recorded signal and the speech signal comprises:
    - calculating a frequency domain cross-correlation function between the recorded signal and the speech signal;
      
      calculating a frequency domain autocorrelation function of the speech signal; and
      
      calculating the loop transfer function according to the frequency domain cross-correlation function between the recorded signal and the speech signal and the frequency domain autocorrelation function of the speech signal.
  - 10. The terminal device according to claim 8, wherein the operation of calculating a power spectrum of the echo signal and a power spectrum of the noise signal according to the recorded signal, the speech signal, and the loop transfer function comprises:
    - calculating a power spectrum of the recorded signal;
      
      calculating the power spectrum of the echo signal according to the loop transfer function and the speech signal; and
      
      subtracting the power spectrum of the echo signal from the power spectrum of the recorded signal, to obtain the power spectrum of the noise signal.
  - 11. The terminal device according to claim 10, wherein the operation of calculating the power spectrum of the echo signal according to the loop transfer function and the speech signal comprises:
    - calculating a power of the recorded signal, a power of the speech signal, and a power of the echo signal; and
      
      determining at least one of a power feature value indicative of whether the power of the recorded signal is greater than a first threshold, a power feature value indicative of whether the power of the speech signal is greater than a second threshold, and a power feature value indicative of whether the power of the echo signal is greater than a third threshold.
  - 12. The terminal device according to claim 11, wherein the operation of subtracting the power spectrum of the echo signal from the power spectrum of the recorded signal, to obtain the power spectrum of the noise signal comprises:
    - when the power of the recorded signal is less than the first threshold and the power of the echo signal is less than the third threshold, subtracting the power spectrum of the echo signal from the power spectrum of the recorded signal, to obtain the power spectrum of the noise signal.
  - 13. The terminal device according to claim 8, wherein the operation of calculating a frequency weighted coefficient according to the power spectrum of the echo signal and the power spectrum of the noise signal comprises:
    - constructing a speech intelligibility index according to the power spectrum of the echo signal and the power spectrum of the noise signal; and
      
      under the condition that the power spectrum of the echo signal remains unchanged, obtaining the frequency weighted coefficient according to a maximum value of the speech intelligibility index.
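
Claims 11 and 12 above gate the noise estimate on power thresholds. A minimal sketch of that gating follows, with several assumptions that are not in the claim text: the "power" of a signal is taken as the mean of its power spectrum, the previous noise estimate is kept whenever the condition is not met, and the subtraction is floored at zero. The function name `update_noise_estimate` and its parameter names are hypothetical, and the threshold values would have to be chosen by the implementer.

```python
def update_noise_estimate(prev_noise_ps, recorded_ps, echo_ps,
                          first_threshold, third_threshold):
    """Refresh the noise power spectrum only when the gating condition holds.

    prev_noise_ps, recorded_ps, echo_ps: per-bin power spectra (lists of floats).
    The update happens only when the recorded power is below the first
    threshold and the echo power is below the third threshold.
    """
    # Assumption: signal power is the mean of the per-bin power spectrum.
    recorded_power = sum(recorded_ps) / len(recorded_ps)
    echo_power = sum(echo_ps) / len(echo_ps)

    if recorded_power < first_threshold and echo_power < third_threshold:
        # Subtract the echo power spectrum from the recorded power spectrum.
        return [max(r - e, 0.0) for r, e in zip(recorded_ps, echo_ps)]

    # Assumption: outside the gating condition, hold the previous estimate.
    return list(prev_noise_ps)
```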

14. A non-transitory computer readable storage medium in connection with a terminal device having one or more processors, a microphone, and a speaker, the storage medium storing a plurality of program instructions that, when executed by the one or more processors, cause the terminal device to perform the following operations:
- receiving a to-be-output speech signal transmitted from another terminal device to the terminal device via a network;
  
  obtaining a signal recorded by the microphone, the recorded signal including a noise signal and an echo signal, wherein the noise signal is detected from a near-end environment and the echo signal is detected from the speaker;
  
  before outputting the speech signal via the speaker:
  
  calculating a loop transfer function according to the recorded signal and the speech signal, wherein the loop transfer function indicates a correlation between the recorded signal and the speech signal;
  
  calculating a power spectrum of the echo signal and a power spectrum of the noise signal according to the recorded signal, the speech signal, and the loop transfer function;
  
  calculating a frequency weighted coefficient according to the power spectrum of the echo signal and the power spectrum of the noise signal, wherein the frequency weighted coefficient corresponds to a weakest frequency at which the noise signal has lowest energy;
  
  adjusting a frequency amplitude of the speech signal based on the frequency weighted coefficient by increasing the frequency amplitude of the speech signal at the weakest frequency using the frequency weighted coefficient; and
  
  after adjusting the frequency amplitude of the speech signal:
  
  outputting the adjusted speech signal via the speaker.
  - 15. The non-transitory computer readable storage medium according to claim 14, wherein the operation of calculating a loop transfer function according to the recorded signal and the speech signal comprises:
    - calculating a frequency domain cross-correlation function between the recorded signal and the speech signal;
      
      calculating a frequency domain autocorrelation function of the speech signal; and
      
      calculating the loop transfer function according to the frequency domain cross-correlation function between the recorded signal and the speech signal and the frequency domain autocorrelation function of the speech signal.
  - 16. The non-transitory computer readable storage medium according to claim 14, wherein the operation of calculating a power spectrum of the echo signal and a power spectrum of the noise signal according to the recorded signal, the speech signal, and the loop transfer function comprises:
    - calculating a power spectrum of the recorded signal;
      
      calculating the power spectrum of the echo signal according to the loop transfer function and the speech signal; and
      
      subtracting the power spectrum of the echo signal from the power spectrum of the recorded signal, to obtain the power spectrum of the noise signal.
  - 17. The non-transitory computer readable storage medium according to claim 16, wherein the operation of calculating the power spectrum of the echo signal according to the loop transfer function and the speech signal comprises:
    - calculating a power of the recorded signal, a power of the speech signal, and a power of the echo signal; and
      
      determining at least one of a power feature value indicative of whether the power of the recorded signal is greater than a first threshold, a power feature value indicative of whether the power of the speech signal is greater than a second threshold, and a power feature value indicative of whether the power of the echo signal is greater than a third threshold.
  - 18. The non-transitory computer readable storage medium according to claim 17, wherein the operation of subtracting the power spectrum of the echo signal from the power spectrum of the recorded signal, to obtain the power spectrum of the noise signal comprises:
    - when the power of the recorded signal is less than the first threshold and the power of the echo signal is less than the third threshold, subtracting the power spectrum of the echo signal from the power spectrum of the recorded signal, to obtain the power spectrum of the noise signal.
  - 19. The non-transitory computer readable storage medium according to claim 14, wherein the operation of calculating a frequency weighted coefficient according to the power spectrum of the echo signal and the power spectrum of the noise signal comprises:
    - constructing a speech intelligibility index according to the power spectrum of the echo signal and the power spectrum of the noise signal; and
      
      under the condition that the power spectrum of the echo signal remains unchanged, obtaining the frequency weighted coefficient according to a maximum value of the speech intelligibility index.
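
Claims 6, 13, and 19 describe choosing the frequency weighted coefficient by maximizing a speech intelligibility index while the echo power stays unchanged. The sketch below illustrates that idea and is not the patented optimization. It assumes a simplified index in which each band contributes its signal-to-noise ratio in dB, clipped to the range -15 to +15 dB and mapped to 0..1, with uniform band importance; it interprets "remains unchanged" as holding the total echo power constant; and it replaces any closed-form solution with a coarse grid search over a few candidate gains. The names `simplified_sii`, `best_weights`, and `gains` are hypothetical.

```python
import itertools
import math


def simplified_sii(echo_ps, noise_ps, weights, importance=None):
    """Simplified speech intelligibility index over a few frequency bands."""
    bands = len(echo_ps)
    importance = importance or [1.0 / bands] * bands  # assumption: uniform
    total = 0.0
    for e, n, w, i in zip(echo_ps, noise_ps, weights, importance):
        snr_db = 10.0 * math.log10((w * w * e + 1e-12) / (n + 1e-12))
        audibility = min(max((snr_db + 15.0) / 30.0, 0.0), 1.0)
        total += i * audibility
    return total


def best_weights(echo_ps, noise_ps, gains=(0.5, 1.0, 2.0, 4.0)):
    """Grid-search per-band amplitude weights that maximize the index.

    Every candidate is rescaled so the weighted echo power sums to the
    original total, which is this sketch's reading of the constraint.
    """
    bands = len(echo_ps)
    target = sum(echo_ps)
    best = [1.0] * bands
    best_score = simplified_sii(echo_ps, noise_ps, best)
    if target <= 0.0:
        return best, best_score  # nothing to redistribute
    for combo in itertools.product(gains, repeat=bands):
        power = sum(g * g * e for g, e in zip(combo, echo_ps))
        scale = math.sqrt(target / power)
        weights = [g * scale for g in combo]
        score = simplified_sii(echo_ps, noise_ps, weights)
        if score > best_score + 1e-12:
            best, best_score = weights, score
    return best, best_score
```

The grid search grows exponentially with the number of bands, so it is only practical for a handful of bands; a real implementation would more likely use an analytic or iterative solution.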

Current Assignee: Tencent Technology Shenzhen Company Limited (Tencent Holdings Limited)
Original Assignee: Tencent Technology Shenzhen Company Limited (Tencent Holdings Limited)
Inventors: Yuan, Haolei
Primary Examiner(s): Zhu, Richard Z
Application Number: US15/691,300
Publication Number: US 20170365270A1
Time in Patent Office: 923 Days
Field of Search: None
US Class Current
CPC Class Codes

G10L 2021/02082   the noise being echo, rever...

G10L 21/0232   Processing in the frequency...

G10L 21/0264   characterised by the type o...

G10L 21/0364   for improving intelligibility

G10L 25/06   the extracted parameters be...

G10L 25/21   the extracted parameters be...

H04M 9/08   Two-way loud-speaking telep...

H04M 9/082   using echo cancellers echo ...

H04R 3/02   for preventing acoustic rea...
