Speech encoding method, apparatus and program

US 20040102970A1
Filed: 10/02/2003
Published: 05/27/2004
Est. Priority Date: 01/23/1997
Status: Active Grant

First Claim

1. A background noise/speech classification method comprising the steps of:

calculating power information and spectral information of an input signal as feature amounts; and

comparing the calculated feature amounts with estimated feature amounts constituted by estimated power information and estimated spectral information in a background noise period, thereby deciding whether the input signal belongs to speech or background noise.

Abstract

In a background noise/speech classification method, a background noise/speech decision section decides whether a digital input signal received through an input terminal is background noise or speech. The decision is based on the frame power and LSP coefficient that a feature amount calculation section calculates from the input signal, and on the estimated frame power and estimated LSP coefficient supplied by an estimated feature amount update section. The estimated feature amount update section then updates the estimated frame power and the estimated LSP coefficient, using the frame power and LSP coefficient just calculated, to prepare for the next frame.
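The decide-then-update loop described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the method of the specification: the class name `NoiseSpeechClassifier`, the parameters `power_ratio`, `lsp_threshold` and `alpha`, the Euclidean LSP distance, and the exponential-smoothing update are all hypothetical choices. For simplicity the same update is applied after every frame, whereas the claims describe an update that depends on the decision.

```python
import math


def frame_power(frame):
    """Mean squared amplitude of one frame of samples."""
    return sum(x * x for x in frame) / len(frame)


def spectral_distance(a, b):
    """Euclidean distance between two spectral vectors (assumed measure)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class NoiseSpeechClassifier:
    """Hypothetical classifier: compares frame features with noise estimates."""

    def __init__(self, init_power, init_lsp,
                 power_ratio=2.0, lsp_threshold=0.1, alpha=0.9):
        self.est_power = init_power      # estimated background-noise power
        self.est_lsp = list(init_lsp)    # estimated background-noise LSP vector
        self.power_ratio = power_ratio   # assumed power margin
        self.lsp_threshold = lsp_threshold  # assumed spectral margin
        self.alpha = alpha               # assumed smoothing factor

    def classify(self, power, lsp):
        # A frame counts as noise only if both its power and its spectrum
        # stay close to the current background-noise estimates.
        is_noise = (power <= self.power_ratio * self.est_power
                    and spectral_distance(lsp, self.est_lsp) <= self.lsp_threshold)
        # Update the estimates to prepare for the next frame
        # (same smoothing for both decisions: a simplification).
        a = self.alpha
        self.est_power = a * self.est_power + (1 - a) * power
        self.est_lsp = [a * e + (1 - a) * c for e, c in zip(self.est_lsp, lsp)]
        return "noise" if is_noise else "speech"
```

A caller would compute `frame_power` and an LSP vector per frame and pass both to `classify`; the thresholds would need tuning for any real signal.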

25 Citations

25 Claims

1. A background noise/speech classification method comprising the steps of:
- calculating power information and spectral information of an input signal as feature amounts; and
  
  comparing the calculated feature amounts with estimated feature amounts constituted by estimated power information and estimated spectral information in a background noise period, thereby deciding whether the input signal belongs to speech or background noise.
- View Dependent Claims (2, 3)
  - 2. A method according to claim 1, further comprising the step of updating the estimated feature amounts by different methods depending on whether it is decided that the input signal belongs to background noise or speech, and setting an update amount when it is decided that the input signal belongs to background noise to be smaller than an update amount to be set when it is decided that the input signal belongs to speech.
  - 3. A method according to claim 1, further comprising the step of, when a decision result indicating that the input signal belongs to speech or background noise changes from speech to background noise, forcibly changing the decision result to “speech” for a specific period, and changing the specific period by using the estimated power information and estimated spectral information in the background noise period.

4. A background noise/speech classification method comprising the steps of:
- calculating power information and spectral information of an input signal as feature amounts;
  
  comparing the calculated feature amounts with estimated feature amounts constituted by estimated power information and estimated spectral information in a background noise period, thereby analyzing power and spectral fluctuation amounts; and
  
  when a result obtained by analyzing the power and spectral fluctuation amounts indicates background noise, deciding that the input signal belongs to background noise, and otherwise, deciding that the input signal belongs to speech.
- View Dependent Claims (5, 6, 7, 8)
  - 5. A method according to claim 4, further comprising the step of updating the estimated feature amounts by different methods depending on whether it is decided that the input signal belongs to background noise or speech, and setting an update amount when it is decided that the input signal belongs to background noise to be smaller than an update amount to be set when it is decided that the input signal belongs to speech.
  - 6. A method according to claim 4, further comprising the step of analyzing the spectral fluctuation amount by comparing a predetermined threshold with a distortion value between a spectral envelope obtained from the spectral information of the input signal and a spectral envelope obtained from the estimated spectral information in the background noise period.
  - 7. A method according to claim 4, further comprising the step of analyzing the spectral fluctuation amount by comparing a predetermined threshold with a distortion value between a spectral envelope obtained from the spectral information of the input signal and a spectral envelope obtained from the estimated spectral information in the background noise period, and also changing the threshold in accordance with the estimated power information.
  - 8. A method according to claim 4, further comprising the step of, when a decision result indicating that the input signal belongs to speech or background noise changes from “speech” to “background noise”, forcibly changing the decision result to “speech” for a specific period, and also changing the specific period in accordance with the estimated power information and estimated spectral information in the background noise period.
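
The forced “speech” period of claims 3 and 8 behaves like a hangover counter. The sketch below is illustrative only: the class name `HangoverSmoother` and the argument `hangover_frames` are hypothetical, and how the period is derived from the estimated power and spectral information is left to the caller, since the claims do not give a formula.

```python
class HangoverSmoother:
    """Hypothetical hangover logic applied to raw per-frame decisions."""

    def __init__(self):
        self.prev_raw = "noise"  # raw decision of the previous frame
        self.remaining = 0       # frames still forced to "speech"

    def apply(self, raw_decision, hangover_frames):
        """Return the final decision for one frame.

        hangover_frames is the forced-speech period in frames; only the
        value supplied at a speech-to-noise transition is used.
        """
        if self.prev_raw == "speech" and raw_decision == "noise":
            # Transition from speech to noise: start the forced period.
            self.remaining = hangover_frames
        elif raw_decision == "speech":
            # Genuine speech cancels any pending forced period.
            self.remaining = 0
        self.prev_raw = raw_decision

        if raw_decision == "noise" and self.remaining > 0:
            self.remaining -= 1
            return "speech"
        return raw_decision
```

With a two-frame period, the raw sequence speech, noise, noise, noise yields speech, speech, speech, noise.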

9. A voiced/unvoiced classification method comprising the steps of:
- preparing a voiced appearance probability table and an unvoiced appearance probability table in which voiced and unvoiced appearance probabilities are respectively written in correspondence with speech feature amounts;
  
  obtaining voiced and unvoiced probabilities by referring to said voiced appearance probability table and said unvoiced appearance probability table by using a feature amount calculated from input speech as a key; and
  
  deciding on the basis of the voiced and unvoiced probabilities whether the input speech is voiced or unvoiced.
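
The table lookup in claim 9 can be illustrated with a small sketch. Everything concrete here is an assumption made for illustration: the scalar feature (for example a normalized correlation in the range 0 to 1), the four-bin quantization, the table values, and the names `quantize`, `VOICED_TABLE`, `UNVOICED_TABLE` and `classify_voicing`. The claim itself only requires that a feature amount serve as the key into the two tables.

```python
def quantize(value, lo, hi, n_bins):
    """Map a feature value onto a table index in the range 0..n_bins-1."""
    if value <= lo:
        return 0
    if value >= hi:
        return n_bins - 1
    return min(int((value - lo) / (hi - lo) * n_bins), n_bins - 1)


# Hypothetical appearance-probability tables, one entry per feature bin.
# In practice such tables would be prepared in advance from labelled speech.
VOICED_TABLE = [0.05, 0.20, 0.60, 0.90]
UNVOICED_TABLE = [0.95, 0.80, 0.40, 0.10]


def classify_voicing(feature, lo=0.0, hi=1.0):
    """Look up both probabilities with the feature as key, then compare."""
    idx = quantize(feature, lo, hi, len(VOICED_TABLE))
    p_voiced = VOICED_TABLE[idx]
    p_unvoiced = UNVOICED_TABLE[idx]
    return "voiced" if p_voiced >= p_unvoiced else "unvoiced"
```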

10. A background noise decoding method comprising the steps of:
- extracting a decoded excitation signal parameter, a gain decoded parameter, and a decoded synthesis filter parameter from decoded parameters obtained by decoding encoded data;
  
  decoding an excitation signal and a gain from the decoded excitation signal parameter and the gain decoded parameter;
  
  smoothing the gain such that the gain changes smoothly; and
  
  generating a synthesized signal by using a signal obtained by multiplying the excitation signal by the smoothed gain and synthesis filter characteristic information based on the decoded synthesis filter parameter.
- View Dependent Claims (11)
  - 11. A method according to claim 10, wherein the step of smoothing the gain comprises gradually increasing the gain when the gain increases, and quickly decreasing the gain when the gain decreases.
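
One way to realize the asymmetric smoothing of claim 11 is a one-pole smoother with separate step sizes for rising and falling gains. This is a sketch under assumptions: the function name `smooth_gains` and the parameters `rise`, `fall` and `initial` are hypothetical, and the claims do not specify the smoothing rule.

```python
def smooth_gains(gains, rise=0.2, fall=0.9, initial=0.0):
    """Smooth a sequence of decoded gains.

    rise: fraction of the gap closed per step when the gain goes up (small,
          so increases are gradual).
    fall: fraction of the gap closed per step when the gain goes down
          (large, so decreases are quick).
    """
    smoothed = []
    g = initial
    for target in gains:
        step = rise if target > g else fall
        g = g + step * (target - g)
        smoothed.append(g)
    return smoothed
```

The smoothed gain would then multiply the decoded excitation signal before the synthesis filter, as in the last step of claim 10.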

12. A speech encoding method comprising the steps of:
- dividing an input speech signal into frames each having a predetermined length;
  
  obtaining a pitch period of a future frame with respect to a current frame to be encoded; and
  
  encoding the pitch period.

13. A speech encoding method comprising the steps of:
- dividing an input speech signal into frames each having a predetermined length, and further dividing a speech signal of each frame into subframes;
  
  obtaining a predicted pitch period of a subframe in a current frame by using pitch periods of at least two frames of the current frame to be encoded and past and future frames with respect to the current frame; and
  
  obtaining a pitch period of a subframe in the current frame by using the predicted pitch period.
- View Dependent Claims (14, 15, 17, 18, 19)
  - 14. A method according to claim 13, further comprising the step of encoding the pitch period of the subframe in the current frame.
  - 15. A method according to claim 13, further comprising the step of preparing a pitch filter for suppressing or emphasizing a pitch period component of an input speech signal, and determining a transfer function for said pitch filter by using the pitch period of the subframe in the current frame.
  - 17. A method according to claim 13, wherein the step of obtaining the pitch period of the frame comprises adaptively deciding a pitch period analysis position for each frame.
  - 18. A method according to claim 13, further comprising the step of selecting a method of obtaining a pitch period of a subframe in the current frame in accordance with continuity of pitch periods.
  - 19. A method according to claim 13, further comprising the steps of:
    - preparing a relative pitch pattern codebook storing a plurality of relative pitch patterns representing fluctuations in pitch periods of a plurality of subframes; and
      
      expressing a change in pitch period of plural subframes with one relative pitch pattern selected from said relative pitch pattern codebook.
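
A simple way to obtain a predicted pitch period per subframe from two frame-level pitch periods, as in claim 13, is linear interpolation. The sketch below assumes the current-frame and future-frame pitch periods are the two inputs and that interpolation weights grow linearly across the subframes; the function name `predict_subframe_pitches` and this weighting are illustrative assumptions, not taken from the claims.

```python
def predict_subframe_pitches(pitch_current, pitch_future, n_subframes):
    """Interpolate a predicted pitch period for each subframe.

    Subframe 0 gets the current-frame pitch; later subframes move linearly
    toward the future-frame pitch (assumed weighting).
    """
    predictions = []
    for k in range(n_subframes):
        w = k / n_subframes  # weight of the future frame
        predictions.append((1 - w) * pitch_current + w * pitch_future)
    return predictions
```

For example, current and future pitch periods of 40 and 48 samples with four subframes give predictions of 40, 42, 44 and 46 samples. The actual subframe pitch period would then be obtained by a search near each prediction.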

16. A speech encoding method comprising the steps of:
- preparing an adaptive codebook storing a plurality of adaptive vectors generated by repeating a past excitation signal series at a period included in a predetermined range;
  
  dividing an input speech signal into frames each having a predetermined length, and further dividing a speech signal of each frame into subframes;
  
  obtaining a predicted pitch period of a subframe in a current frame by using pitch periods of at least two frames of the current frame to be encoded and past and future frames with respect to the current frame; and
  
  determining a search range for subframes in the current frame by using the predicted pitch period to select an adaptive vector with a period that minimizes an error between a target vector and a signal obtained by filtering an adaptive vector extracted from said adaptive codebook through a perceptually weighted synthesis filter.
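
The restricted adaptive codebook search of claim 16 can be sketched as follows. This is a simplified illustration: it omits the perceptually weighted synthesis filter and compares the gain-scaled adaptive vector with the target directly. The names `adaptive_vector` and `search_lag`, the search half-width `delta`, and the lag limits `min_lag` and `max_lag` are hypothetical.

```python
def adaptive_vector(past_excitation, lag, length):
    """Repeat the last `lag` samples of the past excitation to `length`."""
    segment = past_excitation[-lag:]
    return [segment[i % lag] for i in range(length)]


def search_lag(target, past_excitation, predicted_lag,
               delta=2, min_lag=20, max_lag=147):
    """Search lags within +/- delta of the predicted pitch period.

    Returns the lag whose optimally scaled adaptive vector has the smallest
    squared error against the target, or None if no candidate is usable.
    """
    lo = max(min_lag, predicted_lag - delta)
    hi = min(max_lag, predicted_lag + delta, len(past_excitation))
    best_lag, best_err = None, float("inf")
    for lag in range(lo, hi + 1):
        v = adaptive_vector(past_excitation, lag, len(target))
        energy = sum(x * x for x in v)
        if energy == 0:
            continue  # an all-zero vector cannot match anything
        gain = sum(t * x for t, x in zip(target, v)) / energy
        err = sum((t - gain * x) ** 2 for t, x in zip(target, v))
        if err < best_err:
            best_lag, best_err = lag, err
    return best_lag
```

Limiting the search to a window around the predicted pitch period reduces the number of candidate lags that must be evaluated, compared with scanning the whole predetermined range.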

20. A speech encoding apparatus comprising:
- means for dividing an input speech signal into frames each having a predetermined length;
  
  means for obtaining a pitch period of a future frame with respect to a current frame to be encoded; and
  
  means for encoding the pitch period obtained by said means for obtaining the pitch period.

21. A speech encoding apparatus comprising:
- a divider section for dividing an input speech signal into frames each having a predetermined length, and further dividing a speech signal of each frame into subframes;
  
  a predicted subframe pitch period calculation section for obtaining a predicted pitch period of a subframe in a current frame by using pitch periods of at least two frames of the current frame to be encoded and past and future frames with respect to the current frame; and
  
  a subframe pitch period calculation section for obtaining a pitch period of a subframe in the current frame by using the predicted pitch period.

22. A speech encoding apparatus comprising:
- an adaptive codebook storing a plurality of adaptive vectors generated by repeating a past excitation signal series at a period included in a predetermined range;
  
  a divider section for dividing an input speech signal into frames each having a predetermined length, and further dividing a speech signal of each frame into subframes;
  
  a predicted subframe pitch period calculation section for obtaining a predicted pitch period of a subframe in a current frame by using pitch periods of at least two frames of the current frame to be encoded and past and future frames with respect to the current frame; and
  
  a search range determination section for determining a search range for subframes in the current frame by using the predicted pitch period to select an adaptive vector with a period that minimizes an error between a target vector and a signal obtained by filtering an adaptive vector extracted from said adaptive codebook through a perceptually weighted synthesis filter.

23. A recording medium on which a program is recorded, said program being used to execute processing of dividing an input speech signal into frames each having a predetermined length, and obtaining a pitch period of a future frame with respect to a current frame to be encoded, and processing of encoding the pitch period.

24. A recording medium on which a program is recorded, said program being used to execute processing of dividing an input speech signal into frames each having a predetermined length, further dividing a speech signal of each frame into subframes, and obtaining a predicted pitch period of a subframe in a current frame by using pitch periods of at least two frames of the current frame to be encoded and past and future frames with respect to the current frame, and processing of obtaining a pitch period of a subframe in the current frame by using the predicted pitch period.

25. A computer-readable recording medium on which a program for performing speech encoding processing is recorded, the program being used to execute processing of dividing an input speech signal into frames each having a predetermined length, further dividing a speech signal of each frame into subframes, and obtaining a predicted pitch period of a subframe in a current frame by using pitch periods of at least two frames of the current frame to be encoded and past and future frames with respect to the current frame, and processing of determining a search range for subframes in the current frame by using the predicted pitch period to select an adaptive vector with a period that minimizes an error between a target vector and a signal obtained by filtering an adaptive vector extracted from an adaptive codebook through a perceptually weighted synthesis filter, said adaptive codebook storing a plurality of adaptive vectors generated by repeating a past excitation signal series at a period included in a predetermined range.

Current Assignee
Kimio Miseki, Masahiro Oshikiri, Masami Akamine
Original Assignee
Kimio Miseki, Masahiro Oshikiri, Masami Akamine
Inventors
Masahiro Oshikiri, Kimio Miseki, Masami Akamine

Granted Patent

US 7,191,120 B2
US Class Current

704/233
CPC Class Codes

G10L 19/09   Long term prediction, i.e. ...

G10L 25/78   Detection of presence or ab...

G10L 25/93   Discriminating between voic...
