Speech encoder using voice activity detection in coding noise

US 6,823,303 B1
Filed: 09/18/1998
Issued: 11/23/2004
Est. Priority Date: 08/24/1998
Status: Expired due to Term

Abstract

A multi-rate speech codec supports a plurality of encoding bit rate modes by adaptively selecting encoding bit rate modes to match communication channel restrictions. In higher bit rate encoding modes, an accurate representation of speech is generated through CELP (code excited linear prediction) and other associated modeling parameters for higher quality decoding and reproduction. For each bit rate mode selected, pluralities of fixed or innovation subcodebooks are selected for use in generating innovation vectors. The speech coder distinguishes various voice signals as a function of their voice content. For example, a Voice Activity Detection (VAD) algorithm selects an appropriate coding scheme depending on whether the speech signal comprises active or inactive speech. The encoder may consider varying characteristics of the speech signal, including sharpness, a delay correlation, a zero-crossing rate, and a residual energy. In another embodiment of the present invention, code excited linear prediction is used for voice active signals, whereas random excitation is used for voice inactive signals; the energy level and spectral content of the voice inactive signal may also be used for noise coding.
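
The classification step described in the abstract can be illustrated with a minimal sketch. Of the characteristics listed there, only the zero-crossing rate is computed directly; plain frame energy stands in for residual energy, and sharpness and delay correlation are omitted. The function names, the thresholds, and the scheme labels are hypothetical and are not taken from the patent.

```python
import math


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(
        1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0)
    )
    return crossings / (len(frame) - 1)


def frame_energy(frame):
    """Mean squared amplitude of the frame (a stand-in for residual energy)."""
    return sum(x * x for x in frame) / len(frame)


def classify_frame(frame, energy_threshold=1e-3, zcr_threshold=0.35):
    """Label a frame 'active' or 'inactive'.

    Both thresholds are illustrative assumptions: a frame counts as active
    voice only if it is loud enough and does not cross zero too often.
    """
    energy = frame_energy(frame)
    zcr = zero_crossing_rate(frame)
    if energy >= energy_threshold and zcr <= zcr_threshold:
        return "active"
    return "inactive"


def select_scheme(frame):
    """Pick a coding scheme label from the frame classification."""
    if classify_frame(frame) == "active":
        return "celp"
    return "random_excitation"
```

A 200 Hz tone sampled at 8 kHz would be routed to the "celp" label under these assumed thresholds, while an all-zero frame would be routed to "random_excitation".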

20 Claims

1. A speech encoding system using an analysis by synthesis approach on a speech signal having varying characteristics, the speech encoding system comprising:
- an encoder processing circuit that selectively applies a first or a second encoding scheme upon identification of varying characteristics of the speech signal;
  
  where the varying characteristics are utilized to classify the speech signal as having one of active voice content and inactive voice content;
  
  the first encoding scheme utilizes a first analysis-by-synthesis speech coding approach on a speech signal classified as active voice content; and
  
  the second encoding scheme utilizes a second analysis-by-synthesis speech coding approach on a speech signal classified as inactive voice content, the inactive voice content comprising background noise.
- Dependent claims (2, 3, 4, 5, 11, 12, 13, 14)
  - 2. The speech encoding system of claim 1, wherein the varying characteristics of the speech signal comprises pitch characteristics.
  - 3. The speech encoding system of claim 1, wherein the varying characteristics of the speech signal comprises periodicity characteristics.
  - 4. The speech encoding system of claim 1, wherein the varying characteristics of the speech signal comprises intensity characteristics.
  - 5. The speech encoding system of claim 1, wherein the encoder processing circuit selectively applies one of the first and the second encoding scheme at one of a plurality of bit rates.
  - 11. The speech encoding system of claim 1, wherein the first encoding scheme selects operation in one of a long term predictor (LTP) mode and a pitch preprocessing (PP) mode.
  - 12. The speech encoding system of claim 1, wherein the second encoding scheme selects a random excitation sequence after considering an energy level and spectral information of the speech signal.
  - 13. The speech encoding system of claim 1, wherein a speech signal classified as inactive voice comprises silence.
  - 14. The speech encoding system of claim 1, wherein a speech signal classified as inactive voice comprises background noise.
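
Claim 12 describes selecting a random excitation sequence after considering an energy level and spectral information. The sketch below shows one way the synthesis side of that idea could look: a seeded Gaussian sequence is shaped by an all-pole filter and then scaled to a target energy. It does not perform a closed-loop analysis-by-synthesis search. The function names, the coefficient convention A(z) = 1 + a_1 z^-1 + ... + a_p z^-p, and the use of a Gaussian source are assumptions made for illustration.

```python
import math
import random


def frame_energy(frame):
    """Mean squared amplitude of the frame."""
    return sum(x * x for x in frame) / len(frame)


def synthesize_noise(lpc, target_energy, length, seed=0):
    """Generate a noise frame with a given spectral shape and energy.

    lpc holds a_1..a_p of A(z) = 1 + sum(a_k * z^-k) (assumed convention);
    the excitation is filtered through 1/A(z) and rescaled so that the
    output's mean squared amplitude equals target_energy.
    """
    rng = random.Random(seed)
    excitation = [rng.gauss(0.0, 1.0) for _ in range(length)]

    shaped = []
    for n in range(length):
        acc = excitation[n]
        # All-pole recursion: y[n] = e[n] - sum(a_k * y[n - k]).
        for k, a in enumerate(lpc, start=1):
            if n - k >= 0:
                acc -= a * shaped[n - k]
        shaped.append(acc)

    energy = frame_energy(shaped)
    gain = math.sqrt(target_energy / energy) if energy > 0 else 0.0
    return [gain * y for y in shaped]
```

Because the generator is seeded, an encoder and decoder using the same seed would produce the same sequence, so only the energy and spectral parameters would need to be conveyed in this simplified picture.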

6. A speech encoding system for processing a speech signal having varying characteristics, the speech encoding system comprising:
- an encoder processing circuit that selectively applies a first or a second analysis-by-synthesis encoding scheme based upon at least one of the varying characteristics of the speech signal;
  
  the encoder processing circuit applies the first analysis-by-synthesis encoding scheme following identification of an active voice frame of the speech signal; and
  
  the encoder processing circuit applies the second analysis-by-synthesis encoding scheme following identification of an inactive voice frame of the speech signal, the inactive voice frame comprising background noise.
- Dependent claims (7, 8, 9, 10, 15)
  - 7. The speech encoding system of claim 6, wherein the second encoding scheme selects a random excitation sequence to encode the speech signal.
  - 8. The speech encoding system of claim 6, wherein the encoder processing circuit selectively applies one of the first and the second encoding scheme at one of a plurality of bit rates.
  - 9. The speech encoding system of claim 6, wherein the second encoding scheme identifies an energy level.
  - 10. The speech encoding system of claim 6, wherein the second encoding scheme identifies a spectral information.
  - 15. The speech encoding system of claim 6, wherein the first encoding scheme selects operation in one of a long term predictor (LTP) mode and a pitch preprocessing (PP) mode.
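
Claims 9 and 10 refer to identifying an energy level and spectral information for an inactive voice frame. A common way to obtain spectral information in linear-prediction coders is the autocorrelation method with the Levinson-Durbin recursion; the sketch below uses it as an assumed stand-in, since the claims do not name a specific technique. The function names and the coefficient convention A(z) = 1 + a_1 z^-1 + ... + a_p z^-p are illustrative.

```python
def autocorrelation(frame, order):
    """Autocorrelation values r[0]..r[order] of the frame."""
    return [
        sum(frame[n] * frame[n - k] for n in range(k, len(frame)))
        for k in range(order + 1)
    ]


def levinson_durbin(r, order):
    """Solve for LPC coefficients [1, a_1, ..., a_p] and the final
    prediction error from autocorrelation values r[0]..r[order]."""
    a = [1.0] + [0.0] * order
    err = float(r[0])
    for i in range(1, order + 1):
        if err <= 0.0:
            break  # silent or degenerate frame: keep remaining taps at zero
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err  # reflection coefficient for this order
        updated = a[:]
        for j in range(1, i):
            updated[j] = a[j] + k * a[i - j]
        updated[i] = k
        a = updated
        err *= 1.0 - k * k
    return a, err


def describe_noise_frame(frame, order=10):
    """Return (energy level, LPC coefficients) for a frame."""
    energy = sum(x * x for x in frame) / len(frame)
    lpc, _ = levinson_durbin(autocorrelation(frame, order), order)
    return energy, lpc
```

For autocorrelation values of 1, 0.5, and 0.25, which match a first-order autoregressive process, the recursion yields a first tap of -0.5 and a second tap of zero.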

16. A method of encoding a speech signal comprising:
- classifying the speech signal as having one of active voice content and inactive voice content, the inactive voice content comprising background noise;
  
  applying a first encoding scheme comprising analysis-by-synthesis when the speech signal is classified as having active voice content; and
  
  applying a second encoding scheme comprising analysis-by-synthesis when the speech signal is classified as having inactive voice content.
- Dependent claims (17, 18, 19, 20)
  - 17. The method of claim 16, further comprising identifying an energy level and spectral information of the speech signal when the second encoding scheme is applied.
  - 18. The method of claim 17, further comprising performing encoding with a selected random excitation sequence after identifying the energy level and the spectral information.
  - 19. The method of claim 16, further comprising applying one of the first encoding scheme and the second encoding scheme at one of a plurality of bit rates.
  - 20. The method of claim 16, further comprising encoding a first frame of the speech signal with the first encoding scheme at a bit rate and encoding a second frame of the speech signal with the second encoding scheme at the same bit rate.
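
The per-frame flow of claims 16 and 20 can be sketched as a simple dispatch loop: each frame is classified, one of two schemes is recorded for it, and both schemes run at the same bit rate. The energy-only classifier, the threshold, the `EncodedFrame` record, and the example bit rate are all hypothetical placeholders; no actual encoding is performed.

```python
from dataclasses import dataclass


@dataclass
class EncodedFrame:
    scheme: str    # "first" (active voice) or "second" (inactive voice)
    bit_rate: int  # bits per second used for this frame


def is_active(frame, energy_threshold=1e-3):
    """Placeholder classifier: mean squared amplitude against a threshold."""
    return sum(x * x for x in frame) / len(frame) >= energy_threshold


def encode_stream(frames, bit_rate):
    """Tag each frame with the scheme chosen for it.

    The same bit_rate is attached to every frame, mirroring the case in
    claim 20 where both schemes operate at the same bit rate.
    """
    encoded = []
    for frame in frames:
        scheme = "first" if is_active(frame) else "second"
        encoded.append(EncodedFrame(scheme=scheme, bit_rate=bit_rate))
    return encoded
```

For example, `encode_stream([[0.5] * 160, [0.0] * 160], bit_rate=8000)` would tag the first frame with the first scheme and the second frame with the second scheme, both at the assumed 8000 bits per second.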

Current Assignee: MACOM Technology Solutions Holdings, Inc.
Original Assignee: Conexant Systems Incorporated (Synaptics Incorporated)
Inventors: Su, Huan-Yu; Thyssen, Jes; Benyassine, Adil
Primary Examiner(s): Tsang, Fan
Assistant Examiner(s): Opsasnick, Michael N.
Application Number: US09/156,832
Time in Patent Office: 2,258 Days
Field of Search: 704/220, 704/213, 704/214, 704/201
US Class Current: 704/220
CPC Class Codes:
G10L 19/002   Dynamic bit allocation for ...
G10L 19/005   Correction of errors induce...
G10L 19/012   Comfort noise or silence co...
G10L 19/08   Determination or coding of ...
G10L 19/083   the excitation function bei...
G10L 19/09   Long term prediction, i.e. ...
G10L 19/10   the excitation function bei...
G10L 19/12   the excitation function bei...
G10L 19/125   Pitch excitation, e.g. pitc...
G10L 19/18   Vocoders using multiple modes
G10L 19/265   Pre-filtering, e.g. high fr...
G10L 2019/0005   Multi-stage vector quantisa...
G10L 2019/0007   Codebook element generation
G10L 2019/0011   Long term prediction filter...
G10L 21/0364   for improving intelligibility
