Transient noise rejection for speech recognition

US 8,560,313 B2
Filed: 05/13/2010
Issued: 10/15/2013
Est. Priority Date: 05/13/2010
Status: Active Grant

Abstract

A method of and system for transient noise rejection for improved speech recognition. The method comprises the steps of (a) receiving audio including user speech and at least some transient noise associated with the speech, (b) converting the received audio into digital data, (c) segmenting the digital data into acoustic frames, and (d) extracting acoustic feature vectors from the acoustic frames. The method also comprises the steps of (e) evaluating the acoustic frames for transient noise on a frame-by-frame basis, (f) rejecting those acoustic frames having transient noise, (g) accepting as speech frames those acoustic frames having no transient noise and, thereafter, (h) recognizing the user speech using the speech frames.
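
Steps (c) and (e) through (g) of the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the names `segment_into_frames` and `select_speech_frames`, the `frame_len` and `hop` parameters, and the pluggable `has_transient_noise` detector are all hypothetical and do not appear in the patent text.

```python
from typing import Callable, List, Sequence

Frame = List[float]


def segment_into_frames(samples: Sequence[float], frame_len: int, hop: int) -> List[Frame]:
    """Split digitized audio into fixed-length frames (step c).

    frame_len and hop are assumed parameters; a hop smaller than
    frame_len yields overlapping frames.
    """
    if frame_len <= 0 or hop <= 0:
        raise ValueError("frame_len and hop must be positive")
    last_start = len(samples) - frame_len
    return [list(samples[i:i + frame_len]) for i in range(0, last_start + 1, hop)]


def select_speech_frames(
    frames: Sequence[Frame],
    has_transient_noise: Callable[[Frame], bool],
) -> List[Frame]:
    """Evaluate each frame (step e), reject flagged frames (step f),
    and keep the remainder as speech frames (step g)."""
    return [frame for frame in frames if not has_transient_noise(frame)]
```

With ten samples, a frame length of 4, and a hop of 2, `segment_into_frames` yields four overlapping frames; the detector passed to `select_speech_frames` then decides frame by frame which of them reach the recognizer.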

17 Claims

1. A method of speech recognition, comprising the steps of:
- (a) receiving audio including user speech and at least some transient noise associated with the speech;
  
  (b) converting the received audio into digital data;
  
  (c) segmenting the digital data into acoustic frames;
  
  (d) extracting acoustic feature vectors from the acoustic frames;
  
  (e) evaluating the acoustic frames for transient noise on a frame-by-frame basis;
  
  (f) rejecting those acoustic frames having transient noise, wherein steps (e) and (f) include assessing at least two time spaced samples within an acoustic frame to determine autocorrelation of the samples within the frame, and rejecting the acoustic frame if the autocorrelation is determined to be insufficient;
  
  (g) accepting as speech frames those acoustic frames having no transient noise; and
  
  thereafter (h) recognizing the user speech using the speech frames.
  - 2. The method of claim 1, wherein the steps (e) and (g) include analyzing an acoustic frame to determine whether an acoustic frame includes a voiced or an unvoiced signal, and accepting the acoustic frame if the acoustic frame is determined to include a voiced signal.
  - 3. The method of claim 1, wherein the steps (e) and (f) include comparing an acoustic frame to a preceding acoustic frame accepted as a speech frame to determine cross-correlation between the acoustic frames, and rejecting the acoustic frame if the cross-correlation is determined to be insufficient.
  - 4. The method of claim 1, wherein the steps (e) and (f) include applying a codebook trained on speech samples to an acoustic frame to determine a location of the acoustic frame in multidimensional feature space, and rejecting the acoustic frame if it is determined that the acoustic frame is not assigned to a speech cluster.
  - 5. The method of claim 1, wherein the steps (e), (f), and (g) include:
    - analyzing an acoustic frame to determine whether an acoustic frame includes a voiced or an unvoiced signal;
      
      accepting the acoustic frame if the acoustic frame is determined to include a voiced signal;
      
      otherwise, assessing at least two time spaced samples within the acoustic frame to determine autocorrelation of the samples within the frame; and
      
      rejecting the acoustic frame if the autocorrelation is determined to be insufficient.
  - 6. The method of claim 1, wherein the steps (e), (f), and (g) include:
    - analyzing an acoustic frame to determine whether an acoustic frame includes a voiced or an unvoiced signal;
      
      accepting the acoustic frame if the acoustic frame is determined to include a voiced signal;
      
      otherwise, assessing at least two time spaced samples within the acoustic frame to determine autocorrelation of the samples within the frame;
      
      rejecting the acoustic frame if the autocorrelation is determined to be insufficient;
      
      otherwise, comparing the acoustic frame to a preceding acoustic frame accepted as a speech frame to determine cross-correlation between the acoustic frames; and
      
      rejecting the acoustic frame if the cross-correlation is determined to be insufficient.
  - 7. The method of claim 1, wherein the steps (e), (f), and (g) include:
    - analyzing an acoustic frame to determine whether an acoustic frame includes a voiced or an unvoiced signal;
      
      accepting the acoustic frame if the acoustic frame is determined to include a voiced signal;
      
      otherwise, assessing at least two time spaced samples within the acoustic frame to determine autocorrelation of the samples within the frame;
      
      rejecting the acoustic frame if the autocorrelation is determined to be insufficient, and accepting the acoustic frame if the autocorrelation is determined to be sufficient;
      
      otherwise, comparing the acoustic frame to a preceding acoustic frame accepted as a speech frame to determine cross-correlation between the acoustic frames;
      
      rejecting the acoustic frame if the cross-correlation is determined to be insufficient, and accepting the acoustic frame if the cross-correlation is determined to be sufficient;
      
      otherwise, applying a codebook trained on speech samples to the acoustic frame to determine a location of the acoustic frame in multidimensional feature space; and
      
      rejecting the acoustic frame if it is determined that the acoustic frame is not assigned to a speech cluster.
  - 8. The method of claim 1, wherein the steps (e), (f), and (g) include:
    - analyzing an acoustic frame to determine whether an acoustic frame includes a voiced or an unvoiced signal;
      
      accepting the acoustic frame if the acoustic frame is determined to include a voiced signal;
      
      otherwise, comparing the acoustic frame to a preceding acoustic frame accepted as a speech frame to determine cross-correlation between the acoustic frames; and
      
      rejecting the acoustic frame if the cross-correlation is determined to be insufficient.
  - 9. The method of claim 1, wherein the steps (e), (f), and (g) include:
    - analyzing an acoustic frame to determine whether an acoustic frame includes a voiced or an unvoiced signal;
      
      accepting the acoustic frame if the acoustic frame is determined to include a voiced signal;
      
      otherwise, applying a codebook trained on speech samples to the acoustic frame to determine a location of the acoustic frame in multidimensional feature space; and
      
      rejecting the acoustic frame if it is determined that the acoustic frame is not assigned to a speech cluster.
  - 10. The method of claim 1, wherein the steps (e) and (f) include:
    - assessing at least two time spaced samples within an acoustic frame to determine autocorrelation of the samples within the frame;
      
      rejecting the acoustic frame if the autocorrelation is determined to be insufficient;
      
      otherwise, comparing the acoustic frame to a preceding acoustic frame accepted as a speech frame to determine cross-correlation between the acoustic frames; and
      
      rejecting the acoustic frame if the cross-correlation is determined to be insufficient.
  - 11. The method of claim 1, wherein the steps (e) and (f) include:
    - assessing at least two time spaced samples within an acoustic frame to determine autocorrelation of the samples within the frame;
      
      rejecting the acoustic frame if the autocorrelation is determined to be insufficient;
      
      otherwise, applying a codebook trained on speech samples to the acoustic frame to determine a location of the acoustic frame in multidimensional feature space; and
      
      rejecting the acoustic frame if it is determined that the acoustic frame is not assigned to a speech cluster.
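
The correlation tests recited in claims 1, 3, and 10 can be illustrated with a short sketch. Everything specific here is an assumption made for the example: the claims do not state how the correlations are normalized, which lag separates the "time spaced samples", or what threshold counts as "insufficient". The function names, the `lag` parameter, and the 0.5 thresholds are hypothetical, and the cross-correlation is computed at zero lag only, which is a simplification.

```python
import math
from typing import Optional, Sequence


def normalized_autocorrelation(frame: Sequence[float], lag: int) -> float:
    """Correlation between samples `lag` apart within one frame,
    normalized by the frame energy. Returns 0.0 for silent frames
    or an out-of-range lag."""
    energy = sum(x * x for x in frame)
    if energy == 0.0 or not 0 < lag < len(frame):
        return 0.0
    return sum(frame[n] * frame[n + lag] for n in range(len(frame) - lag)) / energy


def normalized_cross_correlation(frame: Sequence[float], previous: Sequence[float]) -> float:
    """Zero-lag correlation between a frame and the last accepted speech frame."""
    n = min(len(frame), len(previous))
    num = sum(frame[i] * previous[i] for i in range(n))
    den = math.sqrt(sum(x * x for x in frame[:n]) * sum(y * y for y in previous[:n]))
    return num / den if den > 0.0 else 0.0


def reject_by_correlation(
    frame: Sequence[float],
    previous_speech_frame: Optional[Sequence[float]],
    lag: int,
    auto_min: float = 0.5,   # assumed threshold, not from the patent
    cross_min: float = 0.5,  # assumed threshold, not from the patent
) -> bool:
    """Return True if the frame should be rejected (claim 10 ordering):
    autocorrelation first, then cross-correlation with the prior speech frame."""
    if normalized_autocorrelation(frame, lag) < auto_min:
        return True
    if previous_speech_frame is not None:
        if normalized_cross_correlation(frame, previous_speech_frame) < cross_min:
            return True
    return False
```

A periodic signal correlates strongly with itself one period later, whereas an isolated click has essentially no energy at any non-zero lag, so a single-sample click is rejected by the first test regardless of the threshold chosen.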

12. A computer program product including instructions on a computer readable medium and executable by a computer processor of a speech recognition system to cause the system to implement steps comprising:
- (a) receiving audio including user speech and at least some transient noise associated with the speech;
  
  (b) converting the received audio into digital data;
  
  (c) segmenting the digital data into acoustic frames;
  
  (d) extracting acoustic feature vectors from the acoustic frames;
  
  (e) evaluating the acoustic frames for transient noise on a frame-by-frame basis;
  
  (f) rejecting those acoustic frames having transient noise;
  
  (g) accepting as speech frames those acoustic frames having no transient noise, wherein the steps (e), (f), and (g) include:
  
  analyzing an acoustic frame to determine whether an acoustic frame includes a voiced or an unvoiced signal;
  
  accepting the acoustic frame if the acoustic frame is determined to include a voiced signal;
  
  otherwise, assessing at least two time spaced samples within the acoustic frame to determine autocorrelation of the samples within the frame;
  
  rejecting the acoustic frame if the autocorrelation is determined to be insufficient;
  
  otherwise, comparing the acoustic frame to a preceding acoustic frame accepted as a speech frame to determine cross-correlation between the acoustic frames;
  
  rejecting the acoustic frame if the cross-correlation is determined to be insufficient;
  
  otherwise, applying a codebook trained on speech samples to the acoustic frame to determine a location of the acoustic frame in multidimensional feature space; and
  
  rejecting the acoustic frame if it is determined that the acoustic frame is not assigned to a speech cluster; and
  
  thereafter (h) recognizing the user speech using the speech frames.
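
The ordered cascade recited in claim 12 can be sketched as a chain of early exits. The four individual tests are passed in as callables because the claim does not specify how each is computed; the `Checks` container, `accept_frame`, and `speech_frames` are hypothetical names introduced for this example. Accepting a frame that survives all four tests is an assumption, since the claim only recites the rejection conditions.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence

Frame = Sequence[float]


@dataclass
class Checks:
    """Pluggable tests; each implementation is left to the caller."""
    is_voiced: Callable[[Frame], bool]
    autocorrelation_ok: Callable[[Frame], bool]
    cross_correlation_ok: Callable[[Frame, Frame], bool]
    in_speech_cluster: Callable[[Frame], bool]


def accept_frame(frame: Frame, last_speech: Optional[Frame], checks: Checks) -> bool:
    """Apply the claim 12 ordering to a single frame."""
    if checks.is_voiced(frame):
        return True                      # voiced frames are accepted outright
    if not checks.autocorrelation_ok(frame):
        return False                     # insufficient autocorrelation
    if last_speech is not None and not checks.cross_correlation_ok(frame, last_speech):
        return False                     # insufficient cross-correlation
    return checks.in_speech_cluster(frame)  # codebook test decides the rest


def speech_frames(frames: Sequence[Frame], checks: Checks) -> List[Frame]:
    """Filter a frame sequence, comparing each frame against the most
    recently accepted speech frame."""
    accepted: List[Frame] = []
    for frame in frames:
        last = accepted[-1] if accepted else None
        if accept_frame(frame, last, checks):
            accepted.append(frame)
    return accepted
```

Ordering the tests this way lets a frame exit at the earliest stage that resolves it, so later tests run only on frames the earlier ones could not settle.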

13. A speech recognition system, comprising:
- an acoustic interface to receive audio including user speech and at least some transient noise associated with the speech;
  
  a pre-processor to pre-process the received audio, including segmenting the digital data into acoustic frames, extracting acoustic feature vectors from the acoustic frames, evaluating the acoustic frames for transient noise on a frame-by-frame basis, rejecting those acoustic frames having transient noise, and accepting as speech frames those acoustic frames having no transient noise, wherein the pre-processor rejects those acoustic frames having transient noise by assessing at least two time spaced samples within an acoustic frame to determine autocorrelation of the samples within the frame, and rejecting the acoustic frame if the autocorrelation is determined to be insufficient; and
  
  a decoder to recognize the user speech using the speech frames.
  - 14. The system of claim 13, wherein the pre-processor rejects those acoustic frames having transient noise by analyzing an acoustic frame to determine whether an acoustic frame includes a voiced or an unvoiced signal, and accepting the acoustic frame if the acoustic frame is determined to include a voiced signal.
  - 15. The system of claim 13, wherein the pre-processor rejects those acoustic frames having transient noise by comparing an acoustic frame to a preceding acoustic frame accepted as a speech frame to determine cross-correlation between the acoustic frames, and rejecting the acoustic frame if the cross-correlation is determined to be insufficient.
  - 16. The system of claim 13, wherein the pre-processor rejects those acoustic frames having transient noise by applying a codebook trained on speech samples to an acoustic frame to determine a location of the acoustic frame in multidimensional feature space, and rejecting the acoustic frame if it is determined that the acoustic frame is not assigned to a speech cluster.
  - 17. The system of claim 13, wherein the pre-processor accepts or rejects those acoustic frames having transient noise by the following steps:
    - analyzing an acoustic frame to determine whether an acoustic frame includes a voiced or an unvoiced signal;
      
      accepting the acoustic frame if the acoustic frame is determined to include a voiced signal;
      
      otherwise, assessing at least two time spaced samples within the acoustic frame to determine autocorrelation of the samples within the frame;
      
      rejecting the acoustic frame if the autocorrelation is determined to be insufficient, and accepting the acoustic frame if the autocorrelation is determined to be sufficient;
      
      otherwise, comparing the acoustic frame to a preceding acoustic frame accepted as a speech frame to determine cross-correlation between the acoustic frames;
      
      rejecting the acoustic frame if the cross-correlation is determined to be insufficient, and accepting the acoustic frame if the cross-correlation is determined to be sufficient;
      
      otherwise, applying a codebook trained on speech samples to the acoustic frame to determine a location of the acoustic frame in multidimensional feature space; and
      
      rejecting the acoustic frame if it is determined that the acoustic frame is not assigned to a speech cluster.
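
The codebook test recited in claims 4 and 16 can be illustrated with a nearest-centroid lookup. This is one possible reading, offered as a sketch: the claims do not say how a frame is "assigned to a speech cluster", so the Euclidean distance measure, the `max_distance` gate, and the function name `nearest_speech_cluster` are all assumptions. The codebook is taken to be a list of centroids obtained beforehand from speech samples; training is outside the scope of the example.

```python
import math
from typing import Optional, Sequence


def nearest_speech_cluster(
    feature: Sequence[float],
    codebook: Sequence[Sequence[float]],
    max_distance: float,
) -> Optional[int]:
    """Locate a feature vector in feature space relative to the codebook.

    Returns the index of the closest centroid, or None when no centroid
    lies within max_distance (assumed here to mean the frame is not
    assigned to any speech cluster and should be rejected).
    """
    best_index: Optional[int] = None
    best_distance = math.inf
    for index, centroid in enumerate(codebook):
        distance = math.dist(feature, centroid)  # Euclidean distance
        if distance < best_distance:
            best_index, best_distance = index, distance
    return best_index if best_distance <= max_distance else None
```

A frame whose feature vector falls far from every centroid returns `None`, which a caller would treat as grounds for rejection under this reading.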

Current Assignee: General Motors LLC (General Motors Company)
Original Assignee: General Motors LLC (General Motors Company)
Inventors: Talwar, Gaurav; Chengalvarayan, Rathinavelu
Primary Examiner(s): Vo, Huyen X.
Application Number: US12/779,653
Publication Number: US 20110282663A1
Time in Patent Office: 1,251 Days
Field of Search: 704/233, 704/222, 704/226, 704/200, 704/205, 704/206-210, 704/214-218, 381/94.1
US Class Current: 704/233
CPC Class Codes:
G10L 15/20 Speech recognition techniqu...
G10L 21/0208 Noise filtering
