System for detecting speech with background voice estimates and noise estimates

US 8,311,819 B2
Filed: 03/26/2008
Issued: 11/13/2012
Est. Priority Date: 06/15/2005
Status: Active Grant


8 Assignments

0 Petitions

Abstract

A system detects a speech segment that may include unvoiced, fully voiced, or mixed voice content. The system includes a digital converter that converts a time-varying input signal into a digital-domain signal. A window function passes signals within a programmed aural frequency range while substantially blocking signals above and below the programmed aural frequency range when multiplied by an output of the digital converter. A frequency converter converts the signals passing within the programmed aural frequency range into a plurality of frequency bins. A background voice detector estimates the strength of a background speech segment relative to the noise of selected portions of the aural spectrum. A noise estimator estimates a maximum distribution of noise to an average of an acoustic noise power of some of the plurality of frequency bins. A voice detector compares the strength of a desired speech segment to a criterion based on an output of the background voice detector and an output of the noise estimator.
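
The front end in the abstract has three stages: a digital converter, a window function, and a frequency converter. The minimal Python sketch below illustrates them under stated assumptions. It assumes the input is already digitized. It applies a Hann window to one frame and computes a naive DFT; claim 2 names a Fast Fourier transform, and in practice numpy.fft.rfft would do this step. It approximates the "programmed aural frequency range" by discarding bins outside [f_lo, f_hi]. The function names, the Hann window, and the bin-discarding approach are assumptions for illustration, not the patented implementation. As claim 1 requires, each surviving bin carries an amplitude and a phase.

```python
import cmath
import math


def hann(n: int) -> list[float]:
    """Periodic Hann window of length n (assumed window shape)."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]


def band_limited_bins(frame: list[float], sample_rate: float,
                      f_lo: float, f_hi: float) -> list[tuple[float, float, float]]:
    """Window one frame, transform it, and keep only bins in [f_lo, f_hi].

    Hypothetical helper. Returns (frequency_hz, amplitude, phase) per bin.
    """
    n = len(frame)
    windowed = [s * w for s, w in zip(frame, hann(n))]
    bins = []
    for k in range(n // 2 + 1):  # non-negative frequencies of a real signal
        freq = k * sample_rate / n
        if not f_lo <= freq <= f_hi:
            continue  # bins outside the programmed band are discarded
        # Naive DFT of bin k; an FFT gives the same values much faster.
        x_k = sum(windowed[t] * cmath.exp(-2j * math.pi * k * t / n)
                  for t in range(n))
        bins.append((freq, abs(x_k), cmath.phase(x_k)))
    return bins


# Usage: a 1 kHz tone at 8 kHz, 256-sample frame, 250 Hz to 3.5 kHz band.
tone = [math.sin(2 * math.pi * 1000 * t / 8000) for t in range(256)]
bins = band_limited_bins(tone, 8000, 250.0, 3500.0)
peak_freq = max(bins, key=lambda b: b[1])[0]
```

With 256-sample frames at 8 kHz, the bin spacing is 31.25 Hz. The 1 kHz tone therefore falls exactly on a bin, and that bin is the strongest one returned.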

111 Citations

17 Claims

1. A process that improves speech detection by processing a limited frequency band comprising:
- encoding a limited frequency band of an input into a signal by varying an amplitude of a pulse width modulated signal that is limited to a plurality of predefined values;
  
  separating the signal into frequency bins in which each frequency bin identifies an amplitude and a phase;
  
  estimating a signal strength of a background voice segment in time;
  
  estimating a distribution of noise to an average acoustic power of one or a plurality of frequency bins;
  
  comparing a signal-to-noise ratio of each frequency bin to a maximum of the estimated signal strength of the background voice segment and the estimated distribution of noise to the average acoustic power; and
  
  identifying a speech segment from noise that surrounds the speech segment based on the comparison.
- Dependent Claims (2, 3, 4, 5, 6, 7, 8)
  - 2. The process that improves speech detection of claim 1, where a Fast Fourier transform separates the signal into frequency bins.
  - 3. The process that improves speech detection of claim 1, where the act of estimating of the signal strength of the background voice segment comprises an estimate of a time smoothed signal.
  - 4. The process that improves speech detection of claim 3, where the act of estimating of the signal strength of the background voice segment comprises measuring a signal-to-noise ratio of the time smoothed signal.
  - 5. The process that improves speech detection of claim 4, further comprising modifying the estimation of the signal strength of the background voice segment through a multiplication with a scalar quantity.
  - 6. The process that improves speech detection of claim 4, further comprising modifying the estimation of the signal strength of the background voice segment through a subtraction of an offset.
  - 7. The process that improves speech detection of claim 1, further comprising modifying the estimation of the distribution of noise to the average acoustic power through a multiplication with a scalar quantity.
  - 8. The process that improves speech detection of claim 1, further comprising modifying the estimation of the distribution of noise to the average acoustic power through an addition of an offset.
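
Claim 1's decision rule compares each bin's SNR with the larger of two estimates. Claims 5 to 8 then adjust those estimates by a scalar or an offset. The minimal sketch below illustrates this. It assumes all quantities are already in dB and that both estimates are single values shared by every bin, although the claim also allows per-bin estimates. The frame-level rule in is_speech_frame and its min_fraction parameter are hypothetical; the claim says only that the speech segment is identified "based on the comparison".

```python
def flag_bins(snr_db: list[float], bg_voice_db: float, noise_dist_db: float,
              bg_scale: float = 1.0, bg_offset: float = 0.0,
              noise_scale: float = 1.0, noise_offset: float = 0.0) -> list[bool]:
    """Hypothetical helper: True for each bin whose SNR exceeds the threshold.

    The threshold is the larger of the (optionally adjusted) background-voice
    estimate and noise-distribution estimate, all in dB.
    """
    bg = bg_voice_db * bg_scale - bg_offset             # claims 5 and 6
    noise = noise_dist_db * noise_scale + noise_offset  # claims 7 and 8
    threshold = max(bg, noise)
    return [s > threshold for s in snr_db]


def is_speech_frame(flags: list[bool], min_fraction: float = 0.5) -> bool:
    """Hypothetical frame-level rule: speech if enough bins are flagged."""
    return bool(flags) and sum(flags) / len(flags) >= min_fraction
```

Taking the maximum means the stricter estimate sets the threshold. To be flagged, a bin must rise above both the background-voice level and the noise estimate.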

9. A process that improves speech processing by processing a limited frequency band comprising:
- converting a limited frequency band of a continuously varying input into a digital-domain signal;
  
  converting the digital-domain signal into a frequency-domain signal;
  
  estimating a signal strength of a smoothed background voice segment in time of the digital-domain signal relative to noise;
  
  estimating a noise-variance of a segment of the digital-domain signal;
  
  comparing an instant signal-to-noise ratio of the digital-domain signal to the estimated signal strength of the smoothed background voice segment in time of the digital domain signal relative to noise and the estimated noise-variance; and
  
  identifying a speech segment when the instant signal-to-noise ratio of the digital-domain signal exceeds a maximum of the estimated signal strength of the smoothed background voice segment relative to noise and the estimated noise variance.
- Dependent Claims (10, 11, 12, 13, 14, 15)
  - 10. The process that improves speech processing of claim 9, further comprising modifying the estimation of the signal strength of the smoothed background voice segment through a multiplication with a scalar quantity.
  - 11. The process that improves speech processing of claim 10, where the scalar quantity is less than one.
  - 12. The process that improves speech processing of claim 9, further comprising modifying the estimation of the signal strength of the smoothed background voice segment through a subtraction of an offset.
  - 13. The process that improves speech processing of claim 9, further comprising modifying the estimation of the noise-variance through a multiplication with a scalar quantity.
  - 14. The process that improves speech processing of claim 13, where the scalar quantity is greater than about one.
  - 15. The process that improves speech processing of claim 9, further comprising modifying the estimation of the noise-variance through an addition of an offset.
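
Claim 9 tracks two quantities over time: a smoothed background-voice SNR and a noise variance. The minimal streaming sketch below makes these assumptions:

- The instant SNR arrives once per frame, in dB.
- Both trackers use exponential smoothing with a hypothetical factor alpha.
- The background-voice estimate updates only on frames declared speech, and the noise statistics update only on frames declared non-speech.
- The noise term is the standard deviation (square root of the estimated variance), so both thresholds share the dB scale.

The class name, default values, and update policy are illustrative, not taken from the patent. The defaults bg_scale below one and noise_scale above one simply follow claims 11 and 14.

```python
import math
from dataclasses import dataclass


@dataclass
class SmoothedVad:
    """Hypothetical streaming detector following the structure of claim 9."""
    bg_scale: float = 0.5      # claims 10-11: scalar < 1 (value assumed)
    bg_offset: float = 0.0     # claim 12: offset subtracted (value assumed)
    noise_scale: float = 3.0   # claims 13-14: scalar > about 1 (value assumed)
    noise_offset: float = 0.0  # claim 15: offset added (value assumed)
    alpha: float = 0.9         # smoothing factor (assumed)
    bg_snr: float = 0.0        # smoothed background-voice SNR, dB
    noise_mean: float = 0.0    # running mean of non-speech SNR, dB
    noise_var: float = 1.0     # running variance of non-speech SNR, dB^2

    def update(self, snr_db: float) -> bool:
        """Classify one frame from its instant SNR, then update the trackers."""
        bg_thr = self.bg_scale * self.bg_snr - self.bg_offset
        noise_thr = self.noise_scale * math.sqrt(self.noise_var) + self.noise_offset
        is_speech = snr_db > max(bg_thr, noise_thr)
        w = 1.0 - self.alpha  # weight given to the newest frame
        if is_speech:
            self.bg_snr += w * (snr_db - self.bg_snr)
        else:
            # Exponentially weighted mean and variance of non-speech frames.
            diff = snr_db - self.noise_mean
            self.noise_mean += w * diff
            self.noise_var = (1.0 - w) * (self.noise_var + w * diff * diff)
        return is_speech


vad = SmoothedVad()
decisions = [vad.update(x) for x in [0.5, -0.5, 0.2, 20.0, 18.0, 0.1]]
```

With the defaults, the SNR sequence 0.5, -0.5, 0.2, 20, 18, 0.1 dB yields False, False, False, True, True, False.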

16. A system that detects a speech segment that includes an unvoiced, a fully voiced, or a mixed voice content comprising:
- a digital converter that converts a time-varying input signal into a digital-domain signal;
  
  a window function configured to pass signals within a programmed aural frequency range while substantially blocking signals above and below the programmed aural frequency range when multiplied by an output of the digital converter;
  
  a frequency converter that converts the signals passing within the programmed aural frequency range into a plurality of frequency bins;
  
  a background voice detector configured to estimate a strength of a background speech segment relative to noise of selected portions of an aural spectrum;
  
  a noise estimator configured to estimate a maximum distribution of noise to an average of an acoustic noise power of some of the plurality of frequency bins; and
  
  a voice detector configured to compare an instant signal-to-noise ratio of a desired speech segment to a maximum of an output of the background voice detector and an output of the noise estimator.
- Dependent Claims (17)
  - 17. The system of claim 16 further comprising an end-pointer that applies one or more static or dynamic rules to determine a beginning or an end of the desired speech segment processed by the voice detector.
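
Claim 17 adds an end-pointer that applies "static or dynamic rules" to the voice detector's output. The minimal sketch below uses two hypothetical static rules. First, a segment begins only after min_speech consecutive speech frames. Second, it ends only after more than hangover consecutive non-speech frames, which bridges short gaps between words. The function name and both parameters are assumptions, not rules stated in the patent.

```python
def end_point(flags: list[bool], min_speech: int = 3,
              hangover: int = 5) -> list[tuple[int, int]]:
    """Hypothetical end-pointer: turn per-frame decisions into segments.

    Returns inclusive (first_frame, last_frame) pairs.
    """
    segments = []
    start = None  # first frame of the open segment, if any
    run = 0       # consecutive speech frames while no segment is open
    silence = 0   # consecutive non-speech frames inside an open segment
    for i, is_speech in enumerate(flags):
        if start is None:
            run = run + 1 if is_speech else 0
            if run >= min_speech:  # rule 1: require a minimum speech run
                start = i - min_speech + 1
                silence = 0
        elif is_speech:
            silence = 0
        else:
            silence += 1
            if silence > hangover:  # rule 2: end after the hangover expires
                segments.append((start, i - silence))
                start, run = None, 0
    if start is not None:  # close a segment still open at end of input
        segments.append((start, len(flags) - 1 - silence))
    return segments
```

Consider the frame decisions False, True, False, True, True, True, True, False, False, False, False, True, True, True, with min_speech=3 and hangover=2. The isolated speech frame at index 1 is rejected, and the result is [(3, 6), (11, 13)].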

Current Assignee: BlackBerry Limited
Original Assignee: QNX Software Systems Limited (Canada) (BlackBerry Limited)
Inventors: Hetherington, Phillip A.; Fallat, Mark
Primary Examiner(s): Pullias, Jesse
Application Number: US12/079,376
Publication Number: US 20080228478A1
Time in Patent Office: 1,693 Days
Field of Search: 704/248, 704/253, 704/E17.005, 704/E15.005, 704/E11.003, 704/E15.006
US Class Current: 704/233
CPC Class Codes: G10L 25/87 Detection of discrete point...
