System for detecting speech with background voice estimates and noise estimates

US 8,457,961 B2
Filed: 08/03/2012
Issued: 06/04/2013
Est. Priority Date: 06/15/2005
Status: Active Grant


6 Assignments

0 Petitions

Abstract

A system detects a speech segment that may include unvoiced, fully voiced, or mixed voice content. The system includes a window function that passes signals within a programmed aural frequency range while substantially blocking signals above and below the programmed aural frequency range. A frequency converter converts the signals passing within the programmed aural frequency range into a plurality of frequency bins. A background voice detector estimates the strength of a background speech segment relative to the noise of selected portions of the aural spectrum. A noise estimator estimates a maximum distribution of noise to an average of an acoustic noise power of some of the plurality of frequency bins. A voice detector compares the strength of a desired speech segment to a maximum of an output of the background voice detector and an output of the noise estimator.
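The first two stages in the abstract (restricting the input to an aural band and converting it into frequency bins) can be sketched as follows. This is only an illustration under stated assumptions, not the patented implementation: the function name `band_limited_bin_powers`, the Hann taper, the naive DFT, and the 300 to 3400 Hz band used in the usage note are all choices made for the example and do not come from the patent text.

```python
import cmath
import math


def band_limited_bin_powers(frame, sample_rate, low_hz, high_hz):
    """Return {bin_index: power} for DFT bins inside [low_hz, high_hz].

    Hypothetical helper: the band limits stand in for the "programmed
    aural frequency range"; bins outside the band are simply dropped.
    """
    n = len(frame)
    # Hann taper to reduce spectral leakage (an assumption, not from the source).
    tapered = [
        x * (0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)))
        for i, x in enumerate(frame)
    ]
    powers = {}
    for k in range(n // 2 + 1):
        freq = k * sample_rate / n  # centre frequency of bin k in Hz
        if low_hz <= freq <= high_hz:
            # Naive DFT for bin k; a real system would use an FFT.
            acc = sum(
                x * cmath.exp(-2j * math.pi * k * i / n)
                for i, x in enumerate(tapered)
            )
            powers[k] = abs(acc) ** 2 / n
    return powers
```

For example, a 64-sample frame of a 1000 Hz tone sampled at 8000 Hz places its strongest in-band bin at index 8, since each bin is 125 Hz wide.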

20 Claims

1. A process that improves speech detection comprising:
- separating an input signal into frequency bins;
  
  estimating a signal strength of a background voice segment or a background signal-to-noise ratio;
  
  estimating a noise level of a background noise of one or more frequency bins;
  
  comparing an instant signal-to-noise ratio to one or more of a maximum of the estimated signal strength of the background voice segment, a maximum of the estimated noise level of the background noise and a background signal-to-noise ratio; and
  
  identifying a speech segment from noise that surrounds the speech segment based on the comparison.
- Dependent claims (2 to 8):
  - 2. The process that improves speech detection of claim 1, where identifying the speech segment further leads or lags a rising or falling edge of a voice decision window dynamically or by a fixed temporal amount or by a frequency-based amount.
  - 3. The process that improves speech detection of claim 1, where the act of estimating of the signal strength of the background voice segment comprises an estimate of a time smoothed signal.
  - 4. The process that improves speech detection of claim 3, where the act of estimating of the signal strength of the background voice segment comprises measuring a signal-to-noise ratio of the time smoothed signal.
  - 5. The process that improves speech detection of claim 4, further comprising modifying the estimation of the signal strength of the background voice segment through a multiplication with a scalar quantity.
  - 6. The process that improves speech detection of claim 4, further comprising modifying the estimation of the signal strength of the background voice segment through a subtraction of an offset.
  - 7. The process that improves speech detection of claim 1, further comprising modifying the estimation of the noise level of the background noise through a multiplication with a scalar quantity.
  - 8. The process that improves speech detection of claim 1, further comprising modifying the estimation of the noise level of the background noise through an addition of an offset.
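One possible reading of the comparison in claim 1, with the adjustments of claims 3 and 5 to 8, is sketched below. Everything here is an assumption made for illustration: the function names, the decibel units, the single-pole smoother, and the interpretation of "a maximum of" as the larger of the two adjusted estimates. The claims do not fix any of these choices.

```python
def smooth(previous, current, alpha=0.25):
    """Single-pole time smoothing (one way to get a "time smoothed signal")."""
    return (1.0 - alpha) * previous + alpha * current


def is_speech(instant_snr_db, background_voice_db, noise_level_db,
              voice_scale=1.0, voice_offset_db=0.0,
              noise_scale=1.0, noise_offset_db=0.0):
    """Flag a frame as speech when its instant SNR exceeds both estimates.

    Hypothetical parameters: the background voice estimate is scaled and
    has an offset subtracted (claims 5 and 6); the noise estimate is
    scaled and has an offset added (claims 7 and 8).
    """
    voice_threshold = background_voice_db * voice_scale - voice_offset_db
    noise_threshold = noise_level_db * noise_scale + noise_offset_db
    return instant_snr_db > max(voice_threshold, noise_threshold)
```

With a background voice estimate of 10 dB and a noise estimate of 6 dB, a 12 dB frame is accepted and an 8 dB frame is rejected; halving the voice estimate with `voice_scale=0.5` lowers the threshold to 6 dB, so the 8 dB frame is then accepted.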

9. A process that improves speech processing comprising:
- converting a limited frequency band of a continuously varying input signal into a frequency-domain signal;
  
  estimating a signal strength of a background voice segment of the input signal;
  
  estimating a noise-variance of a segment of the input signal;
  
  comparing an instant signal-to-noise ratio of the input signal to the estimated signal strength of the background voice segment of the input signal and to the estimated noise-variance; and
  
  identifying a speech segment when the instant signal-to-noise ratio of the frequency-domain signal exceeds a maximum of the estimated signal strength of the background voice segment relative to noise and the estimated noise-variance.
- Dependent claims (10 to 15):
  - 10. The process that improves speech processing of claim 9, further comprising modifying the estimation of the signal strength of the background voice segment through a multiplication with a scalar quantity.
  - 11. The process that improves speech processing of claim 10, where the scalar quantity is less than one.
  - 12. The process that improves speech processing of claim 9, further comprising modifying the estimation of the signal strength of the background voice segment through a subtraction of an offset.
  - 13. The process that improves speech processing of claim 9, further comprising modifying the estimation of the noise-variance through a multiplication with a scalar quantity.
  - 14. The process that improves speech processing of claim 13, where the scalar quantity is greater than about one.
  - 15. The process that improves speech processing of claim 9, further comprising modifying the estimation of the noise-variance through an addition of an offset.
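The decision rule of claim 9, with the scalars of claims 11 and 14, might be applied across a run of frames roughly as shown here. This is a sketch under loud assumptions: `detect_frames` is a hypothetical name, the noise-variance is taken as the population variance of instant SNR values over a noise-only segment, and the default scalars 0.8 and 1.5 are arbitrary. The claims also compare quantities without stating units, so the direct comparison of a variance with a level is itself an assumption.

```python
import statistics


def detect_frames(snr_db_frames, noise_segment_db, background_voice_db,
                  voice_scale=0.8, noise_scale=1.5):
    """Return one boolean per frame: True where speech is identified.

    voice_scale is below one (claim 11) and noise_scale is above about
    one (claim 14); both defaults are illustrative only.
    """
    # Variance of the instant SNR over a segment assumed to contain only noise.
    noise_variance = statistics.pvariance(noise_segment_db)
    threshold = max(background_voice_db * voice_scale,
                    noise_variance * noise_scale)
    return [snr > threshold for snr in snr_db_frames]
```

Scaling the background voice estimate down makes the detector more willing to accept the desired talker, while scaling the noise-variance up makes it less likely to trigger on noise fluctuations.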

16. A system that detects a speech segment that includes an unvoiced, a fully voiced, or a mixed voice content comprising:
- a window function configured to pass input signals within a programmed aural frequency range while substantially blocking signals above and below the programmed aural frequency range;
  
  a frequency converter that converts the input signals passing within the programmed aural frequency range into a plurality of frequency bins;
  
  a background voice detector configured to estimate a strength of a background speech segment relative to noise of selected portions of an aural spectrum;
  
  a noise estimator configured to estimate a maximum distribution of noise to an average of an acoustic noise power of some of the plurality of frequency bins; and
  
  a voice detector configured to compare an instant signal-to-noise ratio of a desired speech segment to a maximum of an output of the background voice detector and an output of the noise estimator.
- Dependent claims (17 to 20):
  - 17. The system of claim 16 further comprising an end-pointer that applies one or more static or dynamic rules to determine a beginning or an end of the desired speech segment processed by the voice detector.
  - 18. The system of claim 16, where the voice detector is further configured to lead or lag a rising or falling edge of a voice decision window dynamically or by a fixed temporal amount or by a frequency-based amount.
  - 19. The system of claim 16, where the voice detector is further configured with a selector that provides user customization of the comparison of the instant signal-to-noise ratio of the desired speech segment to the maximum of the output of the background voice detector and the output of the noise estimator.
  - 20. The system of claim 16, where the background voice detector is further configured to compute a time smoothed signal before estimating the strength of the background speech segment relative to noise of selected portions of the aural spectrum.
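Claims 17 and 18 describe an end-pointer and a voice decision window whose edges can lead or lag the raw decision. A minimal sketch of the fixed-temporal-amount case is given below; the function names, the frame-based units, and the default lead and lag values are assumptions for illustration, and the dynamic and frequency-based variants mentioned in the claims are not covered.

```python
def pad_decisions(decisions, lead_frames=1, lag_frames=2):
    """Widen each run of True so it starts earlier and ends later.

    lead_frames moves the rising edge earlier; lag_frames holds the
    falling edge later (both are hypothetical fixed amounts).
    """
    n = len(decisions)
    padded = [False] * n
    for i, active in enumerate(decisions):
        if active:
            start = max(0, i - lead_frames)
            stop = min(n, i + lag_frames + 1)
            for j in range(start, stop):
                padded[j] = True
    return padded


def segment_endpoints(decisions):
    """Return (first_frame, last_frame) pairs for each run of True."""
    segments = []
    start = None
    for i, active in enumerate(decisions):
        if active and start is None:
            start = i
        elif not active and start is not None:
            segments.append((start, i - 1))
            start = None
    if start is not None:
        segments.append((start, len(decisions) - 1))
    return segments
```

Leading the rising edge helps keep weak word onsets, and lagging the falling edge helps keep trailing unvoiced sounds, at the cost of admitting a little more surrounding noise into the segment.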

Current Assignee: Blackberry Limited
Original Assignee: QNX Software Systems Limited (Canada) (Blackberry Limited)
Inventors: Hetherington, Phillip Alan; Fallat, Mark Ryan
Primary Examiner(s): PULLIAS, JESSE SCOTT
Application Number: US13/566,603
Publication Number: US 20120303366A1
Time in Patent Office: 305 Days
Field of Search: 704/233, 704/248, 704/253, 704/E17.005, 704/15.005, 704/11.003, 704/15.006
US Class Current: 704/233
CPC Class Codes: G10L 25/87 Detection of discrete point...
