Targeted speech

US 20080228478A1
Filed: 03/26/2008
Published: 09/18/2008
Est. Priority Date: 06/15/2005
Status: Active Grant

Abstract

A system detects a speech segment that may include unvoiced, fully voiced, or mixed voice content. The system includes a digital converter that converts a time-varying input signal into a digital-domain signal. A window function passes signals within a programmed aural frequency range while substantially blocking signals above and below the programmed aural frequency range when multiplied by an output of the digital converter. A frequency converter converts the signals passing within the programmed aural frequency range into a plurality of frequency bins. A background voice detector estimates the strength of a background speech segment relative to the noise of selected portions of the aural spectrum. A noise estimator estimates a maximum distribution of noise to an average of an acoustic noise power of some of the plurality of frequency bins. A voice detector compares the strength of a desired speech segment to a criterion based on an output of the background voice detector and an output of the noise estimator.
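As an illustration only, the front end described in the abstract (windowing, band restriction, conversion to frequency bins) could be sketched as below. The Hann window, the naive DFT, the function name `band_limited_bins`, and the parameter names are all assumptions made for this sketch; the abstract does not name a window or a transform. The band restriction here is done by discarding bins outside the range after the transform, which is a simplification and not necessarily the mechanism the abstract describes.

```python
import cmath
import math


def band_limited_bins(frame, sample_rate, low_hz, high_hz):
    """Return (frequency, amplitude, phase) for each bin inside [low_hz, high_hz].

    Hypothetical helper: window choice and band handling are assumptions.
    """
    n = len(frame)
    # Hann window (assumed choice) tapers the frame edges before the transform.
    windowed = [
        x * (0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)))
        for i, x in enumerate(frame)
    ]
    bins = []
    for k in range(n // 2 + 1):
        freq = k * sample_rate / n
        if not (low_hz <= freq <= high_hz):
            continue  # skip bins outside the programmed range
        # Naive DFT coefficient for bin k; an FFT would give the same value faster.
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * i / n)
            for i, x in enumerate(windowed)
        )
        bins.append((freq, abs(coeff), cmath.phase(coeff)))
    return bins
```

For a 64-sample frame of a 1000 Hz tone sampled at 8000 Hz and a range of 300 Hz to 3400 Hz, the strongest returned bin is the one at 1000 Hz, and each bin carries both an amplitude and a phase.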

21 Claims

1. A process that improves speech detection by processing a limited frequency band comprising:
- encoding a limited frequency band of an input into a signal by varying the amplitude of pulse width modulated signal that is limited to a plurality of predefined values;
  
  separating the signal into frequency bins in which each bin identifies an amplitude and a phase;
  
  estimating a signal strength of a background voice segment in time;
  
  estimating a distribution of noise to an average acoustic power of one or a plurality of frequency bins;
  
  comparing a signal-to-noise ratio of each frequency bin to the estimated signal strength of the background voice segment and the estimated distribution of noise to the average acoustic power; and
  
  identifying a speech segment from the noise that surrounds it based on the comparison.
- Dependent claims (2, 3, 4, 5, 6, 7, 8, 9)
  - 2. The process that improves speech detection of claim 1, where a Fast Fourier transform separates the signal into frequency bins.
  - 3. The process that improves speech detection of claim 1, where the estimating of the signal strength comprises an estimate of a time smoothed signal.
  - 4. The process that improves speech of claim 3, where the estimating of the signal strength of the background voice segment comprises measuring a signal-to-noise ratio of the time smoothed signal.
  - 5. The process that improves speech detection of claim 4, further comprising modifying the estimation of the signal strength of the background voice segment through a multiplication with a scalar quantity.
  - 6. The process that improves speech detection of claim 4, further comprising modifying the estimation of the signal strength of the background voice segment through a subtraction of an offset.
  - 7. The process that improves speech of claim 1, further comprising modifying the estimation of the distribution of noise to an average acoustic power through a multiplication with a scalar quantity.
  - 8. The process that improves speech of claim 1, further comprising modifying the estimation of the distribution of noise to an average acoustic power through an addition of an offset.
  - 9. The process that improves speech of claim 1, where the comparing the signal-to-noise ratio of each frequency bin to the estimated signal strength of the background voice segment and the estimated distribution of noise to the average acoustic power comprises comparing the signal-to-noise ratio of each frequency bin to a plurality of maximum values between the estimated signal strength of the background voice segment and the estimated distribution of noise to the average acoustic power.
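One possible reading of the comparison in claims 1 and 5 through 9 is sketched below, purely as an illustration. The function name `speech_bins`, its parameters, and the choice to combine the scalar and offset adjustments of claims 5 to 8 in a single call are assumptions; the claims present those adjustments separately and do not fix any numeric values.

```python
def speech_bins(snr_per_bin, background_voice, noise_spread,
                voice_scale=1.0, voice_offset=0.0,
                noise_scale=1.0, noise_offset=0.0):
    """Flag bins whose SNR exceeds the larger of two adjusted estimates.

    Hypothetical sketch: all defaults are placeholders, not values from the claims.
    """
    # Background voice estimate, scaled (claim 5) and reduced by an offset (claim 6).
    voice_term = background_voice * voice_scale - voice_offset
    # Noise distribution estimate, scaled (claim 7) and raised by an offset (claim 8).
    noise_term = noise_spread * noise_scale + noise_offset
    # Maximum of the two estimates serves as the criterion (claim 9, as read here).
    threshold = max(voice_term, noise_term)
    return [snr > threshold for snr in snr_per_bin]
```

With `speech_bins([2.0, 9.0, 5.0], background_voice=6.0, noise_spread=3.0, voice_scale=0.5, noise_offset=1.0)` the two terms are 3.0 and 4.0, so the threshold is 4.0 and the result is `[False, True, True]`.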

10. A process that improves speech processing by processing a limited frequency band comprising:
- converting a limited frequency band of a continuously varying input into a digital-domain signal;
  
  converting the digital domain signal into a frequency-domain signal;
  
  estimating the signal strength of a smoothed background voice segment in time;
  
  estimating the noise-variance of a segment of the digital domain signal;
  
  comparing a potential speech segment to the estimated signal strength of the smoothed background voice segment and the estimated noise variance; and
  
  identifying a speech segment from the noise that surrounds it based on the comparison.
- Dependent claims (11, 12, 13, 14, 15, 16, 17, 18)
  - 11. The process that improves speech processing of claim 10, where the act of comparing comprises comparing a signal-to-noise ratio to a maximum criterion.
  - 12. The process that improves speech processing of claim 11, where the signal-to-noise ratio comprises an instant signal-to-noise ratio.
  - 13. The process that improves speech processing of claim 10, further comprising modifying the estimation of the signal strength of the background voice segment through a multiplication with a scalar quantity.
  - 14. The process that improves speech processing of claim 13, where the scalar quantity is less than one.
  - 15. The process that improves speech processing of claim 10, further comprising modifying the estimation of the signal strength of the background voice segment through a subtraction of an offset.
  - 16. The process that improves speech processing of claim 10, further comprising modifying the noise-variance through a multiplication with a scalar quantity.
  - 17. The process that improves speech processing of claim 16, where the scalar quantity is greater than about one.
  - 18. The process that improves speech processing of claim 11, further comprising modifying the noise-variance through an addition of an offset.
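The steps of claims 10 through 17 can be pictured with the following minimal sketch. It assumes exponential smoothing for the "smoothed background voice segment", a population variance over noise-only samples for the "noise-variance", and illustrative scalars (0.5, below one as in claim 14; 2.0, above one as in claim 17). None of these choices, nor the names `smooth` and `detect_frames`, come from the claims themselves.

```python
from statistics import pvariance


def smooth(values, alpha=0.9):
    """Exponential smoothing over time; alpha is an assumed constant."""
    out, state = [], values[0]
    for v in values:
        state = alpha * state + (1 - alpha) * v
        out.append(state)
    return out


def detect_frames(instant_snr, noise_samples, voice_scale=0.5, noise_scale=2.0):
    """Flag frames whose instant SNR exceeds a maximum criterion (hypothetical)."""
    background = smooth(instant_snr)      # smoothed background voice strength
    noise_var = pvariance(noise_samples)  # noise-variance estimate
    flags = []
    for snr, bg in zip(instant_snr, background):
        # Maximum of the scaled background estimate and the scaled noise variance.
        threshold = max(voice_scale * bg, noise_scale * noise_var)
        flags.append(snr > threshold)
    return flags
```

For `detect_frames([1.0, 1.0, 1.0, 10.0, 10.0, 1.0, 1.0], [0.0, 2.0, 0.0, 2.0])` the noise variance is 1.0, the noise term (2.0) dominates every frame, and only the two frames with an SNR of 10.0 are flagged.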

19. A system that detects a speech segment that includes an unvoiced, a fully voiced, or a mixed voice content comprising:
- a digital converter that converts a time-varying input signal into a digital-domain signal;
  
  a window function configured to pass signals within a programmed aural frequency range while substantially blocking signals above and below the programmed aural frequency range when multiplied by an output of the digital converter;
  
  a frequency converter that converts the signals passing within the programmed aural frequency range into a plurality of frequency bins;
  
  a background voice detector configured to estimate a strength of a background speech segment relative to noise of selected portions of an aural spectrum;
  
  a noise estimator configured to estimate a maximum distribution of noise to an average of an acoustic noise power of some of the plurality of frequency bins; and
  
  a voice detector configured to compare the strength of a desired speech segment to a criterion based on an output of the background voice detector and an output of the noise estimator.
- Dependent claims (20, 21)
  - 20. The system of claim 19 where the criterion comprises a maximum criterion.
  - 21. The system of claim 19 further comprising an end-pointer that applies one or more static or dynamic rules to determine the beginning or the end of a speech segment processed by the voice detector.
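Claim 21 leaves the end-pointer's rules open. As a sketch of what a pair of static rules might look like, the hypothetical function below takes per-frame speech flags, bridges short gaps (`hangover`), and discards runs shorter than `min_run`. Both rules, their default values, and the name `end_points` are assumptions for illustration and are not taken from the claims.

```python
def end_points(flags, min_run=2, hangover=1):
    """Return (start, end) frame indices of speech segments (hypothetical rules)."""
    segments = []
    start = None  # index of the first frame of the open segment, if any
    silence = 0   # consecutive non-speech frames seen inside the open segment
    for i, is_speech in enumerate(flags):
        if is_speech:
            if start is None:
                start = i
            silence = 0
        elif start is not None:
            silence += 1
            if silence > hangover:
                end = i - silence  # last frame that was flagged as speech
                if end - start + 1 >= min_run:
                    segments.append((start, end))
                start, silence = None, 0
    if start is not None:
        # Close a segment that is still open when the input ends.
        end = len(flags) - 1 - silence
        if end - start + 1 >= min_run:
            segments.append((start, end))
    return segments
```

Given the flags `[False, True, True, False, True, True, False, False, False, True, False, False]`, the single-frame gap at index 3 is bridged, the isolated frame at index 9 is dropped, and the result is `[(1, 5)]`.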

Current Assignee: Blackberry Limited
Original Assignee: QNX Software Systems Wavemakers Inc (Blackberry Limited)
Inventors: Hetherington, Phillip A.; Fallat, Mark
Granted Patent: US 8,311,819 B2
US Class Current: 704/233
CPC Class Codes: G10L 25/87 Detection of discrete point...