Multi-sensory speech detection system

US 7,383,181 B2
Filed: 07/29/2003
Issued: 06/03/2008
Est. Priority Date: 07/29/2003
Status: Expired due to Fees

Abstract

The present invention combines a conventional audio microphone with an additional speech sensor that provides a speech sensor signal based on an input. The speech sensor signal is generated based on an action undertaken by a speaker during speech, such as facial movement, bone vibration, throat vibration, throat impedance changes, etc. A speech detector component receives an input from the speech sensor and outputs a speech detection signal indicative of whether a user is speaking. The speech detector generates the speech detection signal based on the microphone signal and the speech sensor signal.

13 Claims

1. A speech recognition system, comprising:
- an audio microphone outputting a microphone signal based on a sensed audio input;
  
  a speech sensor outputting a sensor signal based on a non-audio input generated by speech action;
  
  a speech detector component outputting a speech detection signal indicative of a probability that a user is speaking based on the microphone signal and based on a level of variance in a first characteristic of the sensor signal and based on the microphone signal, wherein the first characteristic of the sensor signal has a first level of variance when the user is speaking and a second level of variance when the user is not speaking and wherein the speech detector component outputs the speech detection signal based on the level of variance of the first characteristic of the sensor signal relative to a baseline level of variance of the first characteristic that comprises a level of a predetermined one of the first and second levels of the characteristic over a given time period, the speech detection component further calculating a combined signal by multiplying the speech detection signal by the microphone signal; and
  
  a speech recognizer recognizing speech to provide a recognition output indicative of speech in the microphone signal based on the combined signal, wherein recognizing speech comprises:
  
  increasing a likelihood that speech is recognized by an amount based on a probability that the speech detection signal indicates that the user is speaking; and
  
  decreasing a likelihood that speech is recognized by an amount based on a probability that the speech detection signal indicates that the speaker is not speaking.
  - 2. The speech detection system of claim 1 wherein the baseline level is calculated by averaging the level of the variance of the first characteristic over the time period.
  - 3. The speech detection system of claim 1 wherein the baseline level is recalculated intermittently during operation of the speech detection system.
  - 4. The speech detection system of claim 3 wherein the baseline level is recalculated periodically to represent the variance level of the first characteristic over a revolving time window.
  - 5. The speech detection system of claim 3 wherein the speech detection component outputs the speech detection signal based on a comparison of the level of the variance of the first characteristic of the sensor signal to the baseline level, and wherein the comparison is performed periodically.
  - 6. The speech detection system of claim 5 wherein the comparison is performed more frequently than the baseline level is recalculated.
  - 7. The speech detection system of claim 1 wherein the audio microphone and the speech sensor are mounted to a headset.
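
Claims 1 through 6 describe comparing the variance of a sensor characteristic against a baseline that is averaged over a revolving window and recalculated less often than the comparison is run. The following is a minimal Python sketch of that idea, not the patented implementation. The class name `VarianceSpeechDetector`, the parameters `baseline_frames`, `recalc_every`, and `scale`, and the formula that maps excess variance to a value between 0 and 1 are all assumptions made for illustration; the claims do not specify them. The sketch also assumes the baseline tracks the lower, not-speaking variance level.

```python
from collections import deque
from statistics import fmean, pvariance


class VarianceSpeechDetector:
    """Hypothetical detector comparing per-frame variance to a rolling baseline."""

    def __init__(self, baseline_frames=50, recalc_every=10, scale=1.0):
        # Revolving window of recent per-frame variance levels (cf. claim 4).
        self.history = deque(maxlen=baseline_frames)
        self.recalc_every = recalc_every  # baseline refresh interval, in frames
        self.scale = scale                # assumed softness of the mapping
        self.baseline = None
        self._frames_seen = 0

    def update(self, frame):
        """Take one frame of sensor samples, return a speech score in [0, 1)."""
        level = pvariance(frame)
        self.history.append(level)
        # Recalculate the baseline only intermittently (cf. claims 3 and 6),
        # as the average variance over the window (cf. claim 2).
        if self.baseline is None or self._frames_seen % self.recalc_every == 0:
            self.baseline = fmean(self.history)
        self._frames_seen += 1
        # The comparison itself runs on every frame (cf. claim 5).
        excess = max(level - self.baseline, 0.0)
        # Assumed mapping from excess variance to a probability-like score.
        return excess / (excess + self.scale * self.baseline + 1e-12)


detector = VarianceSpeechDetector()
quiet = [0.0, 0.01, -0.01, 0.0]
loud = [1.0, -1.0, 1.0, -1.0]
quiet_scores = [detector.update(quiet) for _ in range(5)]
loud_score = detector.update(loud)
```

In this sketch the quiet frames score zero because their variance never exceeds the baseline, while the high-variance frame scores close to one.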

8. A speech recognition system, comprising:
- a speech detection system comprising:
  
  an audio microphone outputting a microphone signal based on a sensed audio input;
  
  a speech sensor outputting a sensor signal based on a non-audio input generated by speech action; and
  
  a speech detector component outputting a speech detection signal indicative of a probability that a user is speaking based on the microphone signal and the sensor signal, wherein the speech detector component calculates a combined signal by multiplying the speech detection signal by the microphone signal; and
  
  a speech recognition engine recognizing speech to provide a recognition output indicative of speech in the sensed audio input based on the combined signal;
  
  increasing a likelihood that speech is recognized by an amount based on a probability that the speech detection signal indicates that the user is speaking; and
  
  decreasing a likelihood that speech is recognized by an amount based on a probability that the speech detection signal indicates that the speaker is not speaking.
  - 9. The speech recognition system of claim 8 wherein the audio microphone and the speech sensor are mounted on a headset.
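
Claim 8 recites a combined signal formed by multiplying the speech detection signal by the microphone signal. One way to read this is a frame-by-frame weighting, sketched below in Python. The function name `combine_signals` and the choice of one probability per frame are assumptions for illustration; the claim does not fix the granularity of the multiplication.

```python
def combine_signals(mic_frames, speech_probs):
    """Scale each microphone frame by the speech probability for that frame.

    mic_frames:   list of frames, each a list of audio samples
    speech_probs: one probability per frame (assumed granularity)
    """
    if len(mic_frames) != len(speech_probs):
        raise ValueError("need exactly one probability per frame")
    return [
        [prob * sample for sample in frame]
        for frame, prob in zip(mic_frames, speech_probs)
    ]


combined = combine_signals([[1.0, -2.0], [0.5, 0.5]], [1.0, 0.5])
```

Frames judged unlikely to contain the user's speech are attenuated before they reach the recognizer, while frames with a probability near one pass through nearly unchanged.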

10. A method of recognizing speech, comprising:
- generating a first signal, indicative of an audio input, with an audio microphone;
  
  generating a second signal indicative of facial movement of a user, sensed by a facial movement sensor;
  
  generating a third signal indicative of a probability that the user is speaking based on the first and second signals;
  
  generating a fourth signal by multiplying the probability that the user is speaking by the first signal; and
  
  recognizing speech based on the fourth signal and the speech detection signal, wherein recognizing speech comprises:
  
  increasing a likelihood that speech is recognized by an amount based on a probability that the speech detection signal indicates that the user is speaking; and
  
  decreasing a likelihood that speech is recognized by an amount based on a probability that the speech detection signal indicates that the speaker is not speaking.
  - 11. The method of claim 10 wherein generating the second signal comprises:
    - sensing vibration of one of the user's jaw and neck.
  - 12. The method of claim 10 wherein generating the second signal comprises:
    - sensing an image indicative of movement of the user's mouth.
  - 13. The method of claim 10 and further comprising:
    - providing a speech detection signal based on detecting whether the user is speaking.
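
The recognizing step in claim 10 raises the likelihood that speech is recognized when the detection signal indicates the user is speaking and lowers it otherwise. The claims do not say how the adjustment is computed. The Python sketch below assumes, purely for illustration, that the recognizer exposes a log-likelihood score and that the adjustment is a weighted log-odds term; `adjust_speech_score`, `weight`, and `eps` are hypothetical names.

```python
import math


def adjust_speech_score(log_likelihood, p_speaking, weight=1.0, eps=1e-6):
    """Shift a recognizer's speech score using the speaking probability.

    The log-odds term is positive when p_speaking > 0.5 (score increases)
    and negative when p_speaking < 0.5 (score decreases). This rule is an
    assumption, not taken from the patent.
    """
    # Clamp so that the logarithms stay finite at probabilities of 0 or 1.
    p = min(max(p_speaking, eps), 1.0 - eps)
    return log_likelihood + weight * (math.log(p) - math.log(1.0 - p))


boosted = adjust_speech_score(-10.0, 0.9)
penalized = adjust_speech_score(-10.0, 0.1)
unchanged = adjust_speech_score(-10.0, 0.5)
```

With a probability of 0.5 the score is left as it was, which makes the neutral point of this assumed rule explicit.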

Current Assignee: Microsoft Technology Licensing LLC (Microsoft Corporation)
Original Assignee: Microsoft Corporation
Inventors: Sinclair, Michael J.; Zhang, Zhengyou; Huang, Xuedong D.; Acero, Alejandro; Liu, Zicheng
Primary Examiner(s): Vo, Huyen X.
Application Number: US10/629,278
Publication Number: US 20050027515A1
Time in Patent Office: 1,771 Days
Field of Search: 704/251, 704/270, 704/233, 704/275, 704/270.1, 704/231, 704/236, 704/246, 381/327, 381/318
US Class Current: 704/231
CPC Class Codes:
G10L 15/20   Speech recognition techniqu...
G10L 15/24   Speech recognition using no...
G10L 25/78   Detection of presence or ab...
H04R 1/10   Earpieces; Attachments ther...
H04R 1/14   Throat mountings for microp...
H04R 25/606   acting directly on the eard...
