Detecting barge-in in a speech dialogue system

US 9,026,438 B2
Filed: 03/31/2009
Issued: 05/05/2015
Est. Priority Date: 03/31/2008
Status: Active Grant


4 Assignments


0 Petitions


Abstract

A method for detecting barge-in in a speech dialog system comprising determining whether a speech prompt is output by the speech dialog system, and detecting whether speech activity is present in an input signal based on a time-varying sensitivity threshold of a speech activity detector and/or based on speaker information, where the sensitivity threshold is increased if output of a speech prompt is determined and decreased if no output of a speech prompt is determined. If speech activity is detected in the input signal, the speech prompt may be interrupted or faded out. A speech dialog system configured to detect barge-in is also disclosed.
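The core decision in the abstract and in claim 1 is a per-frame comparison: speech is considered present when the input's power density spectrum exceeds the estimated noise power spectrum times a factor, and that factor is raised while a prompt is playing. The sketch below shows that idea under stated assumptions. The class and function names, the two factor values, the `"fade_out_prompt"` action string, and the choice to compare total power summed over all frequency bins (rather than bin by bin) are all hypothetical. The `prompt_active` flag stands in for the information the prompter provides when it starts a prompt.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class PromptAwareVAD:
    """Frame-level speech activity decision with a prompt-dependent factor.

    factor_idle and factor_prompt are hypothetical values; the claims only
    require the factor to be larger while a prompt is being output.
    """
    factor_idle: float = 2.0
    factor_prompt: float = 8.0

    def factor(self, prompt_active: bool) -> float:
        # Higher factor (lower sensitivity) while the prompter is playing.
        return self.factor_prompt if prompt_active else self.factor_idle

    def speech_present(self, input_psd: Sequence[float],
                       noise_psd: Sequence[float],
                       prompt_active: bool) -> bool:
        # Assumption: compare total power over all bins, not bin by bin.
        if len(input_psd) != len(noise_psd):
            raise ValueError("input and noise spectra need the same number of bins")
        return sum(input_psd) > self.factor(prompt_active) * sum(noise_psd)


def on_frame(vad: PromptAwareVAD, input_psd: Sequence[float],
             noise_psd: Sequence[float], prompt_active: bool) -> Optional[str]:
    """Return a hypothetical action name when the user barges in over a prompt."""
    if prompt_active and vad.speech_present(input_psd, noise_psd, prompt_active):
        return "fade_out_prompt"
    return None
```

The rationale for switching the factor is that residual prompt echo raises the input power while the prompter is active; a larger factor keeps that echo from being mistaken for the user, while a smaller factor outside prompts keeps the detector responsive.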

82 Citations


25 Claims

1. A method for detecting barge-in in a speech dialogue system, the method comprising:
- determining whether a speech prompt is being output by the speech dialogue system including receiving information from a prompter that initiates output of the speech prompt; and
- detecting whether speech activity is present in an input signal based on a time-varying sensitivity threshold of a speech activity detector, the sensitivity threshold used for segmentation to determine at least a beginning of speech activity, where the sensitivity threshold is increased if it is determined that a speech prompt is being output, and decreased if it is determined that no output of a speech prompt is being output, wherein the speech activity is considered present if a power density spectrum of the input signal is greater than a predetermined noise signal power spectrum times a predetermined factor and wherein the predetermined factor is increased if it is determined that a speech prompt is being output, and decreased if it is determined that no output of a speech prompt is being output.

Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12)
- 2. The method of claim 1, where the step of determining whether a speech prompt is being output includes determining whether a signal level of a speech prompt from the prompt exceeds a predetermined threshold.
- 3. The method of claim 1, further including the step of determining an identity of a speaker of the input signal.
- 4. The method of claim 3, further including the step of modifying the sensitivity threshold based on the determined speaker identity.
- 5. The method of claim 4, further including the step of modifying a speech prompt from the prompter based on the determined speaker identity.
- 6. The method of claim 1, where the predetermined factor is a time-varying factor.
- 7. The method of claim 1, further including the step of determining a pitch value for the input signal.
- 8. The method of claim 7, where the detecting step comprises comparing the determined pitch value with a predetermined pitch threshold.
- 9. The method of claim 8, where the predetermined pitch threshold is based on a pitch value of the speech prompt signal.
- 10. The method of claim 1, including the step of performing an echo cancellation on the input signal by subtracting a speech prompt signal from the input signal.
- 11. The method of claim 10, where the step of determining whether a speech prompt is being output is not performed until a predetermined minimum time has passed after starting a speech prompt output.
- 12. The method of claim 1, further including the step of interrupting or fading out an output of a speech prompt if speech activity is detected in the input signal.
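Claims 7 to 10 add two refinements: subtracting the prompt signal from the input (echo cancellation) and comparing a pitch estimate of the input against a threshold derived from the prompt's pitch. The sketch below is one way to combine them, under stated assumptions. It assumes the echo path is a known pure gain plus delay (a practical canceller would estimate it adaptively), uses plain time-domain autocorrelation as the pitch estimator (a swapped-in technique; the claims do not name one), and applies a hypothetical rule that residual speech counts as the user only when its pitch differs from the prompt's by more than `min_diff_hz`. All function names and default values are hypothetical.

```python
from typing import List, Optional, Sequence


def subtract_prompt(mic: Sequence[float], prompt: Sequence[float],
                    gain: float = 1.0, delay: int = 0) -> List[float]:
    """Remove a delayed, scaled copy of the prompt from the microphone signal.

    Assumes the echo path is a known gain plus an integer sample delay.
    """
    out = []
    for n, x in enumerate(mic):
        m = n - delay
        echo = gain * prompt[m] if 0 <= m < len(prompt) else 0.0
        out.append(x - echo)
    return out


def estimate_pitch(frame: Sequence[float], fs: float,
                   fmin: float = 60.0, fmax: float = 400.0) -> Optional[float]:
    """Autocorrelation pitch estimate in Hz, or None for a silent frame."""
    if not any(frame):
        return None
    lag_min = max(1, int(fs / fmax))
    lag_max = min(int(fs / fmin), len(frame) - 1)
    best_lag, best_r = 0, 0.0
    for lag in range(lag_min, lag_max + 1):
        # Unnormalized autocorrelation at this lag.
        r = sum(frame[n] * frame[n - lag] for n in range(lag, len(frame)))
        if r > best_r:
            best_lag, best_r = lag, r
    return fs / best_lag if best_lag else None


def is_user_speech(residual_pitch: Optional[float],
                   prompt_pitch: Optional[float],
                   min_diff_hz: float = 20.0) -> bool:
    """Hypothetical rule: accept barge-in only if the pitch differs from the prompt's."""
    if residual_pitch is None:
        return False
    if prompt_pitch is None:
        return True
    return abs(residual_pitch - prompt_pitch) > min_diff_hz
```

In use, the residual from `subtract_prompt` would be framed, passed to `estimate_pitch`, and compared with the pitch of the corresponding prompt frame. A pitch that matches the prompt suggests leftover echo rather than a new talker.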

13. A non-transitory computer-readable medium for use with a computer system, said computer-readable medium comprising software code portions that, when executed on the computer system, perform steps comprising:
- determining whether a speech prompt is being output by the speech dialogue system including receiving information from a prompter that initiates output of the speech prompt; and
- detecting whether speech activity is present in an input signal based on a time-varying sensitivity threshold, the sensitivity threshold used for segmentation to determine at least a beginning of speech activity, where the sensitivity threshold is increased if it is determined that a speech prompt is being output, and decreased if it is determined that no output of a speech prompt is being output, wherein the speech activity is considered present if a power density spectrum of the input signal is greater than a predetermined noise signal power spectrum times a predetermined factor and wherein the predetermined factor is increased if it is determined that a speech prompt is being output, and decreased if it is determined that no output of a speech prompt is being output.

Dependent Claims (14, 15)
- 14. The computer-readable medium of claim 13, where the step of determining whether a speech prompt is being output includes determining whether a signal level of a speech prompt from the prompt exceeds a predetermined threshold.
- 15. The computer-readable medium of claim 13, further including software code portions that, when executed on the computer system, perform the step of determining an identity of a speaker of the input signal.

16. A speech dialogue system configured to detect barge-in, the speech dialogue system comprising:
- a prompter including a loudspeaker operationally enabled to output one or more speech prompts from the speech dialogue system; and
- a speech activity detector for detecting speech activity in an input signal based on a time-varying sensitivity threshold, wherein the speech activity detector includes a segmentation module configured to determine the beginning and the end of a speech component in the input signal based on the sensitivity threshold, including receiving information from a prompter that initiates output of the speech prompt, where the sensitivity threshold of the speech activity detector is increased if output of a speech prompt is determined and decreased if no output of a speech prompt is determined, wherein the speech activity is considered present if a power density spectrum of the input signal is greater than a predetermined noise signal power spectrum times a predetermined factor and wherein the predetermined factor is increased if it is determined that a speech prompt is being output, and decreased if it is determined that no output of a speech prompt is being output.

Dependent Claims (17, 18, 19, 20, 21)
- 17. The speech dialogue system of claim 16, including an echo cancellation module configured to subtract a speech prompt signal from the input signal.
- 18. The speech dialogue system of claim 16, including a pitch estimation module configured to compare an estimated pitch frequency of the input signal and an estimated pitch frequency of a speech prompt signal.
- 19. The speech dialogue system of claim 16, including an energy detection module configured to estimate the power spectral density of the input signal.
- 20. The speech dialogue system of claim 16, including a speaker identification module configured to determine which speaker of a plurality of speakers is using the speech dialogue.
- 21. The computer-readable medium of claim 16, where the step of determining whether a speech prompt is being output includes receiving information from a prompter that initiates output of the speech prompt.
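Claim 16's segmentation module turns per-frame speech/no-speech decisions (made against the sensitivity threshold) into the beginning and end of a speech component. A common way to do this, assumed here rather than taken from the patent, is to require a few consecutive speech frames before declaring an onset and to tolerate a short "hangover" of non-speech frames before declaring the end. The function name and the `onset_frames` and `hangover_frames` parameters are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple


def segment(decisions: Sequence[bool], onset_frames: int = 2,
            hangover_frames: int = 3) -> List[Tuple[int, int]]:
    """Turn frame decisions into (start, end) frame indices, end exclusive."""
    segments: List[Tuple[int, int]] = []
    start: Optional[int] = None
    run = 0      # consecutive speech frames while waiting for an onset
    silence = 0  # consecutive non-speech frames inside a segment
    for i, active in enumerate(decisions):
        if start is None:
            run = run + 1 if active else 0
            if run >= onset_frames:
                # The segment begins at the first frame of the onset run.
                start = i - onset_frames + 1
                silence = 0
        elif active:
            silence = 0
        else:
            silence += 1
            if silence > hangover_frames:
                # The segment ends where the trailing silence began.
                segments.append((start, i - silence + 1))
                start = None
                run = 0
    if start is not None:
        segments.append((start, len(decisions) - silence))
    return segments
```

With the defaults, an isolated single-frame blip is ignored, while short pauses inside an utterance do not split it into two segments.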

22. A method for detecting barge-in in a speech dialogue system, the method comprising:
- determining whether a speech prompt is being output by the speech dialogue system;
- determining a statistical model associated to at least one speaker interacting with the speech dialogue system, the statistical model representing barge-in behavior of the speaker and includes: an identity of the speaker, a number that a particular dialogue step is performed by the speaker, a priori probability of barge-in for the particular dialogue step, a time of barge-in incident, and a number of rejected barge-in recognitions; and
- detecting whether speech activity is present in an input signal based on a time-varying sensitivity threshold of a speech activity detector and based on the speaker information, where the sensitivity threshold is adapted based on the statistical model of the identified speaker.

Dependent Claims (23)
- 23. The method of claim 22 further including the step of modifying a speech prompt from the prompter based on the determined speaker identity.
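Claim 22 lists what the per-speaker statistical model holds (speaker identity, how often a dialogue step was performed, an a priori barge-in probability for that step, barge-in times, and the number of rejected barge-in recognitions) but not how the threshold is adapted from it. The sketch below stores those fields and applies one hypothetical adaptation rule: the detector becomes more sensitive (smaller factor) when this speaker often barges in at this step, and less sensitive after rejected barge-in recognitions. The Laplace-smoothed probability estimate, the class and function names, and the `rejection_penalty` and `max_scale` values are all assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class BargeInProfile:
    """Per-speaker barge-in statistics (field set follows claim 22)."""
    speaker_id: str
    step_counts: Dict[str, int] = field(default_factory=dict)     # times each step was performed
    barge_in_counts: Dict[str, int] = field(default_factory=dict)
    barge_in_times: List[float] = field(default_factory=list)     # seconds into the prompt
    rejected_barge_ins: int = 0

    def record(self, step: str, barged_in: bool,
               at_time: Optional[float] = None, rejected: bool = False) -> None:
        self.step_counts[step] = self.step_counts.get(step, 0) + 1
        if barged_in:
            self.barge_in_counts[step] = self.barge_in_counts.get(step, 0) + 1
            if at_time is not None:
                self.barge_in_times.append(at_time)
        if rejected:
            self.rejected_barge_ins += 1

    def prior(self, step: str) -> float:
        # Assumption: Laplace-smoothed a priori barge-in probability for the step.
        n = self.step_counts.get(step, 0)
        k = self.barge_in_counts.get(step, 0)
        return (k + 1) / (n + 2)


def adapted_factor(base_factor: float, profile: BargeInProfile, step: str,
                   rejection_penalty: float = 0.1, max_scale: float = 2.0) -> float:
    """Hypothetical rule: likely barge-in lowers the factor, rejections raise it."""
    p = profile.prior(step)
    scale = (1.5 - p) * (1.0 + rejection_penalty * profile.rejected_barge_ins)
    return base_factor * min(scale, max_scale)
```

With no history the prior is 0.5 and the base factor is returned unchanged, so a new speaker starts from the default threshold.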

24. A non-transitory computer-readable medium for use with a computer system, said computer-readable medium comprising software code portions that, when executed on the computer system, perform steps comprising:
- determining whether a speech prompt is being output by the speech dialogue system; and
- determining a statistical model associated to at least one speaker interacting with the speech dialogue system, the statistical model representing barge-in behavior of the speaker and includes: an identity of the speaker, a number that a particular dialogue step is performed by the speaker, a priori probability of barge-in for the particular dialogue step, a time of barge-in incident, and a number of rejected barge-in recognitions; and
- detecting whether speech activity is present in an input signal based on a time-varying sensitivity threshold and based on speaker information, where the sensitivity threshold is adapted based on the statistical model of the identified speaker is increased if it is determined that a speech prompt is being output, and decreased if it is determined that no output of a speech prompt is being output.

25. A speech dialogue system configured to detect barge-in, the speech dialogue system comprising:
- a prompter including a loudspeaker operationally enabled to output one or more speech prompts from the speech dialogue system; and
- a speaker identification module to determine a statistical model associated to at least one speaker interacting with the speech dialogue system, the statistical model representing barge-in behavior of the speaker and includes: an identity of the speaker, a number that a particular dialogue step is performed by the speaker, a priori probability of barge-in for the particular dialogue step, a time of barge-in incident, and a number of rejected barge-in recognitions; and
- a speech activity detector for detecting speech activity in an input signal based on a time-varying sensitivity threshold and based on speaker information, where the sensitivity threshold is adapted based on the statistical model of the identified speaker is increased if it is determined that a speech prompt is being output, and decreased if it is determined that no output of a speech prompt is being output.


Current Assignee
Cerence Operating Company (Cerence Inc.)
Original Assignee
Nuance Communications, Inc. (Microsoft Corporation)
Inventors
Buck, Markus; Gerl, Franz; Haulick, Tim; Herbig, Tobias; Schmidt, Gerhard Uwe; Schulz, Matthias
Primary Examiner(s)
Desir, Pierre-Louis
Assistant Examiner(s)
Sharma, Neeraj

Application Number
US12/415,927
Publication Number
US 20090254342A1
Time in Patent Office
2,226 Days
Field of Search
704/203, 704/275, 704/233, 704/246, 704/270, 704/266, 704/231, 704/503, 704/9, 704/251, 704/268, 379/88.01, 379/208.01, 379/88.02, 726/22
US Class Current
704/233
CPC Class Codes

G10L 15/222   Barge in, i.e. overridable ...

G10L 17/00   Speaker identification or v...

G10L 2025/786   Adaptive threshold

G10L 25/78   Detection of presence or ab...
