Detecting barge-in in a speech dialogue system
Abstract
A method for detecting barge-in in a speech dialog system comprising determining whether a speech prompt is output by the speech dialog system, and detecting whether speech activity is present in an input signal based on a time-varying sensitivity threshold of a speech activity detector and/or based on speaker information, where the sensitivity threshold is increased if output of a speech prompt is determined and decreased if no output of a speech prompt is determined. If speech activity is detected in the input signal, the speech prompt may be interrupted or faded out. A speech dialog system configured to detect barge-in is also disclosed.
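The behavior summarized in the abstract can be illustrated with a minimal sketch. It assumes a simple per-frame energy measure as the detector input, and every name and number in it (`BargeInDetector`, `base_threshold`, `prompt_threshold`, `fade_step`) is hypothetical; the abstract does not specify how the threshold is changed or how the fade-out is performed.

```python
from dataclasses import dataclass


@dataclass
class BargeInDetector:
    """Illustrative detector whose sensitivity threshold depends on prompt state."""

    base_threshold: float = 1.0    # hypothetical threshold while no prompt is output
    prompt_threshold: float = 4.0  # hypothetical, higher threshold while a prompt is output
    fade_step: float = 0.25        # hypothetical gain reduction per frame during fade-out
    prompt_gain: float = 1.0       # current playback gain of the prompt
    fading: bool = False           # set once barge-in has been detected

    def threshold(self, prompt_active: bool) -> float:
        # Increased while a prompt is output, decreased otherwise.
        return self.prompt_threshold if prompt_active else self.base_threshold

    def process_frame(self, frame_energy: float, prompt_active: bool) -> bool:
        """Return True if speech activity is detected in this frame."""
        detected = frame_energy > self.threshold(prompt_active)
        if detected and prompt_active:
            # Barge-in: start fading the prompt instead of cutting it abruptly.
            self.fading = True
        if self.fading:
            self.prompt_gain = max(0.0, self.prompt_gain - self.fade_step)
        return detected


detector = BargeInDetector()
print(detector.process_frame(2.0, prompt_active=False))  # above the lower threshold
print(detector.process_frame(2.0, prompt_active=True))   # below the raised threshold
```

The same frame energy is treated as speech when no prompt is playing and ignored while a prompt is playing, which is the intended effect of raising the threshold: residual prompt echo in the microphone signal is less likely to trigger a false barge-in.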
25 Claims
1. A method for detecting barge-in in a speech dialogue system, the method comprising:
- determining whether a speech prompt is being output by the speech dialogue system including receiving information from a prompter that initiates output of the speech prompt; and
- detecting whether speech activity is present in an input signal based on a time-varying sensitivity threshold of a speech activity detector, the sensitivity threshold used for segmentation to determine at least a beginning of speech activity, where the sensitivity threshold is increased if it is determined that a speech prompt is being output, and decreased if it is determined that no output of a speech prompt is being output, wherein the speech activity is considered present if a power density spectrum of the input signal is greater than a predetermined noise signal power spectrum times a predetermined factor and wherein the predetermined factor is increased if it is determined that a speech prompt is being output, and decreased if it is determined that no output of a speech prompt is being output.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12
13. A non-transitory computer-readable medium for use with a computer system, said computer-readable medium comprising software code portions that, when executed on the computer system, perform steps comprising:
- determining whether a speech prompt is being output by the speech dialogue system including receiving information from a prompter that initiates output of the speech prompt; and
- detecting whether speech activity is present in an input signal based on a time-varying sensitivity threshold, the sensitivity threshold used for segmentation to determine at least a beginning of speech activity, where the sensitivity threshold is increased if it is determined that a speech prompt is being output, and decreased if it is determined that no output of a speech prompt is being output, wherein the speech activity is considered present if a power density spectrum of the input signal is greater than a predetermined noise signal power spectrum times a predetermined factor and wherein the predetermined factor is increased if it is determined that a speech prompt is being output, and decreased if it is determined that no output of a speech prompt is being output.
Dependent claims: 14, 15
16. A speech dialogue system configured to detect barge-in, the speech dialogue system comprising:
- a prompter including a loudspeaker operationally enabled to output one or more speech prompts from the speech dialogue system; and
- a speech activity detector for detecting speech activity in an input signal based on a time-varying sensitivity threshold, wherein the speech activity detector includes a segmentation module configured to determine the beginning and the end of a speech component in the input signal based on the sensitivity threshold, including receiving information from a prompter that initiates output of the speech prompt where the sensitivity threshold of the speech activity detector is increased if output of a speech prompt is determined and decreased if no output of a speech prompt is determined, wherein the speech activity is considered present if a power density spectrum of the input signal is greater than a predetermined noise signal power spectrum times a predetermined factor and wherein the predetermined factor is increased if it is determined that a speech prompt is being output, and decreased if it is determined that no output of a speech prompt is being output.
Dependent claims: 17, 18, 19, 20, 21
22. A method for detecting barge-in in a speech dialogue system, the method comprising:
- determining whether a speech prompt is being output by the speech dialogue system;
- determining a statistical model associated to at least one speaker interacting with the speech dialogue system, the statistical model representing barge-in behavior of the speaker and includes; an identity of the speaker, a number that a particular dialogue step is performed by the speaker, a priori probability of barge-in for the particular dialogue step, a time of barge-in incident, and a number of rejected barge-in recognitions; and
- detecting whether speech activity is present in an input signal based on a time-varying sensitivity threshold of a speech activity detector and based on the speaker information, where the sensitivity threshold is adapted based on the statistical model of the identified speaker.
Dependent claims: 23
24. A non-transitory computer-readable medium for use with a computer system, said computer-readable medium comprising software code portions that, when executed on the computer system, perform steps comprising:
- determining whether a speech prompt is being output by the speech dialogue system; and
- determining a statistical model associated to at least one speaker interacting with the speech dialogue system, the statistical model representing barge-in behavior of the speaker and includes; an identity of the speaker, a number that a particular dialogue step is performed by the speaker, a priori probability of barge-in for the particular dialogue step, a time of barge-in incident, and a number of rejected barge-in recognitions; and
- detecting whether speech activity is present in an input signal based on a time-varying sensitivity threshold and based on speaker information, where the sensitivity threshold is adapted based on the statistical model of the identified speaker is increased if it is determined that a speech prompt is being output, and decreased if it is determined that no output of a speech prompt is being output.
25. A speech dialogue system configured to detect barge-in, the speech dialogue system comprising:
- a prompter including a loudspeaker operationally enabled to output one or more speech prompts from the speech dialogue system; and
- a speaker identification module to determine a statistical model associated to at least one speaker interacting with the speech dialogue system, the statistical model representing barge-in behavior of the speaker and includes; an identity of the speaker, a number that a particular dialogue step is performed by the speaker, a priori probability of barge-in for the particular dialogue step, a time of barge-in incident, and a number of rejected barge-in recognitions; and
- a speech activity detector for detecting speech activity in an input signal based on a time varying sensitivity threshold and based on speaker information, where the sensitivity threshold is adapted based on the statistical model of the identified speaker is increased if it is determined that a speech prompt is being output, and decreased if it is determined that no output of a speech prompt is being output.
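As an illustration only, the detection criterion of claims 1, 13 and 16 and the speaker model of claims 22, 24 and 25 might be sketched as follows. The claims do not say how the spectra are compared or how the statistical model changes the threshold, so the per-bin comparison, the `min_fraction` rule, the base factors and the adaptation formula below are all assumptions, and every identifier is hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Sequence


@dataclass
class SpeakerModel:
    """Fields listed in claims 22, 24 and 25; the types are assumptions."""

    speaker_id: str
    step_count: int              # times the dialogue step was performed by the speaker
    barge_in_prior: float        # a priori probability of barge-in for the step
    barge_in_times: List[float] = field(default_factory=list)  # seconds into the prompt
    rejected_barge_ins: int = 0  # number of rejected barge-in recognitions


def detection_factor(prompt_active: bool,
                     model: Optional[SpeakerModel] = None) -> float:
    """Factor applied to the noise spectrum (larger means less sensitive)."""
    # Hypothetical base values: increased while a prompt is being output.
    factor = 4.0 if prompt_active else 2.0
    if model is not None:
        # Hypothetical adaptation rule, not taken from the claims: speakers who
        # often barge in get a lower factor, rejected barge-ins raise it again.
        factor *= 1.0 - 0.5 * model.barge_in_prior
        factor *= 1.0 + 0.1 * model.rejected_barge_ins
    return factor


def speech_present(input_psd: Sequence[float],
                   noise_psd: Sequence[float],
                   factor: float,
                   min_fraction: float = 0.5) -> bool:
    """Test whether the input PSD exceeds factor times the noise PSD.

    Assumption: the comparison is made per frequency bin, and speech is
    declared when at least `min_fraction` of the bins exceed the bound.
    """
    if not input_psd or len(input_psd) != len(noise_psd):
        raise ValueError("spectra must be non-empty and have the same number of bins")
    above = sum(1 for s, n in zip(input_psd, noise_psd) if s > factor * n)
    return above >= min_fraction * len(input_psd)


noise = [1.0, 1.0, 1.0, 1.0]
frame = [3.0, 3.0, 3.0, 0.5]
frequent_barger = SpeakerModel("speaker-1", step_count=10, barge_in_prior=0.8)

print(speech_present(frame, noise, detection_factor(prompt_active=False)))
print(speech_present(frame, noise, detection_factor(prompt_active=True)))
print(speech_present(frame, noise, detection_factor(True, frequent_barger)))
```

In this sketch the same frame is accepted when no prompt is playing, rejected while a prompt is playing, and accepted again during a prompt for a speaker whose model indicates frequent barge-in, because that speaker's factor is lowered.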
Specification