Acoustic speech recognizer system and method
Abstract
An adaptive endpointer system and method are used in speech recognition applications, such as telephone-based Internet browsers, to determine barge-in events during the processing of speech. The endpointer system includes a signal energy level estimator for estimating signal levels in speech data; a noise energy level estimator for estimating noise levels in the speech data; and a barge-in detector for increasing a threshold used in comparing the signal levels and the noise levels to detect the barge-in event in the speech data corresponding to a speech prompt during speech recognition.
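The arrangement described in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the class name `Endpointer`, the smoothing constants, the decibel thresholds, and the choice of exponential smoothing are all assumptions made for illustration. The sketch keeps a fast-moving signal level estimate and a slow-moving noise level estimate, and raises the comparison threshold while a prompt is playing.

```python
import math
from dataclasses import dataclass
from typing import Optional, Sequence


def frame_energy_db(samples: Sequence[float]) -> float:
    """Mean-square energy of one audio frame in dB, floored to avoid log(0)."""
    mean_sq = sum(s * s for s in samples) / len(samples)
    return 10.0 * math.log10(max(mean_sq, 1e-10))


@dataclass
class Endpointer:
    # All numeric defaults below are illustrative assumptions.
    base_threshold_db: float = 9.0   # margin over noise when no prompt plays
    prompt_boost_db: float = 6.0     # extra margin while a prompt plays
    signal_alpha: float = 0.5        # fast smoothing for the signal level
    noise_alpha: float = 0.05        # slow smoothing for the noise level
    signal_db: Optional[float] = None
    noise_db: Optional[float] = None

    def update(self, samples: Sequence[float], prompt_playing: bool) -> bool:
        """Process one frame; return True if it looks like (barge-in) speech."""
        energy = frame_energy_db(samples)
        if self.signal_db is None or self.noise_db is None:
            # The first frame only initializes both estimators.
            self.signal_db = energy
            self.noise_db = energy
            return False
        # Signal level estimator: tracks the input quickly.
        self.signal_db += self.signal_alpha * (energy - self.signal_db)
        # Barge-in detector: the threshold is increased during a prompt.
        threshold = self.base_threshold_db
        if prompt_playing:
            threshold += self.prompt_boost_db
        is_speech = (self.signal_db - self.noise_db) > threshold
        # Noise level estimator: adapts only on frames judged non-speech.
        if not is_speech:
            self.noise_db += self.noise_alpha * (energy - self.noise_db)
        return is_speech
```

With these assumed constants, a frame about 20 dB above the noise floor is flagged when no prompt is playing but is ignored while a prompt plays, because the raised threshold demands a larger margin.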
11 Claims
1. A system for use in speech recognition wherein a user receives a synthetic or recorded speech prompt from a text-to-speech (TTS) server via at least one network, comprising:
a client application for communicating, via the at least one network, with a speech recognition (SR) server, the TTS server, and, at a location of the user, a microphone;
wherein:
said client application enables the SR server to receive speech data provided by the user via the microphone; and
said client application determines whether the TTS server is operating, where the TTS server outputs a speech prompt when it is operating, and, if it is determined that the TTS server is operating, the client application operates in a state where it determines whether barge-in speech has been detected by processing an audio input received via the microphone, and, if it is determined that the TTS server is not operating, the client application operates in a state where it does not determine whether barge-in speech has been detected.
4. The system of claim 1, wherein:
the client application is implemented as a state machine.
5. The system of claim 1, wherein:
the audio input is processed using a signal energy level estimator for estimating signal levels thereof, and a noise energy level estimator for estimating noise levels thereof.
2. The system of claim 1, wherein:
if said client application determines that the TTS server is operating but no such barge-in speech has been detected, said client application waits and determines whether the TTS server is quiet, indicating that the TTS server is no longer operating.
3. The system of claim 2, wherein:
if said client application determines that the TTS server is quiet, the client application transitions from the state where it determines whether barge-in speech has been detected to the state where it does not determine whether barge-in speech has been detected.
6. A method for use in speech recognition wherein a user receives a synthetic or recorded speech prompt from a text-to-speech (TTS) server via at least one network, comprising:
providing a client application for communicating, via the at least one network, with a speech recognition (SR) server, the TTS server, and, at a location of the user, a microphone;
wherein the client application enables the SR server to receive speech data provided by the user via the microphone; and
determining whether the TTS server is operating, where the TTS server outputs a speech prompt when it is operating, and, if it is determined that the TTS server is operating, operating the client application in a state where it determines whether barge-in speech has been detected by processing an audio input received via the microphone, and, if it is determined that the TTS server is not operating, operating the client application in a state where it does not determine whether barge-in speech has been detected.
7. The method of claim 6, wherein:
if the client application determines that the TTS server is operating but no such barge-in speech has been detected, said client application waits and determines whether the TTS server is quiet, indicating that the TTS server is no longer operating.
8. The method of claim 7, wherein:
if the client application determines that the TTS server is quiet, the client application transitions from the state where it determines whether barge-in speech has been detected to the state where it does not determine whether barge-in speech has been detected.
9. The method of claim 6, wherein:
the client application is implemented as a state machine.
10. The method of claim 6, wherein:
the audio input is processed using a signal energy level estimator for estimating signal levels thereof, and a noise energy level estimator for estimating noise levels thereof.
11. A computer readable medium for use in speech recognition, wherein a user receives a synthetic or recorded speech prompt from a text-to-speech (TTS) server via at least one network, comprising:
software which is executable to:
(a) provide a client application for communicating, via the at least one network, with a speech recognition (SR) server, the TTS server, and, at a location of the user, a microphone;
wherein the client application enables the SR server to receive speech data provided by the user via the microphone; and
(b) determine whether the TTS server is operating, where the TTS server outputs a speech prompt when it is operating, and, if it is determined that the TTS server is operating, operating the client application in a state where it determines whether barge-in speech has been detected by processing an audio input received via the microphone, and, if it is determined that the TTS server is not operating, operating the client application in a state where it does not determine whether barge-in speech has been detected.
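The two-state behavior recited in the claims can be sketched as a small state machine. This is an illustrative sketch only: the names `State`, `BargeInClient`, `step`, and `detector_calls`, the per-frame polling style, and the injected `detect_barge_in` callable are assumptions, and the claims do not specify what happens after barge-in is detected, so that is left to the caller here.

```python
from enum import Enum, auto
from typing import Callable, Sequence


class State(Enum):
    IDLE = auto()            # TTS not operating: barge-in detection is off
    PROMPT_PLAYING = auto()  # TTS operating: barge-in detection is on


class BargeInClient:
    """Hypothetical client that gates barge-in detection on TTS activity."""

    def __init__(self, detect_barge_in: Callable[[Sequence[float]], bool]):
        self.state = State.IDLE
        self._detect = detect_barge_in
        self.detector_calls = 0  # counts how often the detector actually ran

    def step(self, tts_operating: bool, audio_frame: Sequence[float]) -> bool:
        """Process one microphone frame; return True if barge-in was detected."""
        # Enter the detecting state while the TTS server is operating and
        # leave it once the TTS server has gone quiet.
        self.state = State.PROMPT_PLAYING if tts_operating else State.IDLE
        if self.state is State.IDLE:
            # No prompt is playing, so the microphone audio is not examined.
            return False
        self.detector_calls += 1
        return self._detect(audio_frame)


# Usage with a stand-in peak detector (the 0.3 level is an arbitrary assumption).
client = BargeInClient(lambda frame: max(abs(s) for s in frame) > 0.3)
client.step(False, [0.9])         # ignored: TTS is not operating
client.step(True, [0.0])          # prompt playing, user silent
heard = client.step(True, [0.9])  # prompt playing, user speaks over it
```

Passing the detector in as a callable keeps the state logic separate from the signal processing, so any barge-in test, including one based on signal and noise energy estimates, can be plugged in.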
Specification