Correcting for impulse noise in speech recognition systems
First Claim
1. A system for correcting for an impulse noise in speech recognition systems, the system comprising:
a microphone;
a speaker; and
an electronic processor, communicatively coupled to the microphone and the speaker, and configured to receive, via the microphone, an audio signal representing an utterance;
detect, within the utterance, the impulse noise;
in response to detecting the impulse noise, generate an annotated utterance including a timing of the impulse noise;
segment the annotated utterance into silence, voice content, and other content, wherein the other content indicates that some voice content of the utterance has been missed due to the impulse noise; and
when a length of the other content is greater than or equal to an average word length for the annotated utterance, determine, based on the voice content, an intent portion of the annotated utterance and an entity portion of the annotated utterance;
generate a voice prompt based on the timing of the impulse noise and at least one of the group consisting of a timing of the intent portion and a timing of the entity portion; and
play the voice prompt via the speaker.
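The claim does not say how the impulse noise is detected. One simple possibility, assumed here purely for illustration, is a short-time energy detector: split the audio into 10 ms frames and flag any frame whose energy exceeds a fixed multiple of the median frame energy. The names AnnotatedUtterance, frame_energies, and detect_impulse_noise, the 10 ms frame, and the 20x ratio are all hypothetical. The sketch only shows how a detection step can produce the impulse-noise timing that the annotated utterance carries.

```python
from dataclasses import dataclass, field
from statistics import median
from typing import List, Tuple

@dataclass
class AnnotatedUtterance:
    samples: List[float]
    sample_rate: int
    # (start_s, end_s) of each detected impulse-noise burst (hypothetical format)
    impulse_noise: List[Tuple[float, float]] = field(default_factory=list)

def frame_energies(samples: List[float], frame_len: int) -> List[float]:
    """Mean squared amplitude of each complete, non-overlapping frame."""
    return [
        sum(x * x for x in samples[i:i + frame_len]) / frame_len
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]

def detect_impulse_noise(samples: List[float], sample_rate: int,
                         frame_ms: int = 10, ratio: float = 20.0) -> AnnotatedUtterance:
    """Flag frames whose energy exceeds `ratio` times the median frame energy,
    merging consecutive flagged frames into one timed interval."""
    frame_len = max(1, int(sample_rate * frame_ms / 1000))
    energies = frame_energies(samples, frame_len)
    annotated = AnnotatedUtterance(samples, sample_rate)
    if not energies:
        return annotated
    baseline = median(energies) or 1e-12  # avoid a zero baseline on all-zero audio
    prev_idx = None
    for idx, energy in enumerate(energies):
        if energy > ratio * baseline:
            start = idx * frame_len / sample_rate
            end = (idx + 1) * frame_len / sample_rate
            if prev_idx == idx - 1:
                # Extend the burst that ended on the previous frame.
                annotated.impulse_noise[-1] = (annotated.impulse_noise[-1][0], end)
            else:
                annotated.impulse_noise.append((start, end))
            prev_idx = idx
    return annotated
```

With 16 kHz audio, a 10-sample click at 0.5 s falls inside a single 160-sample frame and is annotated as the interval (0.5, 0.51). A median baseline is used so that one loud frame cannot raise its own threshold much.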
Abstract
System and method for correcting for impulse noise in speech recognition systems. One example system includes a microphone, a speaker, and an electronic processor. The electronic processor is configured to receive an audio signal representing an utterance. The electronic processor is configured to detect, within the utterance, the impulse noise, and, in response, generate an annotated utterance including a timing of the impulse noise. The electronic processor is configured to segment the annotated utterance into silence, voice content, and other content, and, when a length of the other content is greater than or equal to an average word length for the annotated utterance, determine, based on the voice content, an intent portion and an entity portion. The electronic processor is configured to generate a voice prompt based on the timing of the impulse noise and the intent portion and/or the entity portion, and to play the voice prompt.
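The abstract's decision rule compares the length of the other content with the average word length for the utterance. The sketch below makes several assumptions that do not come from the source: per-frame voice-activity flags and impulse-noise flags are already available, impulse-noise frames are labeled as other content, lengths are durations in seconds, the length of the other content is the total over all "other" segments, and the average word length is the mean duration of the recognized words. Segment, segment_frames, and other_content_is_word_sized are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Sequence

@dataclass
class Segment:
    label: str    # "silence", "voice", or "other"
    start: float  # seconds
    end: float    # seconds

    @property
    def length(self) -> float:
        return self.end - self.start

def segment_frames(voiced: Sequence[bool], impulse: Sequence[bool],
                   frame_s: float) -> List[Segment]:
    """Label each frame, then merge runs of the same label into segments.
    Impulse-noise frames take priority and become "other" content."""
    segments: List[Segment] = []
    for i, (is_voice, is_impulse) in enumerate(zip(voiced, impulse)):
        label = "other" if is_impulse else ("voice" if is_voice else "silence")
        start, end = i * frame_s, (i + 1) * frame_s
        if segments and segments[-1].label == label:
            segments[-1].end = end
        else:
            segments.append(Segment(label, start, end))
    return segments

def other_content_is_word_sized(segments: List[Segment],
                                word_lengths: Sequence[float]) -> bool:
    """True when the total "other" content is at least one average word long,
    the condition under which intent and entity portions are determined."""
    if not word_lengths:
        return False
    avg_word = sum(word_lengths) / len(word_lengths)
    other = sum(s.length for s in segments if s.label == "other")
    return other >= avg_word
```

The intuition behind the threshold: a burst shorter than a typical word probably masked only part of a word, whereas one at least a word long may have hidden an entire word, which is when a targeted re-prompt is worth generating.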
10 Citations
20 Claims
1. A system for correcting for an impulse noise in speech recognition systems, the system comprising:
a microphone;
a speaker; and
an electronic processor, communicatively coupled to the microphone and the speaker, and configured to receive, via the microphone, an audio signal representing an utterance;
detect, within the utterance, the impulse noise;
in response to detecting the impulse noise, generate an annotated utterance including a timing of the impulse noise;
segment the annotated utterance into silence, voice content, and other content, wherein the other content indicates that some voice content of the utterance has been missed due to the impulse noise; and
when a length of the other content is greater than or equal to an average word length for the annotated utterance, determine, based on the voice content, an intent portion of the annotated utterance and an entity portion of the annotated utterance;
generate a voice prompt based on the timing of the impulse noise and at least one of the group consisting of a timing of the intent portion and a timing of the entity portion; and
play the voice prompt via the speaker.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10.
11. A method for correcting for an impulse noise in speech recognition systems, the method comprising:
receiving, via a microphone communicatively coupled to an electronic processor, an audio signal representing an utterance;
detecting, with the electronic processor, the impulse noise within the utterance;
in response to detecting the impulse noise, generating, with the electronic processor, an annotated utterance including a timing of the impulse noise;
segmenting, with the electronic processor, the annotated utterance into silence, voice content, and other content, wherein the other content indicates that some voice content of the utterance has been missed due to the impulse noise; and
when a length of the other content is greater than or equal to an average word length for the annotated utterance, determining, with the electronic processor, based on the voice content, an intent portion of the annotated utterance and an entity portion of the annotated utterance;
generating, with the electronic processor, a voice prompt based on the timing of the impulse noise and at least one of the group consisting of a timing of the intent portion and a timing of the entity portion; and
playing, with the electronic processor, the voice prompt via a speaker communicatively coupled to the electronic processor.
Dependent claims: 12, 13, 14, 15, 16, 17, 18, 19, 20.
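The claims base the voice prompt on when the impulse noise occurred relative to the intent and entity portions, but give no prompt wording. The following is a minimal sketch under stated assumptions: each portion is reported as recognized text plus a (start, end) span in seconds, None means that portion was not recognized, the intent is spoken before the entity, and build_prompt, overlaps, and every prompt string are hypothetical. The noise timing selects whether to ask for the entity, the intent, the whole request, or whether to confirm a gap.

```python
from typing import Optional, Tuple

Span = Tuple[float, float]   # (start_s, end_s)
Portion = Tuple[str, Span]   # (recognized text, its span)

def overlaps(a: Span, b: Span) -> bool:
    """True if the half-open intervals a and b share any time."""
    return a[0] < b[1] and b[0] < a[1]

def build_prompt(noise: Span,
                 intent: Optional[Portion] = None,
                 entity: Optional[Portion] = None) -> str:
    """Choose a re-prompt from where the noise fell relative to the
    intent and entity portions (None means that portion was not recognized)."""
    hit_intent = intent is not None and overlaps(noise, intent[1])
    hit_entity = entity is not None and overlaps(noise, entity[1])

    if hit_intent and hit_entity:
        return "Sorry, there was a noise. Please repeat your request."
    if intent is not None and (entity is None or hit_entity):
        # Intent heard, entity missing or masked: ask only for the entity.
        return f"Sorry, there was a noise. {intent[0].capitalize()} what?"
    if entity is not None and (intent is None or hit_intent):
        # Entity heard, intent missing or masked: ask only for the intent.
        return f"Sorry, there was a noise. What should I do with {entity[0]}?"
    if intent is not None and entity is not None:
        # Both heard and the noise touched neither: ask about the gap it left.
        first = min(intent[1][0], entity[1][0])
        last = max(intent[1][1], entity[1][1])
        where = ("before" if noise[1] <= first
                 else "after" if noise[0] >= last
                 else "in the middle of")
        return f"Did you say something {where} '{intent[0]} {entity[0]}'?"
    return "Sorry, I didn't catch that. Please repeat your request."
```

For example, build_prompt((1.0, 1.2), intent=("call", (0.2, 0.5))) returns "Sorry, there was a noise. Call what?", because the intent was heard and the noise fell where the entity would be. Asking only for the missing portion keeps the user from having to repeat the whole utterance.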
Specification