Detection and use of acoustic signal quality indicators
Abstract
A computer-driven device assists a user in self-regulating speech control of the device. The device processes an input signal representing human speech to compute acoustic signal quality indicators indicating conditions likely to be problematic to speech recognition, and advises the user of those conditions.
36 Claims
1. A computer-driven method to regulate a speaker's issuance of speech based commands, the method comprising operations of:
receiving an input signal representing audio content including speech of the speaker, the audio content occurring between a prescribed begin event and a prescribed end event;
processing at least a portion of the input signal to determine whether the processed portion of the input signal contains speech exhibiting one or more predetermined conditions capable of posing difficulty in successfully performing speech recognition upon the input signal;
where the processed portion of the input signal comprises one of: all of the input signal, a part of the input signal, speech present in the input signal;
if the processing operation determines that the processed portion of the input signal contains speech exhibiting one or more of the predetermined conditions, then performing operations comprising: issuing at least one human comprehensible speech quality alert corresponding to each exhibited predetermined condition;
the processing operation comprising: determining whether the speaker was speaking too loudly relative to prescribed criteria;
the operation of determining whether the speaker was speaking too loudly relative to prescribed criteria comprising:
computing a non-silent-frame-count by computing a total number of frames of the input signal with acoustic energy above a first threshold;
computing a loud-frame-count by computing a total number of frames of the input signal with acoustic energy above a second threshold higher than the first threshold;
computing a ratio between the loud-frame-count and the non-silent-frame-count;
determining whether the ratio exceeds a third threshold, and if so, concluding that the speaker was speaking too loudly relative to the prescribed criteria.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12.
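
The too-loud determination recited in claim 1 can be illustrated with a minimal Python sketch. It assumes per-frame acoustic energies have already been computed; the function name `is_too_loud`, the parameter names, and the example threshold values are hypothetical and do not come from the patent. Returning False when no frame is non-silent is also an assumption, since the claim does not address that case.

```python
from typing import Sequence


def is_too_loud(
    frame_energies: Sequence[float],
    first_threshold: float,   # energy above this marks a non-silent frame
    second_threshold: float,  # energy above this marks a loud frame
    third_threshold: float,   # ratio above this means "too loud"
) -> bool:
    """Sketch of the loud-frame ratio test (names are illustrative)."""
    if second_threshold <= first_threshold:
        raise ValueError("second threshold must be higher than the first")

    # Count frames with acoustic energy above the first threshold.
    non_silent_frame_count = sum(1 for e in frame_energies if e > first_threshold)
    # Count frames with acoustic energy above the second, higher threshold.
    loud_frame_count = sum(1 for e in frame_energies if e > second_threshold)

    # Assumption: with no non-silent frames there is nothing to judge.
    if non_silent_frame_count == 0:
        return False

    ratio = loud_frame_count / non_silent_frame_count
    return ratio > third_threshold


# Example values chosen only for illustration.
energies = [0.0, 0.2, 0.9, 0.95, 0.1, 0.8]
result = is_too_loud(energies, first_threshold=0.05,
                     second_threshold=0.7, third_threshold=0.5)
```

In this example five frames are non-silent and three of them are loud, so the ratio is 0.6 and the check reports that the speaker was too loud.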
13. A computer-driven method to regulate a speaker's issuance of speech based commands, the method comprising operations of:
receiving an input signal representing audio content including speech of the speaker, the audio content occurring between a prescribed begin event and a prescribed end event;
processing at least a portion of the input signal to determine whether the processed portion of the input signal contains speech exhibiting one or more predetermined conditions capable of posing difficulty in successfully performing speech recognition upon the input signal;
where the processed portion of the input signal comprises one of: all of the input signal, a part of the input signal, speech present in the input signal;
if the processing operation determines that the processed portion of the input signal contains speech exhibiting one or more of the predetermined conditions, then performing operations comprising: issuing at least one human comprehensible speech quality alert corresponding to each exhibited predetermined condition;
the processing operation comprising: determining whether the speaker was speaking too softly relative to prescribed criteria;
the operation of determining whether the speaker was speaking too softly relative to prescribed criteria comprising:
computing a non-silent-frame-count by computing a total number of frames of the input signal that have acoustic energy above a first threshold;
computing a soft-frame-count by computing a total number of frames of the input signal that have acoustic energy less than a second threshold;
computing a ratio between the soft-frame-count and the non-silent-frame-count;
determining whether the ratio is greater than a third threshold, and if so, concluding that the speaker was speaking too softly relative to the prescribed criteria.
Dependent claims: 14, 15, 16.
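
The too-softly determination of claim 13 can be sketched the same way. Read literally, the soft-frame-count covers every frame below the second threshold, including silent ones, so the ratio can exceed 1. The `exclude_silent` flag below is a hypothetical option, not part of the claim, that restricts soft frames to non-silent frames. All names and threshold values are assumptions made for illustration.

```python
from typing import Sequence


def is_too_soft(
    frame_energies: Sequence[float],
    first_threshold: float,   # energy above this marks a non-silent frame
    second_threshold: float,  # energy below this marks a soft frame
    third_threshold: float,   # ratio above this means "too soft"
    exclude_silent: bool = False,  # hypothetical refinement, not in the claim
) -> bool:
    """Sketch of the soft-frame ratio test (names are illustrative)."""
    non_silent_frame_count = sum(1 for e in frame_energies if e > first_threshold)

    if exclude_silent:
        # Assumed variant: a soft frame must also be non-silent.
        soft_frame_count = sum(
            1 for e in frame_energies if first_threshold < e < second_threshold
        )
    else:
        # Literal reading: any frame with energy below the second threshold.
        soft_frame_count = sum(1 for e in frame_energies if e < second_threshold)

    # Assumption: with no non-silent frames there is nothing to judge.
    if non_silent_frame_count == 0:
        return False

    ratio = soft_frame_count / non_silent_frame_count
    return ratio > third_threshold


# Example values chosen only for illustration.
energies = [0.0, 0.06, 0.08, 0.1, 0.5, 0.07]
literal = is_too_soft(energies, 0.05, 0.15, 0.9)
restricted = is_too_soft(energies, 0.05, 0.15, 0.9, exclude_silent=True)
```

With these values the literal reading gives a ratio of 5/5 and flags the utterance, while the restricted variant gives 4/5 and does not, which shows how much the result depends on how silent frames are treated.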
17. A non-transitory computer medium storing one of (1) a program to perform operations for regulating a speaker's issuance of speech-based commands, or (2) a program for installing a target program of machine-readable instructions on a computer, where the target program is executable to perform the operations for regulating a speaker's issuance of speech-based commands, the operations comprising:
receiving an input signal representing audio content including speech of the speaker, the audio content occurring between a prescribed begin event and a prescribed end event;
processing at least a portion of the input signal to determine whether the processed portion of the input signal contains speech exhibiting one or more predetermined conditions capable of posing difficulty in successfully performing speech recognition upon the input signal;
where the processed portion of the input signal comprises one of: all of the input signal, a part of the input signal, speech present in the input signal;
if the processing operation determines that the processed portion of the input signal contains speech exhibiting one or more of the predetermined conditions, then performing operations comprising: issuing at least one human comprehensible speech quality alert corresponding to each exhibited predetermined condition;
where the input signal originated from a microphone;
the processing operation comprising: determining whether the microphone was held too near to the speaker; and determining whether the speaker was speaking too loudly relative to prescribed criteria;
the operation of determining whether the speaker was speaking too loudly relative to prescribed criteria comprising:
computing a non-silent-frame-count by computing a total number of frames of the input signal with acoustic energy above a first threshold;
computing a loud-frame-count by computing a total number of frames of the input signal with acoustic energy above a second threshold higher than the first threshold;
computing a ratio between the loud-frame-count and the non-silent-frame-count;
determining whether the ratio exceeds a third threshold, and if so, concluding that the speaker was speaking too loudly relative to the prescribed criteria.
18. A non-transitory computer medium storing one of (1) a program to perform operations for regulating a speaker's issuance of speech-based commands, or (2) a program for installing a target program of machine-readable instructions on a computer, where the target program is executable to perform the operations for regulating a speaker's issuance of speech-based commands, the operations comprising:
receiving an input signal representing audio content including speech of the speaker, the audio content occurring between a prescribed begin event and a prescribed end event;
processing at least a portion of the input signal to determine whether the processed portion of the input signal contains speech exhibiting one or more predetermined conditions capable of posing difficulty in successfully performing speech recognition upon the input signal;
where the processed portion of the input signal comprises one of: all of the input signal, a part of the input signal, speech present in the input signal;
if the processing operation determines that the processed portion of the input signal contains speech exhibiting one or more of the predetermined conditions, then performing operations comprising: issuing at least one human comprehensible speech quality alert corresponding to each exhibited predetermined condition;
where the input signal originated from a microphone;
the processing operation comprising: determining whether the microphone was held too near to the speaker; and determining whether the speaker was speaking too softly relative to prescribed criteria,
the operation of determining whether the speaker was speaking too softly relative to prescribed criteria comprising:
computing a non-silent-frame-count by computing a total number of frames of the input signal that have acoustic energy above a first threshold;
computing a soft-frame-count by computing a total number of frames of the input signal that have acoustic energy less than a second threshold;
computing a ratio between the soft-frame-count and the non-silent-frame-count;
determining whether the ratio is greater than a third threshold, and if so, concluding that the speaker was speaking too softly relative to the prescribed criteria.
19. A non-transitory computer medium having program instructions stored therein which, when executed by a processor, perform operations comprising:
receiving an input signal representing audio content including speech of the speaker, the audio content occurring between a prescribed begin event and a prescribed end event;
processing at least a portion of the input signal to determine whether the processed portion of the input signal contains speech exhibiting one or more predetermined conditions capable of posing difficulty in successfully performing speech recognition upon the input signal;
where the processed portion of the input signal comprises one of: all of the input signal, a part of the input signal, speech present in the input signal;
if the processing operation determines that the processed portion of the input signal contains speech exhibiting one or more of the predetermined conditions, then performing operations comprising: issuing at least one human comprehensible speech quality alert corresponding to each exhibited predetermined condition;
the processing operation comprising: determining whether the speaker was speaking too loudly relative to prescribed criteria;
the operation of determining whether the speaker was speaking too loudly relative to prescribed criteria comprising:
computing a non-silent-frame-count by computing a total number of frames of the input signal with acoustic energy above a first threshold;
computing a loud-frame-count by computing a total number of frames of the input signal with acoustic energy above a second threshold higher than the first threshold;
computing a ratio between the loud-frame-count and the non-silent-frame-count;
determining whether the ratio exceeds a third threshold, and if so, concluding that the speaker was speaking too loudly relative to the prescribed criteria.
Dependent claims: 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35.
36. A non-transitory computer medium having program instructions stored therein which, when executed by a processor, perform operations comprising:
receiving an input signal representing audio content including speech of the speaker, the audio content occurring between a prescribed begin event and a prescribed end event;
processing at least a portion of the input signal to determine whether the processed portion of the input signal contains speech exhibiting one or more predetermined conditions capable of posing difficulty in successfully performing speech recognition upon the input signal;
where the processed portion of the input signal comprises one of: all of the input signal, a part of the input signal, speech present in the input signal;
if the processing operation determines that the processed portion of the input signal contains speech exhibiting one or more of the predetermined conditions, then performing operations comprising: issuing at least one human comprehensible speech quality alert corresponding to each exhibited predetermined condition;
the processing operation comprising: determining whether the speaker was speaking too softly relative to prescribed criteria;
the operation of determining whether the speaker was speaking too softly relative to prescribed criteria comprising:
computing a non-silent-frame-count by computing a total number of frames of the input signal that have acoustic energy above a first threshold;
computing a soft-frame-count by computing a total number of frames of the input signal that have acoustic energy less than a second threshold;
computing a ratio between the soft-frame-count and the non-silent-frame-count;
determining whether the ratio is greater than a third threshold, and if so, concluding that the speaker was speaking too softly relative to the prescribed criteria.
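
Each independent claim ends by issuing one human comprehensible alert for every exhibited condition. The Python sketch below shows one way the overall flow might look, from raw samples to alert messages. It rests on several assumptions that the claims do not specify: mean-square amplitude as the measure of acoustic energy, fixed-length non-overlapping frames, and the alert wording. The names `frame_energies`, `quality_alerts`, and `ALERTS` are hypothetical.

```python
from typing import Dict, List, Sequence

# Hypothetical alert wording; the claims only require human comprehensible alerts.
ALERTS: Dict[str, str] = {
    "too_loud": "You spoke too loudly. Please speak more quietly.",
    "too_soft": "You spoke too softly. Please speak up.",
}


def frame_energies(samples: Sequence[float], frame_len: int) -> List[float]:
    """Mean-square energy of each full, non-overlapping frame (assumed measure)."""
    return [
        sum(s * s for s in samples[i:i + frame_len]) / frame_len
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]


def quality_alerts(
    samples: Sequence[float],
    frame_len: int,
    silence_thr: float,     # first threshold: non-silent frames
    soft_thr: float,        # second threshold of the too-soft test
    loud_thr: float,        # second threshold of the too-loud test
    soft_ratio_thr: float,  # third threshold of the too-soft test
    loud_ratio_thr: float,  # third threshold of the too-loud test
) -> List[str]:
    """Return one alert message per exhibited condition."""
    energies = frame_energies(samples, frame_len)
    non_silent = sum(1 for e in energies if e > silence_thr)
    if non_silent == 0:
        return []  # assumption: nothing to assess without non-silent frames

    loud = sum(1 for e in energies if e > loud_thr)
    soft = sum(1 for e in energies if e < soft_thr)

    exhibited: List[str] = []
    if loud / non_silent > loud_ratio_thr:
        exhibited.append("too_loud")
    if soft / non_silent > soft_ratio_thr:
        exhibited.append("too_soft")
    return [ALERTS[name] for name in exhibited]


# Illustrative inputs: a loud utterance and a quiet one.
thresholds = dict(frame_len=2, silence_thr=0.001, soft_thr=0.01,
                  loud_thr=0.5, soft_ratio_thr=0.5, loud_ratio_thr=0.5)
loud_alerts = quality_alerts([0.9, -0.9] * 4, **thresholds)
soft_alerts = quality_alerts([0.05, -0.05] * 4, **thresholds)
```

Keeping the detectors separate from the alert table means a further condition, such as the microphone being held too near the speaker in claims 17 and 18, could be added as another entry without changing the reporting step.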
Specification