Speech end-pointer
Abstract
A rule-based end-pointer isolates spoken utterances contained within an audio stream from background noise and non-speech transients. The rule-based end-pointer includes a plurality of rules to determine the beginning and/or end of a spoken utterance based on various speech characteristics. The rules may analyze an audio stream or a portion of an audio stream based upon an event, a combination of events, the duration of an event, or a duration relative to an event. The rules may be manually or dynamically customized depending upon factors that may include characteristics of the audio stream itself, an expected response contained within the audio stream, or environmental conditions.
20 Claims
1. A speech end-pointer system, comprising:
a computer processor;
a voice triggering module configured to identify a portion of an audio stream comprising a speech segment; and
a rule module in communication with the voice triggering module, the rule module comprising a plurality of rules used by the computer processor to analyze the audio stream and detect a beginning and an end of the speech segment, where the plurality of rules comprises one or more rules based on an energy counter;
where the beginning of the speech segment and the end of the speech segment represent boundaries between speech and non-speech portions of the audio stream; and
where the computer processor is configured to determine whether a frame of the audio stream has energy above a background noise level and increment the energy counter by a length of the frame in response to a determination that the frame has energy above the background noise level.
Dependent claims: 2-11
12. A speech end-pointing method, comprising:
receiving an audio stream;
analyzing energy and noise characteristics of a frame of the audio stream by a computer processor to determine whether the frame has energy above a background noise level;
incrementing an energy counter by a length of the frame in response to a determination by the computer processor that the frame has energy above the background noise level;
incrementing a lack of energy counter by the length of the frame in response to a determination by the computer processor that the frame does not have energy above the background noise level; and
applying a plurality of rules by the computer processor to detect a beginning and an end of a speech segment of the audio stream based on the energy counter and the lack of energy counter.
Dependent claims: 13-19
20. A non-transitory computer-readable medium with instructions stored thereon, where the instructions are executable by a computer processor to cause the computer processor to perform the steps of:
receiving an audio stream;
analyzing energy and noise characteristics of a frame of the audio stream to determine whether the frame has energy above a background noise level;
incrementing an energy counter by a length of the frame in response to a determination that the frame has energy above the background noise level;
incrementing a lack of energy counter by the length of the frame in response to a determination that the frame does not have energy above the background noise level; and
applying a plurality of rules to detect a beginning and an end of a speech segment of the audio stream based on the energy counter and the lack of energy counter.
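The steps recited in claims 12 and 20 can be illustrated with a minimal Python sketch. It is not the patented implementation. The function names, the mean-square energy measure, the two thresholds (`min_speech_ms`, `max_silence_ms`), and the points at which the counters are reset are all assumptions made for illustration; the claims only state that each counter is incremented by the frame length and that rules use both counters to find the beginning and end of a speech segment.

```python
from typing import Iterable, Optional, Sequence, Tuple


def frame_energy(frame: Sequence[float]) -> float:
    """Mean-square energy of one frame (assumed energy measure)."""
    if not frame:
        return 0.0
    return sum(s * s for s in frame) / len(frame)


def find_speech_segment(
    frames: Iterable[Sequence[float]],
    frame_ms: float,
    noise_level: float,
    min_speech_ms: float = 60.0,    # hypothetical rule threshold
    max_silence_ms: float = 200.0,  # hypothetical rule threshold
) -> Optional[Tuple[float, float]]:
    """Return (start_ms, end_ms) of the first speech segment, or None."""
    energy_ms = 0.0     # "energy counter"
    no_energy_ms = 0.0  # "lack of energy counter"
    start = None
    t = 0.0             # time at the end of the current frame

    for frame in frames:
        if frame_energy(frame) > noise_level:
            energy_ms += frame_ms      # increment by the frame length
            no_energy_ms = 0.0         # assumption: energy ends a quiet run
        else:
            no_energy_ms += frame_ms   # increment by the frame length
            if start is None:
                energy_ms = 0.0        # assumption: drop short transients
        t += frame_ms

        if start is None:
            # Beginning rule (assumed): enough continuous energy.
            if energy_ms >= min_speech_ms:
                start = t - energy_ms
        elif no_energy_ms >= max_silence_ms:
            # End rule (assumed): enough continuous lack of energy.
            return (start, t - no_energy_ms)

    if start is not None:
        # Stream ended while speech was still open.
        return (start, t - no_energy_ms)
    return None


quiet = [0.0] * 80
loud = [0.5] * 80
stream = [quiet] * 5 + [loud] * 10 + [quiet] * 30
print(find_speech_segment(stream, frame_ms=10.0, noise_level=0.01))
# (50.0, 150.0)
```

With 10 ms frames, the ten loud frames begin at 50 ms and end at 150 ms, so the sketch reports those boundaries once 200 ms without energy has accumulated. A two-frame burst would never reach the assumed 60 ms threshold and would be ignored as a transient.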
Specification