Speech end-pointer
First Claim
Patent Images
1. An end-pointer that determines a beginning and an end of a speech segment comprising:
- a voice triggering module that identifies a portion of an audio stream comprising an audio speech segment;
a rule module in communication with the voice triggering module, the rule module comprising a plurality of rules used by a processor to analyze a part of the audio stream to detect a beginning and an end of the audio speech segment; and
a consonant detector that calculates a difference between a signal-to-noise ratio in a high frequency band and a signal-to-noise ratio in a low frequency band, where the consonant detector converts the difference between the signal-to-noise ratio in the high frequency band and the signal-to-noise ratio in the low frequency band into a probability value that predicts a likelihood of a high frequency consonant in the portion of the audio stream;
where the beginning of the audio speech segment and the end of the audio speech segment represent boundaries between speech and non-speech portions of the audio stream, and where the rule module identifies the beginning of the audio speech segment or the end of the audio speech segment based on an output of the consonant detector.
8 Assignments
0 Petitions
Accused Products
Abstract
An end-pointer determines a beginning and an end of a speech segment. The end-pointer includes a voice triggering module that identifies a portion of an audio stream that has an audio speech segment. A rule module communicates with the voice triggering module. The rule module includes a plurality of rules used to analyze a part of the audio stream to detect a beginning and an end of the audio speech segment. A consonant detector detects occurrences of a high frequency consonant in the portion of the audio stream.
127 Citations
43 Claims
-
1. An end-pointer that determines a beginning and an end of a speech segment comprising:
-
a voice triggering module that identifies a portion of an audio stream comprising an audio speech segment; a rule module in communication with the voice triggering module, the rule module comprising a plurality of rules used by a processor to analyze a part of the audio stream to detect a beginning and an end of the audio speech segment; and a consonant detector that calculates a difference between a signal-to-noise ratio in a high frequency band and a signal-to-noise ratio in a low frequency band, where the consonant detector converts the difference between the signal-to-noise ratio in the high frequency band and the signal-to-noise ratio in the low frequency band into a probability value that predicts a likelihood of a high frequency consonant in the portion of the audio stream; where the beginning of the audio speech segment and the end of the audio speech segment represent boundaries between speech and non-speech portions of the audio stream, and where the rule module identifies the beginning of the audio speech segment or the end of the audio speech segment based on an output of the consonant detector. - View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15)
-
-
16. A method that identifies a beginning and an end of a speech segment using an end-pointer comprising:
-
receiving a portion of an audio stream; determining whether the portion of the audio stream includes a triggering characteristic; calculating a difference between a signal-to-noise ratio in a high frequency band of the portion of the audio stream and a signal-to-noise ratio in a low frequency band of the portion of the audio stream; converting, by a consonant detector implemented in hardware or embodied in a computer-readable storage medium, the difference between the signal-to-noise ratio in the high frequency band and the signal-to-noise ratio in the low frequency band into a probability value that predicts a likelihood of a high frequency consonant in the portion of the audio stream; and applying a rule that passes only a portion of the audio stream to a device when the triggering characteristic identifies a beginning of a voiced segment and an end of a voiced segment; where the identification of the end of the voiced segment is based on an output of the consonant detector, where the end of the voiced segment represents a boundary between speech and non-speech portions of the audio stream. - View Dependent Claims (17, 18, 19, 20, 21, 22, 23, 24, 25, 26)
-
-
27. A system that identifies a beginning and an end of a speech segment comprising:
-
an end-pointer comprising a processor that analyzes a dynamic aspect of an audio stream to determine the beginning and the end of the speech segment; and a high frequency consonant detector that marks the end of the speech segment, where the high frequency consonant detector calculates a difference between a signal-to-noise ratio in a high frequency band of the audio stream and a signal-to-noise ratio in a low frequency band of the audio stream, and where the high frequency consonant detector converts the difference between the signal-to-noise ratio in the high frequency band and the signal-to-noise ratio in the low frequency band into a probability value that predicts a likelihood that a high frequency consonant exists in a frame of the audio stream; where the beginning of the speech segment and the end of the speech segment represent boundaries between speech and non-speech portions of the audio stream, and where the end-pointer identifies the beginning of the audio speech segment or the end of the audio speech segment based on an output of the high frequency consonant detector. - View Dependent Claims (28, 29, 30, 31, 32, 33)
-
-
34. A system that determines a beginning and an end of an audio speech segment in an audio stream, comprising:
-
an /s/ detector that converts a difference between a signal-to-noise ratio in a high frequency band of the audio stream and a signal-to-noise ratio in a low frequency band of the audio stream into a probability value that predicts a likelihood of an /s/ sound in the audio stream; and an end-pointer comprising a processor that varies an amount of an audio input sent to a recognition device based on a plurality of rules and an output of the /s/ detector; where the end-pointer identifies a beginning of the audio input or an end of the audio input based on the output of the /s/ detector, and where the beginning of the audio input and the end of the audio input represent boundaries between speech and non-speech portions of the audio stream. - View Dependent Claims (35)
-
-
36. A non-transitory computer readable medium that stores software that determines at least one of a beginning and end of an audio speech segment comprising:
-
a detector that converts sound waves into operational signals; a triggering logic that analyzes a periodicity of the operational signals; a signal analysis logic that analyzes a variable portion of the sound waves that are associated with the audio speech segment to determine a beginning and end of the audio speech segment, and a consonant detector that calculates a difference between a signal-to-noise ratio in a high frequency band and a signal-to-noise ratio in a low frequency band, where the consonant detector converts the difference between the signal-to-noise ratio in the high frequency band and the signal-to-noise ratio in the low frequency band into a probability value that predicts a likelihood of an /s/ sound in the sound waves, where the consonant detector provides an input to the signal analysis logic when the /s/ is detected; where the beginning of the audio speech segment and the end of the audio speech segment represent boundaries between speech and non-speech portions of the sound waves, and where the signal analysis module identifies the beginning of the audio speech segment or the end of the audio speech segment based on an output of the consonant detector. - View Dependent Claims (37, 38, 39, 40, 41, 42, 43)
-
Specification