System and Method for Multimodal Utterance Detection
First Claim
1. A computer-implemented speech utterance detection method comprising:
a) generating a plurality of features from an audio stream;
b) obtaining a plurality of time aligned speech segments based on the features;
c) filtering the plurality of speech segments using general speech related knowledge and application specific knowledge to yield at least one candidate segment;
e) finding a desired speech segment from the at least one candidate segment based on multimodal timing information related to the desired speech segment; and
f) outputting the desired speech segment.
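The claimed steps can be illustrated with a minimal sketch. It is not the patented implementation, and every name and parameter in it is an assumption made for illustration: frame energy stands in for the "plurality of features", a fixed energy threshold produces the time-aligned segments, duration bounds stand in for the general and application-specific knowledge, and a single timestamp from another mode (for example, a key press) stands in for the multimodal timing information.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class Segment:
    start: float  # seconds
    end: float    # seconds


def frame_energies(samples: Sequence[float], frame_len: int) -> List[float]:
    """Step a): one feature (mean squared amplitude) per non-overlapping frame."""
    return [
        sum(x * x for x in samples[i:i + frame_len]) / frame_len
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]


def speech_segments(energies: Sequence[float], frame_dur: float,
                    threshold: float) -> List[Segment]:
    """Step b): group consecutive above-threshold frames into timed segments."""
    segments: List[Segment] = []
    start = None
    for i, energy in enumerate(energies):
        if energy > threshold and start is None:
            start = i
        elif energy <= threshold and start is not None:
            segments.append(Segment(start * frame_dur, i * frame_dur))
            start = None
    if start is not None:  # audio ended while still inside a segment
        segments.append(Segment(start * frame_dur, len(energies) * frame_dur))
    return segments


def filter_segments(segments: Sequence[Segment], min_dur: float,
                    max_dur: float) -> List[Segment]:
    """Step c): duration bounds as a stand-in for speech/application knowledge."""
    return [s for s in segments if min_dur <= s.end - s.start <= max_dur]


def find_desired(candidates: Sequence[Segment],
                 event_time: float) -> Optional[Segment]:
    """Step e): pick the candidate closest in time to an event from another mode."""
    def distance(s: Segment) -> float:
        if s.start <= event_time <= s.end:
            return 0.0
        return min(abs(s.start - event_time), abs(s.end - event_time))
    return min(candidates, key=distance) if candidates else None


# Hypothetical input: 100 samples per second, 0.1 s frames.
# A short click at 0.5 s, then speech-like bursts at 1.0-1.5 s and 2.0-2.5 s.
samples = ([0.0] * 50 + [1.0] * 10 + [0.0] * 40 + [1.0] * 50
           + [0.0] * 50 + [1.0] * 50 + [0.0] * 50)
energies = frame_energies(samples, frame_len=10)
segments = speech_segments(energies, frame_dur=0.1, threshold=0.5)
candidates = filter_segments(segments, min_dur=0.2, max_dur=2.0)
desired = find_desired(candidates, event_time=2.1)  # e.g. a key press at 2.1 s
print(desired)  # step f): output the desired segment
```

In this sketch the duration filter discards the short click, and the event timestamp then disambiguates between the two remaining candidates, which is the role the claim assigns to the multimodal timing information.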
Abstract
The disclosure describes a system and method for detecting one or more segments of desired speech utterances in an audio stream using the timings of events from other modes that are correlated with the timings of the desired speech segments. The redundant information from the other modes results in highly accurate and robust utterance detection.
90 Citations
Specification