Systems and methods for implicitly interpreting semantically redundant communication modes
First Claim
1. A system for recognizing and evaluating possible relationships between terms expressed during cross-communication activities, the system comprising:
a memory;
a processor in signal communication with the memory;
a speech recognition system having a speech collection device arranged to receive an ambiguously delimited speech signal and then transcribe the speech signal to a first plurality of sequences of articulatory features related to a portion of the speech signal;
an ink segmentation and recognition system having an ink input receiving device configured to receive an ambiguously delimited digital ink input while the speech recognition system receives the speech portion, the ink segmentation and recognition system further configured to segment ink input that constitutes sketches from ink input that constitutes handwriting and then identify a second plurality of sequences of articulatory features related to the handwriting; and
a multimodal fusion engine in signal communication with the processor, the multimodal fusion engine having a search alignment system configured to substantially align the articulatory features derived from the ambiguously delimited speech signal and the ambiguously delimited ink input using a coherence measure across articulatory feature representations of the ambiguous inputs as between a candidate portion of the speech signal and a candidate portion of the ink input.
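The coherence measure recited in claim 1 can be sketched in a few lines. The sketch below scores agreement between two sub-word unit sequences (here, phoneme-like tokens) using a normalized edit distance; the phoneme labels, function names, and the edit-distance-based scoring are illustrative assumptions, not the patented implementation.

```python
# Minimal sketch (assumptions labeled): score agreement between sub-word
# unit sequences recovered from speech and from handwriting.

def edit_distance(a, b):
    """Classic Levenshtein distance between two token sequences."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        dp[i][0] = i
    for j in range(len(b) + 1):
        dp[0][j] = j
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,          # deletion
                           dp[i][j - 1] + 1,          # insertion
                           dp[i - 1][j - 1] + cost)   # substitution
    return dp[len(a)][len(b)]

def coherence(speech_units, ink_units):
    """1.0 for identical sequences, approaching 0.0 as they diverge."""
    longest = max(len(speech_units), len(ink_units), 1)
    return 1.0 - edit_distance(speech_units, ink_units) / longest

# A presenter says "neuron" while handwriting it; both recognizers emit
# phoneme-like unit sequences (hypothetical recognizer outputs).
speech = ["n", "uh", "r", "aa", "n"]
ink    = ["n", "uh", "r", "ah", "n"]
print(coherence(speech, ink))  # 0.8: one substitution over five units
```

A real fusion engine would search over many candidate spans of each input stream; this sketch shows only the scoring of one candidate pair.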
Abstract
New language constantly emerges from complex, collaborative human-human interactions such as meetings, as when a presenter handwrites a new term on a whiteboard while redundantly speaking it. The system and method described include devices for receiving various types of human communication activity (e.g., speech, writing, and gestures) presented in a multimodally redundant manner; processors and recognizers for segmenting or parsing, and then recognizing, selected sub-word units such as phonemes and syllables; and alignment, refinement, and integration modules for finding a match, or at least an approximate match, to the one or more terms that were presented in the multimodally redundant manner. Once the system has performed a successful integration, one or more terms may be newly enrolled into a database of the system, which permits the system to continuously learn and to provide associations for proper names, abbreviations, acronyms, symbols, and other forms of communicated language.
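The abstract's "enroll on successful integration" idea can be illustrated with a small sketch: when speech and handwriting redundantly present the same out-of-vocabulary term with sufficient agreement, the term is added to a lexicon so the system can recognize it later. The `agreement` measure, the 0.75 threshold, and all names are assumptions for illustration, not the patent's method.

```python
# Illustrative sketch: continuous vocabulary learning from multimodal
# redundancy. Threshold and agreement measure are assumed, not claimed.

lexicon = {}  # spelling -> phoneme-like unit sequence

def agreement(a, b):
    """Fraction of positions where the two unit sequences match."""
    matches = sum(x == y for x, y in zip(a, b))
    return matches / max(len(a), len(b), 1)

def integrate(spelling, speech_phones, ink_phones, threshold=0.75):
    """Enroll the term if the two modes agree closely enough."""
    if agreement(speech_phones, ink_phones) >= threshold:
        lexicon[spelling] = speech_phones
        return True
    return False

# A presenter writes "ReLU" while saying it; both modes agree, so the
# new term is enrolled for future recognition.
integrate("ReLU", ["r", "eh", "l", "uw"], ["r", "eh", "l", "uw"])
print("ReLU" in lexicon)  # True
```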
18 Claims
1. (Set out in full above as the First Claim.) Dependent claims: 2-12, 17, and 18.
13. A method for recognizing and evaluating possible relationships between terms expressed during multiple communication modes, the method comprising:
detecting at least two ambiguously delimited modes of communication selected from the group consisting of speech, handwriting, sketches, and physical gestures;
receiving at least two of the ambiguously delimited modes of communication within a memory of a computational processing system;
determining a time period between a first communication mode and a second communication mode to check for a multimodal redundancy;
within the time period, aligning a plurality of articulatory features of the first communication mode with a plurality of articulatory features of the second communication mode using a coherence measure across the ambiguously delimited articulatory features of the first and second communication modes;
generating a group of hypothesized redundant terms based on the time period and based on the plurality of articulatory features of the first and second communication modes;
reducing a number of the hypothesized redundant terms to populate a matrix of possibly related sub-word units from which a best-fit term is to be selected; and
determining the multimodal redundancy by selecting the best-fit term based at least in part on the coherence measure of the alignment of the first and second communication modes. Dependent claims: 14, 15, and 16.
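The method steps of claim 13 can be sketched end to end under simplifying assumptions: each communication event carries a timestamp and a sub-word unit sequence; event pairs falling within a time window form the hypothesis "matrix"; and the best-fit term is the in-window pair with the highest coherence. The five-second window, the positional-agreement coherence score, and the data shapes are all illustrative assumptions, not the claimed method.

```python
# Sketch of claim 13 (assumptions labeled): window-gated pairing of
# speech and ink events, then best-fit selection by coherence.

WINDOW_SECONDS = 5.0  # assumed redundancy-check time period

def coherence(a, b):
    """Positional agreement between two unit sequences (0.0-1.0)."""
    matches = sum(x == y for x, y in zip(a, b))
    return matches / max(len(a), len(b), 1)

def best_fit(speech_events, ink_events):
    """Each event is (time_seconds, unit_sequence, term)."""
    hypotheses = []  # the matrix of possibly related sub-word units
    for s_time, s_units, s_term in speech_events:
        for i_time, i_units, i_term in ink_events:
            if abs(s_time - i_time) <= WINDOW_SECONDS:  # in-window pair
                hypotheses.append((coherence(s_units, i_units), s_term, i_term))
    if not hypotheses:
        return None  # no candidate pairs within the time period
    score, s_term, i_term = max(hypotheses)  # highest coherence wins
    return s_term if score > 0 else None

# "GAN" is spoken at t=10.0 and handwritten at t=11.5 (in-window);
# the t=40.0 utterance falls outside the window and is ignored.
speech = [(10.0, ["g", "ae", "n"], "GAN"),
          (40.0, ["r", "eh", "l", "uw"], "ReLU")]
ink    = [(11.5, ["g", "ae", "n"], "GAN")]
print(best_fit(speech, ink))  # GAN
```

Hypothesis reduction here is trivial (the window gate); the patent describes richer refinement before best-fit selection.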
Specification