SYSTEM AND METHOD FOR DYNAMIC LEARNING
First Claim
1. A system for recognizing and evaluating possible relationships between terms expressed during cross-communication activities, the system comprising:
a memory;
a processor in signal communication with the memory;
a speech recognition system having a speech collection device arranged to receive a speech portion and then transcribe the speech portion to a first set of sub-word textual sequences related to the speech portion;
an ink recognition system having an ink input receiving device configured to receive written input at least contemporaneously while the speech recognition system receives the speech portion, the ink recognition system further configured to identify a second set of sub-word textual sequences related to the written input; and
a multimodal fusion engine in signal communication with the processor, the multimodal fusion engine comprising:
an alignment system having a plurality of grammar-based phoneme recognizers configured to identify a number of phonetically close terms corresponding to a modally redundant term defined by a temporal relationship between the speech portion and the written input, the grammar-based phoneme recognizers operable to generate a first-pass alignment matrix in which the first set of sub-word textual sequences related to the speech portion are selectively aligned with the second set of sub-word sequences related to the written input;
a refinement system in communication with the alignment system for dynamically modeling the first and second sub-word sequences captured in the alignment matrix by identifying a desired path within the alignment matrix and then modifying the desired path based on temporal boundaries associated with the speech portion and the written input; and
an integration system in communication with the refinement system to select a desired term that is estimated to be a best-fit to the modally redundant term, the integration system configured to generate a normalized match score based on information received at least from the alignment system and the refinement system.
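The alignment and integration elements of claim 1 can be illustrated with a minimal sketch. It assumes the speech and ink recognizers have already produced phoneme sequences, and it substitutes a plain edit-distance matrix for the claimed grammar-based phoneme recognizers. The function names, the phoneme symbols, and the normalization by the longer sequence length are illustrative assumptions, not details taken from the patent.

```python
from typing import Dict, List, Sequence, Tuple


def alignment_matrix(a: Sequence[str], b: Sequence[str]) -> List[List[int]]:
    """Build an edit-distance matrix between two sub-word sequences.

    Stand-in for the claimed first-pass alignment matrix (assumption):
    cell [i][j] is the cheapest cost of aligning a[:i] with b[:j].
    """
    rows, cols = len(a) + 1, len(b) + 1
    m = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        m[i][0] = i
    for j in range(cols):
        m[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            m[i][j] = min(
                m[i - 1][j] + 1,         # unit only in the first sequence
                m[i][j - 1] + 1,         # unit only in the second sequence
                m[i - 1][j - 1] + cost,  # match or substitution
            )
    return m


def normalized_match_score(a: Sequence[str], b: Sequence[str]) -> float:
    """Map the alignment cost to [0, 1]; 1.0 means identical sequences."""
    if not a and not b:
        return 1.0
    distance = alignment_matrix(a, b)[len(a)][len(b)]
    return 1.0 - distance / max(len(a), len(b))


def best_fit(speech: Sequence[str],
             ink_candidates: Dict[str, List[str]]) -> Tuple[str, float]:
    """Pick the handwritten candidate whose phonemes best match the speech."""
    scored = [(normalized_match_score(speech, phonemes), term)
              for term, phonemes in ink_candidates.items()]
    score, term = max(scored)
    return term, score
```

For example, calling `best_fit(["f", "r", "eh", "d"], {"Fred": ["f", "r", "eh", "d"], "Feed": ["f", "iy", "d"]})` selects "Fred", since its phoneme sequence aligns with the spoken one at no cost. The refinement step of the claim, which adjusts the path using temporal boundaries, is not modeled here.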
Abstract
New language constantly emerges from complex, collaborative human-human interactions such as meetings, for example when a presenter handwrites a new term on a whiteboard while also saying it aloud. The described system and method include devices for receiving various types of human communication activities (e.g., speech, writing, and gestures) presented in a multimodally redundant manner; processors and recognizers for segmenting or parsing the input and then recognizing selected sub-word units such as phonemes and syllables; and alignment, refinement, and integration modules that find a match, or at least an approximate match, to the one or more terms that were presented in the multimodally redundant manner. Once the system has performed a successful integration, one or more terms may be newly enrolled into a database of the system, which permits the system to continuously learn and provide an association for proper names, abbreviations, acronyms, symbols, and other forms of communicated language.
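The enrollment step mentioned at the end of the abstract might look like the following minimal sketch. The `Lexicon` class, its in-memory dictionary, and the 0.8 acceptance threshold are hypothetical choices made for illustration; the abstract does not say how the database is organized or what counts as a successful integration.

```python
from typing import Dict, List, Tuple


class Lexicon:
    """Toy term database mapping a spelling to its known pronunciations."""

    def __init__(self, threshold: float = 0.8) -> None:
        # Assumed cutoff: integrations scoring below it are not enrolled.
        self.threshold = threshold
        self.entries: Dict[str, List[Tuple[str, ...]]] = {}

    def enroll(self, spelling: str, pronunciation: Tuple[str, ...],
               score: float) -> bool:
        """Store the term if the integration score is high enough."""
        if score < self.threshold:
            return False
        known = self.entries.setdefault(spelling, [])
        if pronunciation not in known:
            known.append(pronunciation)
        return True
```

Keeping a list of pronunciations per spelling lets an abbreviation or proper name accumulate variants as it is heard again in later sessions.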
11 Claims
8. A method for recognizing and evaluating possible relationships between terms expressed during multiple communication modes, the method comprising:
detecting at least two modes of communication selected from the group consisting of speech, writing, and physical gestures;
receiving at least two of the modes of communication within a memory of a computational processing system;
determining a time period between a first communication mode and a second communication mode;
aligning a selected feature of the first communication mode with a selected feature of the second communication mode;
generating a group of hypothesized redundant terms based on the time period and based on the selected features of the first and second communication modes;
reducing a number of the hypothesized redundant terms to populate a matrix of possibly related sub-word units from which a best-fit term is to be selected; and
selecting the best-fit term based at least in part on a multimodal redundancy between the first communication mode and the second communication mode.
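The steps of claim 8 can be sketched as a small pipeline, offered only as an illustration under stated assumptions. The `Event` record, the 2-second window (`max_gap`), the 0.5 similarity cutoff (`min_sim`), and the use of `difflib.SequenceMatcher` as the feature comparison are all hypothetical; the claim does not specify any of them.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import List, Optional, Tuple


@dataclass
class Event:
    mode: str               # e.g. "speech" or "writing"
    units: Tuple[str, ...]  # recognized sub-word units (assumed phonemes)
    start: float            # seconds
    end: float              # seconds


def time_gap(a: Event, b: Event) -> float:
    """Seconds separating two events; 0.0 when they overlap."""
    return max(0.0, max(a.start, b.start) - min(a.end, b.end))


def similarity(a: Tuple[str, ...], b: Tuple[str, ...]) -> float:
    """Rough sub-word similarity in [0, 1] (stand-in for feature alignment)."""
    return SequenceMatcher(None, a, b).ratio()


def select_best_fit(speech: List[Event], ink: Event,
                    max_gap: float = 2.0,
                    min_sim: float = 0.5) -> Optional[Tuple[Event, float]]:
    """Hypothesize, reduce, and select a spoken term redundant with the ink."""
    # Generate hypotheses: spoken events close enough in time to the writing.
    hypotheses = [s for s in speech if time_gap(s, ink) <= max_gap]
    # Reduce: keep only hypotheses whose sub-word units resemble the ink's.
    scored = [(similarity(s.units, ink.units), s) for s in hypotheses]
    scored = [(score, s) for score, s in scored if score >= min_sim]
    if not scored:
        return None
    # Select: the highest-scoring remaining hypothesis is the best fit.
    score, best = max(scored, key=lambda pair: pair[0])
    return best, score
```

Filtering on time first keeps the more expensive sub-word comparison limited to utterances that could plausibly be redundant with the handwriting.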
Specification