Systems and methods for implicitly interpreting semantically redundant communication modes
First Claim
1. A system for recognizing and evaluating possible relationships between terms expressed during cross-communication activities, the system comprising:
a memory;
a processor in signal communication with the memory;
a speech recognition system having a speech collection device arranged to receive an ambiguously delimited speech signal and then transcribe the speech signal to a first plurality of sequences of articulatory features related to a portion of the speech signal;
an ink segmentation and recognition system having an ink input receiving device configured to receive an ambiguously delimited digital ink input while the speech recognition system receives the speech portion, the ink segmentation and recognition system further configured to segment ink input that constitutes sketches from ink input that constitutes handwriting and then identify a second plurality of sequences of articulatory features related to the handwriting; and
a multimodal fusion engine in signal communication with the processor, the multimodal fusion engine having a search alignment system configured to substantially align the articulatory features derived from the ambiguously delimited speech signal and the ambiguously delimited ink input using a coherence measure across articulatory feature representations of the ambiguous inputs as between a candidate portion of the speech signal and a candidate portion of the ink input.
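The coherence measure recited in claim 1 can be sketched in a few lines. The sketch below scores agreement between two sub-word unit sequences (here, phoneme-like tokens) using a normalized edit distance; the phoneme labels, function names, and the edit-distance-based scoring are illustrative assumptions, not the patented implementation.

```python
# Minimal sketch (assumptions labeled): score agreement between sub-word
# unit sequences recovered from speech and from handwriting.

def edit_distance(a, b):
    """Classic Levenshtein distance between two token sequences."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        dp[i][0] = i
    for j in range(len(b) + 1):
        dp[0][j] = j
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,          # deletion
                           dp[i][j - 1] + 1,          # insertion
                           dp[i - 1][j - 1] + cost)   # substitution
    return dp[len(a)][len(b)]

def coherence(speech_units, ink_units):
    """1.0 for identical sequences, approaching 0.0 as they diverge."""
    longest = max(len(speech_units), len(ink_units), 1)
    return 1.0 - edit_distance(speech_units, ink_units) / longest

# A presenter says "neuron" while handwriting it; both recognizers emit
# phoneme-like unit sequences (hypothetical recognizer outputs).
speech = ["n", "uh", "r", "aa", "n"]
ink    = ["n", "uh", "r", "ah", "n"]
print(coherence(speech, ink))  # 0.8: one substitution over five units
```

A real fusion engine would search over many candidate spans of each input stream; this sketch shows only the scoring of one candidate pair.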
Abstract
New language constantly emerges from complex, collaborative human-human interactions such as meetings, as when a presenter handwrites a new term on a whiteboard while redundantly speaking it. The system and method described include devices for receiving various types of human communication activity (e.g., speech, writing, and gestures) presented in a multimodally redundant manner; processors and recognizers for segmenting or parsing, and then recognizing, selected sub-word units such as phonemes and syllables; and alignment, refinement, and integration modules for finding a match, or at least an approximate match, to the one or more terms that were presented in the multimodally redundant manner. Once the system has performed a successful integration, one or more terms may be newly enrolled into a database of the system, which permits the system to continuously learn and to provide associations for proper names, abbreviations, acronyms, symbols, and other forms of communicated language.
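The abstract's "enroll on successful integration" idea can be illustrated with a small sketch: when speech and handwriting redundantly present the same out-of-vocabulary term with sufficient agreement, the term is added to a lexicon so the system can recognize it later. The `agreement` measure, the 0.75 threshold, and all names are assumptions for illustration, not the patent's method.

```python
# Illustrative sketch: continuous vocabulary learning from multimodal
# redundancy. Threshold and agreement measure are assumed, not claimed.

lexicon = {}  # spelling -> phoneme-like unit sequence

def agreement(a, b):
    """Fraction of positions where the two unit sequences match."""
    matches = sum(x == y for x, y in zip(a, b))
    return matches / max(len(a), len(b), 1)

def integrate(spelling, speech_phones, ink_phones, threshold=0.75):
    """Enroll the term if the two modes agree closely enough."""
    if agreement(speech_phones, ink_phones) >= threshold:
        lexicon[spelling] = speech_phones
        return True
    return False

# A presenter writes "ReLU" while saying it; both modes agree, so the
# new term is enrolled for future recognition.
integrate("ReLU", ["r", "eh", "l", "uw"], ["r", "eh", "l", "uw"])
print("ReLU" in lexicon)  # True
```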
18 Claims
1. (Set out in full above as the First Claim.) Dependent claims: 2-12, 17, and 18.
13. A method for recognizing and evaluating possible relationships between terms expressed during multiple communication modes, the method comprising:
detecting at least two ambiguously delimited modes of communication selected from the group consisting of speech, handwriting, sketches, and physical gestures;
receiving at least two of the ambiguously delimited modes of communication within a memory of a computational processing system;
determining a time period between a first communication mode and a second communication mode to check for a multimodal redundancy;
within the time period, aligning a plurality of articulatory features of the first communication mode with a plurality of articulatory features of the second communication mode using a coherence measure across the ambiguously delimited articulatory features of the first and second communication modes;
generating a group of hypothesized redundant terms based on the time period and based on the plurality of articulatory features of the first and second communication modes;
reducing a number of the hypothesized redundant terms to populate a matrix of possibly related sub-word units from which a best-fit term is to be selected; and
determining the multimodal redundancy by selecting the best-fit term based at least in part on the coherence measure of the alignment of the first and second communication modes. Dependent claims: 14, 15, and 16.
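The method steps of claim 13 can be sketched end to end under simplifying assumptions: each communication event carries a timestamp and a sub-word unit sequence; event pairs falling within a time window form the hypothesis "matrix"; and the best-fit term is the in-window pair with the highest coherence. The five-second window, the positional-agreement coherence score, and the data shapes are all illustrative assumptions, not the claimed method.

```python
# Sketch of claim 13 (assumptions labeled): window-gated pairing of
# speech and ink events, then best-fit selection by coherence.

WINDOW_SECONDS = 5.0  # assumed redundancy-check time period

def coherence(a, b):
    """Positional agreement between two unit sequences (0.0-1.0)."""
    matches = sum(x == y for x, y in zip(a, b))
    return matches / max(len(a), len(b), 1)

def best_fit(speech_events, ink_events):
    """Each event is (time_seconds, unit_sequence, term)."""
    hypotheses = []  # the matrix of possibly related sub-word units
    for s_time, s_units, s_term in speech_events:
        for i_time, i_units, i_term in ink_events:
            if abs(s_time - i_time) <= WINDOW_SECONDS:  # in-window pair
                hypotheses.append((coherence(s_units, i_units), s_term, i_term))
    if not hypotheses:
        return None  # no candidate pairs within the time period
    score, s_term, i_term = max(hypotheses)  # highest coherence wins
    return s_term if score > 0 else None

# "GAN" is spoken at t=10.0 and handwritten at t=11.5 (in-window);
# the t=40.0 utterance falls outside the window and is ignored.
speech = [(10.0, ["g", "ae", "n"], "GAN"),
          (40.0, ["r", "eh", "l", "uw"], "ReLU")]
ink    = [(11.5, ["g", "ae", "n"], "GAN")]
print(best_fit(speech, ink))  # GAN
```

Hypothesis reduction here is trivial (the window gate); the patent describes richer refinement before best-fit selection.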
Specification