Systems and methods for extracting meaning from multimodal inputs using finite-state devices

US 8,214,212 B2
Filed: 11/08/2011
Issued: 07/03/2012
Est. Priority Date: 07/12/2001
Status: Expired due to Fees

First Claim

1. A method of recognizing an utterance comprising:

receiving an utterance comprising a first portion having a first mode and a second portion having a second mode;

generating, at a multimodal recognition system comprising a processor, a first mode recognition lattice and a first finite-state transducer based on the first mode; and

generating a recognition model comprising:

relating the first portion of the utterance to the second portion of the utterance using the first finite-state transducer; and

generating a second finite-state transducer, the second finite-state transducer comprising a gesture and speech recognition model finite-state transducer, based on the first mode recognition lattice and the first finite-state transducer.

17 Assignments

0 Petitions

Abstract

Multimodal utterances contain a number of different modes. These modes can include speech, gestures, and pen, haptic, and gaze inputs, and the like. This invention uses recognition results from one or more of these modes to provide compensation to the recognition process of one or more of the other modes. In various exemplary embodiments, a multimodal recognition system inputs one or more recognition lattices from one or more of these modes, and generates one or more models to be used by one or more mode recognizers to recognize the one or more other modes. In one exemplary embodiment, a gesture recognizer inputs a gesture input and outputs a gesture recognition lattice to a multimodal parser. The multimodal parser generates a language model and outputs it to an automatic speech recognition system, which uses the received language model to recognize the speech input that corresponds to the recognized gesture input.
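
The flow in the last embodiment (gesture lattice in, language model out, speech recognized against that model) can be pictured with a toy sketch. Everything in it is hypothetical: the gesture labels, vocabulary, scores, and function names are invented for illustration and do not come from the patent, and the "language model" is reduced to a per-word weight table rather than a finite-state device.

```python
# Toy illustration only: labels, vocabulary, and scores are invented.

# Hypothetical gesture recognition lattice, flattened to scored alternatives.
gesture_lattice = [("point_at_restaurant", 0.7), ("point_at_theatre", 0.3)]

# Hypothetical table of speech words that each gesture label makes plausible.
GESTURE_TO_WORDS = {
    "point_at_restaurant": {"this", "restaurant"},
    "point_at_theatre": {"this", "theatre"},
}


def build_language_model(lattice):
    """Stand-in for the multimodal parser: weight each word by the best
    score of any gesture hypothesis that licenses it."""
    model = {}
    for label, score in lattice:
        for word in GESTURE_TO_WORDS[label]:
            model[word] = max(model.get(word, 0.0), score)
    return model


def rescore(speech_hypotheses, model, floor=0.01):
    """Stand-in for the speech recognizer: scale each hypothesis's acoustic
    score by the model weight of every word in it, then pick the best."""
    def total(hypothesis):
        words, acoustic = hypothesis
        score = acoustic
        for word in words:
            score *= model.get(word, floor)  # unlicensed words get the floor
        return score
    return max(speech_hypotheses, key=total)


# Two competing speech hypotheses; the first is acoustically preferred.
speech_hypotheses = [
    (("this", "rest", "around"), 0.55),
    (("this", "restaurant"), 0.45),
]

best = rescore(speech_hypotheses, build_language_model(gesture_lattice))
print(" ".join(best[0]))
```

In this sketch the gesture evidence overrides the small acoustic preference for the first hypothesis, which is the kind of cross-mode compensation the abstract describes.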

18 Claims

1. A method of recognizing an utterance comprising:
- receiving an utterance comprising a first portion having a first mode and a second portion having a second mode;
- generating, at a multimodal recognition system comprising a processor, a first mode recognition lattice and a first finite-state transducer based on the first mode; and
- generating a recognition model comprising:
  - relating the first portion of the utterance to the second portion of the utterance using the first finite-state transducer; and
  - generating a second finite-state transducer, the second finite-state transducer comprising a gesture and speech recognition model finite-state transducer, based on the first mode recognition lattice and the first finite-state transducer.
- Dependent claims (2, 3, 4, 5):
  - 2. The method of claim 1, further comprising:
    - recognizing the second mode using the recognition model.
  - 3. The method of claim 1 wherein the first mode is a gesture mode.
  - 4. The method of claim 1 wherein the second mode is a speech mode.
  - 5. The method of claim 1 further comprising:
    - generating the first mode recognition lattice based on a first mode feature lattice.

6. A method of recognizing an utterance comprising:
- receiving an utterance comprising a plurality of modes; and
- generating, at a multimodal recognition system comprising a processor, a recognition model comprising:
  - relating a first portion of the utterance comprising a first mode to a second portion of the utterance comprising a second mode using a first finite-state transducer; and
  - generating a second finite-state transducer, the second finite-state transducer comprising a gesture and speech recognition model finite-state transducer, based on a first mode recognition lattice and the first finite-state transducer.
- Dependent claims (7, 8, 9, 10, 11, 12):
  - 7. The method of claim 6 further comprising:
    - generating the first mode recognition lattice and the first finite-state transducer based on the first mode.
  - 8. The method of claim 7 further comprising:
    - recognizing the second mode using the recognition model.
  - 9. The method of claim 7 wherein the first mode is a gesture mode.
  - 10. The method of claim 7 wherein the second mode is a speech mode.
  - 11. The method of claim 7 further comprising:
    - generating the first mode recognition lattice based on a first mode feature lattice.
  - 12. The method of claim 11 further comprising:
    - generating the first mode feature lattice based on the utterance.

13. A method of extracting meaning from multimodal inputs comprising:
- receiving a multimodal input; and
- generating, at a multimodal recognition system comprising a processor, a recognition model comprising:
  - relating a first portion of the multimodal input comprising a first mode to a second portion of the multimodal input comprising a second mode using a first finite-state transducer; and
  - generating a second finite-state transducer, the second finite-state transducer comprising a gesture and speech recognition model finite-state transducer, based on a first mode recognition lattice and the first finite-state transducer.
- Dependent claims (14, 15, 16, 17, 18):
  - 14. The method of claim 13 further comprising:
    - generating the first mode recognition lattice and the first finite-state transducer based on the first mode.
  - 15. The method of claim 14 wherein the first mode is a gesture mode.
  - 16. The method of claim 14 wherein the second mode is a speech mode.
  - 17. The method of claim 14 further comprising:
    - generating the first mode recognition lattice based on a first mode feature lattice.
  - 18. The method of claim 17 further comprising:
    - generating the first mode feature lattice based on the multimodal input.
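
Each independent claim derives a second finite-state transducer from a first-mode recognition lattice and a first finite-state transducer that relates the two modes. The claims say only "based on"; one plausible reading, assumed here, is standard transducer composition. The sketch below is a minimal, unweighted composition without epsilon handling. The tuple layout, function names, states, gesture symbols, and words are all invented for illustration and are not taken from the patent.

```python
from collections import deque

# Assumed representation (not from the patent):
#   machine = (start_state, final_states, arcs)
#   lattice arcs:    (src, symbol, dst)
#   transducer arcs: (src, in_symbol, out_symbol, dst)


def compose(lattice, fst):
    """Compose an acceptor (the lattice) with a transducer.

    The result pairs each gesture symbol in the lattice with the speech
    word the transducer relates it to. No weights, no epsilon arcs.
    """
    l_start, l_finals, l_arcs = lattice
    t_start, t_finals, t_arcs = fst
    start = (l_start, t_start)
    arcs, finals = [], set()
    seen, queue = {start}, deque([start])
    while queue:
        state = queue.popleft()
        l_state, t_state = state
        if l_state in l_finals and t_state in t_finals:
            finals.add(state)
        for a_src, symbol, a_dst in l_arcs:
            if a_src != l_state:
                continue
            for b_src, b_in, b_out, b_dst in t_arcs:
                if b_src == t_state and b_in == symbol:
                    nxt = (a_dst, b_dst)
                    arcs.append((state, symbol, b_out, nxt))
                    if nxt not in seen:
                        seen.add(nxt)
                        queue.append(nxt)
    return start, finals, arcs


def output_strings(machine):
    """Enumerate the output-side word sequences of an acyclic machine."""
    start, finals, arcs = machine
    results, stack = set(), [(start, ())]
    while stack:
        state, words = stack.pop()
        if state in finals:
            results.add(words)
        for src, _in_symbol, out_symbol, dst in arcs:
            if src == state:
                stack.append((dst, words + (out_symbol,)))
    return results


# Hypothetical gesture lattice: "point" or "area", then a restaurant entity.
gesture_lattice = (0, {2}, [(0, "point", 1), (0, "area", 1), (1, "restaurant", 2)])

# Hypothetical gesture-to-speech transducer: singular after "point",
# plural after "area".
gesture_to_speech = (0, {3}, [
    (0, "point", "this", 1),
    (0, "area", "these", 2),
    (1, "restaurant", "restaurant", 3),
    (2, "restaurant", "restaurants", 3),
])

gesture_speech_fst = compose(gesture_lattice, gesture_to_speech)
for words in sorted(output_strings(gesture_speech_fst)):
    print(" ".join(words))
```

Projecting the composed machine onto its output side, as `output_strings` does here, gives the word sequences consistent with the gesture lattice. A production system would more likely use a weighted FST toolkit than hand-rolled tuples.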

Current Assignee: Interactions, LLC
Original Assignee: AT&T Intellectual Property I LP (AT&T, Inc.)
Inventors: Bangalore, Srinivas; Johnston, Michael J.
Primary Examiner(s): Colucci, Michael
Application Number: US13/291,427
Publication Number: US 20120116768A1
Time in Patent Office: 238 Days
Field of Search: 704/9, 704/251, 704/256, 704/243, 704/255, 704/270, 706/11, 715/809, 715/863
US Class Current: 704/249
CPC Class Codes:
G06F 3/167   Audio in a user interface, ...
G06V 40/28   Recognition of hand or arm ...
G10L 15/00   Speech recognition G10L17/0...
G10L 15/24   Speech recognition using no...
