Multimodal speech recognition system

US 8,355,915 B2
Filed: 11/30/2007
Issued: 01/15/2013
Est. Priority Date: 11/30/2006
Status: Active Grant

Abstract

The disclosure describes an overall system/method for text-input using a multimodal interface with speech recognition. Specifically, pluralities of modes interact with the main speech mode to provide the speech-recognition system with partial knowledge of the text corresponding to the spoken utterance forming the input to the speech recognition system. The knowledge from other modes is used to dynamically change the ASR system's active vocabulary thereby significantly increasing recognition accuracy and significantly reducing processing requirements. Additionally, the speech recognition system is configured using three different system configurations (always listening, partially listening, and push-to-speak) and for each one of those three different user-interfaces are proposed (speak-and-type, type-and-speak, and speak-while-typing). Finally, the overall user-interface of the proposed system is designed such that it enhances existing standard text-input methods; thereby minimizing the behavior change for mobile users.
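
As an illustration only, the idea of using non-speech input to shrink the recognizer's active vocabulary can be sketched as a prefix filter over a word list. This is a minimal sketch under stated assumptions and not the patented method: the function name `active_vocabulary`, the toy `LEXICON`, and the use of typed leading letters as the "partial knowledge" are all hypothetical, and the acoustic networks named in the claims are not modeled here.

```python
from typing import Iterable, List


def active_vocabulary(vocabulary: Iterable[str], typed_prefix: str) -> List[str]:
    """Keep only the words consistent with the letters typed so far.

    Hypothetical helper: a recognizer restricted to the returned list
    has fewer candidates to score than one searching the full vocabulary.
    """
    prefix = typed_prefix.lower()
    return [word for word in vocabulary if word.lower().startswith(prefix)]


# Toy lexicon, invented for this example.
LEXICON = ["meet", "meeting", "message", "monday", "today", "tomorrow"]

print(active_vocabulary(LEXICON, "me"))  # ['meet', 'meeting', 'message']
```

With an empty prefix the full lexicon stays active, which corresponds to speech arriving before any letters have been typed.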

45 Citations

19 Claims

1. A multimodal system for receiving inputs via more than one mode from a user and for interpretation and display of text based upon the inputs received via the more than one modes, the system comprising:
- a) a user input device having a plurality of modes, one mode accepting speech input and the remaining modes accepting entry of non-speech input;
  
  b) a memory containing a plurality of acoustic networks, each of the plurality of acoustic networks being associated with at least one mode; and
  
  c) a processor to;
  
  i) process the speech input and at least one non-speech input accepted by at least one of the remaining modes;
  
  ii) dynamically adapting an acoustic network based on the speech input and the at least one non-speech input;
  
  iii) perform automatic speech recognition using the dynamically adapted acoustic network;
  
  iv) determine an output based on the automatic speech recognition; and
  
  v) return the output to aid in a determination of a subsequent user-action.
- - 2. The system of claim 1, wherein the dynamically adapted acoustic network is compiled during system design.
  - 3. The system of claim 1, wherein the system is selectively configured to receive speech input prior to receiving non-speech input.
  - 4. The system of claim 3, further comprising a user-interface for selecting a configuration out of a set of configurations comprising:
    - a) a Push-to-Speak Configuration, wherein the system waits for a signal before beginning processing speech, the signal resulting from a manual push-to-speak button;
      
      b) a Partially Listening Configuration, wherein the system begins processing speech based on a user-implied signal, the user-implied signal resulting from pressing a space-bar; and
      
      c) an Always Listening Configuration, wherein the system simultaneously processes speech input along with at least one non-speech input.
  - 5. The system of claim 1, wherein the speech input is stored in memory for subsequent error correction.
  - 6. The system of claim 1, wherein the user input device comprises one from a set including a mobile device, a mobile phone, a smartphone, an Ultra Mobile PC, a Laptop, and a Palmtop.
  - 7. The system of claim 1 wherein the system automatically defaults to a pure text prediction system in the event that the speech input from the user is unusable.
  - 8. The system of claim 1, wherein the system includes an option that configures the system as a hands-free system by using the speech input for non-speech input.
  - 9. The system of claim 1, wherein the system is hardware platform independent.
  - 10. The system of claim 1, wherein the dynamically adapted acoustic network is compiled at system start-up.
  - 11. The system of claim 1, wherein the dynamically adapted acoustic network is compiled by the system during run-time.
  - 12. The system of claim 1, wherein the system is selectively configured to receive speech input after receiving at least one non-speech input.
  - 13. The system of claim 1, wherein the system is selectively configured to receive speech input while receiving at least one non-speech input.
  - 14. The system of claim 1, wherein the non-speech input includes a visual input.
  - 15. The system of claim 1, wherein the non-speech input includes a character input.
  - 16. The system of claim 1, wherein the mechanism to feed back the output comprises a visual display.
  - 17. The system of claim 1, wherein the mechanism to feed back the output comprises an audible speaker.
  - 18. The system of claim 1, wherein the system is configured to automatically switch between a text-prediction mode and a speech-recognition mode.
  - 19. The system of claim 1, wherein the system is software/operating system independent.
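
The three configurations in claim 4 and the fallback in claim 7 can be pictured with a short sketch. This is an assumption-laden illustration rather than the claimed implementation: the enum `ListeningConfig`, the event strings, and the functions `should_start_speech_processing` and `select_mode` are hypothetical names introduced only for this example.

```python
from enum import Enum, auto


class ListeningConfig(Enum):
    """Hypothetical labels for the three configurations in claim 4."""
    PUSH_TO_SPEAK = auto()
    PARTIALLY_LISTENING = auto()
    ALWAYS_LISTENING = auto()


def should_start_speech_processing(config: ListeningConfig, event: str) -> bool:
    """Decide whether a user event triggers speech processing.

    The event names are assumptions made for this sketch.
    """
    if config is ListeningConfig.PUSH_TO_SPEAK:
        # Waits for an explicit signal from a manual push-to-speak button.
        return event == "push_to_speak_button"
    if config is ListeningConfig.PARTIALLY_LISTENING:
        # A space-bar press serves as the user-implied signal.
        return event == "space_bar"
    # ALWAYS_LISTENING: speech is processed alongside any non-speech input.
    return True


def select_mode(speech_usable: bool) -> str:
    """Fall back to text prediction when speech is unusable (claim 7)."""
    return "speech-recognition" if speech_usable else "text-prediction"


print(should_start_speech_processing(ListeningConfig.PARTIALLY_LISTENING, "space_bar"))  # True
print(select_mode(speech_usable=False))  # text-prediction
```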

Current Assignee: Ashwin P. Rao
Original Assignee: Ashwin P. Rao
Inventors: Rao, Ashwin P.
Primary Examiner(s): YEN, ERIC L
Application Number: US11/948,757
Publication Number: US 20080133228A1
Time in Patent Office: 1,873 Days
Field of Search: 704/270-275, 704/243, 704/244, 704/255, 704/256, 704/270.1
US Class Current: 704/244
CPC Class Codes:
G10L 15/24 Speech recognition using no...
G10L 15/32 Multiple recognisers used i...
