Multimodal speech recognition system

US 20080133228A1
Filed: 11/30/2007
Published: 06/05/2008
Est. Priority Date: 11/30/2006
Status: Active Grant

Abstract

The disclosure describes an overall system and method for text input using a multimodal interface with speech recognition. Specifically, a plurality of modes interact with the main speech mode to give the speech recognition system partial knowledge of the text corresponding to the spoken utterance that forms its input. The knowledge from the other modes is used to dynamically change the ASR system's active vocabulary, thereby significantly increasing recognition accuracy and significantly reducing processing requirements. Additionally, the speech recognition system can be set up in three different configurations (always listening, partially listening, and push-to-speak), and for each of these configurations three different user interfaces are proposed (speak-and-type, type-and-speak, and speak-while-typing). Finally, the overall user interface of the proposed system is designed to enhance existing standard text-input methods, thereby minimizing the behavior change required of mobile users.
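The abstract's central idea, letters typed in another mode narrowing the recognizer's active vocabulary, can be illustrated with a toy example. This is a minimal sketch, not the patented implementation: the lexicon, the per-word scores standing in for acoustic-model log-likelihoods, and the helper names `active_vocabulary` and `recognize` are all hypothetical, and it assumes the typed letters are the leading letters of the intended word.

```python
from typing import Dict, List


def active_vocabulary(lexicon: List[str], typed_letters: str) -> List[str]:
    """Keep only the words consistent with the letters typed so far.

    Assumption (hypothetical): typed letters are the word's leading letters.
    """
    prefix = typed_letters.lower()
    return [word for word in lexicon if word.lower().startswith(prefix)]


def recognize(acoustic_scores: Dict[str, float], vocabulary: List[str]) -> str:
    """Return the highest-scoring word, searching only the given vocabulary.

    `acoustic_scores` is a hypothetical stand-in for per-word ASR log-likelihoods.
    """
    if not vocabulary:
        raise ValueError("no word in the lexicon matches the typed letters")
    return max(vocabulary, key=lambda word: acoustic_scores.get(word, float("-inf")))


if __name__ == "__main__":
    lexicon = ["meet", "meat", "mete", "neat", "need", "knead"]
    # Hypothetical scores: acoustically, "neat" narrowly beats "meet".
    scores = {"meet": -4.2, "meat": -4.5, "mete": -6.0,
              "neat": -4.1, "need": -5.0, "knead": -5.5}

    print(recognize(scores, lexicon))                          # → neat (speech alone)
    print(recognize(scores, active_vocabulary(lexicon, "m")))  # → meet (speech + typed "m")
    print(len(active_vocabulary(lexicon, "m")), "of", len(lexicon), "words searched")
    # → 3 of 6 words searched
```

In this toy case, restricting the search to words that begin with the typed letter both corrects the result and halves the number of words the recognizer must score, which mirrors in spirit the accuracy and processing benefits the abstract claims.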

9 Claims

1. A multimodal system for receiving inputs via more than one mode from a user and for interpretation and display of the text intended by the inputs received through those multiple modes, the system comprising:

a) a user input device having a plurality of modes, with one mode for accessing speech and each of the remaining modes being associated with an input method for entering letters or characters or words or phrases or sentences or graphics or image or any combination;

b) a memory containing a plurality of acoustic networks, each of the plurality of acoustic networks associated with one or more inputs from the one or more modes;

c) a mechanism to feed back the system output to a user of the input device, in the form of a visual display or an audible speaker; and

d) a processor to:

i) process the inputs received via the multiple modes and select and/or build an appropriate acoustic network;

ii) perform automatic speech recognition using the selected/built acoustic network; and

iii) return the recognition outputs to the user for subsequent user-actions.

2. The system of claim 1, wherein one or more acoustic networks are compiled during system design and/or compiled at system start-up and/or compiled by the system during run-time, or a combination.

3. The system of claim 1, wherein the mode for accessing speech from the user has different types of user interfaces, such that the user could selectively:

a) speak before providing letter or character or any other visual inputs;

b) speak after providing letter or character or any other visual inputs; and/or

c) speak while providing letter or character or any other visual inputs.

4. The system of claim 3, wherein each of the user interfaces accessing speech from the user is designed using three different configurations including:

a) a Push-to-Speak Configuration, wherein the system waits for a signal to begin processing speech, as with a manual push-to-speak button;

b) a Partially Listening Configuration, wherein the system begins processing speech based on a user-implied signal, such as pressing the space bar to begin processing and typing a letter to end processing; and

c) an Always Listening Configuration, wherein the system processes speech simultaneously with inputs from the other modes.

5. The system of claim 1, wherein the speech input, or its equivalent signal representation such as features, is stored in memory for subsequent speech recognition or text processing, enabling users to speak only once and still perform subsequent error correction.

6. The system of claim 1, wherein the user input device is a mobile device, a mobile phone, a smartphone, an Ultra Mobile PC, a laptop, and/or a palmtop.

7. The system of claim 1, wherein the system automatically or manually defaults to a pure text-prediction system in the event that the speech input from the user is either absent or unreliable, and the system has the ability to automatically switch back and forth between text prediction and speech recognition based on the design of the system set by the system designer or end user(s).

8. The system of claim 1, wherein the system includes an option to be configured as a hands-free system by using the speech mode for entering letters or characters or words or phrases or sentences or graphics or image or a combination.

9. The system of claim 1, wherein the system is independent of the underlying hardware platform and/or the underlying software/operating system.
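
The three configurations in claim 4 differ mainly in what starts and stops speech processing. The following is a minimal sketch of that behavior as a small state machine, not the patent's implementation: the event names (`"ptt_down"`, `"ptt_up"`, `"space"`, single letters), the function name `listening_states`, and the assumption that push-to-speak stops when the button is released are all hypothetical, since the claim only specifies the start signal for that configuration.

```python
from enum import Enum, auto
from typing import Iterable, List


class Config(Enum):
    PUSH_TO_SPEAK = auto()
    PARTIALLY_LISTENING = auto()
    ALWAYS_LISTENING = auto()


def listening_states(config: Config, events: Iterable[str]) -> List[bool]:
    """Return whether speech is being processed after each input event.

    Hypothetical event names: "ptt_down"/"ptt_up" for a push-to-speak button,
    "space" for the space bar, and any single letter for a typed key.
    """
    listening = config is Config.ALWAYS_LISTENING
    states: List[bool] = []
    for event in events:
        if config is Config.PUSH_TO_SPEAK:
            # Explicit manual signal; stopping on release is an assumption.
            if event == "ptt_down":
                listening = True
            elif event == "ptt_up":
                listening = False
        elif config is Config.PARTIALLY_LISTENING:
            # User-implied signal: space bar starts processing, a typed letter ends it.
            if event == "space":
                listening = True
            elif len(event) == 1 and event.isalpha():
                listening = False
        # ALWAYS_LISTENING: speech is processed alongside every other input.
        states.append(listening)
    return states


if __name__ == "__main__":
    print(listening_states(Config.PARTIALLY_LISTENING, ["space", "m", "e", "space"]))
    # → [True, False, False, True]
    print(listening_states(Config.PUSH_TO_SPEAK, ["ptt_down", "a", "ptt_up"]))
    # → [True, True, False]
    print(listening_states(Config.ALWAYS_LISTENING, ["m", "space"]))
    # → [True, True]
```

Modeling the configurations as one loop over input events makes the trade-off visible: push-to-speak needs an extra explicit action, partially listening reuses keys the user already presses while typing, and always listening needs no signal at all but keeps the recognizer running continuously.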

Current Assignee: Ashwin P. Rao
Original Assignee: Ashwin P. Rao
Inventors: Rao, Ashwin P.

Granted Patent: US 8,355,915 B2
Current US Class: 704/231

CPC Class Codes

G10L 15/24 Speech recognition using no...

G10L 15/32 Multiple recognisers used i...
