Speak and touch auto correction interface
Abstract
The disclosure describes an overall system/method for developing a “speak and touch auto correction interface,” referred to as STACI, which is far superior to existing user interfaces, including the widely adopted QWERTY keyboard. Using STACI, a user speaks and types a word at the same time. The redundant information from the two modes, namely the speech and the letters typed, enables the user to type words sloppily and partially. The result is a very fast and accurate enhanced keyboard interface enabling document production on computing devices such as phones and tablets.
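The abstract's "sloppy typing" idea can be sketched as a prefix filter that tolerates near-miss keystrokes. The following is a minimal illustration, not the patented implementation: each typed key is expanded to its physical neighbors on a QWERTY layout, so a word survives the filter even when a keystroke lands on an adjacent key. The neighbor map and sample vocabulary are illustrative assumptions.

```python
# Illustrative QWERTY neighbor map: each key maps to itself plus its
# physically adjacent keys (an assumption for this sketch).
QWERTY_NEIGHBORS = {
    "q": "qwa", "w": "wqes", "e": "ewrd", "r": "retf", "t": "tryg",
    "y": "ytuh", "u": "uyij", "i": "iuok", "o": "oipl", "p": "pol",
    "a": "aqsz", "s": "sawdx", "d": "dsefc", "f": "fdrgv", "g": "gfthb",
    "h": "hgyjn", "j": "jhukm", "k": "kjil", "l": "lko",
    "z": "zasx", "x": "xzsd", "c": "cxdf", "v": "vcfg",
    "b": "bvgh", "n": "nbhj", "m": "mnjk",
}

def matches_sloppy_prefix(word: str, typed: str) -> bool:
    """True if each typed key is the word's letter or one of its neighbors."""
    if len(typed) > len(word):
        return False
    return all(w in QWERTY_NEIGHBORS.get(t, t)
               for t, w in zip(typed, word))

def reduce_vocabulary(vocab: list[str], typed: str) -> list[str]:
    """Keep only words whose prefix is compatible with the sloppy keystrokes."""
    return [w for w in vocab if matches_sloppy_prefix(w, typed)]

vocab = ["hello", "help", "hold", "jelly", "world"]
print(reduce_vocabulary(vocab, "je"))  # → ['hello', 'help', 'jelly']
```

Note how the mistyped "j" (adjacent to "h") still keeps "hello" and "help" as candidates; speech recognition over this reduced list then disambiguates.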
22 Claims
1. A speak and touch auto correction interface, comprising:

a key-input module configured to collect one or more key inputs entered by a user using a hardware input mechanism associated with a device, wherein a key input is associated with one of a plurality of classification types and wherein one classification type includes an end of utterance indicator that is not a letter;

an audio-input module configured to collect, in parallel with the key inputs, one or more speech samples spoken by the user using a hardware audio unit associated with the device;

a multimodal module that dynamically reduces a vocabulary as key inputs are entered but waits to perform speech recognition until the end of word indicator is entered, wherein the vocabulary is dynamically reduced based on the key inputs and on the classification type associated with the key inputs; upon receiving the end of word indicator, the multimodal module obtains and stores the key inputs and an utterance detected segment from the one or more speech samples, wherein the utterance detected segment comprises speech samples spoken by the user corresponding to the word, and performs speech recognition on the utterance detected segment using a current state of the dynamically reduced vocabulary; wherein if the key inputs include at least one letter, the multimodal module applies an ambiguity filter when dynamically reducing the vocabulary, the ambiguity filter reflecting ambiguities caused by one or more potential typing errors; and wherein if the key inputs include a symbol, the multimodal module dynamically activates an associated vocabulary associated with the symbol, which becomes the vocabulary that undergoes dynamic reduction, the symbol being neither a letter nor the end of word indicator. - View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 19, 20)
9. A computing device configured to provide a multimodal interface for entering input, the computing device comprising:
a hardware input mechanism;

a hardware audio unit;

a computer storage medium storing computer-readable components comprising computer-readable instructions;

a processor configured to execute the computer-readable instructions, the computer-readable components comprising:

a key-input module configured to collect one or more key inputs entered by a user using the hardware input mechanism;

an audio-input module configured to collect, in parallel with the key inputs, one or more speech samples spoken by the user using the hardware audio unit;

a multimodal module that dynamically reduces a vocabulary as key inputs are entered but waits to perform speech recognition until an end of utterance indicator is entered, wherein the vocabulary is dynamically reduced based on the key inputs and on a classification type associated with the key inputs, and wherein one classification type includes the end of word indicator that is not a letter; upon receiving the end of word indicator, the multimodal module obtains and stores the key inputs and an utterance detected segment from the one or more speech samples, wherein the utterance detected segment comprises speech samples spoken by the user corresponding to the word, and performs speech recognition on the utterance detected segment using a current state of the dynamically reduced vocabulary. - View Dependent Claims (10, 11, 12, 13, 14, 15, 16, 17, 18, 21, 22)
Specification