Multimodal text input by a keyboard/camera text input module replacing a conventional keyboard text input module on a mobile device

US 9,811,171 B2
Filed: 03/05/2013
Issued: 11/07/2017
Est. Priority Date: 03/06/2012
Status: Expired due to Fees

Abstract

Methods and modules for a multimodal text input in a mobile device are provided. Text may be input via keyboard or camera mode by holding the camera over written text. An image is taken of the written text, text is recognized, and output to an application by: activating a keyboard mode; providing an A-Z-keyboard in a first input field; activating the camera mode; capturing the text image and displaying the captured image in a second field of a device display; converting the captured image to character text by OCR and displaying the recognized character text on the display; outputting a selected character as the input text to the application upon a character selection, or outputting a selected part of the recognized character text as the input text to the application upon a selection of the part of the recognized character text by a single keypress, control command, or gesture.
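The flow described in the abstract can be sketched as a small event-driven object. This is an illustrative sketch only, not the patented implementation: `MultimodalInput`, `fake_ocr`, and the `commit` callback are hypothetical names, the OCR step is replaced by a stub that splits bytes on whitespace, and activating the camera mode together with the keyboard mode follows the wording of claim 1 rather than any known product behavior.

```python
from dataclasses import dataclass, field
from typing import Callable, List


def fake_ocr(image: bytes) -> List[str]:
    """Stand-in for a real OCR engine (assumption): decode and split on whitespace."""
    return image.decode("utf-8").split()


@dataclass
class MultimodalInput:
    # Hypothetical hook that hands text to the receiving application.
    commit: Callable[[str], None]
    ocr: Callable[[bytes], List[str]] = fake_ocr
    keyboard_active: bool = False
    camera_active: bool = False
    recognized: List[str] = field(default_factory=list)

    def on_text_field_activated(self) -> None:
        # Keyboard mode is activated, and camera mode follows it.
        self.keyboard_active = True
        self.camera_active = True

    def on_key(self, char: str) -> None:
        # Keyboard path: a selected character becomes the input text.
        if self.keyboard_active:
            self.commit(char)

    def on_image_captured(self, image: bytes) -> List[str]:
        # Camera path: convert the captured image to character text.
        if not self.camera_active:
            return []
        self.recognized = self.ocr(image)
        return self.recognized

    def on_select(self, index: int) -> None:
        # A single selection action outputs one part of the recognized text.
        self.commit(self.recognized[index])


received: List[str] = []
module = MultimodalInput(commit=received.append)
module.on_text_field_activated()
module.on_key("H")
module.on_image_captured(b"Rue de Rivoli")
module.on_select(2)
print(received)  # ['H', 'Rivoli']
```

Both input paths end in the same `commit` call, which mirrors the idea that the application receives text the same way regardless of whether it came from a key or from the camera.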

16 Claims

1. A method of multimodal text input in a mobile device, the method comprising:
- using an original communication interface between an original keyboard module of the mobile device and a third party application to enable communication between a multimodal input module, that replaces the original keyboard module, and the third party application by:
  
  executing the multimodal input module by:
  
  steadily running the multimodal input module in the background of the mobile device and constantly monitoring in the background of the mobile device to detect when a text input field of the third party application is activated; and
  
  responding to detecting that the text input field of the third party application is activated by:
  
  activating a keyboard mode;
  
  displaying an A-Z-keyboard in a first field of a display for text input;
  
  automatically activating a camera mode when the keyboard mode is activated;
  
  capturing an image of written text having characters different from characters of the A-Z-keyboard, reducing a size of the A-Z-keyboard, displaying the A-Z-keyboard reduced in a reduced first field, and displaying the captured image with the written text in a second field of the display of the mobile device, the reduced first field and the second field together occupying a same field size as the first field;
  
  converting the captured image to character text by optical character recognition (OCR) and displaying the recognized character text on the display; and
  
  outputting a selected part of the recognized character text as the input text to the third party application receiving the input text upon a selection of the part of the recognized character text, wherein the outputting to the third party application from the multimodal input module is via the original communication interface to the third party application as between the original keyboard module and the third party application, and wherein the multimodal input module is configured to enable the respective selection to take place by a single keypress or control command, or by a single gesture.
  - 2. The method according to claim 1, wherein the recognized character text is displayed on the display either in a separate third field or as an overlay over the text within the captured image displayed in the second field.
  - 3. The method according to claim 1, wherein converting the captured image to character text by optical character recognition and displaying the recognized character text on the display comprises:
    - determining one or more suggestion candidates using an algorithm in connection with a database; and
      
      displaying the one or more suggestion candidates in one or more third fields or as an overlay within the second field, wherein one or more of the one or more candidates are selectable by a keypress event.
  - 4. The method according to claim 1, wherein the single keypress, control command, or single gesture is at least one of:
    - a mechanical keypress, a touch keypress or a swipe gesture on a visible key or a hidden key, within one of the fields of the display, on the part of the recognized text or on another text on the display, for a certain selection.
  - 5. The method according to claim 1, wherein in the activated camera mode the second field is displayed adjacent to the keyboard.
  - 6. The method according to claim 1, wherein capturing the image, displaying the captured image, converting the captured image, displaying the recognized character text, and outputting are executed repetitively, and wherein a respective latest recognized text in a corresponding respective latest captured image is analyzed for new text in regard to previously recognized character text outputted to the third party application receiving the outputted text, whereupon the control command is generated for the selection of the new text as the part of the recognized character text in response to a certain keypress being detected.
  - 7. The method according to claim 1, wherein the control command for the selection of the part of the recognized character text is generated automatically via a detection algorithm, the detection algorithm recognizing the part of the recognized character text as being the same compared to recognized character text in a previous image.
  - 8. The method according to claim 1, wherein the multimodal input module comprises two sub-modules:
    - a first sub-module being a keyboard text input sub-module configured to execute the keyboard mode;
      
      a second sub-module being a camera text input sub-module configured to execute the camera mode;
      
      wherein the execution of the keyboard text input sub-module is independent from the execution of the camera text input sub-module, but wherein the execution of the camera text input sub-module is dependent on the execution of the keyboard text input sub-module;
      
      wherein if the camera mode is activated, at least the second field and the recognized text are visible on the display and selectable; and
      
      wherein if the keyboard text input sub-module is no longer detected to be activated, the camera mode is also deactivated.
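The repeated capture described in claim 6 needs some way to tell which recognized text is new relative to what was already output. The claim does not say how this is done; the sketch below assumes a simple word-level heuristic that looks for the longest overlap between the end of the already-output text and the start of the latest frame. The function name `new_words` is hypothetical.

```python
from typing import List


def new_words(previous_output: List[str], latest_recognized: List[str]) -> List[str]:
    """Return the words of the latest frame that come after the longest overlap
    between the tail of the already-output text and the head of the new frame.

    Assumption: consecutive frames overlap at word boundaries. If no overlap
    is found, the whole frame is treated as new text.
    """
    max_overlap = min(len(previous_output), len(latest_recognized))
    for size in range(max_overlap, 0, -1):
        if previous_output[-size:] == latest_recognized[:size]:
            return latest_recognized[size:]
    return list(latest_recognized)


already_sent = ["the", "quick", "brown"]
frame = ["quick", "brown", "fox", "jumps"]
print(new_words(already_sent, frame))  # ['fox', 'jumps']
```

A real module would likely need fuzzy matching, since OCR output for the same words can differ slightly between frames; exact comparison is used here only to keep the idea visible.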

9. A mobile device arranged to facilitate multimodal text input, the mobile device comprising:
- a display;
  
  a camera having a camera mode; and
  
  a processor in communication with the display and the camera, the processor implementing a multimodal input module that uses an original communication interface between an original keyboard module of the mobile device and a third party application to enable communication between the multimodal input module, that replaces the original keyboard module, and the third party application by:
  
  executing the multimodal input module by:
  
  steadily running the multimodal input module in the background of the mobile device and constantly monitoring in the background of the mobile device to detect when a text input field of the third party application is activated; and
  
  responding to detecting that the text input field of the third party application is activated by:
  
  activating a keyboard mode and displaying a keyboard in a first field of the display;
  
  automatically activating a camera mode when the keyboard mode is activated;
  
  detecting the camera being held over written text, such that an image is taken of the written text having characters different from characters of the keyboard;
  
  reducing a size of the keyboard and displaying the keyboard reduced in a reduced first field;
  
  displaying the image with the written text in a second field of the display, the reduced first field and the second field together occupying a same field size as the first field;
  
  converting the image to character text by optical character recognition (OCR);
  
  causing the recognized character text to be displayed on the display; and
  
  outputting a selected part of the recognized character text as the input text to the third party application receiving the input text upon a selection of the part of the recognized character text, wherein the output to the third party application from the multimodal input module is via the original communication interface to the third party application as between the original keyboard module and the third party application, and wherein the multimodal input module is configured to enable the respective selection to take place by a single keypress or control command, or by a single gesture.
  - 10. The mobile device as in claim 9 wherein the multimodal keyboard module is compatible with one or more standard applications, the one or more standard applications including a phone application or an internet search application, running on the mobile device and requiring the text input.
  - 11. The mobile device according to claim 9, wherein the keyboard and the camera mode are both active at the same time.
  - 12. The mobile device according to claim 11, wherein the multimodal input module is further configured to:
    - enable by a single keypress or control command, or by a single gesture:
      
      the selection of the part of the character text; and
      
      the outputting of the input text to the third party application.
  - 13. The mobile device according to claim 12, wherein the single keypress or control command, or the single gesture include at least one of:
    - a mechanical keypress, a touch keypress or a swipe gesture on a visible key or a hidden key, within one of the fields of the display, on the part of the recognized text or on another text on the display, for a certain selection.
  - 14. The mobile device according to claim 9, wherein the multimodal input module comprises a first sub-module and a second sub-module:
    - I) the first sub-module being a keyboard sub-module being invoked by the processor in response to a request for the text input; and
      
      II) the second sub-module being a camera sub-module, the second sub-module being dependent on the first sub-module.
  - 15. The mobile device according to claim 14, wherein the multimodal input module is configured to:
    - activate the camera mode in response to the second sub-module being activated or the camera mode being displayed such that the second field is displayed adjacent to the keyboard; and
      
      close the camera mode such that the camera mode is only displayed when the second sub-module is activated.
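The one-way dependency between the two sub-modules in claims 14 and 15 (and in claim 8 for the method) can be illustrated with two small classes. This is a sketch under stated assumptions: the class names `KeyboardSubModule` and `CameraSubModule` and the notification mechanism are hypothetical, chosen only to show that the keyboard runs on its own while the camera can be active only while the keyboard is.

```python
from typing import List


class KeyboardSubModule:
    """Runs independently; notifies dependents when it is deactivated."""

    def __init__(self) -> None:
        self.active = False
        self._dependents: List["CameraSubModule"] = []

    def add_dependent(self, dependent: "CameraSubModule") -> None:
        self._dependents.append(dependent)

    def activate(self) -> None:
        self.active = True

    def deactivate(self) -> None:
        self.active = False
        # Closing the keyboard also closes everything that depends on it.
        for dependent in self._dependents:
            dependent.deactivate()


class CameraSubModule:
    """Can only be active while the keyboard sub-module is active."""

    def __init__(self, keyboard: KeyboardSubModule) -> None:
        self.keyboard = keyboard
        self.active = False
        keyboard.add_dependent(self)

    def activate(self) -> bool:
        # Activation succeeds only if the keyboard is currently active.
        self.active = self.keyboard.active
        return self.active

    def deactivate(self) -> None:
        self.active = False


keyboard = KeyboardSubModule()
camera = CameraSubModule(keyboard)
print(camera.activate())  # False: keyboard not active yet
keyboard.activate()
print(camera.activate())  # True
keyboard.deactivate()
print(camera.active)      # False: camera mode closed with the keyboard
```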

16. A computer program product for performing multimodal text input in a mobile device, the computer program product comprising:
- one or more non-transitory computer-readable tangible storage devices and program instructions stored on at least one of the one or more tangible storage devices, the program instructions, when loaded and executed by a processor, cause the mobile device associated with the processor to implement a multimodal input module that uses an original communication interface between an original keyboard module of the mobile device and a third party application to enable communication between the multimodal input module, that replaces the original keyboard module, and the third party application by:
  
  executing the multimodal input module by:
  
  steadily running the multimodal input module in the background of the mobile device and constantly monitoring in the background of the mobile device to detect when a text input field of the third party application is activated; and
  
  responding to detecting that the text input field of the third party application is activated by:
  
  activating a keyboard mode;
  
  displaying an A-Z keyboard in a first field of a display for text input;
  
  automatically activating a camera mode when the keyboard mode is activated;
  
  capturing an image of written text having characters different from characters of the A-Z-keyboard, reducing a size of the A-Z keyboard, displaying the A-Z-keyboard reduced in a reduced first field, and displaying the captured image with the written text in a second field of the display, the reduced first field and the second field together occupying a same field size as the first field;
  
  converting the captured image to character text by optical character recognition (OCR) and displaying the recognized character text on the display; and
  
  outputting a selected part of the recognized character text as the input text to the third party application receiving the input text upon a selection of the part of the recognized character text, wherein the outputting to the third party application from the multimodal input module is via the original communication interface to the third party application as between the original keyboard module and the third party application, and wherein the multimodal input module is configured to enable the respective selection to take place by a single keypress or control command, or by a single gesture.
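The layout limitation shared by the independent claims, that the reduced first field and the second field together occupy the same field size as the original first field, amounts to a simple partition of one screen region. The sketch below is only one possible reading: the `Field` type, the `camera_fraction` parameter, and the choice to place the camera preview above the reduced keyboard are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Field:
    top: int      # y coordinate of the upper edge, in pixels
    height: int
    width: int


def split_first_field(first: Field, camera_fraction: float = 0.4) -> Tuple[Field, Field]:
    """Split the original keyboard field into (reduced keyboard field, camera field).

    The two returned fields tile the original field exactly, so their combined
    height equals the height of the original first field.
    """
    if not 0.0 < camera_fraction < 1.0:
        raise ValueError("camera_fraction must be strictly between 0 and 1")
    camera_height = round(first.height * camera_fraction)
    keyboard_height = first.height - camera_height
    second = Field(top=first.top, height=camera_height, width=first.width)
    reduced = Field(top=first.top + camera_height, height=keyboard_height, width=first.width)
    return reduced, second


original = Field(top=400, height=300, width=320)
reduced_keyboard, camera_preview = split_first_field(original)
print(reduced_keyboard.height + camera_preview.height == original.height)  # True
```

Computing the keyboard height as the remainder, rather than rounding both parts separately, keeps the sum exact for any fraction.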

Current Assignee: Cuneyt Goktekin, Kofax Incorporated
Original Assignee: Nuance Communications, Inc. (Microsoft Corporation)
Inventors: Göktekin, Cüneyt
Primary Examiner(s): Mehmood, Jennifer
Assistant Examiner(s): Reed, Stephen T

Application Number: US13/786,321
Publication Number: US 20130234945A1
Time in Patent Office: 1,708 Days
Field of Search: None

CPC Class Codes

G06F 1/1686   the I/O peripheral being an...

G06F 2203/0381   Multimodal input, i.e. inte...

G06F 3/005   Input arrangements through ...

G06F 3/013   Eye tracking input arrangem...

G06F 3/02   Input arrangements using ma...

G06F 3/0227   Cooperation and interconnec...

G06F 3/0237   using prediction or retriev...

G06F 3/04883   for inputting data by handw...

G06F 3/04886   by partitioning the display...

G06F 40/274   Converting codes to words; ...

G06F 40/58   Use of machine translation,...

G06V 20/63   Scene text, e.g. street names

G06V 30/10   Character recognition

G06V 30/224   of printed characters havin...

G09G 5/00   Control arrangements or cir...
