Providing Information Services Related to Multimodal Inputs
Abstract
A system and method provides information services related to multimodal inputs. Several different types of data used as multimodal inputs are described. Also described are various methods involving the generation of contexts using multimodal inputs, synthesizing context-information service mappings and identifying and providing information services.
8 Claims
1. A method comprising:
receiving at a system server a first input comprising visual information from a client device, wherein the visual information comprises a plurality of character sequence groups;
receiving at the system server a second input comprising audio information from the client device, wherein the audio information comprises vocals;
extracting the plurality of character sequence groups from the visual information using an optical character recognition engine;
converting the vocals to text using a speech recognition engine; and
generating a plurality of contexts wherein a first context comprises a first character sequence group from the plurality of character sequence groups and a first portion of the text.
Dependent claims: 2, 3, 4, 5, 6, 7 (not shown).
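The context-generation step of claim 1 can be sketched in Python. The claim does not say how the transcript is divided into portions or how character sequence groups are paired with those portions, so both rules here are assumptions: the hypothetical `split_transcript` splits at sentence-ending punctuation, and the hypothetical `generate_contexts` pairs every group with every portion. The optical character recognition and speech recognition engines are out of scope; their outputs are passed in as plain strings.

```python
import re
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Context:
    """Hypothetical context: one character sequence group plus one portion of the transcript."""
    char_group: str
    text_portion: str


def split_transcript(text: str) -> list[str]:
    """Split recognized speech into portions at sentence boundaries (assumed segmentation rule)."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    # Drop empty strings, e.g. when the transcript is blank.
    return [p for p in parts if p]


def generate_contexts(char_groups: list[str], transcript: str) -> list[Context]:
    """Pair every OCR character sequence group with every transcript portion (assumed pairing)."""
    portions = split_transcript(transcript)
    return [Context(g, p) for g, p in product(char_groups, portions)]


if __name__ == "__main__":
    groups = ["Cafe Luna", "Open 9-5"]  # stand-in for OCR output
    transcript = "What are the reviews? Is it open now?"  # stand-in for speech recognition output
    for ctx in generate_contexts(groups, transcript):
        print(ctx)
```

With these sample inputs the sketch yields four contexts, the first being `Context("Cafe Luna", "What are the reviews?")`. The full cross product is simply the easiest way to obtain a plurality of contexts; it is not necessarily how the patent pairs groups with text.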
8. A method comprising:
receiving at a system server a first input comprising visual information from a client device, wherein the visual information comprises a plurality of character sequence groups;
receiving at the system server a second input comprising audio information from the client device, wherein the audio information comprises vocals;
extracting the plurality of character sequence groups from the visual information using an optical character recognition engine;
converting the vocals to text using a speech recognition engine;
generating a plurality of contexts wherein a first context comprises a first character sequence group from the plurality of character sequence groups and a first portion of the text;
identifying a first context from the plurality of contexts based on the first input;
identifying a second context from the plurality of contexts based on the second input; and
querying a database using the first and second contexts to generate a first list comprising at least one information service.
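The additional steps of claim 8 can be sketched with Python's standard `sqlite3` module. The claim only requires that one context be identified based on the visual input, another based on the audio input, and that both be used to query a database for a list of information services. Everything more specific is hypothetical: the `service_map` table schema, the `(character_group, text_portion)` tuple used as a context, the length-based heuristics in `identify_contexts`, and the keyword-matching query.

```python
import re
import sqlite3

# Hypothetical context representation: (character_sequence_group, text_portion).
Ctx = tuple[str, str]


def build_mapping_db(rows: list[tuple[str, str]]) -> sqlite3.Connection:
    """Create an in-memory keyword-to-information-service table (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE service_map (keyword TEXT, service TEXT)")
    conn.executemany("INSERT INTO service_map VALUES (?, ?)", rows)
    return conn


def tokens(s: str) -> set[str]:
    """Return the lower-case alphanumeric tokens of a string."""
    return set(re.findall(r"[a-z0-9]+", s.lower()))


def identify_contexts(contexts: list[Ctx]) -> tuple[Ctx, Ctx]:
    """Assumed heuristics: the first context has the longest OCR group (visual input);
    the second has the longest transcript portion (audio input). Ties keep the earliest."""
    first = max(contexts, key=lambda c: len(c[0]))
    second = max(contexts, key=lambda c: len(c[1]))
    return first, second


def query_services(conn: sqlite3.Connection, first: Ctx, second: Ctx) -> list[str]:
    """Look up services whose keyword appears in the first context's character group
    or in the second context's transcript portion."""
    words = sorted(tokens(first[0]) | tokens(second[1]))
    if not words:
        return []
    placeholders = ",".join("?" * len(words))  # e.g. "?,?,?" for three keywords
    sql = ("SELECT DISTINCT service FROM service_map "
           f"WHERE keyword IN ({placeholders}) ORDER BY service")
    return [row[0] for row in conn.execute(sql, words)]
```

For example, with a table built from `[("luna", "Cafe Luna menu"), ("reviews", "Restaurant reviews"), ("open", "Opening hours")]` and the four contexts formed from the groups `"Cafe Luna"` and `"Open 9-5"` and the portions `"What are the reviews?"` and `"Is it open now?"`, both heuristics select `("Cafe Luna", "What are the reviews?")`, and the query returns `["Cafe Luna menu", "Restaurant reviews"]`. Parameter placeholders are used instead of string interpolation of the keywords, so recognized text cannot alter the SQL statement.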
Specification