Multimodal natural language query system for processing and analyzing voice and proximity-based queries
Abstract
The disclosure provides a natural language query system and method for processing and analyzing multimodally-originated queries, including voice and proximity-based queries. The natural language query system includes a Web-enabled device including a speech input module for receiving a voice-based query in natural language form from a user and a location/proximity module for receiving location/proximity information from a location/proximity device. The query system also includes a speech conversion module for converting the voice-based query in natural language form to text in natural language form and a natural language processing module for converting the text in natural language form to text in searchable form. The query system further includes a semantic engine module for converting the text in searchable form to a formal database query and a database-look-up module for using the formal database query to obtain a result related to the voice-based query in natural language form from a database.
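The stages named in the abstract can be sketched as a small pipeline. This is only an illustration under stated assumptions, not the disclosed implementation: speech conversion is assumed to have already produced the text, and the stop-word list, the `items` table schema, and every function name below are hypothetical.

```python
from __future__ import annotations

import re
import sqlite3

# Hypothetical stop-word list; the abstract does not specify one.
STOP_WORDS = {"a", "an", "the", "is", "are", "where", "what", "which", "of", "in", "me", "find"}


def to_searchable(text: str) -> list[str]:
    """Natural language processing stage: natural-language text -> searchable keywords."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def to_formal_query(keywords: list[str], location: str) -> tuple[str, list[str]]:
    """Semantic engine stage: keywords plus location -> parameterized SQL (assumed schema)."""
    conditions = ["location = ?"] + ["description LIKE ?" for _ in keywords]
    sql = "SELECT name FROM items WHERE " + " AND ".join(conditions) + " ORDER BY name"
    return sql, [location] + [f"%{k}%" for k in keywords]


def look_up(conn: sqlite3.Connection, sql: str, params: list[str]) -> list[str]:
    """Database look-up stage: run the formal query and collect the results."""
    return [row[0] for row in conn.execute(sql, params)]


def answer(conn: sqlite3.Connection, text: str, location: str) -> list[str]:
    """Chain the stages for one query and its location/proximity information."""
    sql, params = to_formal_query(to_searchable(text), location)
    return look_up(conn, sql, params)


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with made-up rows for the demonstration."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE items (name TEXT, location TEXT, description TEXT)")
    conn.executemany(
        "INSERT INTO items VALUES (?, ?, ?)",
        [
            ("Torque wrench TW-1", "hangar 2", "torque wrench, 3/8 inch drive"),
            ("Torque wrench TW-2", "hangar 5", "torque wrench, 1/2 inch drive"),
            ("Hydraulic jack HJ-7", "hangar 2", "hydraulic jack, 10 ton"),
        ],
    )
    return conn
```

Calling `answer(build_demo_db(), "Where is the torque wrench?", "hangar 2")` restricts the keyword match to the reported location, which is the role the location/proximity information plays in the abstract.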
13 Claims
1. A query system, comprising:
a computing device communicatively coupled to a network and configured to receive a plurality of inputs;
wherein at least one of said inputs is an audio query and another of said inputs is a visual input;
a speech server communicatively coupled to the computing device via the network, wherein the speech server is configured to receive the audio query from the computing device, convert the audio query into a text query and return the text query to said computing device;
a mediated reality server communicatively coupled to the computing device via the network, wherein the mediated reality server is configured to receive and process the visual input to determine data relevant to said visual input and return said data relevant to said visual input to said computing device; and
a remote server communicatively coupled to the computing device via the network, wherein the remote server is configured to receive said text query and at least a portion of said data relevant to said visual input and based on the received text query and the at least a portion of said data relevant to said visual input determine a response to the query.
(Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9)
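The data flow recited in claim 1 can be pictured with a short sketch. It assumes in-process stand-ins for the three servers; the class names, method names, the `fields` parameter, and the canned data are all hypothetical and serve only to show which component receives what.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Protocol


class SpeechServer(Protocol):
    def transcribe(self, audio: bytes) -> str: ...


class MediatedRealityServer(Protocol):
    def analyze(self, image: bytes) -> dict[str, str]: ...


class RemoteServer(Protocol):
    def respond(self, text_query: str, visual_data: dict[str, str]) -> str: ...


@dataclass
class ComputingDevice:
    speech: SpeechServer
    reality: MediatedRealityServer
    remote: RemoteServer

    def query(self, audio: bytes, image: bytes, fields: tuple[str, ...] = ("object",)) -> str:
        text_query = self.speech.transcribe(audio)   # audio query -> text query
        visual_data = self.reality.analyze(image)    # visual input -> relevant data
        # Forward only a portion of the visual data, as the claim permits.
        portion = {k: visual_data[k] for k in fields if k in visual_data}
        return self.remote.respond(text_query, portion)


# Stand-in servers with canned behavior (assumptions for the demo only).
class FakeSpeech:
    def transcribe(self, audio: bytes) -> str:
        return audio.decode("utf-8")


class FakeReality:
    def analyze(self, image: bytes) -> dict[str, str]:
        return {"object": "pump P-101", "confidence": "0.93"}


class FakeRemote:
    STATUS = {"pump P-101": "running"}

    def respond(self, text_query: str, visual_data: dict[str, str]) -> str:
        obj = visual_data.get("object", "unknown object")
        return f"{text_query} -> {obj}: {self.STATUS.get(obj, 'no record')}"
```

In a deployment matching the claim, each stand-in would be replaced by a network client, but the order of calls on the computing device would stay the same.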
10. A mobile device query method, comprising:
receiving an audio query from a user and a visual input from a visual input device;
transmitting said visual input to a mediated reality server;
said mediated reality server processing said visual input to derive data relevant to said visual input and return said data relevant to said visual input to said mobile device;
transmitting the audio query and the data relevant to said visual input to a remote server; and
receiving a plurality of responses to the audio query from the server, said responses being based at least in part on said data relevant to said visual input, and each of the plurality of responses being ranked by the server using an accuracy algorithm.
(Dependent claims: 11, 12, 13)
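Claim 10 does not say what the "accuracy algorithm" is. As a purely illustrative stand-in, the sketch below ranks candidate responses by Jaccard token overlap with the query and the visual-input data. The function names are hypothetical, and the audio query is assumed to have already been transcribed to text on the server.

```python
from __future__ import annotations

import re


def tokens(text: str) -> set[str]:
    """Lowercased alphanumeric tokens of a string."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def rank_responses(
    text_query: str, visual_data: list[str], candidates: list[str]
) -> list[tuple[float, str]]:
    """Return (score, response) pairs, best first.

    The score is the Jaccard overlap between the candidate's tokens and the
    combined tokens of the query and the visual-input data. This is a stand-in
    for the unspecified accuracy algorithm, not the claimed one.
    """
    context = tokens(text_query) | tokens(" ".join(visual_data))

    def score(candidate: str) -> float:
        cand = tokens(candidate)
        union = context | cand
        return len(context & cand) / len(union) if union else 0.0

    scored = [(score(c), c) for c in candidates]
    # sorted() is stable, so ties keep their original order.
    return sorted(scored, key=lambda pair: pair[0], reverse=True)
```

Because the visual-input terms are part of the scoring context, a response that mentions the recognized object outranks one that matches the spoken words alone, which mirrors the claim's requirement that responses be based at least in part on the visual data.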
Specification