Intuitive computing methods and systems
First Claim
1. A method employing a device equipped with a processor, a display, a camera and a microphone, the camera capturing imagery depicting plural items in a user's physical environment, the method comprising the acts:
capturing first speech of the user, with the device microphone;
the device processor detecting that the captured first speech includes a cueing expression, and in response to detection of the cueing expression, the device switching from a lower activity state to a heightened alert state, in the heightened alert state the device performing functionality including:
capturing second user speech with the device microphone;
sending data corresponding to the second user speech to a recognition module, and receiving recognized second speech data in return, the recognized second user speech indicating one of said plural items depicted in the captured imagery as of particular user interest;
based on one or more descriptors included in the recognized second speech data, determining a first of said plural depicted items as being of likely user interest;
presenting a marking on the device display, at a location indicating said first item;
capturing third user speech with the device microphone, the captured third user speech being different than the second user speech;
sending data corresponding to the third user speech to the recognition module, and receiving recognized third speech data in return, the recognized third speech data again indicating one of said plural items as of particular user interest;
based on one or more descriptors included in the recognized third speech data, determining that a second, different one of said plural depicted items is of greater interest to the user than the first item;
moving said marking on the device display to a location indicating said second item; and
taking an action based on the second item, said action including presenting information related to the second item to the user;
wherein the device is not on heightened alert all the time, but is cued into activation from a lower activity state by the cueing expression, thereby bounding the device's processing efforts, and the descriptors in the recognized second and third speech data iteratively guide the device in identifying which of the plural items in the user's physical environment is of user interest, thereby further bounding the device's processing efforts.
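The claim's core gating behavior can be illustrated as a small two-state machine: the device idles in a low-activity state, doing only enough work to spot the cueing expression, and sends speech on for full recognition only in the heightened-alert state. This is a minimal sketch under assumed names; the wake phrase and class structure below are illustrative, not the patent's implementation.

```python
# Hypothetical two-state gating: low activity until a cueing
# expression is heard, then heightened alert.
CUEING_EXPRESSION = "hey device"   # assumed wake phrase


class IntuitiveDevice:
    def __init__(self):
        self.state = "low_activity"

    def on_speech(self, transcript: str) -> str:
        """Route captured speech according to the current state."""
        if self.state == "low_activity":
            # Only scan for the cueing expression here, which is what
            # bounds the device's processing efforts.
            if CUEING_EXPRESSION in transcript.lower():
                self.state = "heightened_alert"
                return "switched to heightened alert"
            return "ignored"
        # Heightened alert: forward speech for full recognition (stubbed).
        return f"recognize: {transcript}"


device = IntuitiveDevice()
print(device.on_speech("what is that thing"))   # ignored: no cue yet
print(device.on_speech("hey device"))           # cue detected, state switches
print(device.on_speech("what is that thing"))   # now fully processed
```

The same speech is ignored before the cue and processed after it, which is the "not on heightened alert all the time" limitation in executable form.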
Abstract
A system senses audio, imagery, and/or other stimulus from a user's environment, and responds to fulfill user desires. In one particular arrangement, a discovery session is launched when the user speaks a cueing expression, which serves to switch the system from a lower activity state to a heightened alert state. The system may recognize that the speech expresses a user request that requires analysis of camera-captured imagery to fulfill. In response, the system can apply an operation, such as a recognition operation (e.g., barcode decoding), to the imagery and take an action based on resulting information. Operation of the system can be aided by collateral information, such as context. A great number of other features and arrangements are also detailed.
40 Claims
1. (Set forth in full above, under First Claim.) - View Dependent Claims (2, 3, 4, 5, 6)
7. A method employing a device equipped with one or more processors, a camera and a microphone, the camera capturing imagery depicting plural items in a user's physical environment, the method comprising the acts:
capturing first speech of the user, with the device microphone;
detecting, with a device processor, that the captured first speech includes a cueing expression;
in response to detection of the cueing expression, switching the device from a lower activity state to a heightened alert state, in the heightened alert state the device performing functionality including:
transmitting second speech data of the user, identifying an item depicted in the camera-captured imagery as being of particular user interest, and the captured camera imagery, from the device to a remote computer system, said captured second speech including a noun;
following said transmitting, receiving data produced by the remote computer system, said received data having been produced by the remote computer system by applying a selected recognition operation to said captured camera imagery, said recognition operation having been selected, from among a plurality of available recognition operation options, based on said second speech data including said noun; and
taking an action based on said received data, including presenting information corresponding thereto to the user;
wherein the device is not on heightened alert all the time, but is cued into activation from a lower activity state by the cueing expression, thereby bounding the device's processing efforts, and wherein, in its heightened alert state, the device cooperates with the remote computer system to present the user with information recognition-processed from imagery captured by the device camera, the recognition processing having been selected based on the user's second speech, including said noun.
View Dependent Claims (9, 10, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25)
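The noun-driven selection step in this claim (and its server-side counterpart, claim 26) amounts to a lookup from a noun in the recognized speech to one of several available recognition operations, as in the abstract's barcode-decoding example. The operation table and function name below are assumptions for illustration, not the patent's actual vocabulary.

```python
# Illustrative mapping from a noun in the user's second speech to one of
# a plurality of available recognition operations (names are assumed).
RECOGNITION_OPERATIONS = {
    "barcode": "barcode_decoding",
    "book":    "cover_matching",
    "face":    "face_recognition",
    "sign":    "optical_character_recognition",
}


def select_recognition_operation(second_speech: str) -> str:
    """Pick a recognition operation based on a noun found in the
    recognized speech; fall back to a generic operation otherwise."""
    lowered = second_speech.lower()
    for noun, operation in RECOGNITION_OPERATIONS.items():
        if noun in lowered:
            return operation
    return "generic_object_recognition"


print(select_recognition_operation("read that barcode for me"))
# barcode_decoding
```

A production system would use real natural-language parsing rather than substring matching, but the bounding effect is the same: only the operation implied by the noun is run against the imagery.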
8. A method employing a device equipped with a processor, a camera and a microphone, the camera capturing imagery depicting plural items in a user's physical environment, the method comprising the acts:
capturing first speech of the user, with the device microphone;
detecting, with said device processor, that the captured first speech includes a cueing expression;
in response to detection of the cueing expression, switching the device from a lower activity state to a heightened alert state, in the heightened alert state the device performing functionality including:
capturing second speech of the user;
sending data from the device, said data including data corresponding to the second user speech and data corresponding to the captured imagery, and receiving data, including (a) recognized second speech data and (b) recognition-processed data about a subject depicted in the imagery, in return; and
taking an action based on said received data, including presenting information based on the recognition-processed data to the user;
the method further including:
based on one or more descriptors included in the recognized second speech data, determining a first of said plural depicted items as being of likely user interest;
presenting a marking on a display of the device, at a location indicating said first item;
capturing third user speech with the device microphone;
sending data corresponding to the third user speech from the device, and receiving recognized third speech data in return;
based on one or more descriptors included in the recognized third speech data, determining that a second, different one of said plural depicted items is of greater interest to the user than the first item;
moving said marking on the device display to a location indicating said second item;
receiving a user confirmation that the second item is of interest; and
sending data corresponding to the second item from the device for recognition processing;
wherein the device is not on heightened alert all the time, but is cued into activation from a lower activity state by the cueing expression, thereby bounding the device's processing efforts;
wherein, in its heightened alert state, the device cooperates with a remote computer system to recognition-process imagery captured by the device camera; and
wherein descriptors in the recognized second and third speech data iteratively guide the device in identifying which of the plural items in the user's physical environment is of user interest, thereby further bounding the device's processing efforts.
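The descriptor-guided disambiguation recited in claims 1, 8, and 11 can be sketched as scoring each depicted item against the descriptors extracted from each round of speech, marking the best match, and letting a later round move the marker to a different item. The item data and descriptor sets below are purely illustrative assumptions.

```python
# Minimal sketch: each depicted item carries descriptors; the item whose
# descriptors best overlap the speech descriptors gets the on-screen
# marking, and later speech can move the marking to another item.
items = [
    {"name": "mug",  "descriptors": {"red", "round", "left"}},
    {"name": "book", "descriptors": {"red", "flat", "right"}},
]


def best_item(speech_descriptors: set) -> dict:
    """Return the depicted item whose descriptors best match the
    descriptors recognized from the user's speech."""
    return max(items,
               key=lambda item: len(item["descriptors"] & speech_descriptors))


# Second speech, e.g. "the red one on the left" -> mark the mug.
marked = best_item({"red", "left"})
assert marked["name"] == "mug"

# Third speech, e.g. "no, the flat one on the right" -> marker moves.
marked = best_item({"red", "flat", "right"})
assert marked["name"] == "book"
```

Each round of descriptors narrows the candidate set instead of recognition-processing every item, which is the "further bounding the device's processing efforts" limitation.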
11. A tangible computer readable medium containing instructions to configure a device, equipped with a display, a camera and a microphone, to perform acts including:
capturing first speech of the user;
detecting that the captured first speech includes a cueing expression, and in response to detection of the cueing expression, switching the device from a lower activity state to a heightened alert state, in the heightened alert state the instructions configuring the device to perform functions including:
capturing second user speech;
sending data corresponding to the second user speech to a recognition module, and receiving recognized second speech data in return;
based on one or more descriptors included in the recognized second speech data, determining a first of said plural depicted items as being of likely user interest;
presenting a marking on the display, at a location indicating said first item;
capturing third user speech, the captured third user speech being different than the second user speech;
sending data corresponding to the third user speech to the recognition module, and receiving recognized third speech data in return;
based on one or more descriptors included in the recognized third speech data, determining that a second, different one of said plural depicted items is of greater interest to the user than the first item;
moving said marking on the display to a location indicating said second item; and
taking an action based on the second item, said action including presenting information related to the second item to the user;
wherein the device is not on heightened alert all the time, but is cued by said instructions into activation from a lower activity state by the cueing expression, thereby bounding the device's processing efforts, and the instructions enable descriptors in the recognized second and third speech data to iteratively guide the device in identifying which of the plural items in the user's physical environment is of user interest, thereby further bounding the device's processing efforts.
26. A method of operating a server system to interact with a remote user device, the remote user device including one or more processors, a camera and a microphone, the remote user device being operative to capture imagery depicting plural items in a user's physical environment and also capture user speech, the user device further being operative to respond to detection of a cueing expression in first captured speech to switch the device from a lower activity state to a heightened alert state, the method including:
the server system receiving captured imagery and second user speech information from the remote user device when the remote user device is in the heightened alert state, wherein a noun in said second user speech information identifies an item depicted in the device-captured imagery as being of particular user interest;
the server system selecting a recognition operation to apply to the received captured imagery, from among a plurality of available recognition operations, based on said noun;
a hardware processor in the server system applying said selected operation to the received captured imagery; and
the server system transmitting data resulting from application of said selected operation to the received captured imagery, to the user device, for presentation to said user;
wherein the server system cooperates with the user device to present information recognition-processed from imagery captured by the user device, wherein said recognition processing is selected from among said plurality of available recognition operations, based on the noun in the user's second speech information.
View Dependent Claims (27, 28, 29, 30)
31. A method performed using a system comprising a local device and a remote server, the system including plural processors, the local device including a camera that captures imagery from a user's physical environment and a microphone that captures user speech, the method comprising the acts:
capturing user speech with the microphone;
recognizing that an initial portion of the captured speech includes a cueing expression;
in response to recognition of the cueing expression, switching the system from a lower activity state to a heightened alert state;
recognizing a further portion of the captured user speech, the user speech expressing a user request for the system to fulfill;
determining, based on the recognized user speech, that fulfillment of said request requires analysis of imagery captured by the camera;
after the system has been switched to the heightened alert state, and after determining, based on the recognized user speech, that fulfillment of the user request requires analysis of imagery captured by the camera, applying a recognition operation to camera-captured imagery using one or more of said plural processors to extract information; and
taking an action based on the information extracted from the camera-captured imagery;
wherein the system is not on heightened alert all the time, but is cued into activation from a lower activity state by the cueing expression, and wherein, when in its heightened alert state, and in response to the recognized user speech, the system extracts information from the captured imagery and takes action based thereon.
View Dependent Claims (32, 33, 34, 35, 36, 37)
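The distinctive step in claims 31 and 38 is deciding, from the recognized speech alone, whether fulfilling the request requires imagery analysis at all, so that vision processing runs only when needed. The keyword heuristic below is an assumption for illustration; a real system would use proper intent classification.

```python
# Hedged sketch: decide from recognized speech whether the request
# refers to something in the camera's view. The cue-word set is an
# illustrative assumption, not the patent's method.
VISUAL_CUES = {"this", "that", "look", "see", "read", "scan", "show"}


def requires_imagery_analysis(request: str) -> bool:
    """True if the user request appears to need camera imagery,
    e.g. deictic words like 'this' or 'that'."""
    return bool(VISUAL_CUES & set(request.lower().split()))


print(requires_imagery_analysis("what is that"))      # True: deictic 'that'
print(requires_imagery_analysis("what time is it"))   # False: no visual cue
```

Only when this check returns True does the system apply a recognition operation to the captured imagery, keeping the camera pipeline idle for purely verbal requests.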
38. A system comprising a microphone, a camera, and plural processors controlled by instructions stored in one or more memories, the processors performing acts including:
capturing user speech with the microphone;
recognizing that an initial portion of the captured user speech includes a cueing expression;
in response to recognition of the cueing expression, switching the system from a lower activity state to a heightened alert state;
recognizing a further portion of the captured user speech, the user speech expressing a user request for the system to fulfill;
determining, based on the recognized user speech, that fulfillment of said request requires analysis of imagery captured by the camera;
after the system has been switched to the heightened alert state, and after determining, based on the recognized user speech, that fulfillment of the user request requires analysis of imagery captured by the camera, applying a recognition operation to camera-captured imagery using one or more of said plural processors to extract information; and
taking an action based on the information extracted from the camera-captured imagery;
wherein the system is not on heightened alert all the time, but is cued into activation from a lower activity state by the cueing expression, and wherein, when in its heightened alert state, and in response to the recognized user speech, the system extracts information from the captured imagery and takes action based thereon.
View Dependent Claims (39, 40)
Specification