Intuitive computing methods and systems
First Claim
1. A method employing a device equipped with a processor, a display, a camera and a microphone, the camera capturing imagery depicting plural items in a user's physical environment, the method comprising the acts:
capturing first speech of the user, with the device microphone;
the device processor detecting that the captured first speech includes a cueing expression, and in response to detection of the cueing expression, the device switching from a lower activity state to a heightened alert state, in the heightened alert state the device performing functionality including:
capturing second user speech with the device microphone;
sending data corresponding to the second user speech to a recognition module, and receiving recognized second speech data in return, the recognized second user speech indicating one of said plural items depicted in the captured imagery as of particular user interest;
based on one or more descriptors included in the recognized second speech data, determining a first of said plural depicted items as being of likely user interest;
presenting a marking on the device display, at a location indicating said first item;
capturing third user speech with the device microphone, the captured third user speech being different than the second user speech;
sending data corresponding to the third user speech to the recognition module, and receiving recognized third speech data in return, the recognized third speech data again indicating one of said plural items as of particular user interest;
based on one or more descriptors included in the recognized third speech data, determining that a second, different one of said plural depicted items is of greater interest to the user than the first item;
moving said marking on the device display to a location indicating said second item; and
taking an action based on the second item, said action including presenting information related to the second item to the user;
wherein the device is not on heightened alert all the time, but is cued into activation from a lower activity state by the cueing expression, thereby bounding the device's processing efforts, and the descriptors in the recognized second and third speech data iteratively guide the device in identifying which of the plural items in the user's physical environment is of user interest, thereby further bounding the device's processing efforts.
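The claim's core gating behavior can be illustrated as a small two-state machine: the device idles in a low-activity state, doing only enough work to spot the cueing expression, and sends speech on for full recognition only in the heightened-alert state. This is a minimal sketch under assumed names; the wake phrase and class structure below are illustrative, not the patent's implementation.

```python
# Hypothetical two-state gating: low activity until a cueing
# expression is heard, then heightened alert.
CUEING_EXPRESSION = "hey device"   # assumed wake phrase


class IntuitiveDevice:
    def __init__(self):
        self.state = "low_activity"

    def on_speech(self, transcript: str) -> str:
        """Route captured speech according to the current state."""
        if self.state == "low_activity":
            # Only scan for the cueing expression here, which is what
            # bounds the device's processing efforts.
            if CUEING_EXPRESSION in transcript.lower():
                self.state = "heightened_alert"
                return "switched to heightened alert"
            return "ignored"
        # Heightened alert: forward speech for full recognition (stubbed).
        return f"recognize: {transcript}"


device = IntuitiveDevice()
print(device.on_speech("what is that thing"))   # ignored: no cue yet
print(device.on_speech("hey device"))           # cue detected, state switches
print(device.on_speech("what is that thing"))   # now fully processed
```

The same speech is ignored before the cue and processed after it, which is the "not on heightened alert all the time" limitation in executable form.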
Abstract
A system senses audio, imagery, and/or other stimulus from a user's environment, and responds to fulfill user desires. In one particular arrangement, a discovery session is launched when the user speaks a cueing expression, which serves to switch the system from a lower activity state to a heightened alert state. The system may recognize that the speech expresses a user request that requires analysis of camera-captured imagery to fulfill. In response, the system can apply an operation, such as a recognition operation (e.g., barcode decoding), to the imagery and take an action based on resulting information. Operation of the system can be aided by collateral information, such as context. A great number of other features and arrangements are also detailed.
40 Claims
1. (Set forth in full above, under First Claim.) - View Dependent Claims (2, 3, 4, 5, 6)
7. A method employing a device equipped with one or more processors, a camera and a microphone, the camera capturing imagery depicting plural items in a user's physical environment, the method comprising the acts:
capturing first speech of the user, with the device microphone;
detecting, with a device processor, that the captured first speech includes a cueing expression;
in response to detection of the cueing expression, switching the device from a lower activity state to a heightened alert state, in the heightened alert state the device performing functionality including:
transmitting second speech data of the user, identifying an item depicted in the camera-captured imagery as being of particular user interest, and the captured camera imagery, from the device to a remote computer system, said captured second speech including a noun;
following said transmitting, receiving data produced by the remote computer system, said received data having been produced by the remote computer system by applying a selected recognition operation to said captured camera imagery, said recognition operation having been selected, from among a plurality of available recognition operation options, based on said second speech data including said noun; and
taking an action based on said received data, including presenting information corresponding thereto to the user;
wherein the device is not on heightened alert all the time, but is cued into activation from a lower activity state by the cueing expression, thereby bounding the device's processing efforts, and wherein, in its heightened alert state, the device cooperates with the remote computer system to present the user with information recognition-processed from imagery captured by the device camera, the recognition processing having been selected based on the user's second speech, including said noun.
View Dependent Claims (9, 10, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25)
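The noun-driven selection step in this claim (and its server-side counterpart, claim 26) amounts to a lookup from a noun in the recognized speech to one of several available recognition operations, as in the abstract's barcode-decoding example. The operation table and function name below are assumptions for illustration, not the patent's actual vocabulary.

```python
# Illustrative mapping from a noun in the user's second speech to one of
# a plurality of available recognition operations (names are assumed).
RECOGNITION_OPERATIONS = {
    "barcode": "barcode_decoding",
    "book":    "cover_matching",
    "face":    "face_recognition",
    "sign":    "optical_character_recognition",
}


def select_recognition_operation(second_speech: str) -> str:
    """Pick a recognition operation based on a noun found in the
    recognized speech; fall back to a generic operation otherwise."""
    lowered = second_speech.lower()
    for noun, operation in RECOGNITION_OPERATIONS.items():
        if noun in lowered:
            return operation
    return "generic_object_recognition"


print(select_recognition_operation("read that barcode for me"))
# barcode_decoding
```

A production system would use real natural-language parsing rather than substring matching, but the bounding effect is the same: only the operation implied by the noun is run against the imagery.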
8. A method employing a device equipped with a processor, a camera and a microphone, the camera capturing imagery depicting plural items in a user's physical environment, the method comprising the acts:
capturing first speech of the user, with the device microphone;
detecting, with said device processor, that the captured first speech includes a cueing expression;
in response to detection of the cueing expression, switching the device from a lower activity state to a heightened alert state, in the heightened alert state the device performing functionality including:
capturing second speech of the user;
sending data from the device, said data including data corresponding to the second user speech and data corresponding to the captured imagery, and receiving data, including (a) recognized second speech data and (b) recognition-processed data about a subject depicted in the imagery, in return; and
taking an action based on said received data, including presenting information based on the recognition-processed data to the user;
the method further including:
based on one or more descriptors included in the recognized second speech data, determining a first of said plural depicted items as being of likely user interest;
presenting a marking on a display of the device, at a location indicating said first item;
capturing third user speech with the device microphone;
sending data corresponding to the third user speech from the device, and receiving recognized third speech data in return;
based on one or more descriptors included in the recognized third speech data, determining that a second, different one of said plural depicted items is of greater interest to the user than the first item;
moving said marking on the device display to a location indicating said second item;
receiving a user confirmation that the second item is of interest; and
sending data corresponding to the second item from the device for recognition processing;
wherein the device is not on heightened alert all the time, but is cued into activation from a lower activity state by the cueing expression, thereby bounding the device's processing efforts;
wherein, in its heightened alert state, the device cooperates with a remote computer system to recognition-process imagery captured by the device camera; and
wherein descriptors in the recognized second and third speech data iteratively guide the device in identifying which of the plural items in the user's physical environment is of user interest, thereby further bounding the device's processing efforts.
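The descriptor-guided disambiguation recited in claims 1, 8, and 11 can be sketched as scoring each depicted item against the descriptors extracted from each round of speech, marking the best match, and letting a later round move the marker to a different item. The item data and descriptor sets below are purely illustrative assumptions.

```python
# Minimal sketch: each depicted item carries descriptors; the item whose
# descriptors best overlap the speech descriptors gets the on-screen
# marking, and later speech can move the marking to another item.
items = [
    {"name": "mug",  "descriptors": {"red", "round", "left"}},
    {"name": "book", "descriptors": {"red", "flat", "right"}},
]


def best_item(speech_descriptors: set) -> dict:
    """Return the depicted item whose descriptors best match the
    descriptors recognized from the user's speech."""
    return max(items,
               key=lambda item: len(item["descriptors"] & speech_descriptors))


# Second speech, e.g. "the red one on the left" -> mark the mug.
marked = best_item({"red", "left"})
assert marked["name"] == "mug"

# Third speech, e.g. "no, the flat one on the right" -> marker moves.
marked = best_item({"red", "flat", "right"})
assert marked["name"] == "book"
```

Each round of descriptors narrows the candidate set instead of recognition-processing every item, which is the "further bounding the device's processing efforts" limitation.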
11. A tangible computer readable medium containing instructions to configure a device, equipped with a display, a camera and a microphone, to perform acts including:
capturing first speech of the user;
detecting that the captured first speech includes a cueing expression, and in response to detection of the cueing expression, switching the device from a lower activity state to a heightened alert state, in the heightened alert state the instructions configuring the device to perform functions including:
capturing second user speech;
sending data corresponding to the second user speech to a recognition module, and receiving recognized second speech data in return;
based on one or more descriptors included in the recognized second speech data, determining a first of said plural depicted items as being of likely user interest;
presenting a marking on the display, at a location indicating said first item;
capturing third user speech, the captured third user speech being different than the second user speech;
sending data corresponding to the third user speech to the recognition module, and receiving recognized third speech data in return;
based on one or more descriptors included in the recognized third speech data, determining that a second, different one of said plural depicted items is of greater interest to the user than the first item;
moving said marking on the display to a location indicating said second item; and
taking an action based on the second item, said action including presenting information related to the second item to the user;
wherein the device is not on heightened alert all the time, but is cued by said instructions into activation from a lower activity state by the cueing expression, thereby bounding the device's processing efforts, and the instructions enable descriptors in the recognized second and third speech data to iteratively guide the device in identifying which of the plural items in the user's physical environment is of user interest, thereby further bounding the device's processing efforts.
26. A method of operating a server system to interact with a remote user device, the remote user device including one or more processors, a camera and a microphone, the remote user device being operative to capture imagery depicting plural items in a user's physical environment and also capture user speech, the user device further being operative to respond to detection of a cueing expression in first captured speech to switch the device from a lower activity state to a heightened alert state, the method including:
the server system receiving captured imagery and second user speech information from the remote user device when the remote user device is in the heightened alert state, wherein a noun in said second user speech information identifies an item depicted in the device-captured imagery as being of particular user interest;
the server system selecting a recognition operation to apply to the received captured imagery, from among a plurality of available recognition operations, based on said noun;
a hardware processor in the server system applying said selected operation to the received captured imagery; and
the server system transmitting data resulting from application of said selected operation to the received captured imagery, to the user device, for presentation to said user;
wherein the server system cooperates with the user device to present information recognition-processed from imagery captured by the user device, wherein said recognition processing is selected from among said plurality of available recognition operations, based on the noun in the user's second speech information.
View Dependent Claims (27, 28, 29, 30)
31. A method performed using a system comprising a local device and a remote server, the system including plural processors, the local device including a camera that captures imagery from a user's physical environment and a microphone that captures user speech, the method comprising the acts:
capturing user speech with the microphone;
recognizing that an initial portion of the captured speech includes a cueing expression;
in response to recognition of the cueing expression, switching the system from a lower activity state to a heightened alert state;
recognizing a further portion of the captured user speech, the user speech expressing a user request for the system to fulfill;
determining, based on the recognized user speech, that fulfillment of said request requires analysis of imagery captured by the camera;
after the system has been switched to the heightened alert state, and after determining, based on the recognized user speech, that fulfillment of the user request requires analysis of imagery captured by the camera, applying a recognition operation to camera-captured imagery using one or more of said plural processors to extract information; and
taking an action based on the information extracted from the camera-captured imagery;
wherein the system is not on heightened alert all the time, but is cued into activation from a lower activity state by the cueing expression, and wherein, when in its heightened alert state, and in response to the recognized user speech, the system extracts information from the captured imagery and takes action based thereon.
View Dependent Claims (32, 33, 34, 35, 36, 37)
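The distinctive step in claims 31 and 38 is deciding, from the recognized speech alone, whether fulfilling the request requires imagery analysis at all, so that vision processing runs only when needed. The keyword heuristic below is an assumption for illustration; a real system would use proper intent classification.

```python
# Hedged sketch: decide from recognized speech whether the request
# refers to something in the camera's view. The cue-word set is an
# illustrative assumption, not the patent's method.
VISUAL_CUES = {"this", "that", "look", "see", "read", "scan", "show"}


def requires_imagery_analysis(request: str) -> bool:
    """True if the user request appears to need camera imagery,
    e.g. deictic words like 'this' or 'that'."""
    return bool(VISUAL_CUES & set(request.lower().split()))


print(requires_imagery_analysis("what is that"))      # True: deictic 'that'
print(requires_imagery_analysis("what time is it"))   # False: no visual cue
```

Only when this check returns True does the system apply a recognition operation to the captured imagery, keeping the camera pipeline idle for purely verbal requests.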
38. A system comprising a microphone, a camera, and plural processors controlled by instructions stored in one or more memories, the processors performing acts including:
capturing user speech with the microphone;
recognizing that an initial portion of the captured user speech includes a cueing expression;
in response to recognition of the cueing expression, switching the system from a lower activity state to a heightened alert state;
recognizing a further portion of the captured user speech, the user speech expressing a user request for the system to fulfill;
determining, based on the recognized user speech, that fulfillment of said request requires analysis of imagery captured by the camera;
after the system has been switched to the heightened alert state, and after determining, based on the recognized user speech, that fulfillment of the user request requires analysis of imagery captured by the camera, applying a recognition operation to camera-captured imagery using one or more of said plural processors to extract information; and
taking an action based on the information extracted from the camera-captured imagery;
wherein the system is not on heightened alert all the time, but is cued into activation from a lower activity state by the cueing expression, and wherein, when in its heightened alert state, and in response to the recognized user speech, the system extracts information from the captured imagery and takes action based thereon.
View Dependent Claims (39, 40)
Specification