Intuitive computing methods and systems

US 9,197,736 B2
Filed: 06/09/2010
Issued: 11/24/2015
Est. Priority Date: 12/31/2009
Status: Active Grant

Abstract

A smart phone senses audio, imagery, and/or other stimulus from a user's environment, and acts autonomously to fulfill inferred or anticipated user desires. In one aspect, the detailed technology concerns phone-based cognition of a scene viewed by the phone's camera. The image processing tasks applied to the scene can be selected from among various alternatives by reference to resource costs, resource constraints, other stimulus information (e.g., audio), task substitutability, etc. The phone can apply more or less resources to an image processing task depending on how successfully the task is proceeding, or based on the user's apparent interest in the task. In some arrangements, data may be referred to the cloud for analysis, or for gleaning. Cognition, and identification of appropriate device response(s), can be aided by collateral information, such as context. A great number of other features and arrangements are also detailed.
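
The abstract's idea of choosing image processing tasks by reference to resource costs and constraints can be illustrated with a small sketch. It is not the patented method: the greedy usefulness-per-cost ranking is a stand-in technique, and the `Task` fields, the cost and usefulness numbers, and the `adjust_budget` rule are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    name: str          # hypothetical task label
    cost: float        # assumed resource cost, arbitrary units
    usefulness: float  # assumed expected benefit, arbitrary units


def select_tasks(candidates, budget):
    """Greedily pick the tasks with the best usefulness per unit cost that fit the budget."""
    chosen = []
    remaining = budget
    ranked = sorted(candidates, key=lambda t: t.usefulness / t.cost, reverse=True)
    for task in ranked:
        if task.cost <= remaining:
            chosen.append(task)
            remaining -= task.cost
    return chosen


def adjust_budget(budget, progress, step=0.25):
    """Grow the budget when a task is going well, shrink it otherwise.

    progress is an assumed score in [0, 1]; 0.5 is treated as the neutral point.
    """
    factor = 1 + step if progress >= 0.5 else 1 - step
    return budget * factor


# Illustrative candidates; the numbers are made up.
candidates = [
    Task("barcode_reading", cost=2.0, usefulness=6.0),
    Task("ocr", cost=5.0, usefulness=8.0),
    Task("image_fingerprinting", cost=4.0, usefulness=4.0),
]
picked = select_tasks(candidates, budget=7.0)
print([t.name for t in picked])  # ['barcode_reading', 'ocr']
```

With a budget of 7.0 the two highest-ranked tasks exhaust the budget, so fingerprinting is skipped; a call such as `adjust_budget(7.0, progress=0.9)` would widen the budget for the next round.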

148 Citations

22 Claims

1. A method of declarative reconfiguration of a smart phone system, said system having a processor configured to perform one or more acts of the method, said system also including at least first and second sensors for capturing, respectively, first and second different types of media content from a user's environment, and for producing, respectively, first and second different types of sensor output data, one of said sensors comprising a microphone for sensing audio content and producing audio output data, and another of said sensors comprising an image sensor for sensing visual content and producing image output data, the method comprising the acts:
- (a) applying, to a speech recognition module, audio output data corresponding to user speech received by the microphone;
  
  (b) receiving, from the speech recognition module, recognized verb data and recognized noun data corresponding, respectively, to a verb and a noun included in said user speech, the noun data identifying a subject in the user's environment from which sensor data is captured;
  
  (c) based on said recognized verb data, determining that the user is either interested in the first type of media content or in the second type of media content;
  
  (d) based on said recognized noun data, tuning a content recognition operation of the system in accordance with a determined user interest, said tuning comprising establishing a set of one or more audio or image processing operations to be performed on output data from the first sensor or the second sensor based on the determined user interest in the first type of media content or in the second type of media content, said set being selected from a larger set of signal processing operations comprising image or audio processing operations, said tuning including accessing a data structure using said recognized noun data to obtain data identifying said set of one or more signal processing operations to be performed on said output data from the first sensor or the second sensor based on the determined user interest in the first type of media content or in the second type of media content;
  
  (e) performing said tuned content recognition operation on the first sensor output data or on the second sensor output data; and
  
  (f) providing results based on said tuned content recognition operation to the user;
  
  wherein speech recognition is employed both (1) in identifying a type of media content of interest to the user, and (2) in tuning content recognition processing of said identified type of media content.
- Dependent claims:
  - 2. The method of claim 1 in which the recognized verb data comprises data corresponding to a verb from the list consisting of:
    - look, watch, view, see, and read.
  - 3. The method of claim 1 in which the recognized verb data comprises data corresponding to a verb from the list consisting of:
    - listen, and hear.
  - 4. The method of claim 1 in which the recognized noun data comprises data corresponding to a noun from the list consisting of:
    - newspaper, book, magazine, poster, text, printing, ticket, box, package, carton, wrapper, product, barcode, watermark, photograph, person, man, boy, woman, girl, people, display, screen, monitor, video, movie, television, radio, iPhone, iPad, and Kindle.
  - 5. The method of claim 1 that includes determining, by reference to the recognized verb data, that visual content, rather than audio content, is of interest to the user, and the method includes determining a type of image processing to be applied to the image output data.
  - 6. The method of claim 5 wherein the type of image processing comprises digital watermark decoding.
  - 7. The method of claim 5 wherein the type of image processing comprises image fingerprinting.
  - 8. The method of claim 5 wherein the type of image processing comprises optical character recognition.
  - 9. The method of claim 5 wherein the type of image processing comprises barcode reading.
  - 10. The method of claim 1 that includes:
    - determining, by reference to the recognized verb data, that visual content, rather than audio content, is of interest to the user; and
      
      determining, by reference to the recognized noun data, a filtering function to be applied to the image output data.
  - 11. The method of claim 1 that includes:
    - determining, by reference to the recognized verb data, that visual content, rather than audio content, is of interest to the user; and
      
      determining, by reference to the recognized noun data, an optical focusing function to be applied to the image output data.
  - 12. The method of claim 1 in which the user speech data includes a negation from the list:
    - not, no and ignore.
  - 13. The method of claim 1 in which said recognized verb data directs the system that the user is interested in audio content rather than visual content, and said recognized noun data establishes an audio filtering function that is to be applied to said audio output data.
  - 14. The method of claim 13 in which a passband of said audio filtering function depends on said recognized noun data.
  - 15. The method of claim 13 that includes establishing a male voice-tailored audio filtering passband function in response to first recognized noun data, and establishing a female voice-tailored audio filtering passband function in response to second recognized noun data.
  - 16. The method of claim 13 that includes, as a consequence of first user speech, processing audio output data with an audio filtering function having a first passband, and as a consequence of second user speech, processing audio output data with an audio filtering function having a second passband different than the first passband.
  - 17. The method of claim 1 that includes:
    - as a consequence of first user speech, including a first verb and a first noun, directing the system to process audio output data with a first signal processing operation; and
      
      as a consequence of second user speech, including a second verb and a second noun, directing the system to process image output data with a second signal processing operation;
      
      wherein the first verb is different than the second verb, and the first noun is different than the second noun.
  - 18. The method of claim 1 that further includes, before act (c), detecting a keyword in the user speech, said keyword detection serving as a cue to the system to perform acts (c) through (e).
  - 19. The method of claim 1 in which the first sensor comprises the microphone and the second sensor comprises the image sensor, and the determined user interest comprises an indication of an interest in the first type of media content, in which the first type of media content comprises audio content, and in which act (e) performs said tuned content recognition operation on the audio output data.
  - 20. The method of claim 1 in which the first sensor comprises the microphone and the second sensor comprises the image sensor, and the determined user interest comprises an indication of an interest in the second type of media content, in which the second type of media content comprises visual content, and in which act (e) performs said tuned content recognition operation on the image output data.
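
Claims 13 to 16 tie the passband of the audio filtering function to the recognized noun. A minimal sketch of that dependency follows. The frequency ranges are rough illustrative figures for fundamental voice pitch and are assumptions, not values from the claims; `PASSBANDS_HZ`, `passband_for`, and `apply_passband` are hypothetical names, and the spectrum is assumed to be already available as (frequency, magnitude) pairs.

```python
# Illustrative passbands in Hz, keyed by recognized noun.
# The numbers are assumptions chosen for the example only.
PASSBANDS_HZ = {
    "man": (85.0, 155.0),
    "woman": (165.0, 255.0),
}

# Assumed fallback band used when the noun has no tailored entry.
DEFAULT_PASSBAND_HZ = (80.0, 260.0)


def passband_for(noun):
    """Return the (low, high) passband associated with the recognized noun."""
    return PASSBANDS_HZ.get(noun, DEFAULT_PASSBAND_HZ)


def apply_passband(bins, passband):
    """Zero the magnitude of every spectral bin outside the passband.

    bins is a list of (frequency_hz, magnitude) pairs, an assumed input format.
    """
    low, high = passband
    return [(freq, mag if low <= freq <= high else 0.0) for freq, mag in bins]


spectrum = [(100.0, 1.0), (200.0, 1.0), (400.0, 1.0)]
print(apply_passband(spectrum, passband_for("man")))
# [(100.0, 1.0), (200.0, 0.0), (400.0, 0.0)]
print(apply_passband(spectrum, passband_for("woman")))
# [(100.0, 0.0), (200.0, 1.0), (400.0, 0.0)]
```

The same audio output data yields different filtered results depending only on the noun, which is the behavior claims 15 and 16 describe for first and second user speech.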

21. A non-transitory computer readable medium containing programming instructions for configuring a smart phone system that includes a processor and at least first and second sensors for capturing, respectively, first and second different types of media content from a user's environment, and for producing, respectively, first and second different types of sensor output data, one of said sensors comprising a microphone for sensing audio content and producing audio output data, and another of said sensors comprising an image sensor for sensing visual content and producing image output data, said instructions configuring the system programmed thereby to perform acts including:
- (a) applying, to a speech recognition module, audio output data corresponding to user speech received by the microphone;
  
  (b) receiving, from the speech recognition module, recognized verb data and recognized noun data corresponding, respectively, to a verb and a noun included in said user speech, the noun data identifying a subject in the user's environment from which sensor data is captured;
  
  (c) based on said recognized verb data, determining that the user is either interested in the first type of media content or in the second type of media content;
  
  (d) based on said recognized noun data, tuning a content recognition operation of the system in accordance with a determined user interest, said tuning comprising establishing a set of one or more audio or image processing operations to be performed on output data from the first sensor or the second sensor based on the determined user interest in the first type of media content or in the second type of media content, said set being selected from a larger set of signal processing operations comprising image or audio processing operations, said tuning including accessing a data structure using said recognized noun data to obtain data identifying said set of one or more signal processing operations to be performed on said output data from the first sensor or the second sensor based on the determined user interest in the first type of media content or in the second type of media content;
  
  (e) performing said tuned content recognition operation on the first sensor output data or on the second sensor output data; and
  
  (f) providing results based on said tuned content recognition operation to the user;
  
  wherein speech recognition is employed both (1) in identifying a type of media content of interest to the user, and (2) in tuning content recognition processing of said identified type of media content.

22. A smart phone system including:
- a processor;
  
  a memory;
  
  at least first and second sensors for capturing, respectively, first and second different types of media content from a user's environment, and for producing, respectively, first and second different types of sensor output data, one of said sensors comprising a microphone for sensing audio content and producing audio output data, and another of said sensors comprising an image sensor for sensing visual content and producing image output data; and
  
  instructions in said memory that configure the system to perform:
  
  (a) applying, to a speech recognition module, audio output data corresponding to user speech received by the microphone;
  
  (b) receiving, from the speech recognition module, recognized verb data and recognized noun data corresponding, respectively, to a verb and a noun included in said user speech, the noun data identifying a subject in the user's environment from which sensor data is captured;
  
  (c) based on said recognized verb data, determining that the user is either interested in the first type of media content or in the second type of media content;
  
  (d) based on said recognized noun data, tuning a content recognition operation of the system in accordance with a determined user interest, said tuning comprising establishing a set of one or more audio or image processing operations to be performed on output data from the first sensor or the second sensor based on the determined user interest in the first type of media content or in the second type of media content, said set being selected from a larger set of signal processing operations comprising image or audio processing operations, said tuning including accessing a data structure using said recognized noun data to obtain data identifying said set of one or more signal processing operations to be performed on said output data from the first sensor or the second sensor based on the determined user interest in the first type of media content or in the second type of media content;
  
  (e) performing said tuned content recognition operation on the first sensor output data or on the second sensor output data; and
  
  (f) providing results based on said tuned content recognition operation to the user;
  
  wherein speech recognition is employed both (1) in identifying a type of media content of interest to the user, and (2) in tuning content recognition processing of said identified type of media content.
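
Acts (a) through (f) are shared by claims 1, 21, and 22. The sketch below illustrates acts (c) and (d) only, assuming the speech recognition module has already returned a verb and a noun as plain strings. The verb sets follow claims 2 and 3; the table `NOUN_TABLE`, its noun-to-operation pairings, and the function names are hypothetical and are not taken from the specification.

```python
# Verbs from claims 2 and 3, grouped by the media type they indicate.
VISUAL_VERBS = {"look", "watch", "view", "see", "read"}
AUDIO_VERBS = {"listen", "hear"}

# Hypothetical data structure for act (d): noun -> operations per media type.
# Operation names echo claims 6 to 9 and 13; the pairings are assumptions.
NOUN_TABLE = {
    "barcode": {"image": ["barcode_reading"]},
    "newspaper": {"image": ["optical_character_recognition"]},
    "magazine": {
        "image": ["digital_watermark_decoding", "optical_character_recognition"],
    },
    "television": {
        "image": ["image_fingerprinting"],
        "audio": ["audio_filtering"],
    },
}


def media_type_for(verb):
    """Act (c): map the recognized verb to the media type of interest."""
    if verb in VISUAL_VERBS:
        return "image"
    if verb in AUDIO_VERBS:
        return "audio"
    raise ValueError(f"unrecognized verb: {verb!r}")


def tune(verb, noun):
    """Act (d): use the noun as a key to obtain the operations to perform."""
    media = media_type_for(verb)
    operations = list(NOUN_TABLE.get(noun, {}).get(media, []))
    return media, operations


print(tune("look", "barcode"))       # ('image', ['barcode_reading'])
print(tune("listen", "television"))  # ('audio', ['audio_filtering'])
```

The verb alone selects which sensor's output is processed, and the noun alone selects the subset of operations, which mirrors the two roles of speech recognition stated in the closing "wherein" clause.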

Current Assignee: Digimarc Corporation
Original Assignee: Digimarc Corporation
Inventors: Davis, Bruce L.; Rodriguez, Tony F.; Conwell, William Y.; Rhoads, Geoffrey B.
Primary Examiner(s): Desir, Pierre-Louis
Assistant Examiner(s): Sharma, Neeraj

Application Number: US12/797,503
Publication Number: US 20110161076A1
Time in Patent Office: 1,994 Days
Field of Search: 704/233, 704/275, 704/235, 704/270, 704/231, 704/9, 704/4, 704/260, 348/231.99, 348/207.99, 348/14.01, 348/345, 348/231.4, 705/14.52, 705/7.18, 705/75, 707/100, 713/202, 706/16, 345/702, 345/156, 345/419, 340/669, 340/541, 340/566, 340/5.31, 340/573.4
US Class Current: 1/1
CPC Class Codes

G06F 3/04842   Selection of displayed obje...

G06F 3/04847   Interaction techniques to c...

G06F 3/0488   using a touch-screen or dig...

G06F 9/50   Allocation of resources, e....

G06V 10/751   Comparing pixel values or l...

G06V 40/20   Movements or behaviour, e.g...

G10L 15/22   Procedures used during a sp...

G10L 15/26   Speech to text systems G10L...

G10L 17/00   Speaker identification or v...

H04M 1/72448   with means for adapting the...

H04M 1/72469   for operating the device by...

H04W 72/02   Selection of wireless resou...
