Intuitive computing methods and systems
Abstract
A smart phone senses audio, imagery, and/or other stimulus from a user's environment, and acts autonomously to fulfill inferred or anticipated user desires. In one aspect, the detailed technology concerns phone-based cognition of a scene viewed by the phone's camera. The image processing tasks applied to the scene can be selected from among various alternatives by reference to resource costs, resource constraints, other stimulus information (e.g., audio), task substitutability, etc. The phone can apply more or less resources to an image processing task depending on how successfully the task is proceeding, or based on the user's apparent interest in the task. In some arrangements, data may be referred to the cloud for analysis, or for gleaning. Cognition, and identification of appropriate device response(s), can be aided by collateral information, such as context. A great number of other features and arrangements are also detailed.
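As an illustration of selecting image processing tasks "by reference to resource costs, resource constraints", the following minimal sketch picks tasks greedily under a fixed budget. It is not taken from the patent: the `Task` fields, the task names, the cost and relevance numbers, and the greedy relevance-per-cost rule are all assumptions chosen for the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Task:
    name: str         # hypothetical task name
    cost: float       # hypothetical resource cost (must be > 0), e.g. CPU milliseconds
    relevance: float  # hypothetical score for how useful the task looks in context


def select_tasks(candidates: List[Task], budget: float) -> List[Task]:
    """Greedily choose tasks with the best relevance per unit cost that fit the budget."""
    ranked = sorted(candidates, key=lambda t: t.relevance / t.cost, reverse=True)
    chosen = []
    remaining = budget
    for task in ranked:
        if task.cost <= remaining:  # resource constraint: skip tasks that do not fit
            chosen.append(task)
            remaining -= task.cost
    return chosen


# Illustrative candidates only; none of these numbers come from the source.
candidates = [
    Task("barcode_decode", cost=5.0, relevance=0.6),
    Task("face_detect", cost=20.0, relevance=0.9),
    Task("ocr", cost=40.0, relevance=0.8),
]

selected = select_tasks(candidates, budget=30.0)
print([t.name for t in selected])
```

A greedy rule is only one possible policy. The abstract also mentions other stimulus information and task substitutability as selection inputs, which this sketch does not model.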
22 Claims
1. A method of declarative reconfiguration of a smart phone system, said system having a processor configured to perform one or more acts of the method, said system also including at least first and second sensors for capturing, respectively, first and second different types of media content from a user's environment, and for producing, respectively, first and second different types of sensor output data, one of said sensors comprising a microphone for sensing audio content and producing audio output data, and another of said sensors comprising an image sensor for sensing visual content and producing image output data, the method comprising the acts:
(a) applying, to a speech recognition module, audio output data corresponding to user speech received by the microphone;
(b) receiving, from the speech recognition module, recognized verb data and recognized noun data corresponding, respectively, to a verb and a noun included in said user speech, the noun data identifying a subject in the user's environment from which sensor data is captured;
(c) based on said recognized verb data, determining that the user is either interested in the first type of media content or in the second type of media content;
(d) based on said recognized noun data, tuning a content recognition operation of the system in accordance with a determined user interest, said tuning comprising establishing a set of one or more audio or image processing operations to be performed on output data from the first sensor or the second sensor based on the determined user interest in the first type of media content or in the second type of media content, said set being selected from a larger set of signal processing operations comprising image or audio processing operations, said tuning including accessing a data structure using said recognized noun data to obtain data identifying said set of one or more signal processing operations to be performed on said output data from the first sensor or the second sensor based on the determined user interest in the first type of media content or in the second type of media content;
(e) performing said tuned content recognition operation on the first sensor output data or on the second sensor output data; and
(f) providing results based on said tuned content recognition operation to the user;
wherein speech recognition is employed both (1) in identifying a type of media content of interest to the user, and (2) in tuning content recognition processing of said identified type of media content.
Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20)
21. A non-transitory computer readable medium containing programming instructions for configuring a smart phone system that includes a processor and at least first and second sensors for capturing, respectively, first and second different types of media content from a user's environment, and for producing, respectively, first and second different types of sensor output data, one of said sensors comprising a microphone for sensing audio content and producing audio output data, and another of said sensors comprising an image sensor for sensing visual content and producing image output data, said instructions configuring the system programmed thereby to perform acts including:
(a) applying, to a speech recognition module, audio output data corresponding to user speech received by the microphone;
(b) receiving, from the speech recognition module, recognized verb data and recognized noun data corresponding, respectively, to a verb and a noun included in said user speech, the noun data identifying a subject in the user's environment from which sensor data is captured;
(c) based on said recognized verb data, determining that the user is either interested in the first type of media content or in the second type of media content;
(d) based on said recognized noun data, tuning a content recognition operation of the system in accordance with a determined user interest, said tuning comprising establishing a set of one or more audio or image processing operations to be performed on output data from the first sensor or the second sensor based on the determined user interest in the first type of media content or in the second type of media content, said set being selected from a larger set of signal processing operations comprising image or audio processing operations, said tuning including accessing a data structure using said recognized noun data to obtain data identifying said set of one or more signal processing operations to be performed on said output data from the first sensor or the second sensor based on the determined user interest in the first type of media content or in the second type of media content;
(e) performing said tuned content recognition operation on the first sensor output data or on the second sensor output data; and
(f) providing results based on said tuned content recognition operation to the user;
wherein speech recognition is employed both (1) in identifying a type of media content of interest to the user, and (2) in tuning content recognition processing of said identified type of media content.
22. A smart phone system including:
a processor;
a memory;
at least first and second sensors for capturing, respectively, first and second different types of media content from a user's environment, and for producing, respectively, first and second different types of sensor output data, one of said sensors comprising a microphone for sensing audio content and producing audio output data, and another of said sensors comprising an image sensor for sensing visual content and producing image output data; and
instructions in said memory that configure the system to perform:
(a) applying, to a speech recognition module, audio output data corresponding to user speech received by the microphone;
(b) receiving, from the speech recognition module, recognized verb data and recognized noun data corresponding, respectively, to a verb and a noun included in said user speech, the noun data identifying a subject in the user's environment from which sensor data is captured;
(c) based on said recognized verb data, determining that the user is either interested in the first type of media content or in the second type of media content;
(d) based on said recognized noun data, tuning a content recognition operation of the system in accordance with a determined user interest, said tuning comprising establishing a set of one or more audio or image processing operations to be performed on output data from the first sensor or the second sensor based on the determined user interest in the first type of media content or in the second type of media content, said set being selected from a larger set of signal processing operations comprising image or audio processing operations, said tuning including accessing a data structure using said recognized noun data to obtain data identifying said set of one or more signal processing operations to be performed on said output data from the first sensor or the second sensor based on the determined user interest in the first type of media content or in the second type of media content;
(e) performing said tuned content recognition operation on the first sensor output data or on the second sensor output data; and
(f) providing results based on said tuned content recognition operation to the user;
wherein speech recognition is employed both (1) in identifying a type of media content of interest to the user, and (2) in tuning content recognition processing of said identified type of media content.
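Acts (c) through (f) recited in the independent claims can be pictured with a small sketch: the recognized verb selects the media type, and the recognized noun is used as a key into a data structure that names which processing operations to run. This is an illustration under stated assumptions, not the patented implementation. The verb and noun vocabularies, the table layouts, the operation names, and the string-labelling stubs are all hypothetical, and the speech recognition of acts (a) and (b) is assumed to have already produced the verb and noun.

```python
from typing import Callable, Dict, List, Tuple

# Act (c): hypothetical mapping from a recognized verb to a media type.
VERB_TO_MEDIA: Dict[str, str] = {
    "look": "image",
    "read": "image",
    "listen": "audio",
}

# Act (d): hypothetical data structure, accessed with the recognized noun,
# identifying the operations to apply for the media type of interest.
NOUN_TO_OPERATIONS: Dict[str, Dict[str, List[str]]] = {
    "barcode": {"image": ["grayscale", "barcode_decode"]},
    "song": {"audio": ["fingerprint"]},
}

# The "larger set" of available operations. Each stub only labels its input,
# standing in for real image or audio processing.
OPERATIONS: Dict[str, Callable[[str], str]] = {
    name: (lambda data, name=name: f"{name}({data})")
    for name in ["grayscale", "barcode_decode", "edge_detect", "fingerprint", "denoise"]
}


def tune(verb: str, noun: str) -> Tuple[str, List[str]]:
    """Return the media type chosen by the verb and the operations chosen by the noun."""
    media = VERB_TO_MEDIA[verb]           # raises KeyError for an unknown verb
    ops = NOUN_TO_OPERATIONS[noun][media]  # raises KeyError for an unknown noun
    return media, ops


def recognize(verb: str, noun: str, sensor_output: Dict[str, str]) -> str:
    """Acts (e) and (f): run the tuned operations on the matching sensor output."""
    media, ops = tune(verb, noun)
    data = sensor_output[media]
    for op in ops:
        data = OPERATIONS[op](data)
    return data


# Placeholder sensor outputs; real data would be image frames and audio buffers.
sensors = {"image": "frame", "audio": "clip"}
print(recognize("look", "barcode", sensors))
```

In this sketch a spoken "look at the barcode" would route the camera frame through only two of the five available operations, which is the sense in which the noun "tunes" the recognition step.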
Specification