Method and apparatus for enhancing digital images with textual explanations
Abstract
A photographer adds explanatory text to a captured image by speaking the text at approximately the time the image is captured by the digital camera. The spoken information is reduced to text by recognizing the user's speech, and is associated with the digital image. Preferably, the digital camera contains an on-board speech reduction capability to produce an intermediate symbolic form expressing the user's speech as a series of basic sounds, or phonemes, which can be later reduced to natural language text by a computer system having access to sophisticated vocabulary lists and syntactical analysis.
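The two-stage reduction described in the abstract (phonemes produced in the camera, natural language text produced later by a computer) might be sketched as follows. This is only an illustration under stated assumptions: the vocabulary table, the ARPAbet-style phoneme labels, and the function name `phonemes_to_text` are hypothetical and do not come from the patent, and the syntactical analysis mentioned in the abstract is omitted entirely. The sketch uses a greedy longest-match lookup against a small vocabulary list.

```python
from typing import Dict, Sequence, Tuple

# Hypothetical vocabulary list: phoneme sequences mapped to words.
# The phoneme labels are ARPAbet-style and chosen only for illustration.
VOCAB: Dict[Tuple[str, ...], str] = {
    ("G", "R", "AE", "N", "D"): "grand",
    ("K", "AE", "N", "Y", "AH", "N"): "canyon",
    ("S", "AH", "N", "S", "EH", "T"): "sunset",
}


def phonemes_to_text(phonemes: Sequence[str],
                     vocab: Dict[Tuple[str, ...], str] = VOCAB) -> str:
    """Reduce an intermediate phoneme sequence to text by greedy longest match.

    Phonemes that match no vocabulary entry are kept in brackets so that
    nothing recorded by the camera is silently dropped.
    """
    max_len = max(len(key) for key in vocab)  # assumes a non-empty vocabulary
    words = []
    i = 0
    while i < len(phonemes):
        # Try the longest candidate first, then progressively shorter ones.
        for n in range(min(max_len, len(phonemes) - i), 0, -1):
            word = vocab.get(tuple(phonemes[i:i + n]))
            if word is not None:
                words.append(word)
                i += n
                break
        else:
            words.append(f"[{phonemes[i]}]")
            i += 1
    return " ".join(words)
```

In this sketch the camera would only need to store the phoneme list; the vocabulary lookup runs on the computer that later receives the images.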
53 Citations
15 Claims
1. A digital camera, comprising:
a housing;
a digital optical sensing apparatus mounted within said housing, said digital optical sensing apparatus sensing optical images;
a storage medium for storing digital optical images captured by said digital optical sensing apparatus;
an acoustic sensor capable of sensing human speech;
a speech reduction apparatus coupled to said acoustic sensor, said speech reduction apparatus converting human speech sensed by said acoustic sensor to a symbolic text form; and
a controller which stores said symbolic text form in said storage medium in a relationship associated with a captured digital image, wherein said controller:
(a) receives a user indication of a plurality of discrete time intervals;
(b) records a plurality of discrete human speech segments sensed by said acoustic sensor in respective said discrete time intervals;
(c) causes said speech reduction apparatus to convert each said human speech segment to a corresponding symbolic text segment; and
(d) automatically associates a respective digital optical image captured by said digital optical sensing apparatus with each said symbolic text segment based on a temporal relationship between the time interval in which the discrete human speech segment corresponding to the symbolic text segment was recorded and the capturing of said digital optical image;
wherein said controller associates a respective digital image with each symbolic text segment according to all of the following association priorities:
(1) if a first digital image is captured during the recording of a human speech segment corresponding to the symbolic text segment, the symbolic text segment is associated with the first digital image;
(2) if no digital image is captured from a time the digital camera is powered on until the end of the recording of the human speech segment corresponding to the symbolic text segment, and a second digital image is captured after recording the human speech segment but before the digital camera is powered off, then the symbolic text segment is associated with the second digital image; and
(3) in all other cases, the symbolic text is associated with the digital image most recently captured before the recording of the human speech segment corresponding to the symbolic text segment.
Dependent claims: 2, 3, 4
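The three association priorities recited in claim 1 can be read as a small decision procedure. The following is a minimal sketch of one possible reading, not the patented implementation. It assumes that all capture times fall within a single power-on session and are measured in seconds since power-on. The names `SpeechSegment` and `associate_segment` are hypothetical. Two choices go beyond the claim text and are assumptions: when several images qualify under priority (1) or (2), the earliest one is chosen, and when no image exists at all the function returns `None`.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class SpeechSegment:
    """One recorded speech interval, in seconds since power-on (assumed unit)."""
    start: float
    end: float


def associate_segment(segment: SpeechSegment,
                      capture_times: Sequence[float]) -> Optional[int]:
    """Return the index of the image the segment's text is associated with.

    capture_times holds the capture time of every image taken in the
    current power-on session. Returns None if no image qualifies.
    """
    indices = range(len(capture_times))
    during = [i for i in indices
              if segment.start <= capture_times[i] <= segment.end]
    before = [i for i in indices if capture_times[i] < segment.start]
    after = [i for i in indices if capture_times[i] > segment.end]

    # Priority (1): an image captured while the segment was being recorded.
    if during:
        return min(during, key=lambda i: capture_times[i])

    # Priority (2): nothing captured from power-on to the end of recording,
    # but an image is captured afterwards, before power-off.
    if not before:
        if after:
            return min(after, key=lambda i: capture_times[i])
        return None  # assumption: no image exists in this session

    # Priority (3): the image most recently captured before the recording.
    return max(before, key=lambda i: capture_times[i])
```

For example, with images captured at 10, 25 and 40 seconds, a segment recorded from 20 to 30 seconds falls under priority (1), a segment from 2 to 5 seconds falls under priority (2), and a segment from 30 to 35 seconds falls under priority (3).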
5. A method of operating a digital camera, comprising the steps of:
capturing a plurality of digital images of respective objects of interest with optical sensing apparatus of said digital camera;
recording a plurality of discrete segments of human speech of a user in said digital camera during a plurality of respective discrete time intervals, each respective discrete time interval occurring substantially contemporaneously with capturing of each respective digital image of said plurality of digital images;
rendering each said segment of said plurality of discrete segments of human speech in a respective corresponding segment of symbolic text using speech reduction apparatus within said digital camera; and
automatically associating each respective digital image of said plurality of digital images with a respective corresponding segment of symbolic text rendered from a respective corresponding segment of human speech based on a temporal relationship between the respective discrete time interval during which the corresponding segment of human speech was recorded and the capturing of the respective digital image, and storing each said symbolic text segment in a relationship associated with each respective said captured digital image;
wherein said step of automatically associating each respective digital image of said plurality of digital images with a respective corresponding segment of symbolic text comprises automatically associating according to all of the following association priorities:
(1) if a first digital image is captured during the recording of a human speech segment corresponding to the symbolic text segment, the symbolic text segment is associated with the first digital image;
(2) if no digital image is captured from a time the digital camera is powered on until the end of the recording of the human speech segment corresponding to the symbolic text segment, and a second digital image is captured after recording the human speech segment but before the digital camera is powered off, then the symbolic text segment is associated with the second digital image; and
(3) in all other cases, the symbolic text is associated with the digital image most recently captured before the recording of the human speech segment corresponding to the symbolic text segment.
Dependent claims: 6, 7, 8
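The storing step of claim 5 keeps each symbolic text segment "in a relationship associated with" its captured image. The claim does not say how that relationship is represented, so the following is only a sketch under assumptions: image identifiers such as `IMG_0001`, the function names, and the choice of a JSON index are all hypothetical. It groups text segments by image and serializes the result so it could be written alongside the images on the storage medium.

```python
import json
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def build_caption_index(
        associations: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Group (image_id, text_segment) pairs by image, preserving input order.

    An image may end up with several text segments, since more than one
    speech segment can be associated with the same image.
    """
    index: Dict[str, List[str]] = defaultdict(list)
    for image_id, text in associations:
        index[image_id].append(text)
    return dict(index)


def serialize_index(index: Dict[str, List[str]]) -> str:
    """Serialize the index to JSON text (hypothetical on-camera format)."""
    return json.dumps(index, sort_keys=True, indent=2)
```

A flat index of this kind keeps the image files untouched. Embedding the text in image metadata would be an alternative that this sketch does not attempt.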
9. A program product for controlling the operation of a digital camera, said program product comprising a plurality of processor executable instructions recorded on signal-bearing media, wherein said instructions, when executed by at least one programmable processor within said digital camera, cause the camera to perform the steps of:
capturing a plurality of digital images of respective objects of interest with optical sensing apparatus of said digital camera;
recording a plurality of discrete segments of human speech of a user in said digital camera during a plurality of respective discrete time intervals, each respective discrete time interval occurring substantially contemporaneously with capturing of each respective digital image of said plurality of digital images;
rendering each said segment of said plurality of discrete segments of human speech in a respective corresponding segment of symbolic text using speech reduction apparatus within said digital camera; and
associating each respective digital image of said plurality of digital images with a respective corresponding segment of symbolic text rendered from a respective corresponding segment of human speech based on a temporal relationship between the respective discrete time interval during which the corresponding segment of human speech was recorded and the capturing of the respective digital image, and storing each said symbolic text segment in a relationship associated with each respective said captured digital image;
wherein said step of associating each respective digital image of said plurality of digital images with a respective corresponding segment of symbolic text comprises associating according to all of the following priorities:
(1) if a first digital image is captured during the recording of a human speech segment corresponding to the symbolic text segment, the symbolic text segment is associated with the first digital image;
(2) if no digital image is captured from a time the digital camera is powered on until the end of the recording of the human speech segment corresponding to the symbolic text segment, and a second digital image is captured after recording the human speech segment but before the digital camera is powered off, then the symbolic text segment is associated with the second digital image; and
(3) in all other cases, the symbolic text is associated with the digital image most recently captured before the recording of the human speech segment corresponding to the symbolic text segment.
Dependent claims: 10
11. A method of recording information with digital images, comprising the steps of:
capturing a plurality of digital images of respective objects of interest with optical sensing apparatus of a digital camera;
recording a plurality of discrete segments of human speech of a user in said digital camera during a plurality of respective discrete time intervals occurring substantially contemporaneously with capturing of each respective digital image of said plurality of digital images;
rendering each said segment of said plurality of discrete segments of human speech into a respective corresponding segment of symbolic text using speech reduction apparatus within said digital camera;
automatically associating each respective digital image of said plurality of digital images with a respective corresponding segment of symbolic text rendered from a respective corresponding segment of human speech based on a temporal relationship between the respective discrete time interval during which the corresponding segment of human speech was recorded and the capturing of the respective digital image, and recording said association in a memory of said digital camera;
uploading said at least one digital image and said at least one segment of symbolic text to a digital image formatting apparatus; and
formatting said plurality of digital images and said plurality of segments of symbolic text for viewing by a user using said digital image formatting apparatus, wherein each said segment of symbolic text is formatted for viewing in a human readable form associated with its corresponding digital image;
wherein said step of automatically associating each respective digital image of said plurality of digital images with a respective corresponding segment of symbolic text comprises automatically associating according to all of the following association priorities:
(1) if a first digital image is captured during the recording of a human speech segment corresponding to the symbolic text segment, the symbolic text segment is associated with the first digital image;
(2) if no digital image is captured from a time the digital camera is powered on until the end of the recording of the human speech segment corresponding to the symbolic text segment, and a second digital image is captured after recording the human speech segment but before the digital camera is powered off, then the symbolic text segment is associated with the second digital image; and
(3) in all other cases, the symbolic text is associated with the digital image most recently captured before the recording of the human speech segment corresponding to the symbolic text segment.
Dependent claims: 12, 13, 14, 15
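The formatting step of claim 11 presents each text segment in human readable form together with its image. The claim names no output format, so the sketch below is an assumption: it renders each image and its text as an HTML figure with a caption, using Python's standard `html.escape` so that characters such as `&` in the recognized text cannot break the markup. The function names and file names are hypothetical.

```python
from html import escape
from typing import Iterable, Tuple


def format_captioned_image(image_path: str, caption: str) -> str:
    """Render one image and its associated text as an HTML figure."""
    return (
        f'<figure><img src="{escape(image_path, quote=True)}" alt="">'
        f'<figcaption>{escape(caption)}</figcaption></figure>'
    )


def format_album(items: Iterable[Tuple[str, str]]) -> str:
    """Render (image_path, caption) pairs, one figure per line."""
    return "\n".join(format_captioned_image(path, text) for path, text in items)
```

Running `format_album` over the uploaded pairs would stand in for the "digital image formatting apparatus"; a printed album or slide layout would be equally consistent with the claim language.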
Specification