Automatic tag extraction from audio annotated photos
First Claim
1. A method comprising:
receiving, by a server computer, an image file captured by a client device, the image file comprising an associated audio component, the audio component spoken by a user of the client device as a tag of the image file;
determining, by the server computer, metadata associated with the image file;
identifying, by the server computer, a dictionary of potential textual tags from the metadata;
determining, by the server computer, a textual tag from the audio component using the dictionary of potential textual tags in conjunction with speech-to-text technology; and
associating, by the server computer, the textual tag with the image file as additional metadata.
Abstract
A system and method for assigning one or more tags to an image file. In one aspect, a server computer receives an image file captured by a client device. In one embodiment, the image file includes an audio component embedded therein by the client device, where the audio component was spoken by a user of the client device as a tag of the image file. The server computer determines metadata associated with the image file and identifies a dictionary of potential textual tags from the metadata. The server computer determines a textual tag from the audio component and from the dictionary of potential textual tags. The server computer then associates the textual tag with the image file as additional metadata.
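The flow described in the abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the patented implementation: the `ImageFile` class, the `location_cell` metadata key, the `places_by_cell` lookup table, and the `transcribe` callable are all hypothetical names introduced here, and the stand-in recognizer simply treats the audio bytes as text.

```python
from dataclasses import dataclass, field


@dataclass
class ImageFile:
    """Hypothetical stand-in for an uploaded photo with an embedded voice note."""
    pixels: bytes
    audio: bytes
    metadata: dict = field(default_factory=dict)


def build_dictionary(metadata, places_by_cell):
    """Derive candidate tags from metadata (assumed here: a coarse location cell)."""
    cell = metadata.get("location_cell")
    return set(places_by_cell.get(cell, ()))


def tag_image(image, places_by_cell, transcribe):
    """Run the server-side steps and return the chosen tag, or None."""
    metadata = image.metadata                                 # determine metadata
    candidates = build_dictionary(metadata, places_by_cell)   # identify dictionary
    tag = transcribe(image.audio, candidates)                 # speech-to-text limited to candidates
    if tag is not None:
        # Associate the textual tag with the image as additional metadata.
        image.metadata.setdefault("tags", []).append(tag)
    return tag


def fake_transcribe(audio, candidates):
    """Stand-in recognizer: pretends the audio bytes are already the spoken text."""
    heard = audio.decode("utf-8")
    return heard if heard in candidates else None


places = {"cell-42": ["golden gate bridge", "fort point"]}
photo = ImageFile(pixels=b"", audio=b"fort point",
                  metadata={"location_cell": "cell-42"})
chosen = tag_image(photo, places, fake_transcribe)
```

Passing the recognizer in as a callable keeps the sketch independent of any particular speech-to-text engine; a real system would substitute its own.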
23 Claims
1. A method comprising:
receiving, by a server computer, an image file captured by a client device, the image file comprising an associated audio component, the audio component spoken by a user of the client device as a tag of the image file;
determining, by the server computer, metadata associated with the image file;
identifying, by the server computer, a dictionary of potential textual tags from the metadata;
determining, by the server computer, a textual tag from the audio component using the dictionary of potential textual tags in conjunction with speech-to-text technology; and
associating, by the server computer, the textual tag with the image file as additional metadata.
Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12)
13. A computing device comprising:
a processor;
a storage medium for tangibly storing thereon program logic for execution by the processor, the program logic comprising:
receiving logic executed by the processor for receiving an image file captured by a client device, the image file comprising an associated audio component, the audio component spoken by a user of the client device as a tag of the image file;
metadata determining logic executed by the processor for determining metadata associated with the image file;
identifying logic executed by the processor for identifying a dictionary of potential textual tags from the metadata;
tag determining logic executed by the processor for determining a textual tag from the audio component using the dictionary of potential textual tags in conjunction with speech-to-text technology; and
associating logic executed by the processor for associating the textual tag with the image file as additional metadata.
Dependent Claims (14, 15, 16, 17, 18, 19, 20, 21, 22)
23. A non-transitory computer readable storage medium tangibly storing computer program instructions capable of being executed by a computer processor, the computer program instructions defining the steps of:
receiving, by the computer processor, an image file captured by a client device, the image file comprising an associated audio component, the audio component spoken by a user of the client device as a tag of the image file;
determining, by the computer processor, metadata associated with the image file;
identifying, by the computer processor, a dictionary of potential textual tags from the metadata;
determining, by the computer processor, a textual tag from the audio component using the dictionary of potential textual tags in conjunction with speech-to-text technology; and
associating, by the computer processor, the textual tag with the image file as additional metadata.
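The claims do not say how the dictionary is used "in conjunction with" speech-to-text. One plausible reading, sketched below purely as an assumption, is to rescore a recognizer's candidate transcriptions against the dictionary and keep the closest entry. The function name `pick_tag`, the `(text, confidence)` hypothesis format, and the `min_score` threshold are hypothetical; string similarity comes from Python's standard `difflib.SequenceMatcher`.

```python
from difflib import SequenceMatcher


def pick_tag(hypotheses, dictionary, min_score=0.6):
    """Return the dictionary entry that best matches any recognizer hypothesis.

    hypotheses: iterable of (text, confidence) pairs from a speech recognizer
                (assumed format; confidence is expected to lie in [0, 1]).
    dictionary: iterable of potential textual tags derived from image metadata.
    Returns None when no entry scores above min_score.
    """
    best_tag, best_score = None, min_score
    for text, confidence in hypotheses:
        for entry in dictionary:
            # Case-insensitive character-level similarity in [0, 1].
            similarity = SequenceMatcher(None, text.lower(), entry.lower()).ratio()
            score = similarity * confidence
            if score > best_score:
                best_tag, best_score = entry, score
    return best_tag


hypotheses = [("eiffel tower", 0.9), ("awful tower", 0.7)]
dictionary = ["Eiffel Tower", "Louvre"]
tag = pick_tag(hypotheses, dictionary)
```

Restricting the output to dictionary entries means a noisy transcription can still resolve to a clean tag, while an utterance that resembles nothing in the dictionary yields no tag at all.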
Specification