Automatic tag extraction from audio annotated photos

US 8,768,693 B2
Filed: 05/31/2012
Issued: 07/01/2014
Est. Priority Date: 05/31/2012
Status: Expired due to Fees

Abstract

A system and method for assigning one or more tags to an image file. In one aspect, a server computer receives an image file captured by a client device. In one embodiment, the image file includes an audio component embedded therein by the client device, where the audio component was spoken by a user of the client device as a tag of the image file. The server computer determines metadata associated with the image file and identifies a dictionary of potential textual tags from the metadata. The server computer determines a textual tag from the audio component and from the dictionary of potential textual tags. The server computer then associates the textual tag with the image file as additional metadata.
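The flow in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the `ImageFile` class, the function names, the `transcribe` callback (assumed to return a ranked list of speech-to-text hypotheses), and the similarity threshold are all hypothetical. The fuzzy matching from the standard-library `difflib` module is only one possible way to use a dictionary "in conjunction with" speech-to-text output.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from difflib import SequenceMatcher


@dataclass
class ImageFile:
    """Hypothetical stand-in for an uploaded photo with a spoken annotation."""
    name: str
    audio: bytes
    metadata: dict = field(default_factory=dict)


def build_dictionary(metadata: dict) -> set[str]:
    """Collect candidate tags from metadata values (strings or lists of strings)."""
    candidates = set()
    for value in metadata.values():
        items = value if isinstance(value, (list, tuple)) else [value]
        for item in items:
            if isinstance(item, str) and item.strip():
                candidates.add(item.strip().lower())
    return candidates


def pick_tag(hypotheses: list[str], dictionary: set[str], threshold: float = 0.8) -> str:
    """Return the dictionary entry closest to any hypothesis.

    Falls back to the top-ranked hypothesis when no entry is similar enough.
    The threshold value is an arbitrary assumption for this sketch.
    """
    best_entry, best_score = None, 0.0
    for hyp in hypotheses:
        for entry in dictionary:
            score = SequenceMatcher(None, hyp.lower(), entry).ratio()
            if score > best_score:
                best_entry, best_score = entry, score
    if best_entry is not None and best_score >= threshold:
        return best_entry
    return hypotheses[0].lower()


def tag_image(image: ImageFile, transcribe) -> ImageFile:
    """Run the steps: metadata -> dictionary -> textual tag -> additional metadata."""
    dictionary = build_dictionary(image.metadata)
    hypotheses = transcribe(image.audio)  # assumed: non-empty list, best guess first
    tag = pick_tag(hypotheses, dictionary)
    image.metadata.setdefault("tags", []).append(tag)
    return image
```

In this sketch a misrecognized phrase such as "golden gait bridge" would be corrected to a metadata-derived entry like "golden gate bridge", while a spoken tag unrelated to the metadata is kept as transcribed.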

23 Claims

1. A method comprising:
- receiving, by a server computer, an image file captured by a client device, the image file comprising an associated audio component, the audio component spoken by a user of the client device as a tag of the image file;
  
  determining, by the server computer, metadata associated with the image file;
  
  identifying, by the server computer, a dictionary of potential textual tags from the metadata;
  
  determining, by the server computer, a textual tag from the audio component using the dictionary of potential textual tags in conjunction with speech-to-text technology; and
  
  associating, by the server computer, the textual tag with the image file as additional metadata.
- Dependent claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12):
  - 2. The method of claim 1, further comprising communicating, by the server computer to a content server, the textual tag with the image file.
  - 3. The method of claim 1, further comprising storing, by the server computer, the image file and the textual tag.
  - 4. The method of claim 1, further comprising enabling, by the server computer, the user to perform operations related to the image file using the textual tag.
  - 5. The method of claim 1, further comprising communicating, by the server computer, the textual tag and the image file to the client device for display and for enabling the user to approve, reject, and edit the textual tags.
  - 6. The method of claim 4, wherein the enabling of the user to perform operations related to the image file further comprises enabling sharing of the image file with other users.
  - 7. The method of claim 4, wherein the enabling of the user to perform operations related to the image file further comprises:
    - receiving, by the server computer, a search term from the client device;
    - searching, by the server computer, the textual tags for the search term; and
    - communicating, by the server computer, the image file associated with the textual tag to the client device.
  - 8. The method of claim 1, wherein the identifying of the dictionary of potential textual tags further comprises receiving, from a content server, a plurality of previously stored tags.
  - 9. The method of claim 8, wherein the identifying of the dictionary of potential textual tags further comprises determining the dictionary of potential textual tags from the metadata and from the plurality of previously stored tags.
  - 10. The method of claim 1, further comprising communicating, by the server computer, an advertisement based on the textual tag.
  - 11. The method of claim 1, further comprising enabling the audio component to be played.
  - 12. The method of claim 1, further comprising enabling playing of the audio component at one or more of the client device and a digital photograph web site.
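Claims 7 to 9 describe searching stored tags and widening the dictionary with previously stored tags. The following Python sketch shows one way these steps might look. The `TagIndex` class, its in-memory storage, the substring matching in `search`, and the `merged_dictionary` helper are all hypothetical; the claims do not specify a data structure or a matching rule.

```python
from collections import defaultdict


class TagIndex:
    """Hypothetical in-memory index mapping textual tags to image file names."""

    def __init__(self):
        self._images_by_tag = defaultdict(set)

    def add(self, image_name, tag):
        """Store a tag for an image (claim 3), normalized to lowercase."""
        self._images_by_tag[tag.strip().lower()].add(image_name)

    def stored_tags(self):
        """Return all previously stored tags (the input assumed by claim 8)."""
        return set(self._images_by_tag)

    def search(self, term):
        """Return names of images whose tags contain the search term (claim 7).

        Substring matching is an assumption; exact matching would also fit.
        """
        needle = term.strip().lower()
        hits = set()
        for tag, images in self._images_by_tag.items():
            if needle in tag:
                hits |= images
        return sorted(hits)


def merged_dictionary(metadata_terms, stored_tags):
    """Combine metadata-derived terms with previously stored tags (claim 9)."""
    normalize = lambda terms: {t.strip().lower() for t in terms}
    return normalize(metadata_terms) | normalize(stored_tags)
```

A caller might build the dictionary with `merged_dictionary(terms_from_metadata, index.stored_tags())` before running speech-to-text, then call `index.add(...)` once a tag has been chosen.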

13. A computing device comprising:
- a processor;
  
  a storage medium for tangibly storing thereon program logic for execution by the processor, the program logic comprising:
  
  receiving logic executed by the processor for receiving an image file captured by a client device, the image file comprising an associated audio component, the audio component spoken by a user of the client device as a tag of the image file;
  
  metadata determining logic executed by the processor for determining metadata associated with the image file;
  
  identifying logic executed by the processor for identifying a dictionary of potential textual tags from the metadata;
  
  tag determining logic executed by the processor for determining a textual tag from the audio component using the dictionary of potential textual tags in conjunction with speech-to-text technology; and
  
  associating logic executed by the processor for associating the textual tag with the image file as additional metadata.
- Dependent claims (14, 15, 16, 17, 18, 19, 20, 21, 22):
  - 14. The computing device of claim 13, further comprising communicating logic executed by the processor for communicating, to a content server, the textual tag with the image file.
  - 15. The computing device of claim 13, further comprising storing logic executed by the processor for storing the image file and the textual tag.
  - 16. The computing device of claim 13, further comprising enabling logic executed by the processor for enabling the user to perform operations related to the image file using the textual tag.
  - 17. The computing device of claim 13, further comprising communicating logic executed by the processor for communicating the textual tag and the image file to the client device for display.
  - 18. The computing device of claim 16, wherein the enabling logic further comprises sharing logic executed by the processor for enabling sharing of the image file with other users.
  - 19. The computing device of claim 16, wherein the enabling logic further comprises:
    - receiving logic executed by the processor for receiving a search term from the client device;
    - searching logic executed by the processor for searching the textual tags for the search term; and
    - communicating logic executed by the processor for communicating the image file associated with the textual tag to the client device.
  - 20. The computing device of claim 13, wherein the identifying logic further comprises receiving logic executed by the processor for receiving, from a content server, a plurality of previously stored tags.
  - 21. The computing device of claim 20, wherein the identifying logic further comprises determining logic executed by the processor for determining the dictionary of potential textual tags from the metadata and from the plurality of previously stored tags.
  - 22. The computing device of claim 13, further comprising communicating, by the processor, an advertisement based on the textual tag.

23. A non-transitory computer readable storage medium tangibly storing computer program instructions capable of being executed by a computer processor, the computer program instructions defining the steps of:
- receiving, by the computer processor, an image file captured by a client device, the image file comprising an associated audio component, the audio component spoken by a user of the client device as a tag of the image file;
  
  determining, by the computer processor, metadata associated with the image file;
  
  identifying, by the computer processor, a dictionary of potential textual tags from the metadata;
  
  determining, by the computer processor, a textual tag from the audio component using the dictionary of potential textual tags in conjunction with speech-to-text technology; and
  
  associating, by the computer processor, the textual tag with the image file as additional metadata.

Current Assignee: R2 Solutions LLC (Acacia Research Corporation)
Original Assignee: Yahoo! Inc. (Apollo Global Management, Inc.)
Inventors: Somekh, Oren; Golbandi, Nadav; Katzir, Liran; Lempel, Ronny; Maarek, Yoelle
Primary Examiner(s): Neway, Samuel G
Application Number: US13/485,159
Publication Number: US 20130325462A1
Time in Patent Office: 761 Days
Field of Search: 704/231-257, 348/231.1-231.9
US Class Current: 704/230
CPC Class Codes:
G06F 16/58   Retrieval characterised by ...
G06F 16/587   using geographical or spati...
G10L 15/26   Speech to text systems G10L...