Audio tagging

US 9,304,657 B2
Filed: 06/23/2014
Issued: 04/05/2016
Est. Priority Date: 12/31/2013
Status: Active Grant

Abstract

Various embodiments are provided for enabling audio tagging of image files. The audio messages are obtained by the system, usually by recording an audio message from a user, and then converted into textual tags using speech recognition technology. In some implementations, semantic analysis of the text component of these messages is performed. In some implementations, the textual tags are then propagated to other image files associated with the user.

27 Claims

1. A method implemented by a data processing apparatus, the method comprising:
- obtaining, at one or more processors, an audio message associated with one or more image files, wherein the obtaining comprises:
  - detecting that a first image file is being displayed on a device associated with a user,
  - determining a first period of time when the first image file is displayed on the device associated with the user, and
  - time stamping the obtained audio message;
- processing, at the one or more processors, the audio message using speech recognition technology to detect a text component of the audio message;
- determining, at the one or more processors, one or more textual tags for the one or more image files based on the detected text component, wherein the determining comprises determining a first portion of the detected text component corresponding to the first period of time using the time stamps of the obtained audio message and identifying a first set of the one or more textual tags that were determined based on the first portion of the detected text component; and
- assigning, at the one or more processors, the one or more textual tags to the one or more image files, wherein the assigning comprises assigning one or more of the textual tags from the first set of the one or more textual tags to the first image file.
- Dependent claims (2, 3, 4, 5, 6, 7, 8, 9):
  - 2. The method of claim 1, wherein the determining of the one or more textual tags comprises performing semantic analysis of the text component.
  - 3. The method of claim 2, wherein the semantic analysis of the text component comprises identifying one or more semantic classes for one or more portions of the detected text component;
    - performing semantic clustering of the portions of the detected text components; and
    - wherein the determining one or more textual tags for the one or more image files is at least partially based on the semantic clustering of the portions of the detected text.
  - 4. The method of claim 1, wherein the one or more image files are from a plurality of image files associated with a user, the method further comprising:
    - assigning the one or more textual tags to a second image file from the plurality of image files associated with the user based on a comparison of one or more properties of the one or more image files and the second image file.
  - 5. The method of claim 4, wherein the one or more properties of the one or more image files and the second image file are selected from the following group:
    - file name, file location, file metadata, file creation date, file size, geographical location of a place where the image was captured, and file image analysis results.
  - 6. The method of claim 1, wherein the one or more image files are digital photographs.
  - 7. The method of claim 1, wherein the one or more image files are digital video files.
  - 8. The method of claim 1, wherein the determining the one or more textual tags comprises selecting the one or more textual tags from a tag library.
  - 9. The method of claim 1, wherein the assigning the one or more textual tags to the one or more image files comprises assigning the one or more textual tags to a portion of an image or group of images in the one or more image files.
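
The time-window step recited in claim 1 can be illustrated with a minimal sketch. This is not the patented implementation: it assumes a speech recognizer has already produced per-word time stamps, and the names `Word`, `DisplayPeriod`, `tags_for_period`, and `assign_tags`, the stopword list, and the word-midpoint rule are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    start: float  # seconds from the start of the recording (assumed recognizer output)
    end: float


@dataclass
class DisplayPeriod:
    image_id: str
    start: float  # when the image began to be displayed
    end: float    # when the image stopped being displayed


# Hypothetical stopword list; a real system would use a proper tag-determination step.
STOPWORDS = frozenset({"the", "a", "at", "this", "is", "in", "of"})


def words_in_period(words, period):
    """Return the words whose midpoint falls inside the display period."""
    return [w for w in words if period.start <= (w.start + w.end) / 2 < period.end]


def tags_for_period(words, period):
    """Derive candidate tags from the speech recorded while the image was shown."""
    tags = []
    for w in words_in_period(words, period):
        token = w.text.lower().strip(".,!?")
        if token and token not in STOPWORDS and token not in tags:
            tags.append(token)
    return tags


def assign_tags(words, periods):
    """Map each displayed image to the tags derived from its own time window."""
    assignments = {}
    for p in periods:
        bucket = assignments.setdefault(p.image_id, [])
        for t in tags_for_period(words, p):
            if t not in bucket:
                bucket.append(t)
    return assignments


words = [Word("This", 0.0, 0.3), Word("is", 0.3, 0.5),
         Word("Paris", 0.5, 1.0), Word("sunset", 4.2, 4.8)]
periods = [DisplayPeriod("img1", 0.0, 3.0), DisplayPeriod("img2", 3.0, 6.0)]
print(assign_tags(words, periods))  # {'img1': ['paris'], 'img2': ['sunset']}
```

Using the word midpoint is only one possible way to decide which display period a word belongs to; the claim itself requires only that the text portion correspond to the period during which the image was displayed.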

10. A system comprising:
- a machine-readable storage device having instructions stored thereon; and
- a data processing apparatus in communication with the machine-readable storage device and operable to execute the instructions to perform operations comprising:
  - obtaining an audio message associated with one or more image files, wherein the obtaining comprises detecting that a first image file is being displayed on a device associated with a user, determining a first period of time when the first image file is displayed on the device associated with the user, and time stamping the obtained audio message;
  - processing the audio message using speech recognition technology to detect a text component of the audio message;
  - determining one or more textual tags for the one or more image files based on the detected text component, wherein the determining comprises determining a first portion of the detected text component corresponding to the first period of time using the time stamps of the obtained audio message and identifying a first set of the one or more textual tags that were determined based on the first portion of the detected text component; and
  - assigning the one or more textual tags to the one or more image files, wherein the assigning comprises assigning one or more of the textual tags from the first set of the one or more textual tags to the first image file.
- Dependent claims (11, 12, 13, 14, 15, 16, 17, 18):
  - 11. The system of claim 10, wherein the determining of the one or more textual tags comprises performing semantic analysis of the text component.
  - 12. The system of claim 11, wherein the semantic analysis of the text component comprises identifying one or more semantic classes for one or more portions of the detected text component;
    - performing semantic clustering of the portions of the detected text components; and
    - wherein the determining one or more textual tags for the one or more image files is at least partially based on the semantic clustering of the portions of the detected text.
  - 13. The system of claim 10, wherein the one or more image files are from a plurality of image files associated with a user, the method further comprising:
    - assigning the one or more textual tags to a second image file from the plurality of image files associated with the user based on a comparison of one or more properties of the one or more image files and the second image file.
  - 14. The system of claim 13, wherein the one or more properties of the one or more image files and the second image file are selected from the following group:
    - file name, file location, file metadata, file creation date, file size, geographical location of a place where the image was captured, and file image analysis results.
  - 15. The system of claim 10, wherein the one or more image files are digital photographs.
  - 16. The system of claim 10, wherein the one or more image files are digital video files.
  - 17. The system of claim 10, wherein the determining the one or more textual tags comprises selecting the one or more textual tags from a tag library.
  - 18. The system of claim 10, wherein the assigning the one or more textual tags to the one or more image files comprises assigning the one or more textual tags to a portion of an image or group of images in the one or more image files.
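
The tag propagation recited in claims 13 and 14 (comparing properties of a tagged image with those of a second image) might be sketched as follows. This is an illustrative assumption, not the claimed system: it compares only two of the listed properties (file creation date and capture location), and the `ImageFile` type, the `is_similar` rule, and the one-hour and one-kilometer thresholds are hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from math import asin, cos, radians, sin, sqrt


@dataclass
class ImageFile:
    name: str
    created: datetime
    lat: float  # capture latitude in degrees
    lon: float  # capture longitude in degrees
    tags: set = field(default_factory=set)


def distance_km(a, b):
    """Great-circle distance between two capture locations (haversine formula)."""
    dlat = radians(b.lat - a.lat)
    dlon = radians(b.lon - a.lon)
    h = sin(dlat / 2) ** 2 + cos(radians(a.lat)) * cos(radians(b.lat)) * sin(dlon / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(h))


def is_similar(a, b, max_gap=timedelta(hours=1), max_km=1.0):
    """Hypothetical rule: images taken close together in both time and place."""
    return abs(a.created - b.created) <= max_gap and distance_km(a, b) <= max_km


def propagate_tags(tagged, candidates):
    """Copy the tags of a tagged image onto every sufficiently similar candidate."""
    updated = []
    for c in candidates:
        if is_similar(tagged, c):
            c.tags |= tagged.tags
            updated.append(c.name)
    return updated


source = ImageFile("a.jpg", datetime(2013, 12, 31, 10, 0), 48.8584, 2.2945, {"eiffel"})
nearby = ImageFile("b.jpg", datetime(2013, 12, 31, 10, 20), 48.8590, 2.2950)
far = ImageFile("c.jpg", datetime(2013, 12, 31, 10, 20), 51.5007, -0.1246)
print(propagate_tags(source, [nearby, far]))  # ['b.jpg']
```

Other properties from claim 14, such as file name, file location, or image analysis results, could feed the same comparison; the thresholds here are placeholders rather than values taken from the patent.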

19. A storage device having instructions stored thereon that, when executed by a data processing apparatus, cause the data processing apparatus to perform operations comprising:
- obtaining an audio message associated with one or more image files, wherein the obtaining comprises detecting that a first image file is being displayed on a device associated with a user, determining a first period of time when the first image file is displayed on the device associated with the user, and time stamping the obtained audio message;
- processing the audio message using speech recognition technology to detect a text component of the audio message;
- determining one or more textual tags for the one or more image files based on the detected text component, wherein the determining comprises determining a first portion of the detected text component corresponding to the first period of time using the time stamps of the obtained audio message and identifying a first set of the one or more textual tags that were determined based on the first portion of the detected text component; and
- assigning the one or more textual tags to the one or more image files, wherein the assigning comprises assigning one or more of the textual tags from the first set of the one or more textual tags to the first image file.
- Dependent claims (20, 21, 22, 23, 24, 25, 26, 27):
  - 20. The storage device of claim 19, wherein the determining of the one or more textual tags comprises performing semantic analysis of the text component.
  - 21. The storage device of claim 20, wherein the semantic analysis of the text component comprises identifying one or more semantic classes for one or more portions of the detected text component;
    - performing semantic clustering of the portions of the detected text components; and
    - wherein the determining one or more textual tags for the one or more image files is at least partially based on the semantic clustering of the portions of the detected text.
  - 22. The storage device of claim 19, wherein the one or more image files are from a plurality of image files associated with a user, the method further comprising:
    - assigning the one or more textual tags to a second image file from the plurality of image files associated with the user based on a comparison of one or more properties of the one or more image files and the second image file.
  - 23. The storage device of claim 22, wherein the one or more properties of the one or more image files and the second image file are selected from the following group:
    - file name, file location, file metadata, file creation date, file size, geographical location of a place where the image was captured, and file image analysis results.
  - 24. The storage device of claim 19, wherein the one or more image files are digital photographs.
  - 25. The storage device of claim 19, wherein the one or more image files are digital video files.
  - 26. The storage device of claim 19, wherein the determining the one or more textual tags comprises selecting the one or more textual tags from a tag library.
  - 27. The storage device of claim 19, wherein the assigning the one or more textual tags to the one or more image files comprises assigning the one or more textual tags to a portion of an image or group of images in the one or more image files.
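
Selecting tags from a tag library, as in claims 8, 17, and 26, could look roughly like the sketch below. The patent does not specify a matching rule in the claims; the exact phrase matching, the function name `select_from_library`, and the sample library are assumptions made purely for illustration.

```python
def select_from_library(text, library):
    """Return the library tags whose words appear, adjacent and in order, in the text."""
    # Normalize the recognized text into lowercase tokens without edge punctuation.
    tokens = [t.strip(".,!?").lower() for t in text.split()]
    tokens = [t for t in tokens if t]
    selected = []
    for tag in library:
        parts = tag.lower().split()
        n = len(parts)
        # A tag matches when its word sequence occurs somewhere in the token list.
        if parts and any(tokens[i:i + n] == parts for i in range(len(tokens) - n + 1)):
            selected.append(tag)
    return selected


library = ["Eiffel Tower", "Paris", "beach", "sunset"]
text = "We watched the sunset at the Eiffel Tower, in Paris."
print(select_from_library(text, library))  # ['Eiffel Tower', 'Paris', 'sunset']
```

Restricting output to a fixed library keeps tags consistent across images; a system that also performs semantic analysis, as in the dependent claims on semantic classes, would presumably match on meaning rather than on exact wording.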

Current Assignee: Visier Solutions Inc. (Visier, Inc.)
Original Assignee: ABBYY Development LLC
Inventors: Yan, David; Anisimovich, Konstantin
Primary Examiner(s): VO, HUYEN X

Application Number: US14/311,851
Publication Number: US 20150187353A1
Time in Patent Office: 652 Days
Field of Search: 704/1-10, 704/251, 704/255, 704/257, 704/235, 704/270, 704/270.1
US Class Current: 1/1
CPC Class Codes

G06F 16/14   Details of searching files ...

G06F 16/156   Query results presentation

G06F 16/16   File or folder operations, ...

G06F 16/168   Details of user interfaces ...

G06F 16/2246   Trees, e.g. B+trees

G06F 16/24578   using ranking

G06F 16/248   Presentation of query results

G06F 16/26   Visual data mining; Browsin...

G06F 16/285   Clustering or classification

G06F 16/40   of multimedia data, e.g. sl...

G06F 16/483   using metadata automaticall...

G06F 16/5846   using extracted text

G06F 16/5866   using information manually ...

G06F 16/907   Retrieval characterised by ...

G06F 16/9535   Search customisation based ...

G06F 3/04817   using icons graphical or vi...

G06F 3/04842   Selection of displayed obje...

G06F 40/103   Formatting, i.e. changing o...

G06F 40/12   Use of codes for handling t...

G06Q 10/107   Computer-aided management o...

G10L 15/1815   Semantic context, e.g. disa...
