Using speech to text functionality to create specific user generated content metadata for digital content files (eg images) during capture, review, and/or playback process

US 20070250526A1
Filed: 04/24/2006
Published: 10/25/2007
Est. Priority Date: 04/24/2006
Status: Abandoned Application

Abstract

A method for adding user-defined metadata to digital files (e.g., images, video, music) is disclosed. The user-defined metadata is entered through speech-to-text conversion: the user speaks a description of the content, which is then included as metadata with the intended digital file. The metadata is added to the appropriate metadata field(s) of the intended digital content file(s). Metadata can be added and edited before, during, or after the digital content capture and/or during the content review process. This functionality gives users a quick, intuitive, and user-friendly way to add specific self-generated metadata to digital content files (e.g., digital images). Results include more efficient and enhanced sorting, storing, and searching of digital content, as well as the ability to attach notes that better describe an image, akin to writing on the back of printed photos.
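The flow in the abstract (spoken description, converted to text, stored in a metadata field of the intended file) can be sketched roughly as follows. This is a minimal illustration, not the applicant's implementation: `ContentFile`, `annotate`, and `fake_transcribe` are hypothetical names, and the speech-to-text engine is replaced by a stand-in function so the example runs without any audio hardware or recognition library.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class ContentFile:
    """A captured digital file plus its user-defined metadata fields (hypothetical type)."""
    name: str
    metadata: Dict[str, str] = field(default_factory=dict)


def annotate(content: ContentFile,
             audio: bytes,
             transcribe: Callable[[bytes], str],
             field_name: str = "description") -> ContentFile:
    """Convert spoken audio to text and store it in one metadata field."""
    text = transcribe(audio).strip()
    if text:  # skip empty transcripts rather than overwrite existing notes
        content.metadata[field_name] = text
    return content


def fake_transcribe(audio: bytes) -> str:
    """Stand-in for a real speech-to-text engine: just decodes the bytes as text."""
    return audio.decode("utf-8")


photo = annotate(ContentFile("IMG_0001.JPG"), b"Sunset at the lake ", fake_transcribe)
print(photo.metadata)  # {'description': 'Sunset at the lake'}
```

Passing the engine in as a function keeps the sketch independent of any particular recognizer; the same call works whether the annotation happens at capture time or later during review.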

14 Claims

1. A solution that allows for the capture of content metadata that comprises:
- a digital capture device that is capable of capturing and/or storing one or more forms of digital content;
- a speech-to-text engine integrated within the digital capture device that converts the user's spoken word to text;
- a storage mechanism for the created content metadata text, where the text is stored and added to the intended content file(s) during, after, and/or before the content capture process.
- Dependent claims (2, 5, 6, 7, 8, 11):
  - 2. The system defined in claim 1, additionally comprising the ability for the user to purposely create a description and/or keywords to describe digital content, outside the process of capturing said content, where the content is purposefully created to function as content metadata for the chosen content file(s)
  - 5. The system defined in claim 1, additionally comprising a user interface on the image capture device which facilitates the administration and selection of preferences and settings for the user to add and edit the metadata
    - i. Wherein the interface to add metadata is integrated into the overall function and control of the device
    - ii. Wherein the user can add metadata to images before, during, and after the time of capture
    - iii. Wherein the user can add metadata to images (or other content) while reviewing them on the device display
    - iv. Wherein the ability to capture metadata can be turned on, off, or edited at any time
    - v. Wherein the user can add different levels of metadata to single images and to groups of images, e.g., an overall metadata tag is selected to be added to a group of images and, in addition, the user can add further metadata to each image individually
  - 6. The system defined in claim 1, additionally comprising a microphone on the device to capture and record the audio track containing the user's spoken word
    - i. Wherein the microphone captures the spoken word and, via analog-to-digital conversion, relays it to the speech-to-text engine, where the conversion of the voice track to text format occurs
    - ii. Wherein the audio track captured by the microphone will be converted to digital via an analog-to-digital converter and/or software running on the device
    - iii. Wherein the content metadata in text form is added to the intended digital file(s) as content metadata
  - 7. The system defined in claim 1, additionally comprising a method for the user to review and edit the metadata that has been associated with each image
    - i. Wherein the user can view the keywords on the device's display and/or listen to the desired keywords via text-to-speech or via some other mechanism
  - 8. The system defined in claim 1, additionally comprising a method for the user to approve the metadata created
  - 11. The system defined in claim 1, additionally comprising a user interface for digital devices (e.g., camera display) which allows the user to administer and control the speech-to-text functionality, to add, edit, and delete metadata to images, or groups of images, as desired.
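The layered tagging described in claim 5(v), where an overall tag applies to a group of images and each image can carry its own additional tags, might be modelled as below. This is only a sketch under assumptions: `effective_tags` and the sample file names and tags are hypothetical, and the application does not specify how group and per-image metadata are combined.

```python
from typing import Dict, List


def effective_tags(group_tags: List[str],
                   image_tags: Dict[str, List[str]],
                   image: str) -> List[str]:
    """Return group-level tags followed by image-specific tags, without duplicates."""
    merged: List[str] = []
    for tag in group_tags + image_tags.get(image, []):
        if tag not in merged:
            merged.append(tag)
    return merged


# Hypothetical data: one tag spoken once for the whole group,
# plus extra tags spoken while reviewing individual images.
group = ["Hawaii 2006"]
per_image = {
    "IMG_0001.JPG": ["beach", "sunset"],
    "IMG_0002.JPG": ["Hawaii 2006", "volcano"],
}

print(effective_tags(group, per_image, "IMG_0001.JPG"))  # ['Hawaii 2006', 'beach', 'sunset']
print(effective_tags(group, per_image, "IMG_0003.JPG"))  # ['Hawaii 2006']
```

Keeping the group tag separate from the per-image tags means the group tag can later be edited or removed in one place, which fits the claim's requirement that metadata can be edited at any time.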

3. Wherein the intent to generate the metadata is a descriptive interpretation of the content that is captured or will be captured and in the user's desired words

4. Wherein the content metadata is captured using a speech-to-text engine to convert the user's spoken word to text (e.g., ASCII), and wherein the generated content metadata, once converted to text (e.g., ASCII), is added to the appropriate metadata fields of the image file per the Exif 2.2 specification and/or other standard or non-standard implementations.
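As an illustration of what "added to the appropriate metadata fields ... per the Exif 2.2 specification" could involve, the sketch below prepares transcribed text for two kinds of Exif text field. The tag numbers and the 8-byte `ASCII` character-code prefix of `UserComment` come from the Exif specification; the helper names and the choice to replace non-ASCII characters with `?` are assumptions made for this example. The sketch only builds the field payloads and does not write them into a JPEG file.

```python
# Exif/TIFF tag numbers for fields that hold free-form text.
EXIF_TAGS = {
    "ImageDescription": 0x010E,
    "Artist": 0x013B,
    "UserComment": 0x9286,
}

# UserComment starts with an 8-byte character-code identifier.
ASCII_CHARACTER_CODE = b"ASCII\x00\x00\x00"


def encode_ascii_field(text: str) -> bytes:
    """Encode text for an ASCII-type field such as ImageDescription (NUL-terminated)."""
    # Assumption for this sketch: characters outside ASCII become '?'.
    return text.encode("ascii", errors="replace") + b"\x00"


def encode_user_comment(text: str) -> bytes:
    """Encode text for UserComment: character-code prefix, then the text bytes."""
    return ASCII_CHARACTER_CODE + text.encode("ascii", errors="replace")


transcript = "Sunset at the lake"
fields = {
    EXIF_TAGS["ImageDescription"]: encode_ascii_field(transcript),
    EXIF_TAGS["UserComment"]: encode_user_comment(transcript),
}
for tag, payload in fields.items():
    print(hex(tag), payload)
```

A real device or application would hand these payloads to whatever Exif writer it uses; that step depends on the library or firmware and is left out here.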

9. The adding of the captured metadata to the image file, once the metadata has been converted to text (ASCII or other)
- i. Wherein the metadata is added per one of the following methods:
  - 1. The Exif (Exchangeable Image File Format) specifications from JEITA (Japan Electronics and Information Technology Industries Association)
  - 2. DIG35 specification from the Digital Imaging Group
  - 3. FlashPix of I3A (International Imaging Industry Association)
  - 4. Any proprietary or non-standard means developed by a computer software company or individual
  - 5. Any proprietary or non-standard means implemented by manufacturers of digital image capture devices.

10. The user will have the option, through the previously described user interface, to add metadata to different categories per the above-mentioned methods
- i. Wherein the user can choose the title of the image
- ii. Wherein the user can add an image description
- iii. Wherein the user can add the author of the image
- iv. Wherein the user can add metadata to any number of metadata fields that are in the spirit of content metadata.

12. A software application on a personal computer that utilizes speech-to-text functionality, which takes the user's spoken words and through the speech-to-text engine outputs text (e.g., ASCII), then through an interface(s) with the desired image file(s) adds the content metadata desired
- i. Wherein the speech-to-text functionality is integrated into a software application, a web-based application, or simply through a direct viewing of the image file through an image browsing application
- ii. Wherein the content fields where metadata is added are the content fields that relate to image description, user comments, title, author, artist, and the like.
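For the personal-computer application of claim 12, one simple way to attach transcribed text to an image without touching the image bytes is a sidecar file next to it. The sketch below is an assumption of this kind (a "non-standard means" in the sense of claim 9), not something the application describes: `write_sidecar`, `ALLOWED_FIELDS`, and the `.json` naming scheme are all hypothetical.

```python
import json
import tempfile
from pathlib import Path
from typing import Dict

# Field categories named in claims 10 and 12 (title, description, author, comments).
ALLOWED_FIELDS = {"title", "description", "author", "comments"}


def write_sidecar(image_path: Path, fields: Dict[str, str]) -> Path:
    """Merge the given fields into '<image name>.json' beside the image and return its path."""
    unknown = set(fields) - ALLOWED_FIELDS
    if unknown:
        raise ValueError(f"unsupported metadata fields: {sorted(unknown)}")
    sidecar = image_path.with_name(image_path.name + ".json")
    existing = json.loads(sidecar.read_text(encoding="utf-8")) if sidecar.exists() else {}
    existing.update(fields)  # later edits overwrite earlier values for the same field
    sidecar.write_text(json.dumps(existing, indent=2, sort_keys=True), encoding="utf-8")
    return sidecar


with tempfile.TemporaryDirectory() as tmp:
    image = Path(tmp) / "IMG_0001.JPG"
    image.write_bytes(b"")  # empty placeholder standing in for a real image
    write_sidecar(image, {"title": "Sunset"})
    path = write_sidecar(image, {"author": "Alex"})
    saved = json.loads(path.read_text(encoding="utf-8"))
    print(saved)
```

Because each call merges into the existing sidecar, the title can be spoken at capture time and the author added later during review, matching the add-and-edit workflow in the claims.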

13. The ability to add user-generated metadata via the speech-to-text functionality relates to all digital content, including images (JPEG, TIFF, etc.), video clips (MPEG4, H.263, H.264, AVI, QuickTime, Windows Media, etc.), music files (AAC, eAAC+, MP3, Windows Media, etc.), and the like.

14. The ability to add user generated metadata via the speech to text functionality relates to all digital devices, including music players, video recorders, digital cameras, personal computers, DVD players, image viewers, and the like.

Current Assignee: Michael Hanna
Original Assignee: Michael Hanna
Inventors: Hanna, Michael
Application Number: US11/379,995
Publication Number: US 20070250526A1
Field of Search
US Class Current: 1/1
CPC Class Codes:
- G06F 16/48   Retrieval characterised by ...
- G06F 16/58   Retrieval characterised by ...
- G06F 16/68   Retrieval characterised by ...
- G06F 16/78   Retrieval characterised by ...
