Using speech to text functionality to create specific user generated content metadata for digital content files (eg images) during capture, review, and/or playback process
Abstract
A method for adding user-defined metadata to digital files (e.g., images, video, music) is disclosed. The user speaks a description of the content; speech to text conversion technology turns the speech into text, which is then written to the appropriate metadata field(s) of the intended digital content file(s). Metadata can be added and edited before, during, or after digital content capture, and/or during the content review process. This gives users a quick, intuitive way to add their own metadata to digital content files (e.g., digital images). Results include more efficient sorting, storing, and searching of digital content, as well as the ability to attach notes that better describe an image, akin to writing on the back of printed photos.
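The flow described in the abstract can be sketched roughly as follows. This is an illustration under stated assumptions, not the disclosed implementation: `ContentFile`, `attach_spoken_metadata`, and `fake_engine` are hypothetical names, and the speech to text engine is stubbed out with a fixed string.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class ContentFile:
    """Hypothetical stand-in for a captured digital file and its metadata."""
    name: str
    metadata: Dict[str, str] = field(default_factory=dict)


def attach_spoken_metadata(
    content: ContentFile,
    audio: bytes,
    transcribe: Callable[[bytes], str],
    field_name: str = "description",
) -> ContentFile:
    """Convert spoken audio to text and store it in the chosen metadata field."""
    text = transcribe(audio).strip()
    if text:  # skip empty transcriptions rather than overwrite an existing field
        content.metadata[field_name] = text
    return content


def fake_engine(audio: bytes) -> str:
    # Assumption: a real engine would decode the audio; this stub returns fixed text.
    return " Sunset over the harbor "


photo = attach_spoken_metadata(ContentFile("IMG_0001.jpg"), b"\x00\x01", fake_engine)
print(photo.metadata)
```

Passing the engine in as a callable keeps the sketch independent of any particular speech recognition product, which the abstract does not name.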
14 Claims
-
1. A solution that allows for the capture of content metadata that comprises:
a digital capture device that is capable of capturing and/or storing one or more forms of digital content;
a speech to text engine integrated within the digital capture device that converts the user's spoken word to text; and
a storage mechanism for the created content metadata text, where the text is stored and added to the intended content file(s) during, after, and/or before the content capture process.
-
3. Wherein the intent to generate the metadata is a descriptive interpretation of the content that is captured or will be captured and in the user's desired words
-
4. Wherein the content metadata is captured using a speech to text engine to convert the user's spoken word to text (e.g., ASCII)
Wherein the generated content metadata that is ultimately converted to text (e.g., ASCII) is added to the appropriate metadata fields of the image file per the Exif 2.2 specification and/or other standard or non-standard implementations.
-
9. The adding of the captured metadata to the image file, once the metadata has been converted to text (ASCII or other)
i. Wherein the metadata is added per one of the following methods:
1. The Exif (Exchangeable Image file format) specifications from JEITA (Japan Electronics and Information Technology Industries Association)
2. Dig35 specification from the Digital Imaging Group
3. Flashpix of I3A (International Imaging Industry Association)
4. Any proprietary or non-standard means developed by a computer software company or individual
5. Any proprietary or non-standard means implemented by manufacturers of Digital Image capture devices.
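For the Exif method listed above, the converted text has to be turned into the byte values that Exif text fields expect. The sketch below prepares those values only; it does not write them into an image file, which would need an Exif-capable library. The tag numbers and the 8-byte `ASCII` character code for UserComment follow the Exif 2.2 tag tables; the function names are hypothetical.

```python
from typing import Dict

# Tag numbers from the Exif 2.2 / TIFF tag tables.
IMAGE_DESCRIPTION = 0x010E  # type ASCII, NUL-terminated
ARTIST = 0x013B             # type ASCII, NUL-terminated
USER_COMMENT = 0x9286       # type UNDEFINED, 8-byte character code then text


def encode_ascii_value(text: str) -> bytes:
    """Encode text for an Exif ASCII field: 7-bit ASCII plus a terminating NUL."""
    # Characters outside ASCII are replaced with '?' in this sketch.
    return text.encode("ascii", errors="replace") + b"\x00"


def encode_user_comment(text: str) -> bytes:
    """Encode text for UserComment using the 8-byte 'ASCII' character code."""
    return b"ASCII\x00\x00\x00" + text.encode("ascii", errors="replace")


def build_exif_values(description: str, artist: str, comment: str) -> Dict[int, bytes]:
    """Map tag numbers to encoded values, ready to hand to an Exif writer."""
    return {
        IMAGE_DESCRIPTION: encode_ascii_value(description),
        ARTIST: encode_ascii_value(artist),
        USER_COMMENT: encode_user_comment(comment),
    }


values = build_exif_values("Harbor at dusk", "A. User", "Taken from the pier")
print(values[USER_COMMENT])
```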
-
10. The user will have the option through the previously described user interface to add metadata to different categories per the above mentioned methods
i. Wherein the user can choose the title of the image
ii. Wherein the user can add an image description
iii. Wherein the user can add the author of the image
iv. Wherein the user can add metadata to any number of metadata fields that are in the spirit of content metadata.
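The category choice in claim 10 could be handled with a small lookup from the user-facing category to a metadata field. In the sketch below, the mapping of title, description, and author onto the Exif tag names `ImageDescription`, `UserComment`, and `Artist` is an assumption made for illustration; the claim does not fix it, and `route_metadata` is a hypothetical helper.

```python
from typing import Dict

# Assumed mapping from user-facing categories to Exif tag names.
CATEGORY_TO_TAG: Dict[str, str] = {
    "title": "ImageDescription",
    "description": "UserComment",
    "author": "Artist",
}


def route_metadata(category: str, text: str, record: Dict[str, str]) -> Dict[str, str]:
    """Store transcribed text under the tag that matches the chosen category."""
    tag = CATEGORY_TO_TAG.get(category.strip().lower())
    if tag is None:
        raise ValueError(f"unknown metadata category: {category!r}")
    record[tag] = text
    return record


record: Dict[str, str] = {}
route_metadata("Title", "Harbor at dusk", record)
route_metadata("author", "A. User", record)
print(record)
```

Further categories, as item iv allows, would be added by extending `CATEGORY_TO_TAG` rather than changing the routing function.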
-
12. A software application on a personal computer that utilizes speech to text functionality, which takes the user's spoken words and through the speech to text engine outputs text (e.g., ASCII), then through an interface(s) with the desired image file(s) adds the content metadata desired
i. Wherein the speech to text functionality is integrated into a software application, a web based application, or simply through a direct viewing of the image file through an image browsing application
ii. Wherein the content fields where metadata is added are the content fields that relate to image description, user comments, title, author, artist, and the like.
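On a personal computer, one simple way to attach transcribed text to an image file is a JSON sidecar stored next to it. This is only an illustration of a non-standard storage approach, not the interface the claim describes; `write_sidecar` and the `.json` naming scheme are assumptions, and the image here is an empty placeholder file.

```python
import json
import tempfile
from pathlib import Path
from typing import Dict


def write_sidecar(image_path: Path, fields: Dict[str, str]) -> Path:
    """Merge metadata fields into a JSON sidecar stored next to the image."""
    sidecar = image_path.with_name(image_path.name + ".json")
    existing: Dict[str, str] = {}
    if sidecar.exists():
        existing = json.loads(sidecar.read_text(encoding="utf-8"))
    existing.update(fields)  # later entries overwrite earlier ones per field
    sidecar.write_text(json.dumps(existing, indent=2), encoding="utf-8")
    return sidecar


with tempfile.TemporaryDirectory() as tmp:
    image = Path(tmp) / "IMG_0001.jpg"
    image.write_bytes(b"")  # placeholder standing in for a real image
    sidecar = write_sidecar(image, {"title": "Harbor at dusk"})
    write_sidecar(image, {"author": "A. User"})
    saved = json.loads(sidecar.read_text(encoding="utf-8"))

print(saved)
```

Merging into any existing sidecar lets the user add or edit fields over several review sessions without losing earlier entries.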
-
13. The ability to add user generated metadata via the speech to text functionality relates to all digital content, including images (JPEG, TIFF, etc.), video clips (MPEG4, H.263, H.264, AVI, QuickTime, Windows Media, etc.), music files (AAC, eAAC+, MP3, Windows Media, etc.) and the like.
-
14. The ability to add user generated metadata via the speech to text functionality relates to all digital devices, including music players, video recorders, digital cameras, personal computers, DVD players, image viewers, and the like.
Specification