METHOD AND APPARATUS FOR ANNOTATING VIDEO CONTENT WITH METADATA GENERATED USING SPEECH RECOGNITION TECHNOLOGY
Abstract
A method and apparatus are provided for annotating video content with metadata generated using speech recognition technology. The method begins by rendering video content on a display device. A segment of speech is received from a user such that the speech segment annotates a portion of the video content currently being rendered. The speech segment is converted to a text segment, and the text segment is associated with the rendered portion of the video content. The text segment is stored in a selectively retrievable manner so that it remains associated with the rendered portion of the video content.
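One way to picture "stored in a selectively retrievable manner" is a small store that keeps each text segment together with the time range of the video portion it annotates. The sketch below is illustrative only and is not taken from the patent: the names `Annotation`, `AnnotationStore`, `at`, and `search` are hypothetical, and representing a rendered portion as a start and end time in seconds is an assumption.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Annotation:
    """A text segment tied to the video portion that was on screen (assumed: seconds)."""
    start_s: float
    end_s: float
    text: str


@dataclass
class AnnotationStore:
    """Hypothetical in-memory store; a real system might persist this alongside the video."""
    annotations: List[Annotation] = field(default_factory=list)

    def add(self, start_s: float, end_s: float, text: str) -> None:
        # Associate the recognized text with the rendered portion.
        self.annotations.append(Annotation(start_s, end_s, text))

    def at(self, t: float) -> List[Annotation]:
        # Retrieve every annotation whose portion covers playback time t.
        return [a for a in self.annotations if a.start_s <= t < a.end_s]

    def search(self, keyword: str) -> List[Annotation]:
        # Retrieve annotations by a case-insensitive keyword match on the text.
        kw = keyword.lower()
        return [a for a in self.annotations if kw in a.text.lower()]
```

Looking up by playback time supports showing an annotation while its portion plays, while keyword search supports jumping to the annotated portion.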
31 Citations
21 Claims
1. (canceled)

2. A computer-implemented method comprising:
providing a particular portion of a video for output;
receiving an utterance while the particular portion of the video is being provided for output;
obtaining, from an automated speech recognizer, a transcription of the utterance;
generating video metadata based on the transcription; and
updating a header of a frame of data that corresponds to the particular portion of the video, to include the video metadata.

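The steps recited in claim 2 can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the patented implementation: `Frame`, `generate_metadata`, `annotate_frame`, and `fake_recognizer` are hypothetical names, the frame header is modeled as a plain dictionary, the keyword extraction is an arbitrary stand-in for "generating video metadata", and the recognizer is a stub where a real automated speech recognizer would be called.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Frame:
    """Hypothetical frame of video data with a header that can carry metadata."""
    index: int
    header: Dict[str, object] = field(default_factory=dict)
    payload: bytes = b""


def generate_metadata(transcription: str) -> Dict[str, object]:
    # Assumed metadata shape: the full transcription plus simple keywords
    # (words longer than three characters, lowercased and de-duplicated).
    words = [w.strip(".,!?").lower() for w in transcription.split()]
    keywords = sorted({w for w in words if len(w) > 3})
    return {"annotation": transcription, "keywords": keywords}


def annotate_frame(
    frames: List[Frame],
    current_index: int,
    utterance: bytes,
    recognizer: Callable[[bytes], str],
) -> Frame:
    # current_index identifies the frame being output when the utterance arrived.
    transcription = recognizer(utterance)        # obtain a transcription
    metadata = generate_metadata(transcription)  # generate video metadata
    frame = frames[current_index]
    frame.header.update(metadata)                # update the frame header
    return frame


def fake_recognizer(audio: bytes) -> str:
    # Stub standing in for an automated speech recognizer (assumption).
    return "Birthday party at the lake"
```

A call such as `annotate_frame(frames, 1, audio_bytes, fake_recognizer)` writes the metadata into the header of frame 1 only, leaving the other frames untouched.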
9. A system comprising:
one or more computers and one or more storage devices storing instructions that are operable, when executed by the one or more computers, to cause the one or more computers to perform operations comprising:
providing a particular portion of a video for output;
receiving an utterance while the particular portion of the video is being provided for output;
obtaining, from an automated speech recognizer, a transcription of the utterance;
generating video metadata based on the transcription; and
updating a header of a frame of data that corresponds to the particular portion of the video, to include the video metadata.
Dependent claims: 10, 11, 12, 13, 14, 15

16. A non-transitory computer-readable medium storing software comprising instructions executable by one or more computers which, upon such execution, cause the one or more computers to perform operations comprising:
providing a particular portion of a video for output;
receiving an utterance while the particular portion of the video is being provided for output;
obtaining, from an automated speech recognizer, a transcription of the utterance;
generating video metadata based on the transcription; and
updating a header of a frame of data that corresponds to the particular portion of the video, to include the video metadata.
Dependent claims: 17, 18, 19, 20, 21

Specification