Audio-based annotation of video
Abstract
A technique for determining annotation items associated with video information is described. During this annotation technique, a content item that includes audio information and the video information is received. For example, a file may be downloaded from a uniform resource locator. Then, the audio information is extracted from the content item, and the audio information is analyzed to determine features or descriptors that characterize the audio information. Note that the features may be determined solely by analyzing the audio information or may be determined by subsequent further analysis of at least some of the video information based on the analysis of the audio information (i.e., sequential or cascaded analysis). Next, annotation items or tags associated with the video information are determined based on the features.
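The cascaded (audio-first) analysis described above can be sketched as follows. This is a minimal illustrative sketch under assumed names and thresholds, not the patented implementation: audio is analyzed first, and video frames are classified only at the temporal locations flagged by the audio.

```python
# Hypothetical sketch of the sequential/cascaded annotation flow: all
# function names and the energy threshold are illustrative assumptions.

def detect_acoustic_patterns(audio, frame_rate, threshold=0.5):
    """Return timestamps (seconds) whose audio-sample energy exceeds a threshold."""
    return [i / frame_rate for i, sample in enumerate(audio)
            if abs(sample) > threshold]

def annotate(audio, video_frames, frame_rate, classify_frame):
    """Analyze the audio first; classify video frames only at the
    temporal locations the audio analysis flags (cascaded analysis)."""
    annotations = {}
    for t in detect_acoustic_patterns(audio, frame_rate):
        # Audio and video are time synchronized, so an audio timestamp
        # maps directly onto a video frame index.
        frame_index = int(t * frame_rate)
        if frame_index < len(video_frames):
            annotations[t] = classify_frame(video_frames[frame_index])
    return annotations

# Toy usage: the spike at sample 3 triggers classification of frame 3 only.
audio = [0.1, 0.2, 0.1, 0.9, 0.1]
video = ["f0", "f1", "f2", "f3", "f4"]
result = annotate(audio, video, frame_rate=1, classify_frame=lambda f: "applause")
# result == {3.0: "applause"}
```

Only one of the five frames is ever passed to the (stand-in) video classifier, which is the efficiency point of analyzing the audio before the video.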
21 Claims
1. A computer-implemented method, comprising:
obtaining a plurality of user-provided content items that include respective audio information and video information associated with a type of annotation item;

training a set of classifiers using the plurality of user-provided content items to determine a relationship between features of the plurality of user-provided content items and the type of annotation item, the set of classifiers configured to identify a customized list of a set of annotation items;

receiving a content item that includes audio information and video information, the audio information being time synchronized with the video information;

extracting the audio information from the content item;

determining an acoustic pattern that characterizes a portion of the audio information by performing audio analysis on the audio information, wherein the acoustic pattern is associated with a temporal location in the audio information;

determining a corresponding temporal location in the video information based at least in part on the temporal location of the acoustic pattern;

analyzing a portion of the video information at the corresponding temporal location using the set of classifiers to determine a set of matching vectors, individual matching vectors of the set of matching vectors including weighted annotation items from the customized list of the set of annotation items;

determining a weighted summation of the weighted annotation items of the set of matching vectors to generate a merged matching vector; and

selecting an annotation item from the customized list of the set of annotation items from the merged matching vector associated with a highest weight, wherein the annotation item characterizes the video information at the corresponding temporal location.

View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16)
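The matching-vector merge recited in claim 1 can be sketched as follows. This is an illustrative sketch with assumed names and toy weights, not the claimed implementation: each classifier emits a vector of weighted annotation items, the vectors are combined by weighted summation into a merged matching vector, and the annotation item with the highest merged weight is selected.

```python
# Hypothetical sketch of the merge-and-select step; vectors are modeled as
# dicts mapping annotation items to weights.

def merge_matching_vectors(matching_vectors):
    """Weighted summation of per-classifier vectors into one merged vector."""
    merged = {}
    for vector in matching_vectors:
        for item, weight in vector.items():
            merged[item] = merged.get(item, 0.0) + weight
    return merged

def select_annotation(matching_vectors):
    """Select the annotation item with the highest weight in the merged vector."""
    merged = merge_matching_vectors(matching_vectors)
    return max(merged, key=merged.get)

# Two classifiers score items from a customized annotation list; "music"
# wins with a merged weight of 0.7 + 0.4 = 1.1.
vectors = [{"music": 0.7, "speech": 0.2}, {"music": 0.4, "crowd": 0.5}]
best = select_annotation(vectors)
# best == "music"
```

The selected item then serves as the annotation characterizing the video at the temporal location where the acoustic pattern was found.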
17. A non-transitory computer-readable storage medium storing instructions that, when executed by at least one processor, enable a computing device to:
obtain a plurality of user-provided content items that include respective audio information and video information associated with a type of annotation item;

train a set of classifiers using the plurality of user-provided content items to determine a relationship between features of the plurality of user-provided content items and the type of annotation item, the set of classifiers configured to identify a customized list of a set of annotation items;

receive a content item that includes audio information and video information, the audio information being time synchronized with the video information;

extract the audio information from the content item;

determine an acoustic pattern that characterizes a portion of the audio information by performing audio analysis on the audio information, wherein the acoustic pattern is associated with a temporal location in the audio information;

determine a corresponding temporal location in the video information based at least in part on the temporal location of the acoustic pattern;

analyze a portion of the video information at the corresponding temporal location using the set of classifiers to determine a set of matching vectors, individual matching vectors of the set of matching vectors including weighted annotation items from the customized list of the set of annotation items;

determine a weighted summation of the weighted annotation items of the set of matching vectors to generate a merged matching vector; and

select an annotation item from the customized list of the set of annotation items from the merged matching vector associated with a highest weight, wherein the annotation item characterizes the video information at the corresponding temporal location.

View Dependent Claims (18, 19, 20)
21. A computer system, comprising:
a processor; and

memory including instructions that, when executed by the processor, cause the computer system to:

obtain a plurality of user-provided content items that include respective audio information and video information associated with a type of annotation item;

train a set of classifiers using the plurality of user-provided content items to determine a relationship between features of the plurality of user-provided content items and the type of annotation item, the set of classifiers configured to identify a customized list of a set of annotation items;

receive a content item that includes audio information and video information, the audio information being time synchronized with the video information;

extract the audio information from the content item;

determine an acoustic pattern that characterizes a portion of the audio information by performing audio analysis on the audio information, wherein the acoustic pattern is associated with a temporal location in the audio information;

determine a corresponding temporal location in the video information based at least in part on the temporal location of the acoustic pattern;

analyze a portion of the video information at the corresponding temporal location using the set of classifiers to determine a set of matching vectors, individual matching vectors of the set of matching vectors including weighted annotation items from the customized list of the set of annotation items;

determine a weighted summation of the weighted annotation items of the set of matching vectors to generate a merged matching vector; and

select an annotation item from the customized list of the set of annotation items from the merged matching vector associated with a highest weight, wherein the annotation item characterizes the video information at the corresponding temporal location.
Specification