Systems and methods for automatically suggesting media accompaniments based on identified media content
Abstract
The disclosed technology includes automatically suggesting audio, video, or other media accompaniments to media content based on identified objects in the media content. Media content may include images, audio, video, or a combination. In one implementation, one or more images representative of the media content may be extracted. A visual search may be run across the images to identify objects or characteristics present in or associated with the media content. Keywords may be generated based on the identified objects and characteristics. The keywords may be used to determine suitable audio tracks to accompany the media content, for example by performing a search based on the keywords. The determined tracks may be presented to a user, or automatically arranged to match the media content. In another implementation, an aural search may be run across samples of the audio data to similarly identify objects and characteristics of the media content.
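As an illustration only, the flow the abstract describes (extract representative images, derive keywords from identified objects, search for matching tracks) might be sketched as follows. The function names, the tag-based track catalog, and the evenly spaced sampling strategy are hypothetical and are not taken from the disclosure; the visual search itself is assumed to have already produced object labels for each image.

```python
from typing import Dict, List, Sequence


def sample_frame_indices(total_frames: int, count: int) -> List[int]:
    """Pick up to `count` evenly spaced frame indices as representative images.

    Even spacing is an assumption; the abstract only says images are extracted.
    """
    if total_frames <= 0 or count <= 0:
        return []
    count = min(count, total_frames)
    step = total_frames / count
    return [int(i * step) for i in range(count)]


def keywords_from_labels(labels_per_image: Sequence[Sequence[str]]) -> List[str]:
    """Collapse per-image object labels into a de-duplicated keyword list.

    Labels are normalized to lowercase and kept in first-seen order.
    """
    seen = set()
    keywords = []
    for labels in labels_per_image:
        for label in labels:
            key = label.strip().lower()
            if key and key not in seen:
                seen.add(key)
                keywords.append(key)
    return keywords


def search_tracks(keywords: Sequence[str], catalog: Dict[str, List[str]]) -> List[str]:
    """Return titles of tracks whose tags share at least one keyword.

    `catalog` maps a track title to its descriptive tags (hypothetical format).
    """
    wanted = set(keywords)
    return [
        title
        for title, tags in catalog.items()
        if wanted & {tag.lower() for tag in tags}
    ]
```

For example, labels such as `[["Beach", "dog"], ["dog", "Surfboard"]]` would yield the keywords `beach`, `dog`, and `surfboard`, and a track tagged `beach` would be returned as a candidate accompaniment.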
17 Claims
1. A method, comprising:
identifying, by a computing device, a video item comprising a plurality of frames, wherein the video item was recorded using a video camera;
analyzing the plurality of frames of the video item to identify visual objects in the video item;
determining a plurality of keywords representative of the visual objects in the video item;
determining weights for the plurality of keywords based on a frequency of appearance of corresponding visual objects in the analyzed plurality of frames, wherein a weight for a respective keyword of the plurality of keywords increases corresponding to a higher frequency of appearance of a corresponding visual object in the analyzed plurality of frames of the video item;
identifying an audio item that is related to one or more keywords of the plurality of keywords associated with higher weights than other keywords of the plurality of keywords, wherein identifying the audio item comprises:
providing a plurality of audio items that are related to the one or more keywords associated with higher weights than other keywords of the plurality of keywords for selection by a user, and
receiving an indication of a user selection of the audio item from the plurality of audio items; and
providing the audio item for playback, wherein the playback of the audio item is concurrent with playback of the video item.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9
10. A non-transitory computer-readable medium that stores instructions that, responsive to execution by a computing device, cause the computing device to perform operations comprising:
identifying, by the computing device, a video item comprising a plurality of frames;
analyzing the plurality of frames of the video item to identify visual objects in the video item;
determining a plurality of keywords representative of the visual objects in the video item;
determining weights for the plurality of keywords based on a frequency of appearance of corresponding visual objects in the analyzed plurality of frames, wherein a weight for a respective keyword of the plurality of keywords increases corresponding to a higher frequency of appearance of a corresponding visual object in the analyzed plurality of frames of the video item;
identifying an audio item that is related to one or more keywords of the plurality of keywords associated with higher weights than other keywords of the plurality of keywords, wherein identifying the audio item comprises:
providing a plurality of audio items that are related to the one or more keywords associated with higher weights than other keywords of the plurality of keywords for selection by a user, and
receiving an indication of a user selection of the audio item from the plurality of audio items; and
providing the audio item for playback, wherein the playback of the audio item is concurrent with playback of the video item.
Dependent claims: 11, 12, 13, 14
15. A system comprising:
a memory; and
a computing device, coupled to the memory, the computing device to:
identify a video item comprising a plurality of frames, wherein the video item was recorded using a video camera;
analyze the plurality of frames of the video item to identify visual objects in the video item;
determine a plurality of keywords representative of the visual objects in the video item;
determine weights for the plurality of keywords based on a frequency of appearance of corresponding visual objects in the analyzed plurality of frames, wherein a weight for a respective keyword of the plurality of keywords increases corresponding to a higher frequency of appearance of a corresponding visual object in the analyzed plurality of frames of the video item;
identify an audio item that is related to one or more keywords of the plurality of keywords associated with higher weights than other keywords of the plurality of keywords, wherein identifying the audio item comprises:
providing a plurality of audio items that are related to the one or more keywords associated with higher weights than other keywords of the plurality of keywords for selection by a user, and
receiving an indication of a user selection of the audio item from the plurality of audio items; and
provide the audio item for playback, wherein the playback of the audio item is concurrent with playback of the video item.
Dependent claims: 16, 17
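By way of a non-limiting sketch, the frequency-based weighting recited in the independent claims could be realized as shown below. The claims require only that a keyword's weight increase with the frequency of appearance of its visual object; taking the weight to be the fraction of analyzed frames containing the object is one assumed choice among many. The function names, the tag-based catalog, and the scoring rule are likewise hypothetical.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def keyword_weights(objects_per_frame: Sequence[Sequence[str]]) -> Dict[str, float]:
    """Weight each keyword by the fraction of analyzed frames showing its object."""
    if not objects_per_frame:
        return {}
    counts: Counter = Counter()
    for objects in objects_per_frame:
        counts.update(set(objects))  # count an object at most once per frame
    total = len(objects_per_frame)
    return {keyword: n / total for keyword, n in counts.items()}


def top_keywords(weights: Dict[str, float], n: int) -> List[str]:
    """Return the n highest-weighted keywords; ties are broken alphabetically."""
    ranked = sorted(weights.items(), key=lambda kv: (-kv[1], kv[0]))
    return [keyword for keyword, _ in ranked[:n]]


def candidate_audio(
    weights: Dict[str, float],
    top: Sequence[str],
    catalog: Dict[str, List[str]],
) -> List[Tuple[str, float]]:
    """Rank audio items by the summed weight of the top keywords they match.

    `catalog` maps an audio title to its tags (hypothetical format). Items
    matching none of the top keywords are omitted.
    """
    scored = []
    for title, tags in catalog.items():
        score = sum(weights[k] for k in top if k in tags)
        if score > 0:
            scored.append((title, score))
    return sorted(scored, key=lambda ts: (-ts[1], ts[0]))
```

In this sketch, the ranked list returned by `candidate_audio` corresponds to the plurality of audio items provided for selection by a user; receiving the user's selection and playing the chosen audio item concurrently with the video item are left to the surrounding application.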
Specification