System and method for generating an audio thumbnail of an audio track
Abstract
A method and system for generating an audio thumbnail of an audio track, in which a first content feature, such as singing, is detected as a characteristic of the audio track. A predetermined length of the portion of the audio track corresponding to the detected first content feature is extracted from the audio track. A highlight of the audio track, such as a portion having a sudden increase in temporal energy, is detected, and a portion of the audio track corresponding to the highlight is extracted from the audio track. The two extracted portions of the audio track are combined as a thumbnail of the audio track.
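The flow described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation. It assumes the start of singing is already known (the `voice_start` argument stands in for the output of a voice detector not shown here), and it treats the "sudden increase in temporal energy" as the largest rise in mean squared amplitude between consecutive non-overlapping frames. The function names, the frame length, and the portion length are all hypothetical.

```python
from typing import List, Sequence


def frame_energies(samples: Sequence[float], frame_len: int) -> List[float]:
    """Mean squared amplitude of each non-overlapping frame."""
    n_frames = len(samples) // frame_len
    return [
        sum(x * x for x in samples[i * frame_len:(i + 1) * frame_len]) / frame_len
        for i in range(n_frames)
    ]


def largest_energy_rise(energies: Sequence[float]) -> int:
    """Index of the frame whose energy rises most over the previous frame."""
    if len(energies) < 2:
        raise ValueError("need at least two frames")
    rises = [energies[i] - energies[i - 1] for i in range(1, len(energies))]
    return rises.index(max(rises)) + 1


def make_thumbnail(
    samples: Sequence[float],
    voice_start: int,   # assumed: sample index where singing starts
    frame_len: int,     # assumed: frame size used for energy analysis
    portion_len: int,   # assumed: predetermined length of each portion
) -> List[float]:
    """Concatenate the portion at the voice start with the portion at the highlight."""
    first = list(samples[voice_start:voice_start + portion_len])
    highlight_frame = largest_energy_rise(frame_energies(samples, frame_len))
    highlight_start = highlight_frame * frame_len
    second = list(samples[highlight_start:highlight_start + portion_len])
    return first + second


# Toy track: silence, a quiet passage, a loud passage, then a softer tail.
track = [0.0] * 8 + [0.1] * 8 + [0.9] * 8 + [0.2] * 8
thumbnail = make_thumbnail(track, voice_start=8, frame_len=4, portion_len=4)
```

In this toy track the largest energy rise falls at the start of the loud passage, so the thumbnail joins four quiet samples with four loud ones. A real system would likely smooth the energy curve and use overlapping frames, which this sketch omits.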
41 Citations
17 Claims
1. A method for generating an audio thumbnail of an audio track, comprising:
detecting a first content feature within the audio track;
extracting a first portion of the audio track corresponding to the detected first content feature;
detecting an occurrence of an increase in energy within the audio track;
extracting a second portion of the audio track corresponding to the detected increase in energy; and
combining the extracted first and second portions of the audio track into the audio thumbnail of the audio track, wherein the audio track is a song and wherein the first content feature is the start of a human voice within the song.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10.
11. A method for generating an audio thumbnail of an audio track, comprising:
detecting a first content feature within the audio track;
mapping a pointer to the detected first content feature within the audio track;
setting a first duration of time;
detecting an occurrence of an increase in energy within the audio track;
mapping a pointer to the detected occurrence of an increase in energy within the audio track;
setting a second duration of time; and
storing the pointer to the detected first content feature, the first duration of time, the pointer to the detected occurrence of an increase in energy, and the second duration of time as the audio thumbnail of the audio track, wherein the audio track is a song and wherein the first content feature is the start of a human voice within the song.
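Claim 11 stores pointers and durations rather than extracted audio. One way to picture that representation is the Python sketch below. It is an illustration under stated assumptions, not the claimed implementation: the `PointerThumbnail` class, its field names, the use of seconds as the pointer unit, and the `render` helper are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class PointerThumbnail:
    """Thumbnail stored as two (pointer, duration) pairs, all in seconds."""
    voice_start_s: float      # pointer to the detected start of the voice
    first_duration_s: float   # first duration of time
    highlight_start_s: float  # pointer to the detected energy increase
    second_duration_s: float  # second duration of time


def render(thumb: PointerThumbnail, samples: Sequence[float], sample_rate: int) -> List[float]:
    """Resolve the stored pointers against the full track at playback time."""
    def cut(start_s: float, duration_s: float) -> List[float]:
        begin = int(round(start_s * sample_rate))
        end = begin + int(round(duration_s * sample_rate))
        return list(samples[begin:end])

    return (cut(thumb.voice_start_s, thumb.first_duration_s)
            + cut(thumb.highlight_start_s, thumb.second_duration_s))


# Toy track of 100 samples at 10 samples per second.
track = [float(i) for i in range(100)]
thumb = PointerThumbnail(1.0, 0.5, 4.0, 0.3)
audio = render(thumb, track, sample_rate=10)
```

Because only four numbers are stored per track, the thumbnail adds almost nothing to storage, at the cost of needing the original track available whenever the thumbnail is played.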
12. A method of identifying a representative excerpt of a song, comprising:
processing the song to detect a target point in the song that is at least one of (1) a start of singing within the song and (2) a point at which occurs a sudden increase in temporal energy within the song, with a portion of the song immediately following the sudden increase corresponding to singing;
designating the representative excerpt for the song by defining a starting point for the representative excerpt based on the target point wherein the representative excerpt includes two portions, a first portion having a first starting point based on the start of singing within the song and a second portion having a second starting point based on the point at which occurs the sudden increase in temporal energy within the song; and
storing at least one of the representative excerpt and a pointer to the representative excerpt, together with information for corresponding other representative excerpts of other songs, to facilitate efficient user browsing of the representative excerpt and said other representative excerpts.
13. A method of identifying a representative excerpt of a song, comprising:
processing the song to detect a target point in the song that is at least one of (1) a start of singing within the song and (2) a point at which occurs a sudden increase in temporal energy within the song, with a portion of the song immediately following the sudden increase corresponding to singing;
designating the representative excerpt for the song by defining a starting point for the representative excerpt based on the target point wherein the representative excerpt includes a portion having a first starting point based on the point at which occurs the sudden increase in temporal energy within the song, wherein the point at which occurs the sudden increase in temporal energy within the song corresponds to a greatest increase in temporal energy in the song where the portion of the song immediately following the increase corresponds to singing; and
storing at least one of the representative excerpt and a pointer to the representative excerpt, together with information for corresponding other representative excerpts of other songs, to facilitate efficient user browsing of the representative excerpt and said other representative excerpts.
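The selection rule in claim 13, the greatest increase in temporal energy whose following portion is singing, can be sketched in Python as follows. This is a simplified reading, not the claimed method: it assumes per-frame energies and per-frame singing labels are already available (the labels would come from a classifier not shown here), and it treats "the portion immediately following the increase" as the single frame at which the higher energy is reached. The function name is hypothetical.

```python
from typing import Optional, Sequence


def best_sung_rise(energies: Sequence[float], is_singing: Sequence[bool]) -> Optional[int]:
    """Frame index of the greatest energy rise that lands on a singing frame.

    Returns None when no positive rise is followed by singing.
    """
    if len(energies) != len(is_singing):
        raise ValueError("energies and is_singing must have the same length")
    best_frame: Optional[int] = None
    best_rise = 0.0
    for i in range(1, len(energies)):
        rise = energies[i] - energies[i - 1]
        # Only rises that land on a frame labeled as singing qualify.
        if is_singing[i] and rise > best_rise:
            best_frame, best_rise = i, rise
    return best_frame


# The largest rise overall (into frame 2) is instrumental, so it is skipped.
energies = [0.1, 0.2, 0.9, 0.3, 0.8, 0.8]
is_singing = [False, False, False, False, True, True]
start_frame = best_sung_rise(energies, is_singing)
```

Here the excerpt would start at frame 4, the largest rise that leads into singing, even though a larger instrumental rise occurs earlier.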
14. A computer-readable medium encoded with computer executable instructions for identifying a representative excerpt of a song, said computer executable instructions comprising:
processing the song to detect a target point in the song that is at least one of (1) a start of singing within the song and (2) a point at which occurs a sudden increase in temporal energy within the song, with a portion of the song immediately following the sudden increase corresponding to singing;
designating the representative excerpt for the song by defining a starting point for the representative excerpt based on the target point wherein the representative excerpt includes two portions, a first portion having a first starting point based on the start of singing within the song and a second portion having a second starting point based on the point at which occurs the sudden increase in temporal energy within the song; and
storing at least one of the representative excerpt and a pointer to the representative excerpt, together with information for corresponding other representative excerpts of other songs, to facilitate efficient user browsing of the representative excerpt and said other representative excerpts.
15. A computer-readable medium encoded with computer executable instructions for identifying a representative excerpt of a song, said computer executable instructions comprising:
processing the song to detect a target point in the song that is at least one of (1) a start of singing within the song and (2) a point at which occurs a sudden increase in temporal energy within the song, with a portion of the song immediately following the sudden increase corresponding to singing;
designating the representative excerpt for the song by defining a starting point for the representative excerpt based on the target point wherein the representative excerpt includes a portion having a first starting point based on the point at which occurs the sudden increase in temporal energy within the song, wherein the point at which occurs the sudden increase in temporal energy within the song corresponds to a greatest increase in temporal energy in the song where the portion of the song immediately following the increase corresponds to singing; and
storing at least one of the representative excerpt and a pointer to the representative excerpt, together with information for corresponding other representative excerpts of other songs, to facilitate efficient user browsing of the representative excerpt and said other representative excerpts.
16. A computer-readable medium encoded with computer executable instructions for generating an audio thumbnail of an audio track, said computer executable instructions comprising:
detecting a first content feature within the audio track;
extracting a first portion of the audio track corresponding to the detected first content feature;
detecting an occurrence of an increase in energy within the audio track;
extracting a second portion of the audio track corresponding to the detected increase in energy; and
combining the extracted first and second portions of the audio track into the audio thumbnail of the audio track, wherein the audio track is a song and wherein the first content feature is the start of a human voice within the song.
Dependent claims: 17.
Specification