System and method for generating an audio thumbnail of an audio track

US 7,386,357 B2
Filed: 09/30/2002
Issued: 06/10/2008
Est. Priority Date: 09/30/2002
Status: Expired due to Fees

Abstract

A method and system for generating an audio thumbnail of an audio track in which a first content feature, such as singing, is detected as a characteristic of an audio track. A predetermined length of the detected portion of the audio track corresponding to the first content feature is extracted from the audio track. A highlight of the audio track, such as a portion of the audio track having a sudden increase in temporal energy within the audio track, is detected; and a portion of the audio track corresponding to the highlight is extracted from the audio track. The two extracted portions of the audio track are combined as a thumbnail of the audio track.
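
As an illustration only, the combining step described in the abstract might be sketched in Python as follows. The function names, the use of sample indices, and the assumption that both start positions have already been detected are hypothetical and are not taken from the patent.

```python
from typing import List, Sequence


def extract(track: Sequence[float], start: int, length: int) -> List[float]:
    """Return up to `length` samples of `track` beginning at index `start`."""
    return list(track[start:start + length])


def make_thumbnail(
    track: Sequence[float],
    vocal_start: int,      # assumed: index where singing was detected
    highlight_start: int,  # assumed: index where the energy increase was detected
    first_len: int,        # predetermined length of the first portion
    second_len: int,       # predetermined length of the second portion
) -> List[float]:
    """Concatenate the two extracted portions into one thumbnail."""
    first = extract(track, vocal_start, first_len)
    second = extract(track, highlight_start, second_len)
    return first + second


# Toy track whose sample values equal their own indices.
track = [float(i) for i in range(20)]
thumb = make_thumbnail(track, vocal_start=3, highlight_start=12,
                       first_len=4, second_len=3)
```

Simple concatenation is only one way to combine the portions; the abstract does not specify, for example, whether a fade is applied between them.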

17 Claims

1. A method for generating an audio thumbnail of an audio track, comprising:
- detecting a first content feature within the audio track;
  
  extracting a first portion of the audio track corresponding to the detected first content feature;
  
  detecting an occurrence of an increase in energy within the audio track;
  
  extracting a second portion of the audio track corresponding to the detected increase in energy; and
  
  combining the extracted first and second portions of the audio track into the audio thumbnail of the audio track, wherein the audio track is a song and wherein the first content feature is the start of a human voice within the song.
- Dependent claims (2, 3, 4, 5, 6, 7, 8, 9, 10)
  - 2. The method according to claim 1, wherein the audio track comprises singing and instrumental music.
  - 3. The method according to claim 1, further comprising:
    - storing the audio thumbnail; and
      
      playing the stored audio thumbnail to preview the song.
  - 4. The method according to claim 1, wherein detecting the first content feature includes application of at least one of voice detection methods of short-time average zero crossing rate, harmonic coefficient spectral flux, filter analysis, and short-time energy function.
  - 5. The method according to claim 1, further comprising:
    - selecting a first duration of time, wherein the first portion of the audio track has a duration corresponding to the first duration of time; and
      
      selecting a second duration of time, wherein the second portion of the audio track has a duration corresponding to the second duration of time.
  - 6. The method according to claim 1, wherein the increase in energy exceeds a predetermined threshold.
  - 7. The method according to claim 1, wherein detecting an occurrence of an increase in energy comprises comparing an increase in the temporal energy between two adjacent portions of the audio track with a predetermined threshold.
  - 8. The method according to claim 7, wherein detecting the occurrence of the increase in energy comprises:
    - computing a temporal energy envelope for the audio track;
      
      mapping two adjacent windows on the audio track;
      
      detecting locations on the audio track corresponding to human sound;
      
      for each detected location of human sound on the audio track, comparing the computed temporal energy of the window corresponding to the detected location of the human sound with the computed temporal energy of the prior adjacent window;
      
      determining the portions of the audio track whose temporal energy increase over the energy of the prior adjacent portion of the audio track exceeds the predetermined threshold; and
      
      selecting as the second portion of the audio track the determined portion of the audio track having the greatest increase in temporal energy.
  - 9. The method according to claim 8, wherein the locations corresponding to human sound are detected by applying a zero crossing rate algorithm, and wherein if the application of the zero crossing rate algorithm produces inconclusive results, applying one or more of a harmonic coefficient algorithm, a spectral flux algorithm, an energy function algorithm, or a filter analysis algorithm.
  - 10. The method according to claim 1, wherein:
    - detecting the first content feature within the audio track detects first occurrence of singing on the audio track;
      
      extracting the first portion of the audio track corresponding to the detected first content feature extracts a predetermined portion of the audio track corresponding to the start of the detected first occurrence of singing;
      
      detecting the occurrence of an increase in energy within the audio track detects a second occurrence of human sound on the audio track as characterized by a greatest increase in temporal energy within human sound portions on the audio track;
      
      extracting the second portion of the audio track corresponding to the detected increase in energy extracts a predetermined portion of the audio track corresponding to the start of the detected second occurrence of human sound; and
      
      combining the extracted first and second portions of the audio track into the audio thumbnail of the audio track combines the extracted first occurrence of singing and second occurrence of human sound as the audio thumbnail of the song on the audio track.
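
The energy-based selection recited in claims 7 and 8 could be sketched roughly as below. This is a minimal illustration under stated assumptions: energy is taken as the mean squared amplitude of non-overlapping windows, the "increase" is measured as a simple difference between adjacent windows, and the per-window `voiced` flags are supplied by some separate human-sound detector. None of these choices, nor the function names, are specified by the claims.

```python
from typing import List, Optional, Sequence


def window_energies(samples: Sequence[float], window: int) -> List[float]:
    """Mean squared amplitude of each consecutive, non-overlapping window."""
    count = len(samples) // window
    return [
        sum(x * x for x in samples[i * window:(i + 1) * window]) / window
        for i in range(count)
    ]


def find_highlight(
    energies: Sequence[float],
    voiced: Sequence[bool],  # assumed output of a separate human-sound detector
    threshold: float,        # predetermined threshold on the energy increase
) -> Optional[int]:
    """Index of the voiced window with the greatest energy increase over its
    prior adjacent window, provided the increase exceeds `threshold`;
    None if no window qualifies."""
    best_index = None
    best_increase = threshold
    for i in range(1, len(energies)):
        if not voiced[i]:
            continue  # only locations detected as human sound are compared
        increase = energies[i] - energies[i - 1]
        if increase > best_increase:
            best_index, best_increase = i, increase
    return best_index


# Toy signal: a quiet passage followed by a sudden loud one.
samples = [0.1] * 4 + [0.2] * 4 + [1.0] * 4 + [0.9] * 4
energies = window_energies(samples, window=4)
voiced = [False, True, True, True]
highlight = find_highlight(energies, voiced, threshold=0.5)
```

In this toy input the third window (index 2) shows the largest jump, so it would be chosen as the start of the second portion. A ratio-based increase would be an equally plausible reading of the claim language.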

11. A method for generating an audio thumbnail of an audio track, comprising:
- detecting a first content feature within the audio track;
  
  mapping a pointer to the detected first content feature within the audio track;
  
  setting a first duration of time;
  
  detecting an occurrence of an increase in energy within the audio track;
  
  mapping a pointer to the detected occurrence of an increase in energy within the audio track;
  
  setting a second duration of time; and
  
  storing the pointer to the detected first content feature, the first duration of time, the pointer to the detected occurrence of an increase in energy, and the second duration of time as the audio thumbnail of the audio track, wherein the audio track is a song and wherein the first content feature is the start of a human voice within the song.
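
Claim 11 stores pointers and durations rather than copied audio. One hypothetical way to represent such a thumbnail is sketched below; the `PointerThumbnail` class, its field names, the use of seconds as the pointer unit, and the `render` helper are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class PointerThumbnail:
    """Thumbnail stored as two (pointer, duration) pairs, in seconds."""
    voice_start: float       # pointer to the detected start of the human voice
    first_duration: float    # first duration of time
    highlight_start: float   # pointer to the detected increase in energy
    second_duration: float   # second duration of time


def render(thumb: PointerThumbnail, samples: Sequence[float],
           sample_rate: int) -> List[float]:
    """Resolve the stored pointers against the track when playback is needed."""
    def cut(start_s: float, duration_s: float) -> List[float]:
        begin = int(round(start_s * sample_rate))
        end = begin + int(round(duration_s * sample_rate))
        return list(samples[begin:end])

    return (cut(thumb.voice_start, thumb.first_duration)
            + cut(thumb.highlight_start, thumb.second_duration))


# Toy track at 10 samples per second whose values equal their indices.
samples = [float(i) for i in range(100)]
thumb = PointerThumbnail(voice_start=1.0, first_duration=0.5,
                         highlight_start=6.0, second_duration=0.3)
audio = render(thumb, samples, sample_rate=10)
```

Storing only four numbers per song keeps the thumbnail small, at the cost of needing the original track available whenever the preview is played.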

12. A method of identifying a representative excerpt of a song, comprising:
- processing the song to detect a target point in the song that is at least one of (1) a start of singing within the song and (2) a point at which occurs a sudden increase in temporal energy within the song, with a portion of the song immediately following the sudden increase corresponding to singing;
  
  designating the representative excerpt for the song by defining a starting point for the representative excerpt based on the target point wherein the representative excerpt includes two portions, a first portion having a first starting point based on the start of singing within the song and a second portion having a second starting point based on the point at which occurs the sudden increase in temporal energy within the song; and
  
  storing at least one of the representative excerpt and a pointer to the representative excerpt, together with information for corresponding other representative excerpts of other songs, to facilitate efficient user browsing of the representative excerpt and said other representative excerpts.

13. A method of identifying a representative excerpt of a song, comprising:
- processing the song to detect a target point in the song that is at least one of (1) a start of singing within the song and (2) a point at which occurs a sudden increase in temporal energy within the song, with a portion of the song immediately following the sudden increase corresponding to singing;
  
  designating the representative excerpt for the song by defining a starting point for the representative excerpt based on the target point wherein the representative excerpt includes a portion having a first starting point based on the point at which occurs the sudden increase in temporal energy within the song, wherein the point at which occurs the sudden increase in temporal energy within the song corresponds to a greatest increase in temporal energy in the song where the portion of the song immediately following the increase corresponds to singing; and
  
  storing at least one of the representative excerpt and a pointer to the representative excerpt, together with information for corresponding other representative excerpts of other songs, to facilitate efficient user browsing of the representative excerpt and said other representative excerpts.

14. A computer-readable medium encoded with computer executable instructions for identifying a representative excerpt of a song, said computer executable instructions comprising:
- processing the song to detect a target point in the song that is at least one of (1) a start of singing within the song and (2) a point at which occurs a sudden increase in temporal energy within the song, with a portion of the song immediately following the sudden increase corresponding to singing;
  
  designating the representative excerpt for the song by defining a starting point for the representative excerpt based on the target point wherein the representative excerpt includes two portions, a first portion having a first starting point based on the start of singing within the song and a second portion having a second starting point based on the point at which occurs the sudden increase in temporal energy within the song; and
  
  storing at least one of the representative excerpt and a pointer to the representative excerpt, together with information for corresponding other representative excerpts of other songs, to facilitate efficient user browsing of the representative excerpt and said other representative excerpts.

15. A computer-readable medium encoded with computer executable instructions for identifying a representative excerpt of a song, said computer executable instructions comprising:
- processing the song to detect a target point in the song that is at least one of (1) a start of singing within the song and (2) a point at which occurs a sudden increase in temporal energy within the song, with a portion of the song immediately following the sudden increase corresponding to singing;
  
  designating the representative excerpt for the song by defining a starting point for the representative excerpt based on the target point wherein the representative excerpt includes a portion having a first starting point based on the point at which occurs the sudden increase in temporal energy within the song, wherein the point at which occurs the sudden increase in temporal energy within the song corresponds to a greatest increase in temporal energy in the song where the portion of the song immediately following the increase corresponds to singing; and
  
  storing at least one of the representative excerpt and a pointer to the representative excerpt, together with information for corresponding other representative excerpts of other songs, to facilitate efficient user browsing of the representative excerpt and said other representative excerpts.
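
The storing step shared by claims 12 through 15 keeps each song's excerpt information alongside that of other songs so that a user can browse them. A minimal sketch of such a store follows; the `ExcerptCatalog` class, the string song identifiers, and the alphabetical browse order are hypothetical and are not described in the claims.

```python
from typing import Dict, List, NamedTuple, Sequence, Tuple


class Excerpt(NamedTuple):
    """Pointer to a representative excerpt: start and length in seconds."""
    start_seconds: float
    duration_seconds: float


class ExcerptCatalog:
    """Holds excerpt pointers for many songs so they can be browsed together."""

    def __init__(self) -> None:
        self._by_song: Dict[str, List[Excerpt]] = {}

    def store(self, song_id: str, excerpts: Sequence[Excerpt]) -> None:
        """Record (or overwrite) the excerpt pointers for one song."""
        self._by_song[song_id] = list(excerpts)

    def browse(self) -> List[Tuple[str, List[Excerpt]]]:
        """Return every song's excerpts, ordered by song identifier."""
        return [(song_id, self._by_song[song_id])
                for song_id in sorted(self._by_song)]


catalog = ExcerptCatalog()
catalog.store("song-b", [Excerpt(30.0, 10.0)])
catalog.store("song-a", [Excerpt(12.5, 10.0), Excerpt(95.0, 10.0)])
listing = catalog.browse()
```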

16. A computer-readable medium encoded with computer executable instructions for generating an audio thumbnail of an audio track, said computer executable instructions comprising:
- detecting a first content feature within the audio track;
  
  extracting a first portion of the audio track corresponding to the detected first content feature;
  
  detecting an occurrence of an increase in energy within the audio track;
  
  extracting a second portion of the audio track corresponding to the detected increase in energy; and
  
  combining the extracted first and second portions of the audio track into the audio thumbnail of the audio track, wherein the audio track is a song and wherein the first content feature is the start of a human voice within the song.
- Dependent claims (17)
  - 17. A computer-readable medium according to claim 16, wherein detecting an occurrence of an increase in energy comprises comparing an increase in the temporal energy between two adjacent portions of the audio track with a predetermined threshold.

Current Assignee: Hewlett-Packard Development Company, L.P. (HP Inc.)
Original Assignee: Hewlett-Packard Development Company, L.P. (HP Inc.)
Inventors: Zhang, Tong
Primary Examiner(s): Tran; Sinh
Assistant Examiner(s): Briney, III; Walter F
Application Number: US10/259,572
Publication Number: US 20040064209A1
Time in Patent Office: 2,080 Days
Field of Search: 846/16, 846/54, 846/81, 704/255, 704/500, 369/4, 700/94
US Class Current: 700/94
CPC Class Codes:
G10H 2210/031   Musical analysis, i.e. isol...
G10L 25/78   Detection of presence or ab...
G11B 27/036   Insert-editing
G11B 27/28   by using information signal...
