AUTOMATIC SYSTEM FOR TEMPORAL ALIGNMENT OF MUSIC AUDIO SIGNAL WITH LYRICS

  • US 20080097754A1
  • Filed: 08/07/2007
  • Published: 04/24/2008
  • Est. Priority Date: 10/24/2006
  • Status: Active Grant
First Claim

1. An automatic system for temporal alignment between music audio signal and lyrics, comprising:

  • dominant sound audio signal extraction means for extracting, from a music audio signal of music including vocals and accompaniment sounds, a dominant sound audio signal of the most dominant sound including the vocal at each time,
  • vocal-section feature extraction means for extracting a vocal-section feature available to estimate a vocal section which includes the vocal and a non-vocal section which does not include the vocal, from the dominant sound audio signal at each time,
  • vocal section estimation means for estimating the vocal section and the non-vocal section, based on a plurality of the vocal-section features and outputting information on the vocal section and the non-vocal section,
  • temporal-alignment feature extraction means for extracting a temporal-alignment feature suitable to make temporal alignment between lyrics of the vocal and the music audio signal, from the dominant sound audio signal at each time,
  • phoneme network storage means for storing a phoneme network constituted from a plurality of phonemes and short pauses in respect of lyrics in music corresponding to the music audio signal, and
  • alignment means for performing an alignment operation that makes temporal alignment between the plurality of phonemes in the phoneme network and the dominant sound audio signals, the alignment means being provided with a phone model for singing voice that estimates a phoneme corresponding to the temporal-alignment feature, based on the temporal-alignment feature, wherein
  • the alignment means receives the temporal-alignment feature outputted from the temporal-alignment feature extraction means, the information on the vocal section and the non-vocal section, and the phoneme network, and performs the alignment operation on condition that no phoneme exists at least in the non-vocal section.
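
The final condition of the claim (no phoneme may be placed in a non-vocal section) can be illustrated with a minimal sketch. The claim does not specify an algorithm, so the code below assumes a plain left-to-right Viterbi-style forced alignment over a linear phoneme network. The function name `align`, the label `"sp"` for a short pause, and the per-frame log-likelihood dictionaries (standing in for the output of the phone model for singing voice) are all hypothetical choices made for this illustration.

```python
NEG_INF = float("-inf")


def align(network, log_likes, is_vocal, pause="sp"):
    """Assign one network state to each frame (illustrative sketch only).

    network   -- ordered state labels, e.g. ["sp", "a", "sp"] (assumed layout)
    log_likes -- one dict per frame mapping label -> log-likelihood
                 (stand-in for scores from a phone model)
    is_vocal  -- one bool per frame from the vocal-section estimate
    """
    n_frames, n_states = len(log_likes), len(network)

    def score(t, s):
        label = network[s]
        # The claimed condition: a non-vocal frame may only hold a short pause.
        if not is_vocal[t] and label != pause:
            return NEG_INF
        return log_likes[t][label]

    best = [[NEG_INF] * n_states for _ in range(n_frames)]
    back = [[0] * n_states for _ in range(n_frames)]
    best[0][0] = score(0, 0)  # the path must start in the first state

    for t in range(1, n_frames):
        for s in range(n_states):
            stay = best[t - 1][s]
            advance = best[t - 1][s - 1] if s > 0 else NEG_INF
            if advance > stay:
                prev, prev_score = s - 1, advance
            else:
                prev, prev_score = s, stay
            best[t][s] = prev_score + score(t, s)
            back[t][s] = prev

    if best[-1][-1] == NEG_INF:
        raise ValueError("no alignment satisfies the non-vocal constraint")

    # Trace back from the last state at the last frame.
    path = [n_states - 1]
    for t in range(n_frames - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return [network[s] for s in path]
```

In this sketch the constraint is enforced by giving every phoneme state a score of negative infinity in frames marked non-vocal, so only short-pause states can occupy those frames. A real system would use a richer network and trained acoustic models; this only shows how the vocal/non-vocal information can restrict the search.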
