Identification of the presence of speech in digital audio data

  • US 8,036,884 B2
  • Filed: 02/24/2005
  • Issued: 10/11/2011
  • Est. Priority Date: 02/26/2004
  • Status: Expired due to Fees
First Claim

1. A method for causing an audio data processing apparatus to determine speech related audio data within a recording of digital audio data based on transitions between voiced and unvoiced sequences, the method comprising:

  • extracting, in the audio data processing apparatus, audio features from the recording of digital audio data at an analyzing apparatus;

    classifying, in the audio data processing apparatus, the recording of digital audio data based on the extracted audio features and with respect to one or more predetermined audio classes stored in an electronic memory of the apparatus;

    marking, in the audio data processing apparatus, at least a part of the recording of digital audio data classified as speech, wherein the extraction of at least one audio feature includes partitioning the recording of digital audio data into adjoining frames;

    defining, in the audio data processing apparatus and for each frame, a window being formed by a sequence of adjoining frames containing a frame under consideration;

    determining, in the audio data processing apparatus, for the frame under consideration, and at least one next frame of the window, a spectral-emphasis-value which is related to a frequency distribution contained in the digital audio data of a respective frame and which represents a frequency at which a main audio energy is contained in the respective frame, the main audio energy indicating a major part of the audio energy in the respective frame, and classifying the frame under consideration as containing voiced or unvoiced audio data based on the spectral-emphasis-value of the frame under consideration; and

    assigning, in the audio data processing apparatus, a presence-of-speech indicator value to the frame under consideration based on an evaluation of the differences between the spectral-emphasis-values determined for the frame under consideration and the at least one next frame of the window, said presence-of-speech indicator value being based on a detection of transitions between frames containing voiced and unvoiced audio data.
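The framing, spectral-emphasis and transition steps recited above can be illustrated with a minimal sketch. It is not the patented implementation: the energy-weighted mean frequency (spectral centroid) stands in for the spectral-emphasis-value, and the function names, the forward-looking window of 4 frames, the 1000 Hz voiced/unvoiced boundary and the 500 Hz minimum jump are all assumptions chosen for illustration, not values taken from the claim.

```python
import cmath
import math


def frames(samples, frame_len):
    """Partition the samples into adjoining, non-overlapping frames."""
    last_start = len(samples) - frame_len
    return [samples[i:i + frame_len] for i in range(0, last_start + 1, frame_len)]


def spectral_emphasis(frame, sample_rate):
    """Energy-weighted mean frequency of one frame, in Hz.

    Assumption: the spectral centroid is used as a stand-in for the
    claim's spectral-emphasis-value. A naive DFT keeps the sketch
    dependency-free; it is only practical for short frames.
    """
    n = len(frame)
    total = 0.0
    weighted = 0.0
    for k in range(1, n // 2):  # positive-frequency bins, DC excluded
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(frame))
        energy = abs(coeff) ** 2
        total += energy
        weighted += energy * k * sample_rate / n
    return weighted / total if total > 0 else 0.0


def speech_indicator(samples, sample_rate, frame_len=64, window=4,
                     voiced_below_hz=1000.0, min_jump_hz=500.0):
    """Return one presence-of-speech indicator value per frame.

    Each frame is labelled voiced when its emphasis value is below the
    (hypothetical) voiced_below_hz boundary. The indicator for frame i
    counts voiced/unvoiced transitions among frame i and the following
    frames of its window, counting a transition only when the emphasis
    values of the two neighbouring frames differ by at least min_jump_hz.
    """
    values = [spectral_emphasis(f, sample_rate)
              for f in frames(samples, frame_len)]
    voiced = [v < voiced_below_hz for v in values]
    indicators = []
    for i in range(len(values)):
        end = min(i + window, len(values))
        count = 0
        for j in range(i, end - 1):
            jump = abs(values[j + 1] - values[j])
            if voiced[j] != voiced[j + 1] and jump >= min_jump_hz:
                count += 1
        indicators.append(count)
    return indicators
```

Under these assumptions, a signal that alternates between low-frequency and high-frequency frames yields non-zero indicator values, while a steady tone yields zeros throughout. A silent frame has no energy and receives an emphasis value of 0.0, so a practical detector would need a separate energy check before treating such a frame as voiced.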
