Identification of the presence of speech in digital audio data

US 20050192795A1
Filed: 02/24/2005
Published: 09/01/2005
Est. Priority Date: 02/26/2004
Status: Active Grant



Abstract

The present invention provides a method, a computer-software-product and an apparatus for enabling a determination of speech related audio data within a record of digital audio data. The method comprises steps for extracting audio features from the record of digital audio data, for classifying one or more subsections of the record of digital audio data, and for marking at least a part of the record of digital audio data classified as speech. The classification of the digital audio data record is performed on the basis of the extracted audio features and with respect to at least one predetermined audio class. The extraction of the at least one audio feature as used by a method according to the invention comprises steps for partitioning the record of digital audio data into adjoining frames, defining a window for each frame which is formed by a sequence of adjoining frames containing the frame under consideration, determining for the frame under consideration and at least one further frame of the window a spectral-emphasis-value which is related to the frequency distribution contained in the digital audio data of the respective frame, and assigning a presence-of-speech indicator value to the frame under consideration based on an evaluation of the differences between the spectral-emphasis-values determined for the frame under consideration and at least one further frame of the window.
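The first two extraction steps in the abstract, partitioning into adjoining frames and computing a per-frame spectral-emphasis-value, can be sketched as follows. This is an illustrative sketch only, not the patented implementation: claim 5 names a SpectralCentroid operator without defining it in this record, so the sketch assumes the common magnitude-weighted mean frequency. The function names, the frame length, the sample rate, and the choice to drop an incomplete trailing frame are all assumptions.

```python
import cmath
import math


def partition_into_frames(samples, frame_len):
    """Split samples into adjoining, non-overlapping frames.

    Assumption: an incomplete trailing frame is dropped.
    """
    n_frames = len(samples) // frame_len
    return [samples[i * frame_len:(i + 1) * frame_len] for i in range(n_frames)]


def spectral_centroid(frame, sample_rate):
    """Return the magnitude-weighted mean frequency of one frame in Hz.

    Assumed definition of the spectral-emphasis-value; uses a naive DFT
    over the non-negative frequency bins, which is fine for short frames.
    """
    n = len(frame)
    weighted = 0.0
    total = 0.0
    for k in range(n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(frame))
        magnitude = abs(coeff)
        weighted += (k * sample_rate / n) * magnitude
        total += magnitude
    return weighted / total if total > 0 else 0.0


# Hypothetical input: a 500 Hz tone followed by a 2000 Hz tone at 8 kHz.
SAMPLE_RATE = 8000
FRAME_LEN = 64
low = [math.sin(2 * math.pi * 500 * t / SAMPLE_RATE) for t in range(FRAME_LEN)]
high = [math.sin(2 * math.pi * 2000 * t / SAMPLE_RATE) for t in range(FRAME_LEN)]
frames = partition_into_frames(low + high, FRAME_LEN)
centroids = [spectral_centroid(f, SAMPLE_RATE) for f in frames]
```

Both tones fall exactly on a DFT bin for these assumed parameters, so each frame's centroid sits at its tone frequency. The sketch works directly on time-domain samples, in line with claim 2.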


9 Claims

1. Method for determining speech related audio data within a record of digital audio data, the method comprising steps for extracting audio features from the record of digital audio data, classifying the record of digital audio data based on the extracted audio features and with respect to one or more predetermined audio classes, and marking at least a part of the record of digital audio data classified as speech, characterised in that the extraction of at least one audio feature comprises the following steps:
- partitioning the record of digital audio data into adjoining frames, for each frame defining a window being formed by a sequence of adjoining frames containing the frame under consideration, determining for the frame under consideration and at least one further frame of the window a spectral-emphasis-value which is related to the frequency distribution contained in the digital audio data of the respective frame, and assigning a presence-of-speech indicator value to the frame under consideration based on an evaluation of the differences between the spectral-emphasis-values determined for the frame under consideration and the at least one further frame of the window.
  - 2. Method according to claim 1, characterised in that the extraction of the at least one audio feature is based on the record of digital audio data providing the digital audio data in a time domain representation.
  - 3. Method according to claim 1, characterised in that the evaluation of the differences between the spectral-emphasis-values determined for the frame under consideration and the at least one further frame of the window is effected by determining the difference between the maximum spectral-emphasis-value and the minimum spectral-emphasis-value determined.
  - 4. Method according to claim 1, characterised in that the evaluation of the differences between the spectral-emphasis-values determined for the frame under consideration and the at least one further frame of the window is effected by forming the standard deviation of the spectral-emphasis-values determined for the frame under consideration and the at least one further frame of the window.
  - 5. Method according to claim 1, characterised in that the spectral-emphasis-value of a frame is determined by applying the SpectralCentroid operator to the digital audio data forming the frame.
  - 6. Method according to claim 1, characterised in that the spectral-emphasis-value of a frame is determined by applying the AverageLSPP operator to the digital audio data forming the frame.
  - 7. Method according to claim 1, characterised in that the window defined for a frame under consideration is formed by a sequence of an odd number of adjoining frames with the frame under consideration being located in the middle of the sequence.
  - 8. Computer-software-product for enabling a determination of speech related audio data within a record of digital audio data, the computer-software-product comprising a series of state elements corresponding to instructions which are adapted to be processed by a data processing means of an audio data processing apparatus such, that a method according to claim 1 may be executed thereon.
  - 9. Audio data processing apparatus being adapted to determine speech related audio data within a record of digital audio data, the apparatus comprising a data processing means for processing a record of digital audio data according to one or more sets of instructions of a software programme of a computer-software-product according to claim 8.
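The windowed evaluation in claims 3, 4 and 7 can be sketched as follows, given one spectral-emphasis-value per frame. This is a hedged illustration rather than the claimed implementation. The function names are hypothetical. Truncating the window at the start and end of the record, and using the population standard deviation, are assumptions, since the claims specify neither. How the indicator value is turned into a speech or non-speech decision is not stated in the claims and is not modelled here.

```python
import statistics


def centered_window(values, index, window_len):
    """Return an odd-length window with the frame at `index` in the middle.

    Assumption: near the record boundaries the window is truncated.
    """
    if window_len % 2 == 0:
        raise ValueError("window_len must be odd (claim 7)")
    half = window_len // 2
    lo = max(0, index - half)
    hi = min(len(values), index + half + 1)
    return values[lo:hi]


def indicator_range(values, window_len):
    """Claim 3 variant: maximum minus minimum value inside each window."""
    result = []
    for i in range(len(values)):
        window = centered_window(values, i, window_len)
        result.append(max(window) - min(window))
    return result


def indicator_stddev(values, window_len):
    """Claim 4 variant: standard deviation inside each window.

    Assumption: population standard deviation (statistics.pstdev).
    """
    return [statistics.pstdev(centered_window(values, i, window_len))
            for i in range(len(values))]


# Hypothetical per-frame spectral-emphasis-values in Hz.
steady = [1000.0] * 5
alternating = [500.0, 2000.0, 500.0, 2000.0, 500.0]
steady_range = indicator_range(steady, 3)
alternating_range = indicator_range(alternating, 3)
alternating_std = indicator_stddev(alternating, 3)
```

In this sketch a sequence of constant values yields an indicator of zero for every frame, while a sequence that alternates between low and high values yields a large indicator under both variants.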

Current Assignee: Sony Deutschland GmbH (Sony Group Corp.)
Original Assignee: Sony Deutschland GmbH (Sony Group Corp.)
Inventors: Lam, Yin Hay; Sola I Caros, Josep Maria
Granted Patent: US 8,036,884 B2
US Class Current: 704/201
CPC Class Codes:
G10H 2210/046 for differentiation between...
G10L 25/78 Detection of presence or ab...
