Mega speaker identification (ID) system and corresponding methods therefor

US 20030236663A1
Filed: 06/19/2002
Published: 12/25/2003
Est. Priority Date: 06/19/2002
Status: Abandoned Application

Abstract

A memory storing computer readable instructions for causing a processor associated with a mega speaker identification (ID) system to instantiate functions including an audio segmentation and classification function receiving general audio data (GAD) and generating segments, a feature extraction function receiving the segments and extracting features based on mel-frequency cepstral coefficients (MFCC) therefrom, a learning and clustering function receiving the extracted features and reclassifying segments, when required, based on the extracted features, a matching and labeling function assigning a speaker ID to speech signals within the GAD, and a database function for correlating the assigned speaker ID to the respective speech signals within the GAD. The audio segmentation and classification function can assign each segment to one of N audio signal classes including silence, single speaker speech, music, environmental noise, multiple speaker's speech, simultaneous speech and music, and speech and noise. A mega speaker identification (ID) system and corresponding method are also described.
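
The seven audio signal classes and the notion of a labeled segment can be pictured with a small data model. This is only an illustrative sketch: the names `AudioClass`, `Segment`, `start_s`, `end_s`, and `speaker_id` are assumptions made here and do not appear in the application.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class AudioClass(Enum):
    """The seven audio signal classes named in the abstract."""
    SILENCE = "silence"
    SINGLE_SPEAKER_SPEECH = "single speaker speech"
    MUSIC = "music"
    ENVIRONMENTAL_NOISE = "environmental noise"
    MULTIPLE_SPEAKER_SPEECH = "multiple speaker's speech"
    SPEECH_AND_MUSIC = "simultaneous speech and music"
    SPEECH_AND_NOISE = "speech and noise"


@dataclass
class Segment:
    """One stretch of general audio data (hypothetical representation)."""
    start_s: float                      # segment start, in seconds
    end_s: float                        # segment end, in seconds
    audio_class: AudioClass             # class assigned by the classifier
    speaker_id: Optional[str] = None    # filled in by the labeling step

    @property
    def duration_s(self) -> float:
        return self.end_s - self.start_s


# A segment that has been classified but not yet labeled with a speaker ID.
seg = Segment(0.0, 2.5, AudioClass.SINGLE_SPEAKER_SPEECH)
```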

229 Citations

26 Claims

1. A mega speaker identification (ID) system identifying audio signals attributed to speakers from general audio data (GAD), comprising:
- means for segmenting the GAD into segments;
  
  means for classifying each of the segments as one of N audio signal classes;
  
  means for extracting features from the segments;
  
  means for reclassifying the segments from one to another of the N audio signal classes when required responsive to the extracted features;
  
  means for clustering proximate ones of the segments to thereby generate clustered segments; and
  
  means for labeling each clustered segment with a speaker ID.
- Dependent claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 13, 14)
  - 2. The mega speaker ID system as recited in claim 1, wherein the labeling means labels a plurality of the clustered segments with the speaker ID responsive to one of user input and additional source data.
  - 3. The mega speaker ID system as recited in claim 1, wherein the mega speaker ID system is included in a computer.
  - 4. The mega speaker ID system as recited in claim 1, wherein the mega speaker ID system is included in a set-top box.
  - 5. The mega speaker ID system as recited in claim 1, wherein the mega speaker ID system further comprises:
    - a memory means for storing a database relating the speaker ID's to portions of the GAD; and
      
      means receiving the output of the labeling means for updating the database.
  - 6. The mega speaker ID system as recited in claim 5, wherein the mega speaker ID system further comprises:
    - means for querying the database; and
      
      means for providing query results.
  - 7. The mega speaker ID system as recited in claim 1, wherein the N audio signal classes comprise silence, single speaker speech, music, environmental noise, multiple speaker's speech, simultaneous speech and music, and speech and noise.
  - 8. The mega speaker ID system as recited in claim 1, wherein a plurality of the extracted features are based on mel-frequency cepstral coefficients (MFCC).
  - 9. The mega speaker ID system as recited in claim 1, wherein the mega speaker ID system is included in a telephone system.
  - 10. The mega speaker ID system as recited in claim 9, wherein the mega speaker ID system operates in real time.
  - 13. The mega speaker ID method as recited in claim 1, wherein the method further comprises:
    - storing a database relating the speaker ID's to portions of the GAD; and
      
      updating the database whenever new clustered segments are labeled with a speaker ID.
  - 14. The mega speaker ID method as recited in claim 13, wherein the method further comprises:
    - querying the database; and
      
      providing query results to a user.
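
The database of claims 5, 6, 13, and 14, which relates speaker IDs to portions of the GAD and can be updated and queried, might be sketched as an in-memory index. The class name `SpeakerIndex`, its methods, and the use of (start, end) times in seconds to denote a "portion" are assumptions for illustration, not details from the application.

```python
from collections import defaultdict


class SpeakerIndex:
    """Hypothetical in-memory store relating speaker IDs to portions of the GAD."""

    def __init__(self):
        # speaker ID -> list of (start_s, end_s) portions
        self._portions = defaultdict(list)

    def update(self, speaker_id, start_s, end_s):
        """Record a newly labeled clustered segment."""
        self._portions[speaker_id].append((start_s, end_s))

    def query(self, speaker_id):
        """Return the portions for one speaker, ordered by start time."""
        return sorted(self._portions.get(speaker_id, []))


index = SpeakerIndex()
index.update("speaker_A", 12.0, 15.5)
index.update("speaker_B", 15.5, 20.0)
index.update("speaker_A", 3.0, 7.0)
result = index.query("speaker_A")
```

A persistent store would replace the dictionary in practice; the sketch only shows the update-then-query relationship the claims describe.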

11. A mega speaker identification (ID) method for identifying speakers from general audio data (GAD), comprising:
- partitioning the GAD into segments;
  
  assigning a label corresponding to one of N audio signal classes to each of the segments;
  
  extracting features from the segments;
  
  reassigning the segments from one to another of the N audio signal classes when required based on the extracted features to thereby generate classified segments;
  
  clustering adjacent ones of the classified segments to thereby generate clustered segments; and
  
  labeling each clustered segment with a speaker ID.
- Dependent claims (12, 15, 16)
  - 12. The mega speaker ID method as recited in claim 11, wherein the labeling step labels a plurality of the clustered segments with the speaker ID responsive to one of user input and additional source data.
  - 15. The mega speaker ID method as recited in claim 11, wherein the N audio signal classes comprise silence, single speaker speech, music, environmental noise, multiple speaker's speech, simultaneous speech and music, and speech and noise.
  - 16. The mega speaker ID method as recited in claim 11, wherein a plurality of the extracted features are based on mel-frequency cepstral coefficients (MFCC).
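
The step of clustering adjacent classified segments can be illustrated with a simple greedy merge. The claims do not say how adjacency or similarity is measured, so the Euclidean distance to a running centroid, the `max_distance` threshold, and the tuple layout below are all assumptions chosen for this sketch.

```python
import math


def cluster_adjacent(segments, max_distance=1.0):
    """Greedily merge time-adjacent segments whose features are close.

    segments: list of (start_s, end_s, feature_vector), sorted by start time.
    Returns a list of (start_s, end_s, segment_count) clusters.
    """
    clusters = []
    for start, end, feat in segments:
        if clusters:
            current = clusters[-1]
            centroid = [s / current["count"] for s in current["feat_sum"]]
            if math.dist(centroid, feat) <= max_distance:
                # Close enough to the running centroid: extend the cluster.
                current["end"] = end
                current["feat_sum"] = [a + b for a, b in zip(current["feat_sum"], feat)]
                current["count"] += 1
                continue
        # Otherwise start a new cluster with this segment.
        clusters.append({"start": start, "end": end,
                         "feat_sum": list(feat), "count": 1})
    return [(c["start"], c["end"], c["count"]) for c in clusters]


segments = [
    (0.0, 1.0, [0.0, 0.0]),
    (1.0, 2.0, [0.2, 0.1]),
    (2.0, 3.0, [5.0, 5.0]),
    (3.0, 4.0, [5.1, 4.9]),
]
clusters = cluster_adjacent(segments)
```

Each resulting cluster would then receive a single speaker ID in the labeling step.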

17. An operating method for a mega speaker ID system including M tuners, an analyzer, a storage device, an input device, and an output device, comprising:
- operating the M tuners to acquire R audio signals from R audio sources;
  
  operating the analyzer to partition the N audio signals into segments, to assign a label corresponding to one of N audio signal classes to each of the segments, to extract features from the segments;
  
  to reassign the segments from one to another of the N audio signal classes when required based on the extracted features thereby generating classified segments, to cluster adjacent ones of the classified segments to thereby generate clustered segments, and to label each clustered segment with a speaker ID;
  
  storing both the clustered segments included in the R audio signals and the corresponding label in the storage device;
  
  generating query results capable of operating the output device responsive to a query input via the input device, where M, N, and R are positive integers.
- Dependent claims (18, 19)
  - 18. The operating method as recited in claim 17, wherein the N audio signal classes comprise silence, single speaker speech, music, environmental noise, multiple speaker's speech, simultaneous speech and music, and speech and noise.
  - 19. The operating method as recited in claim 17, wherein a plurality of the extracted features are based on mel-frequency cepstral coefficients (MFCC).
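
Features based on mel-frequency cepstral coefficients rest on the mel frequency scale. The listing does not state which mel formula is used, so the sketch below assumes one widely used convention, mel = 2595 · log10(1 + f/700); the function names and the filter-spacing helper are illustrative only.

```python
import math


def hz_to_mel(f_hz):
    """Convert frequency in Hz to mels (assumed 2595 * log10(1 + f/700) convention)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_center_frequencies(n_filters, f_min_hz, f_max_hz):
    """Center frequencies (Hz) of n_filters filters spaced evenly on the mel scale."""
    lo, hi = hz_to_mel(f_min_hz), hz_to_mel(f_max_hz)
    step = (hi - lo) / (n_filters + 1)
    return [mel_to_hz(lo + step * (i + 1)) for i in range(n_filters)]


centers = mel_center_frequencies(10, 0.0, 8000.0)
```

Because the spacing is uniform in mels, the centers are packed densely at low frequencies and spread out at high frequencies.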

20. A memory storing computer readable instructions for causing a processor associated with a mega speaker identification (ID) system to instantiate functions including:
- an audio segmentation and classification function receiving general audio data (GAD) and generating segments;
  
  a feature extraction function receiving the segments and extracting features therefrom;
  
  a learning and clustering function receiving the extracted features and reclassifying segments, when required, based on the extracted features;
  
  a matching and labeling function assigning a speaker ID to speech signals within the GAD; and
  
  a database function for correlating the assigned speaker ID to the respective speech signals within the GAD.
- Dependent claims (21, 22)
  - 21. The memory as recited in claim 20, wherein the audio segmentation and classification function assigns each segment to one of N audio signal classes including silence, single speaker speech, music, environmental noise, multiple speaker's speech, simultaneous speech and music, and speech and noise.
  - 22. The memory as recited in claim 20, wherein a plurality of the extracted features are based on mel-frequency cepstral coefficients (MFCC).
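
Reclassifying a segment "when required, based on the extracted features" could take many forms. As one hedged illustration, the sketch below swaps in a nearest-centroid rule: a segment keeps its initial label unless another class's centroid is closer by more than a margin. The centroid dictionary, the `margin` parameter, and the function name are assumptions, not the method described in the application.

```python
import math


def reclassify(label, features, centroids, margin=0.0):
    """Return the possibly updated class label for one segment.

    label: class assigned by the initial classification.
    features: the segment's feature vector.
    centroids: dict mapping class name -> centroid feature vector (assumed learned).
    margin: how much closer another class must be before the label changes.
    """
    best = min(centroids, key=lambda c: math.dist(centroids[c], features))
    if best != label:
        current = math.dist(centroids[label], features) if label in centroids else math.inf
        if current - math.dist(centroids[best], features) > margin:
            return best
    return label


centroids = {"music": [0.0, 0.0], "single speaker speech": [4.0, 4.0]}
changed = reclassify("music", [3.5, 4.2], centroids)
kept = reclassify("music", [0.5, 0.1], centroids)
```

A non-zero margin keeps borderline segments in their original class, which avoids flip-flopping between classes on noisy features.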

23. An operating method for a mega speaker ID system receiving M audio signals and operatively coupled to an input device and an output device, the mega speaker ID system including an analyzer and a storage device, comprising:
- operating the analyzer to partition an Mth audio signal into segments, to assign a label corresponding to one of N audio signal classes to each of the segments, to extract features from the segments;
  
  to reassign the segments from one to another of the N audio signal classes when required based on the extracted features thereby generating classified segments, to cluster adjacent ones of the classified segments to thereby generate clustered segments, and to label each clustered segment with a speaker ID;
  
  storing both the clustered segments included in the audio signals and the corresponding label in the storage device;
  
  generating a database relating the Mth audio signal with statistical information derived from at least one of the extracted features and the speaker ID for the M audio signals analyzed; and
  
  generating query results capable of operating the output device responsive to a query input to the database via the input device, where M, N, and R are positive integers.
- Dependent claims (24, 25, 26)
  - 24. The operating method as recited in claim 23, wherein the N audio signal classes comprise silence, single speaker speech, music, environmental noise, multiple speaker's speech, simultaneous speech and music, and speech and noise.
  - 25. The operating method as recited in claim 23, wherein the generating step further comprises generating query results corresponding to calculations performed on selected data stored in the database capable of operating the output device responsive to a query input to the database via the input device.
  - 26. The operating method as recited in claim 23, wherein the generating step further comprises generating query results corresponding to one of statistics on the types of M audio signals, duration of each class, average duration within each class, duration associated with each speaker ID, duration of a selected speaker ID with respect to all speaker IDs reflected in the database, the query results being capable of operating the output device responsive to a query input to the database via the input device.
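
The query statistics listed in claim 26 (duration of each class, average duration within each class, duration per speaker ID, and a speaker's duration relative to all speaker IDs) reduce to simple aggregations over labeled segments. The tuple layout and the function name `duration_stats` below are assumptions made for this sketch.

```python
from collections import defaultdict


def duration_stats(segments):
    """Aggregate durations over labeled segments.

    segments: iterable of (start_s, end_s, audio_class, speaker_id_or_None).
    """
    per_class = defaultdict(list)       # class -> list of segment durations
    per_speaker = defaultdict(float)    # speaker ID -> total duration
    for start, end, audio_class, speaker_id in segments:
        per_class[audio_class].append(end - start)
        if speaker_id is not None:
            per_speaker[speaker_id] += end - start
    total_speaker = sum(per_speaker.values())
    return {
        "class_duration": {c: sum(d) for c, d in per_class.items()},
        "class_average": {c: sum(d) / len(d) for c, d in per_class.items()},
        "speaker_duration": dict(per_speaker),
        # Share of each speaker relative to all labeled speaker time.
        "speaker_share": {s: d / total_speaker for s, d in per_speaker.items()},
    }


stats = duration_stats([
    (0.0, 10.0, "single speaker speech", "A"),
    (10.0, 14.0, "music", None),
    (14.0, 20.0, "single speaker speech", "B"),
    (20.0, 24.0, "single speaker speech", "A"),
])
```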


Current Assignee: Koninklijke Philips Electronics N.V. (Koninklijke Philips N.V.)
Original Assignee: Koninklijke Philips Electronics N.V. (Koninklijke Philips N.V.)
Inventors: Dimitrova, Nevenka; Li, Dongge
Application Number: US10/175,391
Publication Number: US 20030236663A1
US Class Current: 704/245
CPC Class Codes:
G10L 15/04 Segmentation; Word boundary...
G10L 17/00 Speaker identification or v...
