Mega speaker identification (ID) system and corresponding methods therefor
Abstract
A memory storing computer readable instructions for causing a processor associated with a mega speaker identification (ID) system to instantiate functions including an audio segmentation and classification function receiving general audio data (GAD) and generating segments, a feature extraction function receiving the segments and extracting features based on mel-frequency cepstral coefficients (MFCC) therefrom, a learning and clustering function receiving the extracted features and reclassifying segments, when required, based on the extracted features, a matching and labeling function assigning a speaker ID to speech signals within the GAD, and a database function for correlating the assigned speaker ID to the respective speech signals within the GAD. The audio segmentation and classification function can assign each segment to one of N audio signal classes including silence, single speaker speech, music, environmental noise, multiple speakers' speech, simultaneous speech and music, and speech and noise. A mega speaker identification (ID) system and corresponding method are also described.
26 Claims
1. A mega speaker identification (ID) system identifying audio signals attributed to speakers from general audio data (GAD), comprising:
means for segmenting the GAD into segments;
means for classifying each of the segments as one of N audio signal classes;
means for extracting features from the segments;
means for reclassifying the segments from one to another of the N audio signal classes when required responsive to the extracted features;
means for clustering proximate ones of the segments to thereby generate clustered segments; and
means for labeling each clustered segment with a speaker ID.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 13, 14.
11. A mega speaker identification (ID) method for identifying speakers from general audio data (GAD), comprising:
partitioning the GAD into segments;
assigning a label corresponding to one of N audio signal classes to each of the segments;
extracting features from the segments;
reassigning the segments from one to another of the N audio signal classes when required based on the extracted features to thereby generate classified segments;
clustering adjacent ones of the classified segments to thereby generate clustered segments; and
labeling each clustered segment with a speaker ID.
Dependent claims: 12, 15, 16.
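The partitioning, label assignment, feature extraction, reassignment, and adjacent-segment clustering steps of claim 11 can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the patent: the fixed segment length, the two toy classes ("speech" and "silence"), the peak-amplitude rule for the first labeling pass, the mean-energy feature, and all function names and thresholds. Speaker ID labeling is left out of this sketch.

```python
from dataclasses import dataclass
from itertools import groupby
from typing import List, Sequence


@dataclass
class Segment:
    start: int   # index of the first sample (inclusive)
    end: int     # index past the last sample (exclusive)
    label: str   # one of the audio signal classes


def partition(samples: Sequence[float], seg_len: int) -> List[Segment]:
    """Split the audio into fixed-length segments (hypothetical policy)."""
    return [Segment(i, min(i + seg_len, len(samples)), "unlabeled")
            for i in range(0, len(samples), seg_len)]


def assign_labels(samples, segments, peak_threshold=0.05):
    """First pass: label by peak amplitude (toy rule, an assumption)."""
    for seg in segments:
        peak = max(abs(x) for x in samples[seg.start:seg.end])
        seg.label = "speech" if peak > peak_threshold else "silence"


def mean_energy(samples, seg: Segment) -> float:
    """Stand-in feature: average squared amplitude of the segment."""
    chunk = samples[seg.start:seg.end]
    return sum(x * x for x in chunk) / len(chunk)


def reassign(samples, segments, energy_threshold=0.01):
    """Second pass: move a segment to another class when the extracted
    feature contradicts its first-pass label."""
    for seg in segments:
        if seg.label == "speech" and mean_energy(samples, seg) < energy_threshold:
            seg.label = "silence"


def cluster_adjacent(segments: List[Segment]) -> List[Segment]:
    """Merge runs of neighbouring segments that share a class label."""
    merged = []
    for label, run in groupby(segments, key=lambda s: s.label):
        run = list(run)
        merged.append(Segment(run[0].start, run[-1].end, label))
    return merged
```

A caller would run `partition`, `assign_labels`, `reassign`, and `cluster_adjacent` in that order. A real system would replace the energy feature with the MFCC-based features named in the abstract and would use all N classes.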
17. An operating method for a mega speaker ID system including M tuners, an analyzer, a storage device, an input device, and an output device, comprising:
operating the M tuners to acquire R audio signals from R audio sources;
operating the analyzer to partition the R audio signals into segments, to assign a label corresponding to one of N audio signal classes to each of the segments, to extract features from the segments, to reassign the segments from one to another of the N audio signal classes when required based on the extracted features thereby generating classified segments, to cluster adjacent ones of the classified segments to thereby generate clustered segments, and to label each clustered segment with a speaker ID;
storing both the clustered segments included in the R audio signals and the corresponding label in the storage device; and
generating query results capable of operating the output device responsive to a query input via the input device, where M, N, and R are positive integers.
Dependent claims: 18, 19.
20. A memory storing computer readable instructions for causing a processor associated with a mega speaker identification (ID) system to instantiate functions including:
an audio segmentation and classification function receiving general audio data (GAD) and generating segments;
a feature extraction function receiving the segments and extracting features therefrom;
a learning and clustering function receiving the extracted features and reclassifying segments, when required, based on the extracted features;
a matching and labeling function assigning a speaker ID to speech signals within the GAD; and
a database function for correlating the assigned speaker ID to the respective speech signals within the GAD.
Dependent claims: 21, 22.
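As one way to picture the matching and labeling function of claim 20, the sketch below assigns a speaker ID by nearest-centroid matching on a feature vector. This section does not say how matching is performed, so the nearest-centroid rule, the Euclidean distance, the `max_distance` cutoff, the "unknown" fallback, and the names `match_speaker` and `enrolled` are all assumptions chosen for illustration.

```python
import math
from typing import Dict, Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def match_speaker(features: Sequence[float],
                  enrolled: Dict[str, Sequence[float]],
                  max_distance: float = 1.0) -> str:
    """Return the ID of the closest enrolled speaker model (hypothetical
    centroid per speaker), or 'unknown' if none is within max_distance."""
    best_id, best_dist = "unknown", max_distance
    for speaker_id, centroid in enrolled.items():
        d = euclidean(features, centroid)
        if d < best_dist:
            best_id, best_dist = speaker_id, d
    return best_id
```

In this sketch a segment whose features fall far from every enrolled model stays unlabeled instead of being forced onto the nearest speaker, which is a design choice of the example and not something the claim requires.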
23. An operating method for a mega speaker ID system receiving M audio signals and operatively coupled to an input device and an output device, the mega speaker ID system including an analyzer and a storage device, comprising:
operating the analyzer to partition an Mth audio signal into segments, to assign a label corresponding to one of N audio signal classes to each of the segments, to extract features from the segments, to reassign the segments from one to another of the N audio signal classes when required based on the extracted features thereby generating classified segments, to cluster adjacent ones of the classified segments to thereby generate clustered segments, and to label each clustered segment with a speaker ID;
storing both the clustered segments included in the audio signals and the corresponding label in the storage device;
generating a database relating the Mth audio signal with statistical information derived from at least one of the extracted features and the speaker ID for the M audio signals analyzed; and
generating query results capable of operating the output device responsive to a query input to the database via the input device, where M, N, and R are positive integers.
Dependent claims: 24, 25, 26.
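The storing, database, and query steps of claim 23 could look roughly like the following sketch, which uses Python's standard `sqlite3` module. The table layout, the column names, the choice of total speaking time as the "statistical information", and the function names are hypothetical; the claim does not prescribe a schema or a query form.

```python
import sqlite3


def build_database(rows):
    """Store clustered segments with their labels.

    rows: iterable of (signal_id, start_s, end_s, speaker_id) tuples,
    a hypothetical record layout chosen for this example.
    """
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE segments ("
        "signal_id INTEGER, start_s REAL, end_s REAL, speaker_id TEXT)"
    )
    db.executemany("INSERT INTO segments VALUES (?, ?, ?, ?)", rows)
    db.commit()
    return db


def speaking_time(db, speaker_id):
    """Answer one kind of query: total seconds attributed to a speaker,
    grouped by audio signal."""
    cur = db.execute(
        "SELECT signal_id, SUM(end_s - start_s) FROM segments "
        "WHERE speaker_id = ? GROUP BY signal_id ORDER BY signal_id",
        (speaker_id,),
    )
    return cur.fetchall()
```

Calling `speaking_time(db, speaker_id)` returns a list of `(signal_id, seconds)` pairs for the requested speaker, which an output device could then display.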
Specification