Method and apparatus for clustering-based signal segmentation
Abstract
In a computerized method, a continuous signal is segmented in order to determine statistically stationary units of the signal. The continuous signal is sampled at periodic intervals to produce a timed sequence of digital samples. Fixed numbers of adjacent digital samples are grouped into a plurality of disjoint sets, or frames. A statistical distance between adjacent frames is determined. The adjacent sets are merged into a larger set of samples, or cluster, if the statistical distance is less than a predetermined threshold. In an iterative process, the statistical distance between adjacent sets is determined, and as long as the distance is less than the predetermined threshold, the sets are iteratively merged to segment the signal into statistically stationary units.
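The iterative merging described in the abstract can be sketched roughly as follows. This is an illustration under stated assumptions, not the patented implementation: the names `partition`, `log_power_distance`, and `segment` are hypothetical, and the distance used here (absolute difference of log mean power) is only a simple stand-in for the statistical distance the abstract refers to.

```python
import math
from typing import Callable, List, Sequence

Frame = List[float]


def partition(samples: Sequence[float], frame_len: int) -> List[Frame]:
    """Group fixed numbers of adjacent samples into disjoint frames.

    A trailing run shorter than frame_len is dropped (an assumption).
    """
    last_start = len(samples) - frame_len + 1
    return [list(samples[i:i + frame_len]) for i in range(0, last_start, frame_len)]


def log_power_distance(a: Frame, b: Frame) -> float:
    """Hypothetical stand-in distance: |log mean power of a - log mean power of b|."""
    def power(x: Frame) -> float:
        return max(sum(v * v for v in x) / len(x), 1e-12)  # floor avoids log(0)
    return abs(math.log(power(a)) - math.log(power(b)))


def segment(samples: Sequence[float], frame_len: int, threshold: float,
            distance: Callable[[Frame, Frame], float] = log_power_distance) -> List[Frame]:
    """Merge the closest adjacent pair until no adjacent distance is below threshold."""
    clusters = partition(samples, frame_len)
    while len(clusters) > 1:
        dists = [distance(clusters[i], clusters[i + 1])
                 for i in range(len(clusters) - 1)]
        i = min(range(len(dists)), key=dists.__getitem__)  # least distance first
        if dists[i] >= threshold:
            break  # every remaining boundary separates distinct units
        clusters[i:i + 2] = [clusters[i] + clusters[i + 1]]
    return clusters


if __name__ == "__main__":
    quiet = [0.1, -0.1] * 8
    loud = [5.0, -5.0] * 8
    units = segment(quiet + loud, frame_len=4, threshold=0.5)
    print([len(u) for u in units])
```

Recomputing every adjacent distance after each merge keeps the sketch short; a practical implementation would likely update only the two distances that touch the newly merged cluster.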
17 Claims
1. A computerized method for segmenting a signal, comprising:
sampling the signal at periodic intervals to produce a sequence of digital samples;
partitioning the digital samples into a plurality of sets of samples;
summing a product of adjacent samples of each set of samples to produce an autocorrelation matrix of the samples of each set of samples;
measuring a distance between adjacent sets of samples using the autocorrelation matrix of the samples of each set of samples to determine a set of distances; and
merging adjacent sets of samples if the distance between the adjacent sets of samples is less than a predetermined threshold value.
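One plausible reading of the summing step in claim 1 is sketched below: products of samples at small lags are accumulated over the frame into a square matrix. The function name `autocorrelation_matrix`, the `order` parameter, and the choice to sum only over windows that lie entirely inside the frame are assumptions made for illustration; the claim itself does not fix these details.

```python
from typing import List, Sequence


def autocorrelation_matrix(frame: Sequence[float], order: int) -> List[List[float]]:
    """Accumulate R[i][j] = sum over t of frame[t - i] * frame[t - j], 0 <= i, j <= order.

    Only positions t >= order are used, so every product stays inside the frame.
    """
    n = len(frame)
    if n <= order:
        raise ValueError("frame must contain more samples than the model order")
    size = order + 1
    r = [[0.0] * size for _ in range(size)]
    for t in range(order, n):
        for i in range(size):
            for j in range(size):
                r[i][j] += frame[t - i] * frame[t - j]
    return r


if __name__ == "__main__":
    print(autocorrelation_matrix([1.0, 2.0, 3.0], order=1))
```

The resulting matrix is symmetric, so an implementation concerned with cost could fill only one triangle.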
2. The method of claim 1 further comprising:
repeatedly measuring and merging adjacent sets of samples as long as the distance between adjacent sets of samples is less than the predetermined threshold value to segment the continuous signal into statistically stationary units.
3. The method of claim 1 wherein the step of representing further comprises:
summing the autocorrelation matrices of the merged pair of adjacent sets of samples.
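The summation in claim 3 suggests that a merged set can be described without revisiting its samples: its matrix is the element-wise sum of the two matrices being merged. A minimal sketch follows, with the hypothetical helper name `merge_matrices` and with matrices represented as nested lists (both assumptions).

```python
from typing import List

Matrix = List[List[float]]


def merge_matrices(a: Matrix, b: Matrix) -> Matrix:
    """Element-wise sum of two equally sized autocorrelation matrices."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("matrices must have the same shape")
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


if __name__ == "__main__":
    left = [[13.0, 8.0], [8.0, 5.0]]
    right = [[2.0, 1.0], [1.0, 2.0]]
    print(merge_matrices(left, right))
```

If each matrix was accumulated only from products inside its own set, the sum leaves out the few products that straddle the boundary between the two sets, so it approximates rather than equals the matrix computed directly on the concatenated samples.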
4. The method of claim 1 wherein each set of samples includes an identical number of samples prior to performing the step of merging.
5. The method of claim 2 further comprising:
selecting an optimal number of parameters to describe each set of samples using a minimum description length likelihood.
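One common way to select a number of parameters by minimum description length is sketched below. It assumes, for illustration only, a Gaussian linear-prediction model whose residual variance is obtained with the Levinson-Durbin recursion, and the two-part score (N/2) ln(error) + (p/2) ln(N). The claim does not state this form, and the names `autocorr_sequence`, `prediction_errors`, and `mdl_order` are hypothetical.

```python
import math
from typing import List, Sequence


def autocorr_sequence(x: Sequence[float], max_lag: int) -> List[float]:
    """Biased autocorrelation estimates r[0..max_lag]."""
    n = len(x)
    return [sum(x[t] * x[t - k] for t in range(k, n)) / n for k in range(max_lag + 1)]


def prediction_errors(r: Sequence[float]) -> List[float]:
    """Levinson-Durbin recursion: errors[p] is the order-p residual variance."""
    errors = [r[0]]
    coeffs: List[float] = []  # coeffs[j] multiplies the sample at lag j + 1
    for p in range(1, len(r)):
        if errors[-1] <= 1e-12:  # already perfectly predicted; stop refining
            coeffs = coeffs + [0.0]
            errors.append(errors[-1])
            continue
        acc = r[p] - sum(coeffs[j] * r[p - 1 - j] for j in range(p - 1))
        k = acc / errors[-1]  # reflection coefficient for order p
        coeffs = [coeffs[j] - k * coeffs[p - 2 - j] for j in range(p - 1)] + [k]
        errors.append(errors[-1] * (1.0 - k * k))
    return errors


def mdl_order(x: Sequence[float], max_order: int) -> int:
    """Return the order in 0..max_order with the smallest description length."""
    n = len(x)
    errors = prediction_errors(autocorr_sequence(x, max_order))
    scores = [0.5 * n * math.log(max(e, 1e-12)) + 0.5 * p * math.log(n)
              for p, e in enumerate(errors)]
    return min(range(len(scores)), key=scores.__getitem__)


if __name__ == "__main__":
    print(mdl_order([1.0, -1.0] * 50, max_order=3))
```

The penalty term grows with the order, so extra parameters are accepted only when they reduce the residual variance enough to pay for themselves.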
6. The method of claim 1 further comprising:
determining a least distance of the set of statistical distances; and
first merging adjacent sets of samples having the least statistical distance.
7. The method of claim 1 wherein the continuous signals are speech signals.
8. The method of claim 2 wherein the continuous signals are speech signals and the statistically stationary units relate to linguistic elements.
17. The method of claim 1 further comprising:
determining a generalized likelihood ratio of the distances of the pair of adjacent sets of samples being separate and the pair of adjacent sets of samples being merged into a single set of samples.
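A generalized likelihood ratio of this kind compares how well the samples are explained by two separate models against one shared model. The sketch below assumes, purely for illustration, a zero-mean Gaussian model with a single variance parameter per set; the claim does not specify the model, and the names `ml_variance` and `glr_distance` are hypothetical.

```python
import math
from typing import Sequence


def ml_variance(x: Sequence[float]) -> float:
    """Maximum-likelihood variance of a zero-mean Gaussian, floored to avoid log(0)."""
    return max(sum(v * v for v in x) / len(x), 1e-12)


def glr_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Log likelihood ratio: separate models for a and b versus one merged model.

    Near zero when both sets have the same variance; larger when they differ.
    """
    n1, n2 = len(a), len(b)
    v1, v2 = ml_variance(a), ml_variance(b)
    v_merged = (n1 * v1 + n2 * v2) / (n1 + n2)
    return 0.5 * ((n1 + n2) * math.log(v_merged)
                  - n1 * math.log(v1) - n2 * math.log(v2))


if __name__ == "__main__":
    same = glr_distance([1.0, -1.0, 1.0, -1.0], [1.0, -1.0, 1.0, -1.0])
    different = glr_distance([1.0, -1.0, 1.0, -1.0], [3.0, -3.0, 3.0, -3.0])
    print(same, different)
```

Under this assumed model the ratio is never negative, which makes it usable directly as the distance compared against a merge threshold.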
9. A system of processing a sequence of digital samples partitioned into a plurality of non-overlapping sets of samples, said sequence of digital samples being produced by sampling a signal at periodic intervals, the system comprising:
a memory for storing the sequence of digital samples produced by sampling the signal at periodic intervals, the sequence of digital samples being partitioned into the plurality of non-overlapping sets of samples; and
at least one processor coupled to the memory, the at least one processor configured to:
sum a product of adjacent samples of each set of samples to produce an autocorrelation matrix of the samples of each set of samples;
measure a distance between a first of the plurality of non-overlapping sets of samples and a second of the plurality of non-overlapping sets of samples using the autocorrelation matrix; and
merge the first of the plurality of non-overlapping sets of samples and the second of the plurality of non-overlapping sets of samples if the distance is less than or equal to a particular value. - View Dependent Claims (10)
11. An article of manufacture for segmenting a continuous signal represented by a sequence of digital samples partitioned into a plurality of non-overlapping sets of samples, said sequence of digital samples being produced by sampling said signal at periodic intervals, the article of manufacture comprising:
a computer readable storage medium; and
computer programming stored on the storage medium;
wherein the stored computer programming is configured to be readable from the computer readable storage medium by a computer and thereby cause the computer to operate so as to:
sum a product of adjacent samples of each set of samples to produce an autocorrelation matrix of the samples of each set of samples;
measure a first distance between a first of the plurality of non-overlapping sets of samples and a second of the plurality of non-overlapping sets of samples using the autocorrelation matrix; and
merge the first of the plurality of non-overlapping sets of samples and the second of the plurality of non-overlapping sets of samples if the first distance is less than or equal to a particular value to segment the continuous signal into a statistically stationary unit.
12. The article of manufacture of claim 11, wherein the stored computer programming is further configured to cause the computer to operate so as to:
measure a second distance between a third of the plurality of non-overlapping sets of samples and a fourth of the plurality of non-overlapping sets of samples; and
merge the third of the plurality of non-overlapping sets of samples and the fourth of the plurality of non-overlapping sets of samples if the second distance is less than or equal to the particular value.
13. The article of manufacture of claim 12, wherein the stored computer programming is further configured to cause the computer to operate so as to:
determine the smaller of the first distance and the second distance; and
merge the ones of the plurality of non-overlapping sets of samples corresponding to the determined smaller distance before merging others of the plurality of non-overlapping sets of samples.
14. The article of manufacture of claim 12, wherein the third of the plurality of non-overlapping sets of samples corresponds to the merged first of the plurality of non-overlapping sets of samples and the second of the plurality of non-overlapping sets of samples.
15. A method for forming segments of a sequence of digital samples partitioned into a plurality of sets of samples, the method comprising the steps of:
receiving a sequence of digital samples;
partitioning the sequence of digital samples into the plurality of sets of samples;
determining a first generalized likelihood ratio of the distances of a first pair of adjacent sets of samples being separate and the first pair of adjacent sets of samples being merged into a single set of samples; and
merging the first pair of adjacent sets of samples if the first generalized likelihood ratio is less than or equal to a particular value to form a statistically stationary unit.
16. The method of claim 15 further comprising the steps of:
determining a second generalized likelihood ratio of the distances of a second pair of adjacent sets of samples being separate and the second pair of adjacent sets of samples being merged into a single set of samples; and
merging the second pair of adjacent sets of samples if the second generalized likelihood ratio is less than or equal to the particular value.
Specification