Multiple voice tracking system and method
Abstract
For tracking multiple, simultaneous voices, predicted tracking is used to follow individual voices through time, even when the voices are very similar in fundamental frequency. An acoustic waveform comprised of a group of voices is submitted to a frequency estimator, which may employ an average magnitude difference function (AMDF) calculation to determine the voice fundamental frequencies that are present for each voice. These frequency estimates are then used as input values to a recurrent neural network that tracks each of the frequencies by predicting the current fundamental frequency value for each voice present based on past fundamental frequency values in order to disambiguate any fundamental frequency trajectories that may be converging in frequency.
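As an illustration of the average magnitude difference function (AMDF) mentioned in the abstract, the following is a minimal sketch for a single synthetic tone, not the patented estimator. The function names, the 8 kHz sample rate, and the 120 to 400 Hz search range are assumptions chosen for the example. The multi-voice case described above would need one valley per voice, which this sketch does not attempt.

```python
import math

def amdf(frame, lag):
    """Average magnitude difference between a frame and itself shifted by `lag` samples."""
    n = len(frame) - lag
    return sum(abs(frame[i] - frame[i + lag]) for i in range(n)) / n

def estimate_f0(frame, sample_rate, f_min=120.0, f_max=400.0):
    """Return the frequency whose lag gives the deepest AMDF valley.

    The search range (f_min, f_max) is a hypothetical choice; it is kept
    under one octave above 200 Hz here so that the lag at twice the period
    is excluded.
    """
    lag_min = int(sample_rate / f_max)
    lag_max = int(sample_rate / f_min)
    best_lag = min(range(lag_min, lag_max + 1), key=lambda lag: amdf(frame, lag))
    return sample_rate / best_lag

# Synthetic single "voice": a 200 Hz sine sampled at 8 kHz (assumed values).
sample_rate = 8000
frame = [math.sin(2 * math.pi * 200.0 * n / sample_rate) for n in range(400)]
f0 = estimate_f0(frame, sample_rate)
print(f0)
```

The AMDF dips toward zero when the lag equals a whole number of pitch periods, so the lag of the deepest valley gives the period and hence the fundamental frequency.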
20 Claims
-
1. A system for tracking voices in a multiple voice environment, said system comprising:
-
a) a frequency estimator for receiving an acoustic waveform comprised of a plurality of voice components, each of which corresponds to a different individual's voice, and generating a plurality of estimates of fundamental frequencies in said waveform, each of said fundamental frequencies corresponding to one of said voice components; and
b) a neural network for receiving said estimates of said fundamental frequencies from said frequency estimator, and generating an estimate of a trajectory of each of said fundamental frequencies as a function of time.
1) an input layer for applying a set of weights and biases to said fundamental frequency estimates to generate a plurality of weighted estimates;
2) a hidden layer having an input for receiving said weighted estimates and generating a plurality of hidden layer outputs; and
3) an output layer for linearly combining said hidden layer outputs and generating said trajectory estimates of each of said fundamental frequencies as a function of time.
-
-
7. The system of claim 6, wherein said hidden layer is further comprised of a plurality of tan-sigmoidal units.
-
8. The system of claim 6, wherein said neural network further includes a feedback connection between said hidden layer outputs and said input layer for supplying said hidden layer outputs as a weight to said frequency estimates.
-
9. The system of claim 1, further comprising:
-
c) a microphone for generating said acoustic waveform; and
d) a utilization device for receiving said trajectory estimates from said neural network.
-
-
10. The system of claim 1, wherein said frequency estimator and said neural network are implemented in hardware.
-
11. The system of claim 1, wherein said frequency estimator and said neural network are implemented in software.
-
12. A system for tracking voices in a multiple voice environment, said system comprising:
-
a) a windowing filter for receiving an acoustic waveform comprised of a plurality of voice components, each of which corresponds to a different individual's voice, and generating a plurality of successive samples of said waveform;
b) a frequency estimator for receiving said samples and generating an estimate of a plurality of fundamental frequencies in said waveform at a given point in time, each of said fundamental frequencies corresponding to one of said voice components, said frequency estimator comprising means for calculating an average magnitude difference function for subtracting successive ones of said samples from one another to identify said fundamental frequencies in said waveform; and
c) a neural network for receiving said estimates of said fundamental frequencies from said frequency estimator, and generating an estimate of a trajectory of each of said fundamental frequencies as a function of time, said neural network comprising:
1) an input layer for receiving said fundamental frequencies from said frequency estimator and generating a plurality of weighted outputs;
2) a hidden layer comprised of a plurality of tan-sigmoidal units, said hidden layer having an input for receiving said weighted outputs and generating a plurality of hidden layer outputs, said hidden layer further including a feedback connection for supplying said hidden layer outputs back to said input layer for constraining the amount of change allowed in the processing of said hidden layer; and
3) an output layer for linearly combining said hidden layer outputs to generate said trajectory estimates of each of said fundamental frequencies as a function of time.
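The three-layer structure recited in element c) can be illustrated with a minimal sketch of one common recurrent arrangement, in which the previous hidden outputs are fed back alongside the current inputs. This is an assumption about how the feedback might be wired, not the claimed implementation. All weight values, the layer sizes, and the 400 Hz normalisation constant are hypothetical placeholders; a real tracker would learn its weights from training data.

```python
import math

def elman_step(f0_estimates, hidden_prev, params):
    """One time step: weighted inputs plus fed-back hidden outputs pass
    through tanh ("tan-sigmoidal") units, then a linear output layer."""
    w_in, w_rec, b_hid, w_out, b_out = params
    hidden = []
    for j in range(len(b_hid)):
        pre = b_hid[j]
        pre += sum(w_in[j][i] * x for i, x in enumerate(f0_estimates))
        pre += sum(w_rec[j][k] * h for k, h in enumerate(hidden_prev))
        hidden.append(math.tanh(pre))
    outputs = [
        b_out[o] + sum(w_out[o][j] * h for j, h in enumerate(hidden))
        for o in range(len(b_out))
    ]
    return outputs, hidden

def track(frames, params, n_hidden):
    """Run the network over successive frames of frequency estimates."""
    hidden = [0.0] * n_hidden
    trajectory = []
    for f0_estimates in frames:
        outputs, hidden = elman_step(f0_estimates, hidden, params)
        trajectory.append(outputs)
    return trajectory

# Hypothetical, untrained weights for 2 voices and 3 hidden units.
params = (
    [[0.5, -0.2], [0.1, 0.3], [-0.4, 0.2]],                    # input weights
    [[0.1, 0.0, 0.0], [0.0, 0.1, 0.0], [0.0, 0.0, 0.1]],       # feedback weights
    [0.0, 0.0, 0.0],                                           # hidden biases
    [[0.6, 0.2, -0.1], [-0.3, 0.5, 0.4]],                      # output weights
    [0.0, 0.0],                                                # output biases
)
# Two voices per frame, in Hz, scaled by an assumed 400 Hz constant.
frames = [[f / 400.0 for f in pair] for pair in [(120, 210), (125, 205), (130, 200)]]
trajectory = track(frames, params, n_hidden=3)
print(trajectory)
```

Because the hidden state carries over between frames, each output depends on past frequency values as well as the current ones, which is the property the tracking relies on.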
-
-
13. A method for identifying and tracking individual voices in an acoustic waveform comprised of a plurality of voices, said method comprising the steps of:
-
a) generating an acoustic waveform, said waveform comprised of a plurality of voice components, each of which corresponds to a different individual's voice;
b) generating estimates of a plurality of fundamental frequencies in said waveform, each of said fundamental frequencies corresponding to one of said voice components;
c) supplying said fundamental frequency estimates to a neural network; and
d) generating with said neural network, an estimate of a trajectory of each of said fundamental frequencies as a function of time.
1) applying said waveform to a windowing filter to generate a plurality of successive samples of said waveform; and
2) applying an average magnitude difference function to successive ones of said samples to identify and generate said estimates of said fundamental frequencies in said waveform.
-
-
16. The method of claim 15, wherein said windowing filter is a Kaiser windowing filter.
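For reference, a Kaiser window can be computed from the zeroth-order modified Bessel function of the first kind. The sketch below uses the standard textbook definition; the window length of 9 and the shape parameter beta of 5.0 are assumed values for illustration, since the claims do not state them.

```python
import math

def bessel_i0(x, terms=30):
    """Zeroth-order modified Bessel function of the first kind, by power series."""
    half_sq = (x / 2.0) ** 2
    term, total = 1.0, 1.0
    for k in range(1, terms):
        term *= half_sq / (k * k)
        total += term
    return total

def kaiser_window(length, beta):
    """Kaiser window of `length` >= 2 samples with shape parameter `beta`."""
    denom = bessel_i0(beta)
    mid = (length - 1) / 2.0
    window = []
    for n in range(length):
        ratio = (n - mid) / mid
        arg = beta * math.sqrt(max(0.0, 1.0 - ratio * ratio))
        window.append(bessel_i0(arg) / denom)
    return window

# Assumed example values: 9 samples, beta = 5.0.
window = kaiser_window(9, 5.0)
samples = [1.0] * 9
windowed = [s * w for s, w in zip(samples, window)]
print(windowed)
```

The window tapers each frame toward its edges, which reduces the influence of frame boundaries on the subsequent difference calculation.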
-
17. The method of claim 13, wherein said step of generating with said neural network, an estimate of a trajectory of each of said fundamental frequencies as a function of time, comprises:
-
1) applying weights and biases to said frequency estimates to generate a plurality of weighted frequency estimates;
2) applying said weighted frequency estimates to a plurality of tan-sigmoidal units, one for each of said estimates, to generate a plurality of corresponding outputs; and
3) linearly combining said plurality of outputs to generate said trajectory estimates.
-
-
18. The method of claim 17, wherein said step of applying weights and biases further comprises applying said plurality of outputs from said tan-sigmoidal units as feedback to said frequency estimates.
-
19. The method of claim 13, further comprising the step of matching said trajectory estimates with said frequency estimates.
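The claim does not say how the matching is carried out. One simple possibility, shown here purely as an assumption, is to pair each predicted trajectory value with an observed frequency estimate so that the total absolute difference is smallest. The function name and the sample frequencies are hypothetical, and the sketch assumes the same number of predictions and observations.

```python
from itertools import permutations

def match_estimates(predicted, observed):
    """Reorder `observed` so that observed[i] is paired with predicted[i],
    minimising the summed absolute frequency difference (brute force,
    suitable only for a small number of voices)."""
    best = min(
        permutations(observed),
        key=lambda perm: sum(abs(p - o) for p, o in zip(predicted, perm)),
    )
    return list(best)

# Hypothetical values in Hz: network predictions and unordered estimator output.
predicted = [118.0, 205.0]
observed = [207.0, 121.0]
matched = match_estimates(predicted, observed)
print(matched)  # [121.0, 207.0]
```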
-
20. The method of claim 13, further comprising the step of applying said trajectory estimates to a voice separation device.
Specification