Multiple voice tracking system and method

US 6,453,284 B1
Filed: 07/26/1999
Issued: 09/17/2002
Est. Priority Date: 07/26/1999
Status: Expired due to Fees

1 Assignment

0 Petitions

Abstract

For tracking multiple, simultaneous voices, predicted tracking is used to follow individual voices through time, even when the voices are very similar in fundamental frequency. An acoustic waveform comprised of a group of voices is submitted to a frequency estimator, which may employ an average magnitude difference function (AMDF) calculation to determine the voice fundamental frequencies that are present for each voice. These frequency estimates are then used as input values to a recurrent neural network that tracks each of the frequencies by predicting the current fundamental frequency value for each voice present based on past fundamental frequency values in order to disambiguate any fundamental frequency trajectories that may be converging in frequency.
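
The predict-then-disambiguate idea in the abstract can be illustrated without a neural network. The following is a minimal sketch under stated assumptions, not the patented method. The hypothetical helper `predict_next` linearly extrapolates each voice's last two F0 values, standing in for the recurrent network's prediction. Each frame's unlabeled F0 estimates are then assigned to voices using the one-to-one pairing with the smallest total prediction error. The sketch also assumes every frame reports exactly one estimate per voice.

```python
from itertools import permutations


def predict_next(history: list[float]) -> float:
    """Predict a voice's next F0 by linear extrapolation of its last two values.

    Hypothetical stand-in for the recurrent neural network predictor.
    """
    if len(history) < 2:
        return history[-1]
    return 2 * history[-1] - history[-2]


def assign_frame(tracks: list[list[float]], candidates: list[float]) -> None:
    """Append each candidate F0 to the track whose prediction it best matches.

    Tries every one-to-one pairing (acceptable for a handful of voices) and keeps
    the pairing with the smallest total |prediction - candidate| error.
    """
    predictions = [predict_next(t) for t in tracks]
    best = min(
        permutations(candidates),
        key=lambda perm: sum(abs(p - c) for p, c in zip(predictions, perm)),
    )
    for track, f0 in zip(tracks, best):
        track.append(f0)


def track_voices(frames: list[list[float]]) -> list[list[float]]:
    """Build one F0 trajectory per voice from per-frame, unlabeled F0 estimates."""
    tracks = [[f0] for f0 in frames[0]]
    for candidates in frames[1:]:
        assign_frame(tracks, candidates)
    return tracks
```

Consider two voices whose F0s cross, with estimates always listed low to high: `[[100, 140], [110, 130], [120, 120], [110, 130], [100, 140]]`. Here `track_voices` keeps the rising voice `[100, 110, 120, 130, 140]` separate from the falling voice `[140, 130, 120, 110, 100]`. Matching each candidate to the nearest last-observed value would be a tie in the frame after the crossing, because both voices were last seen at 120.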

20 Claims

1. A system for tracking voices in a multiple voice environment, said system comprising:
- a) a frequency estimator for receiving an acoustic waveform comprised of a plurality of voice components, each of which corresponds to a different individual's voice, and generating a plurality of estimates of fundamental frequencies in said waveform, each of said fundamental frequencies corresponding to one of said voice components; and
  
  b) a neural network for receiving said estimates of said fundamental frequencies from said frequency estimator, and generating an estimate of a trajectory of each of said fundamental frequencies as a function of time.
2. The system of claim 1, further comprising a windowing filter for receiving said waveform, generating a plurality of successive samples of said waveform, and supplying said samples to said frequency estimator.
3. The system of claim 2, wherein said windowing filter is a Kaiser windowing filter.
4. The system of claim 2, wherein said frequency estimator comprises means for calculating an average magnitude difference function for subtracting successive ones of said samples from one another to identify said fundamental frequencies in said waveform.
5. The system of claim 1, wherein said frequency estimator comprises means for calculating an average magnitude difference function for subtracting successive ones of a plurality of time shifted samples of said waveform from said waveform to identify said fundamental frequencies in said waveform.
6. The system of claim 1, wherein said neural network includes:
7. The system of claim 6, wherein said hidden layer is further comprised of a plurality of tan-sigmoidal units.
8. The system of claim 6, wherein said neural network further includes a feedback connection between said hidden layer outputs and said input layer for supplying said hidden layer outputs as a weight to said frequency estimates.
9. The system of claim 1, further comprising:
- c) a microphone for generating said acoustic waveform; and
  
  d) a utilization device for receiving said trajectory estimates from said neural network.
10. The system of claim 1, wherein said frequency estimator and said neural network are implemented in hardware.
11. The system of claim 1, wherein said frequency estimator and said neural network are implemented in software.
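
Claims 2 through 5 describe windowing the waveform and then applying an average magnitude difference function (AMDF). The pure-Python sketch below illustrates one reading of that front end and is not the patented implementation. It follows the claim 5 wording by comparing one windowed frame against time-shifted copies of itself. The Kaiser shape parameter `beta=2.0`, the 50 to 400 Hz search range, and the hypothetical helper `estimate_f0s` (which reports the `num_voices` deepest AMDF valleys as F0s) are all illustrative assumptions. Picking the deepest valleys is a simplification. With overlapping voices, valleys also appear at lags that are common multiples of several periods, so a practical multi-pitch estimator needs more care.

```python
import math


def bessel_i0(x: float, terms: int = 30) -> float:
    """Modified Bessel function I0(x), summed from its power series."""
    total, term = 1.0, 1.0
    for k in range(1, terms):
        term *= (x / (2.0 * k)) ** 2  # ratio of consecutive series terms
        total += term
    return total


def kaiser_window(n: int, beta: float) -> list[float]:
    """Kaiser window of length n (same formula as numpy.kaiser)."""
    if n == 1:
        return [1.0]
    scale = bessel_i0(beta)
    return [
        bessel_i0(beta * math.sqrt(max(0.0, 1.0 - (2.0 * i / (n - 1) - 1.0) ** 2))) / scale
        for i in range(n)
    ]


def amdf(frame: list[float], lag: int) -> float:
    """Mean |x[i] - x[i + lag]| over the samples that overlap at this lag."""
    pairs = len(frame) - lag
    return sum(abs(frame[i] - frame[i + lag]) for i in range(pairs)) / pairs


def estimate_f0s(samples: list[float], fs: float, num_voices: int,
                 fmin: float = 50.0, fmax: float = 400.0,
                 beta: float = 2.0) -> list[float]:
    """Hypothetical helper: report the num_voices deepest AMDF valleys as F0s in Hz."""
    window = kaiser_window(len(samples), beta)
    frame = [s * w for s, w in zip(samples, window)]
    lag_min = max(2, int(fs / fmax))
    lag_max = min(int(fs / fmin), len(frame) // 2 - 1)
    # AMDF over the search range plus one guard lag on each side.
    d = {lag: amdf(frame, lag) for lag in range(lag_min - 1, lag_max + 2)}
    # A valley is a lag whose AMDF is no higher than its left neighbour
    # and strictly lower than its right neighbour.
    valleys = [lag for lag in range(lag_min, lag_max + 1)
               if d[lag] <= d[lag - 1] and d[lag] < d[lag + 1]]
    valleys.sort(key=lambda lag: d[lag])
    return [fs / lag for lag in valleys[:num_voices]]
```

For a 400-sample frame of a 200 Hz sine sampled at 8 kHz, `estimate_f0s(frame, 8000, 1)` should report a value at or very near 200 Hz. The period is 40 samples, and that lag gives the deepest valley.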

12. A system for tracking voices in a multiple voice environment, said system comprising:
- a) a windowing filter for receiving an acoustic waveform comprised of a plurality of voice components, each of which corresponds to a different individual's voice, and generating a plurality of successive samples of said waveform;
  
  b) a frequency estimator for receiving said samples and generating an estimate of a plurality of fundamental frequencies in said waveform at a given point in time, each of said fundamental frequencies corresponding to one of said voice components, said frequency estimator comprising means for calculating an average magnitude difference function for subtracting successive ones of said samples from one another to identify said fundamental frequencies in said waveform; and
  
  c) a neural network for receiving said estimates of said fundamental frequencies from said frequency estimator, and generating an estimate of a trajectory of each of said fundamental frequencies as a function of time, said neural network comprising:
  
  1) an input layer for receiving said fundamental frequencies from said frequency estimator and generating a plurality of weighted outputs;
  
  2) a hidden layer comprising a plurality of tan-sigmoidal units, said hidden layer having an input for receiving said weighted outputs and generating a plurality of hidden layer outputs, said hidden layer further including a feedback connection for supplying said hidden layer outputs back to said input layer for constraining the amount of change allowed in the processing of said hidden layer; and
  
  3) an output layer for linearly combining said hidden layer outputs to generate said trajectory estimates of each of said fundamental frequencies as a function of time.
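
The network recited in claim 12 has a weighted input layer, tan-sigmoid hidden units whose outputs are fed back, and a linear output layer. This resembles an Elman-style recurrent network. The class below is a minimal, untrained forward-pass sketch under that reading. The class name, the weight layout, and the choice to feed the previous hidden outputs back through separate context weights are assumptions. Training, which the claims do not describe, is omitted. Inputs are assumed to be frequency estimates scaled to roughly unit range, since raw values in Hz would saturate the tanh units.

```python
import math


class ElmanTracker:
    """Forward pass of a small Elman-style recurrent network (untrained sketch).

    Input layer:  weights and biases applied to the frequency estimates, plus the
                  previous hidden outputs fed back through context weights.
    Hidden layer: tanh ("tan-sigmoid") units.
    Output layer: linear combination of the hidden outputs.
    """

    def __init__(self, w_in, w_ctx, b_hidden, w_out, b_out):
        self.w_in = w_in          # hidden_size x input_size
        self.w_ctx = w_ctx        # hidden_size x hidden_size (feedback weights)
        self.b_hidden = b_hidden  # hidden_size biases
        self.w_out = w_out        # output_size x hidden_size
        self.b_out = b_out        # output_size biases
        self.context = [0.0] * len(b_hidden)  # hidden outputs from the previous step

    def step(self, estimates: list[float]) -> list[float]:
        """Consume one frame of (scaled) F0 estimates and return trajectory estimates."""
        hidden = []
        for j in range(len(self.b_hidden)):
            total = self.b_hidden[j]
            total += sum(w * x for w, x in zip(self.w_in[j], estimates))
            total += sum(w * c for w, c in zip(self.w_ctx[j], self.context))
            hidden.append(math.tanh(total))
        self.context = hidden  # feedback: these outputs shape the next step
        return [
            b + sum(w * h for w, h in zip(row, hidden))
            for row, b in zip(self.w_out, self.b_out)
        ]
```

Because of the feedback, presenting the same estimates on two successive steps generally yields different outputs: the second step also depends on the hidden state left by the first. This is one way the feedback connection can constrain how quickly the outputs change.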

13. A method for identifying and tracking individual voices in an acoustic waveform comprised of a plurality of voices, said method comprising the steps of:
- a) generating an acoustic waveform, said waveform comprised of a plurality of voice components, each of which corresponds to a different individual's voice;
  
  b) generating estimates of a plurality of fundamental frequencies in said waveform, each of said fundamental frequencies corresponding to one of said voice components;
  
  c) supplying said fundamental frequency estimates to a neural network; and
  
  d) generating with said neural network, an estimate of a trajectory of each of said fundamental frequencies as a function of time.
14. The method of claim 13, wherein steps b and c are periodically repeated so that said neural network can update said trajectory estimates.
15. The method of claim 13, wherein said step of generating estimates of a plurality of fundamental frequencies in said waveform comprises:
16. The method of claim 15, wherein said windowing filter is a Kaiser windowing filter.
17. The method of claim 13, wherein said step of generating with said neural network, an estimate of a trajectory of each of said fundamental frequencies as a function of time, comprises:
- 1) applying weights and biases to said frequency estimates to generate a plurality of weighted frequency estimates;
  
  2) applying said weighted frequency estimates to a plurality of tan-sigmoidal units, one for each of said estimates, to generate a plurality of corresponding outputs; and
  
  3) linearly combining said plurality of outputs to generate said trajectory estimates.
18. The method of claim 17, wherein said step of applying weights and biases further comprises applying said plurality of outputs from said tan-sigmoidal units as feedback to said frequency estimates.
19. The method of claim 13, further comprising the step of matching said trajectory estimates with said frequency estimates.
20. The method of claim 13, further comprising the step of applying said trajectory estimates to a voice separation device.

Current Assignee
Texas Tech University Health Sciences Center (Texas Tech University System)
Original Assignee
Texas Tech University Health Sciences Center (Texas Tech University System)
Inventors
Paschall, D. Dwayne
Primary Examiner(s)
Chawan, Vijay B.

Application Number
US09/360,697
Time in Patent Office
1,149 Days
Field of Search
381/94.3, 381/94.7, 381/98, 381/56, 381/92, 704/225, 704/226, 704/258, 704/202, 704/207, 704/219, 704/233, 704/253, 704/268, 704/208, 704/206, 704/232, 367/118-127
US Class Current
704/208
CPC Class Codes
G10L 21/028 using properties of sound s...
G10L 25/30 using neural networks
