SOUND SIGNAL PROCESSING APPARATUS, SOUND SIGNAL PROCESSING METHOD, AND PROGRAM

US 20140328487A1
Filed: 03/21/2014
Published: 11/06/2014
Est. Priority Date: 05/02/2013
Status: Active Grant


1 Assignment


0 Petitions


Abstract

A sound signal processing apparatus includes an observed signal analysis unit that receives as an observed signal a sound signal for channels obtained by a sound signal input unit formed of microphones and estimates a sound direction and a sound segment of a target sound which is sound to be extracted and a sound source extraction unit that receives the sound direction and sound segment of the target sound estimated by the observed signal analysis unit and extracts the sound signal for the target sound. The observed signal analysis unit includes a short time Fourier transform unit that generates an observed signal in time-frequency domain by applying short time Fourier transform to the sound signal for the channels received and a direction/segment estimation unit that receives the observed signal generated by the short time Fourier transform unit and detects the sound direction and sound segment of the target sound.

49 Citations


11 Claims

1. A sound signal processing apparatus comprising:
- an observed signal analysis unit that receives as an observed signal a sound signal for a plurality of channels obtained by a sound signal input unit formed of a plurality of microphones placed at different positions and estimates a sound direction and a sound segment of a target sound which is sound to be extracted; and
- a sound source extraction unit that receives the sound direction and sound segment of the target sound estimated by the observed signal analysis unit and extracts the sound signal for the target sound,

wherein the observed signal analysis unit includes
- a short time Fourier transform unit that generates an observed signal in time-frequency domain by applying short time Fourier transform to the sound signal for the plurality of channels received; and
- a direction/segment estimation unit that receives the observed signal generated by the short time Fourier transform unit and detects the sound direction and sound segment of the target sound, and

wherein the sound source extraction unit
- executes iterative learning in which an extracting filter U′ is iteratively updated using a result of application of the extracting filter to the observed signal,
- prepares, as a function to be applied in the iterative learning, an objective function G(U′) that assumes a local minimum or a local maximum when a value of the extracting filter U′ is a value optimal for extraction of the target sound, and
- computes a value of the extracting filter U′ which is in a neighborhood of a local minimum or a local maximum of the objective function G(U′) using an auxiliary function method during the iterative learning, and applies the computed extracting filter to extract the sound signal for the target sound.
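The short time Fourier transform unit of claim 1 turns the multichannel microphone recording into the time-frequency observed signal X(ω, t) that every later step consumes. Below is a minimal NumPy sketch. The Hann window, frame length, hop size, dropping of incomplete frames, and the (bins, mics, frames) array layout are illustrative assumptions, not values taken from the patent.

```python
import numpy as np

def multichannel_stft(signals, frame_len=512, hop=128):
    """Return the observed signal X with shape (n_bins, n_mics, n_frames).

    signals: real array of shape (n_mics, n_samples), one row per microphone.
    Assumptions (not from the patent): Hann analysis window, no padding,
    and trailing samples that do not fill a whole frame are dropped.
    """
    signals = np.asarray(signals, dtype=float)
    n_mics, n_samples = signals.shape
    if n_samples < frame_len:
        raise ValueError("signal is shorter than one frame")
    n_frames = 1 + (n_samples - frame_len) // hop
    window = np.hanning(frame_len)
    # idx[t, n] is the sample index of sample n in frame t
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signals[:, idx] * window           # (n_mics, n_frames, frame_len)
    spec = np.fft.rfft(frames, axis=-1)         # (n_mics, n_frames, n_bins)
    return np.transpose(spec, (2, 0, 1))        # (n_bins, n_mics, n_frames)
```

With 4 microphones, 16000 samples, and the default parameters, this yields 257 frequency bins and 122 frames. A cosine that completes exactly 32 cycles per 512-sample frame peaks in bin 32.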
2. The sound signal processing apparatus according to claim 1, wherein the sound source extraction unit
- computes a temporal envelope which is an outline of a sound volume of the target sound in time direction based on the sound direction and the sound segment of the target sound received from the direction/segment estimation unit and substitutes the computed temporal envelope value over frame t into an auxiliary variable b(t),
- prepares an auxiliary function F that takes the auxiliary variable b(t) and an extracting filter U′(ω) for each frequency bin (ω) as arguments,
- executes an iterative learning process in which (1) extracting filter computation for computing the extracting filter U′(ω) that minimizes the auxiliary function F while fixing the auxiliary variable b(t), and (2) auxiliary variable computation for computing the auxiliary variable b(t) based on Z(ω, t) which is the result of application of the extracting filter U′(ω) to the observed signal are repeated to sequentially update the extracting filter U′(ω), and applies the updated extracting filter to extract the sound signal for the target sound.
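Claim 2 does not spell out F. The block below shows one standard auxiliary-function (majorization) construction that is consistent with claims 2, 4 and 9. The form of G(U′) is an assumption here, not quoted from the patent. X(ω, t) is also assumed to be decorrelated, so that U′(ω) can be constrained to unit norm.

```latex
% Assumed objective (illustrative only)
G(U') = \frac{1}{T}\sum_{t=1}^{T} \lVert \mathbf{z}(t) \rVert_2,
\qquad \mathbf{z}(t) = [Z(1,t),\dots,Z(\Omega,t)]^{\mathsf T},
\qquad Z(\omega,t) = U'(\omega)\,X(\omega,t)

% For r \ge 0 and b > 0:  r \le \frac{r^2}{2b} + \frac{b}{2},
% because \frac{r^2}{2b} + \frac{b}{2} - r = \frac{(r-b)^2}{2b} \ge 0.
F(U', b) = \frac{1}{T}\sum_{t=1}^{T}\left[\frac{\lVert \mathbf{z}(t)\rVert_2^2}{2\,b(t)} + \frac{b(t)}{2}\right] \ge G(U'),
\quad \text{with equality iff } b(t) = \lVert \mathbf{z}(t)\rVert_2

% Step (2): with U' fixed, F is minimized by b(t) = \lVert \mathbf{z}(t)\rVert_2 (the claim 4 rule).
% Step (1): with b(t) fixed, using \lVert \mathbf{z}(t)\rVert_2^2 = \sum_{\omega} U'(\omega) X(\omega,t) X(\omega,t)^{\mathsf H} U'(\omega)^{\mathsf H},
F(U', b) = \frac{1}{2}\sum_{\omega=1}^{\Omega} U'(\omega)\,\Sigma(\omega)\,U'(\omega)^{\mathsf H} + \frac{1}{2T}\sum_{t=1}^{T} b(t),
\qquad \Sigma(\omega) = \frac{1}{T}\sum_{t=1}^{T}\frac{X(\omega,t)\,X(\omega,t)^{\mathsf H}}{b(t)}

% Under U'(\omega)U'(\omega)^{\mathsf H} = 1 the minimizer is U'(\omega) = \mathbf{v}^{\mathsf H},
% where \mathbf{v} is the eigenvector of \Sigma(\omega) with the smallest eigenvalue (compare claim 9).
```

Under these assumptions, G(U′_new) ≤ F(U′_new, b) ≤ F(U′_old, b) = G(U′_old) holds whenever b(t) was set from U′_old. So alternating steps (1) and (2) never increases G(U′).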
3. The sound signal processing apparatus according to claim 1, wherein the sound source extraction unit
- computes a temporal envelope which is an outline of the sound volume of the target sound in time direction based on the sound direction and sound segment of the target sound received from the direction/segment estimation unit and substitutes the computed temporal envelope value for each frame t into the auxiliary variable b(t),
- prepares an auxiliary function F that takes the auxiliary variable b(t) and the extracting filter U′(ω) for each frequency bin (ω) as arguments,
- executes an iterative learning process in which (1) extracting filter computation for computing the extracting filter U′(ω) that maximizes the auxiliary function F while fixing the auxiliary variable b(t), and (2) auxiliary variable computation for computing the auxiliary variable b(t) based on Z(ω, t) which is the result of application of the extracting filter U′(ω) to the observed signal are repeated to sequentially update the extracting filter U′(ω), and applies the updated extracting filter to the observed signal to extract the sound signal for the target sound.
4. The sound signal processing apparatus according to claim 2, wherein the sound source extraction unit performs, in the auxiliary variable computation, processing for generating Z(ω, t) which is the result of application of the extracting filter U′(ω) to the observed signal, calculating an L-2 norm of a vector [Z(1, t), ..., Z(Ω, t)], Ω being a number of frequency bins and the vector representing a spectrum of the result of application for each frame t, and substituting the L-2 norm value to the auxiliary variable b(t).
5. The sound signal processing apparatus according to claim 2, wherein the sound source extraction unit performs, in the auxiliary variable computation, processing for further applying a time-frequency mask that attenuates sounds from directions off the sound source direction of the target sound to Z(ω, t) which is the result of application of the extracting filter U′(ω) to the observed signal to generate a masking result Q(ω, t), calculating for each frame t the L-2 norm of the vector [Q(1, t), ..., Q(Ω, t)], Ω being the number of frequency bins and the vector representing the spectrum of the generated masking result, and substituting the L-2 norm value to the auxiliary variable b(t).
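Claims 4 and 5 both compute b(t) as an L-2 norm over frequency; claim 5 first multiplies by a time-frequency mask. Below is a small NumPy sketch, assuming Z(ω, t) is stored as an (Ω, T) array. The `floor` parameter is a hypothetical safeguard added here, not part of the claims. It is useful because b(t) is later used as a divisor.

```python
import numpy as np

def auxiliary_variable(Z, mask=None, floor=1e-12):
    """b(t) = L-2 norm of [Z(1,t), ..., Z(Ω,t)] for each frame t.

    Z: complex array of shape (n_bins, n_frames), the filter output Z(ω, t).
    mask: optional real array of the same shape with values in [0, 1];
          if given, the claim 5 variant Q(ω, t) = mask(ω, t) * Z(ω, t) is used.
    floor: hypothetical lower bound that keeps b(t) strictly positive.
    """
    Z = np.asarray(Z)
    Q = Z if mask is None else np.asarray(mask) * Z
    b = np.linalg.norm(Q, axis=0)   # norm over frequency: one value per frame
    return np.maximum(b, floor)
```

For example, the spectrum [3, 4] in one frame gives b(t) = 5. Masking out the second bin reduces it to 3.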
6. The sound signal processing apparatus according to claim 1, wherein the sound source extraction unit
- generates a steering vector containing information on phase difference among the plurality of microphones that collect the target sound, based on sound source direction information for the target sound,
- generates a time-frequency mask that attenuates sounds from directions off the sound source direction of the target sound based on an observed signal containing interfering sound which is a signal other than the target sound and on the steering vector,
- applies the time-frequency mask to observed signals in a predetermined segment to generate a masking result, and
- generates an initial value of the auxiliary variable based on the masking result.
7. The sound signal processing apparatus according to claim 1, wherein the sound source extraction unit
- generates a steering vector containing information on phase difference among the plurality of microphones that collect the target sound, based on sound source direction information for the target sound,
- generates a time-frequency mask that attenuates sounds from directions off the sound source direction of the target sound based on an observed signal containing interfering sound which is a signal other than the target sound and on the steering vector, and
- generates the initial value of the auxiliary variable based on the time-frequency mask.
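Claims 6 and 7 initialize b(t) from a direction-dependent time-frequency mask built with a steering vector. The claims do not fix the array geometry, the mask formula, or how the masking result becomes b(t), so everything below is an assumed instantiation:

- a far-field plane-wave steering vector;
- a mask equal to the normalized correlation |aᴴX| / ‖X‖, raised to a hypothetical exponent `gamma`;
- for claim 6, the masked reference microphone;
- for claim 7, the mean of the mask over frequency.

All function names are hypothetical.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, assumed value

def steering_vector(mic_pos, direction, n_bins, frame_len, fs):
    """Far-field steering vectors a(ω), shape (n_bins, n_mics), unit norm per bin.

    mic_pos: (n_mics, 3) microphone coordinates in metres.
    direction: (3,) vector pointing from the array toward the target.
    Assumes a plane wave, so that a target-only observation is X(ω,t) ≈ a(ω) S(ω,t).
    """
    u = np.asarray(direction, dtype=float)
    u = u / np.linalg.norm(u)
    lead = np.asarray(mic_pos, dtype=float) @ u / SPEED_OF_SOUND  # arrival lead per mic (s)
    freqs = np.arange(n_bins) * fs / frame_len                     # bin frequencies (Hz)
    a = np.exp(2j * np.pi * freqs[:, None] * lead[None, :])
    return a / np.sqrt(a.shape[1])

def direction_mask(X, a, gamma=2.0, eps=1e-12):
    """Mask in [0, 1): (|a^H X| / ||X||) ** gamma, near 1 for sound from the target direction.

    X: (n_bins, n_mics, n_frames) observed signal, a: (n_bins, n_mics).
    """
    proj = np.abs(np.einsum("km,kmt->kt", a.conj(), X))
    norm = np.linalg.norm(X, axis=1) + eps     # Cauchy-Schwarz keeps proj / norm below 1
    return (proj / norm) ** gamma

def initial_b_from_masking(X, mask, ref_mic=0):
    """Claim 6 reading (assumed): b0(t) = ||mask(:, t) * X(:, ref_mic, t)||."""
    return np.linalg.norm(mask * X[:, ref_mic, :], axis=0)

def initial_b_from_mask(mask):
    """Claim 7 reading (assumed): b0(t) = mean of the mask over frequency."""
    return mask.mean(axis=0)
```

A target-only observation X = a(ω)S(ω, t) gives a mask of essentially 1 everywhere. Sound arriving from another direction is attenuated in the bins where the inter-microphone phase differences disagree.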
8. The sound signal processing apparatus according to claim 1, wherein the sound source extraction unit
- if a length of the sound segment of the target sound detected by the observed signal analysis unit is shorter than a prescribed minimum segment length T_MIN, selects a point in time earlier than an end of the sound segment by the minimum segment length T_MIN as a start position of the observed signal to be used in the iterative learning,
- if the length of the sound segment of the target sound is longer than a prescribed maximum segment length T_MAX, selects the point in time earlier than the end of the sound segment by the maximum segment length T_MAX as the start position of the observed signal to be used in the iterative learning, and
- if the length of the sound segment of the target sound detected by the observed signal analysis unit falls within a range between the prescribed minimum segment length T_MIN and the prescribed maximum segment length T_MAX, uses the sound segment as the sound segment of the observed signal to be used in the iterative learning.
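The segment rule of claim 8 fits in a few lines of code. This sketch makes the following assumptions:

- frame indices are integers;
- the detected segment is the half-open range [seg_start, seg_end);
- the start is clamped at frame 0, which is an added assumption the claim does not address.

The function name is hypothetical.

```python
def learning_segment(seg_start, seg_end, t_min, t_max):
    """Return (start, end) frame indices of the observed signal used for learning.

    Per claim 8, the end of the detected segment is kept and the start is
    moved so that the learning span is at least t_min and at most t_max long.
    Clamping the start at frame 0 is an assumption, not part of the claim.
    """
    if t_min > t_max:
        raise ValueError("t_min must not exceed t_max")
    length = seg_end - seg_start
    if length < t_min:
        start = seg_end - t_min        # too short: extend back from the end
    elif length > t_max:
        start = seg_end - t_max        # too long: keep only the last t_max frames
    else:
        start = seg_start              # within range: use the detected segment
    return max(0, start), seg_end
```

For example, with T_MIN = 50 and T_MAX = 300 frames, a 20-frame segment ending at frame 120 is extended back to start at frame 70. A 1000-frame segment ending at frame 1000 is cut to frames 700 to 1000.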
9. The sound signal processing apparatus according to claim 1, wherein the sound source extraction unit
- calculates a weighted covariance matrix from the auxiliary variable b(t) and a decorrelated observed signal,
- applies eigenvalue decomposition to the weighted covariance matrix to compute eigenvalue(s) and eigenvector(s), and
- sets an eigenvector selected based on the eigenvalue(s) as an in-process extracting filter to be used in the iterative learning.
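Claim 9 pins down the filter update: weight each decorrelated observation by b(t), form a covariance matrix per frequency bin, and take an eigenvector. The sketch below combines this update with the claim 4 auxiliary-variable rule into a full iteration. It makes two assumptions. First, it uses the eigenvector with the smallest eigenvalue, which corresponds to the minimization variant of claim 2. Second, it assumes an objective equal to the mean L-2 norm of the output spectrum. The names `whiten`, `update_filter` and `extract` are hypothetical.

```python
import numpy as np

def whiten(X, floor=1e-12):
    """Decorrelate X of shape (n_bins, n_mics, n_frames) bin by bin.

    Afterwards each bin's spatial covariance is the identity, which is what
    justifies constraining U'(ω) to unit norm.
    """
    n_bins, _, n_frames = X.shape
    Xw = np.empty(X.shape, dtype=complex)
    for k in range(n_bins):
        R = X[k] @ X[k].conj().T / n_frames                   # spatial covariance
        d, E = np.linalg.eigh(R)
        P = (E / np.sqrt(np.maximum(d, floor))) @ E.conj().T  # R^(-1/2)
        Xw[k] = P @ X[k]
    return Xw

def update_filter(Xw, b):
    """Weighted covariance from b(t) and the decorrelated signal, then EVD.

    Returns U of shape (n_bins, n_mics); row U[k] is U'(ω) for bin k.
    """
    n_bins, n_mics, n_frames = Xw.shape
    U = np.empty((n_bins, n_mics), dtype=complex)
    for k in range(n_bins):
        C = (Xw[k] / b) @ Xw[k].conj().T / n_frames   # (1/T) sum_t x x^H / b(t)
        _, V = np.linalg.eigh(C)                       # eigenvalues in ascending order
        U[k] = V[:, 0].conj()                          # U'(ω) = v^H for the smallest one
    return U

def extract(X, b0, n_iter=20, floor=1e-12):
    """Alternate the filter update and the b(t) update, starting from b0."""
    if n_iter < 1:
        raise ValueError("n_iter must be at least 1")
    Xw = whiten(X)
    b = np.maximum(np.asarray(b0, dtype=float), floor)
    for _ in range(n_iter):
        U = update_filter(Xw, b)
        Z = np.einsum("km,kmt->kt", U, Xw)             # Z(ω, t) = U'(ω) x(ω, t)
        b = np.maximum(np.linalg.norm(Z, axis=0), floor)
    return Z, U
```

Under these assumptions, the mean of b(t) cannot increase from one pass to the next. That property is a convenient sanity check on an implementation. The output Z(ω, t) still has an arbitrary scale per bin, which a complete system would have to resolve separately.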

10. A sound signal processing method for execution in a sound signal processing apparatus, the method comprising:
- performing, at an observed signal analysis unit, an observed signal analysis process in which a sound signal for a plurality of channels obtained by a sound signal input unit formed of a plurality of microphones disposed at different positions is received as an observed signal and a sound direction and a sound segment of a target sound which is sound to be extracted are estimated; and
- performing, at a sound source extraction unit, a sound source extraction process in which the sound direction and sound segment of the target sound estimated by the observed signal analysis unit are received and the sound signal for the target sound is extracted,

wherein the observed signal analysis process includes
- executing a short time Fourier transform process for generating an observed signal in time-frequency domain by applying short time Fourier transform to the sound signal for the plurality of channels received; and
- executing a direction and segment estimation process for receiving the observed signal generated in the short time Fourier transform process and detecting the sound direction and sound segment of the target sound, and

wherein the sound source extraction process includes
- executing iterative learning in which an extracting filter U′ is iteratively updated using a result of application of the extracting filter to the observed signal,
- preparing, as a function to be applied in the iterative learning, an objective function G(U′) that assumes a local minimum or a local maximum when a value of the extracting filter U′ is a value optimal for extraction of the target sound, and
- computing a value of the extracting filter U′ which is in a neighborhood of a local minimum or a local maximum of the objective function G(U′) using an auxiliary function method during the iterative learning, and applying the computed extracting filter to extract the sound signal for the target sound.

11. A program for causing a sound signal processing apparatus to execute sound signal processing, the program comprising:
- causing an observed signal analysis unit to perform an observed signal analysis process for receiving as an observed signal a sound signal for a plurality of channels obtained by a sound signal input unit formed of a plurality of microphones placed at different positions and estimating a sound direction and a sound segment of a target sound which is sound to be extracted; and
- causing a sound source extraction unit to perform a sound source extraction process for receiving the sound direction and sound segment of the target sound estimated by the observed signal analysis unit and extracting the sound signal for the target sound,

wherein the observed signal analysis process includes
- executing a short time Fourier transform process for generating an observed signal in time-frequency domain by applying short time Fourier transform to the sound signal for the plurality of channels received; and
- executing a direction and segment estimation process for receiving the observed signal generated in the short time Fourier transform process and detecting the sound direction and sound segment of the target sound, and

wherein the sound source extraction process includes
- executing iterative learning in which an extracting filter U′ is iteratively updated using a result of application of the extracting filter to the observed signal,
- preparing, as a function to be applied in the iterative learning, an objective function G(U′) that assumes a local minimum or a local maximum when a value of the extracting filter U′ is a value optimal for extraction of the target sound, and
- computing a value of the extracting filter U′ which is in a neighborhood of a local minimum or a local maximum of the objective function G(U′) using an auxiliary function method during the iterative learning, and applying the computed extracting filter to extract the sound signal for the target sound.


Current Assignee
Sony Corporation (Sony Group Corp.)
Original Assignee
Sony Corporation (Sony Group Corp.)
Inventors
HIROE, Atsuo

Granted Patent

US 9,357,298 B2
Time in Patent Office

Days
Field of Search
US Class Current

381/56
CPC Class Codes

G10L 21/0272   Voice signal separating

H04R 2227/009   Signal processing in [PA] s...

H04R 27/00   Public address systems circ...

H04R 3/005   for combining the signals o...
