SPEECH SEPARATING APPARATUS, SPEECH SYNTHESIZING APPARATUS, AND VOICE QUALITY CONVERSION APPARATUS

US 20100004934A1
Filed: 08/06/2008
Published: 01/07/2010
Est. Priority Date: 08/10/2007
Status: Active Grant

Abstract

A speech separating apparatus includes: a PARCOR calculating unit (102) that extracts vocal tract information from an input speech signal; a filter smoothing unit (103) that smooths, in a first time constant, the vocal tract information extracted by the PARCOR calculating unit (102); an inverse filtering unit (104) that calculates a filter coefficient of a filter having a frequency amplitude response characteristic inverse to the vocal tract information smoothed by the filter smoothing unit (103), so as to filter the input speech signal using the filter having the calculated filter coefficient; and a voicing source modeling unit (105) that cuts out, from the input speech signal filtered by the inverse filtering unit (104), a waveform included in a second time constant shorter than the first time constant, so as to calculate, for each waveform that is cut out, voicing source information from that waveform.
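
The extraction and inverse-filtering steps named in the abstract can be illustrated with a minimal sketch. It assumes that the PARCOR (reflection) coefficients come from an autocorrelation-based Levinson-Durbin recursion, which is one common way to obtain them and is not confirmed by this page as the patent's own procedure. All function names are hypothetical, the sign convention for reflection coefficients varies between texts, and the smoothing step is only marked by a comment.

```python
def autocorrelation(frame, max_lag):
    """Autocorrelation r[0..max_lag] of one analysis frame."""
    n = len(frame)
    return [sum(frame[i] * frame[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]


def reflection_to_predictor(k):
    """Step-up recursion: reflection coefficients -> A(z) = [1, a1, ..., ap]."""
    a = [1.0]
    for km in k:
        prev = a + [0.0]
        m = len(prev) - 1
        a = [prev[j] + km * prev[m - j] for j in range(m + 1)]
    return a


def levinson_durbin(r, order):
    """Reflection coefficients k1..kp from autocorrelation values.

    Assumes r[0] > 0, i.e. the frame is not silent.
    """
    a = [1.0]
    k = []
    err = r[0]
    for m in range(1, order + 1):
        acc = sum(a[j] * r[m - j] for j in range(m))
        km = -acc / err
        k.append(km)
        a = reflection_to_predictor(k)  # recomputed for clarity, not speed
        err *= 1.0 - km * km
    return k


def inverse_filter(signal, a):
    """Apply the FIR filter A(z); the output approximates the voicing source."""
    return [sum(a[j] * signal[n - j] for j in range(len(a)) if n - j >= 0)
            for n in range(len(signal))]


# Usage: a decaying exponential is the impulse response of 1 / (1 - 0.9 z^-1).
frame = [0.9 ** n for n in range(200)]
k = levinson_durbin(autocorrelation(frame, 2), 2)
# (A smoothing step over successive frames of k would go here.)
residual = inverse_filter(frame, reflection_to_predictor(k))
```

In this toy case the residual is close to a single impulse, because the inverse filter cancels the one-pole resonance that shaped the input.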

18 Claims

1. A speech separating apparatus that separates an input speech signal into vocal tract information and voicing source information, said speech separating apparatus comprising:
- a vocal tract information extracting unit configured to extract vocal tract information from the input speech signal;
  
  a filter smoothing unit configured to smooth, in a first time constant, the vocal tract information extracted by said vocal tract information extracting unit;
  
  an inverse filtering unit configured to calculate a filter having an inverse characteristic to a frequency response of the vocal tract information smoothed by said filter smoothing unit, and to filter the input speech signal by using the calculated filter; and
  
  a voicing source modeling unit configured to take, from the input speech signal filtered by said inverse filtering unit, a waveform included in a second time constant shorter than the first time constant, and to calculate, for each waveform that is taken, voicing source information from the each waveform.
  - 2. The speech separating apparatus according to claim 1, wherein said voicing source modeling unit is configured to convert the each waveform into a representation of a frequency domain, and to approximate, for the each waveform, an amplitude spectrum in the frequency domain by using a function, so as to output, as parameterized voicing source information, a coefficient of the function used for the approximation.
  - 3. The speech separating apparatus according to claim 2, wherein said voicing source modeling unit is configured to convert the each waveform into the frequency domain representation, and to approximate, for the each waveform, the amplitude spectrum by using a function that is different from one frequency band to another, so as to output, as parameterized voicing source information, a coefficient of the function used for the approximation.
  - 4. The speech separating apparatus according to claim 2, wherein said voicing source modeling unit is configured to approximate the amplitude spectrum by using the function with respect to each of boundary frequency candidates previously provided, and to output, along with the coefficient of the function, one of the boundary frequency candidates at a point at which a difference between the amplitude spectrum and the function is a minimum.
  - 5. The speech separating apparatus according to claim 1, wherein said vocal tract information extracting unit includes:
    - an all-pole model analysis unit configured to analyze the input speech signal based on an all-pole model, and to calculate an all-pole vocal tract model parameter that is a parameter for an acoustic-tube model in which a vocal tract is divided into plural sections; and
      
      a reflection coefficient parameter calculating unit configured to convert the all-pole vocal tract model parameter into a reflection coefficient parameter that is a parameter for the acoustic-tube model or a parameter convertible into the reflection coefficient parameter.
  - 6. The speech separating apparatus according to claim 5, wherein said all-pole model analysis unit is configured to calculate the all-pole vocal tract model parameter by performing a linear predictive analysis on the input speech signal.
  - 7. The speech separating apparatus according to claim 5, wherein said all-pole model analysis unit is configured to calculate the all-pole vocal tract model parameter by performing an autoregressive exogenous analysis on the input speech signal.
  - 8. The speech separating apparatus according to claim 1, wherein said filter smoothing unit is configured to smooth the vocal tract information, by using a polynomial or a regression line, in a time axis direction in a predetermined unit, the vocal tract information being extracted by said vocal tract information extracting unit.
  - 9. The speech separating apparatus according to claim 8, wherein the predetermined unit is phoneme, syllable, or mora.
  - 10. The speech separating apparatus according to claim 1, wherein said voicing source modeling unit is configured to:
    - take a waveform from the input speech signal filtered by said inverse filtering unit, by gradually shifting a window function in a time axis direction in a pitch period of the input speech signal, the window function having approximately twice a length of the pitch period;
      
      convert each waveform that is taken, into the representation of the frequency domain;
      
      calculate, for the each waveform, an amplitude spectrum from which phase information included in every frequency component is removed; and
      
      approximate the amplitude spectrum by using a function, so as to output, as parameterized voicing source information, a coefficient of the function used for the approximation.
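
The voicing source modeling steps of claims 2 to 4 and 10 can be sketched as follows. This is an illustration under several assumptions that the claims do not state: pitch marks are given, the pitch period is constant, the window is a Hann window, and the "function" is a straight line fitted separately below and above the boundary frequency. All names are hypothetical.

```python
import cmath
import math


def hann(length):
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / (length - 1))
            for i in range(length)]


def cut_waveforms(residual, pitch_marks, period):
    """Cut one windowed segment, about two pitch periods long, per pitch mark."""
    win = hann(2 * period + 1)
    segments = []
    for mark in pitch_marks:
        start = mark - period
        if start < 0 or start + len(win) > len(residual):
            continue  # skip marks too close to the signal edges
        segments.append([residual[start + i] * win[i] for i in range(len(win))])
    return segments


def amplitude_spectrum(segment):
    """DFT magnitudes for bins 0..N/2; the phase is discarded."""
    n = len(segment)
    return [abs(sum(segment[t] * cmath.exp(-2j * math.pi * b * t / n)
                    for t in range(n)))
            for b in range(n // 2 + 1)]


def fit_line(xs, ys):
    """Least-squares line; returns (slope, intercept, squared error)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx else 0.0
    intercept = my - slope * mx
    err = sum((slope * x + intercept - y) ** 2 for x, y in zip(xs, ys))
    return slope, intercept, err


def fit_two_bands(spec, boundary_candidates):
    """Try each candidate boundary bin (0 < b < len(spec)); keep the best fit."""
    best = None
    for b in boundary_candidates:
        low = fit_line(list(range(b)), spec[:b])
        high = fit_line(list(range(b, len(spec))), spec[b:])
        err = low[2] + high[2]
        if best is None or err < best[0]:
            best = (err, b, low[:2], high[:2])
    return best[1], best[2], best[3]
```

Each segment then reduces to a boundary bin plus two (slope, intercept) pairs, which is the kind of compact parameter set that the claims call parameterized voicing source information.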

11. A speech synthesizing apparatus that generates synthesized speech by using vocal tract information and voicing source information included in an input speech signal, said speech synthesizing apparatus comprising:
- a vocal tract information extracting unit configured to extract vocal tract information from the input speech signal;
  
  a filter smoothing unit configured to smooth, in a first time constant, the vocal tract information extracted by said vocal tract information extracting unit;
  
  an inverse filtering unit configured to calculate a filter having an inverse characteristic to a frequency response of the vocal tract information smoothed by said filter smoothing unit, and to filter the input speech signal by using the calculated filter;
  
  a voicing source modeling unit configured to take, from the input speech signal filtered by said inverse filtering unit, a waveform included in a second time constant shorter than the first time constant, and to calculate, for each waveform that is taken, parameterized voicing source information from the each waveform; and
  
  a synthesis unit configured to generate synthesized speech by generating a voicing source waveform by using a voicing source information parameter outputted from said voicing source modeling unit, and filtering the generated voicing source waveform by using the vocal tract information smoothed by said filter smoothing unit.
  - 12. The speech synthesizing apparatus according to claim 11, wherein said voicing source modeling unit is configured to take a waveform from the input speech signal filtered by said inverse filtering unit, by gradually shifting a window function in a time axis direction in a pitch period of the input speech signal, and to convert into a parameter each waveform that is taken, the window function having approximately twice a length of the pitch period, and said synthesis unit is configured to generate synthesized speech by:
    - generating a voicing source waveform by using the parameter outputted from said voicing source modeling unit;
      
      generating a temporally-continuous voicing source waveform by laying out the generated voicing source waveform so as to create overlaps of the generated voicing source waveform in the time axis direction; and
      
      filtering the generated temporally-continuous voicing source waveform by using the vocal tract information smoothed by said filter smoothing unit.
  - 13. The speech synthesizing apparatus according to claim 12, wherein said voicing source modeling unit is configured to convert the each waveform into a representation of a frequency domain, and to calculate, for the each waveform, an amplitude spectrum from which phase information included in every frequency component is removed, and said synthesis unit is configured to generate synthesized speech by:
    - converting the amplitude spectrum into a voicing source waveform represented by a time domain;
      
      generating a temporally-continuous voicing source waveform by laying out the voicing source waveform so as to create overlaps of the voicing source waveform in the time axis direction; and
      
      filtering the generated temporally-continuous voicing source waveform by using the vocal tract information smoothed by said filter smoothing unit.
  - 14. The speech synthesizing apparatus according to claim 13, wherein said voicing source modeling unit is further configured to approximate the amplitude spectrum by using a function, and to output, as parameterized voicing source information, the coefficient of the function used for the approximation, and said synthesis unit is configured to generate synthesized speech by:
    - restoring the amplitude spectrum from the function represented by the coefficient outputted from said voicing source modeling unit;
      
      converting the amplitude spectrum into a voicing source waveform represented by the time domain;
      
      generating a temporally-continuous voicing source waveform by laying out the voicing source waveform so as to create overlaps of the voicing source waveform in the time axis direction; and
      
      filtering the generated temporally-continuous voicing source waveform by using the vocal tract information smoothed by said filter smoothing unit.
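
The synthesis steps of claims 12 to 14 can be sketched as three small functions. The sketch assumes that the phase-free amplitude spectrum is turned back into a waveform by treating it as a zero-phase spectrum, and that the vocal tract filter is applied as an all-pole filter 1/A(z) with fixed coefficients. Both are illustrative choices and the names are hypothetical; a real system would update the filter frame by frame.

```python
import math


def zero_phase_waveform(amplitude, length):
    """Inverse DFT of a real, zero-phase spectrum (bins 0..length//2).

    The resulting symmetric pulse is centred in the output.
    """
    half = length // 2
    wave = []
    for t in range(length):
        shift = t - half
        acc = amplitude[0]
        for b in range(1, len(amplitude)):
            # The Nyquist bin of an even-length DFT has no mirror image.
            weight = 1.0 if (length % 2 == 0 and b == half) else 2.0
            acc += weight * amplitude[b] * math.cos(2.0 * math.pi * b * shift / length)
        wave.append(acc / length)
    return wave


def overlap_add(pulses, period):
    """Lay pulses out one pitch period apart and sum the overlapping parts."""
    if not pulses:
        return []
    total = period * (len(pulses) - 1) + max(len(p) for p in pulses)
    out = [0.0] * total
    for i, pulse in enumerate(pulses):
        for t, v in enumerate(pulse):
            out[i * period + t] += v
    return out


def all_pole_filter(excitation, a):
    """Filter the excitation with 1 / A(z), where a = [1, a1, ..., ap]."""
    out = []
    for n, x in enumerate(excitation):
        feedback = sum(a[j] * out[n - j] for j in range(1, len(a)) if n - j >= 0)
        out.append(x - feedback)
    return out


# Usage: three identical pulses, 8 samples apart, through a one-pole filter.
pulse = zero_phase_waveform([1.0, 0.8, 0.5, 0.3, 0.2, 0.1, 0.05, 0.02, 0.01], 16)
speech = all_pole_filter(overlap_add([pulse] * 3, 8), [1.0, -0.9])
```

Because each pulse is about two pitch periods long and consecutive pulses start one period apart, neighbouring pulses overlap by roughly half their length, which is what produces the temporally continuous source waveform.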

15. A voice quality conversion apparatus that converts a voice quality of an input speech signal, said voice quality conversion apparatus comprising:
- a vocal tract information extracting unit configured to extract vocal tract information from the input speech signal;
  
  a filter smoothing unit configured to smooth, in a first time constant, the vocal tract information extracted by said vocal tract information extracting unit;
  
  an inverse filtering unit configured to calculate a filter having an inverse characteristic to a frequency response of the vocal tract information smoothed by said filter smoothing unit, and to filter the input speech signal by using the calculated filter;
  
  a voicing source modeling unit configured to take, from the input speech signal filtered by said inverse filtering unit, a waveform included in a second time constant shorter than the first time constant, and to calculate, for each waveform that is taken, parameterized voicing source information from the each waveform;
  
  a target speech information holding unit configured to hold vocal tract information and the parameterized voicing source information on a target voice quality;
  
  a conversion ratio input unit configured to input a conversion ratio for converting the input speech signal into the target voice quality;
  
  a filter transformation unit configured to convert, at the conversion ratio inputted by said conversion ratio input unit, the vocal tract information smoothed by said filter smoothing unit into the vocal tract information on the target voice quality, which is held by said target speech information holding unit;
  
  a voicing source transformation unit configured to convert, at the conversion ratio inputted by said conversion ratio input unit, the voicing source information parameterized by said voicing source modeling unit into the voicing source information on the target voice quality, which is held by said target speech information holding unit; and
  
  a synthesis unit configured to generate synthesized speech by generating a voicing source waveform by using the parameterized voicing source information transformed by said voicing source transformation unit, and filtering the generated voicing source waveform by using the vocal tract information transformed by said filter transformation unit.
  - 16. The voice quality conversion apparatus according to claim 15, wherein said filter smoothing unit is configured to smooth the vocal tract information, through approximation using a polynomial or a regression line, in a time axis direction in a predetermined unit, the vocal tract information being extracted by said vocal tract information extracting unit, and said filter transformation unit is configured to convert, at the conversion ratio inputted by said conversion ratio input unit, a coefficient of the polynomial or the regression line into the vocal tract information on the target voice quality held by said target speech information holding unit, the polynomial or the regression line being used when the vocal tract information is approximated by said filter smoothing unit.
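
The smoothing and conversion described in claim 16 can be illustrated with a regression line. The sketch assumes that time is normalised to [0, 1] within each unit (for example a phoneme) so that source and target units of different lengths are comparable, and that "converting at the conversion ratio" means linear interpolation of the line coefficients. The claim does not spell out either point, and all names are hypothetical.

```python
def fit_regression_line(values):
    """Fit value = slope * t + intercept over normalised time t in [0, 1].

    `values` is the trajectory of one vocal tract parameter (for example one
    PARCOR coefficient) across the frames of a unit; needs at least 2 frames.
    """
    n = len(values)
    ts = [i / (n - 1) for i in range(n)]
    mt = sum(ts) / n
    mv = sum(values) / n
    stt = sum((t - mt) ** 2 for t in ts)
    stv = sum((t - mt) * (v - mv) for t, v in zip(ts, values))
    slope = stv / stt
    return slope, mv - slope * mt


def convert(source_line, target_line, ratio):
    """Interpolate line coefficients: ratio 0 keeps the source, 1 gives the target."""
    return tuple(s + ratio * (t - s) for s, t in zip(source_line, target_line))


def evaluate(line, num_frames):
    """Sample the line at num_frames equally spaced points (num_frames >= 2)."""
    slope, intercept = line
    return [slope * i / (num_frames - 1) + intercept for i in range(num_frames)]


# Usage: move a source trajectory halfway towards a stored target line.
source = fit_regression_line([0.1, 0.2, 0.3])
target = (0.0, 0.5)  # hypothetical stored (slope, intercept) for the target voice
halfway = evaluate(convert(source, target, 0.5), 3)
```

With a ratio between 0 and 1, every converted value is a weighted average of the smoothed source and target values at the same normalised time. If both trajectories stay inside (-1, 1), as stable reflection coefficients do, the converted trajectory does too.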

17. A method of separating an input speech signal into vocal tract information and voicing source information, said method comprising:
- extracting vocal tract information from the input speech signal;
  
  smoothing, in a first time constant, the vocal tract information extracted in said extracting;
  
  calculating a filter having an inverse characteristic to a frequency response of the vocal tract information smoothed in said smoothing, and filtering the input speech signal by using the calculated filter; and
  
  taking, from the input speech signal filtered in said calculating, a waveform included in a second time constant shorter than the first time constant, and calculating, for each waveform that is taken, voicing source information from the each waveform.

18. A program for separating an input speech signal into vocal tract information and voicing source information, said program causing a computer to execute:
- extracting vocal tract information from the input speech signal;
  
  smoothing, in a first time constant, the vocal tract information extracted in the extracting;
  
  calculating a filter having an inverse characteristic to a frequency response of the vocal tract information smoothed in the smoothing, and filtering the input speech signal by using the calculated filter; and
  
  taking, from the input speech signal filtered in the calculating, a waveform included in a second time constant shorter than the first time constant, and calculating, for each waveform that is taken, voicing source information from the each waveform.

Current Assignee: Sovereign Peak Ventures, LLC (Dominion Harbor Enterprises, LLC)
Original Assignee: Panasonic Corporation (Panasonic Holdings Corporation)
Inventors: Hirose, Yoshifumi; Kamai, Takahiro
Granted Patent: US 8,255,222 B2
US Class Current: 704/261
CPC Class Codes:
G10L 13/04   Details of speech synthesis...
G10L 19/04   using predictive techniques
G10L 19/06   Determination or coding of ...
G10L 19/08   Determination or coding of ...
G10L 21/02   Speech enhancement, e.g. no...
