Voice signal conversion method and system

US 7,765,101 B2
Filed: 03/09/2005
Issued: 07/27/2010
Est. Priority Date: 03/31/2004
Status: Expired due to Fees

Abstract

A method of converting a voice signal spoken by a source speaker into a converted voice signal having acoustic characteristics that resemble those of a target speaker. The method includes the following steps: determining (1) at least one function for the transformation of the acoustic characteristics of the source speaker into acoustic characteristics similar to those of the target speaker; and transforming the acoustic characteristics of the voice signal to be converted using the at least one transformation function. The method is characterized in that: (i) the aforementioned transformation function-determining step (1) consists in determining (1) a function for the joint transformation of characteristics relating to the spectral envelope and characteristics relating to the fundamental frequency of the source speaker; and (ii) the transformation includes the application of the joint transformation function.
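As a rough illustration of what handling the spectral envelope and the fundamental frequency jointly can look like, the sketch below builds one vector per frame by appending a pitch value to that frame's cepstral coefficients (the concatenation named in claim 1), with the pitch normalized relative to its average (as in claim 5). The function name `joint_vectors`, the use of log-pitch, and normalization by subtracting the mean are assumptions made for this example; the text here does not specify them.

```python
import math

def joint_vectors(cepstra, f0_hz):
    """Append a normalized log-pitch value to each frame's cepstral vector.

    cepstra: list of per-frame cepstral coefficient lists (voiced frames).
    f0_hz:   list of per-frame pitch values in Hz, all > 0.
    Assumption: pitch is normalized as log(f0) minus the mean log(f0).
    """
    if not f0_hz or len(cepstra) != len(f0_hz):
        raise ValueError("need at least one frame and one pitch value per frame")
    log_f0 = [math.log(f) for f in f0_hz]
    mean_log_f0 = sum(log_f0) / len(log_f0)
    # One joint vector per frame: [c_1, ..., c_p, normalized log-pitch]
    return [list(c) + [lf - mean_log_f0] for c, lf in zip(cepstra, log_f0)]

# Two toy frames with two cepstral coefficients each
frames = joint_vectors([[1.0, 0.5], [0.8, 0.3]], [100.0, 400.0])
```

Building the same kind of vector for the source and target speakers gives the paired data from which a joint model could then be estimated.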

38 Citations

16 Claims

1. A method of converting a voice signal as spoken by a source speaker into a converted voice signal the acoustic characteristics thereof resemble those of a target speaker, the method comprising:
- a determination step of determining a function for transforming acoustic characteristics of the source speaker into acoustic characteristics close to those of the target speaker on the basis of samples of the voices of the source and target speakers, and
- a transformation step of transforming acoustic characteristics of the source speaker voice signal to be converted by applying said transformation function,
- wherein said determination step comprises a step of determining a function for conjoint transformation of characteristics of the source speaker relating to the spectral envelope and of characteristics of the source speaker relating to the pitch and wherein said transformation step comprises applying said conjoint transformation function,
- wherein said step of determining a conjoint transformation function comprises,
  - a step of analyzing source and target speaker voice samples grouped into frames to obtain for each frame information relating to the spectral envelope and to the pitch,
  - a step of concatenating information relating to the spectral envelope and information relating to the pitch for each of the source and target speakers,
  - a step of determining a model representing common acoustic characteristics of source speaker and target speaker voice samples, and
  - a step of determining said conjoint transformation function from said model and the voice samples, and
- wherein said steps of analyzing the source and target speaker voice samples are adapted to produce said information relating to the spectral envelope in the form of cepstral coefficients.
  - 2. A method according to claim 1, wherein said analysis steps comprise respectively a step of achieving voice samples models as a summation of an harmonic signal and noise, each achieving step comprising:
    - a substep of estimating the pitch of the voice samples,
    - a substep of synchronized analysis of the pitch of each frame, and
    - a substep of estimating spectral envelope parameters of each frame.
  - 3. A method according to claim 1, wherein said step of determining a model determines a Gaussian probability density mixture model.
  - 4. A method according to claim 3, wherein said step of determining a model comprises:
    - a substep of determining a model corresponding to a mixture of Gaussian probability densities, and
    - a substep of estimating parameters of the mixture of Gaussian probability densities from an estimated maximum likelihood between the acoustic characteristics of the source and target speaker samples and the model.
  - 5. A method according to claim 1, wherein said step of determining at least one transformation function further includes a step of normalizing the pitch of the frames of source and target speaker samples relative to average values of the pitch of the analyzed source and target speaker samples.
  - 6. A method according to claim 1, including a step of temporally aligning the acoustic characteristics of the source speaker with the acoustic characteristics of the target speaker, this step being executed before said step of determining a conjoint model.
  - 7. A method according to claim 1, including a step of separating voiced frames and non-voiced frames in the source speaker and target speaker voice samples, said step of determining a conjoint transformation function of the characteristics relating to the spectral envelope and to the pitch being based only on said voiced frames and the method including a step of determining a function for transformation of only the spectral envelope characteristics on the basis only of said non-voiced frames.
  - 8. A method according to claim 7, including a step of separating voiced frames and non-voiced frames in the source speaker and target speaker voice samples, said step of determining a conjoint transformation function of the characteristics relating to the spectral envelope and to the pitch being based entirely on said voiced frames and the method including a step of determining a function for transformation of only the spectral envelope characteristics on the basis only of said non-voiced frames, and including a step of separating voiced frames and non-voiced frames in said voice signal to be converted, said transformation step comprising:
    - a substep of applying said conjoint transformation function only to voiced frames of said signal to be converted, and
    - a substep of applying said transformation function of the spectral envelope characteristics only to non-voiced frames of said signal to be converted.
  - 9. A method according to claim 1, wherein said step of determining at least one transformation function comprises only said step of determining a conjoint transformation function.
  - 10. A method according to claim 1, wherein said step of determining a conjoint transformation function is achieved on the basis of an estimate of the acoustic characteristics of the target speaker, the achievement of the acoustic characteristics of the source speaker being known.
  - 11. A method according to claim 10, wherein said estimate is the conditional expectation of the acoustic characteristics of the target speaker the achievement of the acoustic characteristics of the source speaker being known.
  - 12. A method according to claim 1, wherein said step of transforming acoustic characteristics of the voice signal to be converted includes:
    - a step of analyzing said voice signal, grouped into frames, to obtain for each frame information relating to the spectral envelope and to the pitch,
    - a step of formatting the acoustic information relating to the spectral envelope and to the pitch of the voice signal to be converted, and
    - a step of transforming the formatted acoustic information of the voice signal to be converted using said conjoint transformation function.
  - 13. A method according to claim 12, wherein said step of determining a transformation function comprises only said step of determining a conjoint transformation function, and wherein said transformation step comprises applying said conjoint transformation function to the acoustic characteristics of all the frames of said voice signal to be converted.
  - 14. A method according to claim 1, further including a step of synthesizing a converted voice signal from said transformed acoustic information.
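
Claims 3, 10 and 11 together describe a Gaussian mixture model over the source and target characteristics and a transformation given by the conditional expectation of the target characteristics, the source characteristics being known. The block below writes out the standard form this takes in Gaussian mixture regression. It is a sketch under assumptions, not a formula quoted from the patent: the symbols x (source frame vector of cepstral coefficients and pitch), y (the corresponding target vector), Q (number of mixture components), and α, μ, Σ (component weights, means, and covariances) are notation introduced here.

```latex
\begin{align}
z &= \begin{bmatrix} x \\ y \end{bmatrix}, \qquad
p(z) = \sum_{i=1}^{Q} \alpha_i \, \mathcal{N}(z;\, \mu_i, \Sigma_i), \qquad
\mu_i = \begin{bmatrix} \mu_i^{x} \\ \mu_i^{y} \end{bmatrix}, \quad
\Sigma_i = \begin{bmatrix} \Sigma_i^{xx} & \Sigma_i^{xy} \\ \Sigma_i^{yx} & \Sigma_i^{yy} \end{bmatrix} \\
F(x) &= E[\, y \mid x \,]
      = \sum_{i=1}^{Q} h_i(x) \left[ \mu_i^{y} + \Sigma_i^{yx} \left( \Sigma_i^{xx} \right)^{-1} \left( x - \mu_i^{x} \right) \right] \\
h_i(x) &= \frac{\alpha_i \, \mathcal{N}(x;\, \mu_i^{x}, \Sigma_i^{xx})}
               {\sum_{j=1}^{Q} \alpha_j \, \mathcal{N}(x;\, \mu_j^{x}, \Sigma_j^{xx})}
\end{align}
```

Because x and y each contain both the cepstral coefficients and the pitch, a single function F of this form would transform the spectral envelope and the pitch at the same time, which is the sense of "conjoint" in claim 1.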

15. A system for converting a voice signal as spoken by a source speaker into a converted voice signal the acoustic characteristics thereof resemble ones of a target speaker, the system comprising:
- means for determining at least one function for transforming acoustic characteristics of the source speaker into acoustic characteristics similar to ones of the target speaker on the basis of voice samples as spoken by the source and target speakers;
- means for transforming acoustic characteristics of the source speaker voice signal to be converted by applying said transformation function, wherein said means for determining at least one transformation function comprise a unit for determining a function for conjoint transformation of characteristics of the source speaker relating to the spectral envelope and of characteristics of the source speaker relating to the pitch and wherein said transformation means include for applying said conjoint transformation function;
- means for analyzing the voice signal to be converted, adapted to produce information relating to the spectral envelope in the form of cepstral coefficients and relating to the pitch of the voice signal to be converted; and
- synthesizer means for forming a converted voice signal from at least said spectral envelope and pitch information transformed simultaneously.
  - 16. A system according to claim 15, wherein said means for determining an acoustic characteristic transformation function further include a unit for determining at least one transformation function for the spectral envelope of non-voiced frames, said unit for determining the conjoint transformation function being adapted to determine the conjoint transformation function only for voiced frames.
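
The split between voiced and non-voiced frames in claims 7, 8 and 16 can be pictured as a simple per-frame dispatch: the conjoint function is applied to voiced frames, and a spectral-envelope-only function to non-voiced frames. The sketch below is illustrative only. The name `convert_frames`, the convention of marking non-voiced frames with a pitch of `None`, and the two toy transforms are all hypothetical stand-ins, not anything specified in the claims.

```python
def convert_frames(frames, joint_fn, envelope_fn):
    """Route each frame to the transform that matches its voicing.

    frames:      list of (cepstrum, f0) pairs; f0 is None for non-voiced
                 frames (a hypothetical convention for this sketch).
    joint_fn:    (cepstrum, f0) -> (cepstrum, f0), for voiced frames.
    envelope_fn: cepstrum -> cepstrum, for non-voiced frames.
    """
    out = []
    for cepstrum, f0 in frames:
        if f0 is not None:
            # Voiced: envelope and pitch are transformed together
            out.append(joint_fn(cepstrum, f0))
        else:
            # Non-voiced: only the spectral envelope is transformed
            out.append((envelope_fn(cepstrum), None))
    return out

# Toy stand-ins for trained transformation functions (assumptions)
def toy_joint(cepstrum, f0):
    return [2 * v for v in cepstrum], f0 * 1.5

def toy_envelope(cepstrum):
    return [v + 1 for v in cepstrum]

converted = convert_frames(
    [([1.0, 2.0], 100.0), ([0.5, 0.5], None)], toy_joint, toy_envelope
)
```

A synthesizer stage, as in claims 14 and 15, would then form the converted voice signal from the transformed per-frame information.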

Current Assignee: Orange S.A.
Original Assignee: Orange S.A.
Inventors: En-Najjary, Taoufik; Rosec, Olivier
Primary Examiner(s): Chawan, Vijay B
Application Number: US10/594,396
Publication Number: US 20070208566A1
Time in Patent Office: 1,966 Days
Field of Search: 704/246, 704/206, 704/231, 704/251, 704/270-278, 704/256, 704/220, 704/221, 704/207
US Class Current: 704/246
CPC Class Codes:
G10L 13/033   Voice editing, e.g. manipul...
G10L 2021/0135   Voice conversion or morphing
G10L 21/00   Speech or voice signal proc...
