Method, apparatus and computer program product for providing voice conversion using temporal dynamic features

US 20080262838A1
Filed: 04/17/2007
Published: 10/23/2008
Est. Priority Date: 04/17/2007
Status: Active Grant

Abstract

An apparatus for providing voice conversion using temporal dynamic features includes a feature extractor and a transformation element. The feature extractor may be configured to extract dynamic feature vectors from source speech. The transformation element may be in communication with the feature extractor and configured to apply a first conversion function to a signal including the extracted dynamic feature vectors to produce converted dynamic feature vectors. The first conversion function may have been trained using at least dynamic feature data associated with training source speech and training target speech. The transformation element may be further configured to produce converted speech based on an output of applying the first conversion function.
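
As a rough illustration of the flow the abstract describes (extract dynamic features, apply a trained conversion function, pass the output on to synthesis), the following Python sketch may help. It is not taken from the patent: the first-difference definition of dynamic features, the affine least-squares conversion function, and the names `delta`, `train_affine`, and `apply_affine` are all assumptions chosen for brevity, and the data is synthetic.

```python
import numpy as np

def delta(frames):
    """Assumed dynamic features: first-order difference along time, zero for frame 0."""
    d = np.zeros_like(frames)
    d[1:] = frames[1:] - frames[:-1]
    return d

def train_affine(src, tgt):
    """Fit tgt ~= [src, 1] @ W by least squares (hypothetical conversion model)."""
    X = np.hstack([src, np.ones((len(src), 1))])
    W, *_ = np.linalg.lstsq(X, tgt, rcond=None)
    return W

def apply_affine(W, feats):
    """Apply the fitted affine map to a sequence of feature vectors."""
    X = np.hstack([feats, np.ones((len(feats), 1))])
    return X @ W

rng = np.random.default_rng(0)

# Synthetic, time-aligned training data standing in for source and target speech features.
train_src = rng.normal(size=(200, 4)).cumsum(axis=0)
train_tgt = 1.5 * train_src + 0.3

# Train the conversion function on dynamic feature data only.
W_dyn = train_affine(delta(train_src), delta(train_tgt))

# Convert the dynamic features of new source speech.
source = rng.normal(size=(50, 4)).cumsum(axis=0)
converted_dyn = apply_affine(W_dyn, delta(source))
```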

23 Claims

1. A method comprising:
- extracting dynamic feature vectors from source speech;
  
  applying a first conversion function to a signal including the extracted dynamic feature vectors to produce converted dynamic feature vectors, the first conversion function having been trained using at least dynamic feature data associated with training source speech and training target speech; and
  
  producing converted speech based on an output of applying the first conversion function.
- Dependent claims (2, 3, 4, 5, 6, 7):
  - 2. A method according to claim 1, further comprising an initial operation of training a conversion model to obtain the first conversion function.
  - 3. A method according to claim 2, wherein training the conversion model comprises:
    - extracting static and dynamic feature data from both training source data and training target data;
      
      utilizing the static feature data from both the training source data and the training target data to train a second conversion model; and
      
      utilizing the dynamic feature data from both the training source data and the training target data to train the first conversion model.
  - 4. A method according to claim 3, wherein applying the first conversion function further comprises:
    - applying the second conversion function to static feature vectors extracted from source speech; and
      
      combining an output of the first conversion function and the second conversion function for use in producing the converted speech.
  - 5. A method according to claim 2, wherein training the first conversion model comprises:
    - extracting static and dynamic feature data from both training source data and training target data;
      
      combining the static and dynamic feature data to form general feature data; and
      
      utilizing the general feature data to train the first conversion model.
  - 6. A method according to claim 1, wherein producing the converted speech further comprises integrating a result of the applying the conversion function to estimate converted static features and combining the result of the applying the conversion function and the estimated converted static features for use in converted speech production.
  - 7. A method according to claim 1, further comprising:
    - extracting static feature vectors from source speech; and
      
      combining the static feature vectors and the dynamic feature vectors to produce a general feature vector, wherein applying the first conversion function comprises applying the first conversion function to the general feature vector for use in producing the converted speech.
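
Claim 6 recites integrating the converted dynamic features to estimate converted static features. The claim text does not say how the integration is done, so the sketch below simply assumes a cumulative sum from a known starting frame, and assumes the two results are combined by concatenation. The function name `integrate_deltas` is hypothetical, and the `deltas` array stands in for the output of the conversion function.

```python
import numpy as np

def integrate_deltas(deltas, start):
    """Estimate static features by cumulatively summing deltas from `start` (assumed scheme)."""
    return start + np.cumsum(deltas, axis=0)

# Toy static trajectory: 4 frames, 2 coefficients per frame.
static = np.array([[1.0, 2.0],
                   [1.5, 2.5],
                   [2.5, 2.0],
                   [2.0, 3.0]])

# First differences, with a zero row for the first frame.
# In a real system these would be the converted dynamic features.
deltas = np.diff(static, axis=0, prepend=static[:1])

# Integrate to recover a static trajectory, then combine both feature types.
estimated_static = integrate_deltas(deltas, static[0])
combined = np.hstack([estimated_static, deltas])
```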

8. A computer program product comprising at least one computer-readable storage medium having computer-readable program code portions stored therein, the computer-readable program code portions comprising:
- a first executable portion for extracting dynamic feature vectors from source speech;
  
  a second executable portion for applying a first conversion function to a signal including the extracted dynamic feature vectors to produce converted dynamic feature vectors, the first conversion function having been trained using at least dynamic feature data associated with training source speech and training target speech; and
  
  a third executable portion for producing converted speech based on an output of applying the first conversion function.
- Dependent claims (9, 10, 11, 12, 13, 14):
  - 9. A computer program product according to claim 8, further comprising a fourth executable portion for an initial operation of training a conversion model to obtain the first conversion function.
  - 10. A computer program product according to claim 9, wherein the fourth executable portion includes instructions for:
    - extracting static and dynamic feature data from both training source data and training target data;
      
      utilizing the static feature data from both the training source data and the training target data to train a second conversion model; and
      
      utilizing the dynamic feature data from both the training source data and the training target data to train the first conversion model.
  - 11. A computer program product according to claim 10, wherein the second executable portion includes instructions for:
    - applying the second conversion function to static feature vectors extracted from source speech; and
      
      combining an output of the first conversion function and the second conversion function for use in producing the converted speech.
  - 12. A computer program product according to claim 9, wherein the fourth executable portion includes instructions for:
    - extracting static and dynamic feature data from both training source data and training target data;
      
      combining the static and dynamic feature data to form general feature data; and
      
      utilizing the general feature data to train the first conversion model.
  - 13. A computer program product according to claim 8, wherein the third executable portion includes instructions for integrating a result of the applying the conversion function to estimate converted static features and combining the result of the applying the conversion function and the estimated converted static features for use in converted speech production.
  - 14. A computer program product according to claim 8, further comprising:
    - a fourth executable portion for extracting static feature vectors from source speech; and
      
      a fifth executable portion for combining the static feature vectors and the dynamic feature vectors to produce a general feature vector, wherein the second executable portion includes instructions for applying the first conversion function to the general feature vector for use in producing the converted speech.
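
Claims 12 and 14 describe combining static and dynamic features into a general feature vector and using a single conversion model on it. A minimal sketch of that variant follows, assuming frame-wise concatenation as the combining step and an affine least-squares fit as the model. Neither choice comes from the claims; `general_features` is a hypothetical helper and the data is synthetic.

```python
import numpy as np

def general_features(static):
    """Concatenate static frames with their first differences (assumed dynamic features)."""
    dynamic = np.diff(static, axis=0, prepend=static[:1])
    return np.hstack([static, dynamic])

rng = np.random.default_rng(1)

# Synthetic, time-aligned static features for training source and target data.
src = rng.normal(size=(100, 3)).cumsum(axis=0)
tgt = 0.8 * src - 0.2

# Build general feature data for both sides.
X = general_features(src)
Y = general_features(tgt)

# Train one affine model on the general feature data, then apply it.
X1 = np.hstack([X, np.ones((len(X), 1))])
W, *_ = np.linalg.lstsq(X1, Y, rcond=None)
converted = X1 @ W
```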

15. An apparatus comprising:
- a feature extractor configured to extract dynamic feature vectors from source speech; and
  
  a transformation element in communication with the feature extractor and configured to apply a first conversion function to a signal including the extracted dynamic feature vectors to produce converted dynamic feature vectors, the first conversion function having been trained using at least dynamic feature data associated with training source speech and training target speech, and produce converted speech based on an output of applying the first conversion function.
- Dependent claims (16, 17, 18, 19, 20, 21):
  - 16. An apparatus according to claim 15, further comprising a training element in communication with the transformation element, the training element being configured for an initial operation of training a conversion model to obtain the first conversion function.
  - 17. An apparatus according to claim 16, wherein the feature extractor is further configured to extract static and dynamic feature data from both training source data and training target data;
    - and wherein the training element is configured to utilize the static feature data from both the training source data and the training target data to train a second conversion model, and to utilize the dynamic feature data from both the training source data and the training target data to train the first conversion model.
  - 18. An apparatus according to claim 17, wherein the transformation element is further configured to:
    - apply the second conversion function to static feature vectors extracted from source speech; and
      
      combine an output of the first conversion function and an output of the second conversion function for use in producing the converted speech.
  - 19. An apparatus according to claim 16, wherein the feature extractor is configured to extract static and dynamic feature data from both training source data and training target data, and wherein the transformation element is configured to:
    - combine the static and dynamic feature data to form general feature data; and
      
      utilize the general feature data to train the first conversion model.
  - 20. An apparatus according to claim 15, wherein the transformation element is further configured to integrate a result of applying the conversion function to estimate converted static features and combining the result of the applying the conversion function and the estimated converted static features for use in converted speech production.
  - 21. An apparatus according to claim 15, wherein the feature extractor is configured to extract static feature vectors from source speech, and wherein the transformation element is configured to combine the static feature vectors and the dynamic feature vectors to produce a general feature vector, and to apply the first conversion function to the general feature vector for use in producing the converted speech.
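
The apparatus of claims 15 to 18 can be pictured as two cooperating components, with separate conversion functions for static and dynamic features. The sketch below is only one possible reading: the class names mirror the claim terms, but the fixed gain matrices stand in for trained conversion functions, first differences stand in for dynamic features, and concatenation stands in for the unspecified combining step.

```python
import numpy as np

class FeatureExtractor:
    """Extracts static frames and assumed dynamic features (first differences)."""

    def extract(self, frames):
        static = np.asarray(frames, dtype=float)
        dynamic = np.diff(static, axis=0, prepend=static[:1])
        return static, dynamic

class TransformationElement:
    """Applies one conversion function per feature type and combines the outputs."""

    def __init__(self, static_gain, dynamic_gain):
        self.static_gain = static_gain    # stand-in for the second conversion function
        self.dynamic_gain = dynamic_gain  # stand-in for the first conversion function

    def convert(self, static, dynamic):
        converted_static = static @ self.static_gain
        converted_dynamic = dynamic @ self.dynamic_gain
        return np.hstack([converted_static, converted_dynamic])

extractor = FeatureExtractor()
static, dynamic = extractor.extract([[1, 2], [2, 4], [4, 5]])

element = TransformationElement(static_gain=2.0 * np.eye(2),
                                dynamic_gain=0.5 * np.eye(2))
out = element.convert(static, dynamic)
```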

22. An apparatus comprising:
- means for extracting dynamic feature vectors from source speech;
  
  means for applying a first conversion function to a signal including the extracted dynamic feature vectors to produce converted dynamic feature vectors, the first conversion function having been trained using at least dynamic feature data associated with training source speech and training target speech; and
  
  means for producing converted speech based on an output of applying the first conversion function.
- Dependent claim (23):
  - 23. An apparatus according to claim 22, further comprising means for an initial operation of training a conversion model to obtain the first conversion function.

Current Assignee
WSOU Investments, LLC (WSOU Holdings, LLC)
Original Assignee
Nokia Corporation
Inventors
Tian, Jilei; Nurminen, Jani K.; Popa, Victor

Granted Patent

US 7,848,924 B2
US Class Current

704/222
CPC Class Codes

G10L 13/033 Voice editing, e.g. manipul...