Multilingual text-to-speech system with limited resources

US 7,596,499 B2
Filed: 02/02/2004
Issued: 09/29/2009
Est. Priority Date: 02/02/2004
Status: Expired due to Fees

6 Assignments

0 Petitions

Abstract

A multilingual text-to-speech system includes a source datastore of primary source parameters providing information about a speaker of a primary language. A plurality of primary filter parameters provides information about sounds in the primary language. A plurality of secondary filter parameters provides information about sounds in a secondary language. One or more secondary filter parameters is normalized to the primary filter parameters and mapped to a primary source parameter.
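
The normalization and mapping described in the abstract can be illustrated with a minimal sketch. The patent text shown here does not specify how either step is computed, so everything below is an assumption made for illustration: filter parameters are modeled as small feature vectors (for example, formant frequencies in Hz), normalization is a simple mean-offset shift, and mapping picks the source parameter of the nearest primary sound. The names `FilterParam`, `normalize`, and `map_to_primary_source` are hypothetical.

```python
from dataclasses import dataclass
from math import dist


@dataclass(frozen=True)
class FilterParam:
    sound: str                 # label of the sound, e.g. a phoneme symbol
    values: tuple[float, ...]  # hypothetical feature vector (e.g. formants in Hz)


def mean_vector(params):
    """Component-wise mean of the feature vectors."""
    n = len(params)
    dims = len(params[0].values)
    return tuple(sum(p.values[d] for p in params) / n for d in range(dims))


def normalize(secondary, primary):
    """Assumed normalization: shift secondary vectors so their mean matches the primary mean."""
    p_mean = mean_vector(primary)
    s_mean = mean_vector(secondary)
    offset = tuple(p - s for p, s in zip(p_mean, s_mean))
    return [
        FilterParam(s.sound, tuple(v + o for v, o in zip(s.values, offset)))
        for s in secondary
    ]


def map_to_primary_source(secondary, primary, source_for_sound):
    """Assumed mapping: reuse the source parameter of the closest primary sound."""
    mapping = {}
    for s in secondary:
        nearest = min(primary, key=lambda p: dist(p.values, s.values))
        mapping[s.sound] = source_for_sound[nearest.sound]
    return mapping


# Illustrative values only; they are not taken from the patent.
primary = [FilterParam("a", (700.0, 1200.0)), FilterParam("i", (300.0, 2300.0))]
secondary = [FilterParam("y", (350.0, 1900.0)), FilterParam("o", (550.0, 1000.0))]
normalized = normalize(secondary, primary)
mapping = map_to_primary_source(normalized, primary, {"a": "src_a", "i": "src_i"})
```

In this sketch the secondary language needs no source parameters of its own: each normalized secondary sound borrows a source parameter already stored for the primary speaker, which is the resource saving the title refers to.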

275 Citations

36 Claims

1. A multilingual text-to-speech system, comprising:
- a source datastore of primary source parameters providing information mainly about a speaker of a primary language;
  
  a plurality of primary filter parameters providing information mainly about sounds in the primary language; and
  
  a plurality of secondary filter parameters providing information mainly about sounds in a secondary language, wherein at least one secondary filter parameter of the plurality of secondary filter parameters is normalized to the plurality of primary filter parameters based on similarities between a) voice characteristics of the sounds whose information is provided by the plurality of primary filter parameters and b) voice characteristics of the sounds whose information is provided by the at least one secondary filter parameter, wherein the at least one secondary filter parameter is mapped to a primary source parameter.
- Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18)
  - 2. The system of claim 1, further comprising a normalization module adapted to normalize the secondary filter parameters to the primary filter parameters.
  - 3. The system of claim 1, further comprising a mapping module adapted to map the secondary filter parameters to the primary source parameters based on linguistic similarities between target sounds in the secondary language and primary source parameters in the primary language.
  - 4. The system of claim 1, further comprising:
    - an input receptive of text; and
      
      a speech synthesizer adapted to convert the text to speech based on said primary filter parameters and said secondary filter parameters.
  - 5. The system of claim 1, wherein said secondary filter parameters are selected based on at least one of their relationships to sounds not present in the primary language and their dissimilarities to said primary filter parameters.
  - 6. The system of claim 1, further comprising:
    - a similarity assessment module adapted to assess linguistic similarity between target sounds in the secondary language and primary source parameters in the primary language;
      
      a memory management module adapted to compare the linguistic similarities to a linguistic similarity threshold, store secondary source parameters providing information mainly about a speaker in the second language in memory based on linguistic similarity between the secondary source parameters and target sounds exhibiting linguistic similarities falling below the predetermined threshold; and
      
      a mapping module adapted to map secondary filter parameters providing information mainly about the target sounds exhibiting linguistic similarities falling below the predetermined threshold to the secondary source parameters based on linguistic similarity.
  - 7. The system of claim 1, further comprising a plurality of primary prosody parameters, wherein at least one secondary filter parameter is mapped to a primary prosody parameter.
  - 8. The system of claim 7, further comprising a plurality of secondary prosody parameters selected to supplement said primary prosody parameters, wherein at least one secondary filter parameter is mapped to a secondary prosody parameter.
  - 9. The system of claim 1, further comprising:
    - a parameter output adapted to transmit an amount of available local memory and information relating to linguistic parameters stored in local memory to a supply of additional linguistic parameters not stored in local memory; and
      
      a parameter input receptive of additional linguistic parameters preselected based on the amount of available local memory, including additional filter parameters pre-mapped to said primary source parameters.
  - 10. The system of claim 9, wherein the additional filter parameters are pre-normalized to said primary filter parameters.
  - 11. The system of claim 9, wherein said parameter output is adapted to transmit a user-specified quality preference, and the additional linguistic parameters are preselected based on the user-specified quality preference.
  - 12. The system of claim 9, wherein the additional filter parameters are pre-mapped to primary prosody parameters stored in local memory.
  - 13. The system of claim 12, wherein the additional linguistic parameters include additional prosody parameters pre-selected to supplement the primary prosody parameters based on the amount of available local memory.
  - 14. The system of claim 1, further comprising an input receptive of an initial set of secondary filter parameters.
  - 15. The system of claim 14, further comprising a similarity assessment module adapted to assess similarity between the initial set of secondary filter parameters and said primary filter parameters.
  - 16. The system of claim 15, further comprising a memory management module adapted to compare similarity of the initial set of secondary filter parameters to a similarity threshold, to select a portion of the secondary filter parameters based on the comparison, to store the portion of the secondary filter parameters that are selected in a memory resource, and to discard an unselected portion of the initial set of secondary filter parameters.
  - 17. The system of claim 16, wherein the similarity threshold is selected to ensure that the secondary filter parameters of the initial set that are related to sounds not present in the primary language are not discarded.
  - 18. The system of claim 16, wherein said memory management module is adapted to monitor use of the memory resource and to dynamically adjust the similarity threshold based on scarcity of the memory resource.
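
Claims 15 to 18 describe a memory management module that filters an initial set of secondary filter parameters against a similarity threshold and adjusts that threshold as memory becomes scarce. A minimal sketch of that behavior follows. The claims do not state the similarity scale, the direction of adjustment, or any numeric values, so these are assumptions: similarity is a score in [0, 1], parameters below the threshold (the dissimilar ones) are kept, sounds absent from the primary language are always kept, and the threshold is lowered when memory usage is high. All function and parameter names are hypothetical.

```python
def select_secondary(initial, similarity, threshold, novel_sounds=frozenset()):
    """Keep parameters that are dissimilar enough to the primary set.

    initial      -- dict: sound label -> secondary filter parameter
    similarity   -- dict: sound label -> similarity to primary parameters, in [0, 1]
    novel_sounds -- labels of sounds not present in the primary language;
                    these are never discarded (assumed reading of claim 17)
    """
    kept = {
        name: param
        for name, param in initial.items()
        if similarity[name] < threshold or name in novel_sounds
    }
    discarded = [name for name in initial if name not in kept]
    return kept, discarded


def adjust_threshold(threshold, used, capacity, step=0.1, high=0.9, low=0.5):
    """Assumed policy: tighten the threshold when memory is scarce, relax it when plentiful."""
    usage = used / capacity
    if usage > high:
        return max(0.0, threshold - step)  # keep fewer parameters
    if usage < low:
        return min(1.0, threshold + step)  # room to keep more parameters
    return threshold


# Illustrative values only.
initial = {"y": (400.0, 2200.0), "a2": (690.0, 1210.0), "th": (500.0, 1500.0)}
similarity = {"y": 0.2, "a2": 0.9, "th": 0.8}
kept, discarded = select_secondary(initial, similarity, 0.5, novel_sounds={"th"})
```

Here `a2` is discarded because it closely resembles an existing primary parameter, while `th` survives despite its high score because it is flagged as a sound the primary language lacks.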

19. A method of operation for use with a multilingual text-to-speech system, comprising:
- accessing primary source parameters providing information mainly about a speaker of a primary language;
  
  accessing primary filter parameters providing information mainly about sounds in the primary language;
  
  accessing secondary filter parameters providing information mainly about sounds in a secondary language, wherein at least one secondary filter parameter of the secondary filter parameters is normalized to the primary filter parameters based on similarities between a) voice characteristics of the sounds whose information is provided by the primary filter parameters and b) voice characteristics of the sounds whose information is provided by the at least one secondary filter parameter, wherein the at least one secondary filter parameter is mapped to a primary source parameter;
  
  receiving text; and
  
  converting the text to speech based on the primary filter parameters and the secondary filter parameters.
- Dependent Claims (20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35)
  - 20. The method of claim 19, further comprising normalizing the secondary filter parameters to the primary filter parameters.
  - 21. The method of claim 19, further comprising mapping the primary source parameters to the secondary filter parameters based on linguistic similarities between target sounds in the secondary language and primary source parameters in the primary language.
  - 22. The method of claim 19, further comprising receiving an initial set of secondary filter parameters.
  - 23. The method of claim 19, further comprising selecting the secondary filter parameters based on at least one of their relationships to sounds not present in the primary language and their dissimilarities to the primary filter parameters.
  - 24. The method of claim 19, further comprising:
    - assessing linguistic similarity between target sounds in the secondary language and primary source parameters in the primary language;
      
      comparing the linguistic similarities to a linguistic similarity threshold;
      
      storing secondary source parameters providing information mainly about a speaker in the second language in memory based on linguistic similarity between the secondary source parameters and target sounds exhibiting linguistic similarities falling below the predetermined threshold; and
      
      mapping secondary filter parameters providing information mainly about target sounds exhibiting linguistic similarities falling below the predetermined threshold to the secondary source parameters based on linguistic similarity.
  - 25. The method of claim 19, further comprising:
    - accessing a plurality of primary prosody parameters; and
      
      mapping at least one secondary filter parameter to the primary prosody parameters.
  - 26. The method of claim 25, further comprising:
    - accessing a plurality of secondary prosody parameters selected to supplement said primary prosody parameters; and
      
      mapping at least one secondary filter parameter to said secondary prosody parameters.
  - 27. The method of claim 19, further comprising assessing similarity between the initial set of secondary filter parameters and the primary filter parameters.
  - 28. The method of claim 27, further comprising:
    - comparing similarity of the initial set of secondary filter parameters to a similarity threshold;
      
      selecting a portion of the secondary filter parameters based on the comparison;
      
      storing the portion of the secondary filter parameters that are selected in a memory resource; and
      
      discarding an unselected portion of the initial set of secondary filter parameters.
  - 29. The method of claim 28, further comprising selecting the similarity threshold to ensure that the secondary filter parameters of the initial set that are related to sounds not present in the primary language are not discarded.
  - 30. The method of claim 28, further comprising:
    - monitoring use of the memory resource; and
      
      dynamically adjusting the similarity threshold based on scarcity of the memory resource.
  - 31. The method of claim 19, further comprising:
    - transmitting an amount of available local memory and information relating to linguistic parameters stored in local memory to a supply of additional linguistic parameters not stored in local memory; and
      
      receiving additional linguistic parameters preselected based on the amount of available local memory, including additional filter parameters pre-mapped to said primary source parameters.
  - 32. The method of claim 31, wherein the additional filter parameters are pre-normalized to said primary filter parameters.
  - 33. The system of claim 31, further comprising transmitting a user-specified quality preference, wherein the additional linguistic parameters are further preselected based on the user-specified quality preference.
  - 34. The method of claim 31, wherein the additional filter parameters are pre-mapped to primary prosody parameters stored in local memory.
  - 35. The method of claim 34, wherein the additional linguistic parameters include additional prosody parameters pre-selected to supplement the primary prosody parameters based on the amount of available local memory.
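
Claims 31 to 35 describe a device reporting its available local memory, its stored parameters, and optionally a quality preference to a remote supply, which then preselects additional parameters. The sketch below shows one way the supply side could do that preselection. The claims do not specify a selection algorithm, so the greedy benefit-per-byte ordering, the `quality` values, and the rule that a non-"high" preference uses only half the reported memory are all assumptions; `ParamPack` and `preselect` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ParamPack:
    name: str          # identifier of a bundle of linguistic parameters
    size_bytes: int    # storage the bundle needs on the device
    benefit: float     # hypothetical estimate of synthesis-quality gain


def preselect(catalog, already_stored, available_bytes, quality="normal"):
    """Greedily pick packs the device lacks, within a memory budget.

    Assumption: a "high" quality preference spends all reported memory;
    any other preference spends at most half of it.
    """
    budget = available_bytes if quality == "high" else available_bytes // 2
    candidates = [p for p in catalog if p.name not in already_stored]
    # Prefer packs that give the most benefit per byte.
    candidates.sort(key=lambda p: p.benefit / p.size_bytes, reverse=True)
    chosen = []
    for pack in candidates:
        if pack.size_bytes <= budget:
            chosen.append(pack)
            budget -= pack.size_bytes
    return chosen


# Illustrative catalog; the pack names and sizes are invented for the example.
catalog = [
    ParamPack("fr_filters", 100, 10.0),
    ParamPack("fr_prosody", 400, 20.0),
    ParamPack("en_filters", 50, 1.0),
]
packs = preselect(catalog, already_stored={"en_filters"}, available_bytes=600, quality="high")
```

In a fuller system the returned packs would contain filter parameters already normalized and mapped to the device's primary source parameters, as claims 31 and 32 require, so the device can use them without further processing.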

36. A multilingual text-to-speech system, comprising:
- a primary source module having a plurality of primary source parameters providing information mainly about a speaker of a primary language, wherein the plurality of source parameters defines a first sound source, of human speech, that generates a first excitation signal in the primary language;
  
  a primary filter module having a plurality of primary filter parameters providing information mainly about sounds in the primary language, wherein the plurality of primary filter parameters define shaping applied to the first excitation signal to produce signal waveform of the sounds in the primary language; and
  
  a secondary filter module having a plurality of secondary filter parameters providing information mainly about sounds in a secondary language, wherein the plurality of secondary filter parameters define shaping applied to a second excitation signal, generated by a second sound source of human speech, to produce signal waveform of the sounds in the secondary language, wherein at least one of the plurality of secondary filter parameters is normalized to the primary filter parameters to imitate voice characteristics of the first sound source; and
  
  a mapping module that selects at least one from the plurality of primary source parameters to substitute at least one of a plurality of secondary source parameters based on linguistic similarities between a target sound defined by the substituted at least one secondary source parameter and a target sound defined by the selected at least one primary source parameter, wherein the plurality of secondary source parameters define the second sound source, wherein the system selectively applies at least one of the plurality of secondary filter parameters to the selected at least one primary source parameter.
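
Claim 36 relies on a source-filter view of speech: source parameters define a sound source that generates an excitation signal, and filter parameters define the shaping applied to it. A minimal sketch of that idea follows, using textbook simplifications that are not taken from the patent: the source is an impulse train at the speaker's pitch, and the filter is a single two-pole resonator standing in for one formant. The function names and all numeric values are hypothetical.

```python
import math


def excitation(pitch_hz, sample_rate, n_samples):
    """Source: an impulse train whose period is set by the speaker's pitch."""
    period = round(sample_rate / pitch_hz)
    return [1.0 if n % period == 0 else 0.0 for n in range(n_samples)]


def resonator_coeffs(freq_hz, bandwidth_hz, sample_rate):
    """Filter parameters: feedback coefficients of a two-pole resonator."""
    r = math.exp(-math.pi * bandwidth_hz / sample_rate)
    a1 = -2.0 * r * math.cos(2.0 * math.pi * freq_hz / sample_rate)
    a2 = r * r
    return a1, a2


def shape(signal, a1, a2):
    """Apply the filter: y[n] = x[n] - a1*y[n-1] - a2*y[n-2]."""
    out = []
    y1 = y2 = 0.0
    for x in signal:
        y = x - a1 * y1 - a2 * y2
        out.append(y)
        y2, y1 = y1, y
    return out


# A primary-speaker source reused with a filter for a secondary-language sound.
primary_source = excitation(pitch_hz=100, sample_rate=8000, n_samples=160)
secondary_a1, secondary_a2 = resonator_coeffs(freq_hz=400, bandwidth_hz=80, sample_rate=8000)
waveform = shape(primary_source, secondary_a1, secondary_a2)
```

The last three lines mirror the final clause of the claim: a secondary filter parameter is applied to a selected primary source parameter, so the secondary-language sound is produced with the primary speaker's excitation.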

Current Assignee: Sovereign Peak Ventures, LLC (Dominion Harbor Enterprises, LLC)
Original Assignee: Panasonic Corporation (Panasonic Holdings Corporation)
Inventors: Anguera Miro, Xavier; Junqua, Jean-Claude; Veprek, Peter
Primary Examiner(s): Opsasnick, Michael N
Application Number: US10/771,256
Publication Number: US 20050182630A1
Time in Patent Office: 2,066 Days
Field of Search: 704/9, 704/258, 704/277
US Class Current: 704/277
CPC Class Codes: G10L 13/08 Text analysis or generation...
