SPEECH SYNTHESIZER

US 20090254349A1
Filed: 05/11/2007
Published: 10/08/2009
Est. Priority Date: 06/05/2006
Status: Abandoned Application


1 Assignment


0 Petitions


Abstract

A speech synthesizer can execute speech content editing at high speed and generate speech content easily. The speech synthesizer includes a small speech element DB (101), a small speech element selection unit (102), a small speech element concatenation unit (103), a prosody modification unit (104), a large speech element DB (105), a correspondence DB (106) that associates the small speech element DB (101) with the large speech element DB (105), a speech element candidate obtainment unit (107), a large speech element selection unit (108), and a large speech element concatenation unit (109). By editing synthetic speech using the small speech element DB (101) and performing quality enhancement on an editing result using the large speech element DB (105), speech content can be generated easily on a mobile terminal.

26 Citations


13 Claims

1. A speech synthesis system that generates synthetic speech which conforms to phonetic symbols and prosody information, said speech synthesis system comprising a generation terminal, a server, and a reception terminal that are connected to each other via a computer network, said generation terminal including:
- a small database holding pieces of synthetic speech generation data used for generating synthetic speech; and
- a synthetic speech generation data selection unit configured to select, from said small database, pieces of synthetic speech generation data from which synthetic speech that best conforms to the phonetic symbols and the prosody information is to be generated,

said server including a large database holding speech elements which are greater in number than the pieces of synthetic speech generation data held in said small database and from which synthetic speech that can represent more detailed prosody information than the pieces of synthetic speech generation data held in said small database is to be generated, and

said reception terminal including:
- a conforming speech element selection unit configured to select, from said large database, speech elements which correspond to the pieces of synthetic speech generation data selected by said synthetic speech generation data selection unit and from which synthetic speech that best conforms to the phonetic symbols and the prosody information is to be generated; and
- a speech element concatenation unit configured to generate synthetic speech by concatenating the speech elements selected by said conforming speech element selection unit.

2. A generation terminal that generates simple synthetic speech which conforms to phonetic symbols and prosody information, said generation terminal comprising:
- a small database holding speech elements used for generating synthetic speech;
- a synthetic speech generation data selection unit configured to select, from said small database, pieces of synthetic speech generation data from which synthetic speech that conforms to the phonetic symbols and the prosody information is to be generated; and
- a transmission unit configured to transmit the pieces of synthetic speech generation data,

wherein said transmission unit is configured to transmit, to a server that includes a large database holding speech elements which are greater in number than the speech elements held in said small database, the pieces of synthetic speech generation data to be associated with speech elements in the large database.

3. The generation terminal according to claim 2, further comprising:
- a small speech element concatenation unit configured to generate simple synthetic speech by concatenating speech elements selected by said synthetic speech generation data selection unit; and
- a prosody information modification unit configured to receive information for modifying prosody information of the simple synthetic speech and modify the prosody information according to the received information,

wherein said synthetic speech generation data selection unit is configured to, when the prosody information of the simple synthetic speech is modified, re-select, from said small database, pieces of synthetic speech generation data from which synthetic speech that conforms to the phonetic symbols and the modified prosody information is to be generated, and output the re-selected pieces of synthetic speech generation data to said small speech element concatenation unit, and said transmission unit is configured to transmit the pieces of synthetic speech data determined as a result of the modification and the re-selection.
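Claims 2 and 3 describe an editing loop on the generation terminal: select coarse data from the small database, let the user modify prosody, re-select, and transmit the result. The following is a minimal Python sketch of that loop under stated assumptions: the `SmallElement` class, the weighted F0/duration cost, and the message format (selected IDs plus edited prosody targets) are all hypothetical, since the claims specify none of them, and the simple-speech concatenation step used for previewing is omitted.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Sequence, Tuple


@dataclass(frozen=True)
class SmallElement:
    """Hypothetical small-DB entry: ID, phoneme label and coarse prosody."""
    elem_id: int
    phoneme: str
    f0_hz: float
    duration_ms: float


# One target per phoneme: (phoneme, F0 in Hz, duration in ms).
Target = Tuple[str, float, float]


def target_cost(elem: SmallElement, target: Target) -> float:
    """Assumed cost: weighted F0 and duration mismatch (weights are arbitrary)."""
    _, f0, dur = target
    return abs(elem.f0_hz - f0) / 10.0 + abs(elem.duration_ms - dur) / 5.0


def select_small(db: Dict[str, List[SmallElement]],
                 targets: Sequence[Target]) -> List[SmallElement]:
    """For each target, pick the small-DB element of that phoneme with the lowest cost."""
    return [min(db[t[0]], key=lambda e: target_cost(e, t)) for t in targets]


def modify_prosody(targets: Sequence[Target], index: int,
                   f0_hz: Optional[float] = None,
                   duration_ms: Optional[float] = None) -> List[Target]:
    """Return a copy of the targets with one position edited by the user."""
    edited = list(targets)
    phoneme, f0, dur = edited[index]
    edited[index] = (phoneme,
                     f0 if f0_hz is None else f0_hz,
                     dur if duration_ms is None else duration_ms)
    return edited


def payload(selected: Sequence[SmallElement], targets: Sequence[Target]) -> dict:
    """Hypothetical message to the server: selected IDs plus the edited prosody."""
    return {"element_ids": [e.elem_id for e in selected],
            "targets": [list(t) for t in targets]}


db = {
    "a": [SmallElement(1, "a", 100.0, 80.0), SmallElement(2, "a", 150.0, 80.0)],
    "i": [SmallElement(3, "i", 120.0, 60.0), SmallElement(4, "i", 180.0, 90.0)],
}
targets = [("a", 110.0, 80.0), ("i", 125.0, 60.0)]
first = select_small(db, targets)                       # IDs [1, 3]
edited = modify_prosody(targets, 1, f0_hz=175.0, duration_ms=85.0)
second = select_small(db, edited)                       # re-selection: IDs [1, 4]
message = payload(second, edited)
```

Editing the second target toward a higher, longer /i/ changes the cheapest small-database entry from ID 3 to ID 4. Only the compact ID list and targets would need to cross the network, which is consistent with the abstract's aim of editing on a mobile terminal.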

4. A server that generates synthetic speech which conforms to phonetic symbols and prosody information, said server comprising:
- a reception unit configured to receive pieces of synthetic speech generation data generated by a generation terminal;
- a large database holding speech elements which are greater in number than pieces of synthetic speech generation data held in a small database; and
- a correspondence database holding correspondence information that shows a relation between each piece of synthetic speech generation data held in the small database and one or more speech elements corresponding to the piece of synthetic speech generation data.
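The correspondence database of claim 4 (and claim 7) can be pictured as a mapping from each small-database entry to one or more large-database elements. A minimal sketch, assuming integer element IDs and a plain in-memory dictionary (the claims do not specify a storage format): the server expands each received ID into the list of large-database candidates from which the final selection is made.

```python
from typing import Dict, List, Sequence

# Hypothetical correspondence DB: small-DB element ID -> IDs of the large-DB
# elements it stands for (for example, the members of its cluster).
CorrespondenceDB = Dict[int, List[int]]


def expand_candidates(received_ids: Sequence[int],
                      corr: CorrespondenceDB) -> List[List[int]]:
    """Turn the IDs received from a generation terminal into one list of
    large-DB candidate IDs per position."""
    candidates: List[List[int]] = []
    for small_id in received_ids:
        large_ids = corr.get(small_id, [])
        if not large_ids:
            raise KeyError(f"no large-DB elements recorded for small element {small_id}")
        candidates.append(list(large_ids))
    return candidates


corr: CorrespondenceDB = {1: [101, 102, 103], 3: [301, 302], 4: [401, 402]}
lattice = expand_candidates([1, 4], corr)   # [[101, 102, 103], [401, 402]]
```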

5. A speech synthesizer that generates synthetic speech which conforms to phonetic symbols and prosody information, said speech synthesizer comprising:
- a small database holding pieces of synthetic speech generation data used for generating synthetic speech;
- a large database holding speech elements which are greater in number than the pieces of synthetic speech generation data held in said small database;
- a synthetic speech generation data selection unit configured to select, from said small database, pieces of synthetic speech generation data from which synthetic speech that conforms to the phonetic symbols and the prosody information is to be generated;
- a conforming speech element selection unit configured to select, from said large database, speech elements which correspond to the pieces of synthetic speech generation data selected by said synthetic speech generation data selection unit; and
- a speech element concatenation unit configured to generate synthetic speech by concatenating the speech elements selected by said conforming speech element selection unit.

6. The speech synthesizer according to claim 5, further comprising:
- a small speech element concatenation unit configured to generate simple synthetic speech by concatenating speech elements selected by said synthetic speech generation data selection unit; and
- a prosody information modification unit configured to receive information for modifying prosody information of the simple synthetic speech and modify the prosody information according to the received information,

wherein said synthetic speech generation data selection unit is configured to, when the prosody information of the simple synthetic speech is modified, re-select, from said small database, pieces of synthetic speech generation data from which synthetic speech that conforms to the phonetic symbols and the modified prosody information is to be generated, and output the re-selected pieces of synthetic speech generation data to said small speech element concatenation unit, and said conforming speech element selection unit is configured to receive the pieces of synthetic speech generation data determined as a result of the modification and the re-selection, and select, from said large database, speech elements which correspond to the received pieces of synthetic speech generation data.

7. The speech synthesizer according to claim 5, further comprising a correspondence database holding correspondence information that shows a relation between each piece of synthetic speech generation data held in said small database and one or more speech elements corresponding to the piece of synthetic speech generation data, wherein said conforming speech element selection unit includes:
- a speech element obtainment unit configured to specify, using the correspondence information held in said correspondence database, speech elements that correspond to the pieces of synthetic speech generation data selected by said synthetic speech generation data selection unit, and obtain the specified speech elements from said large database as candidates; and
- a speech element selection unit configured to select, from the speech elements obtained by said speech element obtainment unit as the candidates, speech elements from which synthetic speech that best conforms to the phonetic symbols and the prosody information is to be generated,

wherein said speech element concatenation unit is configured to generate the synthetic speech by concatenating the speech elements selected by said speech element selection unit.

8. The speech synthesizer according to claim 5, wherein said large database is provided in a server that is connected to said speech synthesizer via a computer network, and said conforming speech element selection unit is configured to select the speech elements from said large database provided in the server.

9. The speech synthesizer according to claim 5, wherein said small database holds speech elements each of which is representative of a different one of clusters generated by clustering the speech elements held in said large database.

10. The speech synthesizer according to claim 9, wherein said small database holds speech elements each of which is representative of a different one of clusters generated by clustering the speech elements held in said large database in accordance with at least one of a fundamental frequency, a duration, power information, a formant parameter, and a cepstrum coefficient of each of the speech elements.

11. The speech synthesizer according to claim 5, wherein said small database holds hidden Markov models, and said large database holds speech elements that are learning samples used when generating the hidden Markov models held in said small database.
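Claims 9 and 10 build the small database from representatives of clusters of the large database, clustered on features such as fundamental frequency, duration, and power. The sketch below is one possible instance, not the patent's procedure: it uses k-means (a choice the claims do not make) with farthest-first seeding over z-scored F0, duration, and power only, takes the member nearest each centroid as the representative, and records cluster membership as a correspondence table of the kind claim 7 describes. Feature values and IDs are made up.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Tuple[float, ...]


def kmeans(points: Sequence[Vector], k: int, iters: int = 50) -> List[int]:
    """Lloyd's k-means with farthest-first seeding; returns one label per point."""
    centroids: List[Vector] = [points[0]]
    while len(centroids) < k:
        # Next seed: the point farthest from every seed chosen so far.
        centroids.append(max(points, key=lambda p: min(math.dist(p, c) for c in centroids)))
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: attach each point to its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its members.
        updated: List[Vector] = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            updated.append(tuple(sum(col) / len(members) for col in zip(*members))
                           if members else centroids[c])
        if updated == centroids:
            break
        centroids = updated
    return labels


def build_small_db(features: Dict[int, Vector],
                   k: int) -> Tuple[List[int], Dict[int, List[int]]]:
    """Cluster large-DB elements and keep, per cluster, the member nearest
    the centroid as the small-DB representative."""
    ids = sorted(features)
    cols = list(zip(*(features[i] for i in ids)))
    means = [sum(col) / len(col) for col in cols]
    # z-score each feature so F0 in Hz does not swamp duration in ms or power in dB.
    stds = [math.sqrt(sum((x - m) ** 2 for x in col) / len(col)) or 1.0
            for col, m in zip(cols, means)]
    norm = [tuple((x - m) / s for x, m, s in zip(features[i], means, stds)) for i in ids]
    labels = kmeans(norm, k)
    representatives: List[int] = []
    correspondence: Dict[int, List[int]] = {}
    for c in sorted(set(labels)):
        members = [j for j, lab in enumerate(labels) if lab == c]
        centroid = tuple(sum(col) / len(members) for col in zip(*(norm[j] for j in members)))
        rep = min(members, key=lambda j: math.dist(norm[j], centroid))
        representatives.append(ids[rep])
        correspondence[ids[rep]] = [ids[j] for j in members]
    return representatives, correspondence


# Toy large DB: element ID -> (F0 in Hz, duration in ms, power in dB).
large_db = {
    10: (100.0, 80.0, 60.0), 11: (102.0, 82.0, 61.0), 12: (98.0, 79.0, 59.0),
    20: (200.0, 120.0, 70.0), 21: (205.0, 118.0, 71.0), 22: (203.0, 119.0, 70.5),
}
small_ids, corr_db = build_small_db(large_db, k=2)
```

Here the two well-separated groups yield representatives 10 and 22, and the returned table maps each representative to the IDs of its cluster members.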

12. A speech synthesis method for generating synthetic speech which conforms to phonetic symbols and prosody information, said speech synthesis method comprising:
- selecting, from a small database holding pieces of synthetic speech generation data used for generating synthetic speech, pieces of synthetic speech generation data from which synthetic speech that best conforms to the phonetic symbols and the prosody information is to be generated;
- selecting, from a large database holding speech elements which are greater in number than the pieces of synthetic speech generation data held in the small database and from which synthetic speech that can represent more detailed prosody information than the pieces of synthetic speech generation data held in the small database is to be generated, speech elements which correspond to the pieces of synthetic speech generation data selected in said selecting pieces of synthetic speech generation data and from which synthetic speech that best conforms to the phonetic symbols and the prosody information is to be generated; and
- generating synthetic speech by concatenating the speech elements selected in said selecting speech elements.
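Claim 12 leaves open how "best conforms" is measured when choosing among the large-database elements. A common technique in concatenative synthesis, swapped in here as an assumption rather than taken from the claims, is a Viterbi search over the candidate lattice that minimizes a target cost (prosody mismatch) plus a join cost (discontinuity at each concatenation point). In this sketch the `LargeElement` fields, the cost weights, and the toy samples are hypothetical, and concatenation is plain sample appending with no cross-fade.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class LargeElement:
    """Hypothetical large-DB entry with edge F0 values for join-cost scoring."""
    elem_id: int
    f0_start: float               # F0 at the left edge (Hz)
    f0_end: float                 # F0 at the right edge (Hz)
    duration_ms: float
    samples: Tuple[float, ...]    # toy waveform samples


Target = Tuple[float, float]      # (mean F0 in Hz, duration in ms)


def target_cost(e: LargeElement, t: Target) -> float:
    """Assumed mismatch between an element and the requested prosody."""
    mean_f0 = (e.f0_start + e.f0_end) / 2
    return abs(mean_f0 - t[0]) / 10.0 + abs(e.duration_ms - t[1]) / 5.0


def join_cost(left: LargeElement, right: LargeElement) -> float:
    """Assumed penalty for an F0 jump at the concatenation point."""
    return abs(left.f0_end - right.f0_start) / 10.0


def viterbi_select(candidates: Sequence[Sequence[LargeElement]],
                   targets: Sequence[Target]) -> List[LargeElement]:
    """Choose one candidate per position minimising total target + join cost."""
    best = [[target_cost(e, targets[0]) for e in candidates[0]]]
    back: List[List[int]] = [[-1] * len(candidates[0])]
    for i in range(1, len(candidates)):
        prev_row, prev_cands = best[i - 1], candidates[i - 1]
        row: List[float] = []
        ptr: List[int] = []
        for e in candidates[i]:
            # Best predecessor for e, including the cost of joining onto it.
            k = min(range(len(prev_cands)),
                    key=lambda m: prev_row[m] + join_cost(prev_cands[m], e))
            row.append(prev_row[k] + join_cost(prev_cands[k], e) + target_cost(e, targets[i]))
            ptr.append(k)
        best.append(row)
        back.append(ptr)
    # Trace back from the cheapest final candidate.
    j = min(range(len(candidates[-1])), key=lambda m: best[-1][m])
    path: List[LargeElement] = []
    for i in range(len(candidates) - 1, -1, -1):
        path.append(candidates[i][j])
        j = back[i][j]
    return path[::-1]


def concatenate(elements: Sequence[LargeElement]) -> List[float]:
    """Plain concatenation of samples (no cross-fade or pitch smoothing)."""
    out: List[float] = []
    for e in elements:
        out.extend(e.samples)
    return out


lattice = [
    [LargeElement(101, 100.0, 110.0, 80.0, (0.1, 0.2)),
     LargeElement(102, 104.0, 140.0, 80.0, (0.5, 0.6))],
    [LargeElement(201, 112.0, 136.0, 60.0, (0.3,)),
     LargeElement(202, 140.0, 110.0, 60.0, (0.7,))],
]
targets = [(108.0, 80.0), (125.0, 60.0)]
path = viterbi_select(lattice, targets)   # IDs [101, 201]
waveform = concatenate(path)              # [0.1, 0.2, 0.3]
```

A purely target-driven choice would take element 202 for the second position (its mean F0 matches 125 Hz exactly), but its 140 Hz onset would leave a 30 Hz jump after element 101. The join cost steers the search to 201, whose 112 Hz onset continues 101's 110 Hz offset.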

13. A program for generating synthetic speech which conforms to phonetic symbols and prosody information, said program causing a computer to execute:
- selecting, from a small database holding pieces of synthetic speech generation data used for generating synthetic speech, pieces of synthetic speech generation data from which synthetic speech that best conforms to the phonetic symbols and the prosody information is to be generated;
- selecting, from a large database holding speech elements which are greater in number than the pieces of synthetic speech generation data held in the small database and from which synthetic speech that can represent more detailed prosody information than the pieces of synthetic speech generation data held in the small database is to be generated, speech elements which correspond to the pieces of synthetic speech generation data selected in said selecting pieces of synthetic speech generation data and from which synthetic speech that best conforms to the phonetic symbols and the prosody information is to be generated; and
- generating synthetic speech by concatenating the speech elements selected in said selecting speech elements.


Current Assignee: Panasonic Corporation (Panasonic Holdings Corporation)
Original Assignee: Panasonic Corporation (Panasonic Holdings Corporation)
Inventors: Hirose, Yoshifumi; Kamai, Takahiro; Kato, Yumiko
Application Number: US12/303,455
Publication Number: US 20090254349A1
Time in Patent Office (days):

Field of Search
US Class Current: 704/260
CPC Class Codes:
- G10L 13/033 Voice editing, e.g. manipul...
- G10L 13/04 Details of speech synthesis...
