Speech synthesizing device, speech synthesizing method, and program
First Claim
1. A speech synthesizing device comprising:
an utterance form selection unit that analyzes a music signal reproduced in a user environment and determines an utterance form that matches an analysis result of the music signal;
a speech synthesizing unit that synthesizes a speech according to the utterance form;
a music signal power calculation unit that analyzes the music signal and calculates a power of the music signal;
a synthesized speech power calculation unit that analyzes the synthesized speech waveform and calculates a power of the synthesized speech; and
a synthesized speech power adjustment unit that references a ratio predetermined for each utterance form between a power of the music signal and a power of the synthesized speech and adjusts a power of the synthesized speech waveform, generated according to the utterance form, according to the power of the music signal.
Abstract
An object of the present invention is to provide a device and a method for generating a synthesized speech that has an utterance form that matches music. A musical genre estimation unit of the speech synthesizing device estimates the musical genre to which a received music signal belongs, and an utterance form selection unit references an utterance form information storage unit to determine an utterance form from the musical genre. A prosody generation unit references a prosody generation rule storage unit, selected from prosody generation rule storage units 151 to 15N according to the utterance form, and generates prosody information from a phonetic symbol sequence. A unit waveform selection unit references a unit waveform data storage unit, selected from unit waveform data storage units 161 to 16N according to the utterance form, and selects a unit waveform from the phonetic symbol sequence and the prosody information. A waveform generation unit generates a synthesized speech waveform from the prosody information and the unit waveform data.
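The lookup chain described in the abstract (musical genre, then utterance form, then the per-form prosody rule store and unit waveform store) can be sketched as two table lookups. This is a minimal illustration only: the genre names, utterance form names, store names, the fallback form, and the helper functions `select_utterance_form` and `select_resources` are all hypothetical and do not come from the patent.

```python
from dataclasses import dataclass

# Hypothetical stand-in for the "utterance form information storage unit":
# maps an estimated musical genre to an utterance form. Entries are illustrative.
UTTERANCE_FORM_BY_GENRE = {
    "classical": "calm",
    "rock": "energetic",
    "pop": "bright",
}

# Assumption: fall back to a neutral form when the genre is not in the table.
DEFAULT_FORM = "neutral"


@dataclass(frozen=True)
class FormResources:
    """Names of the per-form stores used later in synthesis (hypothetical)."""
    prosody_rules: str    # which prosody generation rule store to use
    unit_waveforms: str   # which unit waveform data store to use


# One prosody rule store and one unit waveform store per utterance form,
# mirroring the "1 to N" storage units in the abstract.
RESOURCES_BY_FORM = {
    "calm": FormResources("prosody_rules_calm", "units_calm"),
    "energetic": FormResources("prosody_rules_energetic", "units_energetic"),
    "bright": FormResources("prosody_rules_bright", "units_bright"),
    "neutral": FormResources("prosody_rules_neutral", "units_neutral"),
}


def select_utterance_form(genre: str) -> str:
    """Return the utterance form registered for a genre, or the default."""
    return UTTERANCE_FORM_BY_GENRE.get(genre, DEFAULT_FORM)


def select_resources(form: str) -> FormResources:
    """Return the prosody rule store and unit waveform store for a form."""
    return RESOURCES_BY_FORM[form]
```

For example, `select_resources(select_utterance_form("rock"))` would pick the stores registered for the "energetic" form in this sketch. Genre estimation itself is not shown here.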
3 Claims
1. A speech synthesizing device comprising:
an utterance form selection unit that analyzes a music signal reproduced in a user environment and determines an utterance form that matches an analysis result of the music signal;
a speech synthesizing unit that synthesizes a speech according to the utterance form;
a music signal power calculation unit that analyzes the music signal and calculates a power of the music signal;
a synthesized speech power calculation unit that analyzes the synthesized speech waveform and calculates a power of the synthesized speech; and
a synthesized speech power adjustment unit that references a ratio predetermined for each utterance form between a power of the music signal and a power of the synthesized speech and adjusts a power of the synthesized speech waveform, generated according to the utterance form, according to the power of the music signal.
2. A speech synthesizing method that generates a synthesized speech using a speech synthesizing device, said method comprising:
analyzing, by said speech synthesizing device, a music signal reproduced in a user environment and determining an utterance form that matches an analysis result of the music signal;
synthesizing, by said speech synthesizing device, a speech according to the utterance form;
analyzing, by said speech synthesizing device, the music signal and calculating a power of the music signal;
analyzing, by said speech synthesizing device, the synthesized speech waveform and calculating a power of the synthesized speech; and
referencing, by said speech synthesizing device, a ratio predetermined for each utterance form between a power of the music signal and a power of the synthesized speech and adjusting a power of the synthesized speech waveform, generated according to the utterance form, according to the power of the music signal.
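The power calculation and power adjustment steps of the method can be illustrated with a short sketch. Several things here are assumptions rather than statements from the claims: power is taken as the mean of squared samples, the predetermined ratio is interpreted as target speech power divided by music power, the adjustment is a single amplitude gain applied to the whole waveform, and the ratio values and the names `POWER_RATIO_BY_FORM`, `mean_power`, and `adjust_speech_power` are hypothetical.

```python
import math
from typing import List, Sequence

# Hypothetical ratios (target speech power / music power) per utterance form.
# The claims only say a ratio is predetermined for each utterance form.
POWER_RATIO_BY_FORM = {
    "calm": 1.5,
    "energetic": 2.0,
    "neutral": 1.2,
}


def mean_power(samples: Sequence[float]) -> float:
    """Mean-square power of a waveform (assumed definition of 'power')."""
    if not samples:
        return 0.0
    return sum(s * s for s in samples) / len(samples)


def adjust_speech_power(
    speech: Sequence[float], music: Sequence[float], form: str
) -> List[float]:
    """Scale the synthesized speech so that its power equals
    ratio(form) * music power."""
    music_power = mean_power(music)
    speech_power = mean_power(speech)
    if speech_power == 0.0:
        # Silent speech cannot be rescaled; return it unchanged.
        return list(speech)
    target_power = POWER_RATIO_BY_FORM[form] * music_power
    # Power scales with the square of amplitude, so the gain is a square root.
    gain = math.sqrt(target_power / speech_power)
    return [s * gain for s in speech]
```

With these assumptions, louder music yields a proportionally louder synthesized speech, and the proportion depends on the utterance form chosen for that music.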
3. A non-transitory computer readable medium storing a computer program causing a computer, which constitutes a speech synthesizing device, to execute:
processing for analyzing a received music signal reproduced in a user environment and determining an utterance form, which matches an analysis result of the music signal, from utterance forms prepared in advance;
processing for synthesizing a speech according to the utterance form;
processing for analyzing the music signal and estimating a musical genre to which the music belongs;
processing for selecting an utterance form according to the musical genre to determine the utterance form that matches the analysis result of the music signal;
processing for analyzing the music signal and calculating a power of the music signal;
processing for analyzing the synthesized speech waveform and calculating a power of the synthesized speech; and
processing for referencing a ratio predetermined for each utterance form between a power of the music signal and a power of the synthesized speech and adjusting a power of the synthesized speech waveform, generated according to the utterance form, according to the power of the music signal.
Specification