Speech synthesis device and method
First Claim
1. A speech synthesis device comprising:
a reception section that receives a voice signal of an utterance;
a pitch detection section that, based on the voice signal received by the reception section, detects a pitch of a representative portion which is a part of the utterance;
a response acquisition section that acquires voice data of a response to the utterance;
a response pitch acquisition section that acquires a pitch based on the voice data of the response acquired by the response acquisition section;
a pitch shift amount determination section that determines a pitch shift amount for shifting the pitch acquired by the response pitch acquisition section to a target pitch having a particular relationship to the pitch of the representative portion; and
a response synthesis section that synthesizes voice of the response based on the voice data of the response, the response synthesis section being configured to shift, in accordance with the shift amount, a pitch of the voice of the response to be synthesized.
Abstract
This invention improves technology for automatically generating a voice response to a speaker's (user's) utterance, and is characterized by controlling the pitch of the response voice in accordance with the pitch of the speaker's utterance. A voice signal of the speaker's utterance (e.g., a question) is received, and a pitch (e.g., the highest pitch) of a representative portion of the utterance is detected. Voice data of a response to the utterance is acquired, and a pitch (e.g., the average pitch) is acquired based on that response voice data. A pitch shift amount is determined for shifting the acquired pitch to a target pitch having a particular relationship to the pitch of the representative portion. When the response voice is synthesized from the response voice data, its pitch is shifted in accordance with the pitch shift amount.
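The pitch-shift determination summarized in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation. The function names, the use of frame-wise pitch tracks in Hz with 0.0 marking unvoiced frames, and in particular the default interval of 7 semitones below the utterance pitch are all assumptions made for illustration; the text above only requires a "particular relationship" between the target pitch and the pitch of the representative portion.

```python
import math


def representative_pitch(pitch_track_hz):
    """Highest voiced pitch of the utterance (the abstract's example choice)."""
    voiced = [f for f in pitch_track_hz if f > 0]
    if not voiced:
        raise ValueError("no voiced frames in utterance")
    return max(voiced)


def average_pitch(pitch_track_hz):
    """Average voiced pitch of the response (the abstract's example choice)."""
    voiced = [f for f in pitch_track_hz if f > 0]
    if not voiced:
        raise ValueError("no voiced frames in response")
    return sum(voiced) / len(voiced)


def pitch_shift_semitones(utterance_pitch_hz, response_pitch_hz,
                          interval_semitones=-7.0):
    """Shift (in semitones) that moves the response pitch onto the target.

    interval_semitones is a hypothetical stand-in for the "particular
    relationship"; -7.0 places the target 7 semitones below the utterance
    pitch and is an assumption, not a value taken from the text.
    """
    if utterance_pitch_hz <= 0 or response_pitch_hz <= 0:
        raise ValueError("pitches must be positive")
    target_hz = utterance_pitch_hz * 2.0 ** (interval_semitones / 12.0)
    return 12.0 * math.log2(target_hz / response_pitch_hz)


def apply_shift(pitch_track_hz, shift_semitones):
    """Scale every voiced frame by the shift; unvoiced frames stay at 0.0."""
    ratio = 2.0 ** (shift_semitones / 12.0)
    return [f * ratio if f > 0 else 0.0 for f in pitch_track_hz]


# Hypothetical pitch tracks in Hz; 0.0 marks an unvoiced frame.
utterance = [0.0, 200.0, 220.0, 0.0]
response = [100.0, 110.0, 0.0, 120.0]

shift = pitch_shift_semitones(representative_pitch(utterance),
                              average_pitch(response))
shifted_response = apply_shift(response, shift)
```

Expressing the shift in semitones keeps it independent of the absolute frequencies involved, so the same value could be handed to whatever pitch-shifting stage the synthesizer actually uses.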
17 Claims
1. A speech synthesis device comprising:
a reception section that receives a voice signal of an utterance;
a pitch detection section that, based on the voice signal received by the reception section, detects a pitch of a representative portion which is a part of the utterance;
a response acquisition section that acquires voice data of a response to the utterance;
a response pitch acquisition section that acquires a pitch based on the voice data of the response acquired by the response acquisition section;
a pitch shift amount determination section that determines a pitch shift amount for shifting the pitch acquired by the response pitch acquisition section to a target pitch having a particular relationship to the pitch of the representative portion; and
a response synthesis section that synthesizes voice of the response based on the voice data of the response, the response synthesis section being configured to shift, in accordance with the shift amount, a pitch of the voice of the response to be synthesized.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15
16. A speech synthesis method comprising:
receiving a voice signal of an utterance;
detecting, based on the received voice signal, a pitch of a representative portion which is a part of the utterance;
acquiring voice data of a response to the utterance;
acquiring a pitch based on the acquired voice data of the response;
determining a pitch shift amount for shifting the acquired pitch to a target pitch having a particular relationship to the pitch of the representative portion; and
synthesizing voice of the response based on the voice data of the response and with shifting, in accordance with the shift amount, a pitch of the voice of the response to be synthesized.
17. A non-transitory computer-readable storage medium storing a group of instructions executable by a processor for implementing a speech synthesis method, the speech synthesis method comprising:
receiving a voice signal of an utterance;
detecting, based on the received voice signal, a pitch of a representative portion which is a part of the utterance;
acquiring voice data of a response to the utterance;
acquiring a pitch based on the acquired voice data of the response;
determining a pitch shift amount for shifting the acquired pitch to a target pitch having a particular relationship to the pitch of the representative portion; and
synthesizing voice of the response based on the voice data of the response and with shifting, in accordance with the shift amount, a pitch of the voice of the response to be synthesized.
Specification