Speech synthesis device and method

US 10,217,452 B2
Filed: 04/19/2017
Issued: 02/26/2019
Est. Priority Date: 10/20/2014
Status: Active Grant

Abstract

This invention is an improvement of technology for automatically generating response voice to voice uttered by a speaker (user), and is characterized by controlling a pitch of the response voice in accordance with a pitch of the speaker's utterance. A voice signal of the speaker's utterance (e.g., a question) is received, and a pitch (e.g., the highest pitch) of a representative portion of the utterance is detected. Voice data of a response to the utterance is acquired, and a pitch (e.g., the average pitch) based on the acquired response voice data is acquired. A pitch shift amount for shifting the acquired pitch to a target pitch having a particular relationship to the pitch of the representative portion is determined. When response voice is to be synthesized on the basis of the response voice data, the pitch of the response voice to be synthesized is shifted in accordance with the pitch shift amount.
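The pitch arithmetic in the abstract can be written out as follows. The abstract fixes neither units nor the exact relationship, so this assumes pitches in Hz, a shift amount Δ measured in semitones, and the five-semitones-lower target of claim 7. Here p_r is the pitch of the representative portion, p_a is the acquired response pitch (e.g., its average), p_t is the target pitch, and p is any pitch of the response voice:

```latex
p_t = p_r \cdot 2^{-5/12}, \qquad
\Delta = 12\log_2\frac{p_t}{p_a} = 12\log_2\frac{p_r}{p_a} - 5, \qquad
p' = p \cdot 2^{\Delta/12}
```

Every pitch of the response is scaled by the same factor 2^{Δ/12}, so the shifted average pitch lands exactly on p_t. For example, p_r = 440 Hz and p_a = 220 Hz give Δ = 12 - 5 = 7 semitones, and 220 · 2^{7/12} ≈ 329.6 Hz = 440 · 2^{-5/12}.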

17 Claims

1. A speech synthesis device comprising:
   - a reception section that receives a voice signal of an utterance;
   - a pitch detection section that, based on the voice signal received by the reception section, detects a pitch of a representative portion which is a part of the utterance;
   - a response acquisition section that acquires voice data of a response to the utterance;
   - a response pitch acquisition section that acquires a pitch based on the voice data of the response acquired by the response acquisition section;
   - a pitch shift amount determination section that determines a pitch shift amount for shifting the pitch acquired by the response pitch acquisition section to a target pitch having a particular relationship to the pitch of the representative portion; and
   - a response synthesis section that synthesizes voice of the response based on the voice data of the response, the response synthesis section being configured to shift, in accordance with the shift amount, a pitch of the voice of the response to be synthesized.
2. The speech synthesis device as claimed in claim 1, wherein the pitch shift amount determination section determines the target pitch such that the target pitch falls within a predetermined range from the acquired pitch.
3. The speech synthesis device as claimed in claim 2, wherein the pitch shift amount determination section adjusts the target pitch on an octave-by-octave basis such that the target pitch falls within the predetermined range from the acquired pitch.
4. The speech synthesis device as claimed in claim 1, wherein the pitch detection section detects, as the pitch of the representative portion, a highest pitch in a portion of the received voice signal where volume is of a predetermined value or over.
5. The speech synthesis device as claimed in claim 1, wherein the pitch detection section detects, as the pitch of the representative portion, a pitch in a trailing end portion of the received voice signal.
6. The speech synthesis device as claimed in claim 1, wherein the pitch shift amount determination section determines, as the target pitch, a pitch having a consonant-interval relationship to the pitch of the representative portion.
7. The speech synthesis device as claimed in claim 6, wherein the pitch shift amount determination section determines, as the target pitch, a pitch that is five semitones lower than the pitch of the representative portion.
8. The speech synthesis device as claimed in claim 1, wherein the response synthesis section is configured to variably control a reproduction speed of the voice of the response to be synthesized.
9. The speech synthesis device as claimed in claim 1, wherein the response synthesis section is further configured to control over time volume of the voice of the response to be synthesized.
10. The speech synthesis device as claimed in claim 1, wherein the response synthesis section is further configured to control over time the pitch of the voice of the response to be synthesized.
11. The speech synthesis device as claimed in claim 1, wherein a trailing end portion of the utterance of the received voice signal is set as the representative portion.
12. The speech synthesis device as claimed in claim 11, wherein a highest pitch in the trailing end portion is the pitch to be detected by the pitch detection section.
13. The speech synthesis device as claimed in claim 11, wherein the trailing end portion is a portion between an end point and a time point preceding the end point by a predetermined time.
14. The speech synthesis device as claimed in claim 1, wherein the representative portion is a portion of the utterance where the pitch of the utterance impresses listeners more than other portions of the utterance.
15. The speech synthesis device as claimed in claim 1, wherein the pitch acquired by the response pitch acquisition section is an average pitch of the response.
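Two of the dependent-claim refinements can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the patentee's implementation. The first helper combines the volume gate of claim 4 with the trailing-end window of claims 12 and 13; the claims recite these separately, and they are combined here only for illustration. The second helper moves a target pitch by whole octaves until it falls within a predetermined range of the response pitch, as in claims 2 and 3. The function names, the frame-based representation (one pitch in Hz and one volume value per frame, with 0.0 meaning unvoiced), and the symmetric ±6-semitone default range are all hypothetical.

```python
import math
from typing import Sequence


def trailing_peak_pitch(
    f0_hz: Sequence[float],
    volume: Sequence[float],
    frame_period_s: float,
    window_s: float,
    min_volume: float,
) -> float:
    """Hypothetical helper (cf. claims 4, 12, 13): highest voiced pitch within
    the last window_s seconds, among frames whose volume is >= min_volume."""
    if len(f0_hz) != len(volume):
        raise ValueError("f0_hz and volume must have the same number of frames")
    # Trailing end portion: from (end point - window_s) up to the end point.
    n_window = max(1, round(window_s / frame_period_s))
    start = max(0, len(f0_hz) - n_window)
    candidates = [
        f for f, v in zip(f0_hz[start:], volume[start:])
        if f > 0.0 and v >= min_volume  # voiced and loud enough
    ]
    if not candidates:
        raise ValueError("no voiced frames above the volume threshold in the window")
    return max(candidates)


def fold_target_octaves(
    target_hz: float, response_hz: float, max_semitones: float = 6.0
) -> float:
    """Hypothetical helper (cf. claims 2, 3): shift target_hz by whole octaves
    until it lies within +/- max_semitones of response_hz."""
    if max_semitones < 6.0:
        # A band narrower than one octave cannot always be reached in octave steps.
        raise ValueError("max_semitones must be at least 6 (half an octave)")
    target = target_hz
    while 12.0 * math.log2(target / response_hz) > max_semitones:
        target /= 2.0  # one octave down
    while 12.0 * math.log2(target / response_hz) < -max_semitones:
        target *= 2.0  # one octave up
    return target
```

With 10 ms frames, a 30 ms window and a volume threshold of 0.5, `trailing_peak_pitch([400.0, 250.0, 260.0, 300.0, 280.0], [0.9, 0.8, 0.8, 0.1, 0.7], 0.01, 0.03, 0.5)` returns 280.0. The 400 Hz frame lies outside the trailing window, and the 300 Hz frame is too quiet. `fold_target_octaves(600.0, 200.0)` returns 150.0 after two octave steps down. The range must span at least one octave because octave adjustment moves the target in 12-semitone steps.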

16. A speech synthesis method comprising:
   - receiving a voice signal of an utterance;
   - detecting, based on the received voice signal, a pitch of a representative portion which is a part of the utterance;
   - acquiring voice data of a response to the utterance;
   - acquiring a pitch based on the acquired voice data of the response;
   - determining a pitch shift amount for shifting the acquired pitch to a target pitch having a particular relationship to the pitch of the representative portion; and
   - synthesizing voice of the response based on the voice data of the response and with shifting, in accordance with the shift amount, a pitch of the voice of the response to be synthesized.

17. A non-transitory computer-readable storage medium storing a group of instructions executable by a processor for implementing a speech synthesis method, the speech synthesis method comprising:
   - receiving a voice signal of an utterance;
   - detecting, based on the received voice signal, a pitch of a representative portion which is a part of the utterance;
   - acquiring voice data of a response to the utterance;
   - acquiring a pitch based on the acquired voice data of the response;
   - determining a pitch shift amount for shifting the acquired pitch to a target pitch having a particular relationship to the pitch of the representative portion; and
   - synthesizing voice of the response based on the voice data of the response and with shifting, in accordance with the shift amount, a pitch of the voice of the response to be synthesized.
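The steps of the method in claims 16 and 17 can be sketched on frame-wise pitch contours. This is a minimal illustration, not the patentee's implementation, and it rests on several assumptions. Contours hold one fundamental-frequency value in Hz per frame, with 0.0 marking unvoiced frames. The representative pitch is taken as the highest voiced pitch of the utterance, and the response pitch as the average voiced pitch; these are the examples named in the abstract and claim 15. The "particular relationship" is fixed to the five-semitones-lower target of claim 7, and the shift is expressed in semitones. All function names are hypothetical. Scaling a contour also stands in for the audio pitch shifting that a real synthesizer would perform, for example by resampling or PSOLA.

```python
import math
from typing import List, Sequence

# Hypothetical contour format: one fundamental-frequency value (Hz) per frame,
# with 0.0 marking an unvoiced frame.


def _voiced(f0_hz: Sequence[float]) -> List[float]:
    voiced = [f for f in f0_hz if f > 0.0]
    if not voiced:
        raise ValueError("contour has no voiced frames")
    return voiced


def representative_pitch(utterance_f0: Sequence[float]) -> float:
    """Detecting step (sketch): highest voiced pitch of the utterance."""
    return max(_voiced(utterance_f0))


def response_pitch(response_f0: Sequence[float]) -> float:
    """Response-pitch step (sketch): average voiced pitch of the response."""
    voiced = _voiced(response_f0)
    return sum(voiced) / len(voiced)


def pitch_shift_amount(
    rep_hz: float, resp_hz: float, interval_semitones: float = -5.0
) -> float:
    """Determining step (sketch): semitones that move resp_hz onto a target
    interval_semitones away from rep_hz (default: five semitones lower)."""
    target_hz = rep_hz * 2.0 ** (interval_semitones / 12.0)
    return 12.0 * math.log2(target_hz / resp_hz)


def shift_contour(f0_hz: Sequence[float], shift_semitones: float) -> List[float]:
    """Synthesizing step (sketch): scale every voiced frame by 2**(shift/12);
    unvoiced frames stay at 0.0."""
    ratio = 2.0 ** (shift_semitones / 12.0)
    return [f * ratio if f > 0.0 else 0.0 for f in f0_hz]


if __name__ == "__main__":
    utterance = [0.0, 210.0, 240.0, 300.0, 0.0]
    response = [180.0, 200.0, 0.0, 220.0]
    rep = representative_pitch(utterance)  # 300.0
    avg = response_pitch(response)         # 200.0
    shift = pitch_shift_amount(rep, avg)   # about +2.02 semitones
    print(rep, avg, round(shift, 2))
    print(shift_contour(response, shift))
```

Scaling all voiced frames by one ratio preserves the response's own intonation and moves its average pitch onto the target. A time-varying pitch control, as in claim 10, would replace the single ratio with a per-frame curve.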

Current Assignee: Yamaha Corporation
Original Assignee: Yamaha Corporation
Inventors: Kayama, Hiraku; Matsubara, Hiroaki
Primary Examiner(s): Abebe, Daniel
Application Number: US15/491,414
Publication Number: US 20170221470A1
Time in Patent Office: 678 Days
Field of Search: 704/275
CPC Class Codes:
- G10L 13/0335: Pitch control
- G10L 15/22: Procedures used during a sp...
- G10L 21/0364: for improving intelligibility
- G10L 25/90: Pitch determination of spee...
