Technology for responding to remarks using speech synthesis

US 10,490,181 B2
Filed: 12/12/2016
Issued: 11/26/2019
Est. Priority Date: 05/31/2013
Status: Active Grant

Abstract

The present invention is provided with: a voice input section that receives a remark (a question) via a voice signal; a reply creation section that creates a voice sequence of a reply (response) to the remark; a pitch analysis section that analyzes the pitch of a first segment (e.g., word ending) of the remark; and a voice generation section (a voice synthesis section, etc.) that generates a reply, in the form of voice, represented by the voice sequence. The voice generation section controls the pitch of the entire reply in such a manner that the pitch of a second segment (e.g., word ending) of the reply assumes a predetermined pitch (e.g., five degrees down) with respect to the pitch of the first segment of the remark. Such arrangements can realize synthesis of replying voice capable of giving a natural feel to the user.
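
The pitch relation described in the abstract can be worked through numerically. The following is a minimal sketch, not the patented implementation. It assumes that "five degrees down" means a perfect fifth below (7 equal-tempered semitones), and that the reply is available as a pitch contour in Hz whose last value is the word ending. The function names and the sample pitches are hypothetical.

```python
import math

# Assumption: "five degrees down" is read as a perfect fifth below,
# i.e. 7 equal-tempered semitones. The source does not state this number.
FIFTH_DOWN_SEMITONES = -7.0


def semitones_to_ratio(semitones: float) -> float:
    """Convert a shift in semitones to a frequency ratio."""
    return 2.0 ** (semitones / 12.0)


def ratio_to_semitones(ratio: float) -> float:
    """Convert a frequency ratio to a shift in semitones."""
    return 12.0 * math.log2(ratio)


def shift_reply_contour(remark_end_hz: float,
                        reply_contour_hz: list[float],
                        interval_semitones: float = FIFTH_DOWN_SEMITONES) -> list[float]:
    """Shift every pitch of the reply by the same ratio so that its last
    value (taken here as the word ending) lands on the target pitch."""
    target_hz = remark_end_hz * semitones_to_ratio(interval_semitones)
    ratio = target_hz / reply_contour_hz[-1]
    return [hz * ratio for hz in reply_contour_hz]


remark_end_hz = 196.0                  # hypothetical word-ending pitch of the remark
reply_contour = [220.0, 247.0, 233.0]  # hypothetical reply contour, word ending last
shifted = shift_reply_contour(remark_end_hz, reply_contour)
print([round(hz, 1) for hz in shifted])
```

Because one ratio is applied to the whole contour, the reply keeps its own intonation shape and only its overall register moves.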

17 Claims

1. A voice synthesis apparatus comprising:
- a voice input section configured to receive a voice signal of a remark;
  
  a pitch analysis section configured to analyze a pitch of a first segment of the remark, wherein the first segment is a word ending of the remark;
  
  an acquisition section configured to acquire a reply to the remark; and
  
  a voice generation section configured to generate voice of the reply acquired by said acquisition section, said voice generation section shifting pitches of the entire voice waveform data of the reply by a same amount so that a second segment of the reply has a pitch associated with the pitch of the first segment analyzed by said pitch analysis section, wherein the second segment is a word beginning or word ending of the reply.
  - 2. The voice synthesis apparatus as claimed in claim 1, wherein the remark is a question, and the reply corresponds to the question.
  - 3. The voice synthesis apparatus as claimed in claim 1, wherein said voice generation section is configured to provisionally set the pitch of the second segment of the reply at the pitch associated with the pitch of the first segment, and wherein said voice generation section is further configured to perform at least one of:
    - an operation of, if the provisionally-set pitch of the second segment is lower than a predetermined first threshold value, shifting the provisionally-set pitch of the second segment to a pitch one octave up; and
      
      an operation of, if the provisionally-set pitch of the second segment is higher than a predetermined second threshold value, shifting the provisionally-set pitch of the second segment to a pitch one octave down.
  - 4. The voice synthesis apparatus as claimed in claim 1, wherein said voice generation section is configured to provisionally set the pitch of the second segment of the reply at the pitch associated with the pitch of the first segment, and wherein said voice generation section is further configured to shift the provisionally-set pitch to a pitch one octave up or down in accordance with a designated attribute.
  - 5. The voice synthesis apparatus as claimed in claim 1, wherein any one of a first mode and a second mode is settable as an operation mode of said voice generation section, wherein, in said first mode, said voice generation section is configured to shift the pitches of the entire voice waveform data of the reply in such a manner that the interval of the pitch of said second segment relative to the pitch of said first segment becomes a consonant interval, and wherein, in said second mode, said voice generation section is configured to shift the pitches of the entire voice waveform data of the reply in such a manner that the interval of the pitch of said second segment relative to the pitch of said first segment becomes a dissonant interval.
  - 6. The voice synthesis apparatus as claimed in claim 1, further comprising:
    - a non-linguistic analysis section configured to analyze non-linguistic information, other than pitch, related to the remark; and
      
      a control section configured to control voice generation, in said voice generation section, of the reply in accordance with the analyzed non-linguistic information.
  - 7. The voice synthesis apparatus as claimed in claim 1, further comprising:
    - a linguistic analysis section configured to analyze linguistic information included in the remark and the reply; and
      
      a control section configured to control voice generation, in said voice generation section, of the reply in accordance with the analyzed linguistic information.
  - 8. The voice synthesis apparatus as claimed in claim 1, further comprising:
    - a non-linguistic analysis section configured to analyze pitch variation in the remark; and
      
      a control section configured to control the pitches of the entire voice waveform data of the reply, generated in said voice generation section, to vary in accordance with the pitch variation in the remark.
  - 9. The voice synthesis apparatus as claimed in claim 1, wherein said voice generation section is configured to associate the pitch of said second segment with the pitch of said first segment in accordance with a given rule and generate voice with a characteristic based on a given agent attribute, the voice synthesis apparatus further comprising:
    - a control section configured to determine the rule based on at least one of an agent attribute and an attribute of a speaker of the remark.
  - 10. The voice synthesis apparatus as claimed in claim 1, wherein said voice generation section is configured to associate the pitch of said second segment with the pitch of said first segment in accordance with a given rule, the voice synthesis apparatus further comprising:
    - a control section configured to update the rule based on receiving voice of a further remark via said voice input section after sounding of the voice of the reply.
  - 11. The voice synthesis apparatus as claimed in claim 1, wherein said voice generation section is configured to shift the pitches of the entire voice waveform data of the reply in such a manner that an interval of the pitch of said second segment relative to the pitch of said first segment becomes a consonant interval.
  - 12. The voice synthesis apparatus as claimed in claim 11, wherein said voice generation section is configured to shift the pitches of the entire voice waveform data of the reply in such a manner that the interval of the pitch of said second segment relative to the pitch of said first segment becomes a consonant interval of five degrees lower than the pitch of said first segment.
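
The octave correction of claim 3 and the mode selection of claim 5 can be sketched together. This is an illustration under stated assumptions, not the claimed implementation: the concrete intervals (a perfect fifth down for the consonant mode, a tritone down for the dissonant mode) and the threshold values are hypothetical, since the claims do not fix them.

```python
def interval_for_mode(mode: str) -> float:
    """Return the interval, in semitones, of the reply's second segment
    relative to the remark's first segment.

    Assumption: -7 (perfect fifth down) stands in for "consonant" and
    -6 (tritone down) for "dissonant". The claims do not fix these values.
    """
    intervals = {"consonant": -7.0, "dissonant": -6.0}
    return intervals[mode]


def provisional_pitch(first_segment_hz: float, mode: str = "consonant") -> float:
    """Provisionally set the second-segment pitch from the first-segment pitch."""
    return first_segment_hz * 2.0 ** (interval_for_mode(mode) / 12.0)


def octave_correct(pitch_hz: float, low_hz: float, high_hz: float) -> float:
    """Shift one octave up if the pitch is below the first threshold,
    or one octave down if it is above the second threshold."""
    if pitch_hz < low_hz:
        return pitch_hz * 2.0
    if pitch_hz > high_hz:
        return pitch_hz / 2.0
    return pitch_hz


# Hypothetical thresholds bounding the synthetic voice's usable range.
LOW_HZ, HIGH_HZ = 100.0, 400.0

low_case = octave_correct(provisional_pitch(120.0), LOW_HZ, HIGH_HZ)   # raised one octave
high_case = octave_correct(provisional_pitch(700.0), LOW_HZ, HIGH_HZ)  # lowered one octave
mid_case = octave_correct(provisional_pitch(300.0), LOW_HZ, HIGH_HZ)   # left unchanged
```

An octave shift leaves the pitch class unchanged, so the corrected pitch stays in the same consonant or dissonant relation to the remark while returning to a range the voice can produce.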

13. A computer-implemented method comprising:
- receiving a voice signal of a remark;
  
  analyzing a pitch of a first segment of the remark, wherein the first segment is a word ending of the remark;
  
  acquiring a reply to the remark;
  
  synthesizing voice of the acquired reply; and
  
  shifting pitches of the entire voice waveform data of the reply by a same amount so that a pitch of a second segment of the voice of the reply has a pitch associated with the analyzed pitch of the first segment, wherein the second segment is a word beginning or word ending of the reply.
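
The analysis step of the method above, finding the pitch of the remark's word ending, could look roughly like the following. The source does not say how pitch is analyzed; autocorrelation is swapped in here as one common technique. Treating the last 50 ms of the signal as the word ending is also an assumption, as are all function names and the synthetic test signal.

```python
import math


def estimate_pitch_hz(samples: list[float], sample_rate: int,
                      fmin: float = 80.0, fmax: float = 400.0) -> float:
    """Estimate the fundamental frequency by picking the lag with the
    largest autocorrelation inside the [fmin, fmax] search range."""
    min_lag = int(sample_rate / fmax)
    max_lag = int(sample_rate / fmin)
    best_lag, best_score = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        score = sum(samples[n] * samples[n + lag]
                    for n in range(len(samples) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag


def word_ending(samples: list[float], sample_rate: int,
                tail_seconds: float = 0.05) -> list[float]:
    """Assumption: treat the last `tail_seconds` of the remark as its
    word ending. A real system would need proper segmentation."""
    tail = int(sample_rate * tail_seconds)
    return samples[-tail:]


# Synthetic "remark": 0.2 s at 150 Hz followed by a 0.05 s ending at 200 Hz.
SAMPLE_RATE = 8000
remark = [math.sin(2 * math.pi * 150 * n / SAMPLE_RATE) for n in range(1600)]
remark += [math.sin(2 * math.pi * 200 * n / SAMPLE_RATE) for n in range(400)]

ending_hz = estimate_pitch_hz(word_ending(remark, SAMPLE_RATE), SAMPLE_RATE)
print(f"word-ending pitch: {ending_hz:.1f} Hz")
```

Only the tail of the signal is analyzed, so the earlier 150 Hz portion of the synthetic remark does not influence the estimate.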

14. A coding/decoding device comprising:
- an A/D converter configured to convert an input voice signal of a remark into a digital signal;
  
  a pitch analysis section configured to analyze a pitch of a first segment of the remark based on the digital signal, wherein the first segment is a word ending of the remark;
  
  a back-channel feedback acquisition section configured to, when back-channel feedback is to be returned to the remark, acquire back-channel feedback data corresponding to a meaning of the remark;
  
  a pitch control section configured to shift pitches of the entire back-channel feedback data by a same amount so that a second segment of the back-channel feedback data has a pitch associated with the analyzed pitch of the first segment, wherein the second segment is a word beginning or word ending of the back-channel feedback data; and
  
  a D/A converter configured to convert the pitch-shifted back-channel feedback data into an analog signal.
  - 15. The coding/decoding device as claimed in claim 14, wherein the digital signal converted by the A/D converter is supplied to a host computer, wherein the pitch control section is further configured to receive replying voice data, responsive to the remark, returned from the host computer and shift pitches of the replying voice data by a same amount so that a third segment of the received replying voice data has a pitch associated with the analyzed pitch of the first segment, and wherein the D/A converter is further configured to convert the pitch-shifted replying voice data into an analog signal.
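
Claims 14 and 15 have the pitch control section shift the pitches of the entire voice data "by a same amount". One naive way to do that on raw samples, not taken from the patent, is to resample the waveform: reading it at a different speed multiplies every frequency by the same ratio. The sketch below assumes linear interpolation and a hypothetical function name. It also changes the duration, which duration-preserving methods such as PSOLA or a phase vocoder avoid.

```python
import math


def resample_pitch_shift(samples: list[float], ratio: float) -> list[float]:
    """Shift all frequencies by `ratio` by reading the waveform at a
    different speed with linear interpolation.

    Played back at the original sample rate, every pitch in the output is
    multiplied by `ratio`. The duration changes by 1 / ratio, so this is
    only an illustration, not a production technique.
    """
    if not samples or ratio <= 0:
        raise ValueError("need a non-empty signal and a positive ratio")
    last = len(samples) - 1
    n_out = int(last / ratio) + 1
    out = []
    for i in range(n_out):
        pos = i * ratio
        i0 = min(int(pos), last)
        frac = pos - i0
        nxt = samples[min(i0 + 1, last)]
        out.append(samples[i0] * (1.0 - frac) + nxt * frac)
    return out


SAMPLE_RATE = 8000
# Hypothetical back-channel feedback data: 0.1 s of a 200 Hz tone.
feedback = [math.sin(2 * math.pi * 200 * n / SAMPLE_RATE) for n in range(800)]

# Ratio 0.5 lowers every pitch by one octave (200 Hz becomes 100 Hz).
shifted = resample_pitch_shift(feedback, 0.5)
```

In a device like the one claimed, the ratio would come from the analyzed pitch of the remark's first segment rather than from a fixed constant.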

16. A voice synthesis system comprising a coding/decoding device and a host computer, said coding/decoding device comprising:
- an A/D converter configured to convert an input voice signal of a remark into a digital signal;
  
  a pitch analysis section configured to analyze a pitch of a first segment of the remark based on the digital signal, wherein the first segment is a word ending of the remark;
  
  a back-channel feedback acquisition section configured to, when back-channel feedback is to be returned to the remark, acquire back-channel feedback data corresponding to a meaning of the remark;
  
  a pitch control section configured to shift pitches of the entire back-channel feedback data by a same amount so that a second segment of the back-channel feedback data has a pitch associated with the analyzed pitch of the first segment, wherein the second segment is a word beginning or word ending of the back-channel feedback data; and
  
  a D/A converter configured to convert the pitch-shifted back-channel feedback data into an analog signal, wherein said host computer is configured to, when replying voice other than the back-channel feedback is to be returned to the remark, acquire replying voice data, responsive to the remark, in accordance with the digital signal converted by said A/D converter and return the acquired replying voice data to said coding/decoding device, wherein said pitch control section is further configured to shift pitches of the replying voice data by a same amount so that a third segment of the replying voice data returned from the host computer has a pitch associated with the analyzed pitch of the first segment, and wherein said D/A converter is further configured to convert the pitch-shifted replying voice data into an analog signal.
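
The division of labor in claim 16 can be sketched as a small routing function: back-channel feedback is handled inside the device, any other reply is fetched from the host, and both paths go through the same pitch shift. Everything here is hypothetical scaffolding. Voice data is modeled as a pitch contour rather than a waveform, the aligned segment is taken to be the last contour value, the host is a plain callable that ignores the digital signal, and the interval of 7 semitones down is an assumed example.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class VoiceData:
    """Hypothetical stand-in for voice data: a label plus a pitch contour in Hz.
    The last contour value is treated as the segment to be aligned."""
    text: str
    contour_hz: list[float]


def shift_to_target(voice: VoiceData, target_hz: float) -> VoiceData:
    """Shift the whole contour by one common ratio so its last value hits target_hz."""
    ratio = target_hz / voice.contour_hz[-1]
    return VoiceData(voice.text, [hz * ratio for hz in voice.contour_hz])


def respond(remark_end_hz: float,
            local_feedback: Optional[VoiceData],
            host: Callable[[], VoiceData],
            interval_semitones: float = -7.0) -> VoiceData:
    """Use locally acquired back-channel feedback when there is any;
    otherwise ask the host for a reply. Either way, pitch-shift before output."""
    target_hz = remark_end_hz * 2.0 ** (interval_semitones / 12.0)
    voice = local_feedback if local_feedback is not None else host()
    return shift_to_target(voice, target_hz)


def fake_host() -> VoiceData:
    # Hypothetical host computer returning a full reply.
    return VoiceData("It will be sunny tomorrow.", [210.0, 190.0, 180.0])


quick = respond(200.0, VoiceData("uh-huh", [150.0, 140.0]), fake_host)
full = respond(200.0, None, fake_host)
```

Keeping the pitch shift on the device side means both kinds of response end at the same pitch relative to the remark, whichever component produced them.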

17. A method comprising:
- converting, by means of an A/D converter, an input voice signal of a remark into a digital signal;
  
  analyzing, by means of a processor, a pitch of a first segment of the remark based on the digital signal, wherein the first segment is a word ending of the remark;
  
  acquiring, by means of the processor, back-channel feedback data corresponding to a meaning of the remark, when back-channel feedback is to be returned to the remark;
  
  shifting, by means of the processor, pitches of the entire back-channel feedback data by a same amount so that a second segment of the back-channel feedback data has a pitch associated with the analyzed pitch of the first segment, wherein the second segment is a word beginning or word ending of the back-channel feedback data; and
  
  converting, by means of a D/A converter, the pitch-shifted back-channel feedback data into an analog signal.

Current Assignee: Yamaha Corporation
Original Assignee: Yamaha Corporation
Inventors: Matsubara, Hiroaki; Ura, Junya; Kawahara, Takehiko; Hisaminato, Yuji; Yoshimura, Katsuji
Primary Examiner(s): Baker, Matthew H
Application Number: US15/375,984
Publication Number: US 20170110111A1
Time in Patent Office: 1,079 Days
CPC Class Codes

G10L 13/027   Concept to speech synthesis...

G10L 13/033   Voice editing, e.g. manipul...

G10L 13/0335   Pitch control

G10L 13/06   Elementary speech units use...

G10L 13/10   Prosody rules derived from ...

G10L 15/18   using natural language mode...

G10L 25/90   Pitch determination of spee...

H04M 2201/39   using speech synthesis
