TECHNOLOGY FOR RESPONDING TO REMARKS USING SPEECH SYNTHESIS

US 20170110111A1
Filed: 12/12/2016
Published: 04/20/2017
Est. Priority Date: 05/31/2013
Status: Active Grant

Abstract

The present invention is provided with: a voice input section that receives a remark (a question) via a voice signal; a reply creation section that creates a voice sequence of a reply (response) to the remark; a pitch analysis section that analyzes the pitch of a first segment (e.g., word ending) of the remark; and a voice generation section (a voice synthesis section, etc.) that generates a reply, in the form of voice, represented by the voice sequence. The voice generation section controls the pitch of the entire reply in such a manner that the pitch of a second segment (e.g., word ending) of the reply assumes a predetermined pitch (e.g., five degrees down) with respect to the pitch of the first segment of the remark. Such arrangements can realize synthesis of replying voice capable of giving a natural feel to the user.
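
As an illustration of the pitch relation described in the abstract, the following minimal sketch computes a target pitch "five degrees down" from the ending of the remark and the factor by which the entire reply could be scaled to reach it. It assumes that "five degrees down" means a perfect fifth below with the just-intonation frequency ratio 2:3. The function names and the use of frequencies in Hz are assumptions made for this example and are not taken from the patent.

```python
def fifth_below(pitch_hz: float) -> float:
    """Pitch a perfect fifth below pitch_hz (assumed just-intonation ratio 2:3)."""
    if pitch_hz <= 0:
        raise ValueError("pitch must be positive")
    return pitch_hz * 2.0 / 3.0


def reply_shift_factor(reply_end_hz: float, remark_end_hz: float) -> float:
    """Factor for scaling the pitch of the whole reply so that its ending
    lands a fifth below the ending of the remark (hypothetical helper)."""
    if reply_end_hz <= 0:
        raise ValueError("pitch must be positive")
    return fifth_below(remark_end_hz) / reply_end_hz
```

For example, if the remark ends at 300 Hz, the target for the reply ending is 200 Hz, and a reply whose ending would otherwise sit at 250 Hz would be scaled by about 0.8.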

16 Claims

1. A voice synthesis apparatus comprising:
- a voice input section configured to receive a voice signal of a remark;
  
  a pitch analysis section configured to analyze a pitch of a first segment of the remark;
  
  an acquisition section configured to acquire a reply to the remark; and
  
  a voice generation section configured to generate voice of the reply acquired by said acquisition section, said voice generation section controlling a pitch of the voice of the reply in such a manner that a second segment of the reply has a pitch associated with the pitch of the first segment analyzed by said pitch analysis section, wherein said voice generation section controls the pitch of the voice of the reply in such a manner that an interval of the pitch of said second segment relative to the pitch of said first segment becomes a consonant interval.
  - 2. The voice synthesis apparatus as claimed in claim 1, wherein the first segment is a word ending of the remark being a question, and said second segment is a word beginning or word ending of the reply.
  - 3. The voice synthesis apparatus as claimed in claim 1, wherein said voice generation section controls the pitch of the voice of the reply in such a manner that the interval of the pitch of said second segment relative to the pitch of said first segment becomes a consonant interval of five degrees lower than the pitch of said first segment.
  - 4. The voice synthesis apparatus as claimed in claim 1, wherein said voice generation section provisionally sets the pitch of the second segment of the voice of the reply at the pitch associated with the pitch of the first segment, and wherein said voice generation section is further configured to perform at least one of:
    - an operation of, if the provisionally-set pitch of the second segment is lower than a predetermined first threshold value, changing the provisionally-set pitch of the second segment to a pitch shifted one octave up; and
      
      an operation of, if the provisionally-set pitch of the second segment is higher than a predetermined second threshold value, changing the provisionally-set pitch of the second segment to a pitch one octave down.
  - 5. The voice synthesis apparatus as claimed in claim 1, wherein said voice generation section provisionally sets the pitch of the second segment of the voice of the reply at the pitch associated with the pitch of the first segment, and wherein said voice generation section is further configured to change the provisionally-set pitch to a pitch shifted one octave up or down in accordance with a designated attribute.
  - 6. The voice synthesis apparatus as claimed in claim 1, wherein any one of a first mode and a second mode is settable as an operation mode of said voice generation section, wherein, in said first mode, said voice generation section controls the pitch of the voice of the reply in such a manner that the interval of the pitch of said second segment relative to the pitch of said first segment becomes a consonant interval, and wherein, in said second mode, said voice generation section controls the pitch of the voice of the reply in such a manner that the interval of the pitch of said second segment relative to the pitch of said first segment becomes a dissonant interval.
  - 7. The voice synthesis apparatus as claimed in claim 1, further comprising:
    - a non-linguistic analysis section configured to analyze non-linguistic information, other than pitch, related to the remark; and
      
      a control section configured to control voice generation, in said voice generation section, of the reply in accordance with the analyzed non-linguistic information.
  - 8. The voice synthesis apparatus as claimed in claim 1, further comprising:
    - a linguistic analysis section configured to analyze linguistic information included in the remark and the reply; and
      
      a control section configured to control voice generation, in said voice generation section, of the reply in accordance with the analyzed linguistic information.
  - 9. The voice synthesis apparatus as claimed in claim 1, further comprising:
    - a non-linguistic analysis section configured to analyze pitch variation in the remark; and
      
      a control section configured to control a pitch of voice of the reply, generated in said voice generation section, to vary in accordance with the pitch variation in the remark.
  - 10. The voice synthesis apparatus as claimed in claim 1, wherein said voice generation section is configured to associate the pitch of said second segment with the pitch of said first segment in accordance with a given rule and generate voice with a characteristic based on a given agent attribute, further comprising:
    - a control section configured to determine the rule based on at least one agent attribute and an attribute of a speaker of the remark.
  - 11. The voice synthesis apparatus as claimed in claim 1, wherein said voice generation section is configured to associate the pitch of said second segment with the pitch of said first segment in accordance with a given rule and generate voice with a characteristic based on a given agent attribute, further comprising:
    - a control section configured to update the rule based on receiving voice of a further remark via said sound input section after sounding of the voice of the reply.
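
The provisional pitch setting and octave correction of claim 4, together with the two operation modes of claim 6, can be sketched as follows. This is only an illustration under stated assumptions: the intervals are expressed in equal-tempered semitones, a perfect fifth (7 semitones) stands in for the consonant interval and a tritone (6 semitones) for the dissonant one, and the threshold values are supplied by the caller. None of these names or values comes from the claims.

```python
# Assumed example intervals, in equal-tempered semitones below the remark pitch.
SEMITONES_BELOW = {
    "consonant": 7,  # perfect fifth (first mode)
    "dissonant": 6,  # tritone (second mode)
}


def provisional_pitch(remark_hz: float, mode: str = "consonant") -> float:
    """Provisionally set the second-segment pitch relative to the remark pitch."""
    return remark_hz * 2.0 ** (-SEMITONES_BELOW[mode] / 12.0)


def octave_correct(pitch_hz: float, low_hz: float, high_hz: float) -> float:
    """Shift the provisional pitch by one octave if it falls outside the thresholds.

    low_hz and high_hz play the role of the first and second threshold values.
    """
    if pitch_hz < low_hz:
        return pitch_hz * 2.0  # one octave up
    if pitch_hz > high_hz:
        return pitch_hz / 2.0  # one octave down
    return pitch_hz
```

Because an octave shift doubles or halves the frequency, the corrected pitch keeps the same pitch class as the provisional one, which is why the correction does not disturb the chosen interval relation.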

12. A computer-implemented method comprising:
- receiving a voice signal of a remark;
  
  analyzing a pitch of a first segment of the remark;
  
  acquiring a reply to the remark;
  
  synthesizing voice of the acquired reply; and
  
  controlling a pitch of the reply in such a manner that a pitch of a second segment of the voice of the reply has a pitch associated with the analyzed pitch of the first segment and an interval of the pitch of the second segment relative to the pitch of the first segment becomes a consonant interval.
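
The steps of the method in claim 12 can be sketched end to end as below. The sketch makes several assumptions that are not in the claim: the first segment is taken to be the last 50 ms of the signal, pitch is estimated with a plain autocorrelation peak search, the reply comes from a hypothetical `acquire_reply` stub, and the consonant interval is a perfect fifth below (ratio 2:3). Actual voice synthesis is left out; the function only returns the pitch that the reply ending would be given.

```python
import math


def sine(freq_hz, seconds, sample_rate):
    """Generate a test tone standing in for a recorded remark."""
    n = int(seconds * sample_rate)
    return [math.sin(2.0 * math.pi * freq_hz * i / sample_rate) for i in range(n)]


def estimate_pitch_hz(samples, sample_rate, fmin=80.0, fmax=400.0):
    """Pick the lag with the largest autocorrelation between fmin and fmax."""
    min_lag = int(sample_rate / fmax)
    max_lag = min(int(sample_rate / fmin), len(samples) - 1)
    best_lag, best_score = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        score = sum(samples[i] * samples[i + lag] for i in range(len(samples) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag


def acquire_reply(remark_text):
    """Hypothetical stand-in for the reply acquisition step."""
    return "Yes, it is."


def plan_reply(samples, sample_rate, remark_text, tail_seconds=0.05):
    """Analyze the ending of the remark and derive the pitch for the reply ending."""
    tail = samples[-int(tail_seconds * sample_rate):]  # assumed first segment
    remark_hz = estimate_pitch_hz(tail, sample_rate)
    return {
        "reply": acquire_reply(remark_text),
        "remark_end_hz": remark_hz,
        "reply_end_hz": remark_hz * 2.0 / 3.0,  # assumed fifth below
    }
```

A call such as `plan_reply(sine(200.0, 0.2, 8000), 8000, "Is it sunny?")` analyzes a 200 Hz test tone and yields a reply-ending pitch near 133 Hz.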

13. A coding/decoding device comprising:
- an A/D converter configured to convert an input voice signal of a remark into a digital signal;
  
  a pitch analysis section configured to analyze a pitch of a first segment of the remark based on the digital signal;
  
  a back-channel feedback acquisition section configured to, when back-channel feedback is to be returned to the remark, acquire back-channel feedback data corresponding to a meaning of the remark;
  
  a pitch control section configured to control a pitch of the back-channel feedback data in such a manner that a second segment of the back-channel feedback data has a pitch associated with the analyzed pitch of the first segment and an interval of the pitch of the second segment relative to the pitch of the first segment becomes a consonant interval; and
  
  a D/A converter configured to convert the pitch-controlled back-channel feedback data into an analogue signal.
  - 14. The coding/decoding device as claimed in claim 13, wherein the digital signal converted by the A/D converter is supplied to a host computer, wherein the pitch control section is further configured to receive replying voice data, responsive to the remark, returned from the host computer and control a pitch of the replying voice data in such a manner that a third segment of the received replying voice data has a pitch associated with the analyzed pitch of the first segment, and wherein the D/A converter is further configured to convert the pitch-controlled replying voice data into an analogue signal.

15. A voice synthesis system comprising a coding/decoding device and a host computer, said coding/decoding device comprising:
- an A/D converter that converts an input voice signal of a remark into a digital signal;
  
  a pitch analysis section that analyzes a pitch of a first segment of the remark based on the digital signal;
  
  a back-channel feedback acquisition section that, when back-channel feedback is to be returned to the remark, acquires back-channel feedback data corresponding to a meaning of the remark;
  
  a pitch control section that controls a pitch of the back-channel feedback data in such a manner that a second segment of the back-channel feedback data has a pitch associated with the analyzed pitch of the first segment and an interval of the pitch of the second segment relative to the pitch of the first segment becomes a consonant interval; and
  
  a D/A converter configured to convert the pitch-controlled back-channel feedback data into an analogue signal, wherein said host computer is configured in such a manner that, when replying voice other than the back-channel feedback is to be returned to the remark, said host computer acquires replying voice data, responsive to the remark, in accordance with the digital signal converted by said A/D converter and returns the acquired replying voice data to said coding/decoding device, wherein said pitch control section is further configured to control a pitch of the replying voice data in such a manner that a third segment of the replying voice data returned from the host computer has a pitch associated with the analyzed pitch of the first segment, and wherein said D/A converter is further configured to convert the pitch-controlled replying voice data into an analogue signal.
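
The division of work in claim 15 between the coding/decoding device and the host computer can be sketched as a simple routing decision. Everything concrete here is an assumption made for illustration: the lookup table of back-channel feedback, the rule that a remark found in the table gets local feedback, the dictionary layout of the voice data, and the use of a perfect fifth below (ratio 2:3) for both paths. The claim itself only requires a consonant interval for the back-channel feedback and an "associated" pitch for the replying voice data.

```python
# Hypothetical table held on the device: meaning of the remark -> feedback text.
BACK_CHANNEL_FEEDBACK = {"agreement": "uh-huh", "surprise": "oh"}


def control_pitch(voice, remark_hz):
    """Return a copy of the voice data with its controlled segment retargeted."""
    controlled = dict(voice)
    controlled["segment_pitch_hz"] = remark_hz * 2.0 / 3.0  # assumed fifth below
    return controlled


def respond(remark_meaning, remark_hz, host):
    """Answer locally with back-channel feedback, or ask the host for a reply.

    host is a callable standing in for the host computer; it receives the
    meaning of the remark and returns voice data as a dictionary.
    """
    text = BACK_CHANNEL_FEEDBACK.get(remark_meaning)
    if text is not None:
        voice = {"text": text, "segment_pitch_hz": 150.0, "source": "device"}
    else:
        voice = dict(host(remark_meaning), source="host")
    return control_pitch(voice, remark_hz)
```

In this arrangement the short feedback never leaves the device, while longer replies take the round trip to the host; both pass through the same pitch control before D/A conversion.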

16. A method comprising:
- converting, by means of an A/D converter, an input voice signal of a remark into a digital signal;
  
  analyzing, by means of a processor, a pitch of a first segment of the remark based on the digital signal;
  
  acquiring, by means of the processor, back-channel feedback data corresponding to a meaning of the remark, when back-channel feedback is to be returned to the remark;
  
  controlling, by means of the processor, a pitch of the back-channel feedback data in such a manner that a second segment of the back-channel feedback data has a pitch associated with the analyzed pitch of the first segment and an interval of the pitch of the second segment relative to the pitch of the first segment becomes a consonant interval; and
  
  converting, by means of a D/A converter, the pitch-controlled back-channel feedback data into an analogue signal.

Current Assignee: Yamaha Corporation
Original Assignee: Yamaha Corporation
Inventors: MATSUBARA, Hiroaki; URA, Junya; YOSHIMURA, Katsuji; KAWAHARA, Takehiko; HISAMINATO, Yuji
Granted Patent: US 10,490,181 B2

CPC Class Codes

G10L 13/027   Concept to speech synthesis...

G10L 13/033   Voice editing, e.g. manipul...

G10L 13/0335   Pitch control

G10L 13/06   Elementary speech units use...

G10L 13/10   Prosody rules derived from ...

G10L 15/18   using natural language mode...

G10L 25/90   Pitch determination of spee...

H04M 2201/39   using speech synthesis
