Sound synthesis device, sound synthesis method and storage medium

US 9,805,711 B2
Filed: 12/15/2015
Issued: 10/31/2017
Est. Priority Date: 12/22/2014
Status: Active Grant


1 Assignment

0 Petitions

Abstract

A sound synthesis device that includes a processor configured to perform the following: extracting intonation information from prosodic information contained in sound data and digitally smoothing the extracted intonation information to obtain smoothed intonation information; obtaining a plurality of digital sound units based on text data and concatenating the plurality of digital sound units so as to construct a concatenated series of digital sound units that corresponds to the text data; and modifying the concatenated series of digital sound units in accordance with the smoothed intonation information with respect to at least one of parameters of the concatenated series of digital sound units to generate synthesized sound data corresponding to the text data.
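Claim 1 narrows the abstract's smoothing step to two operations: quantize the pitches, then take a weighted moving average of the quantized values. The sketch below illustrates that pair of operations; it is not the patented implementation. It assumes a frame-level pitch list in Hz with 0 marking unvoiced frames, a hypothetical equal-tempered semitone grid referenced to 440 Hz for the quantization, and a hypothetical triangular weight kernel. The text above specifies neither the grid nor the weights.

```python
import math


def quantize_pitch(f0_hz, ref_hz=440.0):
    """Snap a pitch in Hz to the nearest semitone of a hypothetical equal-tempered grid."""
    if f0_hz <= 0:
        return 0.0  # 0 marks an unvoiced frame; keep it unvoiced
    semitones = round(12 * math.log2(f0_hz / ref_hz))
    return ref_hz * 2 ** (semitones / 12)


def smooth_pitch(f0_seq, weights=(1, 2, 3, 2, 1)):
    """Quantize each pitch, then apply a centered weighted moving average.

    Only voiced neighbours (quantized pitch > 0) contribute to the average,
    and unvoiced frames stay at 0. The kernel is an assumption; its centre
    weight must be non-zero so every voiced frame has a defined average.
    """
    q = [quantize_pitch(f) for f in f0_seq]
    half = len(weights) // 2
    out = []
    for i, f in enumerate(q):
        if f == 0:
            out.append(0.0)
            continue
        num = den = 0.0
        for k, w in enumerate(weights):
            j = i + k - half  # neighbour index for kernel tap k
            if 0 <= j < len(q) and q[j] > 0:
                num += w * q[j]
                den += w
        out.append(num / den)
    return out
```

In this sketch, quantizing first snaps frame-to-frame jitter of less than half a semitone onto a single value before averaging, so the average follows the note-level intonation of the spoken input. A wider kernel yields a flatter contour.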

10 Claims

1. A sound synthesis device, comprising a processor configured to perform the following:
- receiving text data and extracting phoneme sequence from the text data;
  
  obtaining a plurality of digital sound units from a speech corpus database based on the text data and concatenating the plurality of digital sound units so as to construct a concatenated series of digital sound units that corresponds to the text data;
  
  receiving oral input speech data and calculating, as a target prosody, at least one of pitch height, duration, and power parameters from the oral input speech data by referring to the phoneme sequence; and
  
  modifying the concatenated series of digital sound units in accordance with the target prosody to generate synthesized sound data corresponding to the input text data and the target prosody,
  
  wherein said processor smoothes a pitch sequence in the target prosody, and
  
  wherein, in smoothing said pitch sequence in the target prosody, said processor quantizes pitches of the pitch sequence, and smoothes the pitch sequence by acquiring a weighted moving average of the quantized pitches.

2. The sound synthesis device according to claim 1, wherein said processor concatenates the plurality of digital sound units to construct the concatenated series of digital sound units that meets a prescribed matching condition with respect to the text data.

3. The sound synthesis device according to claim 2, wherein the oral input speech data represents speech by a user.

4. The sound synthesis device according to claim 1, wherein said processor modifies a pitch sequence in the concatenated series of digital sound units so as to substantially match the target prosody.

5. The sound synthesis device according to claim 4, wherein, in modifying the pitch sequence, said processor adjusts respective time scales of a pitch sequence in the target prosody and of said pitch sequence in the concatenated series of digital sound units, and adjusts at least one of the pitch sequence in the target prosody and the pitch sequence in the concatenated series of digital sound units so that periods during which pitches exist substantially match with each other.
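Claims 4 and 5 describe making the unit pitch sequence match the target after adjusting the two time scales so that the periods where pitch exists (voiced periods) coincide. The following is a minimal sketch of one way to do that, not the method the patent prescribes. It assumes frame-level pitch lists with 0 for unvoiced frames, assumes both sequences contain the same number of voiced runs (for example because both follow the same phoneme sequence), and uses plain linear interpolation inside each run as the time-scale adjustment. All function names are hypothetical.

```python
def voiced_segments(f0):
    """Return (start, end) index pairs, end exclusive, for runs of voiced frames (f0 > 0)."""
    segs, start = [], None
    for i, f in enumerate(f0):
        if f > 0 and start is None:
            start = i
        elif f <= 0 and start is not None:
            segs.append((start, i))
            start = None
    if start is not None:
        segs.append((start, len(f0)))
    return segs


def resample(seq, n):
    """Linearly interpolate seq onto n evenly spaced points (n >= 1)."""
    if len(seq) == 1 or n == 1:
        return [seq[len(seq) // 2]] * n
    step = (len(seq) - 1) / (n - 1)
    out = []
    for i in range(n):
        pos = i * step
        lo = int(pos)
        hi = min(lo + 1, len(seq) - 1)
        frac = pos - lo
        out.append(seq[lo] * (1 - frac) + seq[hi] * frac)
    return out


def impose_target_pitch(unit_f0, target_f0):
    """Stretch each voiced run of the target onto the matching voiced run of the units.

    Unvoiced frames of the concatenated units stay at 0, so the voiced
    periods of the result coincide with those of the units.
    """
    u_segs = voiced_segments(unit_f0)
    t_segs = voiced_segments(target_f0)
    if len(u_segs) != len(t_segs):
        # A real system would need a finer alignment (e.g. per phoneme) here.
        raise ValueError("voiced-segment counts differ between units and target")
    out = [0.0] * len(unit_f0)
    for (us, ue), (ts, te) in zip(u_segs, t_segs):
        out[us:ue] = resample(target_f0[ts:te], ue - us)
    return out
```

Warping run by run, rather than stretching the whole target contour to the unit length, keeps the target's rises and falls inside the voiced stretches where the units can actually carry pitch.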

6. A sound synthesis device, comprising a processor configured to perform the following:
- receiving text data and extracting phoneme sequence from the text data;
  
  obtaining a plurality of digital sound units from a speech corpus database based on the text data and concatenating the plurality of digital sound units so as to construct a concatenated series of digital sound units that corresponds to the text data;
  
  receiving oral input speech data and calculating, as a target prosody, at least one of pitch height, duration, and power parameters from the oral input speech data by referring to the phoneme sequence; and
  
  modifying the concatenated series of digital sound units in accordance with the target prosody to generate synthesized sound data corresponding to the input text data and the target prosody,
  
  wherein said processor modifies a power sequence in the concatenated series of digital sound units so as to substantially match the target prosody,
  
  wherein said processor smoothes a power sequence in the target prosody, and
  
  wherein, in modifying the power sequence in the concatenated series of digital sound units, said processor smoothes the power sequence in the concatenated series of digital sound units, acquires a sequence of ratios between the smoothed power sequence in the concatenated series of digital sound units and the smoothed power sequence in the target prosody, and corrects the smoothed power sequence in the concatenated series of digital sound units in accordance with said sequence of ratios.

7. The sound synthesis device according to claim 6, wherein said processor smoothes the power sequence in the target prosody by acquiring a weighted average of respective powers in the power sequence in the target prosody.

8. The sound synthesis device according to claim 6, wherein, in modifying the power sequence in the concatenated series of digital sound units, said processor adjusts respective time scales of the power sequence in the target prosody and of the power sequence in the concatenated series of digital sound units.
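Claims 6 to 8 give the power-side counterpart: adjust the time scales, smooth both power sequences by weighted averaging, take their frame-by-frame ratios, and correct the unit power with those ratios. The sketch below illustrates those steps under stated assumptions. Powers are linear frame values (not dB). The time-scale adjustment is plain linear resampling of the target to the unit length. The [1, 2, 1] kernel is hypothetical. The ratio is taken as target over unit and applied as a per-frame gain to the unit's unsmoothed power, which is one reading of "corrects ... in accordance with said sequence of ratios". All function names are hypothetical.

```python
def weighted_average_smooth(seq, weights=(1, 2, 1)):
    """Centered weighted average; kernel taps that fall off either end are dropped."""
    half = len(weights) // 2
    out = []
    for i in range(len(seq)):
        num = den = 0.0
        for k, w in enumerate(weights):
            j = i + k - half
            if 0 <= j < len(seq):
                num += w * seq[j]
                den += w
        out.append(num / den)
    return out


def resample(seq, n):
    """Linearly interpolate seq onto n evenly spaced points (n >= 1)."""
    if len(seq) == 1 or n == 1:
        return [seq[len(seq) // 2]] * n
    step = (len(seq) - 1) / (n - 1)
    out = []
    for i in range(n):
        pos = i * step
        lo = int(pos)
        hi = min(lo + 1, len(seq) - 1)
        frac = pos - lo
        out.append(seq[lo] * (1 - frac) + seq[hi] * frac)
    return out


def match_power(unit_power, target_power, eps=1e-12):
    """Return (corrected unit power, per-frame ratio sequence)."""
    # Time-scale adjustment (claim 8): bring the target onto the unit frame grid.
    target = resample(target_power, len(unit_power))
    # Smooth both sequences by weighted averaging (claims 6 and 7).
    s_unit = weighted_average_smooth(unit_power)
    s_target = weighted_average_smooth(target)
    # Ratio of smoothed target to smoothed unit power; eps guards against silence.
    ratios = [t / max(u, eps) for u, t in zip(s_unit, s_target)]
    # Apply the ratios as gains (assumption: to the unsmoothed unit power).
    return [p * r for p, r in zip(unit_power, ratios)], ratios
```

Applying the gains to the unsmoothed unit power keeps the units' frame-level detail while pulling their overall loudness envelope toward the speaker's. Applying the same ratios to the smoothed unit sequence instead would simply reproduce the smoothed target.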

9. A method of synthesizing sound performed by a processor in a sound synthesis device, the method comprising:
- receiving text data and extracting phoneme sequence from the text data;
  
  obtaining a plurality of digital sound units from a speech corpus database based on the text data and concatenating the plurality of digital sound units so as to construct a concatenated series of digital sound units that corresponds to the text data;
  
  receiving oral input speech data and calculating, as a target prosody, at least one of pitch height, duration, and power parameters from the oral input speech data by referring to the phoneme sequence; and
  
  modifying the concatenated series of digital sound units in accordance with the target prosody to generate synthesized sound data corresponding to the input text data and the target prosody,
  
  wherein said processor smoothes a pitch sequence in the target prosody, and
  
  wherein, in smoothing said pitch sequence in the target prosody, said processor quantizes pitches of the pitch sequence, and smoothes the pitch sequence by acquiring a weighted moving average of the quantized pitches.

10. A non-transitory storage medium that stores instructions executable by a processor included in a sound synthesis device, said instructions causing the processor to perform the following:
- receiving text data and extracting phoneme sequence from the text data;
  
  obtaining a plurality of digital sound units from a speech corpus database based on the text data and concatenating the plurality of digital sound units so as to construct a concatenated series of digital sound units that corresponds to the text data;
  
  receiving oral input speech data and calculating, as a target prosody, at least one of pitch height, duration, and power parameters from the oral input speech data by referring to the phoneme sequence; and
  
  modifying the concatenated series of digital sound units in accordance with the target prosody to generate synthesized sound data corresponding to the input text data and the target prosody,
  
  wherein said processor smoothes a pitch sequence in the target prosody, and
  
  wherein, in smoothing said pitch sequence in the target prosody, said processor quantizes pitches of the pitch sequence, and smoothes the pitch sequence by acquiring a weighted moving average of the quantized pitches.

Current Assignee
Casio Computer Company Limited
Original Assignee
Casio Computer Company Limited
Inventors
Tanaka, Hyuta
Primary Examiner(s)
AZAD, ABUL K

Application Number

US14/969,150
Publication Number

US 20160180833A1
Time in Patent Office

686 Days
Field of Search

704/258-269
CPC Class Codes

G10L 13/0335 Pitch control

G10L 13/10 Prosody rules derived from ...
