Speech synthesis device, speech synthesis method, and speech synthesis program

US 9,520,125 B2
Filed: 06/08/2012
Issued: 12/13/2016
Est. Priority Date: 07/11/2011
Status: Active Grant

Abstract

Provided are a speech synthesis device, a speech synthesis method, and a speech synthesis program that can represent a phoneme with a duration shorter than the duration it has when modeled by a statistical method. A speech synthesis device 80 according to the present invention includes a phoneme boundary updating means 81, which updates a phoneme boundary position (the boundary between a phoneme and its neighboring phonemes) by using a voiced utterance likelihood index, an index indicating the degree of voiced utterance likelihood of each state that represents a phoneme modeled by a statistical method.
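
A minimal sketch of why moving a boundary can shorten a phoneme, under stated assumptions. In HMM-style statistical synthesis each phoneme is commonly modeled with a fixed number of states, each lasting at least one frame, so the modeled duration has a floor. The sketch assumes a hypothetical layout that is not taken from the patent: `state_frames` holds every state's duration in frames, and `boundaries` holds the index of the first state of each phoneme plus a final sentinel. Shifting a boundary by one state width hands that state's frames to the neighboring phoneme, so a five-state phoneme with one frame per state can end up four frames long. The helpers `phoneme_durations` and `shift_boundary` are illustrative names only.

```python
from typing import List, Sequence


def phoneme_durations(state_frames: Sequence[int], boundaries: Sequence[int]) -> List[int]:
    """Return each phoneme's duration in frames.

    Hypothetical layout: boundaries[i] is the index of the first state of
    phoneme i, and boundaries[-1] == len(state_frames) is a sentinel.
    """
    if boundaries[0] != 0 or boundaries[-1] != len(state_frames):
        raise ValueError("boundaries must start at 0 and end at len(state_frames)")
    # A phoneme's duration is the sum of the frames of the states it owns.
    return [sum(state_frames[s:e]) for s, e in zip(boundaries, boundaries[1:])]


def shift_boundary(boundaries: Sequence[int], k: int, step: int) -> List[int]:
    """Move internal boundary k by `step` state widths (negative = earlier)."""
    if not 0 < k < len(boundaries) - 1:
        raise ValueError("only internal boundaries can be moved")
    new = list(boundaries)
    new[k] += step
    # Every phoneme must keep at least one state.
    if not new[k - 1] < new[k] < new[k + 1]:
        raise ValueError("shift would leave a phoneme with no states")
    return new


if __name__ == "__main__":
    frames = [1] * 10  # two 5-state phonemes, 1 frame per state
    bounds = [0, 5, 10]
    print(phoneme_durations(frames, bounds))                         # [5, 5]
    print(phoneme_durations(frames, shift_boundary(bounds, 1, -1)))  # [4, 6]
```

Working in whole state widths keeps the update consistent with the model's own segmentation; converting frames to milliseconds would additionally need the frame shift, which the patent text here does not state.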

15 Claims

1. A speech synthesis device comprising:
- hardware including a processor, wherein the processor is configured to:
  
  by using a voiced utterance likelihood index which is an index indicating a degree of voiced utterance likelihood of each state which represents a phoneme modeled by a statistical method, update a phoneme boundary position which is a boundary with other phonemes neighboring to the phoneme; and
  
  calculate a duration of each phoneme based on the updated phoneme boundary position, and generate synthesized speech based on the calculated duration of phoneme, wherein, when a phoneme before and after a phoneme boundary is an unvoiced sound and a voiced sound, the processor is configured to determine whether a state before and after the phoneme boundary indicates a voiced state or an unvoiced state by using the voiced utterance likelihood index, and wherein, when the state before and after the phoneme boundary are both determined as voiced state or an unvoiced state, the processor is configured to update the phoneme boundary position to move in a predetermined direction according to the state.
  - 2. The speech synthesis device according to claim 1, wherein the processor is further configured to specify whether or not each state which represents the phoneme indicates a voiced state or an unvoiced state, and, when one of the neighboring phonemes indicates the unvoiced sound and other one of the phonemes indicates a voiced sound, determine a moving direction of a phoneme boundary position according to a rule set in advance based on a relationship between the voiced state and the unvoiced state.
  - 3. The speech synthesis device according to claim 2, wherein the processor is further configured to specify as the voiced state a state which represents a phoneme when the voiced utterance likelihood index exceeds a threshold set in advance, and specify as the unvoiced state a state which represents a phoneme when the voiced utterance likelihood index is the threshold set in advance or less.
  - 4. The speech synthesis device according to claim 1, wherein the processor is further configured to update the phoneme boundary position based on a difference between voiced utterance likelihood indices of neighboring states.
  - 5. The speech synthesis device according to claim 4, wherein, when the difference between the voiced utterance likelihood index of one of the neighboring states and the voiced utterance likelihood index of the other state exceeds the threshold set in advance, the processor is further configured to determine as the phoneme boundary position a position between the one state and the other state.
  - 6. The speech synthesis device according to claim 1, wherein the processor is further configured to calculate a duration of the phoneme based on the updated phoneme boundary position.
  - 7. The speech synthesis device according to claim 1, wherein the processor is further configured to update the phoneme boundary position in units of a length corresponding to a width of a state.
  - 8. The speech synthesis device according to claim 1, wherein the processor is further configured to determine whether or not the voiced utterance likelihood index of each state is adequate and change the voiced utterance likelihood index which is determined to be inadequate to an adequate value.
  - 9. The speech synthesis device according to claim 8, wherein, when voiced utterance likelihood determination information which is a result of determining the voiced state or the unvoiced state based on the voiced utterance likelihood index is switched two or more times in one phoneme or when the voiced utterance likelihood determination information of a target phoneme indicates information different from phonetic piece information which is information set in advance as information indicating a property of the phoneme, the processor is further configured to determine that the voiced utterance likelihood index is inadequate.
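
Claims 8 and 9 describe checking whether the voiced utterance likelihood indices of a phoneme are adequate. The sketch below is one possible reading, with its assumptions stated plainly: indices lie in [0, 1]; a state is voiced when its index exceeds a threshold (the threshold rule follows claim 3, the value 0.5 is assumed); "differs from phonetic piece information" is interpreted as no state in the phoneme agreeing with the phoneme's preset voiced/unvoiced property; and an inadequate index is repaired by overwriting it with 1.0 or 0.0 to match that property. All function names are hypothetical.

```python
from typing import List, Sequence


def voicing_flags(indices: Sequence[float], threshold: float = 0.5) -> List[bool]:
    # Claim 3 style rule: voiced iff the index exceeds the threshold (0.5 is assumed).
    return [x > threshold for x in indices]


def count_switches(flags: Sequence[bool]) -> int:
    """Number of voiced/unvoiced changes between consecutive states."""
    return sum(1 for a, b in zip(flags, flags[1:]) if a != b)


def is_adequate(indices: Sequence[float], phoneme_is_voiced: bool,
                threshold: float = 0.5) -> bool:
    flags = voicing_flags(indices, threshold)
    # Inadequate if the determination switches two or more times in one phoneme.
    if count_switches(flags) >= 2:
        return False
    # Assumed reading: inadequate if no state agrees with the preset property.
    if all(f != phoneme_is_voiced for f in flags):
        return False
    return True


def repair(indices: Sequence[float], phoneme_is_voiced: bool,
           threshold: float = 0.5) -> List[float]:
    """Return adequate indices unchanged; otherwise overwrite them (assumed repair)."""
    if is_adequate(indices, phoneme_is_voiced, threshold):
        return list(indices)
    value = 1.0 if phoneme_is_voiced else 0.0
    return [value] * len(indices)
```

A single switch, such as an unvoiced phoneme whose last state looks voiced, is left alone here; that is the situation the boundary-moving rule of claim 1 is meant to handle.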

10. A speech synthesis method comprising:
- by using a voiced utterance likelihood index which is an index indicating a degree of voiced utterance likelihood of each state which represents a phoneme modeled by a statistical method, updating a phoneme boundary position which is a boundary with other phonemes neighboring to the phoneme; and
  
  calculating a duration of each phoneme based on the updated phoneme boundary position, and generating synthesized speech based on the calculated duration of phoneme, wherein, when a phoneme before and after a phoneme boundary is an unvoiced sound and a voiced sound, determining whether a state before and after the phoneme boundary indicates a voiced state or an unvoiced state by using the voiced utterance likelihood index, and wherein, when the state before and after the phoneme boundary are both determined as voiced state or an unvoiced state, updating the phoneme boundary position to move in a predetermined direction according to the state.
  - 11. The speech synthesis method according to claim 10, further comprising specifying whether or not each state which represents the phoneme indicates a voiced state or an unvoiced state, and, when one of the neighboring phonemes indicates the unvoiced sound and other one of the phonemes indicates a voiced sound, determining a moving direction of a phoneme boundary position according to a rule set in advance based on a relationship between the voiced state and the unvoiced state.
  - 12. The speech synthesis method according to claim 10, further comprising updating the phoneme boundary position based on a difference between voiced utterance likelihood indices of neighboring states.
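
Claims 4, 5, and 12 place the boundary using the difference between the indices of neighboring states. A minimal sketch, assuming per-state indices in time order, the absolute difference, a hypothetical threshold of 0.4, and a search window of two states on either side of the current boundary; when several gaps exceed the threshold it picks the largest. None of these values, nor the tie-breaking choice, come from the patent, and `boundary_from_difference` is an illustrative name.

```python
from typing import Sequence


def boundary_from_difference(indices: Sequence[float], current: int,
                             threshold: float = 0.4, window: int = 2) -> int:
    """Return boundary b, meaning the boundary lies between states b-1 and b.

    indices: voiced utterance likelihood index of each state, in time order.
    current: the boundary position before updating.
    """
    best, best_diff = current, threshold
    lo = max(1, current - window)
    hi = min(len(indices) - 1, current + window)
    for b in range(lo, hi + 1):
        diff = abs(indices[b] - indices[b - 1])
        # Only gaps that exceed the threshold qualify; keep the largest one.
        if diff > best_diff:
            best, best_diff = b, diff
    return best  # unchanged if no gap exceeded the threshold
```

For example, with indices `[0.1, 0.2, 0.9, 0.9, 0.95, 0.9]` for an unvoiced phoneme modeled as states 0 to 2 followed by a voiced phoneme, and `current=3`, the largest jump is between states 1 and 2, so the function returns 2 and the unvoiced phoneme gives up one state.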

13. A non-transitory computer readable information recording medium storing a speech synthesis program that, when executed by a processor, performs a method for:
- by using a voiced utterance likelihood index which is an index indicating a degree of voiced utterance likelihood of each state which represents a phoneme modeled by a statistical method, updating a phoneme boundary position which is a boundary with other phonemes neighboring to the phoneme; and
  
  calculating a duration of each phoneme based on the updated phoneme boundary position, and generating synthesized speech based on the calculated duration of phoneme, wherein, when a phoneme before and after a phoneme boundary is an unvoiced sound and a voiced sound, determining whether a state before and after the phoneme boundary indicates a voiced state or an unvoiced state by using the voiced utterance likelihood index, and wherein, when the state before and after the phoneme boundary are both determined as voiced state or an unvoiced state, updating the phoneme boundary position to move in a predetermined direction according to the state.
  - 14. The non-transitory computer readable information recording medium according to claim 13, specifying whether or not each state which represents the phoneme indicates a voiced state or an unvoiced state, and, when one of the neighboring phonemes indicates the unvoiced sound and other one of the phonemes indicates a voiced sound, determining a moving direction of a phoneme boundary position according to a rule set in advance based on a relationship between the voiced state and the unvoiced state.
  - 15. The non-transitory computer readable information recording medium according to claim 13, further comprising updating the phoneme boundary position based on a difference between voiced utterance likelihood indices of neighboring states.
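
Claims 13 and 14, like claims 1 to 3 and 10 to 11, move a boundary between an unvoiced and a voiced phoneme when the states on both sides of it receive the same voiced/unvoiced determination. The claims only say the boundary moves "in a predetermined direction according to the state"; the sketch below assumes one plausible rule. If both states match the left phoneme's voicing, the state after the boundary is handed to the left phoneme (the boundary moves one state later); if both match the right phoneme's voicing, the state before the boundary is handed to the right phoneme (the boundary moves one state earlier). The threshold of 0.5, the list layout, and the function names are hypothetical.

```python
from typing import List, Sequence


def classify(indices: Sequence[float], threshold: float = 0.5) -> List[bool]:
    """Voiced (True) iff the index exceeds the threshold (0.5 is assumed)."""
    return [x > threshold for x in indices]


def update_boundary(indices: Sequence[float], boundaries: Sequence[int],
                    phoneme_voiced: Sequence[bool], k: int,
                    threshold: float = 0.5) -> List[int]:
    """Move internal boundary k by one state width when the assumed rule applies.

    boundaries: index of the first state of each phoneme, plus a final sentinel.
    phoneme_voiced: preset voiced (True) / unvoiced (False) property of each phoneme.
    """
    if not 0 < k < len(boundaries) - 1:
        raise ValueError("k must name an internal boundary")
    new = list(boundaries)
    left, right = phoneme_voiced[k - 1], phoneme_voiced[k]
    if left == right:
        return new  # rule only covers unvoiced/voiced junctions
    voiced = classify(indices, threshold)
    b = boundaries[k]
    before, after = voiced[b - 1], voiced[b]
    if before != after:
        return new  # states differ, so the rule does not apply
    # Both states agree, so exactly one neighboring phoneme matches them.
    new[k] = b + 1 if before == left else b - 1
    if not new[k - 1] < new[k] < new[k + 1]:
        return list(boundaries)  # never leave a phoneme without states
    return new
```

Each call moves the boundary by at most one state width, in the spirit of claim 7; calling it repeatedly until the result stops changing is a further assumption, not something the claims require.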

Current Assignee: NEC Corporation
Original Assignee: NEC Corporation
Inventors: Mitsui, Yasuyuki; Kato, Masanori; Kondo, Reishi
Primary Examiner(s): Desir, Pierre-Louis
Assistant Examiner(s): Shin, Seong Ah A
Application Number: US14/131,409
Publication Number: US 20140149116A1
Time in Patent Office: 1,649 Days
Field of Search: 704/207, 704/208, 704/240, 704/254, 704/260, 704/268, 704/231, 704/239
US Class Current: 1/1
CPC Class Codes:
G10L 13/08   Text analysis or generation...
G10L 15/08   Speech classification or se...
G10L 2013/105   Duration