Prosody generating device, prosody generating method, and program
First Claim
1. A prosody generation apparatus that receives phonological information and linguistic information so as to generate prosody, the prosody generation apparatus being operable to refer to (a) a representative prosodic pattern storage unit for accumulating beforehand representative prosodic patterns of portions of speech data, the portions including prosody changing points;
(b) a selection rule storage unit that stores a selection rule predetermined according to attributes concerning phonology or attributes concerning linguistic information of the portions of the speech data including the prosody changing points; and
(c) a transformation rule storage unit that stores a transformation rule predetermined according to attributes concerning the phonology or the linguistic information of the portions of the speech data including the prosody changing points;
the prosody generation apparatus comprising a computer processing unit and a memory storing a program that are configured to implement:
a prosody changing point setting unit that sets a prosody changing point according to at least any one of the received phonological information and the linguistic information;
a pattern selection unit that selects a representative prosodic pattern from the representative prosodic pattern storage unit according to the selection rule, based on the received phonological information and the linguistic information; and
a prosody generation unit that transforms the representative prosodic pattern selected by the pattern selection unit according to the transformation rule and interpolates the transformed prosodic pattern for a portion between the prosodic patterns corresponding to the prosody changing points,
wherein assuming that a difference in pitch between adjacent moras or adjacent syllables of the speech data is ΔP, the prosody changing point is a point where the ΔP and an immediately following ΔP are different in sign.
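The sign-change condition on ΔP in claim 1 can be illustrated with a minimal sketch. It assumes one pitch value per mora or syllable and treats a zero difference as having no sign; the function name and the list representation are hypothetical and are not taken from the patent.

```python
from typing import List


def prosody_changing_points(pitch: List[float]) -> List[int]:
    """Return indices of moras where the pitch difference changes sign.

    pitch[i] is an assumed per-mora (or per-syllable) pitch value.
    delta[i] = pitch[i + 1] - pitch[i] plays the role of ΔP.
    Index i + 1 is reported when delta[i] and delta[i + 1] differ in sign.
    """
    deltas = [b - a for a, b in zip(pitch, pitch[1:])]
    points = []
    for i in range(len(deltas) - 1):
        # A strictly negative product means one difference rises and the
        # next falls (or vice versa); zero differences are ignored here.
        if deltas[i] * deltas[i + 1] < 0:
            points.append(i + 1)
    return points


if __name__ == "__main__":
    # Rise, rise, fall, fall, rise: a peak at index 2 and a valley at index 4.
    print(prosody_changing_points([100, 120, 130, 110, 105, 115]))
```

Under these assumptions a monotonically rising or falling contour yields no changing points, and local peaks and valleys are the points reported.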
2 Assignments
0 Petitions
Abstract
Provided is a prosody generation apparatus capable of suppressing the distortion that occurs when generating prosodic patterns, and therefore of generating natural prosody. A prosody changing point extraction unit in this apparatus extracts prosody changing points located at the beginning and the ending of a sentence, the beginning and the ending of a breath group, an accent position and the like. A selection rule and a transformation rule for a prosodic pattern including the prosody changing point are generated by means of a statistical or learning technique, and the rules thus generated are stored beforehand in a representative prosodic pattern selection rule table and a transformation rule table. A pattern selection unit selects a representative prosodic pattern from the representative prosodic pattern selection rule table according to the selection rule. A prosody generation unit transforms the selected pattern according to the transformation rule and carries out interpolation with respect to portions other than the prosody changing points so as to generate prosody as a whole.
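The transform-then-interpolate step described in the abstract can be sketched as follows. This is only an illustration under stated assumptions: the transformation is taken to be an additive pitch shift and the interpolation to be linear, neither of which the abstract specifies, and `generate_contour` and its parameters are hypothetical names.

```python
from typing import Dict, List, Optional


def generate_contour(n_moras: int,
                     patterns: Dict[int, List[float]],
                     shifts: Dict[int, float]) -> List[float]:
    """Place shifted patterns at changing points, then fill the gaps.

    patterns maps a start mora index to a selected representative pattern
    (one value per mora). shifts maps the same start index to an additive
    offset, standing in for the transformation rule (an assumption).
    """
    contour: List[Optional[float]] = [None] * n_moras
    for start, pattern in patterns.items():
        for k, value in enumerate(pattern):
            if start + k < n_moras:
                contour[start + k] = value + shifts.get(start, 0.0)

    known = [i for i, v in enumerate(contour) if v is not None]
    if not known:
        raise ValueError("at least one pattern is required")

    for i in range(n_moras):
        if contour[i] is not None:
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:        # before the first pattern: hold its value
            contour[i] = contour[right]
        elif right is None:     # after the last pattern: hold its value
            contour[i] = contour[left]
        else:                   # between patterns: linear interpolation
            t = (i - left) / (right - left)
            contour[i] = contour[left] + t * (contour[right] - contour[left])
    return contour


if __name__ == "__main__":
    print(generate_contour(7, {0: [1.0, 2.0], 5: [4.0]}, {5: 2.0}))
```

Only the moras covered by a selected pattern receive pattern values; every other mora is derived from its nearest neighbours, which mirrors the division of labour the abstract describes.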
12 Citations
26 Claims
1. A prosody generation apparatus that receives phonological information and linguistic information so as to generate prosody, the prosody generation apparatus being operable to refer to (a) a representative prosodic pattern storage unit for accumulating beforehand representative prosodic patterns of portions of speech data, the portions including prosody changing points;
(b) a selection rule storage unit that stores a selection rule predetermined according to attributes concerning phonology or attributes concerning linguistic information of the portions of the speech data including the prosody changing points; and
(c) a transformation rule storage unit that stores a transformation rule predetermined according to attributes concerning the phonology or the linguistic information of the portions of the speech data including the prosody changing points;
the prosody generation apparatus comprising a computer processing unit and a memory storing a program that are configured to implement:
a prosody changing point setting unit that sets a prosody changing point according to at least any one of the received phonological information and the linguistic information;
a pattern selection unit that selects a representative prosodic pattern from the representative prosodic pattern storage unit according to the selection rule, based on the received phonological information and the linguistic information; and
a prosody generation unit that transforms the representative prosodic pattern selected by the pattern selection unit according to the transformation rule and interpolates the transformed prosodic pattern for a portion between the prosodic patterns corresponding to the prosody changing points,
wherein assuming that a difference in pitch between adjacent moras or adjacent syllables of the speech data is ΔP, the prosody changing point is a point where the ΔP and an immediately following ΔP are different in sign.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19
20. A prosody generation apparatus that receives phonological information and linguistic information so as to generate prosody, the prosody generation apparatus being operable to refer to (a) a representative prosodic pattern storage unit for accumulating beforehand representative prosodic patterns of portions of speech data, the portions including prosody changing points;
(b) a selection rule storage unit that stores a selection rule predetermined according to attributes concerning phonology or attributes concerning linguistic information of the portions of the speech data including the prosody changing points; and
(c) a transformation rule storage unit that stores a transformation rule predetermined according to attributes concerning the phonology or the linguistic information of the portions of the speech data including the prosody changing points;
the prosody generation apparatus comprising a computer processing unit and a memory storing a program that are configured to implement:
a prosody changing point setting unit that sets a prosody changing point according to at least any one of the received phonological information and the linguistic information;
a pattern selection unit that selects a representative prosodic pattern from the representative prosodic pattern storage unit according to the selection rule, based on the received phonological information and the linguistic information; and
a prosody generation unit that transforms the representative prosodic pattern selected by the pattern selection unit according to the transformation rule and interpolates the transformed prosodic pattern for a portion between the prosodic patterns corresponding to the prosody changing points,
wherein the prosody changing point setting unit sets the prosody changing point using at least one of the received phonological information and linguistic information, according to a prosody changing point extraction rule predetermined based on attributes concerning the phonology and attributes concerning the linguistic information of the prosody changing point of the speech data, and
wherein the prosody changing point extraction rule is obtained by formulating a relationship between (i) a classification as to whether adjacent moras or syllables of the speech data are a prosody changing point or not and (ii) attributes concerning phonology or attributes concerning linguistic information of the adjacent moras or syllables, by means of a statistical technique or a learning technique so as to predict whether a point is a prosody changing point or not using at least one of the attributes concerning phonology and the attributes concerning linguistic information.
Dependent claims: 21
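As a rough illustration of how an extraction rule of the kind recited in claim 20 might be derived from labelled speech data, the sketch below uses a simple majority vote per attribute combination. The claim only requires "a statistical technique or a learning technique"; the majority vote, the attribute tuples, and the function names are all assumptions made for this example.

```python
from collections import Counter, defaultdict
from typing import Dict, Hashable, Iterable, Tuple


def learn_extraction_rule(
        examples: Iterable[Tuple[Hashable, bool]]) -> Dict[Hashable, bool]:
    """Build a lookup rule from (attributes, is_changing_point) pairs.

    attributes is any hashable description of adjacent moras or syllables
    (hypothetical here, e.g. an accent flag and a part of speech).
    The rule predicts True when changing points form a strict majority
    among the training examples sharing those attributes.
    """
    counts = defaultdict(Counter)
    for attrs, label in examples:
        counts[attrs][label] += 1
    return {attrs: c[True] > c[False] for attrs, c in counts.items()}


def is_changing_point(rule: Dict[Hashable, bool],
                      attrs: Hashable,
                      default: bool = False) -> bool:
    """Predict from attributes alone; unseen attributes use the default."""
    return rule.get(attrs, default)


if __name__ == "__main__":
    data = [
        (("accent", "noun"), True),
        (("accent", "noun"), True),
        (("accent", "noun"), False),
        (("none", "particle"), False),
    ]
    rule = learn_extraction_rule(data)
    print(is_changing_point(rule, ("accent", "noun")))
```

A decision tree or other classifier could take the place of the lookup table; the point of the sketch is only that prediction at synthesis time uses the attributes and not the pitch values.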
22. A prosody generation apparatus that receives phonological information and linguistic information so as to generate prosody, the prosody generation apparatus being operable to refer to (a) a representative prosodic pattern storage unit for accumulating beforehand representative prosodic patterns of portions of speech data, the portions including prosody changing points;
(b) a selection rule storage unit that stores a selection rule predetermined according to attributes concerning phonology or attributes concerning linguistic information of the portions of the speech data including the prosody changing points; and
(c) a transformation rule storage unit that stores a transformation rule predetermined according to attributes concerning the phonology or the linguistic information of the portions of the speech data including the prosody changing points;
the prosody generation apparatus comprising a computer processing unit and a memory storing a program that are configured to implement:
a prosody changing point setting unit that sets a prosody changing point according to at least any one of the received phonological information and the linguistic information;
a pattern selection unit that selects a representative prosodic pattern from the representative prosodic pattern storage unit according to the selection rule, based on the received phonological information and the linguistic information; and
a prosody generation unit that transforms the representative prosodic pattern selected by the pattern selection unit according to the transformation rule and interpolates the transformed prosodic pattern for a portion between the prosodic patterns corresponding to the prosody changing points,
wherein the transformation rule is obtained by clustering prosodic patterns of the speech data into clusters corresponding to the representative patterns so as to produce a representative pattern for each cluster and by formulating a relationship between (i) a distance between each of the prosodic patterns and a representative pattern of a cluster to which the prosodic pattern belongs and (ii) attributes concerning phonology or attributes concerning linguistic information of the prosodic pattern, by means of a statistical technique or a learning technique so as to estimate an amount of transformation of the selected prosodic pattern, using at least one of the attributes concerning phonology and the attributes concerning linguistic information.
Dependent claims: 23, 24, 25, 26
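The derivation of a transformation rule recited in claim 22 can be sketched in a simplified form. The example assumes that cluster labels are already available, that the representative pattern is the per-mora mean of its cluster, that the "distance" is a signed difference of mean pitch, and that the statistical technique is a per-attribute average. None of these choices is stated in the claim, and all names are hypothetical.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Hashable, List, Sequence


def cluster_representatives(patterns: Sequence[List[float]],
                            labels: Sequence[int]) -> Dict[int, List[float]]:
    """Representative pattern per cluster: the per-mora mean (assumed)."""
    groups = defaultdict(list)
    for pattern, label in zip(patterns, labels):
        groups[label].append(pattern)
    return {label: [mean(col) for col in zip(*members)]
            for label, members in groups.items()}


def learn_shift_rule(patterns: Sequence[List[float]],
                     labels: Sequence[int],
                     attributes: Sequence[Hashable]) -> Dict[Hashable, float]:
    """Estimate a pitch shift per attribute value.

    For each pattern, the signed offset between its mean pitch and the
    mean pitch of its cluster representative is computed; offsets are
    then averaged over patterns sharing the same attribute.
    """
    reps = cluster_representatives(patterns, labels)
    offsets = defaultdict(list)
    for pattern, label, attr in zip(patterns, labels, attributes):
        offsets[attr].append(mean(pattern) - mean(reps[label]))
    return {attr: mean(values) for attr, values in offsets.items()}


if __name__ == "__main__":
    pats = [[1.0, 2.0], [3.0, 4.0], [10.0, 10.0], [12.0, 12.0]]
    print(learn_shift_rule(pats, [0, 0, 1, 1], ["head", "tail", "head", "tail"]))
```

At synthesis time the learned shift for the attributes of the target portion would be applied to the selected representative pattern, which is the role the claim assigns to the estimated amount of transformation.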
Specification