Speech synthesis
Abstract
The invention makes use of a database of diphones derived from natural speech. A text is rendered as a series of target diphones and for each of these a number of predetermined diphone features are identified. Potential matches from the database are identified and a target cost for each of these features is established. The target costs are modified before selecting a least-cost combination. The modification of the target costs may be done by weighting, or by use of distribution functions. The calculation of the least-cost combination may be performed by a dynamic search program such as a Viterbi search. In the preferred embodiments, diphone join costs are also included in the least-cost calculation, and are also modified before the calculation is made. In addition to, or instead of, modification of target costs, the potential matches may be pre-pruned to identify a predetermined number of potential matches in descending order of suitability.
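The least-cost calculation mentioned in the abstract can be illustrated with a small dynamic-programming search. This is only a minimal sketch of a Viterbi-style search over candidate diphones, assuming the modified target costs are already available as one dictionary per target diphone and that the join cost is supplied as a function; the names viterbi_select, target_costs and join_cost, and the candidate identifiers, are hypothetical and do not come from the patent.

```python
def viterbi_select(target_costs, join_cost):
    """Return (total_cost, path) for the cheapest sequence of candidates.

    target_costs: non-empty list with one dict per target diphone,
                  mapping candidate id -> (already modified) target cost.
    join_cost:    function (previous_id, next_id) -> cost of joining them.
    Both shapes are assumptions made for this illustration.
    """
    # best[c] = (cost of the cheapest path ending in candidate c, that path)
    best = {cand: (cost, [cand]) for cand, cost in target_costs[0].items()}

    for position in target_costs[1:]:
        new_best = {}
        for cand, t_cost in position.items():
            # Cheapest way to reach this candidate from any previous one.
            reach_cost, prev_path = min(
                ((cost + join_cost(path[-1], cand), path)
                 for cost, path in best.values()),
                key=lambda item: item[0],
            )
            new_best[cand] = (reach_cost + t_cost, prev_path + [cand])
        best = new_best

    # The overall least-cost combination is the cheapest surviving path.
    return min(best.values(), key=lambda item: item[0])


if __name__ == "__main__":
    # Two target diphones with two hypothetical candidates each.
    costs = [{"a1": 1.0, "a2": 0.5}, {"b1": 0.2, "b2": 0.1}]
    # Assumed join cost: free only when a1 is followed by b1.
    join = lambda prev, nxt: 0.0 if (prev, nxt) == ("a1", "b1") else 1.0
    print(viterbi_select(costs, join))
```

In this toy input the candidate with the lowest target cost at each position is not part of the cheapest sequence, because the join cost outweighs the difference; that interaction is the reason for searching over combinations rather than choosing each diphone independently.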
21 Claims
1. A method of producing synthesised speech from a text, comprising:
(a) providing a database of diphones derived from samples of natural speech;
(b) analysing the text to render the text as a succession of target diphones;
(c) identifying, for each target diphone, the value of each of a number of predetermined diphone features;
(d) identifying in the database diphones which are potential matches to each target diphone;
(e) establishing a target cost for each of said predetermined features of each potential database diphone in relation to each target diphone;
(f) modifying the target cost of each feature in accordance with predetermined factors associated with said diphone features; and
(g) calculating the least-cost combination to achieve output speech corresponding to the text.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15
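Steps (e) and (f) of claim 1 can be sketched as a per-feature cost followed by a per-feature modification. The example below assumes the simplest reading, in which the "predetermined factors" are multiplicative weights (weighting is one of the options named in the abstract). The feature names, the weight values and the 0/1 mismatch metric are all hypothetical choices made for illustration, not values taken from the patent.

```python
# Hypothetical features and weights; the claim only says "predetermined".
FEATURE_WEIGHTS = {"stress": 2.0, "pitch": 1.0, "duration": 0.5}


def feature_costs(target, candidate, features=FEATURE_WEIGHTS):
    """Step (e): one target cost per feature.

    Assumed metric: 0.0 when the candidate matches the target on that
    feature, 1.0 otherwise.
    """
    return {
        name: 0.0 if target[name] == candidate[name] else 1.0
        for name in features
    }


def weighted_target_cost(target, candidate, weights=FEATURE_WEIGHTS):
    """Step (f): modify each feature cost by its weight, then sum."""
    costs = feature_costs(target, candidate, weights)
    return sum(weights[name] * cost for name, cost in costs.items())


if __name__ == "__main__":
    target = {"stress": "primary", "pitch": "high", "duration": "long"}
    candidate = {"stress": "none", "pitch": "high", "duration": "short"}
    # Mismatches on stress (weight 2.0) and duration (weight 0.5).
    print(weighted_target_cost(target, candidate))
```

Keeping the per-feature costs separate until the weights are applied mirrors the order of steps (e) and (f); the summed value is what a least-cost search such as step (g) would consume.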
16. A method of producing synthesised speech from a text, comprising:
(a) providing a database of diphones derived from samples of natural speech;
(b) analysing the text to render the text as a succession of target diphones;
(c) identifying, for each target diphone, the value of each of a number of predetermined diphone features;
(d) identifying in the database diphones which are potential matches to each target diphone;
(e) pre-pruning said potential matches by means of sorting by category to identify a predetermined number of potential matches of descending order of suitability;
(f) establishing a target cost for each of said predetermined features of each potential database diphone in relation to each target diphone; and
(g) calculating the least-cost combination to achieve output speech corresponding to the text.
Dependent claims: 17, 18
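The pre-pruning of step (e) in claim 16 can be sketched as a sort followed by a cut-off. The claim does not define the categories, so the example assumes one possible scheme: a candidate's category is the number of features on which it differs from the target, with category 0 the most suitable. The function name prune_candidates, the feature names and the default cut-off are hypothetical.

```python
def prune_candidates(target, candidates, keep=3, features=("stress", "context")):
    """Keep the `keep` most suitable candidates, best category first.

    Assumed category: count of mismatching features (0 is best).
    """
    def category(candidate):
        return sum(candidate[name] != target[name] for name in features)

    # sorted() is stable, so database order is preserved within a category.
    return sorted(candidates, key=category)[:keep]


if __name__ == "__main__":
    target = {"stress": "primary", "context": "t_a"}
    candidates = [
        {"id": "d1", "stress": "none", "context": "k_a"},     # category 2
        {"id": "d2", "stress": "primary", "context": "t_a"},  # category 0
        {"id": "d3", "stress": "primary", "context": "k_a"},  # category 1
        {"id": "d4", "stress": "none", "context": "t_a"},     # category 1
    ]
    print([c["id"] for c in prune_candidates(target, candidates)])
```

Pruning before the target costs are established limits how many candidates the later least-cost calculation has to consider for each target diphone.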
19. A system for producing synthesised speech from text, the system comprising:
memory means storing a database of diphones derived from natural speech;
processing means arranged to:
(a) analyse the text to render the text as a succession of target diphones;
(b) identify, for each target diphone, the value of each of a number of predetermined diphone features;
(c) identify in the database diphones which are potential matches to each target diphone;
(d) establish a target cost for each of said predetermined features of each potential database diphone in relation to each target diphone;
(e) modify the target cost of each feature in accordance with predetermined factors associated with said diphone features; and
(f) calculate the least-cost combination to achieve output speech corresponding to the text; and
speech synthesis means operable to retrieve and concatenate the diphones identified as constituting said least cost combination.
Dependent claims: 21
20. A system for producing synthesised speech from text, the system comprising:
memory means storing a database of diphones derived from natural speech;
processing means arranged to:
(a) analyse the text to render the text as a succession of target diphones;
(b) identify, for each target diphone, the value of each of a number of predetermined diphone features;
(c) identify in the database diphones which are potential matches to each target diphone;
(d) pre-prune said potential matches by means of sorting by category to identify a predetermined number of potential matches of descending order of suitability;
(e) establish a target cost for each of said predetermined features of each potential database diphone in relation to each target diphone; and
(f) calculate the least-cost combination to achieve output speech corresponding to the text; and
speech synthesis means operable to retrieve and concatenate the diphones identified as constituting said least cost combination.
Specification