Speech synthesis using concatenation of speech waveforms
Abstract
A high quality speech synthesizer in various embodiments concatenates speech waveforms referenced by a large speech database. Speech quality is further improved by speech unit selection and concatenation smoothing.
458 Citations
108 Claims
-
1. A speech synthesizer comprising:
-
a. a large speech database referencing speech waveforms;
b. a speech waveform selector in communication with the speech database that selects waveforms referenced by the database using criteria that (i) favor waveform candidates based, at least in part, directly on high-level linguistic features, and (ii) favor approximately equally all waveform candidates in respect to low-level prosody features except those wherein the low-level prosody features are unlikely, wherein the criteria include a requirement favoring waveform candidates having pitch within a range determined as a function of high-level linguistic features, and wherein the criteria are implemented by cost functions, and the requirement is implemented using a function having steep sides and a region that approximates a flat bottom; and
c. a speech waveform concatenator in communication with the speech database that concatenates the waveforms selected by the speech waveform selector to produce a speech signal output.
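The pitch requirement in claim 1 can be illustrated with a minimal Python sketch of a cost function that has steep sides and a flat bottom, with the acceptable range looked up from high-level linguistic features. The names `PITCH_RANGE_HZ`, `flat_bottom_cost`, and `pitch_cost`, the feature labels, the slope, and every numeric range are assumptions made for this example; none of them come from the patent.

```python
# Hypothetical mapping from high-level linguistic features
# (stress, phrase position) to an acceptable pitch range in Hz.
# The labels and numbers are illustrative assumptions only.
PITCH_RANGE_HZ = {
    ("stressed", "phrase_medial"): (110.0, 180.0),
    ("stressed", "phrase_final"): (90.0, 140.0),
    ("unstressed", "phrase_medial"): (95.0, 150.0),
    ("unstressed", "phrase_final"): (80.0, 120.0),
}


def flat_bottom_cost(value, low, high, slope=10.0):
    """Zero cost anywhere inside [low, high]; cost rises steeply outside."""
    if value < low:
        return slope * (low - value)
    if value > high:
        return slope * (value - high)
    return 0.0


def pitch_cost(candidate_pitch_hz, stress, position):
    """Cost of a candidate's pitch given the target's linguistic features."""
    low, high = PITCH_RANGE_HZ[(stress, position)]
    return flat_bottom_cost(candidate_pitch_hz, low, high)
```

Under these assumptions every candidate whose pitch falls inside the range is treated equally (cost zero), and only candidates with unlikely pitch values are penalized.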
-
-
2. A speech synthesizer comprising:
-
a. a large speech database referencing speech waveforms;
b. a speech waveform selector in communication with the speech database that selects waveforms referenced by the database using criteria that (i) favor waveform candidates based, at least in part, directly on high-level linguistic features, and (ii) favor approximately equally all waveform candidates in respect to low-level prosody features except those wherein the low-level prosody features are unlikely, wherein the criteria include a requirement favoring waveform candidates having a duration within a range determined as a function of high-level linguistic features, and wherein the criteria are implemented by cost functions, and the requirement is implemented using a function having steep sides and a region that approximates a flat bottom; and
c. a speech waveform concatenator in communication with the speech database that concatenates the waveforms selected by the speech waveform selector to produce a speech signal output.
-
-
3. A speech synthesizer comprising:
-
a. a large speech database referencing speech waveforms;
b. a speech waveform selector in communication with the speech database that selects waveforms referenced by the database using criteria that (i) favor waveform candidates based, at least in part, directly on high-level linguistic features, and (ii) favor approximately equally all waveform candidates in respect to low-level prosody features except those wherein the low-level prosody features are unlikely, wherein the criteria include a requirement favoring waveform candidates having coarse pitch continuity within a range determined as a function of high-level linguistic features, and wherein the criteria are implemented by cost functions, and the requirement is implemented using a function having steep sides and a region that approximates a flat bottom; and
c. a speech waveform concatenator in communication with the speech database that concatenates the waveforms selected by the speech waveform selector to produce a speech signal output.
-
-
4. A speech synthesizer comprising:
-
a. a large speech database;
b. a target generator for generating a sequence of target feature vectors responsive to a phonetic transcription input;
c. a waveform selector that selects a sequence of waveforms referenced by the database, each waveform in the sequence corresponding to a first non-null set of target feature vectors, wherein the waveform selector attributes, to any waveform candidate, a node cost, wherein the node cost is a function of individual costs associated with each of a plurality of features, and wherein at least one individual cost is determined using a cost function that varies nontrivially according to a second non-null set of target feature vectors in the sequence; and
d. a speech waveform concatenator in communication with the speech database that concatenates the waveforms selected by the speech waveform selector to produce a speech signal output.
Dependent claims: 5, 6, 7
-
-
8. A speech synthesizer comprising:
-
a. a large speech database;
b. a target generator for generating a sequence of target feature vectors responsive to a phonetic transcription input;
c. a waveform selector that selects a sequence of waveforms referenced by the database, wherein the waveform selector attributes, to pairs of adjacent waveform candidates, a transition cost, wherein the transition cost is a function of individual costs associated with each of a plurality of features, and wherein at least one individual cost is determined using a cost function that varies nontrivially according to the features of a region in the phonetic transcription input that corresponds to adjacent waveform candidates; and
d. a speech waveform concatenator in communication with the speech database that concatenates the waveforms selected by the speech waveform selector to produce a speech signal output.
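Claims 4 and 8 attribute node costs to individual candidates and transition costs to pairs of adjacent candidates. The claims do not state how the cheapest sequence is found; the sketch below assumes a dynamic-programming (Viterbi-style) search, which is one common way to combine the two kinds of cost. The function name `select_sequence` and its callback signatures are hypothetical. Both callbacks receive the target position `i`, so a cost function can vary with the corresponding region of the transcription.

```python
def select_sequence(candidates, node_cost, transition_cost):
    """Pick one candidate per target position with minimum total cost.

    candidates: list (one entry per target) of lists of candidate units.
    node_cost(i, c): cost of candidate c at target position i.
    transition_cost(i, p, c): cost of joining p (position i-1) to c (position i).
    Returns (total_cost, chosen_candidates).
    """
    # best[i][j] = (cheapest cost of a path ending in candidates[i][j], backpointer)
    best = [[(node_cost(0, c), None) for c in candidates[0]]]
    for i in range(1, len(candidates)):
        row = []
        for c in candidates[i]:
            path_cost, back = min(
                (best[i - 1][k][0] + transition_cost(i, p, c), k)
                for k, p in enumerate(candidates[i - 1])
            )
            row.append((path_cost + node_cost(i, c), back))
        best.append(row)

    # Trace the cheapest path back from the last position.
    j = min(range(len(best[-1])), key=lambda k: best[-1][k][0])
    total = best[-1][j][0]
    path = []
    for i in range(len(candidates) - 1, -1, -1):
        path.append(candidates[i][j])
        j = best[i][j][1]
    path.reverse()
    return total, path
```

For example, with candidates represented simply as pitch values, `node_cost` could measure distance from a target pitch and `transition_cost` could measure the pitch jump at the join.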
-
-
9. A speech synthesizer comprising:
-
a. a large speech database;
b. a speech waveform selector that selects a sequence of waveforms referenced by the database, wherein the waveform selector attributes, to any waveform candidate, a cost, wherein the cost is a function of individual costs associated with each of a plurality of features, and wherein, for at least one numeric feature, an individual cost is determined using a cost function having a plurality of steep sides; and
c. a speech waveform concatenator in communication with the speech database that concatenates the waveforms selected by the speech waveform selector to produce a speech signal output.
Dependent claims: 10, 11, 12
-
-
13. A speech synthesizer comprising:
-
a. a large speech database;
b. a speech waveform selector that selects a sequence of waveforms referenced by the database, wherein the waveform selector attributes, to any waveform candidate, a cost, wherein the cost is a function of individual costs associated with each of a plurality of features, and wherein, for at least one numeric feature, an individual cost is determined using a piecewise linear cost function that has a region that approximates a flat bottom; and
c. a speech waveform concatenator in communication with the speech database that concatenates the waveforms selected by the speech waveform selector to produce a speech signal output.
-
-
14. A speech synthesizer comprising:
-
a. a large speech database;
b. a speech waveform selector that selects a sequence of waveforms referenced by the database, wherein the waveform selector attributes, to any waveform candidate, a cost, wherein the cost is a function of individual costs associated with each of a plurality of features, and wherein, for at least one numeric feature, an individual cost is determined using an asymmetric cost function that has a region that approximates a flat bottom; and
c. a speech waveform concatenator in communication with the speech database that concatenates the waveforms selected by the speech waveform selector to produce a speech signal output.
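Claims 13 and 14 describe piecewise linear and asymmetric cost functions with a flat-bottomed region. One way to express both properties at once is to define the function by breakpoints and interpolate linearly between them. The helper `make_piecewise_linear` and the duration breakpoints below are assumptions chosen for illustration, not values from the patent.

```python
from bisect import bisect_right


def make_piecewise_linear(points):
    """Build a cost function from (x, cost) breakpoints sorted by x.

    Between breakpoints the cost is interpolated linearly; outside the
    first and last breakpoints the outermost segments are extended.
    """
    xs = [x for x, _ in points]
    ys = [y for _, y in points]

    def cost(x):
        # Index of the segment containing x, clamped to the outer segments.
        i = min(max(bisect_right(xs, x) - 1, 0), len(xs) - 2)
        x0, x1, y0, y1 = xs[i], xs[i + 1], ys[i], ys[i + 1]
        return y0 + (y1 - y0) * (x - x0) / (x1 - x0)

    return cost


# Hypothetical duration cost (milliseconds): flat between 60 and 100 ms,
# steep for durations that are too short, gentler for ones that are too long.
duration_cost = make_piecewise_linear(
    [(40.0, 60.0), (60.0, 0.0), (100.0, 0.0), (120.0, 20.0)]
)
```

In this sketch the left side has slope 3 per millisecond and the right side slope 1, so the function is asymmetric while still treating all durations in the flat region equally.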
-
-
15. A speech synthesizer comprising:
-
a. a large speech database;
b. a waveform selector that selects a sequence of waveforms referenced by the database, wherein the waveform selector attributes, to any waveform candidate, a cost, wherein the cost is a function of individual costs associated with each of a plurality of features, and wherein at least one individual cost of a symbolic feature is determined using a non-binary numeric function determined by recourse to a table; and
c. a speech waveform concatenator in communication with the speech database that concatenates the waveforms selected by the speech waveform selector to produce a speech signal output.
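The table-based cost for a symbolic feature in claim 15 can be pictured as a lookup keyed by the target value and the candidate value, returning a graded number rather than a plain match or mismatch. The feature (stress level), the table `STRESS_COST`, its values, and the default are all hypothetical.

```python
# Hypothetical cost table for a symbolic "stress" feature.
# Keys are (target_value, candidate_value); values are graded costs,
# so a near miss is cheaper than a complete mismatch.
STRESS_COST = {
    ("primary", "primary"): 0.0,
    ("primary", "secondary"): 0.3,
    ("primary", "none"): 1.0,
    ("secondary", "primary"): 0.4,
    ("secondary", "secondary"): 0.0,
    ("secondary", "none"): 0.5,
    ("none", "primary"): 1.0,
    ("none", "secondary"): 0.6,
    ("none", "none"): 0.0,
}


def symbolic_cost(table, target_value, candidate_value, default=1.0):
    """Look up the cost of substituting candidate_value for target_value."""
    return table.get((target_value, candidate_value), default)
```

A binary function would return only 0 or 1; the intermediate entries are what make the function non-binary in the sense of the claim.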
-
-
16. A speech synthesizer comprising:
-
a. a large speech database;
b. a waveform selector that selects a sequence of waveforms referenced by the database, wherein the waveform selector attributes, to any waveform candidate, a cost, wherein the cost is a function of individual costs associated with each of a plurality of features, and wherein at least one individual cost of a symbolic feature is determined using a non-binary numeric function determined by recourse to a set of rules; and
c. a speech waveform concatenator in communication with the speech database that concatenates the waveforms selected by the speech waveform selector to produce a speech signal output.
-
-
17. A speech synthesizer comprising:
-
a. a large speech database;
b. a target generator for generating a sequence of target feature vectors responsive to a phonetic transcription input;
c. a waveform selector that selects a sequence of waveforms referenced by the database, each waveform in the sequence corresponding to a first non-null set of target feature vectors, wherein the waveform selector attributes, to any waveform candidate, a cost, wherein the cost is a function of weighted individual costs associated with each of a plurality of features, and wherein the weight associated with at least one of the individual costs varies nontrivially according to a second non-null set of target feature vectors in the sequence, such target features including at least one feature other than target phoneme identity; and
d. a speech waveform concatenator in communication with the speech database that concatenates the waveforms selected by the speech waveform selector to produce a speech signal output.
Dependent claims: 18, 19, 20
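Claim 17 lets the weight of an individual cost depend on other target feature vectors in the sequence. A minimal sketch, assuming each target is a dictionary with a hypothetical `phrase_final` flag and that the duration weight doubles when the following target is phrase-final:

```python
def weighted_cost(individual_costs, targets, i):
    """Weighted sum of individual costs for the candidate at target position i.

    individual_costs: dict mapping feature name to its individual cost.
    targets: list of target feature vectors (dicts); the weights depend on
    the target that follows position i, not only on position i itself.
    """
    weights = {"pitch": 1.0, "duration": 1.0}  # illustrative base weights
    next_target = targets[i + 1] if i + 1 < len(targets) else None
    # Assumed rule: duration matters more just before a phrase-final target.
    if next_target is not None and next_target["phrase_final"]:
        weights["duration"] = 2.0
    return sum(weights[name] * cost for name, cost in individual_costs.items())
```

The feature names, base weights, and the doubling rule are assumptions for the example; the point is only that the weight is computed from a feature other than phoneme identity taken from a neighboring target.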
-
-
21. A speech synthesizer comprising:
-
a. a large speech database;
b. a waveform selector that selects a sequence of waveforms referenced by the database, wherein the waveform selector attributes, to any waveform candidate, a waveform cost, wherein the waveform cost is a function of individual costs associated with each of a plurality of features, and wherein calculation of the waveform cost is aborted after it is determined that the waveform cost will exceed a threshold; and
c. a speech waveform concatenator in communication with the speech database that concatenates the waveforms selected by the speech waveform selector to produce a speech signal output.
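The early abort in claim 21 can be sketched as a running sum that stops as soon as the threshold is exceeded. The sketch assumes the waveform cost is a plain sum of non-negative individual costs, so that a partial sum above the threshold guarantees the full sum is above it too; the function name and the use of `None` to mark a pruned candidate are choices made for this example.

```python
def waveform_cost(individual_cost_fns, threshold):
    """Sum individual costs lazily, aborting once the threshold is exceeded.

    individual_cost_fns: iterable of zero-argument callables, each returning
    one non-negative individual cost. Returns the total cost, or None if the
    candidate was pruned before all costs were evaluated.
    """
    total = 0.0
    for cost_fn in individual_cost_fns:
        total += cost_fn()
        if total > threshold:
            return None  # remaining, possibly expensive, costs are skipped
    return total
```

Ordering the cheapest-to-compute or most discriminating costs first would make the abort trigger earlier, though the claim itself does not prescribe an order.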
-
-
22. A speech synthesizer comprising:
-
a. a large speech database referencing speech waveforms and associated symbolic prosodic features, wherein the database is accessed by speech waveform designators, each designator being associated with a sequence of diphones, the sequence having at least one diphone;
b. a speech waveform selector, in communication with the speech database, that selects, based, at least in part, on the symbolic prosodic features, waveforms referenced by the database using speech waveform designators that correspond to a phonetic transcription input wherein the waveform selector attributes, to pairs of adjacent waveform candidates, a transition cost, wherein the transition cost is a function of individual costs associated with each of a plurality of features, and wherein at least one individual cost is determined by using, as an argument, an acoustic distance value selected from one of a first set of tables, each table in the first set corresponding to a non-null set of phonemes; and
c. a speech waveform concatenator in communication with the speech database that concatenates the waveforms selected by the speech waveform selector to produce a speech signal output.
Dependent claims: 23, 24
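In claim 22 an individual transition cost takes as its argument an acoustic distance drawn from one of several tables, each table covering a non-empty set of phonemes. The claim does not say how a table is indexed; the sketch below assumes each unit edge carries a class label and that the table is keyed by the pair of labels at the join. All names, labels, and numbers are hypothetical.

```python
# Hypothetical distance tables. Each entry pairs a non-empty phoneme set
# with a table keyed by (left_class, right_class) labels at the join.
DISTANCE_TABLES = [
    (frozenset({"a", "e", "i"}),
     {("x", "x"): 0.0, ("x", "y"): 0.5, ("y", "x"): 0.5, ("y", "y"): 0.0}),
    (frozenset({"s", "f"}),
     {("x", "x"): 0.0, ("x", "y"): 0.25, ("y", "x"): 0.25, ("y", "y"): 0.0}),
]


def acoustic_distance(phoneme, left_class, right_class):
    """Select the table that covers the phoneme, then look up the distance."""
    for phonemes, table in DISTANCE_TABLES:
        if phoneme in phonemes:
            return table[(left_class, right_class)]
    raise KeyError(f"no distance table covers phoneme {phoneme!r}")


def spectral_join_cost(phoneme, left_class, right_class, scale=2.0):
    """The looked-up distance is the argument of a (here linear) cost function."""
    return scale * acoustic_distance(phoneme, left_class, right_class)
```

Precomputing distances per phoneme group keeps the lookup cheap at selection time, and lets the same class mismatch cost more at a vowel join than at a fricative join under these example values.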
-
-
25. A speech synthesizer comprising:
-
a. a speech database referencing speech waveforms;
b. a speech waveform selector, in communication with the speech database, that selects waveforms referenced by the database using designators that correspond to a phonetic transcription input; and
c. a speech waveform concatenator, in communication with the speech database, that concatenates waveforms selected by the speech waveform selector to produce a speech signal output, wherein, for at least one ordered sequence of a first waveform and a second waveform, the concatenator selects (i) a location of a trailing edge of the first waveform and (ii) a location of a leading edge of the second waveform, each location being selected so as to produce an optimization of a phase match between the first and second waveforms in regions near the locations, the optimization being determined in a plurality of successive stages in which time resolution associated with the first and second waveforms is made successively finer.
Dependent claims: 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40
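The staged optimization in claim 25 can be sketched as a coarse-to-fine search. The example below is a simplification: it adjusts only one edge (the offset into the second waveform), and it assumes that phase match is measured by cross-correlation against a reference region taken from the end of the first waveform. The function names, the step schedule `(4, 2, 1)`, and the correlation measure are assumptions, not details from the patent.

```python
def correlation(a, b):
    """Plain dot product of two equally sampled sequences."""
    return sum(x * y for x, y in zip(a, b))


def coarse_to_fine_offset(ref, sig, max_offset, steps=(4, 2, 1)):
    """Find the offset into sig whose window best matches ref.

    Each stage evaluates candidate offsets on a grid of spacing `step`,
    using signals subsampled by the same factor, then narrows the search
    window around the best offset before moving to a finer step.
    Requires len(sig) >= max_offset + len(ref).
    """
    lo, hi = 0, max_offset
    best = 0
    for step in steps:
        best = max(
            range(lo, hi + 1, step),
            key=lambda off: correlation(
                ref[::step], sig[off:off + len(ref):step]
            ),
        )
        lo, hi = max(0, best - step), min(max_offset, best + step)
    return best
```

Early stages are cheap because they look at few offsets and few samples; only the final stage works at full time resolution, and only near the coarse estimate.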
-
-
26. A speech synthesizer comprising:
-
a. a speech database referencing speech waveforms;
b. a speech waveform selector, in communication with the speech database, that selects waveforms referenced by the database using designators that correspond to a phonetic transcription input; and
c. a speech waveform concatenator, in communication with the speech database, that concatenates waveforms selected by the speech waveform selector to produce a speech signal output, wherein, for at least one ordered sequence of a first waveform and a second waveform, the second waveform having a leading edge, the concatenator selects the location of a trailing edge of the first waveform, the location being selected so as to produce an optimization of a phase match between the first and second waveforms in regions near the location and the leading edge, the optimization being determined in a plurality of successive stages in which time resolution associated with the first and second waveforms is made successively finer.
-
-
27. A speech synthesizer comprising:
-
a. a speech database referencing speech waveforms;
b. a speech waveform selector, in communication with the speech database, that selects waveforms referenced by the database using designators that correspond to a phonetic transcription input; and
c. a speech waveform concatenator, in communication with the speech database, that concatenates waveforms selected by the speech waveform selector to produce a speech signal output, wherein, for at least one ordered sequence of a first waveform and a second waveform, the first waveform having a trailing edge, the concatenator selects the location of a leading edge of the second waveform, the location being selected so as to produce an optimization of a phase match between the first and second waveforms in regions near the location and the trailing edge, the optimization being determined in a plurality of successive stages in which time resolution associated with the first and second waveforms is made successively finer.
-
-
41. A speech synthesizer comprising:
-
a. a large speech database referencing speech waveforms and associated symbolic prosodic features, wherein the database is accessed by speech waveform designators, each designator being associated with a sequence of diphones, the sequence having at least one diphone;
b. speech waveform selecting means, in communication with the speech database, for selecting, based, at least in part, on the symbolic prosodic features, waveforms referenced by the database using speech waveform designators that correspond to a phonetic transcription input, and wherein the waveform selecting means attributes, to pairs of adjacent waveform candidates, a transition cost, wherein the transition cost is a function of individual costs associated with each of a plurality of features, and wherein at least one individual cost is determined by using, as an argument, an acoustic distance value selected from one of a first set of tables, each table in the first set corresponding to a non-null set of phonemes; and
c. speech waveform concatenating means in communication with the speech database for concatenating the waveforms selected by the speech waveform selecting means to produce a speech signal output.
Dependent claims: 42, 43
-
-
44. A speech synthesizer comprising:
-
a. a large speech database referencing speech waveforms, wherein the database is accessed by speech waveform designators;
b. speech waveform selecting means, in communication with the speech database, for selecting waveforms referenced by the database using criteria that (i) favor waveform candidates based, at least in part, directly on high-level linguistic features, and (ii) favor approximately equally all waveform candidates in respect to low-level prosody features except those wherein the low-level prosody features are unlikely, wherein the criteria include a requirement favoring waveform candidates having pitch within a range determined as a function of high-level linguistic features, and wherein the criteria are implemented by cost functions, and the requirement is implemented using a function having steep sides and a region that approximates a flat bottom; and
c. speech waveform concatenating means in communication with the speech database for concatenating the waveforms selected by the speech waveform selecting means to produce a speech signal output.
-
-
45. A speech synthesizer comprising:
-
a. a large speech database referencing speech waveforms, wherein the database is accessed by speech waveform designators;
b. speech waveform selecting means, in communication with the speech database, for selecting waveforms referenced by the database using criteria that (i) favor waveform candidates based, at least in part, directly on high-level linguistic features, and (ii) favor approximately equally all waveform candidates in respect to low-level prosody features except those wherein the low-level prosody features are unlikely, wherein the criteria include a requirement favoring waveform candidates having a duration within a range determined as a function of high-level linguistic features, and wherein the criteria are implemented by cost functions, and the requirement is implemented using a function having steep sides and a region that approximates a flat bottom; and
c. speech waveform concatenating means in communication with the speech database for concatenating the waveforms selected by the speech waveform selecting means to produce a speech signal output.
-
-
46. A speech synthesizer comprising:
-
a. a large speech database referencing speech waveforms, wherein the database is accessed by speech waveform designators;
b. speech waveform selecting means, in communication with the speech database, for selecting waveforms referenced by the database using criteria that (i) favor waveform candidates based, at least in part, directly on high-level linguistic features, and (ii) favor approximately equally all waveform candidates in respect to low-level prosody features except those wherein the low-level prosody features are unlikely, wherein the criteria include a requirement favoring waveform candidates having coarse pitch continuity within a range determined as a function of high-level linguistic features, and wherein the criteria are implemented by cost functions, and the requirement is implemented using a function having steep sides and a region that approximates a flat bottom; and
c. speech waveform concatenating means in communication with the speech database for concatenating the waveforms selected by the speech waveform selecting means to produce a speech signal output.
-
-
47. A speech synthesizer comprising:
-
a. a large speech database;
b. target generating means for generating a sequence of target feature vectors responsive to a phonetic transcription input;
c. waveform selecting means for selecting a sequence of waveforms referenced by the database, each waveform in the sequence corresponding to a first non-null set of target feature vectors, wherein the waveform selecting means attributes, to any waveform candidate, a node cost, wherein the node cost is a function of individual costs associated with each of a plurality of features, and wherein at least one individual cost is determined using a cost function that varies nontrivially according to a second non-null set of target feature vectors in the sequence; and
d. speech waveform concatenating means in communication with the speech database for concatenating the waveforms selected by the speech waveform selecting means to produce a speech signal output.
Dependent claims: 48, 49, 50
-
-
51. A method of speech synthesis comprising:
-
a. providing a large speech database referencing speech waveforms and associated symbolic prosodic features, wherein the database is accessed by speech waveform designators, each designator being associated with a sequence of diphones, the sequence having at least one diphone;
b. selecting, based, at least in part, on the symbolic prosodic features, waveforms referenced by the database using speech waveform designators that correspond to a phonetic transcription input, wherein the selecting attributes a transition cost to pairs of adjacent waveform candidates, wherein the transition cost is a function of individual costs associated with each of a plurality of features, and wherein at least one individual cost is determined by using, as an argument, an acoustic distance value selected from one of a first set of tables, each table in the first set corresponding to a non-null set of phonemes; and
c. concatenating the selected waveforms to produce a speech signal output.
Dependent claims: 52, 53
-
-
54. A method of speech synthesis comprising:
-
a. providing a large speech database referencing speech waveforms;
b. selecting waveforms referenced by the database using criteria that (i) favor waveform candidates based, at least in part, directly on high-level linguistic features, and (ii) favor approximately equally all waveform candidates in respect to low-level prosody features except those wherein the low-level prosody features are unlikely, wherein the selecting criteria include a requirement favoring waveform candidates having pitch within a range determined as a function of high-level linguistic features, and wherein the selecting criteria are implemented by cost functions, and the requirement is implemented using a function having steep sides and a region that approximates a flat bottom; and
c. concatenating the selected waveforms to produce a speech signal output.
-
-
55. A method of speech synthesis comprising:
-
a. providing a large speech database referencing speech waveforms;
b. selecting waveforms referenced by the database using criteria that (i) favor waveform candidates based, at least in part, directly on high-level linguistic features, and (ii) favor approximately equally all waveform candidates in respect to low-level prosody features except those wherein the low-level prosody features are unlikely, wherein the selecting criteria include a requirement favoring waveform candidates having a duration within a range determined as a function of high-level linguistic features, and wherein the selecting criteria are implemented by cost functions, and the requirement is implemented using a function having steep sides and a region that approximates a flat bottom; and
c. concatenating the selected waveforms to produce a speech signal output.
-
-
56. A method of speech synthesis comprising:
-
a. providing a large speech database referencing speech waveforms;
b. selecting waveforms referenced by the database using criteria that (i) favor waveform candidates based, at least in part, directly on high-level linguistic features, and (ii) favor approximately equally all waveform candidates in respect to low-level prosody features except those wherein the low-level prosody features are unlikely, wherein the selecting criteria include a requirement favoring waveform candidates having coarse pitch continuity within a range determined as a function of high-level linguistic features, and wherein the selecting criteria are implemented by cost functions, and the requirement is implemented using a function having steep sides and a region that approximates a flat bottom; and
c. concatenating the selected waveforms to produce a speech signal output.
-
-
57. A method of speech synthesis comprising:
-
a. providing a large speech database;
b. generating a sequence of target feature vectors responsive to a phonetic transcription input;
c. selecting a sequence of waveforms referenced by the database, each waveform in the sequence corresponding to a first non-null set of target feature vectors, wherein the selecting attributes a node cost to any waveform candidate, wherein the node cost is a function of individual costs associated with each of a plurality of features, and wherein at least one individual cost is determined using a cost function that varies nontrivially according to a second non-null set of target feature vectors in the sequence; and
d. concatenating the selected waveforms to produce a speech signal output.
Dependent claims: 58, 59, 60
-
-
61. A method of speech synthesis comprising:
-
a. providing a large speech database;
b. generating a sequence of target feature vectors responsive to a phonetic transcription input;
c. selecting a sequence of waveforms referenced by the database, wherein the selecting attributes a transition cost to pairs of adjacent waveform candidates, wherein the transition cost is a function of individual costs associated with each of a plurality of features, and wherein at least one individual cost is determined using a cost function that varies nontrivially according to the features of a region in the phonetic transcription input that corresponds to adjacent waveform candidates; and
d. concatenating the selected waveforms to produce a speech signal output.
-
-
62. A method of speech synthesis comprising:
-
a. providing a large speech database;
b. selecting a sequence of waveforms referenced by the database, wherein the selecting attributes a cost to any waveform candidate, wherein the cost is a function of individual costs associated with each of a plurality of features, and wherein, for at least one numeric feature, an individual cost is determined using a cost function that has at least one steep side; and
c. concatenating the selected waveforms to produce a speech signal output.
-
-
63. A method of speech synthesis comprising:
-
a. providing a large speech database;
b. selecting a sequence of waveforms referenced by the database, wherein the selecting attributes a cost to any waveform candidate, wherein the cost is a function of individual costs associated with each of a plurality of features, and wherein, for at least one numeric feature, an individual cost is determined using a cost function that has a plurality of steep sides; and
c. concatenating the selected waveforms to produce a speech signal output.
Dependent claims: 64, 65, 66
-
-
67. A method of speech synthesis comprising:
-
a. providing a large speech database;
b. selecting a sequence of waveforms referenced by the database, wherein the selecting attributes a cost to any waveform candidate, wherein the cost is a function of individual costs associated with each of a plurality of features, and wherein, for at least one numeric feature, an individual cost is determined using a cost function that has a region that approximates a flat bottom; and
c. concatenating the selected waveforms to produce a speech signal output.
Dependent claims: 68, 69
-
-
70. A method of speech synthesis comprising:
-
a. providing a large speech database;
b. selecting a sequence of waveforms referenced by the database, wherein the selecting attributes a cost to any waveform candidate, wherein the cost is a function of individual costs associated with each of a plurality of features, and wherein at least one individual cost of a symbolic feature is determined using a non-binary numeric function; and
c. concatenating the selected waveforms to produce a speech signal output.
Dependent claims: 71
-
-
72. A method of speech synthesis comprising:
-
a. providing a large speech database;
b. selecting a sequence of waveforms referenced by the database, wherein the selecting attributes a cost to any waveform candidate, wherein the cost is a function of individual costs associated with each of a plurality of features, and wherein at least one individual cost of a symbolic feature is determined using a non-binary numeric function determined by recourse to a table; and
c. concatenating the selected waveforms to produce a speech signal output.
-
-
73. A method of speech synthesis comprising:
-
a. providing a large speech database;
b. selecting a sequence of waveforms referenced by the database, wherein the selecting attributes a cost to any waveform candidate, wherein the cost is a function of individual costs associated with each of a plurality of features, and wherein at least one individual cost of a symbolic feature is determined using a non-binary numeric function determined by recourse to a set of rules; and
c. concatenating the selected waveforms to produce a speech signal output.
-
-
74. A method of speech synthesis comprising:
-
a. providing a large speech database;
b. generating a sequence of target feature vectors responsive to a phonetic transcription input;
c. selecting a sequence of waveforms referenced by the database, each waveform in the sequence corresponding to a first non-null set of target feature vectors, wherein the selecting attributes a cost to any waveform candidate, wherein the cost is a function of weighted individual costs associated with each of a plurality of features, and wherein the weight associated with at least one of the individual costs varies nontrivially according to a second non-null set of target feature vectors in the sequence, such target features including at least one feature other than target phoneme identity; and
d. concatenating the selected waveforms to produce a speech signal output.
Dependent claims: 75, 76, 77
-
-
78. A method of speech synthesis comprising:
-
a. providing a speech database referencing speech waveforms;
b. selecting waveforms referenced by the database using designators that correspond to a phonetic transcription input; and
c. concatenating the selected waveforms to produce a speech signal output, wherein, for at least one ordered sequence of a first waveform and a second waveform, the concatenating selects (i) a location of a trailing edge of the first waveform and (ii) a location of a leading edge of the second waveform, each location being selected so as to produce an optimization of a phase match between the first and second waveforms in regions near the locations, the optimization being determined in a plurality of successive stages in which time resolution associated with the first and second waveforms is made successively finer.
Dependent claims: 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93
-
-
79. A method of speech synthesis comprising:
-
a. providing a speech database referencing speech waveforms;
b. selecting waveforms referenced by the database using designators that correspond to a phonetic transcription input; and
c. concatenating the selected waveforms to produce a speech signal output, wherein, for at least one ordered sequence of a first waveform and a second waveform, the second waveform having a leading edge, the concatenating selects the location of a trailing edge of the first waveform, the location being selected so as to produce an optimization of a phase match between the first and second waveforms in regions near the location and the leading edge, the optimization being determined in a plurality of successive stages in which time resolution associated with the first and second waveforms is made successively finer.
-
-
80. A method of speech synthesis comprising:
-
a. providing a speech database referencing speech waveforms;
b. selecting waveforms referenced by the database using designators that correspond to a phonetic transcription input; and
c. concatenating the selected waveforms to produce a speech signal output, wherein, for at least one ordered sequence of a first waveform and a second waveform, the first waveform having a trailing edge, the concatenating selects the location of a leading edge of the second waveform, the location being selected so as to produce an optimization of a phase match between the first and second waveforms in regions near the location and the trailing edge, the optimization being determined in a plurality of successive stages in which time resolution associated with the first and second waveforms is made successively finer.
-
-
94. A speech synthesizer comprising:
-
a. a large speech database;
b. a speech waveform selector that selects a sequence of waveforms referenced by the database, wherein the waveform selector attributes, to any waveform candidate, a cost, wherein the cost is a function of individual costs associated with each of a plurality of features, and wherein, for at least one numeric feature, an individual cost is determined using a piecewise linear cost function that has at least one steep side; and
c. a speech waveform concatenator in communication with the speech database that concatenates the waveforms selected by the speech waveform selector to produce a speech signal output.
-
-
95. A speech synthesizer comprising:
-
a. a large speech database;
b. a speech waveform selector that selects a sequence of waveforms referenced by the database, wherein the waveform selector attributes, to any waveform candidate, a cost, wherein the cost is a function of individual costs associated with each of a plurality of features, and wherein, for at least one numeric feature, an individual cost is determined using an asymmetric cost function that has at least one steep side; and
c. a speech waveform concatenator in communication with the speech database that concatenates the waveforms selected by the speech waveform selector to produce a speech signal output.
-
-
96. A speech synthesizer comprising:
-
a. a large speech database;
b. a speech waveform selector that selects a sequence of waveforms referenced by the database, wherein the waveform selector attributes, to any waveform candidate, a cost, wherein the cost is a function of individual costs associated with each of a plurality of features, and wherein, for at least one numeric feature, an individual cost is determined using a cost function that has at least one steep side and a region that approximates a flat bottom; and
c. a speech waveform concatenator in communication with the speech database that concatenates the waveforms selected by the speech waveform selector to produce a speech signal output.
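The three cost-function shapes recited in claims 94 to 96 (piecewise linear, asymmetric, and flat-bottomed with steep sides) can all be illustrated with one small function. The knot values below describe a hypothetical cost on pitch deviation in semitones; the feature, the units, and the numbers are assumptions chosen for the example.

```python
def piecewise_linear(x, knots):
    """Evaluate a piecewise linear cost defined by (x, cost) knots sorted by x.

    Outside the knot range the nearest end value is held constant.
    """
    if x <= knots[0][0]:
        return knots[0][1]
    for (x0, y0), (x1, y1) in zip(knots, knots[1:]):
        if x <= x1:
            # Linear interpolation inside the segment [x0, x1].
            return y0 + (y1 - y0) * (x - x0) / (x1 - x0)
    return knots[-1][1]


# Hypothetical pitch-deviation cost: zero anywhere between -2 and +3 semitones
# (flat bottom), rising steeply outside, and steeper on the high side than on
# the low side (asymmetric).
PITCH_KNOTS = [(-4.0, 10.0), (-2.0, 0.0), (3.0, 0.0), (4.0, 10.0)]
```

With these knots, every candidate inside the flat region costs the same, so the selector does not prefer one plausible pitch over another, while candidates just outside the region are penalized sharply.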
-
-
97. A method of speech synthesis comprising:
-
a. providing a large speech database;
b. selecting a sequence of waveforms referenced by the database, wherein the selecting attributes a cost to any waveform candidate, wherein the cost is a function of individual costs associated with each of a plurality of features, and wherein, for at least one numeric feature, an individual cost is determined using a piecewise linear cost function that has at least one steep side; and
c. concatenating the selected waveforms to produce a speech signal output.
-
-
98. A method of speech synthesis comprising:
-
a. providing a large speech database;
b. selecting a sequence of waveforms referenced by the database, wherein the selecting attributes a cost to any waveform candidate, wherein the cost is a function of individual costs associated with each of a plurality of features, and wherein, for at least one numeric feature, an individual cost is determined using an asymmetric cost function that has at least one steep side; and
c. concatenating the selected waveforms to produce a speech signal output.
-
-
99. A method of speech synthesis comprising:
-
a. providing a large speech database;
b. selecting a sequence of waveforms referenced by the database, wherein the selecting attributes a cost to any waveform candidate, wherein the cost is a function of individual costs associated with each of a plurality of features, and wherein, for at least one numeric feature, an individual cost is determined using a cost function that has at least one steep side and a region that approximates a flat bottom; and
c. concatenating the selected waveforms to produce a speech signal output.
-
-
100. A speech synthesizer comprising:
-
a. a large speech database referencing speech waveforms;
b. a speech waveform selector in communication with the speech database that selects waveforms referenced by the database using criteria that (i) favor waveform candidates based, at least in part, directly on high-level linguistic features, and (ii) favor approximately equally all waveform candidates in respect to low-level prosody features except those wherein the low-level prosody features are unlikely, and wherein the waveform selector attributes, to pairs of adjacent waveform candidates, a transition cost, wherein the transition cost is a function of individual costs associated with each of a plurality of features, and wherein at least one individual cost is determined by using, as an argument, an acoustic distance value selected from one of a first set of tables, each table in the first set corresponding to a non-null set of phonemes; and
c. a speech waveform concatenator in communication with the speech database that concatenates the waveforms selected by the speech waveform selector to produce a speech signal output. - View Dependent Claims (102)
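The table-based transition cost in element b can be sketched as follows. The sketch assumes that each unit edge carries a quantized spectral code, that one distance table exists per phoneme, and that the table is chosen by the phoneme in which the join falls. The table contents, the dictionary keys, and the added pitch term are all hypothetical.

```python
# Hypothetical per-phoneme tables: DISTANCE_TABLES[phoneme][i][j] is a
# precomputed acoustic distance between spectral codes i and j.
DISTANCE_TABLES = {
    "a": [[0.0, 0.4], [0.4, 0.0]],
    "s": [[0.0, 0.9], [0.9, 0.0]],
}


def transition_cost(left, right, tables=DISTANCE_TABLES,
                    w_spectral=1.0, w_pitch=0.5):
    """Cost of joining two adjacent candidates.

    `left` and `right` are dicts describing the unit edges at the join.
    The table is selected by the phoneme at the join (an assumption here).
    """
    table = tables[left["join_phoneme"]]
    spectral = table[left["end_code"]][right["start_code"]]
    pitch = abs(right["start_pitch"] - left["end_pitch"]) / 100.0
    return w_spectral * spectral + w_pitch * pitch
```

Looking the distance up in a table avoids computing a spectral distance for every candidate pair at synthesis time, at the price of quantizing the spectra beforehand.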
-
-
101. A speech synthesizer according to claim 172, wherein the acoustic distance is spectral distance and each table in the first set corresponds to a different phoneme.
-
103. A speech synthesizer comprising:
-
a. a large speech database referencing speech waveforms, wherein the database is accessed by speech waveform designators;
b. speech waveform selecting means, in communication with the speech database, for selecting waveforms referenced by the database using criteria that (i) favor waveform candidates based, at least in part, directly on high-level linguistic features, and (ii) favor approximately equally all waveform candidates in respect to low-level prosody features except those wherein the low-level prosody features are unlikely, and wherein the waveform selector attributes, to pairs of adjacent waveform candidates, a transition cost, wherein the transition cost is a function of individual costs associated with each of a plurality of features, and wherein at least one individual cost is determined by using, as an argument, an acoustic distance value selected from one of a first set of tables, each table in the first set corresponding to a non-null set of phonemes; and
c. speech waveform concatenating means in communication with the speech database for concatenating the waveforms selected by the speech waveform selecting means to produce a speech signal output. - View Dependent Claims (104, 105)
-
-
106. A method of speech synthesis comprising:
-
a. providing a large speech database referencing speech waveforms;
b. selecting waveforms referenced by the database using criteria that (i) favor waveform candidates based, at least in part, directly on high-level linguistic features, and (ii) favor approximately equally all waveform candidates in respect to low-level prosody features except those wherein the low-level prosody features are unlikely, and wherein the selecting attributes a transition cost to any waveform candidate, wherein the transition cost is a function of individual costs associated with each of a plurality of features, and wherein at least one individual cost is determined by using, as an argument, an acoustic distance value selected from one of a first set of tables, each table in the first set corresponding to a non-null set of phonemes; and
c. concatenating the selected waveforms to produce a speech signal output. - View Dependent Claims (107, 108)
-
Specification