Methods employing phase state analysis for use in speech synthesis and recognition
Abstract
A computer-implemented method for automatically analyzing, predicting, and/or modifying acoustic units of prosodic human speech utterances for use in speech synthesis or speech recognition. Possible steps include: initiating analysis of acoustic wave data representing the human speech utterances, via the phase state of the acoustic wave data; using one or more phase state defined acoustic wave metrics as common elements for analyzing, and optionally modifying, pitch, amplitude, duration, and other measurable acoustic parameters of the acoustic wave data, at predetermined time intervals; analyzing acoustic wave data representing a selected acoustic unit to determine the phase state of the acoustic unit; and analyzing the acoustic wave data representing the selected acoustic unit to determine at least one acoustic parameter of the acoustic unit with reference to the determined phase state of the selected acoustic unit. Also included are systems for implementing the described and related methods.
20 Claims
1. A method of computer-implemented speech synthesis, the method comprising:
(a) providing a database of acoustic units accessible to a processor wherein each acoustic unit is identified according to a prosodic phonetic unit name and at least one additional linguistic feature, and wherein each acoustic unit has been analyzed according to acoustic wave phase-state metrics so that pitch, energy, and spectral coefficients can be modified simultaneously at one or more instants in time;
(b) mapping each acoustic unit to prosodic phonetic unit categorizations and additional linguistic categorizations enabling the acoustic unit to be specified and/or altered to provide one or more acoustic units for incorporation into expressively synthesized speech according to prosodic rules;
(c) calculating with the processor weighted absolute and/or relative acoustic values for a set of candidate acoustic units to match each desired acoustic unit, one candidate set per desired acoustic unit, matching being in terms of linguistic features for the corresponding mapped prosodic phonetic unit or a substitute for the corresponding mapped prosodic phonetic unit;
(d) calculating with the processor an acoustic path through n-dimensional acoustic space to be sequenced as an utterance of synthesized speech, the acoustic path being defined by the weighted average values for each candidate set of acoustic units;
(e) selecting and modifying with the processor, based on the acoustic wave phase-state metrics, a sequence of acoustic units for the synthesized speech according to differences between weighted acoustic values for a candidate acoustic unit and weighted acoustic values of a point on the calculated acoustic path, including modifying a duration of the acoustic units; and
(f) generating an audible output representative of expressively synthesized speech based on the modified acoustic values of the candidate prosodic phonetic units.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9
10. A computer-readable non-transitory storage medium storing executable instructions for computer-implemented speech synthesis, which when executed by a computer system, cause the computer system to:
(a) provide a database of acoustic units accessible to a processor wherein each acoustic unit is identified according to a prosodic phonetic unit name and at least one additional linguistic feature, and wherein each acoustic unit has been analyzed according to acoustic wave phase-state metrics so that pitch, energy, and spectral coefficients can be modified simultaneously at one or more instants in time;
(b) map each acoustic unit to prosodic phonetic unit categorizations and additional linguistic categorizations enabling the acoustic unit to be specified and/or altered to provide one or more acoustic units for incorporation into expressively synthesized speech according to prosodic rules;
(c) calculate with the processor weighted absolute and/or relative acoustic values for a set of candidate acoustic units to match each desired acoustic unit, one candidate set per desired acoustic unit, matching being in terms of linguistic features for the corresponding mapped prosodic phonetic unit or a substitute for the corresponding mapped prosodic phonetic unit;
(d) calculate with the processor an acoustic path through n-dimensional acoustic space to be sequenced as an utterance of synthesized speech, the acoustic path being defined by the weighted average values for each candidate set of acoustic units;
(e) select and modify with the processor, based on the acoustic wave phase-state metrics, a sequence of acoustic units for the synthesized speech according to differences between weighted acoustic values for a candidate acoustic unit and weighted acoustic values of a point on the calculated acoustic path, including modifying a duration of the acoustic units; and
(f) generate an audible output representative of expressively synthesized speech based on the modified acoustic values of the candidate prosodic phonetic units.
Dependent claim: 11
12. A system for computer-implemented speech synthesis, the system comprising:
a memory storing a database of acoustic units, wherein each acoustic unit is identified according to a prosodic phonetic unit name and at least one additional linguistic feature, and wherein each acoustic unit has been analyzed according to acoustic wave phase-state metrics so that pitch, energy, and spectral coefficients can be modified simultaneously at one or more instants in time; and
a processor configured to:
(a) map each acoustic unit in the database of acoustic units to prosodic phonetic unit categorizations and additional linguistic categorizations enabling the acoustic unit to be specified and/or altered to provide one or more acoustic units for incorporation into expressively synthesized speech according to prosodic rules;
(b) calculate weighted absolute and/or relative acoustic values for a set of candidate acoustic units to match each desired acoustic unit, one candidate set per desired acoustic unit, matching being in terms of linguistic features for the corresponding mapped prosodic phonetic unit or a substitute for the corresponding mapped prosodic phonetic unit;
(c) calculate an acoustic path through n-dimensional acoustic space to be sequenced as an utterance of synthesized speech, the acoustic path being defined by the weighted average values for each candidate set of acoustic units;
(d) select and modify, based on the acoustic wave phase-state metrics, a sequence of acoustic units for the synthesized speech according to differences between the weighted acoustic values for a candidate acoustic unit and weighted acoustic values of a point on the calculated acoustic path, including modifying a duration of the acoustic units; and
(e) generate an audible output representative of expressively synthesized speech based on the modified acoustic values of the candidate prosodic phonetic units.
Dependent claims: 13, 14, 15, 16, 17, 18, 19, 20
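Illustrative sketch (not part of the claims): the path-calculation and selection steps recited in the independent claims, that is, steps (c) through (e) of claim 1, might be pictured roughly as follows. This is a minimal sketch under stated assumptions, not the patented implementation. The names `Candidate`, `path_point`, `acoustic_path`, `distance`, and `select_units` are hypothetical; the use of a single scalar weight per candidate, a three-dimensional acoustic space (pitch, energy, duration), and a weighted Euclidean distance are all assumptions, since the claims do not specify them. The phase-state-based modification itself is not modeled; the sketch only reports the difference that such a modification step would need to apply.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass
class Candidate:
    unit_id: str    # hypothetical identifier for an acoustic unit
    values: tuple   # assumed acoustic values: (pitch_hz, energy_db, duration_ms)
    weight: float   # assumed scalar weight, e.g. quality of linguistic match


def path_point(candidates):
    """Weighted average of one candidate set: one point on the acoustic path."""
    total = sum(c.weight for c in candidates)
    dims = len(candidates[0].values)
    return tuple(
        sum(c.weight * c.values[d] for c in candidates) / total
        for d in range(dims)
    )


def acoustic_path(candidate_sets):
    """One path point per desired acoustic unit (one candidate set each)."""
    return [path_point(cands) for cands in candidate_sets]


def distance(values, point, dim_weights):
    """Weighted Euclidean distance (an assumed measure of 'difference')."""
    return sqrt(sum(w * (v - p) ** 2
                    for v, p, w in zip(values, point, dim_weights)))


def select_units(candidate_sets, dim_weights):
    """Pick, per candidate set, the unit closest to the path point.

    Returns the path and, for each position, the chosen unit id together
    with the residual (path point minus unit values) that a later
    modification step would have to apply, duration included.
    """
    path = acoustic_path(candidate_sets)
    chosen = []
    for cands, point in zip(candidate_sets, path):
        best = min(cands, key=lambda c: distance(c.values, point, dim_weights))
        delta = tuple(p - v for v, p in zip(best.values, point))
        chosen.append((best.unit_id, delta))
    return path, chosen
```

Calling `select_units` with one list of `Candidate` objects per desired acoustic unit returns the path points and, for each position, the selected unit and the per-dimension adjustment. Averaging within each candidate set before selecting means the target for each unit is derived from what the database actually contains rather than from an externally imposed contour, which is one way to read step (d).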
Specification