SYSTEM-EFFECTED METHODS FOR ANALYZING, PREDICTING, AND/OR MODIFYING ACOUSTIC UNITS OF HUMAN UTTERANCES FOR USE IN SPEECH SYNTHESIS AND RECOGNITION
1 Assignment
0 Petitions
Abstract
A computer-implemented method for automatically analyzing, predicting, and/or modifying acoustic units of prosodic human speech utterances for use in speech synthesis or speech recognition. Possible steps include: initiating analysis of acoustic wave data representing the human speech utterances, via the phase state of the acoustic wave data; using one or more phase state defined acoustic wave metrics as common elements for analyzing, and optionally modifying, pitch, amplitude, duration, and other measurable acoustic parameters of the acoustic wave data, at predetermined time intervals; analyzing acoustic wave data representing a selected acoustic unit to determine the phase state of the acoustic unit; and analyzing the acoustic wave data representing the selected acoustic unit to determine at least one acoustic parameter of the acoustic unit with reference to the determined phase state of the selected acoustic unit. Also included are systems for implementing the described and related methods.
29 Claims
1. A computer-implemented method for analyzing, predicting, and/or modifying acoustic units of prosodic human speech utterances for use in speech synthesis or speech recognition, the method comprising:
(a) initiating analysis of acoustic wave data representing the human speech utterances, via the phase state of the acoustic wave data, the acoustic wave data being in constrained or unconstrained form;
(b) using one or more phase state defined acoustic wave metrics as common elements for analyzing, and optionally modifying, one or more measurable acoustic parameters selected from the group consisting of pitch, amplitude, duration, and other measurable acoustic parameters of the acoustic wave data, at predetermined time intervals, two or more of the acoustic parameters optionally being analyzed and/or modified simultaneously;
(c) analyzing acoustic wave data representing a selected one of the acoustic units to determine the phase state of the acoustic unit; and
(d) analyzing the acoustic wave data representing the selected acoustic unit to determine at least one acoustic parameter of the acoustic unit with reference to the determined phase state of the selected acoustic unit.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 20, 21, 22, 24, 25, 26, 27, 28, 29
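The claim does not define "phase state" in this excerpt, so the following is only an illustrative sketch under a loudly stated assumption: phase state 0 is approximated by each upward zero crossing of a voiced waveform. Pitch, peak amplitude, and duration are then measured per cycle from the same phase markers, which is one way parameters can share a common phase-anchored time base. The names `Cycle`, `phase_zero_indices`, `analyze_cycles`, and `sine_wave` are hypothetical; this is not the patent's implementation.

```python
import math
from dataclasses import dataclass


@dataclass
class Cycle:
    start: int          # sample index of the phase-0 marker that opens the cycle
    length: int         # cycle length in samples
    pitch_hz: float     # sample_rate / length
    amplitude: float    # peak absolute sample value within the cycle
    duration_s: float   # cycle length in seconds


def phase_zero_indices(samples):
    """ASSUMPTION: treat each upward zero crossing as the point where phase state is 0."""
    return [i for i in range(1, len(samples)) if samples[i - 1] < 0 <= samples[i]]


def analyze_cycles(samples, sample_rate):
    """Measure pitch, amplitude and duration per cycle, all anchored to the same phase markers."""
    marks = phase_zero_indices(samples)
    cycles = []
    for start, end in zip(marks, marks[1:]):
        length = end - start
        cycles.append(Cycle(
            start=start,
            length=length,
            pitch_hz=sample_rate / length,
            amplitude=max(abs(x) for x in samples[start:end]),
            duration_s=length / sample_rate,
        ))
    return cycles


def sine_wave(freq_hz, sample_rate, n_samples, offset=0.5):
    """Test tone; the half-sample offset keeps samples off exact zero crossings."""
    return [math.sin(2 * math.pi * freq_hz * (n + offset) / sample_rate)
            for n in range(n_samples)]
```

For example, `analyze_cycles(sine_wave(100, 8000, 400), 8000)` finds three complete 80-sample cycles, each measured at 100 Hz. Real speech would need a more robust phase estimate than raw zero crossings.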
11. A method for categorically mapping the relationship of at least one text unit in a sequence of text to at least one corresponding prosodic phonetic unit, to at least one linguistic feature category in the sequence of text, and to at least one speech utterance represented in a synthesized speech signal, the method comprising:
(a) identifying, and optionally modifying, acoustic data representing the at least one speech utterance, to provide the synthesized speech signal;
(b) identifying, and optionally modifying, the acoustic data representing the at least one utterance to provide the at least one speech utterance with an expressive prosody determined according to prosodic rules; and
(c) identifying acoustic unit feature vectors for each of the at least one prosodic phonetic units, each acoustic unit feature vector comprising a bundle of feature values selected according to proximity to a statistical mean of the values of acoustic unit candidates available for matching with the respective prosodic phonetic unit and, optionally, for acoustic continuity with at least one adjacent acoustic feature vector.
Dependent claims: 12, 13, 14, 15, 16, 17
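Step (c) can be read in several ways. A minimal sketch of one reading: compute the mean of all candidate feature vectors for a prosodic phonetic unit, choose the candidate bundle closest to that mean (Euclidean distance), and optionally add a penalty for distance from the previous unit's vector to favor continuity. The function names and the `continuity_weight` parameter are hypothetical, not taken from the patent.

```python
import math
from statistics import fmean


def mean_vector(candidates):
    """Per-feature mean over all candidate feature vectors."""
    return [fmean(column) for column in zip(*candidates)]


def select_feature_vector(candidates, previous=None, continuity_weight=0.0):
    """Pick the candidate bundle closest to the candidates' mean.

    If `previous` is given, distance from the adjacent unit's vector is added,
    scaled by the hypothetical `continuity_weight`.
    """
    centre = mean_vector(candidates)

    def cost(vec):
        c = math.dist(vec, centre)
        if previous is not None:
            c += continuity_weight * math.dist(vec, previous)
        return c

    return min(candidates, key=cost)
```

With candidates `[[100, 1.0], [120, 1.2], [140, 1.4], [300, 3.0]]` (say F0 in Hz and a normalized energy), the mean is about `[165, 1.65]`, so `[140, 1.4]` is chosen; adding `previous=[100, 1.0]` with `continuity_weight=2.0` shifts the choice to `[100, 1.0]`. In practice the features would need normalizing so that no single dimension dominates the distance.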
18. A method for assigning linguistic and acoustic weights to prosodic phonetic units useful for concatenation into synthetic speech or for speech recognition, the method comprising:
determining individual linguistic and acoustic weights for each prosodic phonetic unit according to linguistic feature hierarchies related to a prior adjacent prosodic phonetic unit and to a next adjacent prosodic phonetic unit, wherein each candidate acoustic unit can have different target and join weights for each respective end of the candidate acoustic unit;
measuring one or more acoustic parameters, optionally F0, F1, F2, F3, energy, and the like, across a particular acoustic unit corresponding to a particular prosodic phonetic unit to determine time related changes in the one or more acoustic parameters; and
modeling the particular acoustic unit and the relevant acoustic parameter values of the prior adjacent prosodic phonetic unit and the next adjacent prosodic phonetic unit.
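To illustrate two ideas in claim 18, separate join weights for each end of a unit and time-related change of a parameter across the unit, here is a small sketch. It assumes F0 is the only parameter, sampled at even frame spacing, and uses a least-squares slope as the "time related change"; the `Unit` fields and both functions are hypothetical and deliberately simpler than the claimed linguistic weighting.

```python
from dataclasses import dataclass


@dataclass
class Unit:
    f0: list                   # F0 samples (Hz) measured across the unit, evenly spaced
    frame_s: float             # spacing between F0 samples, in seconds
    left_join_w: float = 1.0   # hypothetical weight for the join at the unit's start
    right_join_w: float = 1.0  # hypothetical weight for the join at the unit's end


def f0_slope(unit):
    """Least-squares F0 slope in Hz per second across the unit (needs >= 2 samples)."""
    n = len(unit.f0)
    ts = [i * unit.frame_s for i in range(n)]
    t_mean = sum(ts) / n
    f_mean = sum(unit.f0) / n
    num = sum((t - t_mean) * (f - f_mean) for t, f in zip(ts, unit.f0))
    den = sum((t - t_mean) ** 2 for t in ts)
    return num / den


def join_cost(prior, unit, nxt):
    """F0 mismatch at each end of `unit`, weighted separately for start and end joins."""
    left = unit.left_join_w * abs(unit.f0[0] - prior.f0[-1])
    right = unit.right_join_w * abs(unit.f0[-1] - nxt.f0[0])
    return left + right
```

For a unit with F0 `[100, 110, 120, 130]` at 10 ms frames, `f0_slope` gives 1000 Hz/s. Weighting the start join at 2.0 and the end join at 0.5 lets a mismatch at one boundary count more than the same mismatch at the other.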
19. A method for deriving a path through acoustic space, the acoustic path comprising desired acoustic feature values for each sequential unit of a sequence of acoustic units to be employed in synthesizing speech from text, the method comprising:
calculating the acoustic path in absolute and/or relative coordinates, optionally in terms of fundamental frequency and/or a change in fundamental frequency over the duration of the synthesizing of the text, for the sequence of acoustic units, each desired sequential acoustic unit being represented by a representation, optionally a single point, multiple points, Hermite splines or another suitable acoustic unit representation, according to a weighted average of the acoustic parameters of the acoustic unit representation, wherein the weighted average is based on a degree of accuracy with which the acoustic parameters for each such sequentially desired acoustic unit are known, and on a degree of influence ascribed to each sequential acoustic unit according to the context of the acoustic unit in the sequence of desired acoustic units.
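A minimal sketch of claim 19's weighted average, assuming each desired unit is represented by a single F0 point and that each unit's weight is the product of an accuracy score and a context-influence score supplied by the caller. Each path point is the weighted mean of the F0 targets within a small window of neighboring units (`radius` is a hypothetical parameter), giving the absolute path; successive differences give the relative path. This is an illustration, not the patent's formula.

```python
def acoustic_path(f0_targets, accuracy, influence, radius=1):
    """Return (points, deltas): absolute F0 path points and relative changes.

    Weight of unit j = accuracy[j] * influence[j]. Each window must contain
    at least one unit with non-zero weight, or a ZeroDivisionError is raised.
    """
    n = len(f0_targets)
    points = []
    for i in range(n):
        lo, hi = max(0, i - radius), min(n, i + radius + 1)
        window = range(lo, hi)
        weights = [accuracy[j] * influence[j] for j in window]
        total = sum(weights)
        points.append(sum(w * f0_targets[j] for w, j in zip(weights, window)) / total)
    deltas = [b - a for a, b in zip(points, points[1:])]
    return points, deltas
```

With targets `[100, 200, 100]` and accuracy `[1, 0, 1]`, the poorly known middle target contributes nothing, and every path point comes out at 100 Hz; with equal accuracy the path is smoothed toward the neighbors instead.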
23. A method of deriving an acoustic path comprising a sequence of desired acoustic units extending through unconstrained acoustic space, the acoustic path being useful for synthesizing speech from text with a desired style of speech prosody by concatenating the sequence of desired acoustic units, the method comprising:
(a) providing a database of acoustic units wherein each acoustic unit is identified according to a prosodic phonetic unit name and at least one additional linguistic feature; and wherein each acoustic unit has been analyzed according to phase-state metrics so that pitch, energy, and spectral wave data can be modified simultaneously at one or more instants in time;
(b) mapping each acoustic unit to prosodic phonetic unit categorizations and additional linguistic categorizations enabling the acoustic unit to be specified and/or altered to provide one or more acoustic units for incorporation into expressively synthesized speech according to prosodic rules;
(c) calculating weighted absolute and/or relative acoustic values for a set of candidate acoustic units to match each desired acoustic unit, one candidate set per desired acoustic unit, matching being in terms of linguistic features for the corresponding mapped prosodic phonetic unit or a substitute for the corresponding mapped prosodic phonetic unit;
(d) calculating an acoustic path through n-dimensional acoustic space to be sequenced as an utterance of synthesized speech, the acoustic path being defined by the weighted average values for each candidate set of acoustic units;
(e) selecting and modifying, as needed, a sequence of acoustic units, or sub-units, for the synthesized speech according to the differences between the weighted acoustic values for a candidate acoustic unit, or sub-unit, and the weighted acoustic values of a point on the calculated acoustic path.
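Steps (c) through (e) can be sketched as follows, under simplifying assumptions: each candidate is a vector of acoustic values (for example F0 in Hz and energy in dB), the caller supplies one weight per candidate, the path point for a desired unit is the weighted average of its candidate set, and selection uses a scaled Euclidean distance to that point. The returned difference stands in for "modifying, as needed". The functions and `feature_scale` are hypothetical names, not the patent's.

```python
import math


def path_point(candidates, weights):
    """Weighted average of one candidate set: a single point on the acoustic path."""
    total = sum(weights)
    dims = len(candidates[0])
    return [sum(w * c[d] for w, c in zip(weights, candidates)) / total
            for d in range(dims)]


def select_along_path(candidate_sets, weight_sets, feature_scale):
    """For each desired unit, pick the candidate nearest its path point.

    Returns (index, modification) pairs, where modification = point - candidate,
    i.e. the change that would move the chosen unit onto the path.
    """
    chosen = []
    for cands, ws in zip(candidate_sets, weight_sets):
        point = path_point(cands, ws)

        def scaled_dist(c):
            return math.sqrt(sum((s * (p - x)) ** 2
                                 for s, p, x in zip(feature_scale, point, c)))

        best = min(range(len(cands)), key=lambda k: scaled_dist(cands[k]))
        chosen.append((best, [p - x for p, x in zip(point, cands[best])]))
    return chosen
```

For a candidate set `[[150, 65], [200, 66]]` with weights `[1, 3]`, the path point is `[187.5, 65.75]`, so the second candidate is chosen with a modification of `[-12.5, -0.25]`. `feature_scale` is where per-dimension weighting would go so that, for instance, F0 and energy differences are comparable.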
Specification