
Methods employing phase state analysis for use in speech synthesis and recognition

  • US 10,453,442 B2
  • Filed: 09/26/2016
  • Issued: 10/22/2019
  • Est. Priority Date: 12/18/2008
  • Status: Active Grant
First Claim

1. A method of computer-implemented speech synthesis, the method comprising:

    (a) providing a database of acoustic units accessible to a processor wherein each acoustic unit is identified according to a prosodic phonetic unit name and at least one additional linguistic feature, and wherein each acoustic unit has been analyzed according to acoustic wave phase-state metrics so that pitch, energy, and spectral coefficients can be modified simultaneously at one or more instants in time;

    (b) mapping each acoustic unit to prosodic phonetic unit categorizations and additional linguistic categorizations enabling the acoustic unit to be specified and/or altered to provide one or more acoustic units for incorporation into expressively synthesized speech according to prosodic rules;

    (c) calculating with the processor weighted absolute and/or relative acoustic values for a set of candidate acoustic units to match each desired acoustic unit, one candidate set per desired acoustic unit, matching being in terms of linguistic features for the corresponding mapped prosodic phonetic unit or a substitute for the corresponding mapped prosodic phonetic unit;

    (d) calculating with the processor an acoustic path through n-dimensional acoustic space to be sequenced as an utterance of synthesized speech, the acoustic path being defined by the weighted average values for each candidate set of acoustic units;

    (e) selecting and modifying with the processor, based on the acoustic wave phase-state metrics, a sequence of acoustic units for the synthesized speech according to differences between weighted acoustic values for a candidate acoustic unit and weighted acoustic values of a point on the calculated acoustic path, including modifying a duration of the acoustic units; and

    (f) generating an audible output representative of expressively synthesized speech based on the modified acoustic values of the candidate prosodic phonetic units.
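
As an informal illustration only, and not the patented implementation, steps (c) through (e) can be sketched in a few lines of Python. The sketch assumes each candidate acoustic unit is reduced to a plain feature vector (for example pitch and energy), that the "weighted average values" of step (d) are a per-candidate weighted mean of those vectors, and that the "differences" of step (e) are Euclidean distances. The function names `weighted_centroid`, `acoustic_path`, and `select_units` are hypothetical, and the duration modification and phase-state analysis recited in the claim are not modeled.

```python
from math import sqrt

def weighted_centroid(candidates, weights):
    """Weighted mean of each acoustic dimension over one candidate set.

    Assumption: one weight per candidate unit; the claim does not say
    how the weights are derived.
    """
    total = sum(weights)
    dims = len(candidates[0])
    return [
        sum(w * c[d] for c, w in zip(candidates, weights)) / total
        for d in range(dims)
    ]

def acoustic_path(candidate_sets, weight_sets):
    """One path point per desired unit: the centroid of its candidate set."""
    return [weighted_centroid(c, w) for c, w in zip(candidate_sets, weight_sets)]

def select_units(candidate_sets, path):
    """For each desired unit, pick the index of the candidate nearest the path.

    Assumption: plain Euclidean distance over unnormalized features.
    """
    chosen = []
    for candidates, target in zip(candidate_sets, path):
        distances = [
            sqrt(sum((x - t) ** 2 for x, t in zip(c, target)))
            for c in candidates
        ]
        chosen.append(distances.index(min(distances)))
    return chosen

# Two desired units; each candidate is [pitch_hz, energy] (made-up values).
candidate_sets = [
    [[100.0, 0.5], [120.0, 0.7], [200.0, 0.9]],
    [[110.0, 0.6], [115.0, 0.6]],
]
weight_sets = [[1.0, 1.0, 1.0], [1.0, 3.0]]

path = acoustic_path(candidate_sets, weight_sets)
selection = select_units(candidate_sets, path)
```

Because the features are left unnormalized here, pitch dominates the distance; a practical system would presumably scale or weight each dimension before comparing candidates to the path.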
