Methods employing phase state analysis for use in speech synthesis and recognition

US 10,453,442 B2
Filed: 09/26/2016
Issued: 10/22/2019
Est. Priority Date: 12/18/2008
Status: Active Grant

Abstract

A computer-implemented method for automatically analyzing, predicting, and/or modifying acoustic units of prosodic human speech utterances for use in speech synthesis or speech recognition. Possible steps include: initiating analysis of acoustic wave data representing the human speech utterances, via the phase state of the acoustic wave data; using one or more phase state defined acoustic wave metrics as common elements for analyzing, and optionally modifying, pitch, amplitude, duration, and other measurable acoustic parameters of the acoustic wave data, at predetermined time intervals; analyzing acoustic wave data representing a selected acoustic unit to determine the phase state of the acoustic unit; and analyzing the acoustic wave data representing the selected acoustic unit to determine at least one acoustic parameter of the acoustic unit with reference to the determined phase state of the selected acoustic unit. Also included are systems for implementing the described and related methods.

20 Claims

1. A method of computer-implemented speech synthesis, the method comprising:
- (a) providing a database of acoustic units accessible to a processor wherein each acoustic unit is identified according to a prosodic phonetic unit name and at least one additional linguistic feature, and wherein each acoustic unit has been analyzed according to acoustic wave phase-state metrics so that pitch, energy, and spectral coefficients can be modified simultaneously at one or more instants in time;
  
  (b) mapping each acoustic unit to prosodic phonetic unit categorizations and additional linguistic categorizations enabling the acoustic unit to be specified and/or altered to provide one or more acoustic units for incorporation into expressively synthesized speech according to prosodic rules;
  
  (c) calculating with the processor weighted absolute and/or relative acoustic values for a set of candidate acoustic units to match each desired acoustic unit, one candidate set per desired acoustic unit, matching being in terms of linguistic features for the corresponding mapped prosodic phonetic unit or a substitute for the corresponding mapped prosodic phonetic unit;
  
  (d) calculating with the processor an acoustic path through n-dimensional acoustic space to be sequenced as an utterance of synthesized speech, the acoustic path being defined by the weighted average values for each candidate set of acoustic units;
  
  (e) selecting and modifying with the processor, based on the acoustic wave phase-state metrics, a sequence of acoustic units for the synthesized speech according to differences between weighted acoustic values for a candidate acoustic unit and weighted acoustic values of a point on the calculated acoustic path, including modifying a duration of the acoustic units; and
  
  (f) generating an audible output from representative of expressively synthesized speech based on the modified acoustic values of the candidate prosodic phonetic units.
  - 2. The method according to claim 1, further comprising:
    - selecting each acoustic unit of the sequence of acoustic units according to a rank ordering of acoustic units available to represent a specific prosodic phonetic unit, the rank ordering being determined by differences between the acoustic values of available acoustic units and the acoustic values of an acoustic feature vector on a predicted acoustic unit pathway.
  - 3. The method according to claim 1, further comprising:
    - determining individual linguistic and acoustic weights for each prosodic phonetic unit according to linguistic feature hierarchies related to a prior adjacent prosodic phonetic unit and to a next adjacent prosodic phonetic unit, wherein each candidate acoustic unit can have different target and join weights for each respective end of the candidate acoustic unit.
  - 4. The method according to claim 1, further comprising:
    - measuring one or more acoustic parameters across a particular acoustic unit corresponding to a particular prosodic phonetic unit to determine time related changes in the one or more acoustic parameters; and
      
      modeling the particular acoustic unit and relevant acoustic parameter values of a prior adjacent prosodic phonetic unit and a next adjacent prosodic phonetic unit.
  - 5. The method according to claim 4 wherein the modeling comprises applying combinations of fourth-order polynomials and second- and third-order polynomials to represent n-dimensional trajectories of the modeled acoustic units through unconstrained acoustic space.
  - 6. The method according to claim 4, further comprising:
    - modifying an acoustic feature vector of the prior adjacent prosodic unit based on the modeling.
  - 7. The method according to claim 4, further comprising:
    - modifying an acoustic feature vector of the next adjacent prosodic unit based on the modeling.
  - 8. The method according to claim 4, wherein the modeling comprises applying a lower order polynomial to constrained acoustic space.
  - 9. The method according to claim 1, further comprising:
    - calculating a weighted absolute and/or relative acoustic value in terms of fundamental frequency and/or a change in fundamental frequency over the duration of an acoustic unit, based on a particular linguistic context.
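
Steps (c) through (e) of claim 1, together with the rank ordering of claim 2, can be illustrated with a minimal Python sketch. It is not the patented implementation. The dictionary layout, the `weight` and `features` keys, the per-dimension weights, and the use of a weighted Euclidean distance are all assumptions made for illustration; the claims do not specify a distance measure. The sketch covers only path calculation and selection and omits the phase-state-based modification of the selected units.

```python
from math import sqrt

def weighted_average(vectors, weights):
    """Weighted mean of equal-length acoustic feature vectors."""
    total = sum(weights)
    dims = len(vectors[0])
    return [sum(w * v[d] for v, w in zip(vectors, weights)) / total
            for d in range(dims)]

def acoustic_path(candidate_sets):
    """One path point per desired unit: the weighted average of its candidate set."""
    return [weighted_average([c["features"] for c in cands],
                             [c["weight"] for c in cands])
            for cands in candidate_sets]

def weighted_distance(a, b, dim_weights):
    """Weighted Euclidean distance (an assumed measure, not taken from the claims)."""
    return sqrt(sum(w * (x - y) ** 2 for x, y, w in zip(a, b, dim_weights)))

def rank_candidates(cands, target, dim_weights):
    """Order candidates from closest to farthest from the path point."""
    return sorted(cands,
                  key=lambda c: weighted_distance(c["features"], target, dim_weights))

def select_sequence(candidate_sets, dim_weights):
    """Pick the top-ranked candidate for each point on the acoustic path."""
    path = acoustic_path(candidate_sets)
    return [rank_candidates(cands, point, dim_weights)[0]
            for cands, point in zip(candidate_sets, path)]

# Hypothetical data: features are [fundamental frequency in Hz, energy in dB].
candidate_sets = [
    [{"id": "a1", "features": [100.0, 60.0], "weight": 1.0},
     {"id": "a2", "features": [120.0, 62.0], "weight": 3.0}],
    [{"id": "b1", "features": [200.0, 70.0], "weight": 3.0},
     {"id": "b2", "features": [180.0, 66.0], "weight": 1.0}],
]
chosen = select_sequence(candidate_sets, dim_weights=[1.0, 1.0])
print([c["id"] for c in chosen])
```

In a fuller system the difference between each chosen unit's features and its path point would drive the modification step, including the duration change named in step (e).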

10. A computer-readable non-transitory storage medium storing executable instructions for computer-implemented speech synthesis, which when executed by a computer system, cause the computer system to:
- (a) provide a database of acoustic units accessible to a processor wherein each acoustic unit is identified according to a prosodic phonetic unit name and at least one additional linguistic feature, and wherein each acoustic unit has been analyzed according to acoustic wave phase-state metrics so that pitch, energy, and spectral coefficients can be modified simultaneously at one or more instants in time;
  
  (b) map each acoustic unit to prosodic phonetic unit categorizations and additional linguistic categorizations enabling the acoustic unit to be specified and/or altered to provide one or more acoustic units for incorporation into expressively synthesized speech according to prosodic rules;
  
  (c) calculate with the processor weighted absolute and/or relative acoustic values for a set of candidate acoustic units to match each desired acoustic unit, one candidate set per desired acoustic unit, matching being in terms of linguistic features for the corresponding mapped prosodic phonetic unit or a substitute for the corresponding mapped prosodic phonetic unit;
  
  (d) calculate with the processor an acoustic path through n-dimensional acoustic space to be sequenced as an utterance of synthesized speech, the acoustic path being defined by the weighted average values for each candidate set of acoustic units;
  
  (e) select and modify with the processor, based on the acoustic wave phase-state metrics, a sequence of acoustic units for the synthesized speech according to differences between weighted acoustic values for a candidate acoustic unit and weighted acoustic values of a point on the calculated acoustic path, including modifying a duration of the acoustic units; and
  
  (f) generate an audible output from representative of expressively synthesized speech based on the modified acoustic values of the candidate prosodic phonetic units.
  - 11. The system of claim 10, wherein the executable instructions are further configured to cause the computer system to:
    - measure one or more acoustic parameters across a particular acoustic unit corresponding to a particular prosodic phonetic unit to determine time related changes in the one or more acoustic parameters; and
      
      model the particular acoustic unit and relevant acoustic parameter values of a prior adjacent prosodic phonetic unit and a next adjacent prosodic phonetic unit.

12. A system for computer-implemented speech synthesis, the system comprising:
- a memory storing a database of acoustic units, wherein each acoustic unit is identified according to a prosodic phonetic unit name and at least one additional linguistic feature, and wherein each acoustic unit has been analyzed according to acoustic wave phase-state metrics so that pitch, energy, and spectral coefficients can be modified simultaneously at one or more instants in time; and
  
  a processor configured to execute to;
  
  (a) map each acoustic unit in the database of acoustic units to prosodic phonetic unit categorizations and additional linguistic categorizations enabling the acoustic unit to be specified and/or altered to provide one or more acoustic units for incorporation into expressively synthesized speech according to prosodic rules;
  
  (b) calculate weighted absolute and/or relative acoustic values for a set of candidate acoustic units to match each desired acoustic unit, one candidate set per desired acoustic unit, matching being in terms of linguistic features for the corresponding mapped prosodic phonetic unit or a substitute for the corresponding mapped prosodic phonetic unit;
  
  (c) calculate an acoustic path through n-dimensional acoustic space to be sequenced as an utterance of synthesized speech, the acoustic path being defined by the weighted average values for each candidate set of acoustic units;
  
  (d) select and modify, based on the acoustic wave phase-state metrics, a sequence of acoustic units for the synthesized speech according to differences between the weighted acoustic values for a candidate acoustic unit and weighted acoustic values of a point on the calculated acoustic path, including modifying a duration of the acoustic units; and
  
  (e) generate an audible output from representative of expressively synthesized speech based on the modified acoustic values of the candidate prosodic phonetic units.
  - 13. The system of claim 12, wherein the processor is further configured to:
    - select each acoustic unit of the sequence of acoustic units according to a rank ordering of acoustic units available to represent a specific prosodic phonetic unit, the rank ordering being determined by differences between the acoustic values of available acoustic units and the acoustic values of an acoustic feature vector on a predicted acoustic unit pathway.
  - 14. The system of claim 12, wherein the processor is further configured to:
    - determine individual linguistic and acoustic weights for each prosodic phonetic unit according to linguistic feature hierarchies related to a prior adjacent prosodic phonetic unit and to a next adjacent prosodic phonetic unit, wherein each candidate acoustic unit can have different target and join weights for each respective end of the candidate acoustic unit.
  - 15. The system of claim 12, wherein the processor is further configured to:
    - measure one or more acoustic parameters across a particular acoustic unit corresponding to a particular prosodic phonetic unit to determine time related changes in the one or more acoustic parameters; and
      
      model the particular acoustic unit and relevant acoustic parameter values of a prior adjacent prosodic phonetic unit and a next adjacent prosodic phonetic unit.
  - 16. The system of claim 15 wherein the modeling comprises applying combinations of fourth-order polynomials and second- and third-order polynomials to represent n-dimensional trajectories of the modeled acoustic units through unconstrained acoustic space.
  - 17. The system of claim 15, wherein the processor is further configured to:
    - modify an acoustic feature vector of the prior adjacent prosodic unit based on the modeling.
  - 18. The system of claim 15, wherein the processor is further configured to:
    - modify an acoustic feature vector of the next adjacent prosodic unit based on the modeling.
  - 19. The system of claim 15, wherein the modeling comprises applying a lower order polynomial to constrained acoustic space.
  - 20. The system of claim 12, wherein the processor is further configured to:
    - calculate a weighted absolute and/or relative acoustic value in terms of fundamental frequency and/or a change in fundamental frequency over the duration of an acoustic unit, based on a particular linguistic context.
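
The polynomial modeling recited in claims 5, 8, 16, and 19 can be sketched as a least-squares fit of one polynomial per acoustic dimension over the time span of a unit. This is an illustrative sketch only. The function names, the normal-equations solver, and the sample values are assumptions; the claims state the polynomial orders (fourth-order combined with second- and third-order for unconstrained space, a lower order for constrained space) but not how the fit is computed. Here the order is simply a parameter, and the number of time samples must exceed it.

```python
def polyfit(ts, ys, order):
    """Least-squares polynomial coefficients, lowest power first."""
    n = order + 1
    # Normal equations A c = b, with A[i][j] = sum(t^(i+j)) and b[i] = sum(y * t^i).
    A = [[sum(t ** (i + j) for t in ts) for j in range(n)] for i in range(n)]
    b = [sum(y * t ** i for t, y in zip(ts, ys)) for i in range(n)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, n):
            f = A[r][col] / A[col][col]
            for c in range(col, n):
                A[r][c] -= f * A[col][c]
            b[r] -= f * b[col]
    # Back substitution.
    coeffs = [0.0] * n
    for i in reversed(range(n)):
        s = sum(A[i][j] * coeffs[j] for j in range(i + 1, n))
        coeffs[i] = (b[i] - s) / A[i][i]
    return coeffs

def polyval(coeffs, t):
    """Evaluate a polynomial given lowest-power-first coefficients."""
    return sum(c * t ** k for k, c in enumerate(coeffs))

def fit_trajectory(ts, frames, order):
    """Fit one polynomial per acoustic dimension; frames[i] is the vector at ts[i]."""
    dims = len(frames[0])
    return [polyfit(ts, [f[d] for f in frames], order) for d in range(dims)]

# Hypothetical measurements across one unit: [fundamental frequency in Hz, energy in dB]
# sampled at normalized times 0..1.
ts = [0.0, 0.25, 0.5, 0.75, 1.0]
frames = [[100.0 + 20.0 * t - 10.0 * t * t, 60.0 + 5.0 * t] for t in ts]
trajectory = fit_trajectory(ts, frames, order=2)
midpoint = [polyval(c, 0.5) for c in trajectory]
print(midpoint)
```

Fitting each dimension separately keeps the sketch short; modeling the prior and next adjacent units, as in claims 4 and 15, would extend the time axis to cover their samples as well.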

Current Assignee
Lessac Technologies, Inc.
Original Assignee
Lessac Technologies, Inc.
Inventors
Chandra, Nishant; Wilhelms-Tricarico, Reiner; Nitisaroj, Rattima; Mottershead, Brian; Marple, Gary A.; Reichenbach, John B.
Primary Examiner(s)
Opsasnick, Michael N

Application Number
US15/276,483
Publication Number
US 20170011733A1
Time in Patent Office
1,121 Days
Field of Search
704260
US Class Current
CPC Class Codes

G10L 13/027   Concept to speech synthesis...

G10L 13/06   Elementary speech units use...

G10L 13/10   Prosody rules derived from ...

G10L 15/1807   using prosody or stress

G10L 17/02   Preprocessing operations, e...

G10L 25/48   specially adapted for parti...
