Method and apparatus for generating speech pattern templates
Abstract
A system for generating speech pattern templates for use in either speech recognition or speech synthesis. Reference demisyllable templates are first generated from a first (reference) speaker using both manual and automatic analysis. The analysis for a second speaker is simplified and automated by comparison with the first speaker's templates: the second speaker speaks the same words, and the utterance is time-warped to match the first speaker's rate and template. A demisyllable is defined as one of the two halves of a syllable, assuming that a syllable starts and ends with a noisy consonant and is split at its vowel center, which simplifies concatenation and comparison. Key features of the invention include generating a set of signals representative of the time alignment between the first and second speakers' templates, and the time-of-occurrence boundaries of each syllable in a word.
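As a minimal illustration of the demisyllable definition, the Python sketch below splits one syllable's frame range at its vowel center. The function name, the half-open frame ranges, and the frame indices are assumptions made for illustration; the abstract does not say how the vowel center is located, so it is simply passed in.

```python
def split_demisyllables(start, vowel_center, end):
    """Split a syllable spanning frames [start, end) at its vowel center.

    Returns the initial and final demisyllables as half-open frame ranges.
    All three arguments are frame indices (hypothetical convention).
    """
    if not start < vowel_center < end:
        raise ValueError("vowel center must lie strictly inside the syllable")
    initial = (start, vowel_center)   # consonant onset up to the vowel center
    final = (vowel_center, end)       # vowel center through the final consonant
    return initial, final


# Hypothetical syllable occupying frames 10..29 with its vowel center at frame 18.
initial, final = split_demisyllables(10, 18, 30)
print(initial, final)
```

Because both halves meet at the vowel center, two demisyllables taken from different syllables would be joined inside a vowel, which is the concatenation property the abstract refers to.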
34 Citations
21 Claims
1. A method for producing subunit speech patterns comprising the steps of:
storing a plurality of reference speech pattern templates each comprising a time frame sequence of acoustic feature signals representative of a prescribed spoken reference speech pattern;
storing a set of signals each representative of the time of occurrence of at least one predetermined subunit in the reference acoustic feature signal sequence for each reference speech pattern;
analyzing an utterance of one of said stored reference speech patterns to generate a time frame sequence of acoustic feature signals representative of the utterance;
generating signals representative of the time alignment of said utterance feature signal sequence and said stored reference speech pattern feature signal sequence; and
determining the sequence of utterance feature signals corresponding to the predetermined subunit in said stored reference speech pattern template responsive to the time alignment signals and said reference subunit time of occurrence representative signals.
11. Apparatus for producing subunit speech patterns comprising:
means for storing a plurality of reference speech pattern templates, each template comprising a time frame sequence of acoustic feature signals representative of a prescribed spoken reference speech pattern and for storing a set of signals representative of the time of occurrence of at least one predetermined subunit in the reference speech pattern template for each reference speech pattern;
means for analyzing an utterance of one of said stored reference speech patterns to generate a sequence of acoustic speech signals representative of the utterance;
means for generating signals representative of the time alignment of the utterance feature signal sequence and the stored reference speech pattern feature signal sequence; and
means for determining the sequence of utterance feature signals corresponding to the predetermined subunit in said stored reference speech pattern template responsive to the time alignment signals and the reference subunit time of occurrence representative signals.
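The steps recited in claims 1 and 11 can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the patented implementation: the claims do not name an alignment algorithm, so a textbook dynamic time warping recurrence with Euclidean frame distance stands in for the time alignment step. The names `dtw_path` and `map_subunit`, the one-dimensional toy features, and the frame indices are all hypothetical.

```python
import math


def dtw_path(ref, utt):
    """Align two feature sequences; return (ref_frame, utt_frame) pairs.

    Assumption: standard dynamic time warping with Euclidean frame distance.
    Each sequence is a list of equal-length feature vectors.
    """
    n, m = len(ref), len(utt)
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(ref[i - 1], utt[j - 1])
            cost[i][j] = d + min(cost[i - 1][j - 1], cost[i - 1][j], cost[i][j - 1])
    # Backtrack from the end, always moving to the cheapest predecessor.
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min(
            (cost[i - 1][j - 1], i - 1, j - 1),
            (cost[i - 1][j], i - 1, j),
            (cost[i][j - 1], i, j - 1),
        )
    path.reverse()
    return path


def map_subunit(path, ref_start, ref_end):
    """Map reference frames ref_start..ref_end (inclusive) to utterance frames."""
    utt_frames = [u for r, u in path if ref_start <= r <= ref_end]
    if not utt_frames:
        raise ValueError("subunit range lies outside the reference template")
    return utt_frames[0], utt_frames[-1]


# Stored reference template (toy 1-D features) with a subunit at frames 2..3.
reference = [[0.0], [0.0], [1.0], [1.0], [0.0]]
# The same pattern spoken more slowly by a second speaker.
utterance = [[0.0], [0.0], [0.0], [1.0], [1.0], [1.0], [0.0], [0.0]]

alignment = dtw_path(reference, utterance)
print(map_subunit(alignment, 2, 3))  # prints (3, 5)
```

The returned pair marks the first and last utterance frames aligned to the reference subunit, so the second speaker's subunit template can be cut from the utterance without marking its boundaries by hand.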
Specification