Method and system for automatically generating new voice files corresponding to new text from a script
7 Assignments
0 Petitions
Abstract
A method and system for automatically generating at least one new voice file corresponding to at least one new text from a script incorporating a plurality of known text having corresponding pre-existing voice files associated therewith. A plurality of phonetic sequences corresponding to the plurality of known text is stored in a first memory. A text input corresponding to a textual version of the script is provided, and a text-to-phonetic translator translates the text input to obtain a corresponding textual phonetic sequence based on the plurality of phonetic sequences stored in the first memory. An audio input of the script is provided, and a speech recognizer generates an audio phonetic sequence of the audio input. A text-to-speech aligner aligns the text input and the corresponding textual phonetic sequence with the audio input and the corresponding audio phonetic sequence to obtain an alignment of the text input and the audio input. The at least one new voice file is generated based on the alignment. The at least one new voice file may be stored in a second memory with the plurality of pre-existing voice files for use with a concatenated voice playback system.
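The text-to-phonetic translation step described in the abstract can be illustrated with a minimal Python sketch. It assumes the first memory is a plain in-memory dictionary from known words to ARPAbet-style phone lists; the names `FIRST_MEMORY` and `translate_text` and the sample entries are hypothetical, not taken from the patent. Words absent from the dictionary are returned as marked new text with no phones, since (per claims 22 and 58) their phonetic sequence is later taken from the audio phonetic transcript.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical "first memory": known words mapped to ARPAbet-style phone lists.
FIRST_MEMORY: Dict[str, List[str]] = {
    "your": ["Y", "AO", "R"],
    "balance": ["B", "AE", "L", "AH", "N", "S"],
    "is": ["IH", "Z"],
}


def translate_text(
    text: str, memory: Dict[str, List[str]]
) -> Tuple[List[Tuple[str, Optional[List[str]]]], List[str]]:
    """Translate script text word by word and mark words missing from memory.

    Returns (sequence, new_words): sequence pairs each word with its stored
    phones, or None when the word is new; new_words lists the marked new text.
    """
    sequence: List[Tuple[str, Optional[List[str]]]] = []
    new_words: List[str] = []
    for raw in text.split():
        word = raw.strip(".,;:!?").lower()  # crude normalization for the sketch
        if not word:
            continue
        phones = memory.get(word)
        if phones is None:
            new_words.append(word)  # mark as a new speech pattern
        sequence.append((word, phones))
    return sequence, new_words
```

For example, `translate_text("Your balance is overdrawn.", FIRST_MEMORY)` marks `overdrawn` as the only new word and leaves its phone entry as `None`.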
130 Citations
67 Claims
1. A method for automatically generating a new voice file from a script having at least one new speech pattern and a plurality of known speech patterns each having a known phonetic sequence associated therewith, the script having a text input and an audio input, the method comprising:
translating the text input to obtain a corresponding textual phonetic sequence in order to identify at least one new textual phonetic sequence corresponding to the at least one new speech pattern based on the plurality of known speech patterns;
translating the audio input to obtain a corresponding audio phonetic sequence;
aligning the text input and the corresponding textual phonetic sequence with the audio input and the corresponding audio phonetic sequence to obtain an alignment of the text input and the audio input of the script to identify at least one new audio phonetic sequence corresponding to the at least one new speech pattern; and
generating the new voice file containing a portion of the audio input based on the at least one new audio phonetic sequence.
Dependent claims: 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21.
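One way to realize the aligning step of claim 1 is sketched below under stated assumptions: the speech recognizer is assumed to emit time-stamped phones, and a plain edit-distance (Levenshtein) alignment is swapped in for whatever aligner the patent contemplates. Known words contribute their stored phones; a new word contributes none, so the audio phones left between its aligned neighbors are taken as its new audio phonetic sequence. The names `align`, `new_word_span`, `AudioPhone`, and `TextEntry` are hypothetical.

```python
from typing import List, Optional, Tuple

AudioPhone = Tuple[str, float, float]  # hypothetical recognizer output: (phone, start_s, end_s)
TextEntry = Tuple[str, Optional[List[str]]]  # (word, phones), phones None for new text


def align(text_phones: List[str], audio_phones: List[str]) -> List[Tuple[Optional[int], Optional[int]]]:
    """Edit-distance alignment; returns (text_idx, audio_idx) pairs, None marks a gap."""
    n, m = len(text_phones), len(audio_phones)
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i
    for j in range(1, m + 1):
        cost[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 0 if text_phones[i - 1] == audio_phones[j - 1] else 1
            cost[i][j] = min(cost[i - 1][j - 1] + sub,  # match or substitution
                             cost[i - 1][j] + 1,        # text phone not heard
                             cost[i][j - 1] + 1)        # extra audio phone
    # Trace back from the bottom-right cell, preferring diagonal moves.
    pairs: List[Tuple[Optional[int], Optional[int]]] = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            sub = 0 if text_phones[i - 1] == audio_phones[j - 1] else 1
            if cost[i][j] == cost[i - 1][j - 1] + sub:
                pairs.append((i - 1, j - 1))
                i, j = i - 1, j - 1
                continue
        if i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            pairs.append((i - 1, None))
            i -= 1
        else:
            pairs.append((None, j - 1))
            j -= 1
    pairs.reverse()
    return pairs


def new_word_span(text_seq: List[TextEntry], audio: List[AudioPhone],
                  new_index: int) -> Tuple[List[str], float, float]:
    """Return the audio phones, start time and end time attributed to text_seq[new_index]."""
    text_phones: List[str] = []
    owner: List[int] = []  # word index that each text phone belongs to
    for w, (_, phones) in enumerate(text_seq):
        for p in phones or []:
            text_phones.append(p)
            owner.append(w)
    pairs = align(text_phones, [a[0] for a in audio])
    matched = [(owner[i], j) for i, j in pairs if i is not None and j is not None]
    before = [j for w, j in matched if w < new_index]
    after = [j for w, j in matched if w > new_index]
    lo = max(before) + 1 if before else 0
    hi = min(after) if after else len(audio)
    span = audio[lo:hi]
    if not span:
        raise ValueError("no unmatched audio phones found for the new word")
    return [a[0] for a in span], span[0][1], span[-1][2]
```

With the text "is overdrawn now" (only "is" and "now" known) and recognized phones IH Z OW V ER D R AO N N AW, the new word is assigned OW V ER D R AO N, spanning 0.2 s to 0.7 s when those are the recognizer's time stamps.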
2. The method as recited in claim wherein the step of translating the text input includes the step of marking the at least one new speech pattern.
22. A method for automatically generating at least one new voice file corresponding to at least one new text from a script incorporating a plurality of known text having corresponding pre-existing voice files associated therewith, the method comprising:
storing a plurality of phonetic sequences corresponding to the plurality of known text in a first memory;
providing a text input corresponding to a textual version of the script;
translating the text input to obtain a corresponding textual phonetic sequence based on the plurality of phonetic sequences stored in the first memory;
comparing the text input with the plurality of phonetic sequences stored in the first memory;
marking the at least one new text;
adding at least one new textual phonetic sequence corresponding to the at least one new text in the first memory, the at least one new textual phonetic transcript corresponding to the audio phonetic transcript of the at least one new text;
providing an audio input corresponding to an audio version of the script;
generating an audio phonetic sequence of the audio input by comparing the marked at least one new text with the aligned audio input;
aligning the text input and the corresponding textual phonetic sequence with the audio input and the corresponding audio phonetic sequence to obtain an alignment of the text input and the audio input of the script;
generating the at least one new voice file based on the alignment; and
editing the at least one new voice file according to a predetermined set of rules, including reducing a level of at least one breath sound of the at least one new voice file by a predetermined amount.
Dependent claims: 23, 24, 25, 26, 27, 28, 29, 30, 31, 48, 49, 50.
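The breath-reduction rule in claim 22 amounts to applying a fixed gain to the breath portions of a voice file. The sketch below assumes breath segments have already been located as sample-index ranges by some detector the claims do not specify, that samples are floats, and that the predetermined amount is expressed in decibels, so the amplitude gain is 10^(-dB/20) (20 dB gives 0.1). The function name `attenuate_breaths` and its 12 dB default are hypothetical.

```python
from typing import List, Tuple


def attenuate_breaths(
    samples: List[float],
    breaths: List[Tuple[int, int]],
    reduction_db: float = 12.0,  # hypothetical "predetermined amount"
) -> List[float]:
    """Scale each breath segment [start, end) down by reduction_db decibels."""
    gain = 10 ** (-reduction_db / 20.0)  # amplitude ratio, e.g. 20 dB -> 0.1
    out = list(samples)  # leave the caller's buffer untouched
    for start, end in breaths:
        # Clamp the segment to the buffer so bad detector output cannot raise.
        for k in range(max(0, start), min(end, len(out))):
            out[k] *= gain
    return out
```

Only the marked breath samples are scaled; speech outside those ranges passes through unchanged.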
32. A system for automatically generating a new voice file from a script having at least one new speech pattern and a plurality of known speech patterns each having a known phonetic sequence associated therewith, the script having a text input and an audio input, the system comprising:
means for translating the text input to obtain a corresponding textual phonetic sequence in order to identify at least one new textual phonetic sequence corresponding to the at least one new speech pattern based on the plurality of known speech patterns;
means for translating the audio input to obtain a corresponding audio phonetic sequence;
means for aligning the text input and the corresponding textual phonetic sequence with the audio input and the corresponding audio phonetic sequence to obtain an alignment of the text input and the audio input to identify at least one new audio phonetic sequence corresponding to the new speech pattern; and
means for generating the new voice file containing a portion of the audio input based on the at least one new audio phonetic sequence.
Dependent claims: 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 51, 52, 53, 54, 55, 56, 57.
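The last means of claim 32, generating a voice file from a portion of the audio input, can be sketched with Python's standard `wave` module. The sketch assumes the audio input is an uncompressed PCM WAV recording and that the alignment has already produced start and end times in seconds for the new speech pattern; `extract_voice_file` is a hypothetical name.

```python
import io
import wave


def extract_voice_file(script_wav: bytes, start_s: float, end_s: float) -> bytes:
    """Copy the [start_s, end_s) portion of a PCM WAV recording into a new WAV file."""
    with wave.open(io.BytesIO(script_wav), "rb") as src:
        params = src.getparams()
        rate = src.getframerate()
        first = int(round(start_s * rate))
        last = min(int(round(end_s * rate)), src.getnframes())
        src.setpos(first)  # assumes start_s lies inside the recording
        frames = src.readframes(max(0, last - first))
    buf = io.BytesIO()
    with wave.open(buf, "wb") as dst:
        # Same channel count, sample width and sample rate as the script audio;
        # the frame count in the header is patched when the writer closes.
        dst.setparams(params)
        dst.writeframes(frames)
    return buf.getvalue()
```

Passing the 0.2 s and 0.7 s bounds from an alignment yields a WAV file holding only that half-second of speech, in the same format as the script recording.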
58. A system for automatically generating at least one new voice file corresponding to at least one new text from a script incorporating a plurality of known text having corresponding pre-existing voice files associated therewith, the system comprising:
first memory for storing a plurality of phonetic sequences corresponding to the plurality of known text;
means for providing a text input corresponding to a textual version of the script;
means for translating the text input to obtain a corresponding textual phonetic sequence based on a comparison of the textual version of the script with the plurality of phonetic sequences stored in the first memory;
means for marking the at least one new text;
means for adding at least one new textual phonetic sequence corresponding to the at least one new text in the first memory, the new textual phonetic transcript corresponding to the audio phonetic transcript of the at least one new text;
means for providing an audio input corresponding to an audio version of the script;
first means for generating an audio phonetic sequence of the audio input;
means for aligning the text input and the corresponding textual phonetic sequence with the audio input and the corresponding audio phonetic sequence to obtain an alignment of the text input and the audio input;
second means for generating the at least one new voice file based on the alignment based on a comparison of the marked at least one new text with the aligned audio input; and
means for editing the at least one new voice file according to a predetermined set of rules, including reducing a level of at least one breath sound of the at least one new voice file by a predetermined amount.
Dependent claims: 59, 60, 61, 62, 63, 64, 65, 66, 67.
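Claim 58 adds the new textual phonetic sequence to the first memory, and the abstract states that new voice files may be stored in a second memory alongside the pre-existing ones. A minimal sketch, assuming both memories are plain dictionaries keyed by word and that the new word's phones come from its aligned audio phonetic sequence; `register_new_text` is a hypothetical helper.

```python
from typing import Dict, List


def register_new_text(
    first_memory: Dict[str, List[str]],
    second_memory: Dict[str, bytes],
    word: str,
    audio_phones: List[str],
    voice_file: bytes,
) -> bool:
    """Store a marked new word's phones and voice file; return False if already known."""
    if word in first_memory:
        return False  # keep the existing transcript and pre-existing voice file
    # The new textual phonetic sequence is taken from the word's audio phonetic sequence.
    first_memory[word] = list(audio_phones)
    second_memory[word] = voice_file
    return True
```

Refusing to overwrite an existing entry keeps pre-existing voice files intact; whether the patented system behaves this way is not stated in the claims.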
Specification