SYSTEM AND METHOD FOR GENERATING CUSTOMIZED TEXT-TO-SPEECH VOICES
Abstract
A system and method are disclosed for generating customized text-to-speech voices for a particular application. The method generates a custom text-to-speech voice associated with a domain by selecting a voice, collecting text data associated with the domain from a pre-existing text data source, and, using the collected text data, generating an in-domain inventory of synthesis speech units, either by selecting speech units appropriate to the domain via a search of a pre-existing inventory of synthesis speech units or by recording the minimal inventory for a selected level of synthesis quality. The custom text-to-speech voice for the domain is generated utilizing the in-domain inventory of synthesis speech units. Active learning techniques may also be employed to identify problem phrases, such that only a few minutes of recorded data are necessary to deliver a high-quality custom TTS voice.
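The abstract does not say how problem phrases are identified. One minimal way to picture the idea, assuming diphone units and a toy pronunciation lexicon (the names `LEXICON`, `diphones` and `select_phrases_to_record` are hypothetical), is a greedy loop that repeatedly picks the domain phrase contributing the most diphones not yet covered by the inventory, then treats those diphones as covered, as if the phrase had been recorded. This coverage heuristic is only a stand-in for the active-learning step, not the patented technique.

```python
# Hypothetical toy lexicon (word -> phonemes); a real system would use a
# pronunciation dictionary plus letter-to-sound rules.
LEXICON = {
    "check": ["ch", "eh", "k"],
    "your": ["y", "ao", "r"],
    "balance": ["b", "ae", "l", "ax", "n", "s"],
    "pay": ["p", "ey"],
    "bill": ["b", "ih", "l"],
    "now": ["n", "aw"],
}


def diphones(phrase: str) -> list[tuple[str, str]]:
    """Return the phrase's diphones, padding both ends with silence ('sil')."""
    phones = ["sil"]
    for word in phrase.lower().split():
        phones.extend(LEXICON[word])
    phones.append("sil")
    return list(zip(phones, phones[1:]))


def select_phrases_to_record(phrases, inventory, budget):
    """Greedy coverage heuristic (a stand-in for active learning): repeatedly
    pick the phrase adding the most uncovered diphones, then mark them covered
    as if that phrase had been recorded. Stops at `budget` phrases or when no
    remaining phrase adds anything new."""
    covered = set(inventory)
    remaining = list(phrases)
    chosen = []
    while remaining and len(chosen) < budget:
        best = max(remaining, key=lambda p: len(set(diphones(p)) - covered))
        gain = set(diphones(best)) - covered
        if not gain:
            break  # every remaining phrase is already fully covered
        chosen.append(best)
        covered |= gain
        remaining.remove(best)
    return chosen


# Example: the existing inventory already covers "check your balance".
existing = set(diphones("check your balance"))
print(select_phrases_to_record(
    ["check your balance", "pay your bill", "pay now"], existing, budget=3))
# ['pay your bill', 'pay now']
```

Here "pay your bill" is chosen first because it adds six uncovered diphones; "pay now" then adds only three, since its opening diphones were covered by the first pick.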
20 Claims
1. A method comprising:
collecting, at a first time, text data from a pre-existing text data source, to yield collected text data, wherein the collected text data is associated with a website, wherein the pre-existing text data source exists at the first time, and wherein no website-related inventory of speech units exists at the first time;
selecting synthesis speech units specific to the website from a pre-existing inventory of synthesis speech units existing at the first time, wherein the selecting occurs using the collected text data, to yield selected synthesis speech units, wherein the synthesis speech units comprise one or more of phonemes, diphones, triphones and syllables;
generating an in-domain inventory of synthesis speech units based on the selected synthesis speech units; and
generating, via a processor and at a second time which is later than the first time, a custom text-to-speech voice for use with the website utilizing the in-domain inventory of synthesis speech units.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11.
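Claim 1 does not fix how units are matched against the collected text. A minimal sketch, assuming diphone units only (the claim also allows phonemes, triphones and syllables), a hypothetical toy lexicon and a hypothetical pre-existing inventory keyed by diphone (unit IDs such as `u001` are placeholders), converts the collected text to diphones and keeps only the inventory entries the domain actually needs. `required_diphones` and `build_in_domain_inventory` are illustrative names.

```python
# Hypothetical pre-existing inventory: diphone -> IDs of recorded unit
# instances (IDs are placeholders, not real data).
PREEXISTING_INVENTORY = {
    ("sil", "p"): ["u001", "u002"],
    ("p", "ey"): ["u003"],
    ("ey", "n"): ["u004", "u005"],
    ("n", "aw"): ["u006"],
    ("aw", "sil"): ["u007"],
    ("k", "ae"): ["u008"],  # not used by the example text below
}

# Hypothetical toy lexicon (word -> phonemes).
LEXICON = {"pay": ["p", "ey"], "now": ["n", "aw"], "today": ["t", "ax", "d", "ey"]}


def required_diphones(sentences):
    """Return the set of diphones occurring in the collected text, padding
    each sentence with silence ('sil') at both ends."""
    needed = set()
    for sentence in sentences:
        phones = ["sil"]
        for word in sentence.lower().split():
            phones.extend(LEXICON[word])
        phones.append("sil")
        needed.update(zip(phones, phones[1:]))
    return needed


def build_in_domain_inventory(sentences, inventory):
    """Keep only inventory entries whose diphone occurs in the domain text.
    Also return, sorted, the needed diphones the inventory cannot supply."""
    needed = required_diphones(sentences)
    selected = {d: units for d, units in inventory.items() if d in needed}
    missing = sorted(d for d in needed if d not in inventory)
    return selected, missing


selected, missing = build_in_domain_inventory(["pay now", "pay today"], PREEXISTING_INVENTORY)
print(sorted(selected))
# [('aw', 'sil'), ('ey', 'n'), ('n', 'aw'), ('p', 'ey'), ('sil', 'p')]
print(missing)
# [('ax', 'd'), ('d', 'ey'), ('ey', 'sil'), ('ey', 't'), ('t', 'ax')]
```

The unneeded `("k", "ae")` entry is dropped, and diphones required by "pay today" that the pre-existing inventory lacks are reported separately; how such gaps are filled is outside this sketch.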
12. A system comprising:
a processor; and
a computer-readable storage medium having instructions stored which, when executed by the processor, cause the processor to perform operations comprising:
collecting, at a first time, text data from a pre-existing text data source, to yield collected text data, wherein the collected text data is associated with a website, wherein the pre-existing text data source exists at the first time, and wherein no website-related inventory of speech units exists at the first time;
selecting synthesis speech units specific to the website from a pre-existing inventory of synthesis speech units existing at the first time, wherein the selecting occurs using the collected text data, to yield selected synthesis speech units, wherein the synthesis speech units comprise one or more of phonemes, diphones, triphones and syllables;
generating an in-domain inventory of synthesis speech units based on the selected synthesis speech units; and
generating, at a second time which is later than the first time, a custom text-to-speech voice for use with the website utilizing the in-domain inventory of synthesis speech units.
Dependent claims: 13, 14, 15, 16, 17, 18, 19.
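The claims leave the form of the pre-existing text data source open. Assuming, for illustration only, that it is the website's own HTML pages already retrieved as strings, visible page text could be collected with Python's standard-library `html.parser` and split naively into sentences. `VisibleTextExtractor`, `collect_sentences` and the sample page are hypothetical.

```python
import re
from html.parser import HTMLParser


class VisibleTextExtractor(HTMLParser):
    """Collect whitespace-normalized text runs, skipping <script>/<style>."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = " ".join(data.split())
        if text and not self._skip_depth:
            self.chunks.append(text)


def collect_sentences(html: str) -> list[str]:
    """Extract visible text from one page and split it naively into
    sentences at '.', '!' or '?' followed by whitespace."""
    parser = VisibleTextExtractor()
    parser.feed(html)
    parser.close()
    sentences = []
    for chunk in parser.chunks:
        sentences.extend(s for s in re.split(r"(?<=[.!?])\s+", chunk) if s)
    return sentences


PAGE = ("<html><head><style>p {color: red}</style></head><body>"
        "<h1>Account Help</h1><p>Pay your bill online. Check your balance!</p>"
        "<script>var x = 1;</script></body></html>")
print(collect_sentences(PAGE))
# ['Account Help', 'Pay your bill online.', 'Check your balance!']
```

Each text run between tags is split on its own, so a heading is not merged into the paragraph that follows it. Fetching pages and normalizing numbers or abbreviations are not shown.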
20. A computer-readable storage device having instructions stored which, when executed by a processor, cause the processor to perform operations comprising:
collecting, at a first time, text data from a pre-existing text data source, to yield collected text data, wherein the collected text data is associated with a website, wherein the pre-existing text data source exists at the first time, and wherein no website-related inventory of speech units exists at the first time;
selecting synthesis speech units specific to the website from a pre-existing inventory of synthesis speech units existing at the first time, wherein the selecting occurs using the collected text data, to yield selected synthesis speech units, wherein the synthesis speech units comprise one or more of phonemes, diphones, triphones and syllables;
generating an in-domain inventory of synthesis speech units based on the selected synthesis speech units; and
generating, at a second time which is later than the first time, a custom text-to-speech voice for use with the website utilizing the in-domain inventory of synthesis speech units.
Specification