System and method for generating customized text-to-speech voices
Abstract
A system and method are disclosed for generating customized text-to-speech (TTS) voices for a particular application. The method generates a custom TTS voice by selecting a voice to be customized for a domain, collecting text data associated with the domain from a pre-existing text data source, and, using the collected text data, generating an in-domain inventory of synthesis speech units. This inventory is built either by selecting speech units appropriate to the domain via a search of a pre-existing inventory of synthesis speech units or by recording the minimal inventory needed for a selected level of synthesis quality. The custom TTS voice for the domain is then generated from the in-domain inventory of synthesis speech units. Active learning techniques may also be employed to identify problem phrases, so that only a few minutes of recorded data are needed to deliver a high-quality custom TTS voice.
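The in-domain selection step summarized in the abstract can be illustrated with a small sketch. It assumes a toy front-end that treats each letter as a phone, diphone-sized units, and a pre-existing inventory represented as a dictionary from diphone labels to recorded unit IDs; all of these names are hypothetical and are not taken from the disclosure. Diphones the domain needs but the inventory lacks are passed to a greedy set-cover routine that proposes a small recording script. Greedy set cover approximates, but does not guarantee, the minimal script, and it is a standard coverage heuristic shown here in place of the active learning technique the abstract mentions, which this sketch does not implement.

```python
from collections import Counter

def toy_front_end(word: str) -> list[str]:
    # Hypothetical front-end: each letter is treated as one "phone".
    # A real TTS front-end would normalize text and use a pronunciation lexicon.
    return [ch for ch in word.lower() if ch.isalpha()]

def word_diphones(word: str) -> list[str]:
    # Pair adjacent phones into diphone labels, padded with silence "_".
    phones = toy_front_end(word)
    if not phones:
        return []
    padded = ["_"] + phones + ["_"]
    return [f"{a}-{b}" for a, b in zip(padded, padded[1:])]

def sentence_diphones(sentence: str) -> set[str]:
    # Distinct diphones required by one sentence.
    return {d for word in sentence.split() for d in word_diphones(word)}

def build_in_domain_inventory(domain_text: list[str], inventory: dict[str, list[str]]):
    # Count how many domain sentences need each diphone.
    needed = Counter(d for s in domain_text for d in sentence_diphones(s))
    # Keep only the pre-existing units that the domain actually uses.
    in_domain = {u: inventory[u] for u in needed if u in inventory}
    # Needed diphones with no recorded unit: candidates for new recordings.
    missing = Counter({u: n for u, n in needed.items() if u not in inventory})
    return in_domain, missing

def greedy_recording_script(domain_text: list[str], missing: Counter) -> list[str]:
    # Greedy set cover: repeatedly choose the sentence that covers the most
    # still-uncovered missing diphones (approximately minimal, not optimal).
    uncovered = set(missing)
    script = []
    while uncovered:
        best = max(domain_text, key=lambda s: len(sentence_diphones(s) & uncovered))
        gained = sentence_diphones(best) & uncovered
        if not gained:  # remaining units cannot be covered by any sentence
            break
        script.append(best)
        uncovered -= gained
    return script

# Hypothetical pre-existing inventory and collected domain text.
inventory = {"_-h": ["u1"], "h-i": ["u2", "u3"], "i-_": ["u4"], "_-o": ["u5"]}
domain_text = ["hi", "ok", "hi ok"]
in_domain, missing = build_in_domain_inventory(domain_text, inventory)
script = greedy_recording_script(domain_text, missing)
print(sorted(in_domain))  # ['_-h', '_-o', 'h-i', 'i-_']
print(script)             # ['ok']
```

In this toy example the word "hi" is fully covered by the existing inventory, while "ok" needs two diphones that were never recorded, so the single sentence "ok" is proposed for recording.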
20 Claims
1. A method comprising:
modifying, based on identification of a user as new, a front-end which converts text into linguistic tokens, to yield a user-specific front-end;
receiving a user selection of an animated character to guide the user; and
generating a custom text-to-speech voice by:
collecting text data associated with a domain from a pre-existing text data source, to yield collected text data;
selecting, based on the user-specific front-end, synthesis speech units specific to the domain from a pre-existing inventory of synthesis speech units using the collected text data;
caching the synthesis speech units specific to the domain as an in-domain inventory of synthesis speech units; and
generating, via a processor, the custom text-to-speech voice for a specific task in the domain utilizing the in-domain inventory of synthesis speech units, wherein the animated character will use the custom text-to-speech voice.
Claims 2 through 18 depend from claim 1.

19. A system comprising:
a processor; and
a computer-readable storage medium having instructions stored which, when executed by the processor, cause the processor to perform operations comprising:
modifying, based on identification of a user as new, a front-end which converts text into linguistic tokens, to yield a user-specific front-end;
receiving a user selection of an animated character to guide the user; and
generating a custom text-to-speech voice by:
collecting text data associated with a domain from a pre-existing text data source, to yield collected text data;
selecting, based on the user-specific front-end, synthesis speech units specific to the domain from a pre-existing inventory of synthesis speech units using the collected text data;
caching the synthesis speech units specific to the domain as an in-domain inventory of synthesis speech units; and
generating, via a processor, a custom text-to-speech voice for a specific task in the domain utilizing the in-domain inventory of synthesis speech units, wherein the animated character will use the custom text-to-speech voice.

20. A computer-readable storage device having instructions stored which, when executed by a computing device, cause the computing device to perform operations comprising:
modifying, based on identification of a user as new, a front-end which converts text into linguistic tokens, to yield a user-specific front-end;
receiving a user selection of an animated character to guide the user; and
generating a custom text-to-speech voice by:
collecting text data associated with a domain from a pre-existing text data source, to yield collected text data;
selecting, based on the user-specific front-end, synthesis speech units specific to the domain from a pre-existing inventory of synthesis speech units using the collected text data;
caching the synthesis speech units specific to the domain as an in-domain inventory of synthesis speech units; and
generating, via a processor, a custom text-to-speech voice for a specific task in the domain utilizing the in-domain inventory of synthesis speech units, wherein the animated character will use the custom text-to-speech voice.

Specification