Techniques to create a custom voice font
Abstract
Techniques to create and share custom voice fonts are described. An apparatus may include a preprocessing component to receive voice audio data and a corresponding text script from a client and to process the voice audio data to produce prosody labels and a rich script. The apparatus may further include a verification component to automatically verify the voice audio data and the text script. The apparatus may further include a training component to train a custom voice font from the verified voice audio data and rich script and to generate custom voice font data usable by a text-to-speech (TTS) component. Other embodiments are described and claimed.
15 Claims
1. A computer-implemented method, comprising:
receiving voice audio data and a corresponding text script from a client at a server;
processing the voice audio data to produce prosody labels at the server by producing linguistic prosody labels and pronunciation prosody labels from the text script in a tagger module, and an XML-based rich script comprising:
pronunciation, part of speech, and a prosody event for each word in the text script;
automatically verifying the voice audio data using the text script at the server by determining a degree of matching between the voice audio data and a corresponding pronunciation in the rich script, ordering sentences in the text script according to the degree of matching, and retaining a sentence having a degree of matching higher than a threshold;
training a custom voice font from the verified voice audio data and rich script at the server, where prosody and acoustic models are generated based on the training; and
generating custom voice font data usable by a text-to-speech engine at the server based on the training.
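The XML-based rich script recited in claim 1 pairs each word with a pronunciation, a part of speech, and a prosody event. The claim does not give a schema, so the following Python sketch is illustrative only: the element and attribute names (`sentence`, `word`, `pron`, `pos`, `prosody`), the toy `LEXICON`, and the punctuation-based boundary rule are all hypothetical stand-ins for a real tagger module.

```python
import xml.etree.ElementTree as ET

# Hypothetical toy lexicon: word -> (pronunciation, part of speech).
# A real tagger module would use a full pronunciation dictionary,
# a letter-to-sound model, and a statistical part-of-speech tagger.
LEXICON = {
    "hello": ("hh ah l ow", "UH"),
    "world": ("w er l d", "NN"),
}


def build_rich_script(sentence_id: str, text: str) -> ET.Element:
    """Return a <sentence> element with one <word> child per token."""
    sentence = ET.Element("sentence", id=sentence_id)
    tokens = text.split()
    for index, token in enumerate(tokens):
        # Strip trailing punctuation and lowercase before the lexicon lookup.
        word = token.strip(".,!?").lower()
        # Unknown words get an empty pronunciation and an "UNK" tag.
        pron, pos = LEXICON.get(word, ("", "UNK"))
        # Naive prosody-event rule (assumption): mark a boundary on the
        # last word or on any word followed by punctuation.
        is_boundary = index == len(tokens) - 1 or token[-1] in ".,!?"
        ET.SubElement(
            sentence,
            "word",
            text=word,
            pron=pron,
            pos=pos,
            prosody="boundary" if is_boundary else "none",
        )
    return sentence


if __name__ == "__main__":
    element = build_rich_script("s1", "Hello, world.")
    print(ET.tostring(element, encoding="unicode"))
```

Emitting the script as XML keeps the per-word annotations machine-readable for the later verification and training steps.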
9. An article of manufacture comprising a computer-readable storage medium containing instructions that, if executed, enable a system to:
process voice audio data to produce linguistic prosody labels and pronunciation prosody labels from a corresponding text script in a tagger module, and an XML-based rich script comprising:
pronunciation, part of speech, and a prosody event for each word in the text script;
automatically verify the voice audio data and the corresponding text script by performing speech recognition on the voice audio data to produce recognized speech, determining a degree of matching between the recognized speech and the text script, ordering sentences in the text script according to the degree of matching, and retaining a sentence having a degree of matching higher than a threshold, where prosody and acoustic models are generated based on the training;
train a custom voice font from the verified voice audio data and rich script; and
generate custom voice font data usable by a text-to-speech engine based on the training.
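Claim 9 verifies recordings by comparing speech recognition output with the text script, ordering sentences by degree of matching, and keeping those above a threshold. A minimal Python sketch follows, assuming the recognizer output is already available as text; using the word-level `difflib.SequenceMatcher` ratio as the degree of matching and a default threshold of 0.8 are assumptions, not values taken from the patent.

```python
from difflib import SequenceMatcher
from typing import List, Tuple


def match_degree(recognized: str, script: str) -> float:
    """Word-level similarity in [0, 1] between recognized text and the script."""
    return SequenceMatcher(
        None, recognized.lower().split(), script.lower().split()
    ).ratio()


def verify_sentences(
    pairs: List[Tuple[str, str, str]], threshold: float = 0.8
) -> List[Tuple[str, float]]:
    """Score, order, and filter sentences.

    pairs holds (sentence_id, script_text, recognized_text) triples, where
    recognized_text is assumed to come from an external speech recognizer.
    Returns (sentence_id, degree) for sentences whose degree of matching is
    strictly higher than threshold, best match first.
    """
    scored = [(sid, match_degree(rec, script)) for sid, script, rec in pairs]
    # Order sentences by degree of matching, highest first.
    scored.sort(key=lambda item: item[1], reverse=True)
    # Retain only sentences whose degree of matching exceeds the threshold.
    return [(sid, degree) for sid, degree in scored if degree > threshold]
```

Because sentences are sorted before filtering, the retained list also serves as a ranking of how closely each recording follows its script.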
12. An apparatus, comprising:
a processor;
a storage medium to receive and store custom voice fonts; and
a text-to-speech (TTS) component operative on the processor to convert text to speech using one of the custom voice fonts at a request of a remote client;
wherein a custom voice font is generated by:
processing voice audio data received from a client to produce prosody labels by producing linguistic prosody labels and pronunciation prosody labels from a text script corresponding to the voice audio data in a tagger module, and a rich script comprising:
pronunciation, part of speech, and a prosody event for each word in the text script;
automatically verifying the voice audio data using the text script by determining a degree of matching between the voice audio data and a corresponding pronunciation in the XML-based rich script, ordering sentences in the text script according to the degree of matching, and retaining a sentence having a degree of matching higher than a threshold, where prosody and acoustic models are generated based on the training; and
training the custom voice font from the verified voice audio data and rich script.
15. The apparatus of claim 14, wherein the participation activities include at least one of:
uploading a custom voice font to the storage medium, downloading a custom voice font to a remote client from the storage medium, or receiving a highest rating for a custom voice font.
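Claims 12 and 15 describe a shared storage medium for custom voice fonts together with participation activities such as uploading, downloading, and rating fonts. The Python sketch below shows one way such a store could track those activities; the `VoiceFontStore` class, its method names, and the use of the mean score to decide which font has the highest rating are hypothetical choices made for illustration.

```python
from dataclasses import dataclass, field
from statistics import mean
from typing import Dict, List, Optional


@dataclass
class VoiceFontStore:
    """Hypothetical in-memory store for shared custom voice fonts."""

    fonts: Dict[str, bytes] = field(default_factory=dict)  # font name -> font data
    ratings: Dict[str, List[int]] = field(default_factory=dict)  # font name -> scores

    def upload(self, name: str, data: bytes) -> None:
        # Uploading stores (or replaces) the font and keeps any existing ratings.
        self.fonts[name] = data
        self.ratings.setdefault(name, [])

    def download(self, name: str) -> bytes:
        # Downloading returns the stored font data; raises KeyError if unknown.
        return self.fonts[name]

    def rate(self, name: str, score: int) -> None:
        # Ratings are only accepted for fonts that have been uploaded.
        if name not in self.fonts:
            raise KeyError(name)
        self.ratings[name].append(score)

    def highest_rated(self) -> Optional[str]:
        # Average each font's scores; fonts with no ratings are ignored.
        averages = {n: mean(s) for n, s in self.ratings.items() if s}
        if not averages:
            return None
        return max(averages, key=averages.get)
```

A production service would persist fonts and ratings and attribute each activity to a user, but the in-memory version is enough to show the three operations the claim lists.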
Specification