Method and system for customizing voice translation of text to speech
Abstract
A method and system of customizing voice translation of a text to speech includes digitally recording speech samples of a known speaker, correlating each of the speech samples with a standardized audio representation, and organizing the recorded speech samples and correlated audio representations into a collection. The collection of speech samples correlated with audio representations is saved as a single voice file and stored in a device capable of translating the text to speech. The voice file is applied to a translation of text to speech so that the translated speech is customized according to the applied voice file.
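As a rough illustration of the "single voice file" described in the abstract, the sketch below stores a collection of recorded samples, each keyed by its audio representation, in one file. The JSON layout, the base64 encoding, the function names `save_voice_file` and `load_voice_file`, and the use of comma-separated phoneme-identifier strings as keys are all assumptions made for illustration; the source does not specify a file format.

```python
import base64
import json
import os
import tempfile


def save_voice_file(path, speaker, samples):
    """Write one speaker's recorded samples to a single JSON voice file.

    `samples` maps an audio-representation key (hypothetical: a
    comma-separated phoneme-identifier string) to raw recorded audio bytes.
    """
    payload = {
        "speaker": speaker,
        # JSON cannot hold raw bytes, so each recording is base64-encoded.
        "samples": {key: base64.b64encode(audio).decode("ascii")
                    for key, audio in samples.items()},
    }
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(payload, fh)


def load_voice_file(path):
    """Return (speaker, samples) read back from a voice file."""
    with open(path, encoding="utf-8") as fh:
        payload = json.load(fh)
    samples = {key: base64.b64decode(b64)
               for key, b64 in payload["samples"].items()}
    return payload["speaker"], samples


# Usage: two placeholder "recordings" (dummy bytes, not real audio).
recorded = {"1,2,3": b"\x00\x01\x02\x03", "4,5": b"\x10\x20"}
voice_path = os.path.join(tempfile.mkdtemp(), "speaker.voice.json")
save_voice_file(voice_path, "known-speaker", recorded)
speaker, restored = load_voice_file(voice_path)
```

Keeping every sample in one file means a device can switch voices by swapping a single file, which is the property the abstract emphasizes.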
21 Claims
1. A method, comprising:
- receiving text content for translation to speech;
- correlating the text content to textual phrases of multiple words;
- converting each textual phrase into a corresponding string of phonemes;
- retrieving a phoneme identifier that uniquely represents each phoneme in the string of phonemes;
- concatenating each phoneme identifier of each phoneme in the string of phonemes to produce a sequence of phoneme identifiers with each phoneme identifier separated by a comma;
- creating a corresponding sequence of phoneme identifiers for each string of phonemes that corresponds to each textual phrase in the text content;
- concatenating each sequence of phoneme identifiers and separating each sequence of phoneme identifiers by a semi-colon;
- accessing a voice file storing recorded phrases in a speaker's voice;
- mapping each sequence of phoneme identifiers to a corresponding recorded phrase found in the speaker's voice file;
- retrieving the recorded phrase from the voice file that corresponds to each sequence of phoneme identifiers from the text content;
- concatenating together the recorded phrases from the speaker's voice file to form a sequence of the recorded phrases as a speech translation of the text content; and
- outputting the speech translation as a translation of the text content to speech.
Dependent claims: 2, 3, 4, 5, 6, 7, 8
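The encoding steps of claim 1 (phrase correlation, phoneme conversion, comma-joined identifiers, semicolon-joined sequences) can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the ARPAbet-style phoneme symbols, the numeric identifiers, the two-entry phrase lexicon, and the greedy longest-match strategy in `split_into_phrases` are all hypothetical choices that the claim itself does not specify.

```python
# Hypothetical phoneme inventory: symbol -> unique identifier.
PHONEME_IDS = {"G": 1, "UH": 2, "D": 3, "M": 4, "AO": 5, "R": 6, "N": 7,
               "IH": 8, "NG": 9, "HH": 10, "AW": 11, "AA": 12, "Y": 13,
               "UW": 14}

# Hypothetical lexicon: multi-word phrase -> string of phonemes.
PHRASE_PHONEMES = {
    "good morning": ["G", "UH", "D", "M", "AO", "R", "N", "IH", "NG"],
    "how are you": ["HH", "AW", "AA", "R", "Y", "UW"],
}


def split_into_phrases(text, lexicon):
    """Correlate text to known phrases (assumed: greedy longest match)."""
    words = text.lower().split()
    phrases, i = [], 0
    while i < len(words):
        # Try the longest candidate starting at word i first.
        for j in range(len(words), i, -1):
            candidate = " ".join(words[i:j])
            if candidate in lexicon:
                phrases.append(candidate)
                i = j
                break
        else:
            raise KeyError(f"no known phrase starts at word {words[i]!r}")
    return phrases


def encode_phrase(phrase):
    """Comma-separated phoneme identifiers for one phrase."""
    return ",".join(str(PHONEME_IDS[p]) for p in PHRASE_PHONEMES[phrase])


def encode_text(text):
    """Semicolon-separated identifier sequences, one per phrase."""
    phrases = split_into_phrases(text, PHRASE_PHONEMES)
    return ";".join(encode_phrase(p) for p in phrases)


encoded = encode_text("Good morning how are you")
# encoded == "1,2,3,4,5,6,7,8,9;10,11,12,6,13,14"
```

Each semicolon-delimited field then serves as a lookup key for one recorded phrase in the speaker's voice file.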
9. A text-to-speech translation voice customization system, comprising:
- means for receiving text content for translation to speech;
- means for correlating the text content to textual phrases of multiple words;
- means for converting each textual phrase into a corresponding string of phonemes;
- means for retrieving a phoneme identifier that uniquely represents each phoneme in the string of phonemes;
- means for concatenating each phoneme identifier of each phoneme in the string of phonemes to produce a sequence of phoneme identifiers with each phoneme identifier separated by a comma;
- means for creating a corresponding sequence of phoneme identifiers for each string of phonemes that corresponds to each textual phrase in the text content;
- means for concatenating each sequence of phoneme identifiers and separating each sequence of phoneme identifiers by a semi-colon;
- means for accessing a voice file storing recorded phrases in a speaker's voice;
- means for mapping each sequence of phoneme identifiers to a corresponding recorded phrase in the speaker's voice file;
- means for retrieving the recorded phrase from the voice file that corresponds to each sequence of phoneme identifiers;
- means for concatenating together the recorded phrases from the speaker's voice file to form a sequence of the recorded phrases as a speech translation of the text content; and
- means for outputting the speech translation as a translation of the text content to speech.
Dependent claims: 10, 11, 12, 13, 14, 15, 16, 17, 18, 19
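The mapping, retrieval, concatenation, and output elements recited above might look like the following sketch. It assumes, purely for illustration, that the voice file has already been loaded into a dictionary keyed by comma-separated phoneme-identifier strings and that every recording is raw 16-bit mono PCM at the same sample rate; the function name `synthesize` and the WAV output format are likewise hypothetical and not taken from the source.

```python
import io
import wave


def synthesize(encoded, voice_samples, sample_rate=8000):
    """Map each identifier sequence to its recording and join the audio.

    `encoded` is a semicolon-separated list of comma-separated phoneme
    identifiers. `voice_samples` maps each such sequence to raw 16-bit
    mono PCM bytes (an assumed in-memory form of the voice file).
    Returns the concatenated speech as WAV file bytes.
    """
    # A missing sequence raises KeyError: no recorded phrase matches it.
    pcm = b"".join(voice_samples[seq] for seq in encoded.split(";"))

    buf = io.BytesIO()
    with wave.open(buf, "wb") as out:
        out.setnchannels(1)      # mono
        out.setsampwidth(2)      # 16-bit samples
        out.setframerate(sample_rate)
        out.writeframes(pcm)
    return buf.getvalue()


# Usage with placeholder PCM frames (dummy bytes, not real speech).
voice_samples = {"1,2": b"\x00\x00\x01\x00", "3": b"\x02\x00"}
wav_bytes = synthesize("1,2;3", voice_samples)
```

Because whole recorded phrases are joined rather than individual phonemes, the output keeps the speaker's natural intonation within each phrase; smoothing the joins between phrases is outside the scope of this sketch.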
20. A storage medium on which are encoded instructions for performing a method of translating text to speech, the method comprising:
- receiving text content for translation to speech;
- correlating the text content to textual phrases of multiple words;
- converting each textual phrase into a corresponding string of phonemes;
- retrieving a phoneme identifier that uniquely represents each phoneme in the string of phonemes;
- concatenating each phoneme identifier of each phoneme in the string of phonemes to produce a sequence of phoneme identifiers with each phoneme identifier separated by a comma;
- creating a corresponding sequence of phoneme identifiers for each string of phonemes that corresponds to each textual phrase in the text content;
- concatenating each sequence of phoneme identifiers and separating each sequence of phoneme identifiers by a semi-colon;
- accessing a voice file storing recorded phrases in a speaker's voice;
- mapping each sequence of phoneme identifiers to a corresponding recorded phrase in the speaker's voice file;
- retrieving the recorded phrase from the voice file that corresponds to each sequence of phoneme identifiers;
- concatenating together the recorded phrases from the speaker's voice file to form a sequence of the recorded phrases as a speech translation of the text content; and
- outputting the speech translation as a translation of the text content to speech.
Dependent claims: 21
Specification