Providing personalized voice font for text-to-speech applications

  • US 7,693,719 B2
  • Filed: 10/29/2004
  • Issued: 04/06/2010
  • Est. Priority Date: 10/29/2004
  • Status: Expired due to Fees
First Claim

1. A method implemented on a computing device having instructions executable by a processor for synthesizing speech from a text, the speech being in a specified voice, the method comprising:

  • accessing a text-to-speech application through a browser in communication with a network by a user of a client computer;

    generating a personalized voice font based on the one or more waveforms, wherein the user creates a personalized speech audio data at the client computer by speaking a plurality of predetermined utterances into a microphone connected to the client computer, the personalized speech audio data is encoded into a waveform at the client computer, and the waveform is transmitted to a voice font generator of the text-to-speech application over the network, wherein generating the personal voice font after the waveform is transmitted to the voice font generator comprises:

      associating the personalized speech audio data transmitted to the voice font generator with corresponding basic phonetic units, wherein the plurality of predetermined utterances is parsed into one or more basic phonetic units comprising at least one of phonemes, diphones, semi-syllables, or syllables,

      identifying the one or more basic phonetic units based on corresponding characteristics of a basic phonetic unit, and

      associating the one or more basic phonetic units with corresponding segments of the waveform in a data structure, wherein the data structure comprises a table having one column correspond to one or more identifiers of the one or more basic phonetic units, and having another column correspond to the segments of the waveform, wherein each identifier corresponds to one or more segments of the waveform in the table;

    selecting the personalized voice font, wherein a selection is made by the user via the browser of the client computer;

    receiving through the browser of the client computer one or more waveforms characteristic of a voice of a person selected by the user;

    submitting the text from the user's client computer via the browser to the text-to-speech application;

    synthesizing speech in the text-to-speech application based on the selected personalized voice font;

    concatenating the personalized voice font into a chain according to an order of basic phonetic units in the text, the basic phonetic units are parsed into phonemes, diphones, semi-syllables, or syllables and identified by an associated diphone, a triphone, a semi-syllable, or a syllable that is associated with a corresponding segment in a waveform;

    downloading concatenated speech segments from a remote computer to the client computer;

    transmitting synthesized speech back to the user of the client computer through the browser; and

    delivering to the user from the text-to-speech application through the browser of the client computer the personalized voice font, whereby speech can be synthesized from text, the speech being in the voice of the selected person, the speech being synthesized using the personalized voice font.
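
The table recited in the claim (one column of basic phonetic unit identifiers, another of waveform segments) and the concatenation step can be illustrated with a minimal Python sketch. This is only an illustration under stated assumptions, not the patented implementation: the names VoiceFont, build_voice_font, and synthesize are hypothetical, the phoneme labels are made up, the alignment of units to sample ranges is assumed to be supplied by some earlier parsing and identification step that is not shown, and choosing the first recorded segment for each unit is a simplification.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Sequence, Tuple

# A segment is a (start_sample, end_sample) range into the recorded waveform.
Segment = Tuple[int, int]


@dataclass
class VoiceFont:
    """Hypothetical voice font: a table keyed by phonetic unit identifier.

    Each identifier maps to one or more waveform segments, mirroring the
    two-column table described in the claim.
    """
    waveform: List[float]
    table: Dict[str, List[Segment]] = field(default_factory=dict)

    def associate(self, unit_id: str, start: int, end: int) -> None:
        """Record that samples [start, end) of the waveform realize unit_id."""
        if not (0 <= start < end <= len(self.waveform)):
            raise ValueError(f"segment ({start}, {end}) is outside the waveform")
        self.table.setdefault(unit_id, []).append((start, end))

    def samples_for(self, unit_id: str) -> List[float]:
        """Return samples for a unit (assumption: use its first segment)."""
        start, end = self.table[unit_id][0]
        return self.waveform[start:end]


def build_voice_font(
    waveform: Sequence[float],
    alignment: Sequence[Tuple[str, int, int]],
) -> VoiceFont:
    """Build the table from (unit_id, start, end) triples.

    The alignment itself is assumed to come from an earlier step.
    """
    font = VoiceFont(list(waveform))
    for unit_id, start, end in alignment:
        font.associate(unit_id, start, end)
    return font


def synthesize(font: VoiceFont, units: Sequence[str]) -> List[float]:
    """Concatenate stored segments in the order the units occur in the text."""
    speech: List[float] = []
    for unit_id in units:
        speech.extend(font.samples_for(unit_id))
    return speech


if __name__ == "__main__":
    recorded = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7]
    font = build_voice_font(recorded, [("HH", 0, 2), ("AY", 2, 5), ("HH", 5, 8)])
    print(font.table)
    print(synthesize(font, ["AY", "HH"]))
```

In this sketch the identifier "HH" ends up with two segments, which corresponds to the claim language that each identifier may correspond to one or more segments of the waveform. A practical system would also need a rule for choosing among multiple segments and for smoothing the joins between them; neither is addressed here.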
