Method and system for text-to-speech caching

  • US 7,043,432 B2
  • Filed: 08/29/2001
  • Issued: 05/09/2006
  • Est. Priority Date: 08/29/2001
  • Status: Expired due to Term
First Claim

1. In a text-to-speech system, a method of converting text-to-speech comprising:

    receiving a text input and a plurality of attributes associated with said text input, wherein said attributes specify stress, gender, grammar, speed, and volume for an audio rendering of said text input;

    generating processed input by parsing and normalizing said text input;

    comparing said processed input to at least one entry in a text-to-speech cache memory, wherein said entry in said text-to-speech cache memory specifies a corresponding spoken output, wherein said text-to-speech cache memory contains a plurality of entries that specify spoken outputs, attributes for rendering spoken output, and callback information, and wherein each spoken output has an assigned score;

    if said processed input matches one of said entries in said text-to-speech cache memory, providing said spoken output specified by said matching entry and rendering said spoken output according to said plurality of attributes associated with said text input;

    if said processed input fails to match one of said entries, generating an additional spoken output with a text-to-speech engine, generating an entry that specifies said additional spoken output, assigning a score to said additional spoken output, storing said additional spoken output and assigned score in said cache memory, and rendering said spoken output with the text-to-speech engine according to said plurality of attributes associated with said text input, wherein each assigned score is an updatable score computed by multiplying a previous score times a constant between zero and one and adding a number equal to the number of times a corresponding entry has been accessed since a last updating of the score;

    if the cache memory is full when said additional spoken output is generated, deleting from said cache memory a spoken output having a lower score; and

    generating a display of said text input wherein each word of said display is successively highlighted in coordination with an audible rendering of a word of corresponding spoken output, coordination of said display and spoken output being based on call information stored in said cache memory.
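
The scoring and eviction steps recited in the claim can be illustrated with a minimal Python sketch. It is not the patented implementation: the names `TTSCache`, `Entry`, `lookup`, `update_scores`, and the `synthesize` callable are hypothetical, and so are the whitespace and lowercase normalization, the initial score of zero, and the choice to evict the lowest-scoring entry (the claim only requires deleting an output "having a lower score"). Rendering attributes, callback information, and word highlighting are omitted.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    audio: bytes          # stands in for the stored spoken output
    score: float = 0.0    # assumed initial score; the claim does not fix one
    hits: int = 0         # accesses since the score was last updated


class TTSCache:
    def __init__(self, capacity: int, decay: float = 0.5):
        # The claim calls for a constant between zero and one.
        if not 0.0 < decay < 1.0:
            raise ValueError("decay must lie strictly between 0 and 1")
        self.capacity = capacity
        self.decay = decay
        self.entries: dict[str, Entry] = {}

    @staticmethod
    def normalize(text: str) -> str:
        # Simplified stand-in for "parsing and normalizing" the text input.
        return " ".join(text.lower().split())

    def lookup(self, text: str, synthesize) -> bytes:
        key = self.normalize(text)
        entry = self.entries.get(key)
        if entry is None:
            # Cache miss: make room if the cache is full, then synthesize.
            if len(self.entries) >= self.capacity:
                victim = min(self.entries, key=lambda k: self.entries[k].score)
                del self.entries[victim]
            entry = Entry(audio=synthesize(key))
            self.entries[key] = entry
        entry.hits += 1
        return entry.audio

    def update_scores(self) -> None:
        # new score = previous score * constant + accesses since last update
        for entry in self.entries.values():
            entry.score = self.decay * entry.score + entry.hits
            entry.hits = 0
```

Because the previous score is multiplied by a constant below one at every update, old popularity fades geometrically while recent accesses count in full, so an entry that was requested often long ago eventually ranks below one that is in use now.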
