Method and system for text-to-speech caching
First Claim
1. In a text-to-speech system, a method of converting text-to-speech comprising:
receiving a text input and a plurality of attributes associated with said text input, wherein said attributes specify stress, gender, grammar, speed, and volume for an audio rendering of said text input;
generating processed input by parsing and normalizing said text input;
comparing said processed input to at least one entry in a text-to-speech cache memory, wherein said entry in said text-to-speech cache memory specifies a corresponding spoken output, wherein said text-to-speech cache memory contains a plurality of entries that specify spoken outputs, attributes for rendering spoken output, and callback information, and wherein each spoken output has an assigned score;
if said processed input matches one of said entries in said text-to-speech cache memory, providing said spoken output specified by said matching entry and rendering said spoken output according to said plurality of attributes associated with said text input;
if said processed input fails to match one of said entries, generating an additional spoken output with a text-to-speech engine, generating an entry that specifies said additional spoken output, assigning a score to said additional spoken output, storing said additional spoken output and assigned score in said cache memory, and rendering said spoken output with the text-to-speech engine according to said plurality of attributes associated with said text input, wherein each assigned score is an updatable score computed by multiplying a previous score times a constant between zero and one and adding a number equal to the number of times a corresponding entry has been accessed since a last updating of the score;
if the cache memory is full when said additional spoken output is generated, deleting from said cache memory a spoken output having a lower score; and
generating a display of said text input wherein each word of said display is successively highlighted in coordination with an audible rendering of a word of corresponding spoken output, coordination of said display and spoken output being based on call information stored in said cache memory.
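The scoring and eviction steps recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation: the names `Entry`, `update_score`, and `insert_with_eviction`, the decay constant of 0.5, and the initial score of 1.0 for a new entry are all assumptions made for illustration. The claim only requires a constant between zero and one and the deletion of "a spoken output having a lower score"; evicting the lowest-scored entry is one possible reading.

```python
from dataclasses import dataclass

DECAY = 0.5  # assumed value; the claim only requires a constant between zero and one


@dataclass
class Entry:
    spoken_output: bytes            # placeholder for the cached spoken output
    score: float = 0.0
    accesses_since_update: int = 0  # hits recorded since the last score update


def update_score(entry: Entry, decay: float = DECAY) -> float:
    """New score = previous score * constant + accesses since the last update."""
    entry.score = entry.score * decay + entry.accesses_since_update
    entry.accesses_since_update = 0
    return entry.score


def insert_with_eviction(cache: dict, capacity: int, key: str, entry: Entry) -> None:
    """Store a new entry; if the cache is full, delete the lowest-scored one first."""
    if key not in cache and len(cache) >= capacity:
        victim = min(cache, key=lambda k: cache[k].score)
        del cache[victim]
    cache[key] = entry


cache = {
    "good morning": Entry(b"...", score=4.0, accesses_since_update=3),
    "goodbye": Entry(b"...", score=1.0, accesses_since_update=0),
}
for e in cache.values():
    update_score(e)  # good morning: 4.0 * 0.5 + 3; goodbye: 1.0 * 0.5 + 0

# The cache is full (capacity 2), so the lowest-scored entry is deleted.
insert_with_eviction(cache, capacity=2, key="hello", entry=Entry(b"...", score=1.0))
```

Because the previous score is multiplied by a constant below one at every update, old accesses count for less over time, so an entry that was popular long ago can still be evicted in favor of recently used ones.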
Abstract
In a text-to-speech system, a method of converting text-to-speech can include receiving a text input and comparing the received text input to at least one entry in a text-to-speech cache memory. Each entry in the text-to-speech cache memory can specify a corresponding spoken output. If the text input matches one of the entries in the text-to-speech cache memory, the cached speech output specified by the matching entry can be provided.
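The lookup described in the abstract can be sketched in Python as follows. This is a hedged illustration only: `TtsCache`, `normalize`, and `fake_engine` are hypothetical names, the normalization rule (lowercasing and collapsing whitespace) is an assumption, and the engine is a stand-in that returns placeholder bytes rather than real audio.

```python
import re


def normalize(text: str) -> str:
    """Hypothetical normalization: lowercase and collapse runs of whitespace."""
    return re.sub(r"\s+", " ", text.strip().lower())


class TtsCache:
    def __init__(self, engine):
        self._engine = engine  # callable str -> bytes, standing in for a TTS engine
        self._entries = {}     # normalized text -> cached spoken output
        self.hits = 0
        self.misses = 0

    def speak(self, text: str) -> bytes:
        key = normalize(text)
        if key in self._entries:
            # Match: provide the cached spoken output without calling the engine.
            self.hits += 1
            return self._entries[key]
        # No match: synthesize with the engine and store the result.
        self.misses += 1
        audio = self._engine(key)
        self._entries[key] = audio
        return audio


def fake_engine(text: str) -> bytes:
    """Stand-in for speech synthesis; returns tagged bytes instead of audio."""
    return ("AUDIO:" + text).encode("utf-8")


cache = TtsCache(fake_engine)
first = cache.speak("Hello,   World")  # miss: the engine is called
second = cache.speak("hello, world")   # hit: same normalized key
```

Comparing normalized text rather than raw input lets superficially different inputs share one cache entry, which is the reason for the normalization step in this sketch.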
29 Claims
7. A method of converting text-to-speech using a text-to-speech cache memory having a plurality of entries, wherein said entries comprise a processed form specifying a spoken output, wherein said processed form specifying spoken output does not comprise a digitally encoded audio file, said method comprising:
receiving a text input and a plurality of attributes associated with said text input, wherein said attributes specify stress, gender, grammar, speed, and volume for an audio rendering of said text input;
processing said text input to determine a form specifying a spoken output for said received text;
comparing said determined form of said text input with said entries in said text-to-speech cache memory;
assigning a score to each of said entries, wherein each score is an updatable score computed by multiplying a previous score times a constant between zero and one and adding a number equal to the number of times a corresponding entry has been accessed since a last updating of the score;
if said text input matches one of said entries in said text-to-speech cache memory, providing said processed form specified by said matching entry to a text-to-speech engine;
said text-to-speech engine converting said processed form to said spoken output and rendering said spoken output according to said plurality of attributes associated with said text input; and
generating a display of said text input wherein each word of said display is successively highlighted in coordination with an audible rendering of a word of said spoken output, coordination of said display and spoken output being based on call information stored in said cache memory.
Dependent claims: 8, 9, 10
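The display step recited in the claims, in which each word is highlighted in coordination with its audible rendering, can be sketched as below. The claims do not say what the stored call information contains; this sketch assumes it is a list of per-word start offsets in milliseconds. The names `word_index_at` and `render_highlight` and the timing values are hypothetical.

```python
from bisect import bisect_right


def word_index_at(start_times_ms, t_ms):
    """Index of the word being spoken at playback time t_ms (-1 before the first word)."""
    return bisect_right(start_times_ms, t_ms) - 1


def render_highlight(words, index):
    """Return the text with the word at `index` marked by square brackets."""
    return " ".join(f"[{w}]" if i == index else w for i, w in enumerate(words))


words = ["text", "to", "speech"]
start_times_ms = [0, 400, 650]  # assumed callback info stored with the cache entry

# Sample the display at three playback times.
frames = [
    render_highlight(words, word_index_at(start_times_ms, t))
    for t in (0, 450, 900)
]
```

Storing the timing data alongside the cached output means a cache hit can drive the highlighted display without running the engine again to recover word boundaries.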
11. A method of converting text-to-speech comprising:
storing a plurality of entries in a text-to-speech cache memory, wherein the text-to-speech cache memory is directly and locally coupled to at least one text-to-speech engine, wherein each said entry comprises a processed form specifying a spoken output, and wherein said text-to-speech cache memory contains a plurality of entries that specify spoken outputs, attributes for rendering spoken output, and callback information;
assigning a score to each one of said plurality of entries;
receiving a text input;
processing said text input to determine a form specifying a spoken output for said received text;
comparing said determined form of said text input with said entries in said text-to-speech cache memory;
when at least one of the plurality of entries in said text-to-speech cache memory is matched to said determined form, retrieving the processed form for the matching entry from the text-to-speech cache memory, and using the processed form to generate said spoken output based on said attributes;
when at least one of the plurality of entries in said text-to-speech cache memory is not matched to said determined form, using the at least one text-to-speech engine to generate said spoken output;
logging when one of said plurality of entries in said text-to-speech cache memory is matched to said received text input;
generating a display of said text input wherein each word of said display is successively highlighted in coordination with an audible rendering of a word of said spoken output, coordination of said display and spoken output being based on call information stored in said cache memory; and
periodically updating said score for each one of said plurality of entries of said text-to-speech cache memory, wherein an updated score is computed by multiplying a previous score times a constant between zero and one and adding a number equal to the number of times a corresponding entry has been accessed since a last updating of the score.
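The periodic score update recited in the claims can be written as a recurrence. The symbols are labels chosen here and do not appear in the patent: $s_k$ is an entry's score after the $k$-th update, $c$ is the constant between zero and one, and $a_k$ is the number of accesses logged between update $k$ and update $k+1$.

```latex
s_{k+1} = c\, s_k + a_k, \qquad 0 < c < 1

% Unrolling the recurrence shows that older accesses are discounted geometrically:
s_k = c^{k} s_0 + \sum_{j=0}^{k-1} c^{\,k-1-j}\, a_j

% If an entry is accessed a constant a times per period, the score converges to
\lim_{k \to \infty} s_k = \frac{a}{1 - c}
```

As a worked example with an assumed $c = 0.5$ and $a = 3$ accesses per period, the score approaches $3 / (1 - 0.5) = 6$, while an entry that is no longer accessed has its score halved at every update.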
12. A text-to-speech system comprising:
a text-to-speech engine for receiving text inputs and a plurality of attributes associated with said text and for producing a spoken output representative of said received text, wherein said attributes specify stress, gender, grammar, speed, and volume for an audio rendering of said text input; and
a text-to-speech cache memory for storing selected entries corresponding to received text inputs and a score assigned to each entry wherein said entries specify spoken outputs corresponding to said selected received text inputs,
wherein at least one processing interaction occurs between the speech-to-text engine and the text-to-speech cache memory when the text-to-speech engine uses the text-to-speech memory cache to generate the spoken output responsive to receiving text, said processing interactions comprising at least one interaction selected from the group consisting of a pre-processing interaction where the received text is processed into an intermediate form before being compared to entries of the text-to-speech cache that are stored in said intermediate form and a post-matching interaction where the specified spoken outputs retrieved from the text-to-speech cache memory are processed by the text-to-speech engine to generate the spoken output according to the associated attributes, and
wherein each score is an updatable score computed by multiplying a previous score times a constant between zero and one and adding a number equal to the number of times a corresponding entry has been accessed since a last updating of the score.
Dependent claims: 13, 14
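The pre-processing and post-matching interactions named in claim 12 can be sketched as two separate functions around the cache. Everything concrete here is an assumption for illustration: the abbreviation table, the token-tuple intermediate form, the `Attributes` fields, and the string that stands in for rendered speech. A real engine would likely cache something closer to a phoneme sequence.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Attributes:
    speed: float = 1.0
    volume: float = 1.0


ABBREVIATIONS = {"dr.": "doctor", "st.": "street"}  # hypothetical normalization table


def to_intermediate(text: str) -> tuple:
    """Pre-processing: split into tokens and expand abbreviations."""
    return tuple(ABBREVIATIONS.get(tok, tok) for tok in text.lower().split())


def render(processed_form: tuple, attrs: Attributes) -> str:
    """Post-matching: stand-in for the engine applying attributes to a cached form."""
    return f"{' '.join(processed_form)} @speed={attrs.speed} volume={attrs.volume}"


cache = {}                    # intermediate form -> processed form
stats = {"analysis_runs": 0}  # counts how often the (simulated) analysis ran


def speak(text: str, attrs: Attributes) -> str:
    key = to_intermediate(text)
    if key not in cache:
        stats["analysis_runs"] += 1
        cache[key] = key  # stand-in: the processed form is just the token tuple here
    return render(cache[key], attrs)


slow = speak("Dr. Smith", Attributes(speed=0.8))
fast = speak("dr.  smith", Attributes(speed=1.5))  # same entry, different attributes
```

Keeping attributes out of the cache key, as this sketch does, lets one cached entry serve requests with different speed or volume settings, since the attributes are applied only when the retrieved form is rendered.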
15. A machine-readable storage, having stored thereon a computer program having a plurality of code sections executable by a machine for causing the machine to perform the steps of:
receiving a text input and a plurality of attributes associated with said text input, wherein said attributes specify stress, gender, grammar, speed, and volume for an audio rendering of said text input;
generating processed input by parsing and normalizing said text input;
comparing said processed input to at least one entry in a text-to-speech cache memory, wherein said entry in said text-to-speech cache memory specifies a corresponding spoken output, wherein said text-to-speech cache memory contains a plurality of entries that specify spoken outputs, attributes for rendering spoken output, and a score corresponding to each entry, wherein each spoken output has an ordinal ranking and wherein each score is an updatable score computed by multiplying a previous score times a constant between zero and one and adding a number equal to the number of times a corresponding entry has been accessed since a last updating of the score;
if said processed input matches one of said entries in said text-to-speech cache memory, providing said spoken output specified by said matching entry and rendering said spoken output according to said plurality of attributes associated with said text input;
if said processed input fails to match one of said entries, generating an additional spoken output with a text-to-speech engine, generating an entry that specifies said additional spoken output, assigning an ordinal ranking to said additional spoken output, storing said additional spoken output and assigned ordinal ranking in said cache memory, and rendering said spoken output with the text-to-speech engine according to said plurality of attributes associated with said text input;
if the cache memory is full when said additional spoken output is generated, deleting from said cache memory a spoken output having a lower ordinal ranking; and
generating a display of said text input wherein each word of said display is successively highlighted in coordination with an audible rendering of a word of corresponding spoken output, coordination of said display and spoken output being based on call information stored in said cache memory.
Dependent claims: 16, 17, 18, 19, 20
21. A machine-readable storage, having stored thereon a computer program having a plurality of code sections executable by a machine for causing the machine to perform the steps of:
storing a plurality of entries in a text-to-speech cache memory, wherein each one of said entries comprises a processed form specifying a spoken output wherein said processed form specifying spoken output does not comprise a digitally encoded audio file;
assigning a score to each one of said plurality of entries, each score being an updatable score computed by multiplying a previous score times a constant between zero and one and adding a number equal to the number of times a corresponding entry has been accessed since a last updating of the score;
receiving a text input and a plurality of attributes associated with said text input, wherein said attributes specify stress, gender, grammar, speed, and volume for an audio rendering of said text input;
processing said text input to determine a form specifying a spoken output for said received text;
comparing said determined form of said text input with said entries in said text-to-speech cache memory;
if said text input matches one of said entries in said text-to-speech cache memory, providing said processed form specified by said matching entry to a text-to-speech engine;
said text-to-speech engine converting said processed form to said spoken output and rendering said spoken output according to said plurality of attributes associated with said text input; and
generating a display of said text input wherein each word of said display is successively highlighted in coordination with an audible rendering of a word of said spoken output, coordination of said display and spoken output being based on call information stored in said cache memory.
Dependent claims: 22, 23, 24, 25, 26, 27, 28
29. A machine-readable storage, having stored thereon a computer program having a plurality of code sections executable by a machine for causing the machine to perform the steps of:
storing a plurality of entries in a text-to-speech cache memory, wherein the text-to-speech cache memory is directly and locally coupled to at least one text-to-speech engine, wherein each said entry comprises a processed form specifying a spoken output, and wherein said text-to-speech cache memory contains a plurality of entries that specify spoken outputs, attributes for rendering spoken output, and callback information;
assigning a score to each one of said plurality of entries;
receiving a text input;
processing said text input to determine a form specifying a spoken output for said received text;
comparing said determined form of said text input with said entries in said text-to-speech cache memory;
when at least one of the plurality of entries in said text-to-speech cache memory is matched to said determined form, retrieving the processed form for the matching entry from the text-to-speech cache memory, and using the processed form to generate said spoken output based on said attributes;
when at least one of the plurality of entries in said text-to-speech cache memory is not matched to said determined form, using the at least one text-to-speech engine to generate said spoken output;
logging when one of said plurality of entries in said text-to-speech cache memory is matched to said received text input;
generating a display of said text input wherein each word of said display is successively highlighted in coordination with an audible rendering of a word of said spoken output, coordination of said display and spoken output being based on call information stored in said cache memory; and
periodically updating said score for each one of said plurality of entries of said text-to-speech cache memory, wherein an updated score is computed by multiplying a previous score times a constant between zero and one and adding a number equal to the number of times a corresponding entry has been accessed since a last updating of the score.
Specification