Compact text-to-phone pronunciation dictionary
Abstract
A typical English pronunciation dictionary takes up to 1,826,302 bytes to store in ASCII. Prefix delta encoding of the words and error encoding of the pronunciations achieve a fivefold compression while keeping the dictionary usable without expansion.
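As an illustration of prefix delta encoding of the words, the following is a minimal sketch and not the patented implementation. It assumes that each sorted word is stored as a pair (number of leading characters shared with the previous word, remaining characters), and that the shared prefix is used only when it reaches a threshold. The names `encode_words`, `decode_words`, and `min_prefix` (standing in for the threshold N of the claims) are hypothetical.

```python
def common_prefix_len(a, b):
    """Count the leading characters that a and b share."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def encode_words(words, min_prefix=2):
    """Prefix-delta encode a word list (assumed scheme, see lead-in).

    Each entry is (shared_prefix_length, suffix). When the shared prefix
    is shorter than min_prefix, the whole word is stored with length 0.
    """
    encoded = []
    prev = ""
    for word in sorted(words):
        k = common_prefix_len(word, prev)
        if k >= min_prefix:
            encoded.append((k, word[k:]))
        else:
            encoded.append((0, word))
        prev = word
    return encoded


def decode_words(encoded):
    """Rebuild the sorted word list one entry at a time."""
    words = []
    prev = ""
    for k, suffix in encoded:
        word = prev[:k] + suffix
        words.append(word)
        prev = word
    return words


if __name__ == "__main__":
    enc = encode_words(["cater", "cat", "dog", "catalog"])
    print(enc)
    print(decode_words(enc))
```

Decoding walks the entries in order and needs only the previous word, so in this sketch the dictionary can be scanned without first being expanded in full.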
18 Claims
1. A processor for creating a reduced size encoded pronunciation dictionary from an input pronunciation dictionary such that the encoded pronunciation dictionary does not need to be expanded to a larger size in order to be utilized, comprising:
a reading and sorting processor to read an input pronunciation dictionary and sort the words of the dictionary in alphabetical order;
a word encoder that encodes each word of the pronunciation dictionary by comparing the word with the prior word encoded and either outputs the number of prefix characters that match both the word and the prior word beginning characters if the number of matching characters is greater than or equal to N followed by the suffix characters of the word after character N, or outputs all characters of the word if the number of prefix characters that match the word and prior encoded word is less than N;
a text-to-pronunciation processor that operates on the word to be encoded to generate a pronunciation hypothesis;
a pronunciation comparer that compares the pronunciation of each word by comparing the pronunciation of the word from the input pronunciation dictionary and the pronunciation hypothesis from the text-to-pronunciation processor and determines the minimum number of pronunciation differences consisting of substitutions, deletions and insertions that need to be corrected in the pronunciation hypothesis to convert it to match the pronunciation of the input pronunciation dictionary; and
a pronunciation encoder that compares the pronunciation differences of the word to the pronunciation differences of the prior encoded word and either outputs the number of prefix differences that match the beginning of both the word differences and the prior word differences followed by the suffix differences of the word if the number of prefix matching differences is greater than or equal to M, or outputs all differences to the word if the number of prefix differences that match the word and prior encoded word is less than N.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9
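The pronunciation comparer recited in claim 1 finds the minimum number of substitutions, deletions and insertions that turn the hypothesis into the dictionary pronunciation. One common way to obtain such a minimal set is a Levenshtein-style dynamic program with a backtrace; the sketch below uses that technique as an assumption, since the claim does not name an algorithm. The function names, the `("sub" | "del" | "ins", position, phone)` tuple layout, and the phone symbols in the usage lines are illustrative only.

```python
def pronunciation_differences(hypothesis, reference):
    """Return a minimal edit list that converts hypothesis into reference.

    Both arguments are sequences of phone symbols. Positions in the
    returned edits index into the hypothesis.
    """
    m, n = len(hypothesis), len(reference)
    # cost[i][j] = fewest edits turning hypothesis[:i] into reference[:j]
    cost = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        cost[i][0] = i
    for j in range(1, n + 1):
        cost[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            mismatch = hypothesis[i - 1] != reference[j - 1]
            cost[i][j] = min(
                cost[i - 1][j - 1] + mismatch,  # match or substitution
                cost[i - 1][j] + 1,             # deletion
                cost[i][j - 1] + 1,             # insertion
            )
    # Walk back from the end of both sequences to recover the edits.
    edits = []
    i, j = m, n
    while i > 0 or j > 0:
        mismatch = i > 0 and j > 0 and hypothesis[i - 1] != reference[j - 1]
        if i > 0 and j > 0 and cost[i][j] == cost[i - 1][j - 1] + mismatch:
            if mismatch:
                edits.append(("sub", i - 1, reference[j - 1]))
            i -= 1
            j -= 1
        elif i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            edits.append(("del", i - 1, None))
            i -= 1
        else:
            edits.append(("ins", i, reference[j - 1]))
            j -= 1
    edits.reverse()
    return edits


def apply_differences(hypothesis, edits):
    """Correct a hypothesis using the edits from pronunciation_differences."""
    out = list(hypothesis)
    # Apply right to left so that earlier positions stay valid.
    for op, pos, phone in reversed(edits):
        if op == "sub":
            out[pos] = phone
        elif op == "del":
            del out[pos]
        else:
            out.insert(pos, phone)
    return out


if __name__ == "__main__":
    hyp = ["T", "AH", "M", "EY", "T", "OW"]   # illustrative hypothesis
    ref = ["T", "AH", "M", "AA", "T", "OW"]   # illustrative dictionary entry
    diffs = pronunciation_differences(hyp, ref)
    print(diffs)
    print(apply_differences(hyp, diffs) == ref)
```

When the text-to-pronunciation hypothesis is already correct the edit list is empty, which is why storing only the differences can be much smaller than storing the full pronunciation.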
10. A method for creating a reduced size encoded pronunciation dictionary from an input pronunciation dictionary such that the encoded pronunciation dictionary does not need to be expanded to a larger size in order to be utilized, comprising the steps of:
reading an input pronunciation dictionary and sorting the words of the dictionary in alphabetical order using a processor;
encoding each word of the pronunciation dictionary by comparing the word with the prior word encoded and either outputting the number of prefix characters that match both the word and the prior word beginning characters if the number of matching characters is greater than or equal to N followed by the suffix characters of the word after character N, or outputting all characters of the word if the number of prefix characters that match the word and prior encoded word is less than N;
operating on the word to be encoded by a text-to-pronunciation processor to generate a pronunciation hypothesis;
comparing the pronunciation of each word using a pronunciation comparer by comparing the pronunciation of the word from the input pronunciation dictionary and the pronunciation hypothesis from the text-to-pronunciation processor and determining the minimum number of pronunciation differences consisting of substitutions, deletions and insertions that need to be corrected in the pronunciation hypothesis to convert it to match the pronunciation of the input pronunciation dictionary; and
comparing the pronunciation differences of the word to the pronunciation differences of the prior encoded word by a pronunciation encoder and either outputting the number of prefix differences that match the beginning of both the word differences and the prior word differences followed by the suffix differences of the word if the number of prefix matching differences is greater than or equal to M, or outputting all differences to the word if the number of prefix differences that match the word and prior encoded word is less than N.
Dependent claims: 11, 12, 13, 14, 15, 16, 17, 18
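The final step of claim 10 applies the same prefix idea to the lists of pronunciation differences: a word's differences are compared with those of the previously encoded word, and only the non-shared tail is stored when enough leading differences match. The sketch below is one possible reading, not the patented implementation. It assumes a single threshold `min_match` (standing in for M; the claim's closing clause refers to N, and this sketch does not try to resolve that) and represents each difference as an arbitrary hashable tuple. All names are hypothetical.

```python
def encode_differences(diff_lists, min_match=1):
    """Prefix-encode each difference list against the previous one.

    diff_lists holds one list of differences per word, in dictionary
    order. Each output entry is (shared_count, remaining_differences).
    """
    encoded = []
    prev = []
    for diffs in diff_lists:
        # Count the leading differences shared with the previous word.
        k = 0
        while k < len(diffs) and k < len(prev) and diffs[k] == prev[k]:
            k += 1
        if k >= min_match:
            encoded.append((k, list(diffs[k:])))
        else:
            encoded.append((0, list(diffs)))
        prev = diffs
    return encoded


def decode_differences(encoded):
    """Rebuild the per-word difference lists in order."""
    decoded = []
    prev = []
    for k, tail in encoded:
        diffs = prev[:k] + tail
        decoded.append(diffs)
        prev = diffs
    return decoded


if __name__ == "__main__":
    # Illustrative difference lists for three consecutive words.
    lists = [
        [("sub", 1, "AH")],
        [("sub", 1, "AH"), ("del", 4, None)],
        [],
    ]
    enc = encode_differences(lists)
    print(enc)
    print(decode_differences(enc) == lists)
```

Alphabetically adjacent words often share a stem, so under this reading their hypotheses tend to need the same leading corrections, which is what makes the shared-prefix count worthwhile.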
Specification