Hybrid baseform generation
First Claim
1. A method for generating baseforms or phonetic spellings from input text, said method comprising:
generating baseforms from said input text using said defined rules for a language;
identifying specific phones, in said language, that are exceptions to said defined rules;
associating actions with said specific phones, wherein each specific phone is associated with only one action that is defined to correct any of said baseforms that contain said specific phone;
identifying each of said baseforms that contain any of said specific phones;
for each of said baseforms that contain said specific phones, applying a statistical technique to determine whether said specific phones can be modified, wherein said statistical technique is applied solely to baseforms containing said specific phones that are exceptions to said defined rules; and
automatically correcting said baseforms containing said specific phones that can be modified by performing said actions that were associated with said specific phones.
Abstract
A method, a computer system and a computer program product for generating baseforms or phonetic spellings from input text are disclosed. The baseforms are initially generated using rules defined for a particular language. Then, phones in the language that are exceptions to the defined rules are identified, and an action is associated with each identified phone. A statistical technique is applied to determine whether the identified phones can be modified. Finally, baseforms containing the identified phones that can be modified are corrected according to the associated actions. Preferably, the statistical technique is applied only to baseforms containing phones that are exceptions to the defined rules. The defined rules can comprise spelling-to-sound rules for a particular phonetic language that incorporate all possible alternative pronunciations of each baseform.
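The flow described in the abstract can be illustrated with a minimal Python sketch. Everything concrete in it is an assumption made for illustration and is not taken from the patent: the toy letter-to-phone table, the phone symbols, the single exception phone "EH" with its "delete" action, the hand-labeled training examples, and the statistical technique itself (a relative-frequency estimate conditioned on the following phone, with a 0.5 threshold). Unlike the rules described above, the toy rules produce a single pronunciation per word rather than all alternatives.

```python
from collections import Counter
from typing import Callable, Dict, List, Tuple

# Hypothetical spelling-to-sound rules for a toy language: one phone per letter.
LETTER_TO_PHONE: Dict[str, str] = {"k": "K", "a": "AA", "t": "T", "e": "EH", "n": "N"}

# Hypothetical exception phones, each associated with exactly one corrective action.
# An action maps the offending phone to its replacement sequence (empty = delete).
ACTIONS: Dict[str, Callable[[str], List[str]]] = {"EH": lambda phone: []}

# Hypothetical labeled data: (phone, following phone or "#" at word end, was it modified?).
TRAINING: List[Tuple[str, str, bool]] = [
    ("EH", "#", True), ("EH", "#", True), ("EH", "#", False), ("EH", "N", False),
]


def rule_baseform(word: str) -> List[str]:
    """Step 1: generate a baseform from text using the defined rules only."""
    return [LETTER_TO_PHONE[ch] for ch in word.lower() if ch in LETTER_TO_PHONE]


def train_model(examples: List[Tuple[str, str, bool]]) -> Dict[Tuple[str, str], float]:
    """Estimate P(modify | phone, next phone) by relative frequency (assumed technique)."""
    seen: Counter = Counter()
    modified: Counter = Counter()
    for phone, nxt, was_modified in examples:
        seen[(phone, nxt)] += 1
        if was_modified:
            modified[(phone, nxt)] += 1
    return {key: modified[key] / seen[key] for key in seen}


def correct(baseform: List[str], model: Dict[Tuple[str, str], float],
            threshold: float = 0.5) -> List[str]:
    """Steps 2-4: apply the statistical check solely to baseforms that contain
    an exception phone, then perform that phone's single associated action."""
    if not any(phone in ACTIONS for phone in baseform):
        return list(baseform)  # no exception phone: the statistical step is skipped
    corrected: List[str] = []
    for i, phone in enumerate(baseform):
        nxt = baseform[i + 1] if i + 1 < len(baseform) else "#"
        if phone in ACTIONS and model.get((phone, nxt), 0.0) > threshold:
            corrected.extend(ACTIONS[phone](phone))
        else:
            corrected.append(phone)
    return corrected


MODEL = train_model(TRAINING)
```

With these assumed tables, correct(rule_baseform("kate"), MODEL) drops the word-final "EH", while "ten" keeps its "EH" because the estimated probability for that context does not exceed the threshold. A word such as "tan" contains no exception phone, so it never reaches the statistical step.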
29 Claims
1. A method for generating baseforms or phonetic spellings from input text, said method comprising:
generating baseforms from said input text using said defined rules for a language;
identifying specific phones, in said language, that are exceptions to said defined rules;
associating actions with said specific phones, wherein each specific phone is associated with only one action that is defined to correct any of said baseforms that contain said specific phone;
identifying each of said baseforms that contain any of said specific phones;
for each of said baseforms that contain said specific phones, applying a statistical technique to determine whether said specific phones can be modified, wherein said statistical technique is applied solely to baseforms containing said specific phones that are exceptions to said defined rules; and
automatically correcting said baseforms containing said specific phones that can be modified by performing said actions that were associated with said specific phones.
Dependent claims: 2, 3, 4, 5, 6, 7, 8
9. A computer system for generating baseforms or phonetic spellings from input text, said computer system comprising:
a processor means for defining rules for generating a pronunciation dictionary for a particular language;
a processor means for generating baseforms from said input text using said defined rules;
a processor means for identifying specific phones, in said language, that are exceptions to said defined rules;
a processor means for associating actions with said specific phones, wherein each specific phone is associated with only one action that is defined to correct any baseforms that contain said specific phone;
a processor means for identifying each of said baseforms that contain any of said specific phones;
a processor means for applying a statistical technique to each of said baseforms that contain said specific phones in order to determine whether said specific phones can be modified, wherein said statistical technique is applied solely to baseforms containing said specific phones that are exceptions to said defined rules; and
a processor means for automatically correcting said baseforms containing said specific phones that can be modified by performing said actions that were associated with said specific phone.
Dependent claims: 10, 11, 12, 13, 14, 15, 16
17. A computer program product having a computer readable medium having a computer program recorded therein for generating baseforms or phonetic spellings from input text, said computer program product comprising:
a computer program code means for defining rules for generating a pronunciation dictionary for a particular language;
a computer program code means for generating baseforms from said input text using said defined rules;
a computer program code means for identifying specific phones, in said language, that are exceptions to said defined rules;
a computer program code means for associating actions with said specific phones, wherein each specific phone is associated with only one action that is defined to correct any baseforms that contain said specific phone;
a computer program code means for identifying each of said baseforms that contain any of said specific phones;
a computer program code means for applying a statistical technique to each of said baseforms that contain said specific phones in order to determine whether said specific phones can be modified, wherein said statistical technique is applied solely to baseforms containing said specific phones that are exceptions to said defined rules; and
a computer program code means for automatically correcting said baseforms containing said specific phones that can be modified by performing said actions that were associated with said specific phones.
Dependent claims: 18, 19, 20, 21, 22, 23, 24
25. A computer program product having a computer readable medium having a computer program recorded therein for generating baseforms or phonetic spellings from input text, said computer program product including:
a pronunciation dictionary for a particular language;
a list of actions associated with specific phones that are exceptions to rules defined for said language, wherein each specific phone is associated with only one action that is defined to correct any baseforms that contain said specific phone;
a computer program code means for generating baseforms from said input text according to said defined rules;
a computer program code means for identifying baseforms that contain any of said specific phones that are exceptions to said defined rules;
a computer program code means for applying a statistical technique to each of said baseforms that contain said specific phones in order to determine whether said specific phones can be modified, wherein said statistical technique is applied solely to baseforms containing said specific phones that are exceptions to said defined rules; and
computer program code means for automatically correcting said baseforms containing said specific phones that can be modified by performing said actions that are associated with said specific phones.
Dependent claims: 26, 27, 28, 29
Specification