Method and System for Text Message Normalization Based on Character Transformation and Web Data
Abstract
A method for generating non-standard tokens that correspond to standard tokens used in speech synthesis systems has been developed. The method includes selecting a standard token from a plurality of standard tokens stored in memory, using a random field model to select a predetermined operation to perform on each character in the selected token, performing the selected operation on each character to generate an output token, and storing the output token in the memory in association with the selected token. The output token is different from each token in the plurality of standard tokens.
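The flow described in the abstract can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration: the operation set (`KEEP`, `DELETE`), the function names, and above all `choose_operation`, which is a hand-written rule standing in for the random field model. The source does not state which operations are predetermined or how the model is parameterized.

```python
from typing import Dict, List, Optional, Set

# Hypothetical operation set; the source only says "predetermined operations".
KEEP, DELETE = "keep", "delete"


def choose_operation(token: str, i: int) -> str:
    """Stand-in for the random field model (an assumption, not the patented model).

    Rule used here: delete any vowel that is not the first character.
    """
    if i > 0 and token[i] in "aeiou":
        return DELETE
    return KEEP


def apply_operation(ch: str, op: str) -> str:
    """Perform the selected operation on one input character."""
    return "" if op == DELETE else ch


def generate_non_standard(token: str, standard_tokens: Set[str]) -> Optional[str]:
    """Select one operation per character, apply it, and build the output token."""
    ops = [choose_operation(token, i) for i in range(len(token))]
    output = "".join(apply_operation(ch, op) for ch, op in zip(token, ops))
    # The output token must differ from every standard token.
    if not output or output in standard_tokens:
        return None
    return output


standard_tokens = {"tomorrow", "great", "message"}
associations: Dict[str, List[str]] = {}  # plays the role of "the memory"
for tok in sorted(standard_tokens):
    out = generate_non_standard(tok, standard_tokens)
    if out is not None:
        # Store the output token in association with the selected token.
        associations.setdefault(tok, []).append(out)
```

With this toy rule, `associations` maps `"tomorrow"` to `["tmrrw"]`. A trained model would instead score candidate operations per character, so a single standard token could yield several plausible non-standard variants.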
20 Claims
1. A method for generating non-standard tokens from a standard token stored in a memory comprising:
selecting a standard token from a plurality of standard tokens stored in the memory, the selected token having a plurality of input characters;
selecting an operation from a plurality of predetermined operations in accordance with a random field model for each input character in the plurality of input characters;
performing the selected operation on each input character to generate an output token that is different from each token in the plurality of standard tokens; and
storing the output token in the memory in association with the selected token.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9
10. A method for generating operational parameters for use in a random field model comprising:
comparing each token in a first plurality of tokens stored in a memory to a plurality of standard tokens stored in the memory;
identifying a first token in the first plurality of tokens as a non-standard token in response to the first token being different from each standard token in the plurality of standard tokens;
identifying a second token in the first plurality of tokens as a context token in response to the second token providing contextual information for the first token;
generating a database query including the first token and the second token;
querying a database with the generated query;
identifying a result token corresponding to the first token from a result obtained from the database; and
storing the result token in association with the first token in a memory.
Dependent claims: 11, 12, 13
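The first four steps of claim 10 can be sketched as follows, under stated assumptions. The claim does not say how a context token is chosen or what the query looks like, so the nearest-standard-neighbor rule, the quoted-phrase query string, and the function names below are all hypothetical. The database itself is not modeled; the sketch stops once the query has been generated.

```python
from typing import List, Set, Tuple


def find_non_standard(tokens: List[str], standard_tokens: Set[str]) -> List[int]:
    """Return indices of tokens that differ from every standard token."""
    return [i for i, t in enumerate(tokens) if t.lower() not in standard_tokens]


def build_queries(tokens: List[str], standard_tokens: Set[str]) -> List[Tuple[str, str]]:
    """Pair each non-standard token with a query that includes a context token."""
    queries = []
    for i in find_non_standard(tokens, standard_tokens):
        context = None
        # Assumption: an adjacent standard token supplies the context.
        for j in (i - 1, i + 1):
            if 0 <= j < len(tokens) and tokens[j].lower() in standard_tokens:
                context = tokens[j]
                break
        if context is not None:
            # Assumption: the query is both tokens as quoted search terms.
            queries.append((tokens[i], f'"{tokens[i]}" "{context}"'))
    return queries


standard_tokens = {"see", "you", "tomorrow"}
message = ["see", "you", "tmrw"]
queries = build_queries(message, standard_tokens)
```

For this message, `queries` holds one pair: the non-standard token `tmrw` and the query `"tmrw" "you"`. In the claimed method, a result token identified from the database response would then be stored in association with `tmrw`.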
14. A system for generating non-standard tokens from standard tokens comprising:
a memory, the memory storing a plurality of standard tokens and a plurality of operational parameters for a random field model; and
a processing module operatively connected to the memory, the processing module being configured to:
obtain the operational parameters for the random field model from the memory;
generate the random field model from the operational parameters;
select a standard token from the plurality of standard tokens in the memory, the selected standard token having a plurality of input characters;
select an operation from a plurality of predetermined operations in accordance with the random field model for each input character in the plurality of input characters for the selected standard token;
perform the selected operation on each input character in the selected standard token to generate an output token that is different from each standard token in the plurality of standard tokens; and
store the output token in the memory in association with the selected standard token.
Dependent claims: 15, 16, 17, 18, 19, 20
Specification