Writing assistance using machine translation techniques
First Claim
1. A computer-implemented method for providing writing assistance to a user, the method comprising:
- automatically generating a mapping between two strings of text that are in a same language, wherein automatically generating comprises automated generation by a computing device based on statistical or heuristic analysis of the two strings of text relative to a collection of training data, wherein each of the two strings of text is a phrase consisting of more than one word, and wherein analysis of the two strings of text comprises analysis of the two strings of text relative to a collection of native speaker data, the collection of native speaker data comprising pre- and post-editing text samples, pairs of well-formed and deliberately lesioned sentences, aligned sentence pairs collected from paraphrase data, and hand-created correction examples;
providing, based at least in part on the mapping, an indication of a style transformation that could be applied to modify at least a portion of input received from the user, wherein the input received from the user comprises words that are completely in the same language as the two strings of text;
wherein providing an indication of a style transformation comprises providing a list of suggested substitutions that each represent a potential shift from a first style in a first domain to a second style in a second domain;
wherein providing a list comprises providing a list ranked so as to reflect analysis relative to a language model trained based on a collection of well-formed data;
wherein each of the suggested substitutions comprises a string of words that is presented completely in the same language as the two strings of text and as the input received from the user; and
wherein providing an indication of a style transformation comprises evaluating at least one candidate style transformation relative to a target language model.
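The steps recited in claim 1 can be illustrated with a minimal sketch: a same-language phrase mapping proposes substitutions, and a language model trained on well-formed text ranks them. Everything named here (`PHRASE_TABLE`, `WELL_FORMED`, `train_bigram_lm`, `suggest`) is hypothetical, and the add-one-smoothed bigram model is only a stand-in for whatever language model an actual implementation would use.

```python
import math
from collections import Counter

# Hypothetical same-language phrase mapping (informal -> formal style).
# In the claim this mapping is learned from training data; here it is hard-coded.
PHRASE_TABLE = {
    "a lot of": ["many", "a great deal of"],
    "find out": ["determine", "discover"],
}

# Hypothetical collection of well-formed data used to train the language model.
WELL_FORMED = [
    "we need to determine the cause",
    "we must determine the cause of the error",
    "many users determine the cause quickly",
]

def train_bigram_lm(sentences):
    """Return a function scoring a sentence by add-one-smoothed bigram log-probability."""
    unigrams, bigrams = Counter(), Counter()
    for s in sentences:
        tokens = ["<s>"] + s.split() + ["</s>"]
        unigrams.update(tokens[:-1])             # counts of context tokens
        bigrams.update(zip(tokens, tokens[1:]))  # counts of adjacent token pairs
    vocab = len(set(unigrams) | {"</s>"})

    def logprob(sentence):
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        return sum(
            math.log((bigrams[(a, b)] + 1) / (unigrams[a] + vocab))
            for a, b in zip(tokens, tokens[1:])
        )

    return logprob

def suggest(text, phrase_table, logprob):
    """List rewritten versions of `text`, best-scoring under the language model first."""
    padded = f" {text} "  # padding keeps matches on whole-word boundaries
    candidates = []
    for source, targets in phrase_table.items():
        key = f" {source} "
        if key in padded:
            for target in targets:
                candidates.append(padded.replace(key, f" {target} ", 1).strip())
    return sorted(candidates, key=logprob, reverse=True)
```

Calling `suggest("we need to find out the cause", PHRASE_TABLE, train_bigram_lm(WELL_FORMED))` returns both rewrites, with the one containing "determine" ranked first because that wording is attested in the well-formed sample.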
2 Assignments
0 Petitions
Abstract
A system is configured to provide writing assistance within a monolingual input environment based on statistical machine translation techniques typically utilized to translate from an input language to a different target language.
108 Citations
18 Claims
1. A computer-implemented method for providing writing assistance to a user, as recited in full under First Claim.
- View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11)
12. A computer-implemented writing assistance system comprising:
an alignment component that automatically generates a model of a style transformation based on a comparison of two strings of text that are in a same language, each of the two strings of text being a phrase comprising more than one word, wherein automatically generating comprises automated generation by a computing processor based on statistical or heuristic analysis of the two strings of text relative to a collection of training data also in the same language, wherein automatically generating comprises automatically generating without processing text in a language other than the same language as the two strings of text, wherein automatically generating comprises automatically generating based on a statistical or heuristic alignment of the two strings, said statistical or heuristic alignment being alignment of more than one word phrases, and wherein the analysis of the two string of text relative to the collection of training data comprises analysis of the two strings of text relative to a collection of non-native speaker data, the collection of non-native speaker data comprising pre- and post-editing text samples of typical non-native speaker errors, pairs of well-formed sentences and sentences deliberately lesioned sentences to replicate non-native speaker patterns, aligned sentence pairs lesioned with content words in the native language, and aligned sentence pairs where one member of the pair is low-quality machine translation; and
a decoder component configured to process an input string that includes words or phrases and to generate, based at least in part on information received from the alignment component, a list of proposed substitutions that potentially could be applied to transform the style of at least a portion of the input string into a well-formed string, wherein the list of proposed substitutions comprises a list ranked so as to reflect analysis relative to a language model trained based on a collection of well-formed data.
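As a rough illustration of the alignment component in claim 12, the sketch below extracts phrase-level substitutions from a pre-edit and post-edit sentence pair in the same language. It assumes Python's `difflib.SequenceMatcher` as a simple heuristic aligner; the claim does not name any particular alignment method, and `extract_phrase_pairs` is a hypothetical helper.

```python
from difflib import SequenceMatcher

def extract_phrase_pairs(before, after):
    """Return (original phrase, edited phrase) pairs from two same-language sentences.

    Only spans that were replaced are reported; pure insertions and deletions
    are ignored in this simplified sketch.
    """
    a, b = before.split(), after.split()
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    pairs = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "replace":
            # Join the differing token spans back into phrases.
            pairs.append((" ".join(a[i1:i2]), " ".join(b[j1:j2])))
    return pairs
```

For example, `extract_phrase_pairs("please find out the reason", "please determine the reason")` yields the single pair `("find out", "determine")`. Pairs collected this way over a large set of edited texts could feed a phrase mapping of the kind the decoder component consumes.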
- View Dependent Claims (13, 14)
15. A computer-implemented method for providing writing assistance to a user, the method comprising providing writing assistance, using a processor, based on an automated statistical comparison of a collection of source texts reflecting a first style in a first domain relative to associated target texts reflecting a second style in a second domain, the source texts being in a same language as the target texts,
wherein the automated statistical comparison is a monolingual process in that only data in the same language is processed without reference to any corresponding data in a different language,
wherein the automated statistical comparison includes a statistical alignment of phrases in the same language, each of the phrases comprising more than one word,
wherein the source texts and the target texts contain no grammatical or spelling errors, and
wherein the automated statistical comparison of the source texts and the target texts comprises analysis of the source texts and the target texts relative to a collection of non-native speaker data, the collection of non-native speaker data comprising pre- and post-editing text samples of typical non-native speaker errors, pairs of well-formed sentences and sentences deliberately lesioned sentences to replicate non-native speaker patterns, aligned sentence pairs lesioned with content words in the native language, and aligned sentence pairs where one member of the pair is low-quality machine translation.
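The "deliberately lesioned" sentence pairs named in the claims can be pictured with a small sketch that damages well-formed sentences in a controlled way and pairs each damaged version with its original. Dropping articles is an assumed example of a lesion, chosen here only for illustration; the claims do not specify which errors are introduced, and both function names are hypothetical.

```python
ARTICLES = {"a", "an", "the"}

def lesion_drop_articles(sentence):
    """Remove English articles from a sentence (an assumed example lesion)."""
    return " ".join(w for w in sentence.split() if w.lower() not in ARTICLES)

def make_training_pairs(well_formed):
    """Pair each lesioned sentence with its well-formed original.

    Sentences that the lesion leaves unchanged are skipped, since they
    carry no correction signal.
    """
    pairs = []
    for sentence in well_formed:
        normalized = " ".join(sentence.split())  # collapse stray whitespace
        damaged = lesion_drop_articles(normalized)
        if damaged != normalized:
            pairs.append((damaged, normalized))
    return pairs
```

Applied to `["She bought a book at the store", "Birds sing"]`, this produces one pair, `("She bought book at store", "She bought a book at the store")`; the second sentence contains no article and is skipped.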
Specification