Methods and Systems for Automated Text Correction
Abstract
The present embodiments demonstrate systems and methods for automated text correction. In certain embodiments, the methods and systems may be implemented through analysis according to a single text correction model. In a particular embodiment, the single text correction model may be generated through analysis of both a corpus of learner text and a corpus of non-learner text.
74 Claims
1. An apparatus, comprising:
at least one processor and a memory device coupled to the at least one processor, in which the at least one processor is configured:

to identify words of an input utterance;

to place the words in a plurality of first nodes stored in the memory device;

to assign a word-layer tag to each of the plurality of first nodes based, in part, on neighboring nodes of the plurality of first nodes; and

to generate an output sentence by combining words from the plurality of first nodes with punctuation marks selected based, in part, on the word-layer tags assigned to each of the first nodes.

Dependent claims: 2, 3, 4, 5.
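The pipeline claim 1 recites can be sketched in code. This is a minimal illustration only, assuming a hypothetical tag set (`SENT_END`, `CLAUSE_END`, `NONE`) and toy neighbor-based rules; the patent's actual tagger and tag inventory are not specified here.

```python
# Minimal sketch of the claimed pipeline (hypothetical tag set and rules,
# not the patent's actual classifier): words are placed in nodes, each node
# gets a word-layer tag based on its neighbors, and punctuation is chosen
# per tag when the output sentence is generated.
from dataclasses import dataclass

@dataclass
class Node:
    word: str
    tag: str = "NONE"   # word-layer tag, assigned later

def tag_nodes(nodes):
    """Assign a tag to each node based, in part, on neighboring nodes."""
    for i, node in enumerate(nodes):
        nxt = nodes[i + 1].word if i + 1 < len(nodes) else None
        if nxt is None:
            node.tag = "SENT_END"      # last word ends the sentence
        elif nxt.lower() in {"but", "so", "however"}:
            node.tag = "CLAUSE_END"    # a conjunction follows this word
        else:
            node.tag = "NONE"

PUNCT = {"SENT_END": ".", "CLAUSE_END": ",", "NONE": ""}

def generate_sentence(words):
    nodes = [Node(w) for w in words]   # place words in first nodes
    tag_nodes(nodes)
    out = " ".join(n.word + PUNCT[n.tag] for n in nodes)
    return out[0].upper() + out[1:]

print(generate_sentence("it was late but we kept working".split()))
# → It was late, but we kept working.
```

The rules here stand in for whatever statistical model assigns the word-layer tags; only the node/tag/punctuation data flow mirrors the claim.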
6-7. (canceled)
8. A computer program product, comprising:
a non-transitory computer-readable medium comprising:

code to identify words of an input utterance;

code to place the words in a plurality of first nodes stored in a memory device;

code to assign a word-layer tag to each of the plurality of first nodes based, in part, on neighboring nodes of the plurality of first nodes; and

code to generate an output sentence by combining words from the plurality of first nodes with punctuation marks selected based, in part, on the word-layer tags assigned to each of the first nodes.

Dependent claims: 9, 10, 11, 12.
13-33. (canceled)
34. An apparatus, comprising:
at least one processor and a memory device coupled to the at least one processor, in which the at least one processor is configured:

to receive a natural language text input, the text input comprising a grammatical error in which a portion of the input text comprises a class from a set of classes;

to generate a plurality of selection tasks from a corpus of non-learner text that is assumed to be free of grammatical errors, wherein for each selection task a classifier re-predicts a class used in the non-learner text;

to generate a plurality of correction tasks from a corpus of learner text, wherein for each correction task a classifier proposes a class used in the learner text;

to train a grammar correction model using a set of binary classification problems that include the plurality of selection tasks and the plurality of correction tasks; and

to use the trained grammar correction model to predict a class for the text input from the set of classes.

Dependent claims: 35, 36, 37, 39, 41, 42, 43.
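The training setup claim 34 describes can be illustrated with toy data. Everything below is invented for illustration (the two-article confusion set, the context features, the plain perceptron); the claim only requires that selection tasks re-predict classes found in non-learner text, that correction tasks pair a learner's observed class with a proposed class, and that the model is trained on binary classification problems built from both.

```python
# Hedged sketch of the claimed training scheme (toy data, hypothetical
# features): one binary classifier per class in a confusion set, trained
# on selection tasks from non-learner text and correction tasks from
# learner text, here with a plain perceptron over context-word features.
from collections import defaultdict

CLASSES = ["a", "the"]  # toy confusion set of article classes

# (context words, correct class) selection tasks from non-learner text
selection_tasks = [
    (["saw", "movie"], "a"),
    (["saw", "sun"], "the"),
    (["bought", "car"], "a"),
    (["opened", "door"], "the"),
]
# (context words, observed class, corrected class) from learner text
correction_tasks = [
    (["saw", "sun"], "a", "the"),
    (["opened", "door"], "a", "the"),
]

def train(selection, correction, epochs=10):
    """One binary perceptron per class: +1 iff that class is correct here."""
    weights = {c: defaultdict(float) for c in CLASSES}
    examples = [(ctx, gold) for ctx, gold in selection]
    # correction tasks also see the learner's observed class as a feature
    examples += [(ctx + ["obs=" + obs], gold) for ctx, obs, gold in correction]
    for _ in range(epochs):
        for ctx, gold in examples:
            for c in CLASSES:
                score = sum(weights[c][f] for f in ctx)
                label = 1 if c == gold else -1
                if label * score <= 0:          # misclassified: update
                    for f in ctx:
                        weights[c][f] += label
    return weights

def predict(weights, ctx):
    """Predict a class for new input from the set of classes."""
    return max(CLASSES, key=lambda c: sum(weights[c][f] for f in ctx))

w = train(selection_tasks, correction_tasks)
print(predict(w, ["saw", "sun"]))    # → the
print(predict(w, ["saw", "movie"]))  # → a
```

The perceptron is a stand-in; any binary classifier fits the claim's framing of selection and correction tasks as a set of binary classification problems.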
38. (canceled)

40. (canceled)

44-55. (canceled)
56. An apparatus, comprising at least one processor and a memory device coupled to the at least one processor, in which the at least one processor is configured to correct semantic collocation errors by performing the steps of:
automatically identifying one or more translation candidates in response to analysis of a corpus of parallel-language text conducted in a processing device;

determining, using the processing device, a feature associated with each translation candidate;

generating a set of one or more weight values from a corpus of learner text stored in a data storage device; and

calculating, using the processing device, a score for each of the one or more translation candidates in response to the feature associated with each translation candidate and the set of one or more weight values.

Dependent claims: 58, 59, 60, 61.
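The scoring step above reduces to a weighted sum over candidate features. The sketch below uses invented candidates, feature names, and weight values purely to show that data flow; the patent does not fix a particular feature set.

```python
# Sketch of the claimed scoring step (invented features and weights, not
# the patent's actual model): each translation candidate carries features,
# the weights come from learner text, and the score is their weighted sum.
# Candidate collocation fixes for "open the light", e.g. mined from a
# parallel-language corpus where the same source phrase aligns to both.
candidates = {
    "turn on the light": {"align_prob": 0.8, "lang_model": 0.6},
    "open the light":    {"align_prob": 0.1, "lang_model": 0.2},
}
# one weight value per feature, e.g. tuned on a corpus of learner text
weights = {"align_prob": 2.0, "lang_model": 1.0}

def score(features, weights):
    """Weighted sum of the features associated with a candidate."""
    return sum(weights[name] * value for name, value in features.items())

best = max(candidates, key=lambda c: score(candidates[c], weights))
print(best)  # → turn on the light
```

A linear score keeps the candidate ranking a single dot product, which is the simplest model consistent with "a feature ... and a set of one or more weight values".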
57. (canceled)
62. A non-transitory tangible computer-readable medium comprising computer-readable code that, when executed by a computer, causes the computer to perform operations for correcting semantic collocation errors, the operations comprising:
automatically identifying one or more translation candidates in response to analysis of a corpus of parallel-language text conducted in a processing device;

determining, using the processing device, a feature associated with each translation candidate;

generating a set of one or more weight values from a corpus of learner text stored in a data storage device; and

calculating, using the processing device, a score for each of the one or more translation candidates in response to the feature associated with each translation candidate and the set of one or more weight values.

Dependent claims: 63, 64, 65, 66.
67. A non-transitory tangible computer-readable medium comprising computer-readable code that, when executed by a computer, causes the computer:
to receive a natural language text input, the text input comprising a grammatical error in which a portion of the input text comprises a class from a set of classes;

to generate a plurality of selection tasks from a corpus of non-learner text that is assumed to be free of grammatical errors, wherein for each selection task a classifier re-predicts a class used in the non-learner text;

to generate a plurality of correction tasks from a corpus of learner text, wherein for each correction task a classifier proposes a class used in the learner text;

to train a grammar correction model using a set of binary classification problems that include the plurality of selection tasks and the plurality of correction tasks; and

to use the trained grammar correction model to predict a class for the text input from the set of classes.

Dependent claims: 68, 69, 70, 71, 72, 73, 74.
Specification