Automatic detection and application of editing patterns in draft documents
Abstract
An error detection and correction system extracts editing patterns and derives correction rules from them by observing differences between draft documents and corresponding edited documents, and/or by observing editing operations performed on the draft documents to produce the edited documents. The system develops classifiers that partition the space of all possible contexts into equivalence classes, and it assigns one or more correction rules to each such class. Once the system has been trained, it may be used to detect and (optionally) correct errors in new draft documents. When presented with a draft document, the system identifies first content (e.g., text) in the draft document and a first context of that content. The system then identifies a correction rule based on the first content and the first context, optionally using a classifier, and applies the correction rule to the first content to produce second content.
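The first training step, observing differences between a draft and its edited version (also step (A)(1) of claim 13), can be illustrated as a word-level alignment. This is a minimal sketch under stated assumptions, not the patented implementation: it assumes whitespace tokenization, uses Python's standard `difflib.SequenceMatcher` as the aligner, and models the context C as the pair of draft words immediately surrounding the changed span. The names `EditingPattern` and `extract_patterns` and the sample sentences are hypothetical.

```python
from difflib import SequenceMatcher
from typing import List, NamedTuple, Tuple


class EditingPattern(NamedTuple):
    """Hypothetical container for one editing pattern T = (D, E, C)."""
    original: str             # D: content as it appears in the draft
    edited: str               # E: the content that replaced it in the edited document
    context: Tuple[str, str]  # C: (previous word, next word), an assumed context model


def extract_patterns(draft: str, edited: str) -> List[EditingPattern]:
    """Align a draft with its edited version word by word and emit one pattern per change."""
    a, b = draft.split(), edited.split()
    patterns: List[EditingPattern] = []
    matcher = SequenceMatcher(a=a, b=b, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue  # unchanged spans produce no editing pattern
        # Context is read from the draft side; "<s>" and "</s>" mark document boundaries.
        prev_word = a[i1 - 1] if i1 > 0 else "<s>"
        next_word = a[i2] if i2 < len(a) else "</s>"
        patterns.append(
            EditingPattern(" ".join(a[i1:i2]), " ".join(b[j1:j2]), (prev_word, next_word))
        )
    return patterns


if __name__ == "__main__":
    for p in extract_patterns("the pt denies chest pain", "the patient denies chest pain"):
        # prints EditingPattern(original='pt', edited='patient', context=('the', 'denies'))
        print(p)
```

Insertions appear as patterns with an empty D, and deletions as patterns with an empty E; a real system would likely use a richer context model than two neighbouring words.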
21 Claims
1. A method performed by a computer processor executing computer program instructions tangibly stored on a first computer-readable medium, the method comprising steps of:
(A) tangibly storing, on a second computer-readable medium, a data structure representing a plurality of editing patterns of the form T=(D,E,C), wherein each of the plurality of editing patterns relates particular content D in an original document corpus to corresponding content E in an edited document corpus in a context C shared by contents D and E, wherein the original document corpus and the edited document corpus are tangibly stored on a third and fourth computer-readable medium, respectively;
(B) deriving a plurality of correction rules, tangibly stored on a fifth computer-readable medium, from the plurality of editing patterns; and
(C) deriving a classifier, tangibly stored on a sixth computer-readable medium, for particular content D based on the data structure representing the plurality of editing patterns, the classifier defining decision criteria for selecting one of the plurality of correction rules to apply to content D based on a context C* of content D.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12
13. A method performed by a computer processor executing computer program instructions tangibly stored on a first computer-readable medium, the method comprising steps of:
(A) tangibly storing, on a second computer-readable medium, a data structure representing a plurality of editing patterns of the form T=(D,E,C), wherein each of the plurality of editing patterns relates particular content D in an original document corpus to corresponding content E in an edited document corpus in a context C shared by contents D and E, wherein the original document corpus and the edited document corpus are tangibly stored on a third and fourth computer-readable medium, respectively, wherein (A) comprises:
(A)(1) comparing documents in the original document corpus to corresponding documents in the edited document corpus to identify differences between them, the differences comprising at least one word that appears in the original document corpus in a particular context C0 and that does not appear in the edited document corpus in the context C0; and
(A)(2) generating the plurality of editing patterns to reflect the identified differences between the documents in the original document corpus and the documents in the edited document corpus; and
(B) deriving at least one correction rule, tangibly stored on a fifth computer-readable medium, from the data structure representing the plurality of editing patterns.
Dependent claims: 14
15. A computer program product, comprising a first computer-readable medium having computer-readable program code tangibly embodied therein, said computer-readable program code adapted to be executed by a processor to implement a method, the method comprising:
(A) tangibly storing, on a second computer-readable medium, a data structure representing a plurality of editing patterns of the form T=(D,E,C), wherein each of the plurality of editing patterns relates particular content D in an original document corpus to corresponding content E in an edited document corpus in a context C shared by contents D and E, wherein the original document corpus and the edited document corpus are tangibly stored on a third and fourth computer-readable medium, respectively;
(B) deriving a plurality of correction rules, tangibly stored on a fifth computer-readable medium, from the plurality of editing patterns; and
(C) deriving a classifier, tangibly stored on a sixth computer-readable medium, for particular content D based on the plurality of editing patterns, the classifier defining decision criteria for selecting one of the plurality of correction rules to apply to content D based on a context C* of content D.
Dependent claims: 16, 17, 18, 19, 20, 21
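Steps (B) and (C) of claims 1 and 15 can be illustrated with a small rule learner. The sketch below is hypothetical and is not the claimed implementation. It treats every distinct D-to-E pair in the patterns as a correction rule, partitions contexts into equivalence classes by their preceding word (an assumed choice), assigns each class the rule observed most often within it, and falls back to the globally most frequent rule for classes never seen in training. All names (`derive_rules`, `context_class`, `RuleClassifier`) and the "pt" training data are invented for illustration.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

Context = Tuple[str, str]           # assumed context model: (previous word, next word)
Pattern = Tuple[str, str, Context]  # an editing pattern T = (D, E, C)
Rule = Tuple[str, str]              # a correction rule: rewrite D as E


def derive_rules(patterns: List[Pattern]) -> List[Rule]:
    """Step (B): every distinct D -> E mapping observed in the patterns becomes a rule."""
    return sorted({(d, e) for d, e, _ in patterns})


def context_class(context: Context) -> str:
    """Hypothetical partition of context space: contexts with the same previous word are equivalent."""
    return context[0].lower()


class RuleClassifier:
    """Step (C): selects one correction rule for a particular content D given its context C*."""

    def __init__(self, patterns: List[Pattern], content: str) -> None:
        per_class: Dict[str, Counter] = defaultdict(Counter)
        overall: Counter = Counter()
        for d, e, c in patterns:
            if d != content:
                continue  # the classifier is trained only on patterns for this D
            per_class[context_class(c)][(d, e)] += 1
            overall[(d, e)] += 1
        if not overall:
            raise ValueError(f"no editing patterns observed for {content!r}")
        # Decision criterion: each equivalence class gets its most frequently observed rule.
        self.decisions: Dict[str, Rule] = {
            cls: counts.most_common(1)[0][0] for cls, counts in per_class.items()
        }
        # Fallback for contexts whose class never appeared in training.
        self.default: Rule = overall.most_common(1)[0][0]

    def select_rule(self, context: Context) -> Rule:
        return self.decisions.get(context_class(context), self.default)

    def apply(self, context: Context) -> str:
        """Apply the selected rule to D, returning the corrected content E."""
        _, e = self.select_rule(context)
        return e


if __name__ == "__main__":
    training: List[Pattern] = [
        ("pt", "patient", ("the", "denies")),
        ("pt", "patient", ("the", "reports")),
        ("pt", "patient", ("our", "was")),
        ("pt", "physical therapy", ("continue", "twice")),
        ("pt", "physical therapy", ("continue", "as")),
    ]
    print(derive_rules(training))
    clf = RuleClassifier(training, "pt")
    print(clf.apply(("continue", "daily")))  # prints physical therapy
    print(clf.apply(("the", "states")))      # prints patient
```

Recording unchanged occurrences as identity patterns (D, D, C) would let a classifier of this kind learn where not to correct; that refinement is likewise an assumption rather than something stated in the claims.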
Specification