Automatic detection and application of editing patterns in draft documents
Abstract
An error detection and correction system extracts editing patterns and derives correction rules from them by observing differences between draft documents and corresponding edited documents, and/or by observing editing operations performed on the draft documents to produce the edited documents. The system develops classifiers that partition the space of all possible contexts into equivalence classes and assigns one or more correction rules to each such class. Once the system has been trained, it may be used to detect and (optionally) correct errors in new draft documents. When presented with a draft document, the system identifies first content (e.g., text) in the draft document and identifies a context of the first content. The system identifies a correction rule based on the first content and the first context. The system may use a classifier to identify the correction rule. The system applies the correction rule to the first content to produce second content.
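As an illustration of the lookup the abstract describes (classify the context, find a correction rule for the content in that context class, apply it), here is a minimal sketch. Everything named in it is an assumption made for the example: the context is taken to be the pair of neighbouring words, `classify_context` is a toy hand-written classifier rather than a trained one, and the `RULES` table holds a single hypothetical rule.

```python
from typing import Dict, Tuple

# ASSUMPTION: a context is the (previous word, next word) around the content.
Context = Tuple[str, str]


def classify_context(context: Context) -> str:
    """Toy stand-in for a trained classifier: map a context to a class label."""
    prev_word, _next_word = context
    return "after_number" if prev_word.isdigit() else "other"


# Hypothetical correction rules, keyed by (first content, context class).
RULES: Dict[Tuple[str, str], str] = {
    ("mg", "after_number"): "milligrams",
}


def correct(content: str, context: Context) -> str:
    """Return the second content, or the first content unchanged if no rule applies."""
    return RULES.get((content, classify_context(context)), content)


print(correct("mg", ("10", "daily")))  # rule applies: prints "milligrams"
print(correct("mg", ("the", "unit")))  # no rule for this class: prints "mg"
```

Grouping contexts into classes means a rule observed after "10" also fires after "25", which is the practical point of partitioning contexts instead of matching them literally.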
16 Claims
1. A method performed by a computer processor executing computer program instructions tangibly stored on a non-transitory computer readable medium, the method comprising:
(A) identifying a plurality of editing patterns of the form T=(D,E,C), wherein each of the plurality of editing patterns relates content D in an original document corpus to corresponding content E in an edited document corpus in a context C shared by contents D and E; and
(B) identifying a plurality of likelihoods of correctness of the plurality of editing patterns, wherein (B) comprises:
(B) (1) counting a number of positive instances in which content D in context C in the original document corpus has been replaced with content E in context C in the edited document corpus, and identifying a likelihood of correctness of the editing pattern T=(D,E,C) based on the number of positive instances; and
(B) (2) counting a number of negative instances in which content D in context C in the original document corpus remained unchanged in context C in the edited document corpus, and identifying a likelihood of correctness of the editing pattern T=(D,E,C) based on the number of positive instances and the number of negative instances;
(C) selecting one of the plurality of editing patterns based on the plurality of likelihoods of correctness; and
(D) applying the selected editing pattern to replace an instance of content D in a further document with an instance of content E in the further document.
Dependent claims: 2, 3, 4
5. A non-transitory computer readable medium comprising computer program instructions executable by a computer processor to perform a method, the method comprising:
(A) identifying a plurality of editing patterns of the form T=(D,E,C), wherein each of the plurality of editing patterns relates content D in an original document corpus to corresponding content E in an edited document corpus in a context C shared by contents D and E; and
(B) identifying a plurality of likelihoods of correctness of the plurality of editing patterns, wherein (B) comprises:
(B) (1) counting a number of positive instances in which content D in context C in the original document corpus has been replaced with content E in context C in the edited document corpus, and identifying a likelihood of correctness of the editing pattern T=(D,E,C) based on the number of positive instances; and
(B) (2) counting a number of negative instances in which content D in context C in the original document corpus remained unchanged in context C in the edited document corpus, and identifying a likelihood of correctness of the editing pattern T=(D,E,C) based on the number of positive instances and the number of negative instances;
(C) selecting one of the plurality of editing patterns based on the plurality of likelihoods of correctness; and
(D) applying the selected editing pattern to replace an instance of content D in a further document with an instance of content E in the further document.
Dependent claims: 6, 7, 8
9. A method performed by a computer processor executing computer program instructions tangibly stored on a non-transitory computer readable medium, the method comprising:
(A) identifying a plurality of editing patterns of the form T=(D,E,C), wherein each of the plurality of editing patterns relates content D in an original document corpus to corresponding content E in an edited document corpus in a context C shared by contents D and E; and
(B) identifying a plurality of likelihoods of correctness of the plurality of editing patterns;
(C) receiving an input from a user representing a selection of one of the plurality of editing patterns;
(D) applying the selected editing pattern to replace an instance of content D in a further document with an instance of content E in the further document; and
(E) increasing the likelihood of correctness of the selected editing pattern in response to the selection;
wherein the plurality of likelihoods of correctness includes an initial likelihood of correctness of the selected editing pattern;
wherein the selected editing pattern comprises a positive editing pattern;
wherein the initial likelihood of correctness of the selected editing pattern is based on a number of occurrences of the positive editing pattern; and
wherein (E) comprises incrementing the number of occurrences of the positive editing pattern and increasing the likelihood of correctness of the selected editing pattern based on the incremented number of occurrences of the positive editing pattern.
Dependent claims: 10, 11, 12
13. A non-transitory computer readable medium comprising computer program instructions executable by a computer processor to perform a method, the method comprising:
(A) identifying a plurality of editing patterns of the form T=(D,E,C), wherein each of the plurality of editing patterns relates content D in an original document corpus to corresponding content E in an edited document corpus in a context C shared by contents D and E; and
(B) identifying a plurality of likelihoods of correctness of the plurality of editing patterns;
(C) receiving an input from a user representing a selection of one of the plurality of editing patterns;
(D) applying the selected editing pattern to replace an instance of content D in a further document with an instance of content E in the further document; and
(E) increasing the likelihood of correctness of the selected editing pattern in response to the selection;
wherein the plurality of likelihoods of correctness includes an initial likelihood of correctness of the selected editing pattern;
wherein the selected editing pattern comprises a positive editing pattern;
wherein the initial likelihood of correctness of the selected editing pattern is based on a number of occurrences of the positive editing pattern; and
wherein (E) comprises incrementing the number of occurrences of the positive editing pattern and increasing the likelihood of correctness of the selected editing pattern based on the incremented number of occurrences of the positive editing pattern.
Dependent claims: 14, 15, 16
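The steps recited in the independent claims above (count positive and negative instances of a pattern T=(D,E,C), derive a likelihood of correctness, select and apply a pattern, and raise the likelihood when a user selects the pattern) can be sketched roughly as follows. This is an illustrative reading, not the patented implementation. The claims do not give a formula, so the relative frequency pos / (pos + neg) is an assumption, as are the `PatternStore` class, the choice of the preceding token as context C, and the 0.5 threshold.

```python
from collections import Counter
from typing import List, NamedTuple, Optional


class EditingPattern(NamedTuple):
    """An editing pattern T = (D, E, C)."""
    d: str  # content D in the original corpus
    e: str  # content E in the edited corpus
    c: str  # context C shared by D and E


class PatternStore:
    """Hypothetical container for pattern counts."""

    def __init__(self) -> None:
        self.positives: Counter = Counter()  # T -> times D became E in C
        self.negatives: Counter = Counter()  # (D, C) -> times D stayed unchanged in C

    def observe(self, d: str, edited: str, c: str) -> None:
        """Record one aligned (original, edited) content pair seen in context c."""
        if edited == d:
            self.negatives[(d, c)] += 1
        else:
            self.positives[EditingPattern(d, edited, c)] += 1

    def likelihood(self, t: EditingPattern) -> float:
        # ASSUMPTION: relative frequency of positive instances.
        pos = self.positives[t]
        neg = self.negatives[(t.d, t.c)]
        return pos / (pos + neg) if pos + neg else 0.0

    def select(self, d: str, c: str) -> Optional[EditingPattern]:
        """Pick the most likely pattern for content d in context c, if any."""
        candidates = [t for t in self.positives if t.d == d and t.c == c]
        return max(candidates, key=self.likelihood, default=None)

    def record_user_selection(self, t: EditingPattern) -> None:
        """A user picking the pattern counts as one more positive occurrence."""
        self.positives[t] += 1


def apply_patterns(store: PatternStore, tokens: List[str],
                   threshold: float = 0.5) -> List[str]:
    """Replace tokens in a further document; context C is the preceding token."""
    out = []
    for i, tok in enumerate(tokens):
        context = tokens[i - 1] if i > 0 else "<s>"
        best = store.select(tok, context)
        if best is not None and store.likelihood(best) >= threshold:
            out.append(best.e)
        else:
            out.append(tok)
    return out


store = PatternStore()
for _ in range(3):
    store.observe("mg", "milligrams", "10")  # three positive instances
store.observe("mg", "mg", "10")              # one negative instance

t = EditingPattern("mg", "milligrams", "10")
print(store.likelihood(t))                   # 0.75
print(apply_patterns(store, ["take", "10", "mg", "daily"]))
store.record_user_selection(t)
print(store.likelihood(t))                   # 0.8
```

Keeping raw counts rather than a stored probability makes the feedback step a single increment, with the likelihood recomputed from the counts on demand.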
Specification