Linguistic error detection
Abstract
Potential linguistic errors within a sequence of words of a sentence are identified by analyzing a configurable sliding window. The analysis rests on the assumption that if a sequence of words occurs frequently enough within a large, well-formed corpus, its joint probability of occurring in a sentence is very likely to be greater than that of the same words in a random order.
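The assumption in the abstract can be illustrated with a toy model. The sketch below is not the patented implementation: it assumes an add-one-smoothed bigram model as the statistical language model, a three-sentence stand-in corpus, and a fixed scrambled ordering in place of a random one so the result is repeatable. The names `train_bigrams` and `log_prob` are hypothetical.

```python
import math
from collections import Counter

# Tiny stand-in corpus (assumption); a real system would use a large, well-formed corpus.
CORPUS = [
    "the cat sat on the mat",
    "the dog sat on the rug",
    "a cat sat on a mat",
]

def train_bigrams(sentences):
    """Count unigrams and bigrams, with a sentence-start marker."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split()
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams

def log_prob(words, unigrams, bigrams):
    """Joint log-probability of `words` under an add-one-smoothed bigram model."""
    vocab_size = len(unigrams)
    tokens = ["<s>"] + list(words)
    total = 0.0
    for prev, cur in zip(tokens, tokens[1:]):
        total += math.log((bigrams[(prev, cur)] + 1) / (unigrams[prev] + vocab_size))
    return total

unigrams, bigrams = train_bigrams(CORPUS)
ordered = ["the", "cat", "sat"]
scrambled = ["sat", "the", "cat"]  # fixed reordering, standing in for a random one

lp_ordered = log_prob(ordered, unigrams, bigrams)
lp_scrambled = log_prob(scrambled, unigrams, bigrams)
print(lp_ordered > lp_scrambled)
```

With this corpus the well-formed order scores higher than the scrambled one, which is the behavior the assumption predicts for sequences that are frequent in the corpus.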
20 Claims
1. A method for using a computing device to detect linguistic errors, comprising:
- selecting a sequence of three or more words in a phrase;
- applying a statistical language model to the selected sequence of three or more words to determine a first probability of occurrence of the selected sequence of three or more words in the phrase;
- applying the statistical language model to determine a second probability of occurrence of a random ordering of the words in the selected sequence of three or more words;
- calculating a numerical value by comparing the first probability of occurrence to the second probability of occurrence; and
- determining that the phrase contains a linguistic error when the calculated numerical value deviates from a first predetermined threshold and the second probability of occurrence deviates from a second predetermined threshold.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9
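The final two steps of claim 1 amount to a two-part test on the probabilities. The sketch below is one possible reading, not the claimed method itself: it assumes the "numerical value" is the ratio of the first probability to the second, and it assumes the direction of each deviation (ratio below its threshold, second probability above its threshold) as spelled out in claim 17. The function name, parameter names, and threshold values are hypothetical.

```python
def has_linguistic_error(p_sequence, p_random_order,
                         ratio_threshold, random_order_threshold):
    """Flag a sequence when the original order is not sufficiently more
    probable than a random ordering, and the random ordering itself is
    probable enough to be a meaningful comparison."""
    if p_random_order <= 0.0:
        # Assumption: with no usable second probability, do not flag.
        return False
    ratio = p_sequence / p_random_order  # assumed form of "comparing"
    return ratio < ratio_threshold and p_random_order > random_order_threshold

# Illustrative numbers only; real thresholds would be tuned on data.
print(has_linguistic_error(1e-6, 1e-4, ratio_threshold=1.0,
                           random_order_threshold=1e-5))
```

Requiring both conditions means a low ratio alone is not enough: if the reordered words are themselves very improbable, the comparison carries little evidence and the sequence is left unflagged.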
10. A computing device, comprising:
- a processing unit; and
- a system memory connected to the processing unit, the system memory including instructions that, when executed by the processing unit, cause the processing unit to implement an error detection module configured to detect a linguistic error, wherein the error detection module comprises:
  - a classification module configured to:
    - receive, from an application executing on the computing device, a sequence of three or more words within an electronic document of the application;
    - apply a statistical language model to the selected sequence of three or more words to determine a first probability of occurrence of the selected sequence of three or more words;
    - apply the statistical language model to determine a second probability of occurrence of a random ordering of the words in the selected sequence of three or more words;
    - calculate a numerical value by comparing the first probability of occurrence to the second probability of occurrence; and
    - determine that the sequence of three or more words contains a linguistic error when the calculated numerical value deviates from a first predetermined threshold and the second probability of occurrence deviates from a second predetermined threshold.
Dependent claims: 11, 12, 13, 14, 15, 16
17. A computer readable hardware storage device having computer-executable instructions that, when executed by a computing device, cause the computing device to perform steps comprising:
- sequentially selecting sequences of three or more words within an electronic document via a configurable sliding window;
- applying a statistical language model to each selected sequence of three or more words to determine a first probability of occurrence of the selected sequence of three or more words;
- applying the statistical language model to determine a second probability of occurrence of a random ordering of the words in each selected sequence of three or more words;
- calculating a numerical value by comparing the first probability of occurrence to the second probability of occurrence; and
- determining that any of the sequences of three or more words contains a linguistic error when the calculated numerical value is less than a first predetermined threshold and the second probability of occurrence is greater than a second predetermined threshold.
Dependent claims: 18, 19, 20
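The sliding-window selection in claim 17 could look like the following sketch. It assumes that "configurable" means an adjustable window size and step, which the claim does not spell out; `sliding_windows`, `flag_windows`, and the `is_error` callback are hypothetical names, and the callback stands in for whatever per-sequence probability test is used.

```python
def sliding_windows(words, size=3, step=1):
    """Yield (start_index, window) pairs of `size` consecutive words."""
    if size < 3:
        raise ValueError("the claims require sequences of three or more words")
    if step < 1:
        raise ValueError("step must be at least 1")
    for start in range(0, len(words) - size + 1, step):
        yield start, words[start:start + size]

def flag_windows(words, is_error, size=3, step=1):
    """Return the start index of every window that `is_error` flags."""
    return [start
            for start, window in sliding_windows(words, size, step)
            if is_error(window)]

tokens = "she have went to the store".split()
for start, window in sliding_windows(tokens, size=3):
    print(start, window)
```

Returning start indices rather than a single yes/no keeps the position of each suspect sequence, so a caller can highlight the corresponding span in the document.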
Specification