Proofing of word collocation errors based on a comparison with collocations in a corpus
Abstract
Collocation errors can be automatically proofed using local and network-based corpora, including the Web. For example, according to one illustrative method, one or more collocations from a text sample are compared with a corpus such as the content of the Web. Each collocation is checked to determine whether it is disfavored in the corpus, and an output device indicates whether it is. Additional steps may then follow, such as searching for potentially proper word collocations and presenting them via a user output device.
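The detection step summarized in the abstract can be sketched in Python. This is a minimal illustration under stated assumptions, not the patented implementation: it treats a collocation as simply an adjacent word pair, uses a small local corpus in place of the Web, and calls a pair disfavored when it occurs fewer than `min_count` times. The helper names `word_pairs`, `build_corpus_counts`, and `flag_disfavored` and the `min_count` parameter are hypothetical.

```python
from collections import Counter

def word_pairs(words):
    """Adjacent word pairs: a simplified, hypothetical stand-in for collocation extraction."""
    return list(zip(words, words[1:]))

def build_corpus_counts(corpus_sentences):
    """Count every adjacent word pair in a small local corpus (assumed stand-in for the Web)."""
    counts = Counter()
    for sentence in corpus_sentences:
        counts.update(word_pairs(sentence.lower().split()))
    return counts

def flag_disfavored(text, corpus_counts, min_count=1):
    """Pair each collocation in `text` with True if it is disfavored (rare) in the corpus."""
    pairs = word_pairs(text.lower().split())
    # A Counter returns 0 for unseen keys, so absent pairs count as disfavored.
    return [(pair, corpus_counts[pair] < min_count) for pair in pairs]

corpus = ["she drinks strong tea", "strong tea is bitter"]
counts = build_corpus_counts(corpus)
print(flag_disfavored("powerful tea", counts))  # [(('powerful', 'tea'), True)]
```

Counting over a local corpus keeps the sketch self-contained; a real system would replace `build_corpus_counts` with hit counts from a search service.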
17 Claims
1. A method, implemented by a computing system comprising one or more processors, the method comprising:
comparing, using one or more of the processors, one or more collocations from a text sample with a corpus;
identifying, using one or more of the processors, whether the collocations are disfavored in the corpus; and
providing indications of whether the collocations are disfavored via an output device;
in which comparing the collocations with the corpus comprises performing one or more searches of the World Wide Web using one or more query terms that comprise each of one or more of the collocations; and
in which for each of one or more of the collocations for which searches are performed, a search is performed for each of the one or more query terms that comprise the collocation until either one of the query terms provides search results that meet a preselected threshold for matching the collocation, or all the query terms that comprise the collocation are used without meeting the preselected threshold, and further comprising:
composing one or more query terms with a wild card replacing a word in one of the disfavored collocations;
searching a word collocation reference for the query terms;
identifying results of the search having a relatively high proportion of a candidate word replacing the wild card; and
providing the results of the search having the candidate word via the output device as potentially proper word collocations.
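The search loop in claim 1 (try each query term for a collocation in turn, stopping at the first one whose results meet a preselected threshold) can be sketched as below. This is a hedged illustration: it assumes "meeting the threshold" means a hit count at or above `threshold`, and the search engine is replaced by a hypothetical `hit_count` callable backed by made-up numbers. The function name `first_matching_query` is illustrative only.

```python
from typing import Callable, Optional, Sequence

def first_matching_query(
    query_terms: Sequence[str],
    hit_count: Callable[[str], int],
    threshold: int,
) -> Optional[str]:
    """Search each query term in order, stopping at the first that meets the threshold.

    Returns that query term, or None when every query term was used
    without meeting the threshold (the collocation is then treated as disfavored).
    """
    for term in query_terms:
        if hit_count(term) >= threshold:  # stop searching as soon as one term qualifies
            return term
    return None

# Hypothetical hit counts standing in for a Web search engine.
fake_hits = {
    '"he made a mistake on the report"': 0,
    '"made a mistake"': 120_000,
    '"made mistake"': 900,
}
print(first_matching_query(list(fake_hits), fake_hits.get, threshold=1_000))
```

Ordering the query terms from most to least specific means the loop stops early whenever a precise match exists, so fewer searches are issued.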
16. A non-transitory computer readable storage medium comprising instructions executable by a computing system comprising one or more processors, wherein the instructions configure the computing system to:
receive an indication of a word collocation in a text;
using one or more of the processors, perform a Web search for each of one or more query templates associated with the indicated word collocation, wherein one of the query templates comprises a sentence in which the word collocation was found, one of the query templates comprises a reduced sentence based on the sentence in which the word collocation was found, one of the query templates comprises a chunk pair comprising the word collocation, and one of the query templates comprises an individual word pair comprising the word collocation;
using one or more of the processors, evaluate whether results of the Web search for each of the one or more query templates indicate that the word collocation corresponds to normal usage or is a disfavored collocation, as indicated by either an exact match of the query template comprising the sentence, or an exact match of the query template comprising the reduced sentence, or a matching score for the query template comprising the chunk pair that is larger than a preselected threshold for a chunk pair, or a matching score for the query template comprising the individual word pair that is larger than a preselected threshold for an individual word pair;
indicate via a user-perceptible output device whether the word collocation corresponds to normal usage or whether the word collocation is disfavored; and further comprising:
composing one or more query terms with a wild card replacing a word in one of the disfavored collocations;
searching a word collocation reference for the query terms;
identifying results of the search having a relatively high proportion of a candidate word replacing the wild card; and
providing the results of the search having the candidate word via the output device as potentially proper word collocations.
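The evaluation rule in claim 16 reduces to a disjunction: the collocation is treated as normal usage if any one of the four templates succeeds. A minimal sketch follows, assuming the search results have already been summarized into booleans and numeric scores. The `TemplateResults` container, the function name `is_normal_usage`, and the score scale are hypothetical; the claim does not specify how the reduced sentence, chunk pair, or matching scores are computed.

```python
from dataclasses import dataclass

@dataclass
class TemplateResults:
    """Hypothetical container for the outcome of the four Web-search templates."""
    sentence_exact_match: bool          # full sentence found verbatim
    reduced_sentence_exact_match: bool  # reduced sentence found verbatim
    chunk_pair_score: float             # matching score for the chunk-pair template
    word_pair_score: float              # matching score for the word-pair template

def is_normal_usage(r: TemplateResults, chunk_threshold: float, word_threshold: float) -> bool:
    """True if any one template signals normal usage; otherwise the collocation is disfavored."""
    return (
        r.sentence_exact_match
        or r.reduced_sentence_exact_match
        or r.chunk_pair_score > chunk_threshold  # "larger than" the chunk-pair threshold
        or r.word_pair_score > word_threshold    # "larger than" the word-pair threshold
    )

print(is_normal_usage(TemplateResults(False, False, 0.1, 0.9), 0.5, 0.5))
```

Separate thresholds for chunk pairs and word pairs mirror the claim's wording; the strict `>` follows "larger than".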
17. A computing system comprising:
a computer processor and a data store accessed by the computer processor, the data store storing computer readable instructions and the computer processor accessing the computer readable instructions to:
identify word collocations in a text;
search the World Wide Web for a set of query templates based on each of one or more of the word collocations;
indicate via a user output device whether results of the search indicate that the word collocations are relatively scarce on the World Wide Web;
compose one or more query terms with a wild card replacing a word in one of the relatively scarce word collocations;
search the World Wide Web for the query terms;
identify results of the search having a relatively high proportion of a candidate word replacing the wild card; and
provide the results of the search having the candidate word via the output device as possible replacements for the relatively scarce word collocations.
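The wildcard step shared by claims 1, 16, and 17 (replace one word of a scarce collocation with a wild card, search, and keep candidate words that fill the wild card in a relatively high proportion of results) might look like the following sketch. Assumptions, all hypothetical: the "word collocation reference" is a local dictionary of word tuples to frequencies rather than the Web, and "relatively high proportion" means a share of all matches at or above `min_proportion`. The names `wildcard_query` and `suggest_replacements` are illustrative.

```python
from collections import Counter
from typing import Dict, List, Tuple

Collocation = Tuple[str, ...]

def wildcard_query(collocation: Collocation, position: int) -> Collocation:
    """Replace the word at `position` with a '*' wildcard."""
    words = list(collocation)
    words[position] = "*"
    return tuple(words)

def suggest_replacements(
    collocation: Collocation,
    position: int,
    reference: Dict[Collocation, int],
    min_proportion: float = 0.1,
) -> List[Tuple[str, float]]:
    """Rank words that fill the wildcard, keeping those with a high share of matches."""
    query = wildcard_query(collocation, position)
    fills: Counter = Counter()
    for entry, count in reference.items():
        # An entry matches when every non-wildcard word agrees with the query.
        if len(entry) == len(query) and all(q in ("*", w) for q, w in zip(query, entry)):
            fills[entry[position]] += count
    total = sum(fills.values())
    if total == 0:
        return []
    return [(word, n / total) for word, n in fills.most_common() if n / total >= min_proportion]

# Hypothetical collocation reference: word pair -> frequency.
reference = {("strong", "tea"): 80, ("green", "tea"): 15, ("weak", "tea"): 5, ("strong", "coffee"): 50}
print(suggest_replacements(("powerful", "tea"), 0, reference))  # [('strong', 0.8), ('green', 0.15)]
```

Here "weak" is dropped because it fills the wild card in only 5% of matches, below the assumed 10% cutoff.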
Specification