Cooccurrence and constructions
First Claim
1. A computer-implemented method for generating an automated evaluation of a corpus by ranking contexts within which a word in said corpus appears, comprising:
for each word in said corpus, determining with a computer a local ranking for each of one or more contexts, wherein each context comprises a word sequence located in a particular arrangement relative to the word, wherein the particular arrangement comprises a relative ordering of the word sequence and the word, wherein the local ranking comprises an ordering based on the frequency with which each context appears with the word;
for each context, determining a global ranking;
computing with the computer a statistic for each context based on one or more of the local ranking and the global ranking;
ordering with the computer the one or more contexts based on the computed statistic for each context; and
deriving with the computer said automated evaluation from said ordered one or more ordered contexts.
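The steps of claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation. It assumes that a context is the word sequence immediately before or after a word, that rank 1 goes to the most frequent context, that ties are broken alphabetically, and that the statistic is the log ratio rank statistic mentioned in the Abstract. The names `extract_contexts`, `rank_by_frequency`, `rank_contexts`, and the `width` parameter are hypothetical.

```python
import math
from collections import Counter, defaultdict


def extract_contexts(tokens, width=1):
    """Yield (word, context) pairs.

    Assumption: a context is ("before" | "after", word_sequence), i.e. the
    `width` tokens immediately preceding or following the word.
    """
    for i, word in enumerate(tokens):
        if i >= width:
            yield word, ("before", tuple(tokens[i - width:i]))
        if i + width < len(tokens):
            yield word, ("after", tuple(tokens[i + 1:i + 1 + width]))


def rank_by_frequency(counts):
    """Map each context to its rank (1 = most frequent).

    Ties are broken by sorting the contexts themselves, which is an
    arbitrary but deterministic choice made for this sketch.
    """
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return {ctx: rank for rank, (ctx, _) in enumerate(ordered, start=1)}


def rank_contexts(tokens, width=1):
    """Return {word: [(context, statistic), ...]} ordered by the statistic."""
    local_counts = defaultdict(Counter)  # per-word context frequencies
    global_counts = Counter()            # corpus-wide context frequencies
    for word, ctx in extract_contexts(tokens, width):
        local_counts[word][ctx] += 1
        global_counts[ctx] += 1

    global_rank = rank_by_frequency(global_counts)
    result = {}
    for word, counts in local_counts.items():
        local_rank = rank_by_frequency(counts)
        # Log ratio rank statistic: log(global rank / local rank).
        scored = [
            (ctx, math.log(global_rank[ctx] / local_rank[ctx]))
            for ctx in local_rank
        ]
        # Order contexts from highest to lowest statistic.
        scored.sort(key=lambda cs: (-cs[1], cs[0]))
        result[word] = scored
    return result
```

Calling `rank_contexts("the cat sat on the mat the cat ran".split())` returns, for each word, its contexts ordered by the statistic. A context that ranks much higher locally than globally gets a large positive value, which marks it as distinctive for that word.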
Abstract
A method and system for performing automatic text analysis is described. A local ranking for one or more contexts with respect to a word and a global ranking for one or more contexts are generated. The rankings are based on the frequency with which the contexts appear in a corpus. A statistic may be generated using the local and global rankings, such as a log ratio rank statistic equal to the logarithm of the global rank divided by local rank, to measure the similarity of contexts with respect to words with which they combine. A source matrix of word to context values is then created. Singular value decomposition is used to create sub-matrices from the source matrix. Vectors from the sub-matrices corresponding to context(s) and/or word(s) are then selected to determine term-term or context-context similarity or term-context correspondence.
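The quantities named in the Abstract can be written out as follows. The symbols are assumptions introduced here for illustration: $G(c)$ is the global rank of context $c$, $L_w(c)$ is its local rank with respect to word $w$, and $M$ is the source matrix of words by contexts. Truncating the decomposition to $k$ dimensions and comparing vectors by cosine are common choices rather than details stated in the text above.

```latex
% Log ratio rank statistic for context c with respect to word w
s(w, c) = \log \frac{G(c)}{L_w(c)}

% Singular value decomposition of the source matrix into sub-matrices
M = U \Sigma V^{\top}

% Assumed rank-k truncation: rows of U_k \Sigma_k represent words,
% rows of V_k \Sigma_k represent contexts
M \approx U_k \Sigma_k V_k^{\top}

% Assumed similarity measure between two selected vectors x and y
\operatorname{sim}(x, y) = \frac{x \cdot y}{\lVert x \rVert \, \lVert y \rVert}
```

With this convention, $s(w, c) > 0$ whenever a context ranks higher (closer to 1) for the word than it does across the corpus as a whole.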
33 Claims
1. A computer-implemented method for generating an automated evaluation of a corpus by ranking contexts within which a word in said corpus appears, comprising:
for each word in said corpus, determining with a computer a local ranking for each of one or more contexts, wherein each context comprises a word sequence located in a particular arrangement relative to the word, wherein the particular arrangement comprises a relative ordering of the word sequence and the word, wherein the local ranking comprises an ordering based on the frequency with which each context appears with the word;
for each context, determining a global ranking;
computing with the computer a statistic for each context based on one or more of the local ranking and the global ranking;
ordering with the computer the one or more contexts based on the computed statistic for each context; and
deriving with the computer said automated evaluation from said ordered one or more ordered contexts.
Dependent claims: 2, 3
4. A computer-implemented method for generating an automated evaluation of a corpus by ranking contexts within which a word in said corpus appears, comprising:
for each word in a corpus, determining with a computer a local ranking for each of one or more contexts, wherein each context comprises a word sequence located in a particular arrangement relative to the word, wherein the particular arrangement comprises a relative ordering of the word sequence and the word, wherein the local ranking comprises an ordering based on the frequency with which each context appears with the word;
for each context, determining with the computer a global ranking;
computing with the computer a statistic for each context based on one or more of the local ranking and the global ranking;
producing with the computer a source matrix of words by contexts in which an attribute of each context is used as a value for the context-word combination; and
deriving with the computer said automated evaluation from said source matrix of words by contexts.
Dependent claims: 5, 6, 7, 8, 9, 10, 11
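The source matrix step in claim 4 can be sketched in Python as below. This is an illustration under stated assumptions, not the claimed implementation. It assumes that the per-word context values (for example, a rank-based statistic) have already been computed, that rows and columns are ordered alphabetically, and that a word-context pair never observed together gets a default of 0.0. The function name `build_source_matrix` and the context labels in the usage example are hypothetical.

```python
def build_source_matrix(values, default=0.0):
    """Build a words-by-contexts matrix from {word: {context: value}}.

    Returns (words, contexts, matrix), where matrix[i][j] holds the value
    for words[i] combined with contexts[j]. Pairs that never co-occur are
    filled with `default` (an assumption of this sketch).
    """
    words = sorted(values)
    contexts = sorted({ctx for row in values.values() for ctx in row})
    column = {ctx: j for j, ctx in enumerate(contexts)}

    matrix = [[default] * len(contexts) for _ in words]
    for i, word in enumerate(words):
        for ctx, value in values[word].items():
            matrix[i][column[ctx]] = value
    return words, contexts, matrix


# Usage with made-up values; "_" marks the position of the word.
words, contexts, matrix = build_source_matrix({
    "cat": {"the _": 0.0, "_ sat": 0.98},
    "dog": {"the _": 0.41},
})
```

Each row of `matrix` is then a vector describing one word by its contexts, which is the form a decomposition such as singular value decomposition would take as input.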
12. A computer apparatus for producing a document evaluation by ranking contexts within which a word in a corpus appears, comprising:
a processing system; and
a memory, wherein the processing system is configured to execute steps comprising;
for each word in a corpus, determining a local ranking for each of one or more contexts, wherein each context comprises a word sequence located in a particular arrangement relative to the word, wherein the particular arrangement comprises a relative ordering of the word sequence and the word, wherein the local ranking comprises an ordering based on the frequency with which each context appears with the word;
determining a global ranking for each context;
computing a statistic for each context based on one or more of the local ranking and the global ranking;
ordering the one or more contexts based on the computed statistic for each context; and
returning the ordered one or more contexts for use in automatically evaluating a written document.
Dependent claims: 13, 14
15. A computer apparatus for producing a document evaluation by ranking contexts within which a word in a corpus appears, comprising:
a processing system; and
a memory, wherein the processing system is configured to execute steps comprising;
for each word in a corpus, determining a local ranking for each of one or more contexts, wherein each context comprises a word sequence located in a particular arrangement relative to the word, wherein the particular arrangement comprises a relative ordering of the word sequence and the word, wherein the local ranking comprises an ordering based on the frequency with which each context appears with the word;
determining a global ranking for each context;
computing a statistic for each context based on one or more of the local ranking and the global ranking;
producing a source matrix of words by contexts in which an attribute of each context is used as a value for the context-word combination; and
returning the source matrix for use in automatically evaluating a written document.
Dependent claims: 16, 17, 18, 19, 20, 21, 22
23. An article of manufacture comprising a non-transitory computer readable medium, the computer readable medium comprising instructions for generating an automated evaluation of a corpus by ranking contexts within which a word in said corpus appears, the instructions when executed causing a computer to carry out steps comprising:
for each word in said corpus, determining a local ranking for each of one or more contexts, wherein each context comprises a word sequence located in a particular arrangement relative to the word, wherein the particular arrangement comprises a relative ordering of the word sequence and the word, wherein the local ranking comprises an ordering based on the frequency with which each context appears with the word;
for each context, determining a global ranking;
computing a statistic for each context based on one or more of the local ranking and the global ranking;
ordering the one or more contexts based on the computed statistic for each context; and
deriving said automated evaluation from said ordered one or more ordered contexts.
Dependent claims: 24, 25
26. An article of manufacture comprising a non-transitory computer readable medium, the computer readable medium comprising instructions for generating an automated evaluation of a corpus by ranking contexts within which a word in said corpus appears, the instructions when executed causing a computer to carry out steps comprising:
for each word in a corpus, determining a local ranking for each of one or more contexts, wherein each context comprises a word sequence located in a particular arrangement relative to the word, wherein the particular arrangement comprises a relative ordering of the word sequence and the word, wherein the local ranking comprises an ordering based on the frequency with which each context appears with the word;
for each context, determining a global ranking;
computing a statistic for each context based on one or more of the local ranking and the global ranking;
producing a source matrix of words by contexts in which an attribute of each context is used as a value for the context-word combination; and
deriving said automated evaluation from said source matrix of words by contexts.
Dependent claims: 27, 28, 29, 30, 31, 32, 33
Specification