Cooccurrence and constructions
First Claim
1. A system for ranking contexts within which a word in a corpus appears, comprising:
a processor; and
a processor-readable storage medium operably connected to the processor, wherein the processor-readable storage medium contains one or more programming instructions for performing a method for ranking contexts within which a word in a corpus appears, the method comprising:
for each word in a corpus, determining a local ranking for each of one or more contexts, wherein each context comprises a word sequence located in a particular arrangement relative to the word, wherein the particular arrangement comprises the relative ordering of the word sequence and the word, wherein the local ranking comprises an ordering based on the frequency with which each context appears with the word;
for each context, determining a global ranking;
computing a statistic for each context based on one or more of the local ranking and the global ranking, and
ordering the one or more contexts based on the computed statistic for each context.
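The ranking steps recited in this claim can be illustrated with a minimal sketch. It is not the patented implementation. It assumes the corpus has already been reduced to (word, context) observation pairs, that rank 1 denotes the most frequent item, that the global ranking orders contexts by their total frequency in the corpus, and that the statistic is the global rank divided by the local rank. The function names and the "strong _" notation for a context (a word sequence preceding the word) are hypothetical.

```python
from collections import Counter, defaultdict


def rank_by_frequency(counts):
    """Map each key to a 1-based rank; rank 1 is the most frequent key.

    Ties are broken alphabetically (an assumption of this sketch).
    """
    ordered = sorted(counts, key=lambda k: (-counts[k], k))
    return {key: position for position, key in enumerate(ordered, start=1)}


def rank_contexts(pairs):
    """Return, for each word, its contexts ordered by the assumed statistic."""
    local_counts = defaultdict(Counter)  # word -> context -> frequency
    global_counts = Counter()            # context -> frequency over the corpus
    for word, context in pairs:
        local_counts[word][context] += 1
        global_counts[context] += 1

    global_rank = rank_by_frequency(global_counts)
    ordered_contexts = {}
    for word, counts in local_counts.items():
        local_rank = rank_by_frequency(counts)
        # Assumed statistic: global rank divided by local rank.
        stat = {c: global_rank[c] / local_rank[c] for c in counts}
        # Highest statistic first; ties broken alphabetically.
        ordered_contexts[word] = sorted(stat, key=lambda c: (-stat[c], c))
    return ordered_contexts


pairs = [
    ("tea", "strong _"), ("tea", "strong _"), ("tea", "the _"),
    ("car", "the _"), ("car", "the _"), ("car", "fast _"),
    ("dog", "the _"),
]
print(rank_contexts(pairs)["tea"])  # ['strong _', 'the _']
```

Under these assumptions, "strong _" outranks "the _" for "tea" because it is the word's most frequent context while being less frequent than "the _" across the corpus as a whole.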
Abstract
A method and system for performing automatic text analysis are described. A local ranking for one or more contexts with respect to a word and a global ranking for one or more contexts are generated. The rankings are based on the frequency with which the contexts appear in a corpus. A statistic may be generated using the local and global rankings, such as a log ratio rank statistic equal to the logarithm of the global rank divided by the local rank, to measure the similarity of contexts with respect to words with which they combine. A source matrix of word to context values is then created. Singular value decomposition is used to create sub-matrices from the source matrix. Vectors from the sub-matrices corresponding to context(s) and/or word(s) are then selected to determine term-term or context-context similarity or term-context correspondence.
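The log ratio rank statistic mentioned in the abstract can be written out in symbols. The notation here is introduced only for illustration and is not taken from the patent: $R_G(c)$ is the global rank of context $c$, and $R_L(w, c)$ is the local rank of $c$ with respect to word $w$. The sketch reads the abstract's wording as the logarithm of the quotient, and the base of the logarithm is assumed to be $e$.

```latex
\[
  \mathrm{LRR}(w, c) \;=\; \log \frac{R_G(c)}{R_L(w, c)}
\]
% Worked example (hypothetical ranks): R_G(c) = 50 and R_L(w, c) = 2 give
\[
  \mathrm{LRR}(w, c) \;=\; \log \frac{50}{2} \;=\; \log 25 \;\approx\; 3.22
\]
```

If rank 1 is assigned to the most frequent item (an assumption), a context that is rare across the corpus but common with a particular word receives a large positive value, while a context whose local and global ranks coincide receives zero.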
Citations
11 Claims
1. A system for ranking contexts within which a word in a corpus appears, comprising:
a processor; and
a processor-readable storage medium operably connected to the processor, wherein the processor-readable storage medium contains one or more programming instructions for performing a method for ranking contexts within which a word in a corpus appears, the method comprising:
for each word in a corpus, determining a local ranking for each of one or more contexts, wherein each context comprises a word sequence located in a particular arrangement relative to the word, wherein the particular arrangement comprises the relative ordering of the word sequence and the word, wherein the local ranking comprises an ordering based on the frequency with which each context appears with the word;
for each context, determining a global ranking;
computing a statistic for each context based on one or more of the local ranking and the global ranking, and
ordering the one or more contexts based on the computed statistic for each context.
Dependent claims: 2, 3
4. A system for ranking contexts within which a word in a corpus appears, comprising:
a processor; and
a processor-readable storage medium operably connected to the processor, wherein the processor-readable storage medium contains one or more programming instructions for performing a method for ranking contexts within which a word in a corpus appears, the method comprising:
for each word in a corpus, determining a local ranking for each of one or more contexts, wherein each context comprises a word sequence located in a particular arrangement relative to the word, wherein the particular arrangement comprises the relative ordering of the word sequence and the word, wherein the local ranking comprises an ordering based on the frequency with which each context appears with the word,
for each context, determining a global ranking,
computing a statistic for each context based on one or more of the local ranking and the global ranking, and
producing a source matrix of words by contexts in which an attribute of each context is used as a value for the context-word combination.
Dependent claims: 5, 6, 7, 8, 9, 10, 11
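The source matrix step of claim 4 can be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: observations are (word, context) pairs, rank 1 is the most frequent item, the attribute stored in each cell is the natural logarithm of global rank divided by local rank, and unobserved word-context combinations are stored as 0.0. The name `build_source_matrix` and the "strong _" context notation are hypothetical.

```python
import math
from collections import Counter, defaultdict


def build_source_matrix(pairs):
    """Return (words, contexts, matrix); matrix[i][j] is for words[i], contexts[j]."""
    local_counts = defaultdict(Counter)  # word -> context -> frequency
    global_counts = Counter()            # context -> frequency over the corpus
    for word, context in pairs:
        local_counts[word][context] += 1
        global_counts[context] += 1

    def ranks(counts):
        # Rank 1 is the most frequent key; ties broken alphabetically (assumption).
        ordered = sorted(counts, key=lambda k: (-counts[k], k))
        return {k: i for i, k in enumerate(ordered, start=1)}

    global_rank = ranks(global_counts)
    words = sorted(local_counts)
    contexts = sorted(global_counts)
    column = {c: j for j, c in enumerate(contexts)}

    # Unobserved combinations stay at 0.0 (an assumption of this sketch).
    matrix = [[0.0] * len(contexts) for _ in words]
    for i, word in enumerate(words):
        for context, r_local in ranks(local_counts[word]).items():
            # Assumed cell attribute: log of global rank over local rank.
            matrix[i][column[context]] = math.log(global_rank[context] / r_local)
    return words, contexts, matrix


pairs = [
    ("tea", "strong _"), ("tea", "strong _"), ("tea", "the _"),
    ("car", "the _"), ("car", "the _"), ("car", "fast _"),
    ("dog", "the _"),
]
words, contexts, matrix = build_source_matrix(pairs)
print(words)     # ['car', 'dog', 'tea']
print(contexts)  # ['fast _', 'strong _', 'the _']
```

One limitation of this choice: a cell is also 0.0 when the local and global ranks happen to coincide, so zero does not distinguish "unobserved" from "equal ranks". A matrix of this shape could then be passed to a decomposition routine such as `numpy.linalg.svd` to obtain the sub-matrices the abstract describes.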
Specification