Method for tagging collocations in text
First Claim
1. A method for performing thematic part-of-speech tagging for collocations having content-word pairs in a natural language text processing system comprising the steps of:
- identifying collocations of content-word pairs in a large corpus of text;
- calculating, for each of said collocation content-word pairs identified, a variability factor which is a measure of variability of said collocation content-word pairs occurring in said text;
- storing said collocation content-word pairs and associated variability factors in a collocation database; and
- using said database to tag collocation content-word pairs according to said variability factors, wherein collocation content-word pairs with high variability factors are tagged as having a verb and a noun thereat and collocation content-word pairs with low variability factors are tagged as having an adjective and a noun thereat or a noun and noun thereat.
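The four steps of the claim can be illustrated with a minimal Python sketch. The claim does not give a formula for the variability factor, so the measure used here (the share of a pair's occurrences that fall outside its most frequent surface form) is an assumption made for illustration only. The stop list, the crude suffix stripper, the `min_count` cutoff, and the `threshold` value are likewise hypothetical stand-ins, not details taken from the patent.

```python
from collections import Counter, defaultdict

# Hypothetical stop list and suffix list; a real system would rely on a
# proper lexicon and morphological analyzer.
FUNCTION_WORDS = {"the", "a", "an", "of", "to", "in", "and", "is", "was", "by", "for"}
SUFFIXES = ("ing", "ed", "es", "s")


def stem(word):
    """Crude suffix stripper (an assumption standing in for real morphology)."""
    if word.endswith("ss"):
        return word  # keep words such as "express" intact
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def build_collocation_db(sentences, min_count=2):
    """Steps 1-3: find adjacent content-word pairs, score them, store them.

    Returns a dict mapping each stemmed pair to (count, variability factor).
    """
    forms = defaultdict(Counter)  # stemmed pair -> counts of its surface forms
    for sentence in sentences:
        words = [w for w in sentence.lower().split() if w not in FUNCTION_WORDS]
        for left, right in zip(words, words[1:]):
            forms[(stem(left), stem(right))][(left, right)] += 1

    db = {}
    for pair, surface in forms.items():
        total = sum(surface.values())
        if total >= min_count:
            # Assumed measure: share of occurrences outside the dominant form.
            db[pair] = (total, 1.0 - max(surface.values()) / total)
    return db


def tag_pair(db, left, right, threshold=0.3):
    """Step 4: tag a pair from its stored variability factor, or None if unknown."""
    entry = db.get((stem(left), stem(right)))
    if entry is None:
        return None
    _, variability = entry
    return "verb-noun" if variability > threshold else "adjective-noun or noun-noun"


corpus = [
    "the company expressed concern",
    "analysts express concerns",
    "expressing concern is common",
    "the operating system crashed",
    "a new operating system shipped",
    "operating system vendors agreed",
]
db = build_collocation_db(corpus)
print(tag_pair(db, "expressed", "concern"))
print(tag_pair(db, "operating", "system"))
```

In this toy corpus, "express concern" appears in three different inflected forms and so scores high under the assumed measure, while "operating system" always appears in the same form and scores zero. That matches the intuition in the claim that variable pairs behave like verb-noun combinations and fixed pairs behave like adjective-noun or noun-noun compounds.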
Abstract
A technique for injecting corpus-based preference into syntactic text parsing is provided. Specifically, the problem of tagging content-word pairs by part-of-speech is solved by using thematic analysis. A new measure of the fixed or variable nature of such word pairs is created and used to classify word pairs as noun-verb, adjective-noun, or verb-noun.
Specification