System and method for identifying special word usage in a document
Abstract
A method of identifying potential novel word usage in a document comprises determining a part-of-speech assignment for each word in the document using a first part-of-speech tagger, determining a part-of-speech assignment for each word in the document using a second part-of-speech tagger different from the first part-of-speech tagger, and comparing the part-of-speech assignment of the first and second part-of-speech taggers. The method then generates a differential word set having words with different part-of-speech assignment by the first and second part-of-speech taggers. The words in the differential word set are candidates of words of novel usage.
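The comparison step described in the abstract can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration: the toy lexicon, the two toy taggers (`simple_tagger` and `contextual_tagger`), and the coarse tag names are hypothetical and are not the taggers used in the patent. The sketch only shows how two different taggers can be run over the same tokens and how the words on which they disagree can be collected.

```python
from typing import Callable, List, Tuple

Tagger = Callable[[List[str]], List[str]]

# Hypothetical toy lexicon: the most common tag for a few words (illustration only).
LEXICON = {
    "i": "PRON", "you": "PRON", "the": "DET", "a": "DET",
    "will": "AUX", "google": "NOUN", "answer": "NOUN",
}

def simple_tagger(tokens: List[str]) -> List[str]:
    """Context-free tagger: looks each word up in the lexicon, defaulting to NOUN."""
    return [LEXICON.get(tok.lower(), "NOUN") for tok in tokens]

def contextual_tagger(tokens: List[str]) -> List[str]:
    """Starts from the lexicon tags, then applies one context rule:
    a NOUN directly after an auxiliary is retagged as VERB."""
    tags = simple_tagger(tokens)
    for i in range(1, len(tags)):
        if tags[i - 1] == "AUX" and tags[i] == "NOUN":
            tags[i] = "VERB"
    return tags

def differential_word_set(
    tokens: List[str], first: Tagger, second: Tagger
) -> List[Tuple[str, str, str]]:
    """Return (word, first_tag, second_tag) for every word the taggers tag differently."""
    first_tags = first(tokens)
    second_tags = second(tokens)
    return [
        (word, t1, t2)
        for word, t1, t2 in zip(tokens, first_tags, second_tags)
        if t1 != t2
    ]

tokens = "I will google the answer".split()
candidates = differential_word_set(tokens, contextual_tagger, simple_tagger)
print(candidates)
```

In this toy run the only disagreement is on "google", which the context-aware tagger labels as a verb and the lexicon lookup labels as a noun, so it becomes the single candidate for novel usage.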
173 Citations
19 Claims
1. A method of identifying potential novel word usage in a document, comprising:
determining a part-of-speech assignment for each word in the document using a first part-of-speech tagger;
determining a part-of-speech assignment for each word in the document using a second part-of-speech tagger different from the first part-of-speech tagger;
comparing the part-of-speech assignment of the first and second part-of-speech taggers;
generating a differential word set having words with different part-of-speech assignment by the first and second part-of-speech taggers, the words in the differential word set being candidates of words of novel usage; and
determining a weight to each word in the differential word set in response to the part-of-speech assignment of the word by the first part-of-speech tagger.
Dependent claims: 2, 3, 4, 5
6. A method of identifying potential novel word usage in a document, comprising:
determining a part-of-speech assignment for each word in the document using a first part-of-speech tagger;
determining a part-of-speech assignment for each word in the document using a second part-of-speech tagger different from the first part-of-speech tagger;
comparing the part-of-speech assignment of the first and second part-of-speech taggers;
generating a differential word set having words with different part-of-speech assignment by the first and second part-of-speech taggers, the words in the differential word set being candidates of words of novel usage; and
determining a weight to each word in the differential word set, wherein determining a weight to each word comprises determining a weight in response to a deviation from an expected part-of-speech usage of the word.
Dependent claims: 7, 8, 9
10. A computer-readable article encoded with a computer-executable process, the process comprising:
assigning a first part-of-speech tag to words in at least one document according to a first part-of-speech tagging method;
assigning a second part-of-speech tag for words in the at least one document according to a second part-of-speech tagging method more simplistic than the first part-of-speech tagging method;
comparing the first and second part-of-speech tags;
generating a differential word set having words with different first and second part-of-speech tags; and
determining a weight to each word in the differential word set in response to the first part-of-speech tag of the word.
Dependent claims: 11, 12, 13, 14, 15, 16
17. A system for identifying potential novel word usage in a document set comprising:
a microprocessor; and
a series of computer instructions comprising a method of;
assigning a first part-of-speech tag to words in at least one document according to a first part-of-speech tagging method;
assigning a second part-of-speech tag for words in at least one document according to a second part-of-speech tagging method more simplistic than the first part-of-speech tagging method;
comparing the first and second part-of-speech tags;
generating a differential word set having words with different first and second part-of-speech tags; and
selecting words of novel usage from the differential word set meeting a predetermined weight criteria.
Dependent claims: 18, 19
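The weighting and selection steps recited in the independent claims can be sketched as follows. The claims do not give a formula, so the one used here is purely an assumption for illustration: a hypothetical per-tag base weight (a weight "in response to" the first tagger's assignment) is multiplied by one minus a hypothetical expected probability of that tag for the word (a "deviation from an expected part-of-speech usage"), and words are kept when the result meets a threshold (a "predetermined weight criteria"). The tables `TAG_WEIGHT` and `EXPECTED_USAGE`, the default values, and the threshold are all invented.

```python
from typing import List, Tuple

# Hypothetical base weight per tag assigned by the first tagger (illustration only).
TAG_WEIGHT = {"VERB": 1.0, "NOUN": 0.8, "ADJ": 0.5}
DEFAULT_TAG_WEIGHT = 0.25

# Hypothetical expected usage table: estimated P(tag | word) from some reference corpus.
EXPECTED_USAGE = {
    "google": {"NOUN": 0.9, "VERB": 0.1},
    "run": {"VERB": 0.7, "NOUN": 0.3},
}

def weight(word: str, first_tag: str) -> float:
    """Assumed formula: base weight for the tag, scaled by how unexpected the tag is."""
    base = TAG_WEIGHT.get(first_tag, DEFAULT_TAG_WEIGHT)
    expected = EXPECTED_USAGE.get(word.lower(), {}).get(first_tag, 0.0)
    return base * (1.0 - expected)

def select_novel(
    diff_set: List[Tuple[str, str, str]], threshold: float
) -> List[Tuple[str, float]]:
    """Keep (word, weight) pairs from the differential set whose weight meets the threshold."""
    scored = [(word, weight(word, first_tag)) for word, first_tag, _ in diff_set]
    return [(word, score) for word, score in scored if score >= threshold]

# Differential word set entries: (word, first tagger's tag, second tagger's tag).
diff_set = [("google", "VERB", "NOUN"), ("run", "NOUN", "VERB")]
selected = select_novel(diff_set, threshold=0.6)
print(selected)
```

With these invented numbers, "google" used as a verb scores well above the threshold because that tag is rare for the word, while "run" used as a noun falls below it. A real system would need weights and expected-usage statistics derived from data rather than hand-written tables.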
Specification