Method for word sense disambiguation for homonym words based on part of speech (POS) tag of a non-homonym word
Abstract
A computer-implemented method (600) and a system (222, 208) for processing a text stream. The method comprises accessing (602) the text stream; parsing (604) the text stream; analyzing (606) a first collection of words to identify a homonym candidate; generating (608) a homonym word pattern, the homonym word pattern comprising at least one word of the first collection of words; determining (610), for at least one word of the homonym word pattern, a first context element; generating (612) a homonym context pattern; analyzing (614) a second collection of words to identify a non-homonym candidate having a non-homonym context pattern at least partially matching the homonym context pattern, the non-homonym candidate being associated with a lexical tag; and assigning (616) the lexical tag associated with the non-homonym candidate to the homonym candidate.
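As a rough illustration of steps 604 to 608, the Python sketch below parses a text stream into words, flags homonym candidates against a small lexicon, and selects the homonym word pattern by distance. Following the claims, distance is counted as the number of words separating a word from the homonym candidate, so a directly adjacent word has distance 0. The lexicon `HOMONYMS`, the regular-expression tokenizer, and the `max_gap` parameter are assumptions made for illustration; they are not taken from the patent.

```python
import re

# Hypothetical homonym lexicon (an assumption for illustration); a real system
# would draw on a dictionary that lists several meanings per word.
HOMONYMS = {"bank", "bat", "lead", "saw"}


def parse(text: str) -> list[str]:
    """Break a text stream down into a collection of lowercase words."""
    return re.findall(r"[a-z']+", text.lower())


def find_homonym_candidates(words: list[str]) -> list[int]:
    """Return the positions of words that appear in the homonym lexicon."""
    return [i for i, word in enumerate(words) if word in HOMONYMS]


def homonym_word_pattern(words: list[str], pos: int, max_gap: int = 1) -> list[str]:
    """Select the words separated from the homonym at `pos` by at most `max_gap` words.

    The distance is the number of words lying between the two positions,
    so a word directly next to the homonym has distance 0.
    """
    return [
        word
        for j, word in enumerate(words)
        if j != pos and abs(j - pos) - 1 <= max_gap
    ]
```

For "He sat on the bank of the river.", the candidate "bank" sits at position 4, and with `max_gap=1` the pattern keeps "on", "the", "of", "the" while dropping "sat" and "river", which are each separated from "bank" by two words.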
18 Claims
1. A computer-implemented method of processing a first text stream for execution by a processor, the method comprising:
accessing, from a non-transitory computer-readable medium, the first text stream;
parsing the first text stream by breaking the first text stream down into a first collection of words;
analysing the first collection of words to identify a homonym candidate, the homonym candidate being associated with a first meaning and a second meaning;
generating a homonym word pattern, the homonym word pattern comprising at least one word of the first collection of words, the at least one word being selected based on a distance between the at least one word and the homonym candidate in the first text stream, the distance being a number of words separating the at least one word of the first collection of words from the homonym candidate in the first text stream;
determining, for at least one word of the homonym word pattern, a first context element;
generating a homonym context pattern, the homonym context pattern being at least partially based on the first context element;
parsing a second text stream by breaking the second text stream down into a second collection of words, the second collection of words being distinct from the first collection of words, the second text stream being in a same language as the first text stream;
analysing the second collection of words to identify a non-homonym candidate, the non-homonym candidate being associated with a lexical tag;
if a non-homonym context pattern at least partially matches the homonym context pattern, the non-homonym context pattern being at least partially based on a second context element determined for at least one word of a non-homonym word pattern, assigning the lexical tag associated with the non-homonym candidate to the homonym candidate;
storing, to a memory coupled to the processor, the lexical tag; and
rendering the lexical tag on a display of an electronic device.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13.
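The matching and assignment steps of claim 1 can be pictured with the minimal Python sketch below. It assumes that the context element of each pattern word is its part-of-speech tag (in line with the title), that a context pattern is a set of (offset, tag) pairs around the candidate, and that "at least partially matches" means the share of the homonym's pairs also present in the non-homonym's pattern reaches a threshold. `CONTEXT_POS`, `HOMONYMS`, `context_pattern`, `match_score`, `assign_lexical_tag`, `max_gap`, and `threshold` are hypothetical names; a real system would obtain context tags from a tagger rather than a hand-written dictionary.

```python
# Hypothetical POS lexicon for words of the first text stream (an assumption
# standing in for a real tagger); the POS tag serves as the context element.
CONTEXT_POS = {"we": "PRON", "they": "PRON", "on": "ADP", "the": "DET"}
HOMONYMS = {"bank"}  # hypothetical homonym lexicon


def context_pattern(tags, pos, max_gap=1):
    """Return {(offset, tag)} for words separated from `pos` by at most max_gap words."""
    reach = max_gap + 1  # largest offset whose separating-word count is max_gap
    return {
        (off, tags[pos + off])
        for off in range(-reach, reach + 1)
        if off != 0 and 0 <= pos + off < len(tags)
    }


def match_score(homonym_pat, other_pat):
    """Share of the homonym context pattern that also occurs in `other_pat`."""
    if not homonym_pat:
        return 0.0
    return len(homonym_pat & other_pat) / len(homonym_pat)


def assign_lexical_tag(first_words, homonym_pos, tagged_second, max_gap=1, threshold=0.5):
    """Return the lexical tag of the best partially matching non-homonym, or None."""
    first_tags = [CONTEXT_POS.get(w, "UNK") for w in first_words]
    homonym_pat = context_pattern(first_tags, homonym_pos, max_gap)
    second_tags = [tag for _, tag in tagged_second]
    best_tag, best_score = None, 0.0
    for i, (word, tag) in enumerate(tagged_second):
        if word in HOMONYMS:
            continue  # only non-homonym candidates can lend their tag
        score = match_score(homonym_pat, context_pattern(second_tags, i, max_gap))
        if score >= threshold and score > best_score:
            best_tag, best_score = tag, score
    return best_tag
```

With "we bank on the team" as the first stream and a tagged second stream "they rely on the data", the non-homonym "rely" has the same pattern as "bank" (pronoun one word before, adposition one word after, determiner two words after), so its VERB tag is assigned to "bank". Scoring on the homonym's pattern rather than on a symmetric overlap is a design choice of this sketch, not something the claim specifies.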
14. A computer-implemented method of processing a first text stream for execution by a processor, the method comprising:
accessing, from a non-transitory computer-readable medium, the first text stream;
parsing the first text stream by breaking the first text stream down into a first collection of words;
analysing the first collection of words to identify a homonym candidate, the homonym candidate being associated with a first meaning and a second meaning;
generating a homonym word pattern, the homonym word pattern comprising at least one word of the collection of words, the at least one word being selected based on a distance between the at least one word and the homonym candidate in the first text stream, the distance being a number of words separating the at least one word of the first collection of words from the homonym candidate in the first text stream;
determining, for at least one word of the homonym word pattern, a first context element;
generating a homonym context pattern, the homonym context pattern being at least partially based on the first context element;
parsing a second text stream by breaking the second text stream down into a second collection of words, the second collection of words being distinct from the first collection of words, the second text stream being in a same language as the first text stream;
analysing the second collection of words to identify a non-homonym candidate, the non-homonym candidate being associated with a lexical tag;
if a non-homonym context pattern at least partially matches the homonym context pattern, the non-homonym context pattern being at least partially based on a second context element determined for at least one word of a non-homonym word pattern, determining which one of the first meaning and the second meaning of the homonym candidate is to be retained based on the lexical tag associated with the non-homonym candidate; and
rendering a retained meaning on a display of an electronic device.
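Claim 14 replaces tag assignment with deciding which meaning of the homonym candidate to retain. One hedged way to picture this step, assuming a hypothetical sense inventory `SENSES` in which each meaning is keyed by the part of speech it takes, is to keep the meaning whose part of speech equals the lexical tag borrowed from the matching non-homonym candidate. The inventory and the function name `retain_meaning` are illustrative assumptions, not part of the patent.

```python
from typing import Optional

# Hypothetical sense inventory (an assumption for illustration): each homonym
# lists its meanings together with the part of speech each meaning takes.
SENSES = {
    "bank": [("NOUN", "a financial institution"), ("VERB", "to rely on something")],
    "saw": [("NOUN", "a cutting tool"), ("VERB", "past tense of 'see'")],
}


def retain_meaning(homonym: str, lexical_tag: str) -> Optional[str]:
    """Return the meaning whose part of speech equals the assigned lexical tag."""
    for pos_tag, meaning in SENSES.get(homonym, []):
        if pos_tag == lexical_tag:
            return meaning
    return None  # no compatible meaning: leave the homonym unresolved
```

Under this assumption, a VERB tag borrowed from "rely" would retain the "to rely on something" meaning of "bank". A part-of-speech tag alone cannot separate two meanings that share a part of speech (for example, two noun senses), which is why the sketch returns the first compatible meaning only.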
15. A computer-implemented system for processing a first text stream, the system comprising:
a non-transitory computer-readable medium;
a processor configured to perform:
accessing, from the non-transitory computer-readable medium, the first text stream;
parsing the first text stream by breaking the first text stream down into a first collection of words;
analysing the first collection of words to identify a homonym candidate, the homonym candidate being associated with a first meaning and a second meaning;
generating a homonym word pattern, the homonym word pattern comprising at least one word of the first collection of words, the at least one word being selected based on a distance between the at least one word and the homonym candidate in the first text stream, the distance being a number of words separating the at least one word of the first collection of words from the homonym candidate in the first text stream;
determining, for at least one word of the homonym word pattern, a first context element;
generating a homonym context pattern, the homonym context pattern being at least partially based on the first context element;
parsing a second text stream by breaking the second text stream down into a second collection of words, the second collection of words being distinct from the first collection of words, the second text stream being in a same language as the first text stream;
analysing the second collection of words to identify a non-homonym candidate, the non-homonym candidate being associated with a lexical tag;
if a non-homonym context pattern at least partially matches the homonym context pattern, the non-homonym context pattern being at least partially based on a second context element determined for at least one word of a non-homonym word pattern, assigning the lexical tag associated with the non-homonym candidate to the homonym candidate; and
storing, to a memory coupled to the processor, the lexical tag assigned to the homonym candidate.
Dependent claims: 16, 17, 18.
Specification