Extraction of lexical kernel units from a domain-specific lexicon
First Claim
1. A computer program product comprising:
- a tangible storage medium readable by a processor and storing instructions for execution by the processor to perform a method comprising:
receiving a candidate lexical kernel unit comprising a word token sequence that includes two or more words;
retrieving domain terms that contain the two or more words from a terminology resource file of domain terms associated with a domain;
analyzing the candidate lexical kernel unit and the retrieved domain terms to determine whether the candidate lexical kernel unit satisfies specified criteria for use as a building block by a natural-language processing (NLP) tool for building larger lexical units in the domain, each of the larger lexical units including a greater number of words than the candidate lexical kernel unit;
identifying the candidate lexical kernel unit as a lexical kernel unit based on determining that the candidate lexical kernel unit satisfies the specified criteria; and
outputting the lexical kernel unit to a domain-specific lexical kernel unit file for input to the NLP tool for use as a lexical resource in parsing natural language text in the domain, the parsing including identifying domain-specific terms in the natural language text in the domain.
Abstract
According to an aspect, a candidate lexical kernel unit that includes a word token sequence having two or more words is received. Domain terms that contain the two or more words are retrieved from a terminology resource file of domain terms associated with a domain. The candidate lexical kernel unit and the retrieved domain terms are analyzed to determine whether the candidate lexical kernel unit satisfies specified criteria for use as a building block by a natural-language processing (NLP) tool for building larger lexical units in the domain. Each of the larger lexical units includes a greater number of words than the candidate lexical kernel unit. The candidate lexical kernel unit is identified as a lexical kernel unit based on determining that the candidate lexical kernel unit satisfies the specified criteria. The lexical kernel unit is output to a domain-specific lexical kernel unit file for input to the NLP tool.
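The steps summarized above can be illustrated with a minimal sketch. It is not the patented implementation: the source does not state what the "specified criteria" are, so the threshold used here (the candidate occurs as a contiguous word sequence in at least `min_larger_terms` longer domain terms) is an assumption made only for illustration. The function names, the one-term-per-line output format, and the sample medical terms in the usage note are likewise hypothetical.

```python
from pathlib import Path


def contains_sequence(term_words, candidate_words):
    """Return True if candidate_words occurs contiguously inside term_words."""
    n = len(candidate_words)
    return any(
        term_words[i:i + n] == candidate_words
        for i in range(len(term_words) - n + 1)
    )


def retrieve_domain_terms(candidate, domain_terms):
    """Retrieve the domain terms that contain every word of the candidate."""
    words = set(candidate.lower().split())
    return [t for t in domain_terms if words <= set(t.lower().split())]


def is_kernel_unit(candidate, retrieved_terms, min_larger_terms=2):
    """Assumed criterion (not from the source): the candidate is a contiguous
    part of at least min_larger_terms domain terms that have more words."""
    cand_words = candidate.lower().split()
    if len(cand_words) < 2:
        # A candidate must be a word token sequence of two or more words.
        return False
    larger_terms = [
        t for t in retrieved_terms
        if len(t.split()) > len(cand_words)
        and contains_sequence(t.lower().split(), cand_words)
    ]
    return len(larger_terms) >= min_larger_terms


def extract_kernel_units(candidates, domain_terms, out_path):
    """Analyze each candidate and write accepted kernel units to out_path."""
    kernel_units = []
    for candidate in candidates:
        retrieved = retrieve_domain_terms(candidate, domain_terms)
        if is_kernel_unit(candidate, retrieved):
            kernel_units.append(candidate)
    # Hypothetical file format: one lexical kernel unit per line.
    Path(out_path).write_text(
        "".join(unit + "\n" for unit in kernel_units), encoding="utf-8"
    )
    return kernel_units
```

As a usage example, given the hypothetical domain terms "acute renal failure", "chronic renal failure", "renal failure syndrome", "renal artery", and "heart failure", calling `extract_kernel_units(["renal failure", "renal artery", "failure syndrome"], terms, "kernel_units.txt")` accepts only "renal failure", because it is the only candidate embedded in at least two longer terms. An NLP tool could then load the resulting file as a lexical resource.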
12 Claims
1. A computer program product comprising:
- a tangible storage medium readable by a processor and storing instructions for execution by the processor to perform a method comprising:
receiving a candidate lexical kernel unit comprising a word token sequence that includes two or more words;
retrieving domain terms that contain the two or more words from a terminology resource file of domain terms associated with a domain;
analyzing the candidate lexical kernel unit and the retrieved domain terms to determine whether the candidate lexical kernel unit satisfies specified criteria for use as a building block by a natural-language processing (NLP) tool for building larger lexical units in the domain, each of the larger lexical units including a greater number of words than the candidate lexical kernel unit;
identifying the candidate lexical kernel unit as a lexical kernel unit based on determining that the candidate lexical kernel unit satisfies the specified criteria; and
outputting the lexical kernel unit to a domain-specific lexical kernel unit file for input to the NLP tool for use as a lexical resource in parsing natural language text in the domain, the parsing including identifying domain-specific terms in the natural language text in the domain.
Dependent claims: 2, 3, 4, 5, 6, 7, 8
9. A system comprising:
- a memory having computer readable instructions; and
- a processor for executing the computer readable instructions, the computer readable instructions including:
receiving a candidate lexical kernel unit comprising a word token sequence that includes two or more words;
retrieving domain terms that contain the two or more words from a terminology resource file of domain terms associated with a domain;
analyzing the candidate lexical kernel unit and the retrieved domain terms to determine whether the candidate lexical kernel unit satisfies specified criteria for use as a building block by a natural-language processing (NLP) tool for building larger lexical units in the domain, each of the larger lexical units including a greater number of words than the candidate lexical kernel unit;
identifying the candidate lexical kernel unit as a lexical kernel unit based on determining that the candidate lexical kernel unit satisfies the specified criteria; and
outputting the lexical kernel unit to a domain-specific lexical kernel unit file for input to the NLP tool for use as a lexical resource in parsing natural language text in the domain, the parsing including identifying domain-specific terms in the natural language text in the domain.
Dependent claims: 10, 11, 12
Specification