System and method of inferring synonyms using ensemble learning techniques
Abstract
Embodiments are directed to a method of utilizing an ensemble of distributional semantics systems in conjunction with a domain term extractor for generating domain-specific synonyms. The method allows for extraction of high-quality, domain-specific synonyms that can be used in an information handling system, such as a question-answer system or in an information retrieval (IR) system, capable of processing natural language. According to embodiments, the domain term extractor identifies the words for which synonyms are sought, and the ensemble of distributional semantics systems determines the synonyms.
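The two-stage flow described in the abstract (a domain term extractor followed by an ensemble of distributional semantics systems) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the term extractor is reduced to a vocabulary difference against an open-domain word list, and the two ensemble members are toy co-occurrence models with different window sizes standing in for the separately trained systems. It does not use or reproduce DiSSect or Glimpse, and the function names are hypothetical.

```python
import math
from collections import Counter, defaultdict


def extract_domain_terms(domain_tokens, open_tokens):
    """Candidate list: terms in the domain source that are absent from the
    open-domain source (an assumed, simplest-possible comparison)."""
    open_vocab = set(open_tokens)
    return sorted(t for t in set(domain_tokens) if t not in open_vocab)


def train_cooccurrence(tokens, window):
    """Toy distributional model: count neighbours within +/- `window` tokens."""
    vectors = defaultdict(Counter)
    for i, word in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                vectors[word][tokens[j]] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(
        sum(x * x for x in v.values())
    )
    return dot / norm if norm else 0.0


def ensemble_synonyms(term, models, top_k=3):
    """Average each model's similarity scores and return the top-ranked words."""
    scores = Counter()
    for vectors in models:
        if term not in vectors:
            continue
        for other, vec in vectors.items():
            if other != term:
                scores[other] += cosine(vectors[term], vec) / len(models)
    return [word for word, _ in scores.most_common(top_k)]


# Hypothetical toy data, not taken from the patent.
domain_tokens = "doctor treats hypertension daily doctor treats htn daily".split()
open_tokens = "the doctor treats people daily".split()

candidates = extract_domain_terms(domain_tokens, open_tokens)
# Each ensemble member is trained separately, here with a different window.
models = [train_cooccurrence(domain_tokens, w) for w in (1, 2)]
for term in candidates:
    print(term, ensemble_synonyms(term, models, top_k=1))
```

Averaging the per-model scores is only one way to combine ensemble members; rank voting or a learned weighting would fit the same structure.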
14 Claims
1. A computer implemented method in a data processing system comprising a processor and a memory comprising instructions which are executed by the processor to cause the processor to implement a method for utilizing information sources to identify domain-specific synonyms for an information handling system capable of processing natural language, the method comprising:
receiving, by the processor, a domain-specific information source;
identifying, by the processor, domain-specific terms to form a candidate synonym list, by comparing each term of the domain-specific information source to an open domain source;
applying, by the processor, at least two ensemble machine learning techniques to train a synonym finder, wherein the synonym finder is a distributional semantics ensemble comprising at least two distributional semantics systems, wherein each distributional semantics system is trained separately, wherein one distributional semantics system is DiSSect, and another distributional semantics system is Glimpse; and
utilizing, by the processor, the synonym finder to find synonyms of terms in the candidate synonym list.
Dependent claims: 2, 3, 4, 5
6. A system for identifying domain-specific synonyms for an information handling system capable of processing natural language, comprising:
a processor configured to:
receive a domain-specific information source;
identify domain-specific terms to form a candidate synonym list, by comparing each term of the domain-specific information source to an open domain source;
apply at least two ensemble machine learning techniques to train a synonym finder, wherein the synonym finder is a distributional semantics ensemble comprising at least two distributional semantics systems, wherein each distributional semantics system is trained separately, wherein one distributional semantics system is DiSSect, and another distributional semantics system is Glimpse; and
utilize the synonym finder to find synonyms of terms in the candidate synonym list.
Dependent claims: 7, 8, 9, 14
10. A computer program product for identifying domain-specific synonyms for an information handling system capable of processing natural language, the computer program product comprising a computer readable storage medium having program instructions embodied therewith, the program instructions executable by a processor to cause the processor to:
receive, by the processor, a domain-specific information source;
identify, by the processor, domain-specific terms to form a candidate synonym list, by comparing each term of the domain-specific information source to an open domain source;
apply, by the processor, at least two ensemble machine learning techniques to train a synonym finder, wherein the synonym finder is a distributional semantics ensemble comprising at least two distributional semantics systems, wherein each distributional semantics system is trained separately, wherein one distributional semantics system is DiSSect, and another distributional semantics system is Glimpse; and
utilize, by the processor, the synonym finder to find synonyms of terms in the candidate synonym list.
Dependent claims: 11, 12, 13
Specification