Method and system for learning ontological relations from documents
First Claim
1. A computer-implemented method of learning ontological relations in a document stored in a computer memory, the method executed in a processor-based platform and comprising:
- extracting terms from noun-phrases in the document, wherein the terms are included in word sequences comprising the noun-phrases;
- generating short-distance patterns for both coordinate terms and ontological relations in the document, wherein short-distance patterns comprise terms separated by at most one additional term, and the coordinate terms comprise terms that share the same hypernym/holonym parent;
- generalizing the short-distance patterns for both the coordinate terms and the ontological relations;
- identifying short-distance coordinate relations and ontological relations by grouping the short-distance patterns by verbs or prepositions, extracting the longest common substring within each group of pattern strings, and deriving generalized patterns for every verb or preposition; respectively from the generalized short-distance coordinate terms and short-distance ontological relations; and
- deriving long-distance ontological relations from the identified short-distance coordinate relations and ontological relations, wherein the long-distance ontological relations comprise ontological relations between terms separated by at least two additional terms.
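The generalization step in the claim (group patterns by verb or preposition, then take the longest common substring within each group) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: the `TERM` placeholder, the example pattern strings, the function names, and the choice to compare patterns token by token rather than character by character. The claim does not specify any of these.

```python
from collections import defaultdict


def contains_run(tokens, run):
    """True if `run` occurs as a contiguous slice of `tokens`."""
    k = len(run)
    return any(tuple(tokens[i:i + k]) == tuple(run)
               for i in range(len(tokens) - k + 1))


def longest_common_run(sequences):
    """Longest contiguous token run shared by every sequence.

    Brute force: try slices of the shortest sequence, longest first.
    """
    shortest = min(sequences, key=len)
    n = len(shortest)
    for size in range(n, 0, -1):
        for start in range(n - size + 1):
            candidate = tuple(shortest[start:start + size])
            if all(contains_run(seq, candidate) for seq in sequences):
                return candidate
    return ()


def generalize_patterns(keyed_patterns):
    """Group (key, pattern) pairs by key and generalize each group.

    `key` is the verb or preposition of the pattern; it is assumed to
    have been identified already (e.g. by a POS tagger).
    """
    groups = defaultdict(list)
    for key, pattern in keyed_patterns:
        groups[key].append(pattern.split())
    return {key: " ".join(longest_common_run(seqs))
            for key, seqs in groups.items()}


# Hypothetical short-distance patterns, with terms replaced by TERM.
examples = [
    ("include", "TERM , which include TERM"),
    ("include", "TERM may include TERM"),
    ("include", "TERM include TERM"),
    ("of", "TERM of the TERM"),
    ("of", "TERM of TERM"),
]
generalized = generalize_patterns(examples)
```

The brute-force search is quadratic in the length of the shortest pattern, which is acceptable for short patterns of a few tokens; a suffix-based method would be a reasonable substitute for longer strings.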
Abstract
Embodiments of an ontological determination method for use in natural language processing applications are described. In one embodiment, shallow lexico-syntactic patterns are applied to identify relations by: extracting term features to distinguish relation terms from non-relation terms; identifying coordinate relations for every pair of adjacent terms; identifying short-distance ontological relations (e.g., hypernym or part-whole relations) for other adjacent terms based on term features and lexico-syntactic patterns; and then inferring long-distance hypernym and part-whole relations based on the identified coordinate relations and the short-distance relations.
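The final inference step can be sketched as follows. This is one possible reading, not the procedure the patent specifies: it assumes that, because coordinate terms share the same hypernym/holonym parent, a parent identified for one term by a short-distance relation can be propagated to every term linked to it through a chain of coordinate relations. The function name, the data layout, and the cuisine example are hypothetical.

```python
from collections import defaultdict


def infer_long_distance(short_relations, coordinate_pairs):
    """Propagate (parent, child) relations across coordinate terms.

    short_relations: set of (parent, child) pairs found by short-distance
        patterns (hypernym or part-whole).
    coordinate_pairs: iterable of (a, b) pairs judged to be coordinate terms.
    Returns the set of newly inferred (parent, term) pairs.
    """
    # Coordinate relations are symmetric, so build an undirected graph.
    neighbors = defaultdict(set)
    for a, b in coordinate_pairs:
        neighbors[a].add(b)
        neighbors[b].add(a)

    inferred = set()
    for parent, child in short_relations:
        # Collect every term reachable from `child` via coordinate links.
        seen = {child}
        stack = [child]
        while stack:
            term = stack.pop()
            for other in neighbors[term]:
                if other not in seen:
                    seen.add(other)
                    stack.append(other)
        for term in seen - {child}:
            if term != parent and (parent, term) not in short_relations:
                inferred.add((parent, term))
    return inferred


# Hypothetical input for "cuisines such as Chinese, Japanese, and Thai".
short = {("cuisine", "Chinese")}
coordinate = [("Chinese", "Japanese"), ("Japanese", "Thai")]
long_distance = infer_long_distance(short, coordinate)
```

In this example "Thai" is never adjacent to "cuisine", yet it inherits the parent because it is coordinate with "Japanese", which is in turn coordinate with "Chinese".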
13 Claims
1. A computer-implemented method of learning ontological relations in a document stored in a computer memory, the method executed in a processor-based platform and comprising:
- extracting terms from noun-phrases in the document, wherein the terms are included in word sequences comprising the noun-phrases;
- generating short-distance patterns for both coordinate terms and ontological relations in the document, wherein short-distance patterns comprise terms separated by at most one additional term, and the coordinate terms comprise terms that share the same hypernym/holonym parent;
- generalizing the short-distance patterns for both the coordinate terms and the ontological relations;
- identifying short-distance coordinate relations and ontological relations by grouping the short-distance patterns by verbs or prepositions, extracting the longest common substring within each group of pattern strings, and deriving generalized patterns for every verb or preposition; respectively from the generalized short-distance coordinate terms and short-distance ontological relations; and
- deriving long-distance ontological relations from the identified short-distance coordinate relations and ontological relations, wherein the long-distance ontological relations comprise ontological relations between terms separated by at least two additional terms.
Dependent claims: 2, 3, 4, 5, 6, 13
7. An apparatus, comprising:
- an input stage for inputting noun phrase portions of sentences in the document;
- a database functionally coupled to the input stage and configured to store a knowledge base including words from the document; and
- an ontological learning component coupled to the input stage, and including a first processor identifying relations by extracting terms from the noun phrase portions to distinguish relation terms from non-relation terms, a second processor identifying coordinate relations for every pair of adjacent terms, a third processor identifying short-distance ontological relations for other adjacent terms based on term features and lexico-syntactic patterns, and inferring long-distance ontological relations based on the identified coordinate relations and the short-distance relations,
wherein short-distance ontological relations comprise ontological relations between adjacent terms separated by at most one intermediate term, and the long-distance ontological relations comprise ontological relations between terms separated by at least two additional terms, and further wherein the coordinate terms comprise terms that share the same hypernym/holonym parent, the third processor further generalizing the short-distance pattern for both the coordinate terms and the ontological relations by grouping the short-distance patterns by verbs or prepositions, extracting the longest common substring within each group of pattern strings, and deriving generalized patterns for every verb or preposition.
Dependent claims: 8, 9, 10, 11, 12
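The short-distance and long-distance definitions in the claims reduce to counting the terms that lie between a pair. A minimal sketch, assuming the terms of one sentence are already extracted and kept in document order (the function name and the example term list are hypothetical):

```python
from itertools import combinations


def classify_pairs(terms):
    """Label each ordered term pair by the number of terms between them.

    At most one intervening term  -> "short-distance"
    Two or more intervening terms -> "long-distance"
    """
    labels = {}
    for (i, a), (j, b) in combinations(enumerate(terms), 2):
        between = j - i - 1  # number of terms strictly between a and b
        labels[(a, b)] = "short-distance" if between <= 1 else "long-distance"
    return labels


# Hypothetical term sequence extracted from a single sentence.
labels = classify_pairs(["cuisine", "Chinese", "Japanese", "Thai"])
```

Here only the pair ("cuisine", "Thai") has two intervening terms, so it is the only pair that would have to be resolved by long-distance inference rather than by a short-distance pattern.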
Specification