ONTOLOGY EXPANSION USING ENTITY-ASSOCIATION RULES AND ABSTRACT RELATIONS
Abstract
A method for expanding an initial ontology via processing of communication data, wherein the initial ontology is a structural representation of language elements comprising a set of entities, a set of terms, a set of term-entity associations, a set of entity-association rules, a set of abstract relations, and a set of relation instances. A method for extracting a set of significant phrases and a set of significant phrase co-occurrences from an input set of documents further includes utilizing the terms to identify relations within the training set of communication data, wherein a relation is a pair of terms that appear in proximity to one another.
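The abstract's definition of a relation (a pair of terms that appear in proximity to one another) can be illustrated with a minimal sketch. The function name `find_relations`, the token-window notion of proximity, the default window size, and the restriction to single-token terms are all assumptions made for illustration; the text above does not specify how proximity is measured.

```python
from itertools import combinations

def find_relations(tokens, terms, window=5):
    """Return sorted (term_a, term_b) pairs found within `window` tokens of each other.

    Assumption: terms are single tokens and proximity is a token distance.
    """
    # Positions of known ontology terms in the token stream.
    hits = [(i, tok) for i, tok in enumerate(tokens) if tok in terms]
    relations = set()
    for (i, a), (j, b) in combinations(hits, 2):
        # combinations() preserves order, so j > i always holds here.
        if a != b and j - i <= window:
            relations.add(tuple(sorted((a, b))))
    return sorted(relations)

tokens = "i want to cancel my account because the bill is wrong".split()
terms = {"cancel", "account", "bill"}
print(find_relations(tokens, terms, window=3))
```

With a window of 3, "cancel" and "account" are paired, as are "account" and "bill", while "cancel" and "bill" are too far apart to count as a relation.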
2 Claims
1. A method for refining an initial ontology via processing of communication data, wherein the initial ontology is a structural representation of language elements comprising a set of entities, a set of terms, a set of term-entity associations, a set of entity-association rules, a set of abstract relations, and a set of relation instances, the method comprising:
providing the initial ontology;
providing a training set of communication data;
processing the training set of communication data to extract significant phrases and significant phrase pairs from within the training set of communication data;
creating new abstract relations based on the significant phrase pairs;
creating new relation instances that correspond to the significant term pairs;
storing the significant phrases as ontology terms and associating an entity for the added terms;
storing the new relation instances and new abstract relations to the initial ontology.
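The six components of the ontology and the storing steps of claim 1 can be sketched as a small data structure. This is only an illustration under stated assumptions: the class `Ontology`, the helpers `assign_entity` and `refine`, the substring form of the entity-association rules, and the "UNKNOWN" fallback entity are hypothetical, and the significant phrases and phrase pairs are assumed to have been extracted already.

```python
from dataclasses import dataclass, field

@dataclass
class Ontology:
    entities: set = field(default_factory=set)
    terms: set = field(default_factory=set)
    term_entity: dict = field(default_factory=dict)        # term -> entity
    entity_rules: list = field(default_factory=list)       # (substring, entity); hypothetical rule form
    abstract_relations: set = field(default_factory=set)   # (entity_a, entity_b)
    relation_instances: set = field(default_factory=set)   # (term_a, term_b)

def assign_entity(onto, phrase):
    """Pick an entity for a phrase using the first matching rule (assumed behaviour)."""
    for pattern, entity in onto.entity_rules:
        if pattern in phrase:
            return entity
    return "UNKNOWN"  # hypothetical fallback when no rule matches

def refine(onto, phrases, phrase_pairs):
    """Store extracted phrases as terms, then add relations for the phrase pairs."""
    for phrase in phrases:
        entity = assign_entity(onto, phrase)
        onto.terms.add(phrase)
        onto.entities.add(entity)
        onto.term_entity[phrase] = entity
    for a, b in phrase_pairs:
        # Skip pairs whose phrases were never stored as terms.
        if a in onto.term_entity and b in onto.term_entity:
            onto.abstract_relations.add((onto.term_entity[a], onto.term_entity[b]))
            onto.relation_instances.add((a, b))
    return onto

onto = Ontology(entity_rules=[("account", "ACCOUNT"), ("cancel", "ACTION")])
refine(onto, ["cancel", "my account"], [("cancel", "my account")])
print(onto.abstract_relations, onto.relation_instances)
```

Here an abstract relation links two entities, while a relation instance links the two concrete terms. That reading of the two sets is an interpretation of the claim language, not something the claim states.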
2. A method for extracting a set of significant phrases and a set of significant phrase co-occurrences from an input set of documents, the method comprising:
providing a generic language model;
providing the set of documents;
extracting a set of significant phrases by at least:
  generating a source-specific language model by subdividing each document into meaning units;
  accumulating phrase candidates by creating a set of candidates where each candidate is an n-gram and integrating over the n-grams to compute a prominence score for each n-gram and a stickiness score; and
  filtering the candidate phrases by calculating a frequency for each of the candidate phrases and calculating an overall phrase score for each of the candidate phrases; and
extracting significant phrase co-occurrences by at least:
  iterating over the meaning units and locating the occurrences of individual phrases;
  counting the number of co-occurrences of pairs of phrases in the same meaning unit;
  based on the count, computing a probability of a phrase and a probability of the co-occurrence of a pair of phrases;
  calculating a log-likelihood of the co-occurrence using the probability of the phrase and the probability of the co-occurrence of a pair of phrases;
  identifying a significant co-occurrence of the pair of phrases if the log-likelihood is over a predetermined log-likelihood threshold.
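The co-occurrence steps of claim 2 can be sketched as follows. The claim does not give the log-likelihood formula, so this sketch assumes the log ratio log(P(a,b) / (P(a) P(b))), with probabilities estimated as counts divided by the number of meaning units. The function name `significant_cooccurrences`, the substring matching used to locate phrases, and the threshold value are likewise assumptions.

```python
import math
from collections import Counter
from itertools import combinations

def significant_cooccurrences(meaning_units, phrases, threshold=1.0):
    """Return {(phrase_a, phrase_b): score} for pairs scoring above `threshold`."""
    n = len(meaning_units)
    phrase_count = Counter()
    pair_count = Counter()
    for unit in meaning_units:
        # Locate phrases by substring match (assumption); sort for stable pair keys.
        present = sorted(p for p in phrases if p in unit)
        phrase_count.update(present)
        pair_count.update(combinations(present, 2))
    result = {}
    for (a, b), count in pair_count.items():
        p_a = phrase_count[a] / n
        p_b = phrase_count[b] / n
        p_ab = count / n
        # Assumed scoring: log ratio of joint to independent probability.
        score = math.log(p_ab / (p_a * p_b))
        if score > threshold:
            result[(a, b)] = score
    return result

units = [
    "cancel my account today",
    "please cancel my account",
    "the bill is wrong",
    "my bill is late",
    "thanks for the help",
    "thanks",
]
print(significant_cooccurrences(units, {"cancel", "account", "bill", "thanks"}))
```

In this toy input, "account" and "cancel" each appear in two of six meaning units and always together, so the assumed score is log(3), which clears the threshold of 1.0. No other pair co-occurs at all.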
Specification