Selecting keywords representative of a document
Abstract
The method uses a given ontology to select keywords representative of a given document. It finds every ontology term that occurs in the document and computes each term's frequency of occurrence in the document. It then propagates these values from the leaves of the ontology upward to its root vertices, weighting them as they propagate. Finally, based on these weighted values, it selects a subset of the ontology's terms as keywords representative of the document.
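The pipeline summarized in the abstract can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation. The ontology is assumed to be a dict mapping each vertex (term) to its immediate descendants. Term occurrences are counted by case-insensitive whole-word matching, with no tokenizer or stemmer. The second value is read from the claims as s(v) = f(v) + p * (sum of s(c) over immediate descendants c), where p is a fractional propagation factor. Following the claims' wording, root vertices are not eligible as keywords. The function names `term_frequencies`, `propagate`, and `select_keywords` are hypothetical.

```python
import heapq
import re


def term_frequencies(document: str, terms: list[str]) -> dict[str, int]:
    """First value f(t): occurrences of each ontology term in the document.

    Assumption: plain case-insensitive matching on word boundaries, with no
    tokenizer, stemmer, or parser (in keeping with the "non-NLP" wording).
    """
    text = document.lower()
    return {
        # \b on both sides keeps "cat" from matching inside "category".
        t: len(re.findall(r"\b" + re.escape(t.lower()) + r"\b", text))
        for t in terms
    }


def propagate(children: dict[str, list[str]], first: dict[str, float],
              p: float) -> dict[str, float]:
    """Second value s(v) = f(v) + p * sum of s(c) over immediate descendants c.

    Memoized depth-first traversal: each vertex is scored once even when it
    has several parents in a DAG. Assumes the graph is acyclic.
    """
    vertices = set(children) | set(first)
    vertices |= {c for cs in children.values() for c in cs}
    second: dict[str, float] = {}

    def visit(v: str) -> float:
        if v not in second:
            total = sum(visit(c) for c in children.get(v, []))
            second[v] = first.get(v, 0) + p * total
        return second[v]

    for v in vertices:
        visit(v)
    return second


def select_keywords(document: str, children: dict[str, list[str]],
                    p: float, k: int) -> list[str]:
    """Return the k descendant terms with the highest second values."""
    has_parent = {c for cs in children.values() for c in cs}
    terms = sorted(set(children) | has_parent)
    second = propagate(children, term_frequencies(document, terms), p)
    # Only descendant vertices and leaves are candidates; roots (vertices
    # with no parent) are skipped. Sorting first makes ties alphabetical.
    candidates = sorted(has_parent)
    return heapq.nlargest(k, candidates, key=lambda v: second[v])


if __name__ == "__main__":
    # Hypothetical toy ontology and document, for illustration only.
    ontology = {
        "animal": ["mammal", "bird"],
        "mammal": ["cat", "dog"],
        "bird": ["sparrow"],
    }
    doc = "The cat chased the dog. The cat is a mammal."
    print(select_keywords(doc, ontology, p=0.5, k=3))  # ['mammal', 'cat', 'dog']
```

In the toy run, "mammal" outranks "cat" even though it occurs only once, because half of its descendants' counts flow up to it. Whole-word matching also means plurals such as "cats" are not counted; a real system would need its own policy for inflected forms.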
28 Citations
3 Claims
1. A non-natural language processing (NLP) computer-implemented method of performing purely an ontology-based search of an electronic document by selecting keywords representative of said electronic document from an ontology, said non-NLP computer-implemented method comprising:
retrieving, by a computer, said electronic document;
retrieving, by said computer, said ontology associated with said electronic document, said ontology comprising one of a directed acyclic graph (DAG), a collection of trees, and a collection of DAGs, wherein said ontology comprises one or more root vertices, a plurality of descendent vertices, and a plurality of descendent leaves, said descendent vertices and said descendent leaves corresponding to terms in said ontology;
scanning, by said computer, said electronic document and computing, for each term in said ontology, a first value representative of a frequency of occurrence of said each term in said electronic document;
assigning, by said computer, said first value for said each term to corresponding vertices in said ontology;
propagating, by said computer, said first value from leaf vertices of said ontology upwards to said one or more root vertices of said ontology by assigning to each of said descendent vertices a second value, wherein said second value equals a sum of said first value of said each of said descendent vertices plus second values of immediate descendents of said each of said descendent vertices multiplied by a propagation factor, wherein said propagation factor comprises a fractional weight-propagation;
inputting, to said computer, an integer value, k; and
traversing, by said computer, all said descendent leaves and all said descendent vertices of said ontology and selecting those k said terms of said ontology having the k highest said second values as k said keywords representative of said electronic document.
2. A non-transitory computer program storage medium readable by a computer, tangibly embodying a program of instructions executable by said computer to perform a non-natural language processing (NLP) computer-implemented method of performing purely an ontology-based search of an electronic document by selecting keywords representative of said electronic document from an ontology, said non-NLP computer-implemented method comprising:
retrieving said electronic document;
retrieving said ontology associated with said electronic document, said ontology comprising one of a directed acyclic graph (DAG), a collection of trees, and a collection of DAGs, wherein said ontology comprises one or more root vertices, a plurality of descendent vertices, and a plurality of descendent leaves, said descendent vertices and said descendent leaves corresponding to terms in said ontology;
scanning said electronic document and computing, for each term in said ontology, a first value representative of a frequency of occurrence of said each term in said electronic document;
assigning said first value for said each term to corresponding vertices in said ontology;
propagating said first value from leaf vertices of said ontology upwards to said one or more root vertices of said ontology by assigning to each of said descendent vertices a second value, wherein said second value equals a sum of said first value of said each of said descendent vertices plus second values of immediate descendents of said each of said descendent vertices multiplied by a propagation factor, wherein said propagation factor comprises a fractional weight-propagation;
inputting an integer value, k; and
traversing all said descendent leaves and all said descendent vertices of said ontology and selecting those k said terms of said ontology having the k highest said second values as k said keywords representative of said electronic document.
3. A computer system for performing purely an ontology-based search of an electronic document using non-natural language processing (NLP) by selecting keywords representative of said electronic document from an ontology, said computer system comprising:
a memory that stores said electronic document in an electronic format;
a computer processor configured to:
retrieve said electronic document;
retrieve said ontology associated with said electronic document, said ontology comprising one of a directed acyclic graph (DAG), a collection of trees, and a collection of DAGs, wherein said ontology comprises one or more root vertices, a plurality of descendent vertices, and a plurality of descendent leaves, said descendent vertices and said descendent leaves corresponding to terms in said ontology;
scan said electronic document and compute, for each term in said ontology, a first value representative of a frequency of occurrence of said each term in said electronic document;
assign said first value for said each term to corresponding vertices in said ontology;
propagate said first value from leaf vertices of said ontology upwards to said one or more root vertices of said ontology by assigning to each of said descendent vertices a second value, wherein said second value equals a sum of said first value of said each of said descendent vertices plus second values of immediate descendents of said each of said descendent vertices multiplied by a propagation factor, wherein said propagation factor comprises a fractional weight-propagation;
receive an inputted integer value, k;
traverse all said descendent leaves and all said descendent vertices of said ontology and select those k said terms of said ontology having the k highest said second values as k said keywords representative of said electronic document; and
a video display configured to display said keywords as an output of a purely ontology-based search of said electronic document to a user.
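The propagation step shared by all three claims can be written as a recurrence. One reading of the claim language, assumed here, is that the propagation factor p scales only the descendants' second values, not the vertex's own first value. Here f(v) is the first value (frequency) of vertex v, s(v) its second value, and C(v) its set of immediate descendants. A leaf has no descendants, so its second value equals its first value. The worked numbers use a hypothetical ontology in which the root "animal" has the single child "mammal", and "mammal" has the children "cat" and "dog". The assumed frequencies are f(cat) = 2, f(dog) = 1, f(mammal) = 1, f(animal) = 0, and p = 1/2.

```latex
\begin{align*}
s(v) &= f(v) + p \sum_{c \in C(v)} s(c), \qquad 0 < p < 1 \text{ (assumed)},\\[4pt]
s(\text{cat}) &= 2, \qquad s(\text{dog}) = 1,\\
s(\text{mammal}) &= 1 + \tfrac{1}{2}\,(2 + 1) = 2.5,\\
s(\text{animal}) &= 0 + \tfrac{1}{2}\,(2.5) = 1.25.
\end{align*}
```

With k = 2, the selected keywords are "mammal" (2.5) and "cat" (2). "Animal", at 1.25, would not make the top two whether or not roots are treated as ineligible, as the claims' "descendent vertices" wording suggests. In a DAG, a vertex with several parents adds its weighted second value to each of them.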
Specification