Selecting keywords representative of a document
Abstract
The method uses a given ontology to select keywords representative of a given document. It finds all the terms of the ontology that occur in the document and computes their frequency of occurrence in the document. It then propagates these values from the leaves upwards to the root of the ontology, weighting them during the propagation. Finally, it selects a subset of the terms of the ontology as keywords representative of the document based on these weights.
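The steps summarized above (count term occurrences, propagate the counts up the ontology, pick the highest-scoring terms) can be sketched in Python. This is a minimal illustration, not the patented implementation: the toy ontology `PARENT`, the sample text, the helper names, the restriction to single-word terms, and the single constant propagation factor of 0.5 are all assumptions made for the example. The claims allow a separate propagation factor per descendant vertex.

```python
import re
from collections import Counter

# Hypothetical toy ontology, stored as child -> parent (None marks the root).
PARENT = {
    "science": None,
    "physics": "science",
    "biology": "science",
    "optics": "physics",
    "mechanics": "physics",
    "genetics": "biology",
}

def term_frequencies(text, terms):
    """First value: number of occurrences of each ontology term in the text.
    Assumes single-word, lowercase terms."""
    words = Counter(re.findall(r"[a-z]+", text.lower()))
    return {t: words[t] for t in terms}

def propagate(first, parent, factor=0.5):
    """Second value of a vertex = its first value plus the second values of
    its immediate descendants, each multiplied by the propagation factor."""
    children = {t: [] for t in parent}
    for child, par in parent.items():
        if par is not None:
            children[par].append(child)

    second = {}

    def visit(t):
        # Children are resolved first, so values flow from the leaves upwards.
        second[t] = first[t] + sum(factor * visit(c) for c in children[t])
        return second[t]

    for t, par in parent.items():
        if par is None:
            visit(t)
    return second

def top_k(second, k):
    """The k terms with the largest second values (ties broken by name)."""
    return sorted(second, key=lambda t: (-second[t], t))[:k]

doc = ("Optics and mechanics are branches of physics. "
       "Optics studies light; genetics studies heredity.")
first = term_frequencies(doc, PARENT)
second = propagate(first, PARENT)
keywords = top_k(second, 2)
print(keywords)
```

In this toy run "physics" outranks "optics" even though it occurs less often in the text, because it also receives weighted contributions from its descendants "optics" and "mechanics".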
18 Claims
1. A method of selecting keywords representative of a document from an ontology, said method comprising:
computing, for each term in the ontology, a value representative of a frequency of occurrence of said term in the document; and
selecting a subset of terms of the ontology as keywords representative of the document based on said value.
2. A method of selecting keywords representative of a document from an ontology, wherein the ontology comprises terms arranged in a tree-like structure, said method comprising:
computing, for each term in the ontology, a first value representative of a frequency of occurrence of said term in the document;
assigning said first value to corresponding vertices in the ontology;
propagating said first value from leaf vertices of the ontology upwards to the one or more root vertices of the ontology by assigning to each vertex a second value, wherein said second value equals a sum of said first value of the vertex plus the second values of immediate descendent vertices of said vertex each multiplied by a corresponding propagation factor; and
selecting, as keywords representative of the document, the k terms of the ontology that have the k largest second values.
3. A method of selecting keywords representative of a document from an ontology, wherein the ontology comprises terms arranged in a tree-like structure having one or more root vertices, vertices and leaf vertices, said method comprising:
computing, for each term in the ontology, a first value representative of a frequency of occurrence of said term in the document;
assigning first values to corresponding vertices in the ontology;
propagating said first values from the leaf vertices of the ontology upwards to the one or more root vertices of the ontology by assigning to each vertex a second value, wherein said second value equals a sum of said first value of the vertex plus the second values of immediate descendent vertices of said vertex each multiplied by a corresponding propagation factor;
generating a sub-structure of the ontology, wherein the sub-structure comprises a unique path for each term so as to disambiguate a context of the terms; and
performing an optimization process, wherein k vertices are selected such that a sum of weighted distances of all the vertices having non-zero second values to associated selected k vertices is minimized, and wherein k terms associated with the selected k vertices are selected as keywords representative of the document.
Dependent claims: 4, 5, 6
7. A method of selecting keywords representative of a document from an ontology, wherein the ontology comprises terms arranged in a tree-like structure having one or more root vertices, vertices and leaf vertices, said method comprising:
computing, for each term in the ontology, a first value representative of a frequency of occurrence of said term in the document;
assigning frequency of occurrence values to corresponding vertices in the ontology; and
performing an optimization process, wherein k vertices are selected such that a sum of weighted distances of all the vertices having non-zero first values to associated selected k vertices is minimized, and wherein k terms associated with the selected k vertices are selected as keywords representative of the document.
Dependent claims: 8, 9, 10
11. A computer program product for selecting keywords representative of a document from an ontology, the computer program product comprising computer software recorded on a computer-readable medium for performing a method comprising:
computing, for each term in the ontology, a value representative of a frequency of occurrence of said term in the document; and
selecting a subset of terms of the ontology as keywords representative of the document based on said value.
12. A computer system for selecting keywords representative of a document from an ontology, the computer system comprising computer software recorded on a computer-readable medium for performing a method comprising:
computing, for each term in the ontology, a value representative of a frequency of occurrence of said term in the document; and
selecting a subset of terms of the ontology as keywords representative of the document based on said value.
13. A computer program product for selecting keywords representative of a document from an ontology, the computer program product comprising computer software recorded on a computer-readable medium for performing a method comprising:
computing, for each term in the ontology, a first value representative of a frequency of occurrence of said term in the document;
assigning said first value to corresponding vertices in the ontology;
propagating said first value from leaf vertices of the ontology upwards to the one or more root vertices of the ontology by assigning to each vertex a second value, wherein said second value equals a sum of said first value of the vertex plus the second values of immediate descendent vertices of said vertex each multiplied by a corresponding propagation factor; and
selecting, as keywords representative of the document, the k terms of the ontology that have the k largest second values.
14. A computer system for selecting keywords representative of a document from an ontology, the computer system comprising computer software recorded on a computer-readable medium for performing a method comprising:
computing, for each term in the ontology, a first value representative of a frequency of occurrence of said term in the document;
assigning said first value to corresponding vertices in the ontology;
propagating said first value from leaf vertices of the ontology upwards to the one or more root vertices of the ontology by assigning to each vertex a second value, wherein said second value equals a sum of said first value of the vertex plus the second values of immediate descendent vertices of said vertex each multiplied by a corresponding propagation factor; and
selecting, as keywords representative of the document, the k terms of the ontology that have the k largest second values.
15. A computer program product for selecting keywords representative of a document from an ontology, the computer program product comprising computer software recorded on a computer-readable medium for performing a method comprising:
computing, for each term in the ontology, a first value representative of a frequency of occurrence of said term in the document;
assigning first values to corresponding vertices in the ontology;
propagating said first values from the leaf vertices of the ontology upwards to the one or more root vertices of the ontology by assigning to each vertex a second value, wherein said second value equals a sum of said first value of the vertex plus the second values of immediate descendent vertices of said vertex each multiplied by a corresponding propagation factor;
generating a sub-structure of the ontology, wherein the sub-structure comprises a unique path for each term so as to disambiguate a context of the terms; and
performing an optimization process, wherein k vertices are selected such that a sum of weighted distances of all the vertices having non-zero second values to associated selected k vertices is minimized, and wherein k terms associated with the selected k vertices are selected as keywords representative of the document.
16. A computer system for selecting keywords representative of a document from an ontology, the computer system comprising computer software recorded on a computer-readable medium for performing a method comprising:
computing, for each term in the ontology, a first value representative of a frequency of occurrence of said term in the document;
assigning first values to corresponding vertices in the ontology;
propagating said first values from the leaf vertices of the ontology upwards to the one or more root vertices of the ontology by assigning to each vertex a second value, wherein said second value equals a sum of said first value of the vertex plus the second values of immediate descendent vertices of said vertex each multiplied by a corresponding propagation factor;
generating a sub-structure of the ontology, wherein the sub-structure comprises a unique path for each term so as to disambiguate a context of the terms; and
performing an optimization process, wherein k vertices are selected such that a sum of weighted distances of all the vertices having non-zero second values to associated selected k vertices is minimized, and wherein k terms associated with the selected k vertices are selected as keywords representative of the document.
17. A computer program product for selecting keywords representative of a document from an ontology, the computer program product comprising computer software recorded on a computer-readable medium for performing a method comprising:
computing, for each term in the ontology, a first value representative of a frequency of occurrence of said term in the document;
assigning frequency of occurrence values to corresponding vertices in the ontology; and
performing an optimization process, wherein k vertices are selected such that a sum of weighted distances of all the vertices having non-zero first values to associated selected k vertices is minimized, and wherein k terms associated with the selected k vertices are selected as keywords representative of the document.
18. A computer system for selecting keywords representative of a document from an ontology, the computer system comprising computer software recorded on a computer-readable medium for performing a method comprising:
computing, for each term in the ontology, a first value representative of a frequency of occurrence of said term in the document;
assigning frequency of occurrence values to corresponding vertices in the ontology; and
performing an optimization process, wherein k vertices are selected such that a sum of weighted distances of all the vertices having non-zero first values to associated selected k vertices is minimized, and wherein k terms associated with the selected k vertices are selected as keywords representative of the document.
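The optimization process recited in claims 3, 7 and 15 to 18 (select k vertices so that the sum of weighted distances from all non-zero vertices to their associated selected vertex is minimized) can be illustrated with a brute-force sketch. Several points are assumptions made only for this example and are not stated in the claims: the distance between two vertices is taken to be the number of edges on the tree path between them, the weight of a vertex is its frequency-derived value, each vertex is associated with its nearest selected vertex, and the ontology is assumed to have already been reduced to a tree with a unique path per term. The toy data and all function names are hypothetical.

```python
from itertools import combinations

# Hypothetical toy tree, stored as child -> parent (None marks the root).
PARENT = {"science": None, "physics": "science", "biology": "science",
          "optics": "physics", "mechanics": "physics", "genetics": "biology"}

# Hypothetical values already assigned to the vertices. They may be first
# values (claims 7, 17, 18) or propagated second values (claims 3, 15, 16).
VALUE = {"science": 0, "physics": 2, "biology": 0,
         "optics": 2, "mechanics": 1, "genetics": 1}

def path_to_root(v, parent):
    """Vertices from v up to the root, inclusive."""
    path = [v]
    while parent[path[-1]] is not None:
        path.append(parent[path[-1]])
    return path

def distance(u, v, parent):
    """Number of edges on the tree path between u and v (assumed metric)."""
    steps_from_v = {node: i for i, node in enumerate(path_to_root(v, parent))}
    for i, node in enumerate(path_to_root(u, parent)):
        if node in steps_from_v:  # first common ancestor
            return i + steps_from_v[node]
    raise ValueError("vertices are not in the same tree")

def cost(selected, value, parent):
    """Sum over non-zero vertices of value * distance to nearest selected vertex."""
    return sum(w * min(distance(v, s, parent) for s in selected)
               for v, w in value.items() if w > 0)

def select_keywords(value, parent, k):
    """Exhaustive search over all k-subsets; only practical for tiny trees."""
    return min(combinations(sorted(parent), k),
               key=lambda sel: cost(sel, value, parent))

best = select_keywords(VALUE, PARENT, 2)
print(best, cost(best, VALUE, PARENT))
```

Exhaustive search grows combinatorially with the size of the ontology, so it serves here only to make the objective concrete; the claims do not say which optimization procedure is used.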
Specification