Selecting keywords representative of a document

US 20060074900A1
Filed: 09/30/2004
Published: 04/06/2006
Est. Priority Date: 09/30/2004
Status: Abandoned Application

Abstract

The method uses a given ontology to select keywords representative of a given document. It finds all the terms of the ontology that occur in the document and computes each term's frequency of occurrence. It then propagates these values from the leaves upwards to the root of the ontology, weighting them during propagation. Finally, it selects a subset of the terms of the ontology as keywords representative of the document based on these weights.
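
As an illustration of the first step described in the abstract, the following is a minimal Python sketch of counting how often each ontology term occurs in a document. The function name `term_frequencies`, the whole-word, case-insensitive matching rule, and the sample data are assumptions made for this example; the application does not specify how occurrences are matched.

```python
import re


def term_frequencies(document: str, ontology_terms: list[str]) -> dict[str, int]:
    """Count occurrences of each ontology term in the document.

    ASSUMPTION: a term "occurs" when it appears as a whole word or phrase,
    ignoring case. Stemming and synonyms are not handled in this sketch.
    """
    text = document.lower()
    counts = {}
    for term in ontology_terms:
        # re.escape keeps multi-word or punctuated terms literal.
        pattern = r"\b" + re.escape(term.lower()) + r"\b"
        counts[term] = len(re.findall(pattern, text))
    return counts


# Hypothetical sample data, not taken from the application.
doc = "Cats chase mice. A cat is a mammal; mice are mammals too."
print(term_frequencies(doc, ["cat", "mice", "mammal", "dog"]))
```

Terms with a count of zero are kept in the result so that every vertex of the ontology has a value in later steps.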

18 Claims

1. A method of selecting keywords representative of a document from an ontology, said method comprising:
- computing, for each term in the ontology, a value representative of a frequency of occurrence of said term in the document; and
  
  selecting a subset of terms of the ontology as keywords representative of the document based on said value.

2. A method of selecting keywords representative of a document from an ontology, wherein the ontology comprises terms arranged in a tree-like structure, said method comprising:
- computing, for each term in the ontology, a first value representative of a frequency of occurrence of said term in the document;
  
  assigning said first value to corresponding vertices in the ontology;
  
  propagating said first value from leaf vertices of the ontology upwards to the one or more root vertices of the ontology by assigning to each vertex a second value, wherein said second value equals a sum of said first value of the vertex plus the second values of immediate descendent vertices of said vertex each multiplied by a corresponding propagation factor; and
  
  selecting k terms of the ontology as keywords representative of the document that have a largest k second value.
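
The propagation and selection steps of claim 2 can be sketched as follows. This is a minimal Python illustration, not the applicants' implementation: the names `propagate` and `top_k`, the use of a single propagation factor for every edge (the claim allows a separate factor per descendant), and the sample ontology are all assumptions.

```python
import heapq


def propagate(children: dict[str, list[str]], first: dict[str, float],
              factor: float = 0.5) -> dict[str, float]:
    """Compute second values bottom-up over a tree-like ontology.

    second[v] = first[v] + sum(factor * second[c] for each child c of v)

    ASSUMPTION: one shared factor for all edges, and every non-leaf vertex
    appears as a key of `children`.
    """
    second: dict[str, float] = {}

    def visit(v: str) -> float:
        total = first.get(v, 0.0)
        for c in children.get(v, []):
            total += factor * visit(c)
        second[v] = total
        return total

    # Roots are vertices that never appear as a child.
    all_children = {c for cs in children.values() for c in cs}
    for root in set(children) - all_children:
        visit(root)
    return second


def top_k(second: dict[str, float], k: int) -> list[str]:
    """Return the k terms with the largest second values."""
    return heapq.nlargest(k, second, key=second.get)


# Hypothetical ontology and first values.
children = {"animal": ["mammal", "bird"], "mammal": ["cat", "dog"]}
first = {"cat": 4, "dog": 2, "bird": 1}
second = propagate(children, first, factor=0.5)
print(second)            # mammal gets 0.5 * (4 + 2) = 3.0, animal 0.5 * (3 + 1) = 2.0
print(top_k(second, 2))  # ['cat', 'mammal']
```

With a factor below 1, a general term such as `mammal` is selected only when enough of its descendants occur in the document.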

3. A method of selecting keywords representative of a document from an ontology, wherein the ontology comprises terms arranged in a tree-like structure having one or more root vertices, vertices and leaf vertices, said method comprising:
- computing, for each term in the ontology, a first value representative of a frequency of occurrence of said term in the document;
  
  assigning first values to corresponding vertices in the ontology;
  
  propagating said first values from the leaf vertices of the ontology upwards to the one or more root vertices of the ontology by assigning to each vertex a second value, wherein said second value equals a sum of said first value of the vertex plus the second values of immediate descendent vertices of said vertex each multiplied by a corresponding propagation factor;
  
  generating a sub-structure of the ontology, wherein the sub-structure comprises a unique path for each term so as to disambiguate a context of the terms; and
  
  performing an optimization process, wherein k vertices are selected such that a sum of weighted distances of all the vertices having non-zero second values to associated selected k vertices is minimized, and wherein k terms associated with the selected k vertices are selected as keywords representative of the document.
- Dependent claims (4, 5, 6):
  - 4. The method of claim 3, wherein the optimization process comprises a greedy facility location process.
  - 5. The method of claim 3, wherein the optimization process comprises a greedy facility location process, wherein the vertices having non-zero second values are clients, the selected k vertices are facilities serving the clients, the weighted distance between a client and a facility is a number of edges of the tree-like structure between the client and the facility multiplied by a sum of the second values of the vertices in a subtree of the facility, wherein facilities can serve only descendent clients and clients can be served by multiple facilities.
  - 6. The method of claim 3, wherein the optimization process comprises an optimal dynamic programming based process.

7. A method of selecting keywords representative of a document from an ontology, wherein the ontology comprises terms arranged in a tree-like structure having one or more root vertices, vertices and leaf vertices, said method comprising:
- computing, for each term in the ontology, a first value representative of a frequency of occurrence of said term in the document;
  
  assigning frequency of occurrence values to corresponding vertices in the ontology; and
  
  performing an optimization process, wherein k vertices are selected such that a sum of weighted distances of all the vertices having non-zero first values to associated selected k vertices is minimized, and wherein k terms associated with the selected k vertices are selected as keywords representative of the document.
- Dependent claims (8, 9, 10):
  - 8. The method of claim 7, wherein the optimization process comprises a greedy facility location process.
  - 9. The method of claim 7, wherein the optimization process comprises a greedy facility location process, wherein the vertices having non-zero second values are clients, the selected k vertices are facilities serving the clients, the weighted distance between a client and a facility is a number of edges of the tree-like structure between the client and the facility multiplied by a sum of the second values of the vertices in a subtree of the facility, wherein facilities can serve only descendent clients and clients can be served by multiple facilities.
  - 10. The method of claim 7, wherein the optimization process comprises an optimal dynamic programming based process.

11. A computer program product for selecting keywords representative of a document from an ontology, the computer program product comprising computer software recorded on a computer-readable medium for performing a method comprising:
- computing, for each term in the ontology, a value representative of a frequency of occurrence of said term in the document; and
  
  selecting a subset of terms of the ontology as keywords representative of the document based on said value.

12. A computer system for selecting keywords representative of a document from an ontology, the computer system comprising computer software recorded on a computer-readable medium for performing a method comprising:
- computing, for each term in the ontology, a value representative of a frequency of occurrence of said term in the document; and
  
  selecting a subset of terms of the ontology as keywords representative of the document based on said value.

13. A computer program product for selecting keywords representative of a document from an ontology, the computer program product comprising computer software recorded on a computer-readable medium for performing a method comprising:
- computing, for each term in the ontology, a first value representative of a frequency of occurrence of said term in the document;
  
  assigning said first value to corresponding vertices in the ontology;
  
  propagating said first value from leaf vertices of the ontology upwards to the one or more root vertices of the ontology by assigning to each vertex a second value, wherein said second value equals a sum of said first value of the vertex plus the second values of immediate descendent vertices of said vertex each multiplied by a corresponding propagation factor; and
  
  selecting k terms of the ontology as keywords representative of the document that have a largest k second value.

14. A computer system for selecting keywords representative of a document from an ontology, the computer system comprising computer software recorded on a computer-readable medium for performing a method comprising:
- computing, for each term in the ontology, a first value representative of a frequency of occurrence of said term in the document;
  
  assigning said first value to corresponding vertices in the ontology;
  
  propagating said first value from leaf vertices of the ontology upwards to the one or more root vertices of the ontology by assigning to each vertex a second value, wherein said second value equals a sum of said first value of the vertex plus the second values of immediate descendent vertices of said vertex each multiplied by a corresponding propagation factor; and
  
  selecting k terms of the ontology as keywords representative of the document that have a largest k second value.

15. A computer program product for selecting keywords representative of a document from an ontology, the computer program product comprising computer software recorded on a computer-readable medium for performing a method comprising:
- computing, for each term in the ontology, a first value representative of a frequency of occurrence of said term in the document;
  
  assigning first values to corresponding vertices in the ontology;
  
  propagating said first values from the leaf vertices of the ontology upwards to the one or more root vertices of the ontology by assigning to each vertex a second value, wherein said second value equals a sum of said first value of the vertex plus the second values of immediate descendent vertices of said vertex each multiplied by a corresponding propagation factor;
  
  generating a sub-structure of the ontology, wherein the sub-structure comprises a unique path for each term so as to disambiguate a context of the terms; and
  
  performing an optimization process, wherein k vertices are selected such that a sum of weighted distances of all the vertices having non-zero second values to associated selected k vertices is minimized, and wherein k terms associated with the selected k vertices are selected as keywords representative of the document.

16. A computer system for selecting keywords representative of a document from an ontology, the computer system comprising computer software recorded on a computer-readable medium for performing a method comprising:
- computing, for each term in the ontology, a first value representative of a frequency of occurrence of said term in the document;
  
  assigning first values to corresponding vertices in the ontology;
  
  propagating said first values from the leaf vertices of the ontology upwards to the one or more root vertices of the ontology by assigning to each vertex a second value, wherein said second value equals a sum of said first value of the vertex plus the second values of immediate descendent vertices of said vertex each multiplied by a corresponding propagation factor;
  
  generating a sub-structure of the ontology, wherein the sub-structure comprises a unique path for each term so as to disambiguate a context of the terms; and
  
  performing an optimization process, wherein k vertices are selected such that a sum of weighted distances of all the vertices having non-zero second values to associated selected k vertices is minimized, and wherein k terms associated with the selected k vertices are selected as keywords representative of the document.

17. A computer program product for selecting keywords representative of a document from an ontology, the computer program product comprising computer software recorded on a computer-readable medium for performing a method comprising:
- computing, for each term in the ontology, a first value representative of a frequency of occurrence of said term in the document;
  
  assigning frequency of occurrence values to corresponding vertices in the ontology; and
  
  performing an optimization process, wherein k vertices are selected such that a sum of weighted distances of all the vertices having non-zero first values to associated selected k vertices is minimized, and wherein k terms associated with the selected k vertices are selected as keywords representative of the document.

18. A computer system for selecting keywords representative of a document from an ontology, the computer system comprising computer software recorded on a computer-readable medium for performing a method comprising:
- computing, for each term in the ontology, a first value representative of a frequency of occurrence of said term in the document;
  
  assigning frequency of occurrence values to corresponding vertices in the ontology; and
  
  performing an optimization process, wherein k vertices are selected such that a sum of weighted distances of all the vertices having non-zero first values to associated selected k vertices is minimized, and wherein k terms associated with the selected k vertices are selected as keywords representative of the document.

Current Assignee: International Business Machines Corporation
Original Assignee: International Business Machines Corporation
Inventors: Nanavati, Amit A.; Dutta, Chinmoy

Application Number: US10/954,899
Publication Number: US 20060074900A1
US Class Current: 1/1
CPC Class Codes:
- G06F 16/332 Query formulation
- G06F 16/3331 Query processing
- G06F 16/367 Ontology
