Method and apparatus for classifying documents within a class hierarchy creating term vector, term file and relevance ranking

US 6,185,550 B1
Filed: 06/13/1997
Issued: 02/06/2001
Est. Priority Date: 06/13/1997
Status: Expired due to Term



Abstract

A method for classifying a document based on content within a class hierarchy. The class hierarchy comprises a plurality of category nodes stored within a tree data structure. Each of the plurality of category nodes includes a category name corresponding to a unique directory and a category definition comprising a set of defining terms. The class hierarchy is searched to determine appropriate categories for classification of the document. The document is then stored in directories corresponding to the categories selected for classification. If no categories are produced by the search, a system administrator is notified of the unsuccessful search.
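
The flow summarized in the abstract (search the hierarchy, store the document under every selected category, notify an administrator when nothing matches) can be sketched roughly as follows. This is only an illustrative sketch: `classify_document`, `keyword_search`, and `DEFINING_TERMS` are hypothetical names, and the keyword overlap used here is a stand-in for whatever search the claimed system actually performs.

```python
from typing import Callable, Dict, List

# Hypothetical category definitions: category name -> set of defining terms.
DEFINING_TERMS: Dict[str, set] = {
    "sports": {"goal", "match"},
    "finance": {"stock", "bond"},
}


def keyword_search(text: str) -> List[str]:
    """Stand-in search: a category matches if it shares a word with the text."""
    words = set(text.lower().split())
    return sorted(name for name, terms in DEFINING_TERMS.items() if words & terms)


def classify_document(
    doc_name: str,
    doc_text: str,
    search: Callable[[str], List[str]],
    directories: Dict[str, List[str]],
    notify_admin: Callable[[str], None],
) -> List[str]:
    """Store doc_name under every matching category, or notify the administrator."""
    matches = search(doc_text)
    if not matches:
        # Unsuccessful search: nothing is stored, the administrator is told.
        notify_admin(f"no category found for {doc_name}")
        return []
    for category in matches:
        # 'directories' models one directory per category as a list of file names.
        directories.setdefault(category, []).append(doc_name)
    return matches
```

Passing the search and the notification hook as parameters keeps the fallback path easy to exercise without a real indexing engine.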


75 Claims

1. A method for classifying a document based on content within a class hierarchy, the method comprising:
- initializing the class hierarchy, the class hierarchy having a root category node within a tree data structure, the root category node having a user-defined category name;
  
  displaying the class hierarchy;
  
  accepting a user-selected command for manipulating the class hierarchy;
  
  processing a category command in response to the user-selected command having a first predefined state, causing the class hierarchy to contain a plurality of category nodes, said processing the category command further comprising:
  
  storing a category name in one of the plurality of category nodes, wherein each of the plurality of category nodes corresponds to a unique directory;
  
  storing a NodeID within one of the plurality of category nodes, the NodeID defining the unique directory;
  
  storing a nodetype within one of the plurality of category nodes, the nodetype when having a predefined type allowing a new category node to be added to a selected one of the plurality of category nodes, and otherwise preventing the new category node from being added to the selected one of the plurality of category nodes;
  
  storing a ParentID within one of the plurality of category nodes, the ParentID indicating a NodeID of a parent category node;
  
  storing a LinkID within a first one of the plurality of category nodes, the LinkID indicating a NodeID of a second one of the plurality of category nodes when the nodetype is of a predefined type;
  
  creating a class hierarchy by providing a plurality of category nodes stored in a tree data structure within a memory, each of said plurality of category nodes having a category name corresponding to a unique directory and a set of defining terms;
  
  creating a plurality of terms files, each of said plurality of terms files corresponding to one of said plurality of category nodes and including a corresponding set of defining terms and one or more document fragments stored under said one of said plurality of category nodes, said set of defining terms including a term corresponding to one of said plurality of category nodes and said one or more document fragments including a reference to one or more documents and indexing information indicating contiguous multi-term portions of said documents to be extracted during indexing, said set of defining terms and said document fragments together providing a definition of files to be contained in said unique directory referenced by said one of said plurality of category nodes;
  
  creating one or more term vectors for each of said terms files, each of said term vectors containing a weight assigned to each of one or more common terms of the corresponding terms file according to frequency of occurrence in the corresponding terms file;
  
  creating a document vector for the document, said document vector containing a weight assigned to the terms of the document according to frequency of occurrence;
  
  providing a relevance ranking between said terms files and said document by comparing said document vector with said one or more term vectors; and
  
  storing said document within said document directory hierarchy at a location corresponding to a category node having a term vector which has a relevance ranking that matches a selected criteria.
  - 2. The method of claim 1, further including a step of building a path-to-name translation listing containing a directory path and category name for at least one of the plurality of category nodes.
  - 3. The method of claim 1, wherein the step of storing further includes a step of defining the location according to a directory path within a path-to-name translation listing which corresponds to the category node having a term vector which has a relevance ranking that matches a selected criteria.
  - 4. The method of claim 1, wherein the step of providing at least one relevance ranking includes using the Fulcrum™ indexing application.
  - 5. The method of claim 1, wherein each of said contiguous multi-term portions is one sentence or longer.
  - 6. The method of claim 1, wherein each of said contiguous multi-term portions is one paragraph or longer.
  - 7. The method of claim 1, wherein each of said contiguous multi-term portions is 25 words or longer.
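
The node fields recited in claim 1 (category name, NodeID, nodetype, ParentID, LinkID) might be modelled as below. This is a sketch under stated assumptions: the nodetype values `FOLDER` and `LINK`, the class names, and the sequential NodeID numbering are hypothetical, since the claim only refers to "a predefined type".

```python
from dataclasses import dataclass
from typing import Dict, Optional

# Assumed nodetype values; the claim only speaks of "a predefined type".
FOLDER = "folder"  # may receive new child category nodes
LINK = "link"      # carries a LinkID that points at another node


@dataclass
class CategoryNode:
    node_id: int                     # NodeID
    name: str                        # category name
    nodetype: str
    parent_id: Optional[int] = None  # ParentID; None for the root
    link_id: Optional[int] = None    # LinkID; only set for LINK nodes


class ClassHierarchy:
    def __init__(self, root_name: str) -> None:
        self.nodes: Dict[int, CategoryNode] = {}
        # The root category node carries a user-defined name.
        self.root = self._store(root_name, FOLDER, None, None)

    def _store(self, name, nodetype, parent_id, link_id) -> CategoryNode:
        # NodeIDs are assigned sequentially (an assumption; nodes are never deleted here).
        node = CategoryNode(len(self.nodes), name, nodetype, parent_id, link_id)
        self.nodes[node.node_id] = node
        return node

    def add_category(
        self,
        parent_id: int,
        name: str,
        nodetype: str = FOLDER,
        link_id: Optional[int] = None,
    ) -> CategoryNode:
        # The parent's nodetype decides whether a child may be added.
        if self.nodes[parent_id].nodetype != FOLDER:
            raise ValueError("parent nodetype does not allow child categories")
        # A LINK node must reference the NodeID of an existing node.
        if nodetype == LINK and link_id not in self.nodes:
            raise ValueError("a link node needs the NodeID of an existing node")
        return self._store(name, nodetype, parent_id, link_id if nodetype == LINK else None)
```

Storing ParentID rather than child lists mirrors the claim language; a child index could be derived from it if traversal speed mattered.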

8. A method for classifying a document based on content within a class hierarchy, the class hierarchy comprising a plurality of category nodes stored in a tree data structure within a memory, each of the plurality of category nodes having a category name corresponding to a unique directory and a set of defining terms, the method comprising:
- initializing the class hierarchy, the class hierarchy having a root category node within a tree data structure, the root category node having a user-defined category name;
  
  displaying the class hierarchy;
  
  accepting a user-selected command for manipulating the class hierarchy;
  
  processing a category command in response to the user-selected command having a first predefined state, causing the class hierarchy to contain a plurality of category nodes, said processing the category command further comprising:
  
  storing a category name in one of the plurality of category nodes, wherein each of the plurality of category nodes corresponds to a unique directory;
  
  storing a NodeID within one of the plurality of category nodes, the NodeID defining the unique directory;
  
  storing a nodetype within one of the plurality of category nodes, the nodetype when having a predefined type allowing a new category node to be added to a selected one of the plurality of category nodes, and otherwise preventing the new category node from being added to the selected one of the plurality of category nodes;
  
  storing a ParentID within one of the plurality of category nodes, the ParentID indicating a NodeID of a parent category node;
  
  storing a LinkID within a first one of the plurality of category nodes, the LinkID indicating a NodeID of a second one of the plurality of category nodes when the nodetype is of a predefined type;
  
  creating a plurality of terms files, each of the plurality of terms files corresponding to one of the plurality of category nodes and including a corresponding set of defining terms and one or more document fragments stored under said one of said plurality of category nodes, said set of defining terms including a term corresponding to one of said plurality of category nodes and said one or more document fragments including a reference to one or more documents and indexing information indicating contiguous multi-term portions of said documents to be extracted during indexing, said set of defining terms and said document fragments together providing a definition of files to be contained in said unique directory referenced by said one of said plurality of category nodes;
  
  building a path-to-name translation listing containing each category name and unique directory pair for each of the plurality of category nodes;
  
  indexing the class hierarchy using Fulcrum to create an index file containing term vectors corresponding to the plurality of terms files;
  
  classifying at least one document within a document directory hierarchy using said term vectors; and
  
  indexing the document directory hierarchy using Fulcrum.
  - 9. The method according to claim 8, wherein the step of creating is performed for each one of the plurality of category nodes defined as a leaf node.
  - 10. The method according to claim 8, the step of creating further comprising the following steps:
11. The method according to claim 8, the step of classifying further comprising the following steps:
- creating a document vector for a document to be classified within the class hierarchy;
  
  searching the class hierarchy for the term vectors which are relevant to the document to determine appropriate categorization of the document, the step of searching if successful returning a list of matching category names, and otherwise notifying a system administrator that the step of searching was unsuccessful; and
  
  storing the document in directories corresponding to the matching category names if the step of searching is successful.
12. The method according to claim 11, wherein the step of searching comprises:
- comparing the document vector to the term vectors using Fulcrum, the step of comparing returning a list of matching categories within the class hierarchy according to user-defined criteria, the list of matching categories corresponding to the at least one of the plurality of category nodes.
13. The method according to claim 11, the step of storing further comprising the following steps:
- retrieving a directory path corresponding to one of the matching category names utilizing the path-to-name translation listing;
  
  creating directories within the document directory hierarchy corresponding to the directory path if not already existing within the document directory hierarchy; and
  
  adding the document to a leaf directory within the document directory hierarchy corresponding to the retrieved directory path.
14. The method according to claim 13, wherein the step of adding comprises linking the document to the directory.
15. The method according to claim 11, the step of storing further comprising the following steps:
- creating a directory within the document directory hierarchy corresponding to each of the unique directories within the class hierarchy;
  
  retrieving a directory path corresponding to one of the matching category names utilizing the path-to-name translation listing; and
  
  adding the document to a directory within the document directory hierarchy corresponding to the retrieved directory path.
16. The method according to claim 15, wherein the step of adding comprises linking the document to the directory.
17. The method of claim 8, wherein each of said contiguous multi-term portions is one sentence or longer.
18. The method of claim 8, wherein each of said contiguous multi-term portions is one paragraph or longer.
19. The method of claim 8, wherein each of said contiguous multi-term portions is 25 words or longer.
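
The storing steps of claims 13 and 15 (look up a directory path in the path-to-name translation listing, create missing directories, add the document) could look roughly like this. It is a hedged sketch: the `NODES` table, the function names, and the assumption that category names are unique are all illustrative, and the document is copied here, whereas claims 14 and 16 describe linking it instead.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Dict, List, Optional, Tuple

# Hypothetical category table: NodeID -> (category name, ParentID).
NODES: Dict[int, Tuple[str, Optional[int]]] = {
    0: ("Library", None),
    1: ("Science", 0),
    2: ("Physics", 1),
}


def build_path_to_name(nodes: Dict[int, Tuple[str, Optional[int]]]) -> Dict[str, str]:
    """Map each category name to its directory path (assumes unique names)."""
    listing: Dict[str, str] = {}
    for node_id, (name, _) in nodes.items():
        parts: List[str] = []
        current: Optional[int] = node_id
        while current is not None:  # walk ParentIDs up to the root
            part_name, parent = nodes[current]
            parts.append(part_name)
            current = parent
        listing[name] = "/".join(reversed(parts))
    return listing


def store_document(doc: Path, matches: List[str], listing: Dict[str, str], root: Path) -> List[Path]:
    """Copy doc into the directory of every matching category, creating it if needed."""
    stored: List[Path] = []
    for name in matches:
        target_dir = root / listing[name]
        target_dir.mkdir(parents=True, exist_ok=True)  # create only if missing
        stored.append(Path(shutil.copy(doc, target_dir)))
    return stored


def demo() -> List[str]:
    """Classify one throwaway file into 'Physics' inside a temporary directory."""
    with tempfile.TemporaryDirectory() as tmp:
        doc = Path(tmp) / "paper.txt"
        doc.write_text("quantum")
        root = Path(tmp) / "docs"
        stored = store_document(doc, ["Physics"], build_path_to_name(NODES), root)
        return [p.relative_to(root).as_posix() for p in stored]
```

Replacing `shutil.copy` with a hard or symbolic link would be closer to the linking variant, at the cost of platform-specific behavior.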

20. A computer system for classifying a document comprising:
- a processor; and
  
  a memory having stored therein the following:
  
  means for initializing the class hierarchy, the class hierarchy having a root category node within a tree data structure, the root category node having a user-defined category name;
  
  means for displaying the class hierarchy;
  
  accepting a user-selected command for manipulating the class hierarchy;
  
  means for processing a category command in response to the user-selected command having a first predefined state, causing the class hierarchy to contain a plurality of category nodes, said processing means further comprising:
  
  means for storing a category name in one of the plurality of category nodes, wherein each of the plurality of category nodes corresponds to a unique directory;
  
  means for storing a NodeID within one of the plurality of category nodes, the NodeID defining the unique directory;
  
  means for storing a nodetype within one of the plurality of category nodes, the nodetype when having a predefined type allowing a new category node to be added to a selected one of the plurality of category nodes, and otherwise preventing the new category node from being added to the selected one of the plurality of category nodes;
  
  means for storing a ParentID within one of the plurality of category nodes, the ParentID indicating a NodeID of a parent category node;
  
  means for storing a LinkID within a first one of the plurality of category nodes, the LinkID indicating a NodeID of a second one of the plurality of category nodes when the nodetype is of a predefined type;
  
  means for creating a class hierarchy having a plurality of category nodes stored in a tree data structure within a memory, each of the plurality of category nodes having a category name corresponding to a unique directory and a set of defining terms;
  
  means for creating a plurality of terms files, each of the plurality of terms files corresponding to one of the plurality of category nodes and including a corresponding set of defining terms and one or more document fragments stored under said one of said plurality of category nodes, said set of defining terms including a term corresponding to one of said plurality of category nodes and said one or more document fragments including a reference to one or more documents and indexing information indicating contiguous multi-term portions of said documents to be extracted during indexing, said set of defining terms and said document fragments together providing a definition of files to be contained in said unique directory referenced by said one of said plurality of category nodes;
  
  means for creating one or more term vectors for each of said terms files, each of said term vectors containing a weight assigned to each of one or more common terms of the corresponding terms file according to frequency of occurrence in the corresponding terms file;
  
  means for building a path-to-name translation listing containing a directory path and category name for each of the plurality of category nodes;
  
  an indexing means for providing a relevance ranking between the terms files and the document by comparing the document vector with the at least one term vector; and
  
  means for storing the document within the document directory hierarchy at a location corresponding to a category node having a term vector which has a relevance ranking that matches a selected criteria.
  - 21. The computer system of claim 20, further including means for building a path-to-name translation listing containing a directory path and category name for at least one of the plurality of category nodes.
  - 22. The computer system of claim 20, wherein the means for storing further includes means for defining the location according to a directory path within a path-to-name translation listing which corresponds to the category node having a term file which has a relevance ranking that matches a selected criteria.
  - 23. The computer system of claim 20, wherein each of said contiguous multi-term portions is one sentence or longer.
  - 24. The computer system of claim 20, wherein each of said contiguous multi-term portions is one paragraph or longer.
  - 25. The computer system of claim 20, wherein each of said contiguous multi-term portions is 25 words or longer.
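
The term vectors, document vector, and relevance ranking recited in claim 20 can be illustrated with a small frequency-weighted model. The claims elsewhere name the Fulcrum indexing application for this role; the cosine similarity and the `threshold` parameter below are substitutes chosen purely for illustration, not a description of how Fulcrum ranks.

```python
import math
import re
from collections import Counter
from typing import Dict, List, Tuple


def term_vector(text: str) -> Dict[str, float]:
    """Weight each term by its relative frequency of occurrence in the text."""
    counts = Counter(re.findall(r"[a-z0-9]+", text.lower()))
    total = sum(counts.values())
    return {term: n / total for term, n in counts.items()} if total else {}


def cosine(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm = math.sqrt(sum(w * w for w in a.values())) * math.sqrt(sum(w * w for w in b.values()))
    return dot / norm if norm else 0.0


def rank(document: str, terms_files: Dict[str, str], threshold: float = 0.2) -> List[Tuple[str, float]]:
    """Return (category, score) pairs meeting the threshold, best match first."""
    doc_vec = term_vector(document)
    scored = [(name, cosine(doc_vec, term_vector(text))) for name, text in terms_files.items()]
    kept = [(name, score) for name, score in scored if score >= threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)
```

Here the threshold plays the part of the "selected criteria"; an empty result corresponds to the unsuccessful search case in which an administrator would be notified.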

26. A computer system for classifying a document based on content within a class hierarchy, the class hierarchy comprising a plurality of category nodes stored in a tree data structure within a memory, each of the plurality of category nodes having a category name corresponding to a unique directory and a set of defining terms, the computer system comprising:
- a processor; and
  
  a memory having stored therein the following:
  
  means for initializing the class hierarchy, the class hierarchy having a root category node within a tree data structure, the root category node having a user-defined category name;
  
  means for displaying the class hierarchy;
  
  means for accepting a user-selected command for manipulating the class hierarchy;
  
  means for processing a category command in response to the user-selected command having a first predefined state, causing the class hierarchy to contain a plurality of category nodes, said processing means further comprising:
  
  means for storing a category name in one of the plurality of category nodes, wherein each of the plurality of category nodes corresponds to a unique directory;
  
  means for storing a NodeID within one of the plurality of category nodes, the NodeID defining the unique directory;
  
  means for storing a nodetype within one of the plurality of category nodes, the nodetype when having a predefined type allowing a new category node to be added to a selected one of the plurality of category nodes, and otherwise preventing the new category node from being added to the selected one of the plurality of category nodes;
  
  means for storing a ParentID within one of the plurality of category nodes, the ParentID indicating a NodeID of a parent category node;
  
  means for storing a LinkID within a first one of the plurality of category nodes, the LinkID indicating a NodeID of a second one of the plurality of category nodes when the nodetype is of a predefined type;
  
  means for creating a plurality of terms files, each of the plurality of terms files corresponding to one of the plurality of category nodes and including a corresponding set of defining terms and one or more document fragments stored under said one of said plurality of category nodes, said set of defining terms including a term corresponding to one of said plurality of category nodes and said one or more document fragments including a reference to one or more documents and indexing information indicating contiguous multi-term portions of said documents to be extracted during indexing, said set of defining terms and said document fragments together providing a definition of files to be contained in said unique directory referenced by said one of said plurality of category nodes;
  
  means for making one or more term vectors for each of said terms files, each of said term vectors containing a weight assigned to each of one or more common terms of the corresponding terms file according to the frequency of occurrence in the corresponding terms file;
  
  means for making a document vector for the document, said document vector containing a weight assigned to the terms of the document according to frequency of occurrence;
  
  means for building a path-to-name translation listing containing each category name and unique directory pair for each of the plurality of category nodes;
  
  means for indexing the class hierarchy using Fulcrum to create an index file containing term file vectors corresponding to the plurality of terms files;
  
  means for classifying at least one document within a document directory hierarchy using the term file vectors; and
  
  means for indexing the document directory hierarchy using Fulcrum.
  - 27. The computer system according to claim 26, wherein the means for creating is performed for each one of the plurality of category nodes being a leaf node.
  - 28. The computer system of claim 27, wherein each of said contiguous multi-term portions is 25 words or longer.
  - 29. The computer system according to claim 26, the means for creating further comprising:
30. The computer system according to claim 26, the means for classifying further comprising:
- means for creating a document vector for a document to be classified within the class hierarchy;
  
  means for searching the class hierarchy for the terms file vectors which are relevant to the document to determine appropriate categorization of the document, the step of searching if successful returning a list of matching category names, and otherwise notifying a system administrator that the step of searching was unsuccessful; and
  
  means for storing the document in directories corresponding to the matching category names if the step of searching is successful.
31. The computer system according to claim 30, wherein the means for searching comprises:
- means for comparing the document vector to the term vectors using Fulcrum, the step of comparing returning a list of matching categories within the class hierarchy according to the user-defined criteria, the list of matching categories corresponding to the at least one of the plurality of category nodes.
32. The computer system according to claim 30, the means for storing further comprising:
- means for retrieving a directory path corresponding to one of the matching category names utilizing the path-to-name translation listing;
  
  means for creating directories within the document directory hierarchy corresponding to the directory path if not already existing within the document directory hierarchy; and
  
  means for adding the document to a leaf directory within the document directory hierarchy corresponding to the retrieved directory path.
33. The computer system according to claim 32, wherein the means for adding comprises linking the document to the directory.
34. The computer system according to claim 30, the means for storing further comprising:
- means for creating a directory within the document directory hierarchy corresponding to each of the unique directories within the class hierarchy;
  
  means for retrieving a directory path corresponding to one of the matching category names utilizing the path-to-name translation listing; and
  
  means for adding the document to a directory within the document directory hierarchy corresponding to the retrieved directory path.
35. The computer system according to claim 34, wherein the means for adding comprises linking the document to the directory.
36. The computer system of claim 26, wherein each of said contiguous multi-term portions is one sentence or longer.
37. The computer system of claim 26, wherein each of said contiguous multi-term portions is one paragraph or longer.

38. An article of manufacture, comprising:
- a computer usable medium having a computer readable program code means embodied therein for classifying a document based on content within a class hierarchy, the computer readable program code means in the article of manufacture comprising:
  
  computer-readable program means for initializing the class hierarchy, the class hierarchy having a root category node within a tree data structure, the root category node having a user-defined category name;
  
  computer-readable program means for displaying the class hierarchy;
  
  computer-readable program means for accepting a user-selected command for manipulating the class hierarchy;
  
  computer-readable program means for processing a category command in response to the user-selected command having a first predefined state, causing the class hierarchy to contain a plurality of category nodes, said processing the category command further comprising:
  
  computer-readable program means for storing a category name in one of the plurality of category nodes, wherein each of the plurality of category nodes corresponds to a unique directory;
  
  computer-readable program means for storing a NodeID within one of the plurality of category nodes, the NodeID defining the unique directory;
  
  computer-readable program means for storing a nodetype within one of the plurality of category nodes, the nodetype when having a predefined type allowing a new category node to be added to a selected one of the plurality of category nodes, and otherwise preventing the new category node from being added to the selected one of the plurality of category nodes;
  
  computer-readable program means for storing a ParentID within one of the plurality of category nodes, the ParentID indicating a NodeID of a parent category node;
  
  computer-readable program means for storing a LinkID within a first one of the plurality of category nodes, the LinkID indicating a NodeID of a second one of the plurality of category nodes when the nodetype is of a predefined type;
  
  computer-readable program means for creating the class hierarchy by providing a plurality of category nodes stored in a tree data structure within a memory, each of the plurality of category nodes having a category name corresponding to a unique directory and a set of defining terms and one or more document fragments stored under said one of said plurality of category nodes, said set of defining terms including a term corresponding to one of said plurality of category nodes and said one or more document fragments including a reference to one or more documents and indexing information indicating contiguous multi-term portions of said documents to be extracted during indexing, said set of defining terms and said document fragments together providing a definition of files to be contained in said unique directory referenced by said one of said plurality of category nodes;
  
  computer-readable program means for creating a plurality of terms files, each of the plurality of terms files corresponding to one of the plurality of category nodes and including a corresponding set of defining terms;
  
  computer readable program means for making one or more term vectors for each of said terms files, each of said term vectors containing a weight assigned to each of one or more common terms of the corresponding terms file according to frequency of occurrence in the corresponding terms file;
  
  computer readable program means for creating a document vector for the document, said document vector containing a weight assigned to the terms of the document according to frequency of occurrence;
  
  computer-readable program means for providing a relevance ranking between the terms files and the document by comparing the document vector with said one or more term vectors; and
  
  computer-readable program means for storing the document within the document directory hierarchy at a location corresponding to a category node having a term vector which has a relevance ranking that matches a selected criteria.
  - 39. The computer program product of claim 38, further including computer-readable program means for building a path-to-name translation listing containing a directory path and category name for at least one of the plurality of category nodes.
  - 40. The computer program product of claim 38, wherein computer-readable program means for providing a relevance ranking includes:
41. The computer program product of claim 38, wherein the computer-readable program means for storing further includes computer-readable program means for defining the location according to a directory path within a path-to-name translation listing which corresponds to the category node having a term vector which has a relevance ranking that matches a selected criteria.
42. The article of manufacture of claim 38, wherein each of said contiguous multi-term portions is one sentence or longer.
43. The article of manufacture of claim 38, wherein each of said contiguous multi-term portions is one paragraph or longer.
44. The article of manufacture of claim 38, wherein each of said contiguous multi-term portions is 25 words or longer.
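
A terms file as described in the independent claims combines defining terms with document fragments, each fragment being a document reference plus indexing information for a contiguous multi-term portion. One possible representation is sketched below. The word-offset encoding of a fragment, the class names, and the `min_words` check (modelled on the 25-word limit of claim 44) are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class DocumentFragment:
    document: str  # reference to the source document (a dictionary key here)
    start: int     # index of the first word of the portion (assumed encoding)
    length: int    # number of contiguous words to extract


@dataclass
class TermsFile:
    category: str
    defining_terms: List[str]
    fragments: List[DocumentFragment] = field(default_factory=list)

    def text_for_indexing(self, documents: Dict[str, str], min_words: int = 25) -> str:
        """Join defining terms and extracted portions, one entry per line."""
        parts = list(self.defining_terms)
        for frag in self.fragments:
            words = documents[frag.document].split()
            portion = words[frag.start : frag.start + frag.length]
            # Reject portions shorter than the configured minimum length.
            if len(portion) < min_words:
                raise ValueError(f"portion of {frag.document} is shorter than {min_words} words")
            parts.append(" ".join(portion))
        return "\n".join(parts)
```

The returned text is what an indexer would turn into the term vector for the category; sentence or paragraph boundaries could replace the word count for the variants in claims 42 and 43.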

45. A computer-readable medium recording software, the software disposed on a computer to perform a method for classifying a document based on content within a class hierarchy, the class hierarchy comprising a plurality of category nodes stored in a tree data structure within a memory, each of the plurality of category nodes having a category name corresponding to a unique directory and a set of defining terms, the method comprising:
- initializing the class hierarchy, the class hierarchy having a root category node within a tree data structure, the root category node having a user-defined category name;
  
  displaying the class hierarchy;
  
  accepting a user-selected command for manipulating the class hierarchy;
  
  processing a category command in response to the user-selected command having a first predefined state, causing the class hierarchy to contain a plurality of category nodes, said processing the category command further comprising:
  
  storing a category name in one of the plurality of category nodes, wherein each of the plurality of category nodes corresponds to a unique directory;
  
  storing a NodeID within one of the plurality of category nodes, the NodeID defining the unique directory;
  
  storing a nodetype within one of the plurality of category nodes, the nodetype when having a predefined type allowing a new category node to be added to a selected one of the plurality of category nodes, and otherwise preventing the new category node from being added to the selected one of the plurality of category nodes;
  
  storing a ParentID within one of the plurality of category nodes, the ParentID indicating a NodeID of a parent category node;
  
  storing a LinkID within a first one of the plurality of category nodes, the LinkID indicating a NodeID of a second one of the plurality of category nodes when the nodetype is of a predefined type;
  
  creating a plurality of terms files, each of the plurality of terms files corresponding to one of the plurality of category nodes and including a corresponding set of defining terms and one or more document fragments stored under said one of said plurality of category nodes, said set of defining terms including a term corresponding to one of said plurality of category nodes and said one or more document fragments including a reference to one or more documents and indexing information indicating contiguous multi-term portions of said documents to be extracted during indexing, said set of defining terms and said document fragments together providing a definition of files to be contained in said unique directory referenced by said one of said plurality of category nodes;
  
  making one or more term vectors for each of said terms files, each of said term vectors containing a weight assigned to each of one or more common terms of the corresponding terms file according to frequency of occurrence in the corresponding terms file;
  
  building a path-to-name translation listing containing each category name and unique directory pair for each of the plurality of category nodes;
  
  indexing the class hierarchy using Fulcrum to create an index file containing term file vectors corresponding to the plurality of terms files;
  
  classifying at least one document within a document directory hierarchy using terms vectors; and
  
  indexing the document directory hierarchy using Fulcrum.
46. The computer-readable medium according to claim 45, wherein the step of creating is performed for each one of the plurality of category nodes being a leaf node.
47. The computer-readable medium according to claim 45, the step of creating further comprising the following steps:
48. The computer-readable medium according to claim 45, the step of classifying further comprising the following steps:
- creating a document vector for a document to be classified within the class hierarchy;
  
  searching the class hierarchy for the terms vectors which are relevant to the document to determine appropriate categorization of the document, the step of searching if successful returning a list of matching category names, and otherwise notifying a system administrator that the step of searching was unsuccessful; and
  
  storing the document in directories corresponding to the matching category names if the step of searching is successful.
49. The computer-readable medium according to claim 48, wherein the step of searching comprises:
- comparing the document vector to the term vectors using Fulcrum, the step of comparing returning a list of matching categories within the class hierarchy according to user-defined criteria, the list of matching categories corresponding to the at least one of the plurality of category nodes.
50. The computer-readable medium according to claim 48, the step of storing further comprising the following steps:
- retrieving a directory path corresponding to one of the matching category names utilizing the path-to-name translation listing;
  
  creating directories within the document directory hierarchy corresponding to the directory path if not already existing within the document directory hierarchy; and
  
  adding the document to a leaf directory within the document directory hierarchy corresponding to the retrieved directory path.
51. The computer-readable medium according to claim 50, wherein the step of adding comprises linking the document to the directory.
52. The computer-readable medium according to claim 48, the step of storing further comprising the following steps:
- creating a directory within the document directory hierarchy corresponding to each of the unique directories within the class hierarchy;
  
  retrieving a directory path corresponding to one of the matching category names utilizing the path-to-name translation listing; and
  
  adding the document to a directory within the document directory hierarchy corresponding to the retrieved directory path.
53. The computer-readable medium according to claim 52, wherein the step of adding comprises linking the document to the directory.
54. The computer-readable medium recording software of claim 45, wherein each of said contiguous multi-term portions is one sentence or longer.
55. The computer-readable medium recording software of claim 45, wherein each of said contiguous multi-term portions is one paragraph or longer.
56. The computer-readable medium recording software of claim 45, wherein each of said contiguous multi-term portions is 25 words or longer.
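
The category-node fields recited in claim 45 (category name, NodeID, nodetype, ParentID, LinkID) and the path-to-name translation listing can be sketched as a small Python data structure. This is a sketch under stated assumptions, not the patented implementation: the nodetype values `BRANCH`, `LEAF` and `LINK`, the class and method names, and the use of the NodeID as the directory name are all hypothetical, since the claims speak only of "a predefined type" and of the NodeID "defining the unique directory".

```python
from dataclasses import dataclass
from typing import Dict, Optional

# Hypothetical nodetype values; the claims only refer to "a predefined type".
BRANCH = "branch"  # assumed type that may receive new child category nodes
LEAF = "leaf"      # assumed type that may not
LINK = "link"      # assumed type whose LinkID points at another node


@dataclass
class CategoryNode:
    name: str
    node_id: str               # assumed to double as the directory name
    nodetype: str
    parent_id: Optional[str]   # None for the root category node
    link_id: Optional[str] = None


class ClassHierarchy:
    def __init__(self, root_name: str, root_id: str = "root") -> None:
        # The root category node carries a user-defined category name.
        self.nodes: Dict[str, CategoryNode] = {
            root_id: CategoryNode(root_name, root_id, BRANCH, None)
        }

    def add_category(self, parent_id: str, name: str, node_id: str,
                     nodetype: str = LEAF,
                     link_id: Optional[str] = None) -> CategoryNode:
        parent = self.nodes[parent_id]
        # The parent's nodetype decides whether a child may be added.
        if parent.nodetype != BRANCH:
            raise ValueError(f"node {parent_id!r} cannot receive children")
        if node_id in self.nodes:
            raise ValueError(f"NodeID {node_id!r} is already in use")
        if nodetype == LINK and link_id not in self.nodes:
            raise ValueError("a link node needs the NodeID of an existing node")
        node = CategoryNode(name, node_id, nodetype, parent_id,
                            link_id if nodetype == LINK else None)
        self.nodes[node_id] = node
        return node

    def directory_path(self, node_id: str) -> str:
        # Follow ParentID references up to the root, then reverse.
        parts = []
        current: Optional[str] = node_id
        while current is not None:
            node = self.nodes[current]
            parts.append(node.node_id)
            current = node.parent_id
        return "/".join(reversed(parts))

    def path_to_name_listing(self) -> Dict[str, str]:
        # One (unique directory, category name) pair per category node.
        return {self.directory_path(nid): node.name
                for nid, node in self.nodes.items()}
```

For example, adding a `BRANCH` node `n1` named "Science" under the root and a leaf `n2` named "Physics" under `n1` yields listing entries for `root`, `root/n1` and `root/n1/n2`.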

57. A computer data signal embodied in a carrier wave and representing sequences of instructions which, when executed by a processor, cause said processor to classify a document based on content within a class hierarchy, by performing the following steps:
- initializing the class hierarchy, the class hierarchy having a root category node within a tree data structure, the root category node having a user-defined category name;
  
  displaying the class hierarchy;
  
  accepting a user-selected command for manipulating the class hierarchy;
  
  processing a category command in response to the user-selected command having a first predefined state, causing the class hierarchy to contain a plurality of category nodes, said processing the category command further comprising;
  
  storing a category name in one of the plurality of category nodes, wherein each of the plurality of category nodes corresponds to a unique directory;
  
  storing a NodeID within one of the plurality of category nodes, the NodeID defining the unique directory;
  
  storing a nodetype within one of the plurality of category nodes, the nodetype when having a predefined type allowing a new category node to be added to a selected one of the plurality of category nodes, and otherwise preventing the new category node from being added to the selected one of the plurality of category nodes;
  
  storing a ParentID within one of the plurality of category nodes, the ParentID indicating a NodeID of a parent category node;
  
  storing a LinkID within a first one of the plurality of category nodes, the LinkID indicating a NodeID of a second one of the plurality of category nodes when the nodetype is of a predefined type;
  
  creating the class hierarchy by providing a plurality of category nodes stored in a tree data structure within a memory, each of the plurality of category nodes having a category name corresponding to a unique directory and a set of defining terms and one or more document fragments stored under said one of said plurality of category nodes, said set of defining terms including a term corresponding to one of said plurality of category nodes and said one or more document fragments including a reference to one or more documents and indexing information indicating contiguous multi-term portions of said documents to be extracted during indexing, said set of defining terms and said document fragments together providing a definition of files to be contained in said unique directory referenced by said one of said plurality of category nodes;
  
  creating a plurality of terms files, each of the plurality of terms files corresponding to one of the plurality of category nodes and including a corresponding set of defining terms;
  
  making one or more term vectors for each of said terms files, each of said term vectors containing a weight assigned to each of one or more common terms of the corresponding terms files according to frequency of occurrence in the corresponding terms file;
  
  providing a relevance ranking between the term vectors and the document by comparing said document vector with the at least one term vector; and
  
  storing the document within the document directory hierarchy at a location corresponding to a category node having a term vector which has a relevance ranking that matches a selected criteria.
58. The computer data signal of claim 57, further including computer-readable program means for building a path-to-name translation listing containing a directory path and category name for at least one of the plurality of category nodes.
59. The computer data signal of claim 57, wherein the computer-readable program means for providing a relevance ranking includes:
60. The computer data signal of claim 57, wherein the step of storing further includes a step of defining the location according to a directory path within a path-to-name translation listing which corresponds to the category node having a term vector which has a relevance ranking that matches a selected criteria.
61. The computer data signal of claim 57, wherein each of said contiguous multi-term portions is one sentence or longer.
62. The computer data signal of claim 57, wherein each of said contiguous multi-term portions is one paragraph or longer.
63. The computer data signal of claim 57, wherein each of said contiguous multi-term portions is 25 words or longer.
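
The term-vector and relevance-ranking steps of claim 57 can be illustrated with a short Python sketch. It weights each term by its relative frequency in a terms file and ranks categories against a document vector. Cosine similarity and the numeric `threshold` are assumptions standing in for the unspecified ranking measure and "selected criteria"; the claims leave the comparison to the indexing engine (Fulcrum in claims 45 and 64), whose API is not modeled here. All function names are hypothetical.

```python
import math
import re
from collections import Counter
from typing import Dict, List, Tuple

Vector = Dict[str, float]


def make_vector(text: str) -> Vector:
    """Weight each term by its relative frequency of occurrence in the text."""
    terms = re.findall(r"[a-z0-9]+", text.lower())
    counts = Counter(terms)
    total = sum(counts.values())
    return {term: n / total for term, n in counts.items()} if total else {}


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity of two sparse vectors (assumed relevance measure)."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm = (math.sqrt(sum(w * w for w in a.values()))
            * math.sqrt(sum(w * w for w in b.values())))
    return dot / norm if norm else 0.0


def rank_categories(document_text: str, terms_files: Dict[str, str],
                    threshold: float = 0.2) -> List[Tuple[str, float]]:
    """Return (category name, score) pairs meeting the threshold, best first.

    terms_files maps a category name to the text of its terms file.
    """
    doc_vec = make_vector(document_text)
    scored = [(name, cosine(doc_vec, make_vector(text)))
              for name, text in terms_files.items()]
    matches = [(name, score) for name, score in scored if score >= threshold]
    return sorted(matches, key=lambda pair: pair[1], reverse=True)
```

An empty result corresponds to the unsuccessful search of claims 48 and 67, where the system administrator is notified instead of the document being stored.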

64. A computer data signal embodied in a carrier wave comprising:
- means for initializing the class hierarchy, the class hierarchy having a root category node within a tree data structure, the root category node having a user-defined category name;
  
  means for displaying the class hierarchy;
  
  means for accepting a user-selected command for manipulating the class hierarchy;
  
  means for processing a category command in response to the user-selected command having a first predefined state, causing the class hierarchy to contain a plurality of category nodes, said processing means further comprising;
  
  means for storing a category name in one of the plurality of category nodes, wherein each of the plurality of category nodes corresponds to a unique directory;
  
  means for storing a NodeID within one of the plurality of category nodes, the NodeID defining the unique directory;
  
  means for storing a nodetype within one of the plurality of category nodes, the nodetype when having a predefined type allowing a new category node to be added to a selected one of the plurality of category nodes, and otherwise preventing the new category node from being added to the selected one of the plurality of category nodes;
  
  means for storing a ParentID within one of the plurality of category nodes, the ParentID indicating a NodeID of a parent category node;
  
  means for storing a LinkID within a first one of the plurality of category nodes, the LinkID indicating a NodeID of a second one of the plurality of category nodes when the nodetype is of a predefined type;
  
  means for creating a class hierarchy by providing a plurality of category nodes in a tree data structure within a memory, each of the plurality of category nodes having a category name corresponding to a unique directory and a set of defining terms;
  
  means for creating a plurality of terms files, each of the plurality of terms files corresponding to one of the plurality of category nodes and including a corresponding set of defining terms and one or more document fragments stored under said one of said plurality of category nodes, said set of defining terms including a term corresponding to one of said plurality of category nodes and said one or more document fragments including a reference to one or more documents and indexing information indicating contiguous multi-term portions of said documents to be extracted during indexing, said set of defining terms and said document fragments together providing a definition of files to be contained in said unique directory referenced by said one of said plurality of category nodes;
  
  means for creating one or more term vectors for each of said terms files, each of said term vectors containing a weight assigned to each of one or more common terms of the corresponding terms files according to frequency of occurrence in the corresponding terms file;
  
  means for creating a path-to-name translation listing containing each category name and unique directory pair for each of the plurality of category nodes;
  
  means for indexing the class hierarchy using Fulcrum to create an index file containing term vectors corresponding to the plurality of terms files;
  
  means for classifying at least one document within a document directory hierarchy using the term vectors; and
  
  means for indexing the document directory hierarchy using Fulcrum.
65. The computer data signal according to claim 64, wherein the step of creating is performed for each one of the plurality of category nodes being a leaf node.
66. The computer data signal according to claim 64, the step of creating further comprising the following steps:
67. The computer data signal according to claim 64, the step of classifying further comprising the following steps:
- creating a document vector for a document to be classified within the class hierarchy;
  
  searching the class hierarchy for the terms vectors which are relevant to the document to determine appropriate categorization of the document, the step of searching if successful returning a list of matching category names, and otherwise notifying a system administrator that the step of searching was unsuccessful; and
  
  storing the document in directories corresponding to the matching category names if the step of searching is successful.
68. The computer data signal according to claim 67, wherein the step of searching comprises:
- comparing the document vector to the term vectors using Fulcrum, the step of comparing returning a list of matching categories within the class hierarchy according to user-defined criteria, the list of matching categories corresponding to the at least one of the plurality of category nodes.
69. The computer data signal according to claim 67, the step of storing further comprising the following steps:
- retrieving a directory path corresponding to one of the matching category names utilizing the path-to-name translation listing;
  
  creating directories within the document directory hierarchy corresponding to the directory path if not already existing within the document directory hierarchy; and
  
  adding the document to a leaf directory within the document directory hierarchy corresponding to the retrieved directory path.
70. The computer data signal according to claim 69, wherein the step of adding comprises linking the document to the directory.
71. The computer data signal according to claim 67, the step of storing further comprising the following steps:
- creating a directory within the document directory hierarchy corresponding to each of the unique directories within the class hierarchy;
  
  retrieving a directory path corresponding to one of the matching category names utilizing the path-to-name translation listing; and
  
  adding the document to a directory within the document directory hierarchy corresponding to the retrieved directory path.
72. The computer data signal according to claim 71, wherein the step of adding comprises linking the document to the directory.
73. The computer data signal of claim 64, wherein each of said contiguous multi-term portions is one sentence or longer.
74. The computer data signal of claim 64, wherein each of said contiguous multi-term portions is one paragraph or longer.
75. The computer data signal of claim 64, wherein each of said contiguous multi-term portions is 25 words or longer.
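
The storing steps of claims 69 and 70 (retrieve a directory path from the path-to-name translation listing, create missing directories, then link the document into the leaf directory) can be sketched in Python as follows. The function name, the `path -> name` dictionary layout of the listing, the `/` path separator, and the copy fallback when symbolic links are unavailable are assumptions for illustration only.

```python
import os
import shutil
from typing import Dict


def store_document(document_path: str, category_name: str,
                   path_to_name: Dict[str, str], document_root: str) -> str:
    """Link a document into the directory that matches a category name.

    path_to_name is the (assumed) path-to-name translation listing, mapping
    a '/'-separated directory path to its category name.
    """
    # Retrieve the directory path for the matching category name.
    paths = [path for path, name in path_to_name.items()
             if name == category_name]
    if not paths:
        raise KeyError(category_name)
    directory = os.path.join(document_root, *paths[0].split("/"))

    # Create the directories only if they do not already exist.
    os.makedirs(directory, exist_ok=True)

    target = os.path.join(directory, os.path.basename(document_path))
    if os.path.lexists(target):
        return target  # already stored under this category
    try:
        # "Linking the document to the directory" (claim 70).
        os.symlink(os.path.abspath(document_path), target)
    except (OSError, NotImplementedError):
        # Fallback for platforms where symbolic links are not permitted.
        shutil.copy2(document_path, target)
    return target
```

Linking rather than copying lets one document appear under several matching categories without duplicating its contents.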

Current Assignee
Hanger Solutions, LLC (IP Investments Group LLC)
Original Assignee
Sun Microsystems Incorporated (Oracle Corporation)
Inventors
Snow, William A., Mocker, Joseph D.
Primary Examiner(s)
Breene, John
Assistant Examiner(s)
Channavajjala, Srirama T

Application Number
US08/874,783
Time in Patent Office
1,334 Days
Field of Search
707/1-10, 707/100-104, 707/500, 707/514, 707/532, 707/512, 707/533, 707/536, 707/501, 707/515, 707/200-206, 707/907, 1/1, 345/440, 358/462, 358/403, 395/701, 704/4-10
US Class Current
1/1
CPC Class Codes
G06F 16/30   of unstructured textual dat...
G06F 16/3347   using vector based model
G06F 16/353   into predefined classes
G06F 16/355   Class or cluster creation o...
Y10S 707/99931   Database or file accessing
Y10S 707/99933   Query processing, i.e. sear...
Y10S 707/99935   Query augmenting and refini...
Y10S 707/99943   Generating database or data...
Y10S 707/99944   Object-oriented database st...

5 Assignments
0 Petitions
356 Citations
75 Claims
