
Method and apparatus for classifying documents within a class hierarchy creating term vector, term file and relevance ranking

  • US 6,185,550 B1
  • Filed: 06/13/1997
  • Issued: 02/06/2001
  • Est. Priority Date: 06/13/1997
  • Status: Expired due to Term
First Claim

1. A method for classifying a document based on content within a class hierarchy, the method comprising:

    initializing the class hierarchy, the class hierarchy having a root category node within a tree data structure, the root category node having a user-defined category name;

    displaying the class hierarchy;

    accepting a user-selected command for manipulating the class hierarchy;

    processing a category command in response to the user-selected command having a first predefined state, causing the class hierarchy to contain a plurality of category nodes, said processing the category command further comprising:

    storing a category name in one of the plurality of category nodes, wherein each of the plurality of category nodes corresponds to a unique directory;

    storing a NodeID within one of the plurality of category nodes, the NodeID defining the unique directory;

    storing a nodetype within one of the plurality of category nodes, the nodetype when having a predefined type allowing a new category node to be added to a selected one of the plurality of category nodes, and otherwise preventing the new category node from being added to the selected one of the plurality of category nodes;

    storing a ParentID within one of the plurality of category nodes, the ParentID indicating a NodeID of a parent category node;

    storing a LinkID within a first one of the plurality of category nodes, the LinkID indicating a NodeID of a second one of the plurality of category nodes when the nodetype is of a predefined type;

    creating a class hierarchy by providing a plurality of category nodes stored in a tree data structure within a memory, each of said plurality of category nodes having a category name corresponding to a unique directory and a set of defining terms;

    creating a plurality of terms files, each of said plurality of terms files corresponding to one of said plurality of category nodes and including a corresponding set of defining terms and one or more document fragments stored under said one of said plurality of category nodes, said set of defining terms including a term corresponding to one of said plurality of category nodes and said one or more document fragments including a reference to one or more documents and indexing information indicating contiguous multi-term portions of said documents to be extracted during indexing, said set of defining terms and said document fragments together providing a definition of files to be contained in said unique directory referenced by said one of said plurality of category nodes;

    creating one or more term vectors for each of said terms files, each of said term vectors containing a weight assigned to each of one or more common terms of the corresponding terms file according to frequency of occurrence in the corresponding terms file;

    creating a document vector for the document, said document vector containing a weight assigned to the terms of the document according to frequency of occurrence;

    providing a relevance ranking between said terms files and said document by comparing said document vector with said one or more term vectors; and

    storing said document within said document directory hierarchy at a location corresponding to a category node having a term vector which has a relevance ranking that matches a selected criteria.
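
The steps of claim 1 can be illustrated with a minimal sketch. It is not the patented implementation, and several choices are assumptions made only for illustration: the nodetype values `CATEGORY` and `LINK`, the class and function names (`CategoryNode`, `ClassHierarchy`, `term_vector`, `cosine`, `classify`), a directory path built from NodeIDs, relative term frequency as the weight, cosine similarity as the comparison, and a numeric threshold as the "selected criteria". The claim itself does not fix any of these.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass
from typing import Dict, Optional

CATEGORY = "category"  # hypothetical nodetype: may receive child category nodes
LINK = "link"          # hypothetical nodetype: refers to another node via link_id


@dataclass
class CategoryNode:
    name: str
    node_id: int
    nodetype: str = CATEGORY
    parent_id: Optional[int] = None
    link_id: Optional[int] = None
    terms_file: str = ""  # defining terms plus extracted document fragments


class ClassHierarchy:
    def __init__(self, root_name: str) -> None:
        self.nodes: Dict[int, CategoryNode] = {}
        self.root_id = self._store(root_name, CATEGORY, None, None, "")

    def _store(self, name, nodetype, parent_id, link_id, terms_file) -> int:
        node_id = len(self.nodes)  # sequential IDs; this sketch never deletes nodes
        self.nodes[node_id] = CategoryNode(
            name, node_id, nodetype, parent_id, link_id, terms_file
        )
        return node_id

    def add_category(self, parent_id, name, terms_file="",
                     nodetype=CATEGORY, link_id=None) -> int:
        # The parent's nodetype decides whether a new category node may be added.
        if self.nodes[parent_id].nodetype != CATEGORY:
            raise ValueError("parent nodetype does not allow child categories")
        if nodetype == LINK and link_id not in self.nodes:
            raise ValueError("a link node needs the NodeID of an existing node")
        return self._store(name, nodetype, parent_id, link_id, terms_file)

    def directory(self, node_id: int) -> str:
        # Assumed mapping: the chain of NodeIDs from the root names the directory.
        parts = []
        current: Optional[int] = node_id
        while current is not None:
            parts.append(str(current))
            current = self.nodes[current].parent_id
        return "/".join(reversed(parts))


def term_vector(text: str) -> Dict[str, float]:
    # Weight each term by its relative frequency of occurrence in the text.
    counts = Counter(re.findall(r"[a-z0-9]+", text.lower()))
    total = sum(counts.values())
    return {term: n / total for term, n in counts.items()} if total else {}


def cosine(a: Dict[str, float], b: Dict[str, float]) -> float:
    dot = sum(w * b.get(term, 0.0) for term, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def classify(hierarchy: ClassHierarchy, document: str,
             threshold: float = 0.1) -> Optional[int]:
    # Compare the document vector with each category's term vector and return
    # the NodeID of the best match, or None if no score reaches the threshold.
    doc_vec = term_vector(document)
    best_id, best_score = None, 0.0
    for node in hierarchy.nodes.values():
        if node.nodetype != CATEGORY or not node.terms_file:
            continue
        score = cosine(doc_vec, term_vector(node.terms_file))
        if score > best_score:
            best_id, best_score = node.node_id, score
    return best_id if best_score >= threshold else None
```

In this sketch, a caller would build the hierarchy with `add_category`, pass a document's text to `classify`, and use `directory` on the returned NodeID to decide where the document is stored. A single string per node stands in for the terms file; a fuller version would keep the defining terms and the document fragments separately.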
