Dynamic taxonomy process for browsing and retrieving information in large heterogeneous data bases

  • US 6,763,349 B1
  • Filed: 06/18/2001
  • Issued: 07/13/2004
  • Est. Priority Date: 12/16/1998
  • Status: Expired due to Fees
First Claim

1. A process for retrieving information on large heterogeneous databases, wherein information retrieval is performed through visual queries on dynamic taxonomies, said dynamic taxonomies being an organization of concepts that ranges from a most general concept to a most specific concept, said concepts and their generalization or specialization relationships being an intension, documents in said databases being able to be classified under different concepts, said documents and their classification being called an extension, said process comprising:

  • initially displaying a complete taxonomy for said retrieval;

    selecting subsets of interest of said complete taxonomy in order to refine said retrieval, said subsets of interest being specified by selecting taxonomy concepts and combining them through boolean operations or being specified through querying methods, which retrieve classified documents according to different selection criteria, including words contained in a document;

    displaying a reduced taxonomy for said selected set, said reduced taxonomy being derived from the original taxonomy by pruning the concepts under which no document in the selected subset of interest is classified; and

    iteratively repeating said steps of selecting subsets and of displaying a reduced taxonomy to further refine said retrieval, wherein;

    said process is performed on documents of any type and format;

    said intension is organized as a hierarchy of concepts or as a directed acyclic graph of concepts, thereby allowing a concept to have multiple fathers;

    said process dynamically reconstructs all relationships among concepts based on the classification without requiring, in the intension, concept relationships in addition to generalization or specialization, a relationship between any two concepts existing if and only if at least one document is classified (1) under a first concept or any descendants of the first concept, and (2) under a second concept, or any descendants of the second concept;

    documents in said classification are classified under a concept at any level in said intension, including concepts with no sons;

    said taxonomy supports operations for concept insertion, deletion, and modification;

    said taxonomy supports operations for document insertion and classification, deletion, and reclassification;

    documents in said classification are classified manually, programmatically, or automatically;

    said process allows retrieval through different languages on a same database, while maintaining the same classification for all said languages;

    said classification is explicitly stored as a set or list of documents for each concept or implicitly stored in external structures;

    said explicitly stored classification includes, for each concept in the intension, deep classification, which records all documents classified under the concept and under any of its descendants, and a shallow classification, which records all the documents classified directly under the concepts, said shallow classification being only required if documents can be classified under non-terminal concepts and is equivalent, by definition, to the deep classification for terminal concepts;

    said deep and shallow classifications are physically stored as compressed or uncompressed bit vectors, or as compressed or uncompressed inverted lists, or as Bloom's filters, or in a relational database system;

    said process accounts for an age of documents either explicitly or implicitly, or by a lazy reclassification of a minimum number of concepts;

    said intension and classification are used either for querying/browsing the database or to dynamically inform a user when documents of interest are added or modified in the database; and

    said step of displaying a reduced taxonomy either reports only the concepts belonging to the reduced taxonomy or, for each such concept, also reports how many documents in the interest set are classified under the concept.
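
The mechanics recited in the claim (shallow and deep classification, pruning to a reduced taxonomy with per-concept counts, and relationships inferred from shared documents) can be illustrated with a minimal Python sketch. This is not the patented implementation. The class `Taxonomy`, its method names, and the sample concepts and documents are all hypothetical, and plain Python sets stand in for the bit vectors, inverted lists, or other storage options named in the claim.

```python
from collections import defaultdict


class Taxonomy:
    """Toy dynamic taxonomy (hypothetical names, illustration only).

    The intension is a DAG of concepts; the extension is a shallow
    classification mapping each concept to the documents filed directly under it.
    """

    def __init__(self):
        self.children = defaultdict(set)  # concept -> direct sons
        self.shallow = defaultdict(set)   # concept -> documents classified directly

    def add_concept(self, concept, parents=()):
        self.children[concept]            # register the concept even if it has no sons
        for parent in parents:            # several parents are allowed (DAG)
            self.children[parent].add(concept)

    def classify(self, doc, *concepts):
        for concept in concepts:          # a document may sit under many concepts
            self.children[concept]        # make sure the concept is known
            self.shallow[concept].add(doc)

    def deep(self, concept):
        """Documents under the concept or under any of its descendants."""
        docs = set(self.shallow.get(concept, ()))
        for child in self.children.get(concept, ()):
            docs |= self.deep(child)
        return docs

    def reduced(self, interest):
        """Prune concepts with no document in the interest set.

        Returns {concept: number of interest-set documents under it}.
        """
        counts = {}
        for concept in list(self.children):
            n = len(self.deep(concept) & interest)
            if n:
                counts[concept] = n
        return counts

    def related(self, a, b):
        """Two concepts are related iff their deep classifications share a document."""
        return bool(self.deep(a) & self.deep(b))


# Sample intension: two independent hierarchies (assumed data).
tax = Taxonomy()
tax.add_concept("Topic")
tax.add_concept("Art", ["Topic"])
tax.add_concept("Painting", ["Art"])
tax.add_concept("Sculpture", ["Art"])
tax.add_concept("Place")
tax.add_concept("Italy", ["Place"])
tax.add_concept("France", ["Place"])

# Sample extension: d4 is classified under a non-terminal concept.
tax.classify("d1", "Painting", "Italy")
tax.classify("d2", "Sculpture", "Italy")
tax.classify("d3", "Painting", "France")
tax.classify("d4", "Art")

# First zoom: select the concept "Italy" as the subset of interest.
interest = tax.deep("Italy")
first_view = tax.reduced(interest)

# Iterative refinement: combine with "Painting" through a boolean AND.
refined = interest & tax.deep("Painting")
second_view = tax.reduced(refined)
```

In this sketch, `first_view` omits "France" because no document in the interest set is classified under it, and `second_view` additionally omits "Sculpture". Computing `deep` recursively on every call keeps the example short; a system that stores the deep classification explicitly, as the claim describes, would instead look it up per concept.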
