
Single pass workload directed clustering of XML documents

  • US 7,512,615 B2
  • Filed: 11/07/2003
  • Issued: 03/31/2009
  • Est. Priority Date: 11/07/2003
  • Status: Expired due to Fees
First Claim

1. A system for clustering XML documents, the system comprising:

    an arrangement for parsing an XML document by node;

    an arrangement for initializing at least one parsed node;

    an arrangement for partitioning at least one parsed node; and

    an arrangement for processing at least one parsed node;

    wherein the system removes XML text data of a node prior to the entire document being clustered by detecting a ready cluster and removing the ready cluster from an intermediate partition upon assignment to a page, wherein said ready cluster is a cluster which carries with it corresponding XML text that would be part of a final partition while avoiding the need to keep the entire XML document in memory until the final partition is computed;

    wherein the system utilizes a processor to cluster XML documents;

    wherein the system partitions a weight range into equal size weight intervals and associates only one partition for each weight interval;

    wherein given a predetermined memory limit M for managing memory usage in selecting optimal partitions, when memory usage reaches a high water mark, a corrective action is triggered to select a ready sub-partition, and when memory usage reaches a low water mark operation resumes;

    wherein said ready sub-partition is a highest value partition associated with a root of a processed subtree which is a subset of a computed best partition for a whole clustering tree; and

    wherein the XML clustering system processes the XML document, partitions the XML document into clusters, and assigns the clusters to pages all within a single pass.
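The claim names two bookkeeping mechanisms without giving their internals: a table that splits a weight range into equal-size intervals and keeps only one partition per interval, and a memory controller with high and low water marks that flushes "ready" sub-partitions to pages. The Python sketch below models both under loudly stated assumptions. The names `Partition`, `WeightIntervalTable`, and `WaterMarkFlusher`, the use of text length as the memory cost, "highest value wins" as the per-interval tie rule, and the 0.9/0.6 water-mark fractions are all hypothetical illustrations, not the patent's actual method. Subtree processing and the computation of partition values are omitted.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Partition:
    """Hypothetical candidate partition: its value, its weight, and the XML
    text carried by its clusters."""
    value: float
    weight: float
    text: List[str] = field(default_factory=list)

    @property
    def size(self) -> int:
        # Assumption: memory cost is the number of characters of text held.
        return sum(len(t) for t in self.text)


class WeightIntervalTable:
    """Splits [0, max_weight] into equal-size intervals and keeps at most one
    partition (the highest-value one seen so far) per interval."""

    def __init__(self, max_weight: float, num_intervals: int) -> None:
        self.width = max_weight / num_intervals
        self.slots: List[Optional[Partition]] = [None] * num_intervals

    def interval_of(self, weight: float) -> int:
        # Clamp so that weight == max_weight lands in the last interval.
        return min(int(weight // self.width), len(self.slots) - 1)

    def offer(self, p: Partition) -> bool:
        """Store p if its interval is empty or p beats the current occupant."""
        i = self.interval_of(p.weight)
        current = self.slots[i]
        if current is None or p.value > current.value:
            self.slots[i] = p
            return True
        return False


class WaterMarkFlusher:
    """Hysteresis memory control (assumed policy): when usage reaches the
    high water mark, repeatedly assign the highest-value ready sub-partition
    to a page and release its text until usage falls to the low water mark."""

    def __init__(self, limit: int, high: float = 0.9, low: float = 0.6) -> None:
        self.high_mark = high * limit
        self.low_mark = low * limit
        self.used = 0
        self.ready: List[Partition] = []
        self.pages: List[List[str]] = []

    def add_ready(self, p: Partition) -> None:
        self.ready.append(p)
        self.used += p.size
        if self.used >= self.high_mark:
            self._corrective_action()

    def _corrective_action(self) -> None:
        while self.ready and self.used > self.low_mark:
            best = max(self.ready, key=lambda q: q.value)
            self.ready.remove(best)
            self.pages.append(best.text)  # assign its clusters to a page
            self.used -= best.size        # its XML text no longer needs memory
```

For example, with `WaterMarkFlusher(limit=100)`, adding ready partitions holding 40, 30, and 25 characters of text pushes usage to 95. That crosses the high mark of 90, so the two highest-value partitions are flushed until usage drops to 40, below the low mark of 60. The gap between the two marks keeps the controller from flushing on every small allocation.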
