Majority schema in semi-structured data

  • US 6,604,099 B1
  • Filed: 07/27/2000
  • Issued: 08/05/2003
  • Est. Priority Date: 03/20/2000
  • Status: Expired due to Term
First Claim

1. A method for discovering a majority schema from a set of related documents that share similar schemas, comprising:

  • extracting a set of schematic structures of the documents;

  • converting the schematic structures to sets of label paths;

  • discovering a set of frequent label paths from amongst the sets of label paths;

  • unifying similar schematic structures of the documents based on the set of frequent label paths that represents a majority schema;

  • expressing the majority schema in a predefined language;

  • wherein extracting schematic structures of the documents includes representing the schematic structures as sets of ordered trees with nodes labeled by a set of keywords;

  • wherein extracting schematic structures includes acquiring XML documents;

  • wherein extracting schematic structures includes placing title keywords and content keywords in ordered trees according to a specified depth;

  • wherein discovering a set of frequent label paths includes introducing a constraint mechanism to specify a restriction on the schematic structures in the majority schema, to help reduce noise and to improve efficiency; and

  • wherein discovering a set of frequent label paths further includes discovering a set of frequent label paths satisfying the constraint mechanism.
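
The claimed steps can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: node labels are taken to be XML tag names (the claim's title and content keywords are not modeled), the constraint mechanism is assumed to be a maximum path depth, a label path counts as "frequent" if it occurs in at least a given fraction of the documents, and the "predefined language" is assumed to be DTD-style element declarations. The function names `label_paths`, `frequent_paths`, `majority_schema` and `to_dtd` are hypothetical. Because a document containing a path also contains every prefix of that path, the frequent paths are closed under prefixes and merge into a single tree without gaps.

```python
import xml.etree.ElementTree as ET
from collections import Counter

Path = tuple[str, ...]  # a root-to-node sequence of labels


def label_paths(root: ET.Element, max_depth: int) -> list[Path]:
    """Root-to-node label paths of one document, in preorder, deduplicated.

    Assumption: max_depth stands in for the claim's constraint mechanism;
    paths deeper than max_depth are ignored.
    """
    seen: dict[Path, None] = {}  # a dict keeps insertion (document) order

    def walk(node: ET.Element, prefix: Path) -> None:
        path = prefix + (node.tag,)
        if len(path) > max_depth:
            return
        seen.setdefault(path, None)
        for child in node:
            walk(child, path)

    walk(root, ())
    return list(seen)


def frequent_paths(docs: list[str], min_support: float, max_depth: int) -> list[Path]:
    """Paths that occur in at least min_support (a fraction) of the documents."""
    order: dict[Path, None] = {}  # first-seen order across all documents
    support: Counter = Counter()
    for text in docs:
        paths = label_paths(ET.fromstring(text), max_depth)
        support.update(paths)  # each path is counted at most once per document
        for p in paths:
            order.setdefault(p, None)
    threshold = min_support * len(docs)
    return [p for p in order if support[p] >= threshold]


def majority_schema(paths: list[Path]) -> dict:
    """Unify the frequent paths into one ordered tree of nested dicts."""
    tree: dict = {}
    for path in paths:
        node = tree
        for label in path:
            node = node.setdefault(label, {})
    return tree


def to_dtd(tree: dict) -> list[str]:
    """Render the tree as DTD-style declarations (sequence content models only).

    Simplification: a tag reused under different parents would receive one
    declaration per occurrence instead of a merged one.
    """
    lines: list[str] = []

    def emit(label: str, children: dict) -> None:
        if children:
            lines.append(f"<!ELEMENT {label} ({', '.join(children)})>")
            for child, grandchildren in children.items():
                emit(child, grandchildren)
        else:
            lines.append(f"<!ELEMENT {label} (#PCDATA)>")

    for label, children in tree.items():
        emit(label, children)
    return lines


if __name__ == "__main__":
    docs = [
        "<book><title/><author><name/></author><year/></book>",
        "<book><title/><author><name/><email/></author></book>",
        "<book><title/><author><name/></author><isbn/></book>",
    ]
    schema = majority_schema(frequent_paths(docs, min_support=0.6, max_depth=3))
    print("\n".join(to_dtd(schema)))
```

With these three sample documents, `title`, `author` and `author/name` appear in every document while `year`, `email` and `isbn` each appear in only one, so the output declares `book (title, author)` and `author (name)` and drops the outliers. Lowering `max_depth` to 2 prunes `book/author/name` before mining, which illustrates how a structural constraint trims deep, noisy structure and shrinks the search space.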

  • 3 Assignments