System and method for discovering schematic structure in hypertext documents

  • US 6,738,767 B1
  • Filed: 03/20/2000
  • Issued: 05/18/2004
  • Est. Priority Date: 03/20/2000
  • Status: Expired due to Term
First Claim

1. A method of reformatting at least one mark-up language document comprising the steps:

  • discovering a schematic structure from said document and creating a reformatted document with a schema corresponding to said schematic structure, said steps of discovering and reformatting comprising:

    receiving a document having a plurality of semantic nodes and a plurality of formatting nodes;

    tokenizing said plurality of said semantic and formatting nodes;

    identifying a set of keyword nodes, said set of keyword nodes comprising at least one of said plurality of semantic nodes which match an element in a set of keywords and corresponding synonyms;

    labeling each element of said set of keyword nodes with a label corresponding to a matching keyword;

    arranging said plurality of said semantic and formatting nodes so that each node representing an object of a similar level of abstraction are arranged as sibling nodes;

    identifying a non-keyword set of nodes, said non-keyword set comprising all of said plurality of semantic and formatting nodes not an element of said set of keyword nodes;

    determining for each node in said non-keyword set a corresponding first child node which is also an element of said set of keyword nodes, said corresponding first child node having a keyword label, and reformatting said document by generating a document with a schema having a keyword structure based upon re-labeling each node in said non-keyword set with a keyword label of said corresponding first child node.
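
The keyword-labeling and re-labeling steps of the claim can be illustrated with a minimal sketch. It is not the patented implementation: it assumes the document has already been tokenized and parsed into a tree, it skips the step that arranges nodes of similar abstraction as siblings, and the names `Node`, `KEYWORDS`, `match_keyword`, `label_keywords`, and `relabel_non_keywords` are hypothetical, as is the sample keyword table.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical keyword table: canonical keyword -> accepted synonyms.
# The actual keyword set is domain-specific and is not given in the claim.
KEYWORDS = {
    "education": {"education", "academic background"},
    "experience": {"experience", "employment", "work history"},
}


@dataclass
class Node:
    text: str                                   # tag name or text content
    children: List["Node"] = field(default_factory=list)
    label: Optional[str] = None                 # keyword label, if any
    is_keyword: bool = False                    # True only for direct matches


def match_keyword(text: str) -> Optional[str]:
    """Return the canonical keyword whose synonym set contains the text."""
    normalized = text.strip().lower()
    for keyword, synonyms in KEYWORDS.items():
        if normalized in synonyms:
            return keyword
    return None


def label_keywords(node: Node) -> None:
    """Mark every node that matches a keyword or synonym and label it."""
    keyword = match_keyword(node.text)
    if keyword is not None:
        node.is_keyword = True
        node.label = keyword
    for child in node.children:
        label_keywords(child)


def relabel_non_keywords(node: Node) -> None:
    """Give each non-keyword node the label of its first keyword child."""
    if not node.is_keyword:
        for child in node.children:
            if child.is_keyword:
                node.label = child.label
                break
    for child in node.children:
        relabel_non_keywords(child)


# Assumed input: a tiny resume-like page, already parsed into nodes.
tree = Node("body", [
    Node("h2", [Node("Work History")]),
    Node("ul", [Node("li", [Node("Acme Corp")])]),
])
label_keywords(tree)
relabel_non_keywords(tree)
```

In this sketch the `h2` formatting node inherits the label `experience` from its child text node "Work History", while `body` and `ul` stay unlabeled because neither has a direct child in the keyword set. Keeping `is_keyword` separate from `label` ensures that only direct keyword matches, not re-labeled nodes, can pass a label to a parent, which is one reading of the claim's "first child node which is also an element of said set of keyword nodes".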
