
Method and system for extracting web query interfaces

  • US 7,552,116 B2
  • Filed: 08/06/2004
  • Issued: 06/23/2009
  • Est. Priority Date: 08/06/2004
  • Status: Expired due to Fees
First Claim

1. A computer readable storage medium encoded with a computer program to be executed by a computer for extracting semantic information about a plurality of documents autonomously created by different sources and being accessible via a computer network, said computer readable storage medium comprising:

  • a tokenizer for causing the computer to generate a set of tokens indicative of document object model (DOM) nodes associated with visual information in a displayed document image from one of the plurality of autonomously created documents;

    a grammar mechanism for causing the computer to derive a non-prescribed visual grammar from the set of tokens to represent a hidden syntax convention of a visual presentation; and

    a best-effort parser for causing the computer to apply the derived visual grammar to construct multiple parse trees that represent semantic structure of the document and interpret a maximum subset of the set of tokens, wherein said non-prescribed visual grammar is derived from a plurality of autonomously created or heterogeneous Web documents to represent the hidden syntax convention of the visual presentation common among the plurality of autonomously created or heterogeneous Web documents; and

    said grammar is a five tuple <Σ, N, s, Pd, Pf> where Σ is a set of terminal symbols, N is a set of nonterminal symbols, s ∈ N is a start symbol, Pd is a set of production rules that represent visual patterns and Pf is a set of preference rules that represent pattern precedence.
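
For illustration only, the five tuple recited in the claim can be sketched as a small data structure. This is a minimal sketch under stated assumptions, not the patented implementation: the class names (`Production`, `Preference`, `VisualGrammar`), the `constraint` field, and the example symbols such as `LabeledField` and `left_of` are all hypothetical and do not appear in the claim. The only properties checked are those the claim states or that follow directly from the definitions: s must be a member of N, terminals and nonterminals are kept distinct, and every rule refers to declared symbols.

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple


@dataclass(frozen=True)
class Production:
    """One rule in Pd: a visual pattern (hypothetical representation)."""
    head: str                    # nonterminal being defined
    components: Tuple[str, ...]  # terminals or nonterminals it is built from
    constraint: str              # assumed name of a spatial relation, e.g. "left_of"


@dataclass(frozen=True)
class Preference:
    """One rule in Pf: which pattern takes precedence on a conflict."""
    winner: str  # nonterminal whose instance is kept
    loser: str   # nonterminal whose conflicting instance is dropped


@dataclass(frozen=True)
class VisualGrammar:
    """The five tuple <Σ, N, s, Pd, Pf>."""
    terminals: FrozenSet[str]             # Σ
    nonterminals: FrozenSet[str]          # N
    start: str                            # s
    productions: Tuple[Production, ...]   # Pd
    preferences: Tuple[Preference, ...]   # Pf

    def __post_init__(self) -> None:
        # The claim requires s ∈ N.
        if self.start not in self.nonterminals:
            raise ValueError("start symbol must be a nonterminal")
        # Assumption: Σ and N are disjoint, as in conventional grammars.
        if self.terminals & self.nonterminals:
            raise ValueError("terminals and nonterminals must be disjoint")
        symbols = self.terminals | self.nonterminals
        for rule in self.productions:
            if rule.head not in self.nonterminals:
                raise ValueError(f"unknown head: {rule.head}")
            for part in rule.components:
                if part not in symbols:
                    raise ValueError(f"unknown symbol: {part}")
        for pref in self.preferences:
            if pref.winner not in self.nonterminals or pref.loser not in self.nonterminals:
                raise ValueError("preferences must relate nonterminals")


# A toy grammar: a text token to the left of a textbox forms a labeled field,
# and that reading is preferred over treating the text as a standalone caption.
EXAMPLE = VisualGrammar(
    terminals=frozenset({"text", "textbox"}),
    nonterminals=frozenset({"Form", "LabeledField", "Caption"}),
    start="Form",
    productions=(
        Production("LabeledField", ("text", "textbox"), "left_of"),
        Production("Caption", ("text",), "any"),
        Production("Form", ("LabeledField",), "any"),
    ),
    preferences=(Preference(winner="LabeledField", loser="Caption"),),
)
```

Keeping Pf separate from Pd mirrors the claim's split between rules that describe visual patterns and rules that only state precedence among them, so a parser could consult the preferences when two pattern instances compete for the same tokens.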
