Probabilistic learning method for XML annotation of documents

  • US 8,543,906 B2
  • Filed: 06/29/2005
  • Issued: 09/24/2013
  • Est. Priority Date: 06/29/2005
  • Status: Expired due to Fees
First Claim

1. A document processor stored in a non-transitory medium comprising:

  • a probabilistic classifier that classifies fragments of an input document respective to a set of terminal elements by assigning probability values for the fragments corresponding to elements of the set of terminal elements;

  • a parser that defines a parsed document structure associating the input document fragments with terminal elements connected by links of non-terminal elements conforming with a probabilistic grammar defining transformation rules operating on elements selected from the set of terminal elements and a set of non-terminal elements, the parsed document structure being used to organize the input document, the parser including a joint probability optimizer that optimizes the parsed document structure respective to a joint probability of (i) the probability values of the associated terminal elements and (ii) probabilities of the connecting links of non-terminal elements derived from the probabilistic grammar;

  • a classifier trainer that trains the probabilistic classifier respective to a set of training documents having pre-classified fragments; and

  • a grammar derivation module that derives the probabilistic grammar from the set of training documents, each training document having a pre-assigned parsed document structure associating fragments of the training document with terminal elements connected by links of non-terminal elements.
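
The joint optimization recited in the parser element can be illustrated with a small sketch. The claim does not prescribe a particular algorithm, so the following rests on assumptions: the grammar is taken to be in Chomsky normal form, the search uses a CYK-style Viterbi chart, and all names (`best_parse`, `leaf_labels`, the symbols `Doc`, `Head`, `Body`, `T`, `A`, and every probability value) are hypothetical and chosen only for illustration. It combines per-fragment classifier probabilities with grammar rule probabilities and keeps the structure with the highest joint probability.

```python
import math

def best_parse(frag_probs, unary_rules, binary_rules, root):
    """CYK-style Viterbi search (an assumed technique, not taken from the claim).

    frag_probs[i][terminal]  -> classifier probability for fragment i
    unary_rules[terminal]    -> list of (non_terminal, rule_prob)
    binary_rules[(left, right)] -> list of (parent, rule_prob)
    Returns (log joint probability, tree) for `root`, or None if no parse exists.
    """
    n = len(frag_probs)
    # chart[i][j] maps a non-terminal to its best (log_prob, tree) over fragments i..j-1
    chart = [[{} for _ in range(n + 1)] for _ in range(n + 1)]

    # Width-1 spans: classifier probability times the rule that emits the terminal.
    for i, probs in enumerate(frag_probs):
        cell = chart[i][i + 1]
        for term, p in probs.items():
            if p <= 0:
                continue
            for parent, rule_p in unary_rules.get(term, []):
                score = math.log(p) + math.log(rule_p)
                if parent not in cell or score > cell[parent][0]:
                    cell[parent] = (score, (parent, term, i))

    # Wider spans: join two adjacent sub-structures through a binary rule.
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            j = i + width
            cell = chart[i][j]
            for k in range(i + 1, j):
                for left, (ls, lt) in chart[i][k].items():
                    for right, (rs, rt) in chart[k][j].items():
                        for parent, rule_p in binary_rules.get((left, right), []):
                            score = ls + rs + math.log(rule_p)
                            if parent not in cell or score > cell[parent][0]:
                                cell[parent] = (score, (parent, lt, rt))
    return chart[0][n].get(root)

def leaf_labels(tree):
    """Read the terminal label chosen for each fragment, left to right."""
    if isinstance(tree[2], int):          # leaf: (non_terminal, terminal, index)
        return [tree[1]]
    return leaf_labels(tree[1]) + leaf_labels(tree[2])

# Hypothetical classifier output for three fragments of one document.
frag_probs = [
    {"title": 0.9, "para": 0.1},
    {"author": 0.4, "para": 0.6},   # the classifier alone would pick "para" here
    {"para": 0.8, "author": 0.2},
]
# Hypothetical probabilistic grammar (illustrative values only).
unary_rules = {
    "title": [("T", 1.0)],
    "author": [("A", 1.0)],
    "para": [("Body", 0.7)],
}
binary_rules = {
    ("T", "A"): [("Head", 1.0)],
    ("Head", "Body"): [("Doc", 1.0)],
    ("Body", "Body"): [("Body", 0.3)],
}

score, tree = best_parse(frag_probs, unary_rules, binary_rules, "Doc")
labels = leaf_labels(tree)
```

In this toy input the classifier on its own prefers `para` for the second fragment, but the only structure the assumed grammar admits for a `Doc` places an `author` there, so the jointly optimal parse labels the fragments `title`, `author`, `para`. This is the kind of correction that motivates scoring terminal probabilities and grammar links together rather than separately.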
