Computer implemented example-based concept-oriented data extraction method

  • US 7,107,524 B2
  • Filed: 05/21/2003
  • Issued: 09/12/2006
  • Est. Priority Date: 12/24/2002
  • Status: Expired due to Fees
First Claim

1. A computer implemented example-based concept-oriented data extraction method, comprising:

  • a first procedure for labeling an exemplary data string, comprising the steps of:

    capturing an exemplary data string;

    tokenizing the exemplary data string into a plurality of tokens as an exemplary token sequence, each token having an index;

    specifying the exemplary token sequence as a plurality of specific concepts, each being labeled to be a tuple and consisting of at least one token, the specific concept being selected from the group of a target concept and a filler concept, the target concept pointing to the targeted data of interest, the filler concept pointing to the contextual data of the targeted data, each tuple having a format including a concept type, a concept name, a beginning index of the first token in the specific concept, an ending index of the last token in the specific concept, and an associated concept recognizer of the specific concept, wherein the associated concept recognizer is provided to recognize the possible token sequence of the specific concept; and

    constructing an exemplary concept graph of the exemplary data string according to the tuples; and

  • a second procedure for extracting targeted data from an untested data string, comprising the steps of:

    capturing an untested data string;

    tokenizing the untested data string into a plurality of tokens as an untested token sequence;

    using the associated concept recognizers defined by the tuples for detecting a plurality of concept candidates, wherein each concept candidate has a format including the beginning index and the ending index of the corresponding token sequence, and the concept name of the concept candidate;

    constructing a preliminary concept graph of the untested token sequence according to the concept candidates; and

    determining an optimal hypothetical concept sequence by comparing the exemplary concept graph with the preliminary concept graph and capturing at least one matched target concept from the optimal hypothetical concept sequence for extracting the targeted data.
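
The two procedures of claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation: the tokenizer, the `ConceptTuple` and `Candidate` types, the three sample concepts (`LABEL`, `AMOUNT`, `CURRENCY`), and the function names are all hypothetical. The exemplary concept graph is reduced to a linear sequence of concept names, and "optimal" is simplified to an exact match of that sequence against a path through the candidates; the claim itself does not specify how the comparison is scored.

```python
import re
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

def tokenize(text: str) -> List[str]:
    # Assumed tokenizer: words and single punctuation marks; a token's index is its list position.
    return re.findall(r"\w+|[^\w\s]", text)

@dataclass(frozen=True)
class ConceptTuple:
    ctype: str                                # "target" or "filler"
    name: str                                 # concept name
    begin: int                                # index of first token in the example
    end: int                                  # index of last token in the example
    recognizer: Callable[[List[str]], bool]   # accepts possible token sequences

@dataclass(frozen=True)
class Candidate:
    begin: int
    end: int
    name: str

# First procedure: label the exemplary string "Price: 25 USD" -> ["Price", ":", "25", "USD"].
EXAMPLE_TUPLES = [
    ConceptTuple("filler", "LABEL", 0, 1,
                 lambda t: len(t) == 2 and t[0].isalpha() and t[1] == ":"),
    ConceptTuple("target", "AMOUNT", 2, 2,
                 lambda t: len(t) == 1 and t[0].isdigit()),
    ConceptTuple("filler", "CURRENCY", 3, 3,
                 lambda t: len(t) == 1 and t[0] in {"USD", "EUR"}),
]

def detect_candidates(tokens: List[str], tuples: List[ConceptTuple]) -> List[Candidate]:
    # Run every recognizer over every token span of the untested sequence.
    found = []
    for tup in tuples:
        for i in range(len(tokens)):
            for j in range(i, len(tokens)):
                if tup.recognizer(tokens[i:j + 1]):
                    found.append(Candidate(i, j, tup.name))
    return found

def match_sequence(names: List[str], cands: List[Candidate],
                   n_tokens: int) -> Optional[List[Candidate]]:
    # Depth-first search for adjacent candidates that cover all tokens
    # and whose names equal the exemplary concept sequence.
    def search(pos: int, k: int) -> Optional[List[Candidate]]:
        if k == len(names):
            return [] if pos == n_tokens else None
        for c in cands:
            if c.begin == pos and c.name == names[k]:
                rest = search(c.end + 1, k + 1)
                if rest is not None:
                    return [c] + rest
        return None
    return search(0, 0)

def extract(untested: str, tuples: List[ConceptTuple]) -> Dict[str, str]:
    # Second procedure: tokenize, detect candidates, match, return target concepts.
    ordered = sorted(tuples, key=lambda t: t.begin)
    names = [t.name for t in ordered]
    targets = {t.name for t in ordered if t.ctype == "target"}
    tokens = tokenize(untested)
    path = match_sequence(names, detect_candidates(tokens, tuples), len(tokens))
    if path is None:
        return {}
    return {c.name: " ".join(tokens[c.begin:c.end + 1])
            for c in path if c.name in targets}
```

Calling `extract("Cost: 1200 EUR", EXAMPLE_TUPLES)` in this sketch returns the token span matched by the `AMOUNT` target concept, while a string whose candidates cannot be aligned with the exemplary sequence yields an empty result. A fuller treatment would presumably allow partial or scored matches between the two graphs rather than requiring an exact sequence.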
