System and method for document section segmentation

  • US 20050144184A1
  • Filed: 09/30/2004
  • Published: 06/30/2005
  • Est. Priority Date: 10/01/2003
  • Status: Abandoned Application
First Claim

1. A system and method for document heading categorization, comprising the steps of:

    constructing a first data set consisting of exemplars having at least one pair of expressions and corresponding codes;

    constructing a second data set having a structural hierarchy, where the second data set contains at least one corresponding code mapped to at least one expression;

    transforming at least one of the expressions into a first representation, where the first representation includes sequential word features;

    constructing a target data set consisting of at least one first representation and at least one corresponding code;

    comparing a candidate string to the target data set;

    identifying a least dissimilar target representation in the target data set having a dissimilarity score exceeding a first pre-determined value;

    providing the corresponding code of the least dissimilar target in the target data set;

    selectively saving a candidate string having a dissimilarity score not exceeding a second pre-determined value; and

    selectively reviewing the saved candidate string and assigning its representation and corresponding code to the target data set.
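
The representation and comparison steps recited above can be illustrated with a minimal Python sketch. It is not the implementation described in the application: the choice of word unigrams and bigrams as "sequential word features", the Jaccard-style dissimilarity, the function names, and the example headings and codes are all assumptions made for illustration. The tests against the first and second pre-determined values are left to the caller, since the claim does not give those values.

```python
import re
from typing import FrozenSet, List, Tuple

# Assumed representation: a set of word unigrams and bigrams.
Representation = FrozenSet[Tuple[str, ...]]


def to_representation(expression: str) -> Representation:
    """Turn an expression into sequential word features (assumed: 1- and 2-grams)."""
    words = re.findall(r"[a-z0-9]+", expression.lower())
    unigrams = [(w,) for w in words]
    bigrams = list(zip(words, words[1:]))
    return frozenset(unigrams + bigrams)


def dissimilarity(a: Representation, b: Representation) -> float:
    """Jaccard distance (an assumed measure): 0.0 is identical, 1.0 is disjoint."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def build_target_set(
    exemplars: List[Tuple[str, str]]
) -> List[Tuple[Representation, str]]:
    """Pair each exemplar's representation with its corresponding code."""
    return [(to_representation(expr), code) for expr, code in exemplars]


def least_dissimilar(
    candidate: str, targets: List[Tuple[Representation, str]]
) -> Tuple[str, float]:
    """Return the code and score of the target closest to the candidate string."""
    rep = to_representation(candidate)
    scored = ((code, dissimilarity(rep, target)) for target, code in targets)
    return min(scored, key=lambda pair: pair[1])


# Hypothetical exemplars: (expression, code) pairs.
exemplars = [
    ("History of Present Illness", "HPI"),
    ("Past Medical History", "PMH"),
    ("Family History", "FH"),
]
targets = build_target_set(exemplars)

code, score = least_dissimilar("PAST MEDICAL HX:", targets)
print(code, round(score, 3))
```

A caller would compare the returned score with its own pre-determined values to decide whether to provide the code directly or to save the candidate string for review and later addition to the target data set.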
