
Text segmentation with multiple granularity levels

  • US 8,892,420 B2
  • Filed: 11/17/2011
  • Issued: 11/18/2014
  • Est. Priority Date: 11/22/2010
  • Status: Active Grant
First Claim

1. A method of text processing, comprising:

  • training, using a processor, a classifier for classifying text, wherein:

    the training is based on a plurality of training sample entries;

    a training sample entry in the plurality of training sample entries includes:

      a character count;

      an independent use rate;

      a phrase structure rule value indicating whether the training sample entry complies with phrase structure rules;

      a semantic attribute value indicating an inclusion state of the training sample entry in a predetermined set of enumerated entries;

      an overlap attribute value indicating overlap of the training sample entry with another entry in the predetermined set of enumerated entries; and

      a classification result indicating whether the training sample entry is a compound semantic unit or a smallest semantic unit;

  • building, using the processor, a lexicon of smallest semantic units, comprising:

    receiving an entry to be classified;

    using the trained classifier to determine whether the entry to be classified is a smallest semantic unit or a compound semantic unit; and

    in the event that the entry is determined to be a smallest semantic unit, adding the entry to the lexicon of smallest semantic units;

  • segmenting, using the processor, received text based on the lexicon of smallest semantic units to obtain medium-grained segmentation results;

  • merging, using the processor, the medium-grained segmentation results to obtain coarse-grained segmentation results, the coarse-grained segmentation results having coarser granularity than the medium-grained segmentation results;

  • looking up, using the processor, in the lexicon of smallest semantic units respective search elements that correspond to segments in the medium-grained segmentation results; and

  • forming, using the processor, fine-grained segmentation results based on the respective search elements, the fine-grained segmentation results having finer granularity than the medium-grained segmentation results.
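The training and lexicon-building steps of the claim list the features of a training sample entry but do not name a classifier or say how the features are encoded. The following is a minimal sketch that fills those gaps with assumptions: every feature is treated as numeric (boolean values as 0 or 1), and a nearest-centroid classifier is swapped in as a stand-in for whatever model an actual implementation would train. `TrainingSampleEntry`, `train`, `classify`, `build_lexicon`, and all sample strings and rates are hypothetical.

```python
from dataclasses import dataclass
from math import dist

SMALLEST, COMPOUND = "smallest", "compound"


@dataclass
class TrainingSampleEntry:
    """One entry carrying the features named in the claim (encodings are assumed)."""
    text: str
    independent_use_rate: float        # assumed to be a value in [0, 1]
    complies_with_phrase_rules: bool   # phrase structure rule value
    in_enumerated_set: bool            # semantic attribute value
    overlaps_enumerated_entry: bool    # overlap attribute value
    label: str = ""                    # classification result (training data only)

    def features(self) -> tuple:
        return (
            float(len(self.text)),     # character count
            self.independent_use_rate,
            float(self.complies_with_phrase_rules),
            float(self.in_enumerated_set),
            float(self.overlaps_enumerated_entry),
        )


def train(samples):
    """Nearest-centroid stand-in: average the feature vectors of each label."""
    sums, counts = {}, {}
    for s in samples:
        f = s.features()
        acc = sums.setdefault(s.label, [0.0] * len(f))
        for i, v in enumerate(f):
            acc[i] += v
        counts[s.label] = counts.get(s.label, 0) + 1
    return {lab: [v / counts[lab] for v in acc] for lab, acc in sums.items()}


def classify(centroids, entry):
    """Return the label whose centroid is nearest by Euclidean distance."""
    f = entry.features()
    return min(centroids, key=lambda lab: dist(f, centroids[lab]))


def build_lexicon(centroids, candidates):
    """Keep every candidate that is classified as a smallest semantic unit."""
    return {e.text for e in candidates if classify(centroids, e) == SMALLEST}


# Made-up training data: 手机 (phone) and 耳机 (earphone) vs. longer compounds.
samples = [
    TrainingSampleEntry("手机", 0.9, True, True, False, SMALLEST),
    TrainingSampleEntry("耳机", 0.95, True, True, False, SMALLEST),
    TrainingSampleEntry("无线耳机", 0.3, True, False, True, COMPOUND),
    TrainingSampleEntry("手机充电器", 0.2, True, False, True, COMPOUND),
]
candidates = [
    TrainingSampleEntry("蓝牙", 0.85, True, True, False),        # Bluetooth
    TrainingSampleEntry("蓝牙耳机", 0.25, True, False, True),    # Bluetooth earphone
]
centroids = train(samples)
lexicon = build_lexicon(centroids, candidates)
print(lexicon)  # {'蓝牙'}
```

The character count is not rescaled here, so it tends to dominate the distance; a real system would probably normalize features or use a different model altogether.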
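The claim's segmenting step says received text is segmented "based on the lexicon of smallest semantic units" without fixing a matching strategy. As an assumption, the following sketch swaps in forward maximum matching, a common dictionary-based technique: at each position it takes the longest lexicon entry that matches and otherwise emits a single character. `segment_medium` and the sample lexicon are hypothetical.

```python
def segment_medium(text, lexicon):
    """Forward maximum matching against a lexicon of smallest semantic units.

    At each position, take the longest lexicon entry that matches; if none
    matches, emit a single character so the scan always advances.
    """
    max_len = max((len(w) for w in lexicon), default=1)
    segments, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in lexicon:
                segments.append(piece)
                i += size
                break
    return segments


# Hypothetical lexicon: 蓝牙 (Bluetooth), 耳机 (earphone), 无线 (wireless), 充电 (charging).
lexicon = {"蓝牙", "耳机", "无线", "充电"}
medium = segment_medium("蓝牙耳机的无线充电", lexicon)
print(medium)  # ['蓝牙', '耳机', '的', '无线', '充电']
```

The single-character fallback is a design choice of this sketch: it guarantees progress on characters the lexicon does not cover (here 的), at the cost of emitting them as one-character segments.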
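The claim's merging step produces coarse-grained results from the medium-grained ones but does not say what governs a merge. As an assumption, the following sketch consults a hypothetical set of known compound entries and, scanning left to right, joins the longest run of adjacent medium-grained segments whose concatenation appears in that set. `merge_coarse`, `compounds`, and the sample segments are illustrative names and data.

```python
def merge_coarse(segments, compounds):
    """Greedy left-to-right merge of adjacent medium-grained segments.

    At each position, join the longest run of two or more segments whose
    concatenation is a known compound; otherwise keep the segment as-is.
    """
    merged, i = [], 0
    while i < len(segments):
        run = 1
        for j in range(len(segments), i + 1, -1):  # try the longest runs first
            if "".join(segments[i:j]) in compounds:
                run = j - i
                break
        merged.append("".join(segments[i:i + run]))
        i += run
    return merged


medium = ["蓝牙", "耳机", "的", "无线", "充电"]
compounds = {"蓝牙耳机", "无线充电"}  # hypothetical compound entries
coarse = merge_coarse(medium, compounds)
print(coarse)  # ['蓝牙耳机', '的', '无线充电']
```

Because merging only concatenates whole medium-grained segments, every coarse-grained boundary is also a medium-grained boundary, which matches the claim's requirement that the coarse results be coarser than the medium ones.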
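The claim's final two steps look up "search elements" for each medium-grained segment in the lexicon and form fine-grained results from them, which implies the lexicon stores finer pieces for at least some smallest semantic units. A minimal sketch, assuming the lexicon is a dictionary from each unit to a list of its search elements and that a segment with no stored elements stays whole; `segment_fine`, `search_elements`, and the sample data are hypothetical.

```python
def segment_fine(medium_segments, search_elements):
    """Replace each medium-grained segment with the search elements stored
    for it in the lexicon; segments without stored elements are kept whole."""
    fine = []
    for seg in medium_segments:
        fine.extend(search_elements.get(seg, [seg]))
    return fine


# Hypothetical lexicon data: 笔记本 (notebook) -> 笔记 (notes) + 本 (volume).
search_elements = {"笔记本": ["笔记", "本"]}
medium = ["笔记本", "电脑", "包"]
print(segment_fine(medium, search_elements))  # ['笔记', '本', '电脑', '包']
```

Keeping unmapped segments whole means no medium-grained segment is dropped from the fine-grained output.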
