Document processing method and system

  • US 8,359,327 B2
  • Filed: 05/25/2010
  • Issued: 01/22/2013
  • Est. Priority Date: 05/27/2009
  • Status: Expired due to Fees
First Claim

1. A method for expanding a seed document in a seed document set, wherein the seed document set comprises at least one seed document, the method comprising:

  • identifying one or more entity words of the seed document in memory by a processor, wherein the one or more identified entity words are words indicating focused entities of the seed document and the one or more identified entity words of the seed document are identified with focused named entity recognition (FNER) technology, the FNER technology comprising:

    segmenting the seed document;

    applying part of speech tagging;

    identifying candidate entity words;

    extracting feature values for each candidate entity word to form a feature vector;

    setting a threshold and setting a weight for each feature value in the feature vector;

    calculating a score of each candidate entity word with the feature vector and the weight; and

    comparing the score with the set threshold and determining entity words from the candidate entity words as the one or more identified entity words;

  • identifying by the processor, based on each of the one or more identified entity words of the seed document, one or more topic words related to each of the one or more identified entity words, the one or more identified topic words located in the seed document, wherein the one or more identified topic words of the seed document are identified with focused topic detection (FTD) technology using the segmenting of the seed document and the part of speech tagging of the FNER technology as a basis for identifying the one or more topic words;

  • forming, by the processor, an entity word-topic word pair from each of the one or more identified topic words and each of the one or more identified entity words upon which each of the one or more identified topic words is identified; and

  • obtaining one or more expanded documents by the processor by taking the entity word and topic word in each entity word-topic word pair as key words for web searching at the same time, wherein the expanded documents comprise not only the entity word in the each entity word-topic word pair but also the topic word in the each entity word-topic word pair.
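
The scoring, pairing, and search steps of the claim can be illustrated with a minimal Python sketch. It is not the patented implementation, and it rests on several assumptions: the score is taken to be a weighted sum of the feature values (the claim only says it is calculated "with the feature vector and the weight"), a candidate is kept when its score meets or exceeds the threshold, and the search query is formed by quoting both words. The names `Candidate`, `score`, `select_entity_words`, `form_pairs`, `build_query`, and `contains_both`, and the example feature values, are hypothetical. Segmentation, part of speech tagging, and topic detection are assumed to have been done already.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """A candidate entity word with its feature vector (feature choice is hypothetical)."""
    word: str
    features: list[float]


def score(features: list[float], weights: list[float]) -> float:
    # Assumption: weighted sum of feature values; the claim does not fix the formula.
    if len(features) != len(weights):
        raise ValueError("feature vector and weights must have the same length")
    return sum(f * w for f, w in zip(features, weights))


def select_entity_words(candidates: list[Candidate],
                        weights: list[float],
                        threshold: float) -> list[str]:
    # Assumption: a candidate becomes an entity word when score >= threshold.
    return [c.word for c in candidates if score(c.features, weights) >= threshold]


def form_pairs(topics_by_entity: dict[str, list[str]]) -> list[tuple[str, str]]:
    # Pair each topic word only with the entity word it was identified for.
    return [(entity, topic)
            for entity, topics in topics_by_entity.items()
            for topic in topics]


def build_query(pair: tuple[str, str]) -> str:
    # Both words are used as key words in the same search.
    # Quoting each word is an assumed query syntax, not taken from the claim.
    entity, topic = pair
    return f'"{entity}" "{topic}"'


def contains_both(text: str, pair: tuple[str, str]) -> bool:
    # An expanded document must contain the entity word and the topic word.
    # A case-insensitive substring check is a simplification.
    entity, topic = pair
    lowered = text.lower()
    return entity.lower() in lowered and topic.lower() in lowered


# Example with made-up candidates, weights, and topic words.
candidates = [
    Candidate("Acme", [1.0, 1.0, 0.0]),
    Candidate("Tuesday", [0.2, 0.0, 0.5]),
]
weights = [0.5, 0.3, 0.2]
entities = select_entity_words(candidates, weights, threshold=0.5)
pairs = form_pairs({"Acme": ["merger", "earnings"]})
queries = [build_query(p) for p in pairs]
```

In this sketch, `contains_both` would be applied to each search result so that only documents containing both words of a pair are kept as expanded documents.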
