LARGE SCALE CONCEPT DISCOVERY FOR WEBPAGE AUGMENTATION USING SEARCH ENGINE INDEXERS

  • US 20110252045A1
  • Filed: 04/07/2010
  • Published: 10/13/2011
  • Est. Priority Date: 04/07/2010
  • Status: Active Grant
First Claim

1. A method comprising:

    retrieving, by a training computer, training data comprising a plurality of web documents;

    extracting, by the training computer, information from the training data, the extracted information comprising a plurality of phrases extracted from each document of said plurality of web documents;

    learning, by the training computer, to disambiguate the extracted information by analysis of a context derived from words proximate each phrase such that a particular sense of each phrase of the plurality of phrases is determined for each web document;

    generating, by the training computer as a result of the learning to disambiguate step, a disambiguation classifier capable of determining a sense of a phrase within a document to be analyzed;

    learning, by the training computer using the disambiguated extracted information from each web document, to select a portion of the extracted information of each web document as being relevant to a theme of each web document;

    generating, by the training computer as a result of the learning to select step, a selection classifier capable of selecting a topic in a document that is relevant to the theme of the document; and

    using, by an indexing computer, the disambiguation classifier and the selection classifier to determine a set of topics from a new web document that is not a part of the training data.
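
The steps recited in the claim can be illustrated with a minimal Python sketch. Nothing below comes from the patent itself: the class and function names, the hand-labeled training format, the single-word phrases, the three-word context window, the overlap-based sense scoring, and the count-threshold relevance rule are all assumptions chosen only to make the two-classifier flow concrete.

```python
from collections import Counter, defaultdict


def context_words(tokens, i, window=3):
    """Words proximate to position i, excluding the phrase itself (assumed window)."""
    return tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]


class DisambiguationClassifier:
    """Hypothetical classifier: picks the sense whose training contexts overlap most."""

    def __init__(self):
        # phrase -> sense -> counts of context words seen in training
        self.sense_contexts = defaultdict(lambda: defaultdict(Counter))

    def learn(self, phrase, sense, context):
        self.sense_contexts[phrase][sense].update(context)

    def sense_of(self, phrase, context):
        senses = self.sense_contexts.get(phrase)
        if not senses:
            return None  # phrase never seen in training
        # sorted() makes tie-breaking deterministic
        return max(sorted(senses), key=lambda s: sum(senses[s][w] for w in context))


class SelectionClassifier:
    """Hypothetical classifier: a mention-count threshold learned from labeled examples."""

    def __init__(self):
        self.threshold = 1.0

    def learn(self, examples):
        relevant = [count for count, is_rel in examples if is_rel]
        other = [count for count, is_rel in examples if not is_rel]
        if relevant and other:
            # midpoint between the mean counts of the two groups
            self.threshold = (sum(relevant) / len(relevant) + sum(other) / len(other)) / 2

    def is_relevant(self, count):
        return count >= self.threshold


def train(training_docs):
    """Training computer: extract phrases, learn senses, then learn relevance."""
    disamb, select = DisambiguationClassifier(), SelectionClassifier()
    examples = []
    for text, labels in training_docs:
        tokens = text.lower().split()
        for phrase, (sense, relevant) in labels.items():
            positions = [i for i, tok in enumerate(tokens) if tok == phrase]
            for i in positions:
                disamb.learn(phrase, sense, context_words(tokens, i))
            examples.append((len(positions), relevant))
    select.learn(examples)
    return disamb, select


def topics_for(text, disamb, select):
    """Indexing computer: apply both classifiers to a document outside the training data."""
    tokens = text.lower().split()
    counts = Counter()
    for i, tok in enumerate(tokens):
        sense = disamb.sense_of(tok, context_words(tokens, i))
        if sense is not None:
            counts[(tok, sense)] += 1
    return {key for key, n in counts.items() if select.is_relevant(n)}


# Toy, hand-labeled training data: phrase -> (sense, relevant to the document's theme)
TRAINING_DOCS = [
    ("the jaguar hunts prey in the jungle and the jaguar sleeps in the jungle near a river",
     {"jaguar": ("animal", True), "river": ("water", False)}),
    ("the jaguar engine roars and the jaguar car speeds past a bank on the road",
     {"jaguar": ("car", True), "bank": ("finance", False)}),
]

disamb, select = train(TRAINING_DOCS)
new_doc = "a jaguar car with a new engine and the jaguar car speeds on the road"
print(topics_for(new_doc, disamb, select))
```

In this toy setup the ambiguous phrase "jaguar" is resolved from its neighboring words, and a phrase is reported as a topic only when it is mentioned often enough to clear the learned threshold. A real system would presumably use far richer features and training data for both classifiers.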
