
Large scale concept discovery for webpage augmentation using search engine indexers

  • US 8,886,623 B2
  • Filed: 04/07/2010
  • Issued: 11/11/2014
  • Est. Priority Date: 04/07/2010
  • Status: Expired due to Fees
First Claim

1. A method comprising:

    retrieving, by a training computer, training data comprising a plurality of web documents;

    extracting, by the training computer, information from the training data, the extracted information comprising a plurality of phrases extracted from each document of said plurality of web documents;

    learning, by the training computer, to disambiguate the extracted information by analysis of a context derived from words proximate each phrase such that a particular sense of each phrase of the plurality of phrases is determined for each web document;

    generating, by the training computer as a result of the learning to disambiguate step, a disambiguation classifier capable of determining a sense of a phrase within a document to be analyzed;

    learning, by the training computer using the disambiguated extracted information from each web document, to select a portion of the extracted information of each web document as being relevant to a theme of the each web document;

    generating, by the training computer as a result of the learning to select step, a selection classifier capable of selecting a topic in a document that is relevant to the theme of the document;

    using, by an indexing computer, the disambiguation classifier and the selection classifier to determine a set of topics from a new web document that is not a part of the training data and a set of categories from the new web document;

    determining, by the indexing computer, one or more entities associated with the set of topics, the one or more entities selected from a group of entities consisting of text, a graphic, an icon, a video, and a link; and

    transmitting, by the indexing computer, topic and category information to a client computer for display, the topic and category information obtained from a group of topic and category information consisting of the set of topics, the set of categories, and the one or more entities.
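The claim names two learned components, a disambiguation classifier that uses the words near each phrase and a selection classifier that keeps theme-relevant topics. It does not say how either is learned. The following is a minimal Python sketch of that training/indexing split under loudly stated assumptions. Phrases are single tokens. Senses are hand-labeled strings such as `"jaguar/animal"`. A simple context-word overlap count stands in for the learned disambiguation classifier, and a per-sense "labeled relevant" rate with a 0.5 threshold stands in for the selection classifier. Every name here (`DisambiguationClassifier`, `SelectionClassifier`, `index`, the toy data) is hypothetical and not taken from the patent. Entity lookup and transmission to the client are omitted.

```python
from collections import Counter, defaultdict

def context(tokens, i, window=3):
    """Words within `window` positions of tokens[i], excluding tokens[i] itself."""
    lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
    return [t for j, t in enumerate(tokens[lo:hi], start=lo) if j != i]

class DisambiguationClassifier:
    """Hypothetical stand-in: counts context words seen with each (phrase, sense)."""
    def __init__(self, window=3):
        self.window = window
        self.counts = defaultdict(Counter)  # (phrase, sense) -> context-word counts
        self.senses = defaultdict(set)      # phrase -> senses seen in training

    def train(self, tokens, labels):
        # labels maps a token position to its hand-labeled sense (assumed input)
        for i, sense in labels.items():
            self.senses[tokens[i]].add(sense)
            self.counts[(tokens[i], sense)].update(context(tokens, i, self.window))

    def predict(self, tokens, i):
        candidates = self.senses.get(tokens[i])
        if not candidates:
            return None  # token is not a known phrase
        ctx = context(tokens, i, self.window)
        # Pick the sense whose training contexts share the most words with this one.
        return max(sorted(candidates),
                   key=lambda s: sum(self.counts[(tokens[i], s)][w] for w in ctx))

class SelectionClassifier:
    """Hypothetical stand-in: per-sense rate of being labeled relevant to the theme."""
    def __init__(self, threshold=0.5):
        self.threshold = threshold
        self.seen, self.relevant = Counter(), Counter()

    def train(self, senses, relevant):
        for s in set(senses):
            self.seen[s] += 1
            if s in relevant:
                self.relevant[s] += 1

    def select(self, senses):
        return sorted({s for s in senses
                       if self.seen[s] and self.relevant[s] / self.seen[s] >= self.threshold})

def index(tokens, disamb, selector):
    """Indexing step: disambiguate every token, then keep the theme-relevant senses."""
    senses = [disamb.predict(tokens, i) for i in range(len(tokens))]
    return selector.select([s for s in senses if s is not None])

# Toy training data, invented for illustration: (tokens, sense labels, relevant senses).
training = [
    ("wild jaguar hunts prey".split(), {1: "jaguar/animal", 3: "prey/noun"}, {"jaguar/animal"}),
    ("new jaguar sedan engine".split(), {1: "jaguar/car"}, {"jaguar/car"}),
]
disamb, selector = DisambiguationClassifier(), SelectionClassifier()
for tokens, labels, relevant in training:
    disamb.train(tokens, labels)
    selector.train(labels.values(), relevant)

print(index("a jaguar stalks prey".split(), disamb, selector))      # ['jaguar/animal']
print(index("the jaguar engine roared".split(), disamb, selector))  # ['jaguar/car']
```

In the first new document, "prey" is still disambiguated (to `prey/noun`), but the selection step drops it because it was never labeled relevant in training. This mirrors the claim's separation between determining a sense and selecting the topics that matter to the document's theme.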

  • 9 Assignments