
Pattern recognition in web search engine result pages

  • US 8,326,830 B2
  • Filed: 10/06/2009
  • Issued: 12/04/2012
  • Est. Priority Date: 10/06/2009
  • Status: Active Grant
First Claim

1. A non-transitory computer readable medium comprising computer readable instructions, which, when executed by a computer, cause the computer to perform a method, the method comprising:

    receiving a result page from a web search engine, the result page comprising text fields and markup tags, and an integer number for the results on the result page;

    generating simplified variations of the result page, the generating comprising:

    determining noisy markup tags in the result page;

    generating a first variation of the result page by removing the noisy markup tags from the result page;

    generating a plurality of other variations of the result page by preserving one noisy markup tag from the noisy markup tags and removing the rest of the noisy markup tags;

    stripping inside of remaining markup tags in the first variation and the plurality of other variations; and

    simplifying the text fields by marking the text fields with free text markers;

    parsing the simplified variations of the result page to determine one or more repeating patterns, the one or more repeating patterns comprising a substring of the simplified variations of the result page, the substring beginning at a start of a remaining markup tag or a free text marker and ending at a close of the remaining markup tag or the free text marker;

    selecting the one or more repeating patterns that are repeated the integer number of times in the result page as result patterns;

    selecting one of the one or more result patterns as a highest rated result pattern according to predefined rating criteria; and

    generating a regular expression from the highest rated result pattern as an output that matches the results on the result page.
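
The steps recited in the claim can be illustrated with a minimal Python sketch. It is not the patented implementation. Several details are assumptions made only for illustration: the `NOISY_TAGS` set, the `#` free text marker, the `max_len` bound on pattern length, the "longest pattern wins" rating rule, and all function names. The claim itself leaves the noisy-tag test and the rating criteria unspecified. The generated expression also assumes that removed noisy tags only ever appear inside text runs.

```python
import re

# Assumption: the claim does not say which tags are "noisy"; this set is illustrative.
NOISY_TAGS = {"b", "i", "em", "strong", "span", "font", "br"}
TAG_RE = re.compile(r"<(/?)([a-zA-Z][a-zA-Z0-9]*)[^>]*>")

def simplify(html, keep=None):
    """Build one simplified variation as a token list.

    Noisy tags are removed (except `keep`), the inside of every remaining
    tag is stripped, and each text run becomes the free text marker "#".
    """
    tokens, pos = [], 0

    def mark_text(chunk):
        # Adjacent text runs separated only by removed tags collapse into one marker.
        if chunk.strip() and (not tokens or tokens[-1] != "#"):
            tokens.append("#")

    for m in TAG_RE.finditer(html):
        mark_text(html[pos:m.start()])
        pos = m.end()
        name = m.group(2).lower()
        if name in NOISY_TAGS and name != keep:
            continue
        tokens.append(f"<{m.group(1)}{name}>")
    mark_text(html[pos:])
    return tokens

def count_non_overlapping(tokens, pattern):
    """Count non-overlapping occurrences of `pattern` (a tuple) in `tokens`."""
    count, i, k = 0, 0, len(pattern)
    while i + k <= len(tokens):
        if tuple(tokens[i:i + k]) == pattern:
            count += 1
            i += k
        else:
            i += 1
    return count

def repeating_patterns(tokens, n_results, max_len=12):
    """Token-aligned substrings that repeat exactly `n_results` times."""
    seen, found = set(), set()
    for k in range(1, max_len + 1):
        for i in range(len(tokens) - k + 1):
            cand = tuple(tokens[i:i + k])
            if cand in seen:
                continue
            seen.add(cand)
            if count_non_overlapping(tokens, cand) == n_results:
                found.add(cand)
    return found

def to_regex(pattern):
    """Turn a token pattern into a regular expression over the original page."""
    parts = []
    for tok in pattern:
        if tok == "#":
            parts.append(r"(.*?)")                    # free text, captured
        elif tok.startswith("</"):
            parts.append(re.escape(tok))              # closing tag
        else:
            parts.append(rf"<{tok[1:-1]}\b[^>]*>")    # opening tag, any attributes
    return r"\s*".join(parts)

def result_regex(html, n_results):
    """Run the whole pipeline; returns None if no pattern repeats n_results times."""
    present = {m.group(2).lower() for m in TAG_RE.finditer(html)} & NOISY_TAGS
    variations = [simplify(html)] + [simplify(html, keep=t) for t in sorted(present)]
    candidates = set()
    for tokens in variations:
        candidates |= repeating_patterns(tokens, n_results)
    if not candidates:
        return None
    # Assumed rating criterion: prefer the longest pattern (ties broken lexicographically).
    return to_regex(max(candidates, key=lambda p: (len(p), p)))

# Hypothetical result page with three results.
page = (
    "<html><body><h1>Results</h1>"
    '<div class="r"><a href="/1">First <b>hit</b></a><p>Snippet one</p></div>'
    '<div class="r"><a href="/2">Second hit</a><p>Snippet two</p></div>'
    '<div class="r"><a href="/3">Third <b>hit</b></a><p>Snippet three</p></div>'
    "</body></html>"
)
regex = result_regex(page, 3)
matches = re.findall(regex, page, re.S)
```

In this sketch the variation with every noisy tag removed is the one that yields the winning pattern, because the `<b>` tags appear in only two of the three results and would otherwise break the repetition.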
