Automated screen scraping via grammar induction

  • US 8,838,625 B2
  • Filed: 04/03/2009
  • Issued: 09/16/2014
  • Est. Priority Date: 04/03/2009
  • Status: Expired due to Fees
First Claim

1. A computer-implemented method of extracting information from a data source, comprising:

    intercepting display information transmitted to a computer-implemented display device;

    wherein the display information is from the data source;

    wherein the display information includes information to cause particular visual content to be displayed on the computer-implemented display device;

    inducing a grammar via statistical analysis of the intercepted display information;

    wherein inducing a grammar includes determining how to break up the particular visual content into component parts;

    wherein determining how to break up the particular visual content into component parts includes:

    identifying a plurality of tokens in the particular visual content;

    for each token of the plurality of tokens, determining a frequency at which the token appears within the display information from the data source; and

    determining how to break up the particular visual content into component parts based, at least in part, on the frequency determined for each token of the plurality of tokens;

    generating a parser corresponding to the induced grammar; and

    performing screen scraping using the generated parser to produce a sequence of return values representing the extracted information;

    wherein the method is performed by one or more computing devices.
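
The steps recited in claim 1 can be illustrated with a small sketch. This is not the patented implementation, only one possible reading under stated assumptions: each intercepted screen is a plain-text string, tokens are whitespace-separated words, "frequency" is the fraction of screens in which a token appears, and tokens at or above a hypothetical `threshold` are treated as fixed labels while rarer tokens are treated as variable fields. The function names (`token_frequencies`, `induce_grammar`, `generate_parser`) are invented for illustration.

```python
import re
from collections import Counter


def tokenize(screen):
    """Identify tokens in the visual content (assumption: whitespace-separated)."""
    return screen.split()


def token_frequencies(screens):
    """For each token, the fraction of intercepted screens in which it appears."""
    counts = Counter()
    for screen in screens:
        counts.update(set(tokenize(screen)))
    return {tok: n / len(screens) for tok, n in counts.items()}


def induce_grammar(screens, threshold=0.8):
    """Break the content into parts: frequent tokens become literals,
    runs of infrequent tokens collapse into a single variable field.
    Assumption: the first screen serves as the layout template."""
    freq = token_frequencies(screens)
    grammar = []
    for tok in tokenize(screens[0]):
        if freq[tok] >= threshold:
            grammar.append(("LIT", tok))
        elif not grammar or grammar[-1][0] != "FIELD":
            grammar.append(("FIELD", None))
    return grammar


def generate_parser(grammar):
    """Generate a parser for the induced grammar (here, a compiled regex)."""
    parts = [re.escape(val) if kind == "LIT" else r"(.+?)" for kind, val in grammar]
    pattern = re.compile(r"\s+".join(parts), re.DOTALL)

    def parse(screen):
        # Returns the sequence of extracted field values, or None if no match.
        match = pattern.fullmatch(screen.strip())
        return list(match.groups()) if match else None

    return parse


# Hypothetical intercepted display information from one data source.
screens = [
    "Name: Alice Smith Balance: 120.50",
    "Name: Bob Balance: 7.00",
    "Name: Carol Ann Jones Balance: 93.10",
]
grammar = induce_grammar(screens)
parse = generate_parser(grammar)
result = parse("Name: Dave Lee Balance: 15.25")
```

In this sketch the labels `Name:` and `Balance:` appear on every screen and so become literals, while the names and amounts vary and become fields; scraping a new screen then yields the field values in order. A real system would need a richer token model and grammar class than a single flat template.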
