
Query generation using structural similarity between documents

  • US 8,346,792 B1
  • Filed: 11/09/2010
  • Issued: 01/01/2013
  • Est. Priority Date: 11/09/2010
  • Status: Expired due to Fees
First Claim

1. A method executed by a data processing apparatus, comprising:

  • receiving, by one or more computers, a set of seed queries associated with a structured document, the structured document including embedded coding and being hosted on a website, each seed query including one or more terms;

    identifying, by the one or more computers, one or more embedded coding fragments from the structured document for each seed query, each identified embedded coding fragment for a seed query specifying a structure of a portion of the structured document that includes at least one term of the seed query;

    generating, by the one or more computers, one or more query templates, each query template corresponding to at least one identified embedded coding fragment, the query template including the structure of the corresponding at least one embedded coding fragment and a generative rule to be used in generating one or more candidate synthetic queries;

    generating, by the one or more computers, the one or more candidate synthetic queries using the one or more query templates and other structured documents hosted on the website, the generating comprising, for each query template:

        identifying a portion of a particular structured document hosted on the website that includes the structure specified by the corresponding embedded coding fragment; and

        generating a candidate synthetic query using text contained in the portion of the particular structured document and specified by the generative rule;

    measuring, by the one or more computers, a performance in a search operation of each of the one or more candidate synthetic queries;

    designating, by the one or more computers, as a synthetic query a candidate synthetic query that has a performance measurement that exceeds a performance threshold; and

    storing, by the one or more computers, the designated synthetic query in a query store.
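
The steps of claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation; it rests on several simplifying assumptions: the "embedded coding fragment" is taken to be the path of HTML tags enclosing a piece of text, the "generative rule" is fixed to "use the whole text of the matching portion, lowercased", and the performance measurement is a hypothetical lookup table (SCORES) standing in for a real search-quality signal. All function and variable names are illustrative, and void tags such as <br> are not handled.

```python
from html.parser import HTMLParser


class PathTextParser(HTMLParser):
    """Collect (tag_path, text) pairs; tag_path is the tuple of open tags."""

    def __init__(self):
        super().__init__()
        self.stack = []
        self.items = []

    def handle_starttag(self, tag, attrs):
        self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop up to and including the matching open tag, if there is one.
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.items.append((tuple(self.stack), text))


def extract(html):
    parser = PathTextParser()
    parser.feed(html)
    parser.close()
    return parser.items


def make_templates(seed_queries, seed_html):
    """A template here is a tag path whose text contains a seed-query term."""
    templates = set()
    items = extract(seed_html)
    for query in seed_queries:
        terms = set(query.lower().split())
        for path, text in items:
            if terms & set(text.lower().split()):
                templates.add(path)
    return templates


def generate_candidates(templates, other_docs):
    """Apply the assumed generative rule: emit the matching text, lowercased."""
    candidates = set()
    for html in other_docs:
        for path, text in extract(html):
            if path in templates:
                candidates.add(text.lower())
    return candidates


def designate(candidates, measure, threshold):
    """Keep candidates whose measured performance exceeds the threshold."""
    return sorted(q for q in candidates if measure(q) > threshold)


# Hypothetical data for illustration only.
SEED_HTML = "<html><body><h1>Acme Phone X</h1><p>Ships in two days.</p></body></html>"
OTHER_DOCS = [
    "<html><body><h1>Acme Tablet Y</h1><p>Ships in a week.</p></body></html>",
    "<html><body><h1>Acme Watch Z</h1><p>Out of stock.</p></body></html>",
]
SCORES = {"acme tablet y": 0.9, "acme watch z": 0.2}  # stand-in performance

templates = make_templates(["acme phone x"], SEED_HTML)
candidates = generate_candidates(templates, OTHER_DOCS)
query_store = designate(candidates, lambda q: SCORES.get(q, 0.0), 0.5)
print(query_store)
```

In this sketch the seed query matches only the text under html > body > h1, so that tag path becomes the single template, and the headings of the other pages on the same site become candidate synthetic queries before thresholding.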
