Search results ranking using editing distance and document information
First Claim
1. A computer-implemented relevance system, comprising:
one or more processors; and
a memory coupled to the one or more processors, the memory storing instructions which, when executed by the one or more processors, cause the one or more processors to:
extract document information from a document received as search results based on a query string, the document information including a universal resource locator wherein the universal resource locator includes a compound term;
split the compound term into multiple, separate terms;
find at least one of the multiple, separate terms in a dictionary of terms;
generate a target data string based on the extracted document information, the target data string including one of the multiple, separate terms found in the dictionary; and
compute edit distance between the target data string and the query string, the edit distance employed in determining relevance of a document as part of result ranking.
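The splitting and dictionary-lookup steps of the claim can be illustrated with a minimal sketch. It is not the patented implementation: the tiny `DICTIONARY`, the helper names `split_compound` and `url_to_target_terms`, and the dynamic-programming segmentation are all assumptions made for illustration. The sketch is also stricter than the claim, since it accepts a split only when every piece is in the dictionary, whereas the claim requires only that at least one piece be found.

```python
import re
from urllib.parse import urlparse

# Hypothetical toy dictionary; a real system would use a much larger word list.
DICTIONARY = {"american", "bar", "association", "home", "page"}


def split_compound(term, dictionary):
    """Split `term` into dictionary words using dynamic programming.

    Returns a list of words covering the whole term, or None when no
    complete segmentation exists.
    """
    term = term.lower()
    # best[i] holds one segmentation of term[:i], or None if unreachable.
    best = [None] * (len(term) + 1)
    best[0] = []
    for end in range(1, len(term) + 1):
        for start in range(end):
            if best[start] is not None and term[start:end] in dictionary:
                best[end] = best[start] + [term[start:end]]
                break
    return best[len(term)]


def url_to_target_terms(url, dictionary):
    """Tokenize the host and path of a URL and split compound tokens."""
    parsed = urlparse(url)
    raw_tokens = re.findall(r"[A-Za-z]+", parsed.netloc + " " + parsed.path)
    terms = []
    for token in raw_tokens:
        pieces = split_compound(token, dictionary)
        # Keep the original token when it cannot be segmented.
        terms.extend(pieces if pieces else [token.lower()])
    return terms


terms = url_to_target_terms(
    "http://www.americanbarassociation.org/homepage", DICTIONARY
)
target_data_string = " ".join(terms)
```

The resulting `target_data_string` is what would then be compared against the query string by an edit-distance routine.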
Abstract
Architecture for extracting document information from documents received as search results based on a query string, and computing an edit distance between the data string and the query string. The edit distance is employed in determining relevance of the document as part of result ranking by detecting near-matches of a whole query or part of the query. The edit distance evaluates how close the query string is to a given data stream that includes document information such as TAUC (title, anchor text, URL, clicks) information, etc. The architecture includes the index-time splitting of compound terms in the URL to allow the more effective discovery of query terms. Additionally, index-time filtering of anchor text is utilized to find the top N anchors of one or more of the document results. The TAUC information can be input to a neural network (e.g., 2-layer) to improve relevance metrics for ranking the search results.
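The abstract does not say how the top N anchors are chosen. As a rough sketch, assuming anchors are ranked by how often the same normalized text points at a document, the filtering step might look like the following. The function name `top_n_anchors` and the frequency criterion are illustrative assumptions, not details taken from the patent.

```python
from collections import Counter


def top_n_anchors(anchor_texts, n):
    """Return the n most frequent anchor texts after light normalization.

    Assumption: frequency is the ranking criterion; the source only says
    that anchor text is filtered at index time to find the top N anchors.
    """
    # Lowercase and collapse whitespace so trivially different anchors merge.
    normalized = [" ".join(a.lower().split()) for a in anchor_texts if a.strip()]
    return [text for text, _count in Counter(normalized).most_common(n)]


anchors = [
    "ABA",
    "American Bar Association",
    "aba",
    "click here",
    "American  Bar Association",
    "aba",
]
top_two = top_n_anchors(anchors, 2)
```

Doing this filtering at index time, as the abstract describes, keeps the per-query work small because only the N retained anchors need to be compared against the query.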
20 Claims
1. A computer-implemented relevance system, comprising:
one or more processors; and
a memory coupled to the one or more processors, the memory storing instructions which, when executed by the one or more processors, cause the one or more processors to:
extract document information from a document received as search results based on a query string, the document information including a universal resource locator wherein the universal resource locator includes a compound term;
split the compound term into multiple, separate terms;
find at least one of the multiple, separate terms in a dictionary of terms;
generate a target data string based on the extracted document information, the target data string including one of the multiple, separate terms found in the dictionary; and
compute edit distance between the target data string and the query string, the edit distance employed in determining relevance of a document as part of result ranking.
Dependent claims: 2, 3, 4, 5, 6, 7
8. A computer-implemented method of determining relevance of a document, comprising:
receiving a query string as part of a search process;
extracting a universal resource locator from document information included in a document returned during the search process, wherein the universal resource locator includes a compound term;
generating a target data string from the universal resource locator by splitting the compound term of the universal resource locator into multiple, separate terms and finding at least one of the multiple, separate terms in a dictionary of terms;
computing edit distance between the target data string and the query string; and
calculating a relevance score based on the edit distance.
Dependent claims: 9, 10, 11, 12, 13, 14, 15, 16
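Claim 8 ends by calculating a relevance score from the edit distance but does not give a formula. One plausible mapping, shown here purely as an assumption, normalizes the distance by the longer of the two strings so that an exact match scores 1.0 and the score falls toward 0.0 as the distance grows. The names `relevance_score` and `rank_by_score` are hypothetical.

```python
def relevance_score(edit_distance, query_len, target_len):
    """Map an edit distance to a score in [0, 1]; 1.0 means an exact match.

    Assumption: lengths are measured in the same units (characters or
    terms) that the edit distance was computed over.
    """
    longest = max(query_len, target_len)
    if longest == 0:
        return 1.0  # two empty strings are identical
    # Clamp in case the distance uses per-operation costs greater than 1.
    return max(0.0, 1.0 - edit_distance / longest)


def rank_by_score(scored_docs):
    """Sort (doc_id, score) pairs from most to least relevant."""
    return sorted(scored_docs, key=lambda pair: pair[1], reverse=True)


ranked = rank_by_score([
    ("doc_a", relevance_score(3, 2, 4)),
    ("doc_b", relevance_score(0, 3, 3)),
    ("doc_c", relevance_score(1, 2, 4)),
])
```

In a full system this score would more likely be one input feature among several rather than the final ranking value.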
17. A computer-implemented method of computing relevance of a document, comprising:
processing a query string as part of a search process to return a result set of documents;
generating a target data string based on document information extracted from a document of the result set, the document information including a universal resource locator, wherein the universal resource locator includes a compound term, wherein generating the target data string includes splitting the compound term into multiple, separate terms, and finding at least one of the multiple, separate terms in a dictionary of terms;
computing edit distance between the target data string and the query string based on term insertion, term deletion, and term position; and
calculating a relevance score based on the edit distance, the relevance score used to rank the document in the result set.
Dependent claims: 18, 19, 20
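Claim 17 computes the edit distance over terms rather than characters. The sketch below uses the standard Levenshtein recurrence applied to lists of terms, with configurable insertion, deletion, and substitution costs. It is a stand-in, not the patent's cost scheme: the claim's handling of term position is not reproduced, and position affects the result here only through term order. The name `term_edit_distance` and the default costs are assumptions.

```python
def term_edit_distance(query_terms, target_terms,
                       insert_cost=1, delete_cost=1, substitute_cost=1):
    """Word-level Levenshtein distance between two lists of terms."""
    rows, cols = len(query_terms) + 1, len(target_terms) + 1
    # dist[i][j] is the cost of turning query_terms[:i] into target_terms[:j].
    dist = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        dist[i][0] = i * delete_cost
    for j in range(1, cols):
        dist[0][j] = j * insert_cost
    for i in range(1, rows):
        for j in range(1, cols):
            same = query_terms[i - 1] == target_terms[j - 1]
            dist[i][j] = min(
                dist[i - 1][j] + delete_cost,      # drop a query term
                dist[i][j - 1] + insert_cost,      # add a target term
                dist[i - 1][j - 1] + (0 if same else substitute_cost),
            )
    return dist[-1][-1]


query = "american bar".split()
target = "american bar association".split()
distance = term_edit_distance(query, target)
```

Raising or lowering the individual costs is one way to make missing query terms count for more than extra terms in the target, though the weights a production ranker would use are not stated in the claims.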
Specification