Document ranking based on semantic distance between terms in a document
First Claim
1. A method comprising:
identifying, by one or more processors, a document based on two or more search terms;
forming, by one or more processors, a tree structure based on the document, the tree structure including a plurality of items;
analyzing, by the one or more processors, a repetition of one or more tags in the tree structure;
determining, by the one or more processors and based on analyzing the repetition of the one or more tags, that the plurality of items are associated with a list, in the tree structure, the list not being defined by a list tag, the list including a header, and each item, of the plurality of items associated with the list, including a plurality of words that describe the item associated with the list;
annotating, by the one or more processors, the tree structure to indicate that the list is present;
determining, by the one or more processors, a metric associated with the two or more search terms in a first manner when the two or more search terms appear in a single item of the plurality of items associated with the list;
determining, by the one or more processors, the metric associated with the two or more search terms in a second manner when:
a first search term, of the two or more search terms, appears in a first item of the plurality of items associated with the list, and a second search term, of the two or more search terms, appears in a second item of the plurality of items associated with the list, the first manner being different than the second manner;
determining, by the one or more processors, the metric associated with the two or more search terms in a third manner when:
the first search term appears in the header, and the second search term appears in an item of the plurality of items associated with the list, the third manner being different than the first manner and the second manner;
determining, by the one or more processors, a score for the document based on the metric associated with the two or more search terms; and
ranking, by the one or more processors and based on the score, the document with respect to at least one other document.
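The three "manners" recited in the claim can be illustrated with a small Python sketch. The claim only requires that the three cases be handled differently; it does not say how. Everything concrete here is therefore an assumption made for illustration: the `AnnotatedList` type, the two constants, the use of word-position difference inside a single item, the symmetric treatment of the header case, and the choice to return the smallest applicable value.

```python
from dataclasses import dataclass, field


@dataclass
class AnnotatedList:
    """Hypothetical container for a list already found and annotated in the tree."""
    header: str
    items: list = field(default_factory=list)


# Hypothetical values; the claim does not specify any numbers.
CROSS_ITEM_DISTANCE = 100   # second manner: terms in two different items
HEADER_ITEM_DISTANCE = 1    # third manner: one term in header, other in an item


def _positions(text, term):
    """Word offsets at which `term` occurs in `text` (case-insensitive)."""
    words = text.lower().split()
    return [i for i, w in enumerate(words) if w == term.lower()]


def term_distance(lst, a, b):
    """Return a distance for terms a and b, or None if no case applies."""
    candidates = []
    hits = [(_positions(item, a), _positions(item, b)) for item in lst.items]

    # First manner: both terms in a single item -> word-offset difference.
    for pa, pb in hits:
        if pa and pb:
            candidates.append(min(abs(x - y) for x in pa for y in pb))

    # Second manner: the terms appear in two different items.
    items_a = {i for i, (pa, _) in enumerate(hits) if pa}
    items_b = {i for i, (_, pb) in enumerate(hits) if pb}
    if any(i != j for i in items_a for j in items_b):
        candidates.append(CROSS_ITEM_DISTANCE)

    # Third manner: one term in the header, the other in an item.
    # (Treated symmetrically here, which is an assumption.)
    in_header_a = bool(_positions(lst.header, a))
    in_header_b = bool(_positions(lst.header, b))
    if (in_header_a and items_b) or (in_header_b and items_a):
        candidates.append(HEADER_ITEM_DISTANCE)

    return min(candidates) if candidates else None
```

With a list headed "Cameras" and items such as "Canon compact zoom" and "Nikon digital SLR", two terms in the same item get a small word-offset distance, terms in different items get the large cross-item constant, and a header term paired with an item term gets the small header constant.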
1 Assignment
0 Petitions
Abstract
Techniques are disclosed that locate implicitly defined semantic structures in a document, such as, for example, implicitly defined lists in an HTML document. The semantic structures can be used in the calculation of distance values between terms in the documents. The distance values may be used, for example, in the generation of ranking scores that indicate a relevance level of the document to a search query.
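As a rough illustration of locating an implicitly defined list in an HTML document, the following Python sketch builds a simple tree with the standard-library `html.parser` module and flags runs of repeated sibling tags that are not list tags. The `Node` class, the `min_repeats` threshold of three, and the rule that the element just before the run is treated as the header are all assumptions for the sake of the example, not details taken from the patent.

```python
from html.parser import HTMLParser

LIST_TAGS = {"ul", "ol", "dl", "li", "dt", "dd"}
VOID_TAGS = {"br", "hr", "img", "input", "meta", "link"}


class Node:
    """Hypothetical tree node; `annotation` marks list structure once found."""
    def __init__(self, tag, parent=None):
        self.tag = tag
        self.parent = parent
        self.children = []
        self.text = ""
        self.annotation = None


class TreeBuilder(HTMLParser):
    """Builds a minimal element tree from HTML markup."""
    def __init__(self):
        super().__init__()
        self.root = Node("root")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, self.current)
        self.current.children.append(node)
        if tag not in VOID_TAGS:
            self.current = node

    def handle_endtag(self, tag):
        # Close the nearest open element with this tag; ignore stray end tags.
        node = self.current
        while node is not self.root and node.tag != tag:
            node = node.parent
        if node is not self.root:
            self.current = node.parent

    def handle_data(self, data):
        self.current.text += data


def annotate_implicit_lists(node, min_repeats=3):
    """Annotate runs of repeated sibling tags; return (header, items) pairs."""
    found = []
    tags = [child.tag for child in node.children]
    i = 0
    while i < len(tags):
        j = i
        while j < len(tags) and tags[j] == tags[i]:
            j += 1
        repeated = j - i >= min_repeats
        if repeated and tags[i] not in LIST_TAGS and node.tag not in LIST_TAGS:
            node.annotation = "implicit_list"
            items = node.children[i:j]
            for item in items:
                item.annotation = "item"
            header = node.children[i - 1] if i > 0 else None
            if header is not None:
                header.annotation = "header"
            found.append((header, items))
        i = j
    for child in node.children:
        found.extend(annotate_implicit_lists(child, min_repeats))
    return found
```

Feeding markup such as `<div><h3>Cameras</h3><p>…</p><p>…</p><p>…</p></div>` to `TreeBuilder` and then calling `annotate_implicit_lists(builder.root)` marks the `div` as an implicit list, the `h3` as its header, and each `p` as an item, while an explicit `<ul>` is left alone.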
23 Citations
20 Claims
1. A method comprising:
identifying, by one or more processors, a document based on two or more search terms;
forming, by one or more processors, a tree structure based on the document, the tree structure including a plurality of items;
analyzing, by the one or more processors, a repetition of one or more tags in the tree structure;
determining, by the one or more processors and based on analyzing the repetition of the one or more tags, that the plurality of items are associated with a list, in the tree structure, the list not being defined by a list tag, the list including a header, and each item, of the plurality of items associated with the list, including a plurality of words that describe the item associated with the list;
annotating, by the one or more processors, the tree structure to indicate that the list is present;
determining, by the one or more processors, a metric associated with the two or more search terms in a first manner when the two or more search terms appear in a single item of the plurality of items associated with the list;
determining, by the one or more processors, the metric associated with the two or more search terms in a second manner when:
a first search term, of the two or more search terms, appears in a first item of the plurality of items associated with the list, and a second search term, of the two or more search terms, appears in a second item of the plurality of items associated with the list, the first manner being different than the second manner;
determining, by the one or more processors, the metric associated with the two or more search terms in a third manner when:
the first search term appears in the header, and the second search term appears in an item of the plurality of items associated with the list, the third manner being different than the first manner and the second manner;
determining, by the one or more processors, a score for the document based on the metric associated with the two or more search terms; and
ranking, by the one or more processors and based on the score, the document with respect to at least one other document.
Dependent Claims (2, 3, 4, 5, 6, 7)
8. A system comprising:
one or more server devices to:
identify a document based on two or more search terms;
form a tree structure based on the document, the tree structure including a plurality of items;
analyze a repetition of one or more tags in the tree structure;
determine, based on analyzing the repetition of the one or more tags, that the plurality of items are associated with a list, in the tree structure, the list not being defined by a list tag, the list including a header, and each item, of the plurality of items associated with the list, including plurality of words that describe the item associated with the list;
annotate the tree structure to indicate that the list is present;
determine a metric associated with the two or more search terms in a first manner when the two or more search terms appear in a single item of the plurality of items associated with the list;
determine the metric associated with the two or more search terms in a second manner when:
a first search term, of the two or more search terms, appears in a first item of the plurality of items associated with the list, and a second search term, of the two or more search terms, appears in a second item of the plurality of items associated with the list, the first manner being different than the second manner;
determine the metric associated with the two or more search terms in a third manner when:
the first search term appears in the header, and the second search term appears in an item of the plurality of items associated with the list, the third manner being different than the first manner and the second manner;
determine a score for the document based on the metric associated with the two or more search terms; and
rank, based on the score, the document with respect to at least one other document.
Dependent Claims (9, 10, 11, 12, 13, 14)
15. A non-transitory computer-readable medium for storing instructions, the instructions comprising:
one or more instructions which, when executed by at least one processor, cause the at least one processor to identify a document based on two or more search terms,
one or more instructions which, when executed by the at least one processor, cause the at least one processor to form a tree structure based on the document, the tree structure including a plurality of items;
one or more instructions which, when executed by the at least one processor, cause the at least one processor to analyze a repetition of one or more tags in the tree structure;
one or more instructions which, when executed by the at least one processor, cause the at least one processor to determine, based on analyzing the repetition of the one or more tags, that the plurality of items are associated with a list, in the tree structure, the list not being defined by a list tag, the list including a header, and each item, of the plurality of items associated with the list, including a plurality of words that describe the item associated with the list;
one or more instructions which, when executed by the at least one processor, cause the at least one processor to annotate the tree structure to indicate that the list is present;
one or more instructions which, when executed by the at least one processor, cause the at least one processor to determine a metric associated with the two or more search terms in a first manner when the two or more search terms appear in a single item of the plurality of items associated with the list;
one or more instructions which, when executed by the at least one processor, cause the at least one processor to determine the metric associated with the two or more search terms in a second manner when:
a first search term, of the two or more search terms, appears in a first item of the plurality of items associated with the list, and a second search term, of the two or more search terms, appears in a second item of the plurality of items associated with the list, the first manner being different than the second manner;
one or more instructions which, when executed by the at least one processor, cause the at least one processor to determine the metric associated with the two or more search terms in a third manner when:
the first search term appears in the header, and the second search term appears in an item of the plurality of items associated with the list, the third manner being different than the first manner and the second manner;
one or more instructions which, when executed by the at least one processor, cause the at least one processor to determine a score for the document based on the metric associated with the two or more search terms; and
one or more instructions which, when executed by the at least one processor, cause the at least one processor to rank, based on the score, the document with respect to at least one other document.
Dependent Claims (16, 17, 18, 19, 20)
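Each independent claim ends by turning the metric into a score and ranking the document against at least one other document. A minimal Python sketch of that final step follows. The claims do not give a scoring formula, so the inverse-distance mapping in `score_from_distance` and the function names are assumptions chosen only so that smaller distances produce higher scores.

```python
def score_from_distance(distance):
    """Map a term distance to a score in [0, 1]; None means no match found.

    The 1 / (1 + distance) form is a hypothetical choice, not the patent's.
    """
    if distance is None:
        return 0.0
    return 1.0 / (1.0 + distance)


def rank_documents(distances):
    """Rank documents by score, best first.

    `distances` maps a document id to its term distance (or None).
    Returns a list of (doc_id, score) pairs in descending score order.
    """
    scored = [(doc_id, score_from_distance(d)) for doc_id, d in distances.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

For example, `rank_documents({"a": 100, "b": 1, "c": None})` places document `b` first, since its search terms are closest together under the assumed metric, and document `c` last.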
Specification