Document ranking based on semantic distance between terms in a document
Abstract
Techniques are disclosed that locate implicitly defined semantic structures in a document, such as, for example, implicitly defined lists in an HTML document. The semantic structures can be used in the calculation of distance values between terms in the documents. The distance values may be used, for example, in the generation of ranking scores that indicate a relevance level of the document to a search query.
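As an illustration of what locating an implicitly defined list might involve, the sketch below scans an HTML fragment for runs of consecutive lines that begin with a bullet-like marker even though no `<ul>` or `<ol>` markup is present. The abstract does not say how such lists are detected, so the heuristic, the `BULLET` pattern, the `min_items` threshold, and the name `find_implicit_lists` are all assumptions made for this example.

```python
from html.parser import HTMLParser
import re

# Hypothetical marker pattern: "-", "*", a bullet character, or "1." / "1)".
BULLET = re.compile(r"^\s*(?:[-*\u2022]|\d+[.)])\s+(.*)$")


class _LineCollector(HTMLParser):
    """Collect visible text, starting a new line at each line-breaking tag."""

    BREAKING = {"br", "p", "div"}

    def __init__(self):
        super().__init__()
        self.lines = [""]

    def handle_starttag(self, tag, attrs):
        if tag in self.BREAKING:
            self.lines.append("")

    def handle_endtag(self, tag):
        if tag in self.BREAKING:
            self.lines.append("")

    def handle_data(self, data):
        self.lines[-1] += data


def find_implicit_lists(html, min_items=2):
    """Return each run of at least min_items bullet-like lines as a list of item texts."""
    parser = _LineCollector()
    parser.feed(html)
    parser.close()

    found, run = [], []
    for line in parser.lines:
        if not line.strip():
            continue  # blank lines neither extend nor end a run
        match = BULLET.match(line)
        if match:
            run.append(match.group(1).strip())
        else:
            if len(run) >= min_items:
                found.append(run)
            run = []
    if len(run) >= min_items:
        found.append(run)
    return found


page = "<p>Toppings:<br>- cheese<br>- ham<br>- olives</p><p>Call us today.</p>"
print(find_implicit_lists(page))  # [['cheese', 'ham', 'olives']]
```

A real detector would likely also look at visual layout, repeated tag patterns, or tables; this sketch only shows that list structure can be recovered without explicit list markup.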
21 Claims
1. A method comprising:
receiving, by one or more processors, a document that includes first, second, third, and fourth terms;
identifying, by the one or more processors, that a list is present within the document, where the list includes a plurality of items;
determining, by the one or more processors, that the first and second terms appear in the list;
determining, by the one or more processors, in response to determining that the first and second terms appear in the list, a distance value between the first and second terms within the list, where the distance value is determined based on at least one of a plurality of rules that is associated with particular locations, in the list, of the first and second terms, where determining the distance value includes:
determining, by the one or more processors, that the first term appears in one of the plurality of items of the list and the second term appears in a different one of the plurality of items of the list, and
generating, by the one or more processors, the distance value based on the first term appearing in the one of the plurality of items of the list and the second term appearing in the different one of the plurality of items of the list, where the distance value, relative to a particular distance value of the third and fourth terms of the list, indicates that the first and second terms are farther apart from each other than the third and fourth terms, where the third and fourth terms appear in a same one of the plurality of items of the list;
generating, by the one or more processors, a ranking score for the document based on the distance value; and
ordering, by the one or more processors, the document with respect to at least one other document based on the ranking score for the document.
Dependent claims: 2, 3, 4, 5, 9, 10
6. A method comprising:
receiving, by one or more processors, a document that includes first, second, third, and fourth terms;
identifying, by the one or more processors, that a list is present within the document, where the list includes a plurality of items;
determining, by the one or more processors, that the first and second terms appear in the list;
determining, by the one or more processors, in response to determining that the first and second terms appear in the list, a distance value between the first and second terms within the list, where the distance value is determined based on at least one of a plurality of rules that is associated with particular locations, in the list, of the first and second terms, where determining the distance value includes:
determining, by the one or more processors, that the first term appears in one of the plurality of items of the list and the second term appears in a different one of the plurality of items of the list, and
generating, by the one or more processors, the distance value based on the first term appearing in the one of the plurality of items of the list and the second term appearing in the different one of the plurality of items of the list, where the distance value, relative to a particular distance value of the third and fourth terms, indicates that the first and second terms are farther apart from each other than the third and fourth terms, where the third term appears in one of the plurality of items of the list and the fourth term appears in a header of the list;
generating, by the one or more processors, a ranking score for the document based on the distance value; and
ordering, by the one or more processors, the document with respect to at least one other document based on the ranking score for the document.
Dependent claims: 7, 8, 20
11. A system comprising:
one or more processors to:
receive a document that includes first, second, third, and fourth terms,
identify that a list is present within the document, where the list includes a plurality of items,
determine that first and second terms appear in the list,
determine, in response to determining that the first and second terms appear in the list, a distance value between the first and second terms in the list, where the distance value is determined based on at least one of a plurality of rules that is associated with particular locations, in the list, of the first and second terms, where, when determining the distance value, the one or more processors are to:
determine that the first term appears in one of the plurality of items of the list and the second term appears in a different one of the plurality of items of the list, and
generate a first distance value based on the first term appearing in the one of the plurality of items of the list and the second term appearing in the different one of the plurality of items of the list, where the first distance value, relative to a particular distance value of the third and fourth terms, indicates that the first and second terms are farther apart from each other than third and fourth terms, when the third term appears in one of the plurality of items of the list and the fourth term appears in a title of the at least one list, or
generate a second distance value based on the first term appearing in the one of the plurality of items of the list and the second term appearing in the different one of the plurality of items of the list, where the second distance value, relative to a particular distance value of the third and fourth terms, indicates that the first and second terms are farther apart from each other than the third and fourth terms, when the third and fourth terms appear in a same one of the plurality of items of the list;
generate a ranking score for the document based on the first or second distance value, and
order the document with respect to at least one other document based on the ranking score for the document.
Dependent claims: 12, 13, 14, 15, 16
17. A non-transitory computer-readable medium that stores instructions executable by at least one processor for performing a method, the computer-readable medium comprising:
instructions to receive a document that includes first, second, third, and fourth terms;
instructions to identify that an implicit list is present within the document, where the list includes a plurality of items;
instructions to determine that the first and second search terms appear in the list;
instructions to determine, in response to determining that the first and second terms appear in the list, a distance value between the first and second terms in the list, where the distance value is determined based on at least one of a plurality of rules that is associated with particular locations, in the list, of the first and second terms, where the instructions to determine the distance value include,
instructions to determine that the first term appears in one of the plurality of items of the list and the second term appears in a different one of the plurality of items of the list, and
instructions to generate a first distance value based on the first term appearing in the one of the plurality of items of the list and the second term appearing in the different one of the plurality of items of the list, where the first distance value, relative to a particular distance value of the third and fourth terms, indicates that the first and second terms are farther apart from each other than third and fourth terms, when the third term appears in one of the plurality of items of the list and the fourth term appears in a header of the at least one list, or
instructions to generate a second distance value based on the first term appearing in the one of the plurality of items of the list and the second term appearing in the different one of the plurality of items of the list, where the second distance value, relative to a particular distance value of the third and fourth terms, indicates that the first and second terms are farther apart from each other than the third and fourth terms, when the third and fourth terms appear in a same one of the plurality of items of the list;
instructions to generate, with the one or more processors, a ranking score for the document based on the first or second distance value; and
instructions to order the document with respect to at least one other document based on the ranking score for the document.
Dependent claims: 18, 19
21. A method comprising:
receiving, by one or more processors, a document that includes first, second, third, and fourth terms;
identifying, by the one or more processors, that a semantic structure is present within the document, where the semantic structure includes a plurality of items;
determining, by the one or more processors, that the first and second terms appear in the semantic structure;
determining, by the one or more processors, in response to determining that the first and second terms appear in the semantic structure, a distance value between the first and second terms within the semantic structure, where the distance value is determined based on at least one of a plurality of rules that is associated with particular locations, in the semantic structure, of the first and second terms, where determining the distance value includes:
determining, by the one or more processors, that the first term appears in one of the plurality of items of the semantic structure and the second term appears in a different one of the plurality of items of the semantic structure, and
generating, by the one or more processors, the distance value based on the first term appearing in the one of the plurality of items of the semantic structure and the second term appearing in the different one of the plurality of items of the semantic structure, where the distance value, relative to a particular distance value of the third and fourth terms of the semantic structure, indicates that the first and second terms are farther apart from each other than the third and fourth terms, where the third and fourth terms appear in a same one of the plurality of items of the semantic structure; and
generating, by the one or more processors, a ranking score for the document based on the distance value.
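The location-based rules recited in the independent claims above can be sketched roughly as follows. The claims fix only the relative ordering of distances (a pair of terms split across different items is farther apart than a pair in the same item, or than an item/header pair); the numeric constants, the `ListStructure` class, the scoring formula in `ranking_score`, and every function name here are assumptions chosen for illustration, not values taken from the patent.

```python
from __future__ import annotations
from dataclasses import dataclass, field

# Hypothetical constants. Only their ordering matters for this sketch:
# cross-item pairs must come out farther apart than the other two cases.
HEADER_ITEM_DISTANCE = 5
MAX_SAME_ITEM_DISTANCE = 20
CROSS_ITEM_DISTANCE = 100


@dataclass
class ListStructure:
    items: list[list[str]]                            # tokens of each list item
    header: list[str] = field(default_factory=list)   # tokens of the header or title

    def locate(self, term):
        """Return ("header", pos) or (item_index, pos) for the first hit, else None."""
        if term in self.header:
            return ("header", self.header.index(term))
        for index, tokens in enumerate(self.items):
            if term in tokens:
                return (index, tokens.index(term))
        return None


def distance_value(lst, term_a, term_b):
    """Pick a rule according to where the two terms sit in the list."""
    loc_a, loc_b = lst.locate(term_a), lst.locate(term_b)
    if loc_a is None or loc_b is None:
        return None  # the list-based rules do not apply
    if loc_a[0] == loc_b[0]:
        # Same item (or both in the header): capped token offset.
        return min(abs(loc_a[1] - loc_b[1]), MAX_SAME_ITEM_DISTANCE)
    if "header" in (loc_a[0], loc_b[0]):
        return HEADER_ITEM_DISTANCE
    return CROSS_ITEM_DISTANCE


def ranking_score(distance):
    """Smaller distance gives a higher score; the formula is an assumption."""
    return 1.0 / (1.0 + distance)


def order_documents(scores):
    """Return document ids sorted by descending ranking score."""
    return sorted(scores, key=scores.get, reverse=True)


menu = ListStructure(
    header=["pizza", "toppings"],
    items=[["extra", "cheese"], ["smoked", "ham"]],
)
print(distance_value(menu, "extra", "cheese"))  # 1   (same item)
print(distance_value(menu, "toppings", "ham"))  # 5   (header and item)
print(distance_value(menu, "cheese", "ham"))    # 100 (different items)
```

Capping the same-item offset below `CROSS_ITEM_DISTANCE` keeps the ordering required by the claims even for very long items. A document whose relevant terms share an item therefore scores higher under `ranking_score` than one where the same terms fall in separate items.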
Specification