Document ranking based on semantic distance between terms in a document
Abstract
Techniques are disclosed that locate implicitly defined semantic structures in a document, such as, for example, implicitly defined lists in an HTML document. The semantic structures can be used in the calculation of distance values between terms in the documents. The distance values may be used, for example, in the generation of ranking scores that indicate a relevance level of the document to a search query.
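As a rough illustration of what "implicitly defined" could mean for an HTML list, the sketch below looks for a plain line followed by a run of bullet-like lines separated by `<br>` tags instead of `<ul>`/`<li>` markup. The heuristic, the function name `find_implicit_list`, and the bullet pattern are all assumptions made for illustration; the abstract does not say how such structures are located.

```python
import re
from typing import List, Optional, Tuple

# Hypothetical bullet pattern: "-", "*", a bullet character, or "1." / "1)".
BULLET = re.compile(r"^\s*(?:[-*\u2022]|\d+[.)])\s+(.*)$")


def find_implicit_list(html: str) -> Optional[Tuple[str, List[str]]]:
    """Return (header, items) for the first plain line that is followed
    by at least two bullet-like lines, or None if there is no such run."""
    # Treat <br> variants as line breaks, then drop any remaining tags.
    text = re.sub(r"(?i)<br\s*/?>", "\n", html)
    text = re.sub(r"<[^>]+>", "", text)
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]

    for i, line in enumerate(lines[:-1]):
        if BULLET.match(line):
            continue  # a bullet line cannot serve as the header
        items = []
        for nxt in lines[i + 1:]:
            m = BULLET.match(nxt)
            if not m:
                break
            items.append(m.group(1))
        if len(items) >= 2:
            return line, items
    return None


page = "<p>Fruits:<br>- apple pie<br>- banana bread<br>- cherry tart</p>"
print(find_implicit_list(page))
```

A real detector would likely need to consider visual layout and other markup patterns; this version only shows the idea of recovering a header and its items from markup that never declares a list.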
21 Claims
1. A method, performed by one or more server devices, comprising:
- identifying, using a processor of the one or more server devices, an implicitly defined semantic structure in a document, where a plurality of rules are associated with the implicitly defined semantic structure, and where the semantic structure includes a list having a header and a plurality of items associated with the header;
- determining, using a processor of the one or more server devices, a location of a first term and a location of a second term within the list;
- selecting, using a processor of the one or more server devices, one of the plurality of rules, as a selected rule, based on a relationship of the locations of the first and second terms within the implicitly defined semantic structure,
  where a first rule of the plurality of rules is selected when the first term is located in one of the plurality of items and the second term is located in a different one of the plurality of items,
  where a second rule of the plurality of rules, different than the first rule, is selected when the first term is located in one of the plurality of items and the second term is located in the same one of the plurality of items, and
  where a third rule of the plurality of rules, different than the first rule and the second rule, is selected when the first term is located in the header and the second term is located in one of the plurality of items;
- determining, using a processor of the one or more server devices, a distance value, reflecting a distance between the first and second terms, using a function based on the selected rule, where the function differs based on whether the selected rule corresponds to the first rule, the second rule, or the third rule; and
- outputting, using a processor of the one or more server devices, the distance value to rank the document for relevancy to a search query that includes the first term and the second term.
Dependent claims: 2, 3, 4, 5, 6, 7
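The rule selection recited in claim 1 can be sketched as a small dispatch on where the two terms sit in the list. The three rule conditions come from the claim; everything else here is an assumption for illustration: the `Location` type, the symmetric handling of the header case, the fallback when both terms are in the header, and in particular the three distance functions, which the claim requires to differ but does not define.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Rule(Enum):
    DIFFERENT_ITEMS = 1   # first rule: terms in different items
    SAME_ITEM = 2         # second rule: terms in the same item
    HEADER_AND_ITEM = 3   # third rule: one term in header, one in an item


@dataclass(frozen=True)
class Location:
    item: Optional[int]   # None means the header; otherwise the item index
    offset: int           # word offset inside the header or item


def select_rule(a: Location, b: Location) -> Optional[Rule]:
    if a.item is None and b.item is None:
        return None  # both in the header: not one of the three claimed cases
    if a.item is None or b.item is None:
        # Assumption: treated symmetrically, whichever term is in the header.
        return Rule.HEADER_AND_ITEM
    return Rule.SAME_ITEM if a.item == b.item else Rule.DIFFERENT_ITEMS


# Hypothetical constant: terms in sibling items are treated as far apart.
DIFFERENT_ITEMS_DISTANCE = 100


def distance(a: Location, b: Location) -> int:
    """Illustrative distance functions, one per rule (all assumed)."""
    rule = select_rule(a, b)
    if rule is Rule.DIFFERENT_ITEMS:
        return DIFFERENT_ITEMS_DISTANCE
    if rule is Rule.HEADER_AND_ITEM:
        # Assumption: the header is treated as directly preceding each item.
        in_item = b if a.item is None else a
        return in_item.offset + 1
    # SAME_ITEM, or the both-in-header fallback: plain word distance.
    return abs(a.offset - b.offset)


print(distance(Location(None, 0), Location(4, 2)))  # header vs. fifth item
```

The point of the sketch is the shape of the computation: the positional relationship picks a rule, and the rule picks the function, so a header term can count as close to every item even when many words separate them in the raw text.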
8. A system comprising:
a device comprising:
- means for identifying an implicitly defined semantic structure associated with terms in a document, where a plurality of rules are associated with the implicitly defined semantic structure, and where the semantic structure includes a list including a header and a plurality of items associated with the header;
- means for determining a location relationship between a pair of the terms within the list;
- means for selecting one of the plurality of rules, as a selected rule, corresponding to the location relationship;
  where a first rule of the plurality of rules is determined to correspond to the location relationship when the first term is located in one of the plurality of items and the second term is located in a different one of the plurality of items;
  where a second rule of the plurality of rules, different than the first rule, is determined to correspond to the location relationship when the first term is located in one of the plurality of items and the second term is located in the same one of the plurality of items; and
  where a third rule of the plurality of rules, different than the first rule and the second rule, is determined to correspond to the location relationship when the first term is located in the header and the second term is located in one of the plurality of items;
- means for determining a distance value between the pair of terms using a function that is based on the selected rule, where the function differs based on whether the selected rule is the first rule, the second rule, or the third rule;
- means for generating a ranking score for the document based on the distance value; and
- means for outputting the ranking score.
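Determining where a term sits within the list could look like the following minimal sketch. It assumes the list has already been reduced to a header string and a list of item strings, matches whole words case-insensitively, and reports only the first occurrence; the `locate` function and the `Location` tuple are hypothetical names, not taken from the claims.

```python
from typing import List, Optional, Tuple

# (item_index, word_offset); item_index is None when the term is in the header.
Location = Tuple[Optional[int], int]

PUNCTUATION = ".,:;"


def locate(term: str, header: str, items: List[str]) -> Optional[Location]:
    """Return the first location of `term` in the header or items, or None."""
    needle = term.lower()

    # Check the header first.
    for offset, word in enumerate(header.lower().split()):
        if word.strip(PUNCTUATION) == needle:
            return (None, offset)

    # Then check each item in order.
    for index, item in enumerate(items):
        for offset, word in enumerate(item.lower().split()):
            if word.strip(PUNCTUATION) == needle:
                return (index, offset)
    return None


header = "Fruits:"
items = ["apple pie", "banana bread"]
print(locate("fruits", header, items))  # term found in the header
print(locate("bread", header, items))   # term found in the second item
```

Comparing the two returned locations (header versus item, same item index versus different) gives the location relationship on which the rule selection depends.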
9. A method performed by one or more server devices, comprising:
- identifying, using a processor of the one or more server devices, a semantic structure associated with terms in a plurality of documents, where a plurality of rules are associated with the semantic structure, and where the semantic structure includes a list including a header and a plurality of items associated with the header;
- locating, using a processor of the one or more server devices, a first term and a second term occurring within the list;
- selecting, using a processor of the one or more server devices and based on a relationship of the locations of the first and second terms, at least one of the plurality of rules, as a selected rule, to be used in determining a distance value between the first and second terms;
  where a first rule of the plurality of rules is selected when the first term is located in one of the plurality of items and the second term is located in a different one of the plurality of items,
  where a second rule of the plurality of rules, different than the first rule, is selected when the first term is located in one of the plurality of items and the second term is located in the same one of the plurality of items, and
  where a third rule of the plurality of rules, different than the first rule and the second rule, is selected when the first term is located in the header and the second term is located in one of the plurality of items;
- determining, using a processor of the one or more server devices when the first and second terms occur in a search query, the distance value, between the first and second terms within the semantic structure, using a function that is based on the selected rule, where the function differs based on whether the selected rule corresponds to the first rule, the second rule, or the third rule;
- ranking, using a processor of the one or more server devices, the documents for relevancy to the search query based on the determined distance value; and
- outputting, using a processor of the one or more server devices, the rankings of the documents in response to the search query.
Dependent claims: 10, 11, 12, 13, 14
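The ranking step in claim 9 can be pictured with a minimal sketch that turns one distance value per document into a score and sorts on it. The scoring formula `1 / (1 + distance)` and the function names are assumptions chosen for illustration; the claim only says that the ranking is based on the determined distance value, and a production ranker would presumably combine this signal with others.

```python
from typing import Dict, List, Tuple


def ranking_score(distance: float) -> float:
    """Hypothetical scoring: closer terms give a higher score, in (0, 1]."""
    return 1.0 / (1.0 + distance)


def rank_documents(distances: Dict[str, float]) -> List[Tuple[str, float]]:
    """Return (doc_id, score) pairs ordered from most to least relevant."""
    scored = [(doc_id, ranking_score(d)) for doc_id, d in distances.items()]
    # Highest score first; ties are broken by document id for determinism.
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))


# Distance between the two query terms in each document (illustrative values).
distances = {"doc_a": 3, "doc_b": 100, "doc_c": 1}
for doc_id, score in rank_documents(distances):
    print(doc_id, round(score, 3))
```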
15. A device comprising:
- a memory; and
- a processor coupled to the memory and configured to:
  - identify a semantic structure associated with a first term and a second term occurring in a document, where a plurality of rules are associated with the semantic structure, and where the semantic structure includes a list having a header and a plurality of items associated with the header;
  - determine a semantically based distance relationship between the first term and the second term in the identified semantic structure;
  - select one of the plurality of rules, as a selected rule, that corresponds to the distance relationship;
    where the processor is to select a first rule of the plurality of rules when the first term is located in one of the plurality of items and the second term is located in a different one of the plurality of items,
    where the processor is to select a second rule of the plurality of rules, different than the first rule, when the first term is located in one of the plurality of items and the second term is located in the same one of the plurality of items, and
    where the processor is to select a third rule of the plurality of rules, different than the first rule and the second rule, when the first term is located in the header and the second term is located in one of the plurality of items;
  - determine, using a function based on the selected rule, a semantically based distance value between the first term and the second term, where the first term and the second term occur in a search query, and where the function differs based on whether the selected rule corresponds to the first rule, the second rule, or the third rule;
  - rank the document for relevancy to the search query based on the semantically based distance value; and
  - provide at least some of the ranks in response to the search query.
Dependent claims: 16, 17
18. A memory device containing computer-executable instructions, the memory device comprising:
- one or more instructions that receive a search query;
- one or more instructions that identify an implicitly defined semantic structure associated with terms in documents, where a plurality of rules are associated with the implicitly defined semantic structure, and where the semantic structure includes a list having a header and a plurality of items associated with the header;
- one or more instructions that determine a semantic-based distance relationship between a first term and a second term within the list;
- one or more instructions that select one of the plurality of rules, as a selected rule, based on the semantic-based distance relationship between the first and second terms within the implicitly defined semantic structure;
  where a first rule of the plurality of rules is selected when the first term is located in one of the plurality of items and the second term is located in a different one of the plurality of items,
  where a second rule of the plurality of rules, different than the first rule, is selected when the first term is located in one of the plurality of items and the second term is located in the same one of the plurality of items, and
  where a third rule of the plurality of rules, different than the first rule and the second rule, is selected when the first term is located in the header and the second term is located in one of the plurality of items;
- one or more instructions that determine, using a function based on the selected rule, a distance value for the first and second terms, where the function differs based on whether the selected rule corresponds to the first rule, the second rule, or the third rule;
- one or more instructions that rank the documents for relevancy to the search query based on the distance value; and
- one or more instructions that present the documents in an order influenced by the ranking.
Dependent claims: 19, 20, 21
Specification