Methods and apparatuses for searching content
Abstract
Embodiments of methods and apparatuses for searching content, including structured search, are described herein. Embodiments of the present invention use tree structures (or, more generally, graph structures), layout structures, and/or content category information to capture within search results relevant content that would otherwise be missed, to reduce the incidence of false positives within search results, and to improve the accuracy of rankings within search results. Embodiments of the present invention further use tree structures (or, more generally, graph structures), layout structures, and/or content category information to extend search results to include sub-document constituents. Embodiments of the present invention also support the use of distribution properties as criteria for ranking search results. In addition, embodiments of the present invention support search based on structural proximity; search expressions with recursively embedded operators, predicates, and/or quantifiers; and applications to the selection of advertisements.
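As a rough illustration of how search results might be extended to sub-document constituents and then conditionally provided by score, the following Python sketch filters and ranks already-scored portions of a page. The function name `select_portions`, the portion identifiers, and the threshold value are all hypothetical and are not taken from the patent; the patent does not prescribe this particular selection rule.

```python
from typing import Dict, List, Tuple


def select_portions(scores: Dict[str, float],
                    threshold: float = 0.5) -> List[Tuple[str, float]]:
    """Return the portions whose score meets the threshold, best first.

    `scores` maps a hypothetical portion identifier (a whole page or a
    sub-document constituent) to a relevance score computed elsewhere.
    The threshold is an assumed cut-off, not a value from the patent.
    """
    kept = [(portion, score) for portion, score in scores.items()
            if score >= threshold]
    # Highest-scoring portions are provided first.
    return sorted(kept, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    scored = {
        "page": 0.4,
        "page/section-2": 0.9,
        "page/section-2/para-1": 0.7,
    }
    for portion, score in select_portions(scored):
        print(portion, score)
```

In this sketch a section can be returned even when the page as a whole falls below the threshold, which is one way to read "sub-document constituents" in the abstract.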
39 Claims
1. A machine implemented method comprising:
receiving by a search engine, from a content searching or consuming application, an atomic search term, the search engine and the content searching or consuming application being operated on one or more different or same computing devices;
receiving a content page nominally associated with the atomic search term, or access information of the content page, by the search engine;
generating, by the search engine, one or more scores for one or more structures of the content page indicative of relative relevance of the content page or one or more portions of the content page to the atomic search term, wherein the generating of a score for a structure is based at least in part on a distance function and a scoring function, wherein the structure has sub-structures structurally describing at least a portion of the content page, and having content nodes and/or text strings, wherein the distance function measures distances between sub-structures within the structure, and the scoring function is positionally sensitive, yielding different scores for different occurrence positions of the atomic search term in the sub-structures; and
conditionally providing or not providing the content or one or more portions of the content, or access information of the content or one or more portions of the content, to the content searching or consuming application, by the search engine, based at least in part on the generated one or more scores;
wherein the generating of a score for a structure further includes establishing a bound on a number of children content nodes considered for each content node and/or a bound on a size of each of the text strings considered.
Dependent claims: 2-15
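The scoring step of claim 1 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented method: the `Node` class, the constants `MAX_CHILDREN` and `MAX_TEXT_LEN`, the `1 / (1 + position)` weighting, and the use of sibling index as the distance between sub-structures are all hypothetical choices made only to show a positionally sensitive score combined with a distance function and the two bounds.

```python
from dataclasses import dataclass, field
from typing import List, Union

MAX_CHILDREN = 3    # assumed bound on children considered per content node
MAX_TEXT_LEN = 40   # assumed bound on characters considered per text string


@dataclass
class Node:
    """A content node whose children are content nodes or text strings."""
    children: List[Union["Node", str]] = field(default_factory=list)


def score_text(text: str, term: str) -> float:
    """Positionally sensitive score: earlier occurrences score higher."""
    window = text[:MAX_TEXT_LEN].lower()   # apply the text-size bound
    pos = window.find(term.lower())
    return 0.0 if pos < 0 else 1.0 / (1 + pos)


def sibling_distance(i: int, j: int) -> int:
    """Assumed distance function between two sibling sub-structures."""
    return abs(i - j)


def score_node(node: Node, term: str) -> float:
    """Score a structure from its sub-structures.

    Each child's score is discounted by its distance from the first child,
    and only the first MAX_CHILDREN children are considered.
    """
    total = 0.0
    for index, child in enumerate(node.children[:MAX_CHILDREN]):
        if isinstance(child, str):
            child_score = score_text(child, term)
        else:
            child_score = score_node(child, term)
        total += child_score / (1 + sibling_distance(0, index))
    return total


if __name__ == "__main__":
    page = Node(["Search engines rank pages",
                 Node(["intro", "search appears in a nested node"])])
    print(score_node(page, "search"))
```

The bounds keep the work per node constant regardless of how many children or how much text a page contains, which is presumably the motivation for the final "wherein" clause; the specific limits here are placeholders.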
16. A machine implemented method comprising:
receiving by a search engine, from a content searching or consuming application, a search expression having a first and a second proximally associated atomic sub-expression, the search engine and the content searching or consuming application being operated on one or more different or same computing devices;
receiving a content page nominally associated with the search expression, or access information of the content page, by the search engine;
generating, by the search engine, one or more scores for one or more structures of the content page indicative of relative relevance of the content page or one or more portions of the content page to the search expression, wherein the generating of a score for a structure is based at least in part on a distance function and a scoring function, wherein the structure has sub-structures structurally describing at least a portion of the content page, and having content nodes and/or text strings, wherein the distance function measures distances between sub-structures within the structure, and the scoring function is positionally sensitive, yielding different scores for different occurrence positions of either or both of the proximally associated first and second atomic sub-expressions in the sub-structures; and
conditionally providing or not providing the content or one or more portions of the content, or access information of the content or one or more portions of the content, to the content searching or consuming application, by the search engine, based at least in part on the generated one or more scores;
wherein the generating of a score for a structure further includes establishing a bound on a number of children content nodes considered for each content node and/or a bound on a size of each of the text strings considered.
Dependent claims: 17-35
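For the two-term case of claim 16, one possible reading of "proximally associated" is that matches in structurally close sub-structures should outscore matches in distant ones. The Python sketch below assumes a page modeled as nested lists of strings, uses path length through the lowest common ancestor as the distance function, and multiplies positional weights for the two sub-expressions. These choices, and the names `tree_distance` and `proximity_score`, are illustrative assumptions rather than anything the claim specifies; the bounds on children and text size are omitted for brevity.

```python
from itertools import product
from typing import Iterator, List, Tuple, Union

Tree = Union[str, List["Tree"]]   # a text string or a list of sub-structures
Path = Tuple[int, ...]            # child indices from the root to a node


def leaves(tree: Tree, path: Path = ()) -> Iterator[Tuple[Path, str]]:
    """Yield (path, text) for every text string in the tree."""
    if isinstance(tree, str):
        yield path, tree
    else:
        for index, child in enumerate(tree):
            yield from leaves(child, path + (index,))


def tree_distance(a: Path, b: Path) -> int:
    """Number of edges between two nodes via their lowest common ancestor."""
    common = 0
    for x, y in zip(a, b):
        if x != y:
            break
        common += 1
    return (len(a) - common) + (len(b) - common)


def position_weight(text: str, term: str) -> float:
    """Positionally sensitive weight: earlier occurrences weigh more."""
    pos = text.lower().find(term.lower())
    return 0.0 if pos < 0 else 1.0 / (1 + pos)


def proximity_score(tree: Tree, first: str, second: str) -> float:
    """Best pairwise score over all leaves matching each sub-expression."""
    hits_a = [(p, position_weight(t, first)) for p, t in leaves(tree)]
    hits_b = [(p, position_weight(t, second)) for p, t in leaves(tree)]
    best = 0.0
    for (pa, wa), (pb, wb) in product(hits_a, hits_b):
        if wa > 0 and wb > 0:
            best = max(best, wa * wb / (1 + tree_distance(pa, pb)))
    return best


if __name__ == "__main__":
    near = [["red shoes", "leather boots"], ["contact us"]]
    far = [["red shoes"], ["contact us", "leather boots"]]
    print(proximity_score(near, "red", "leather"))
    print(proximity_score(far, "red", "leather"))
```

In the example, both pages contain the same strings, but the page where the two matches are siblings scores higher than the page where they sit in different branches.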
36. An apparatus, comprising:
one or more processors;
means, operated by the one or more processors, for receiving from a content searching or consuming application, an atomic search term;
means, operated by the one or more processors, for receiving a content page nominally associated with the atomic search term, or access information of the content page;
means, operated by the one or more processors, for generating one or more scores for one or more structures of the content page indicative of relative relevance of the content page or one or more portions of the content page to the atomic search term, wherein generation of a score for a structure is based at least in part on a distance function and a scoring function, wherein the structure has sub-structures structurally describing at least a portion of the content page, and having content nodes and/or text strings, wherein the distance function measures distances between sub-structures within the structure, and the scoring function is positionally sensitive, yielding different scores for different occurrence positions of the atomic search term in the sub-structures; and
means, operated by the one or more processors, for conditionally providing or not providing the content or one or more portions of the content, or access information of the content or one or more portions of the content, to the content searching or consuming application, based at least in part on the generated one or more scores;
wherein generation of a score for a structure further includes establishment of a bound on a number of children content nodes considered for each content node and/or a bound on a size of each of the text strings considered.
37. A tangible, non-transitory computer-readable storage medium comprising programming instructions configured, in response to execution of the programming instructions by an apparatus, to cause the apparatus to:
receive from a content searching or consuming application, an atomic search term;
receive a content page nominally associated with the atomic search term, or access information of the content page;
generate one or more scores for one or more structures of the content page indicative of relative relevance of the content page or one or more portions of the content page to the atomic search term, wherein generation of a score for a structure is based at least in part on a distance function and a scoring function, wherein the structure has sub-structures structurally describing at least a portion of the content page, and having content nodes and/or text strings, wherein the distance function measures distances between sub-structures within the structure, and the scoring function is positionally sensitive, yielding different scores for different occurrence positions of the atomic search term in the sub-structures; and
conditionally provide or not provide the content or one or more portions of the content, or access information of the content or one or more portions of the content, to the content searching or consuming application, based at least in part on the generated one or more scores;
wherein generation of a score for a structure further includes establishment of a bound on a number of children content nodes considered for each content node and/or a bound on a size of each of the text strings considered.
38. An apparatus comprising:
one or more processors;
means, operated by the one or more processors, for receiving from a content searching or consuming application, a search expression having a first and a second proximally associated atomic sub-expression;
means, operated by the one or more processors, for receiving a content page nominally associated with the search expression, or access information of the content page;
means, operated by the one or more processors, for generating one or more scores for one or more structures of the content page indicative of relative relevance of the content page or one or more portions of the content page to the search expression, wherein generation of a score for a structure is based at least in part on a distance function and a scoring function, wherein the structure has sub-structures structurally describing at least a portion of the content page, and having content nodes and/or text strings, wherein the distance function measures distances between sub-structures within the structure, and the scoring function is positionally sensitive, yielding different scores for different occurrence positions of either or both of the proximally associated first and second atomic sub-expressions in the sub-structures; and
means, operated by the one or more processors, for conditionally providing or not providing the content or one or more portions of the content, or access information of the content or one or more portions of the content, to the content searching or consuming application, based at least in part on the generated one or more scores;
wherein generation of a score for a structure further includes establishment of a bound on a number of children content nodes considered for each content node and/or a bound on a size of each of the text strings considered.
39. A tangible, non-transitory computer-readable storage medium comprising programming instructions configured, in response to execution of the programming instructions by an apparatus, to cause the apparatus to:
receive from a content searching or consuming application, a search expression having a first and a second proximally associated atomic sub-expression;
receive a content page nominally associated with the search expression, or access information of the content page;
generate one or more scores for one or more structures of the content page indicative of relative relevance of the content page or one or more portions of the content page to the search expression, wherein generation of a score for a structure is based at least in part on a distance function and a scoring function, wherein the structure has sub-structures structurally describing at least a portion of the content page, and having content nodes and/or text strings, wherein the distance function measures distances between sub-structures within the structure, and the scoring function is positionally sensitive, yielding different scores for different occurrence positions of either or both of the proximally associated first and second atomic sub-expressions in the sub-structures; and
conditionally provide or not provide the content or one or more portions of the content, or access information of the content or one or more portions of the content, to the content searching or consuming application, based at least in part on the generated one or more scores;
wherein generation of a score for a structure further includes establishment of a bound on a number of children content nodes considered for each content node and/or a bound on a size of each of the text strings considered.
Specification