Verifying relevance between keywords and web site contents
First Claim
1. A method for verifying relevance between terms and Web site contents, the method comprising:
retrieving site contents from a bid URL;
formulating expanded term(s) comprising at least one of semantically or contextually related to bid term(s), which are mined from a search engine in view of high-frequency of occurrence historical query terms;
generating content similarity and expanded similarity measurements from respective combinations of the bid term(s), the site contents, and the expanded terms, wherein the similarity measurements indicate relatedness between respective ones of the bid term(s), site contents, or expanded terms;
calculating category similarity measurements between the expanded terms and the site contents in view of a similarity classifier, wherein the similarity classifier has been trained from mined web site content associated with directory data;
calculating a confidence value from combined ones of multiple similarity measurements, wherein the combined ones comprise content, expanded, and category similarity measurements, wherein the confidence value provides an objective measure of relevance between the bid term(s) and the site contents;
analyzing the confidence value to identify the bid term(s); and
using the bid term(s) identified to increase traffic to a site to obtain site exposure;
wherein generating the category similarity measurements further comprises:
extracting features from Web site content associated with the directory data, the features comprising a combination of at least one of title, metadata, body, hypertext link(s), visual feature(s), and summarization by page layout analysis information;
reducing dimensionality of the features via feature selection;
categorizing the features via a classifier model to generate the similarity classifier;
generating respective term vectors from the bid term(s), the site contents, and the expanded terms; and
calculating similarity between the respective term vectors as a function of the similarity classifier to determine the category similarity measurements.
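The category-similarity sub-steps recited above (feature extraction, feature selection, classifier training, term vectors, similarity) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the claim: the bag-of-words term vectors, document-frequency feature selection, the centroid-style "classifier", cosine similarity, and the toy `DIRECTORY` data. The claim itself does not name any specific technique for these steps.

```python
import math
from collections import Counter

def term_vector(text):
    """Bag-of-words term vector: lowercase token -> count (assumed representation)."""
    return Counter(text.lower().split())

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def select_features(labeled_docs, top_k):
    """Reduce dimensionality by keeping the top_k terms by document frequency.
    Document frequency is an assumed selection criterion."""
    df = Counter()
    for text, _category in labeled_docs:
        df.update(set(term_vector(text)))
    return {t for t, _ in df.most_common(top_k)}

def train_centroids(labeled_docs, vocab):
    """Toy stand-in for the similarity classifier: one summed vector per category."""
    centroids = {}
    for text, category in labeled_docs:
        vec = Counter({t: c for t, c in term_vector(text).items() if t in vocab})
        centroids.setdefault(category, Counter()).update(vec)
    return centroids

def category_profile(text, centroids):
    """Map a text to a vector of per-category similarity scores."""
    vec = term_vector(text)
    return {cat: cosine(vec, cen) for cat, cen in centroids.items()}

def category_similarity(text_a, text_b, centroids):
    """Compare two texts in category space rather than raw term space."""
    return cosine(category_profile(text_a, centroids),
                  category_profile(text_b, centroids))

# Hypothetical directory data: (page text, directory category).
DIRECTORY = [
    ("cheap flights hotel booking travel deals", "travel"),
    ("airline tickets vacation travel packages", "travel"),
    ("laptop computer hardware software reviews", "computers"),
    ("software download computer programming tools", "computers"),
]

vocab = select_features(DIRECTORY, top_k=20)
centroids = train_centroids(DIRECTORY, vocab)

site = "book airline tickets and hotel rooms for your vacation"
related = category_similarity("cheap flights travel", site, centroids)
unrelated = category_similarity("computer software", site, centroids)
```

In this toy setup the travel-related expanded terms land in the same category as the site text while the computer-related terms do not, even though neither phrase shares a word with the site text. That is the point of comparing in category space.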
Abstract
Systems and methods for verifying relevance between terms and Web site contents are described. In one aspect, site contents from a bid URL are retrieved. Expanded term(s) semantically and/or contextually related to bid term(s) are calculated. Content similarity and expanded similarity measurements are calculated from respective combinations of the bid term(s), the site contents, and the expanded terms. Category similarity measurements between the expanded terms and the site contents are determined in view of a trained similarity classifier, which has been trained from mined web site content associated with directory data. A confidence value providing an objective measure of relevance between the bid term(s) and the site contents is determined from the content, expanded, and category similarity measurements by evaluating the multiple similarity scores in view of a trained relevance classifier model.
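As a rough illustration of the final step in the abstract, the sketch below combines three similarity scores into one confidence value. The logistic form, the `WEIGHTS` and `BIAS` values, and the 0.5 threshold are all hypothetical. The abstract says only that a trained relevance classifier model is used and does not specify its form or parameters.

```python
import math

# Hypothetical parameters. In the described system these would come from a
# trained relevance classifier model rather than being set by hand.
WEIGHTS = {"content": 2.0, "expanded": 1.5, "category": 2.5}
BIAS = -3.0

def confidence(scores):
    """Combine content, expanded, and category similarity into a value in (0, 1).
    A logistic combination is assumed here for illustration."""
    z = BIAS + sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)
    return 1.0 / (1.0 + math.exp(-z))

def is_relevant(scores, threshold=0.5):
    """Accept the bid term when the confidence reaches an assumed threshold."""
    return confidence(scores) >= threshold

# Made-up similarity scores for two bid term / site pairs.
high = {"content": 0.8, "expanded": 0.7, "category": 0.9}
low = {"content": 0.1, "expanded": 0.1, "category": 0.0}
```

With these made-up numbers, `is_relevant(high)` is true and `is_relevant(low)` is false. A bid term that scores low on all three measurements would be flagged as a poor match for the site.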
175 Citations
41 Claims
- Dependent claims of claim 1: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11
-
-
12. A computer-readable storage medium comprising computer-executable instructions for verifying relevance between terms and Web site contents, the computer-executable instructions comprising instructions for:
-
retrieving site contents from a bid URL;
formulating expanded term(s) comprising at least one of semantically or contextually related to bid term(s), which are mined from a search engine in view of high-frequency of occurrence historical query terms;
generating content similarity and expanded similarity measurements from respective combinations of the bid term(s), the site contents, and the expanded terms, wherein the similarity measurements indicate relatedness between respective ones of the bid term(s), site contents, or expanded terms;
calculating category similarity measurements between the expanded terms and the site contents in view of a similarity classifier, wherein the similarity classifier has been trained from mined web site content associated with directory data;
calculating a confidence value from combined ones of multiple similarity measurements, wherein the combined ones comprise content, expanded, and category similarity measurements;
providing an objective measure of relevance between the bid term(s) and the site contents as indicated by the confidence value;
analyzing the confidence value to identify the bid term(s); and
using the bid term(s) identified to increase traffic to a site to obtain site exposure;
wherein the computer-executable instructions for generating the category similarity measurements further comprise instructions for:
extracting features from Web site content associated with the directory data, the features comprising a combination of at least one of title, metadata, body, hypertext link(s), visual feature(s), and summarization by page layout analysis information;
reducing dimensionality of the features via feature selection;
categorizing the features via a classifier model to generate the similarity classifier;
generating respective term vectors from the bid term(s), the site contents, and the expanded terms; and
calculating similarity between the respective term vectors as a function of the similarity classifier to determine the category similarity measurements.
- Dependent claims of claim 12: 13, 14, 15, 16, 17, 18, 19, 20, 21, 22
-
-
23. A computing device for verifying relevance between terms and Web site contents, the computing device comprising:
-
a processor; and
a memory coupled to the processor, the memory comprising computer-program instructions executable by the processor for:
retrieving site contents from a bid URL;
formulating expanded term(s) comprising at least one of semantically or contextually related to bid term(s),
generating content similarity and expanded similarity measurements from respective combinations of the bid term(s), the site contents, and the expanded terms, wherein the similarity measurements indicate relatedness between respective ones of the bid term(s), site contents, or expanded terms;
calculating a confidence value from combined ones of multiple similarity measurements, wherein the combined ones comprise content, expanded, and category similarity measurements;
providing an objective measure of relevance between the bid term(s) and the site contents as indicated by the confidence value;
analyzing the confidence value to identify the bid term(s); and
using the bid term(s) identified to increase traffic to a site to obtain site exposure;
wherein the computer-executable instructions for generating the category similarity measurements further comprise instructions for:
extracting features from web site content associated with the directory data, the features comprising a combination of at least one of title, metadata, body, hypertext link(s), visual feature(s), and summarization by page layout analysis information;
reducing dimensionality of the features via feature selection;
categorizing the features via a classifier model to generate the similarity classifier;
generating respective term vectors from the bid term(s), the site contents, and the expanded terms; and
calculating similarity between the respective term vectors as a function of the similarity classifier to determine the category similarity measurements.
- Dependent claims of claim 23: 24, 25, 26, 27, 28, 29, 30, 31, 32
-
-
33. A computing device for verifying relevance between terms and Web site contents, the computing device comprising:
-
retrieving means to obtain site contents from a bid URL;
formulating means to identify expanded term(s) comprising at least one of semantically or contextually related to bid term(s),
generating means to create content similarity and expanded similarity measurements from respective combinations of the bid term(s), the site contents, and the expanded terms, wherein the similarity measurements indicate relatedness between respective ones of the bid term(s), site contents, or expanded terms;
calculating means to determine category similarity measurements between the expanded terms and the site contents in view of a similarity classifier, wherein the similarity classifier has been trained from mined web site content associated with directory data;
calculating means to generate a confidence value from combined ones of multiple similarity measurements, wherein the combined ones comprise content, expanded, and category similarity measurements, wherein the confidence value provides an objective measure of relevance between the bid term(s) and the site contents;
analyzing means to analyze the confidence value to identify the bid term(s); and
increasing means to increase traffic to a site by using the bid term(s) identified;
wherein the generating means further comprise:
extracting means to obtain features from Web site content associated with the directory data, the features comprising a combination of at least one of title, metadata, body, hypertext link(s), visual feature(s), and summarization by page layout analysis information;
reducing means to lessen dimensionality of the features via feature selection;
categorizing means to organize the features via a classifier model to generate the similarity classifier;
generating means to create respective term vectors from the bid term(s), the site contents, and the expanded terms; and
calculating means to identify similarity between the respective term vectors as a function of the similarity classifier to determine the category similarity measurements.
- Dependent claims of claim 33: 34, 35, 36, 37, 38, 39, 40
-
-
41. A computing device as recited in claim 40, wherein the identifying means further comprise:
-
generating means to generate a set of term clusters from term vectors based on calculated term similarity, the term vectors being generated from search engine results of submitted historical queries, each historical query having a relatively low frequency of occurrence as compared to other query terms in a query log; and
evaluating means to evaluate the site contents in view of term(s) specified by the term clusters to identify at least one or more semantically or contextually related terms, the terms being the one or more other terms.
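One way to picture the term clustering recited in claim 41 is a greedy pass that groups query terms whose search-result vectors are similar. This is a sketch under stated assumptions: the `RESULTS` snippets are invented, and the bag-of-words vectors, cosine similarity, greedy assignment, and 0.3 threshold are illustrative choices that the claim does not specify.

```python
import math
from collections import Counter

def vectorize(snippets):
    """Build a term vector for a query from its search-result snippets."""
    return Counter(" ".join(snippets).lower().split())

def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(w * v.get(t, 0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def cluster_terms(term_vectors, threshold):
    """Greedy clustering: join the most similar existing cluster if its
    similarity reaches the threshold, otherwise start a new cluster."""
    clusters = []
    for term, vec in term_vectors.items():
        best, best_sim = None, threshold
        for cluster in clusters:
            sim = cosine(vec, cluster["centroid"])
            if sim >= best_sim:
                best, best_sim = cluster, sim
        if best is None:
            clusters.append({"centroid": Counter(vec), "terms": [term]})
        else:
            best["centroid"].update(vec)  # fold the new term into the centroid
            best["terms"].append(term)
    return [c["terms"] for c in clusters]

# Invented search-result snippets for three hypothetical low-frequency queries.
RESULTS = {
    "discount airfare": ["cheap airline tickets and flights", "airfare deals on flights"],
    "budget flights": ["cheap flights and airline deals", "low cost flights"],
    "python tutorial": ["learn python programming", "python code examples"],
}

vectors = {query: vectorize(snippets) for query, snippets in RESULTS.items()}
clusters = cluster_terms(vectors, threshold=0.3)
```

Here the two flight-related queries end up in one cluster and the programming query in another. Site contents could then be compared against the terms in each cluster to find related terms.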
-
Specification