Evaluation of web pages
Abstract
A web page evaluation technique includes obtaining a plurality of web pages with the same or approximately the same content. Further, a plurality of generation times and a plurality of first evaluation values that correspond to respective ones of the plurality of web pages are determined. A web page among the plurality of web pages that has the earliest generation time is identified. A second evaluation value of the identified web page is determined according to the plurality of first evaluation values. The second evaluation value can be used to indicate a ranking of the identified web page.
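The steps in the abstract can be sketched in Python as follows. This is a minimal illustration, not the patented implementation: the `Page` record, the function name `evaluate_earliest`, and in particular the choice to combine the first evaluation values by summing them are assumptions, since the abstract only says the second value is determined "according to" the first values.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Page:
    """Hypothetical record for one page in a group of near-duplicates."""
    url: str
    generation_time: datetime
    first_value: float  # first evaluation value, e.g. an existing rank score


def evaluate_earliest(pages: list[Page]) -> tuple[Page, float]:
    """Return the earliest page and a second evaluation value for it."""
    if not pages:
        raise ValueError("need at least one page")
    # Identify the page with the earliest generation time.
    earliest = min(pages, key=lambda p: p.generation_time)
    # Assumption: aggregate the first evaluation values by summation.
    # The source does not specify the aggregation function.
    second_value = sum(p.first_value for p in pages)
    return earliest, second_value


sample_pages = [
    Page("https://example.com/copy", datetime(2012, 3, 5), 0.5),
    Page("https://example.org/original", datetime(2012, 3, 1), 0.25),
    Page("https://example.net/repost", datetime(2012, 3, 9), 0.25),
]
earliest_page, second_value = evaluate_earliest(sample_pages)
```

Under these assumptions the earliest page inherits credit from every copy, so a low-ranked original can still receive a high second evaluation value.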
23 Claims
1. A method comprising:
obtaining a plurality of web pages with the same or approximately the same content;
determining, using one or more computer processors, a plurality of generation times, and a plurality of first evaluation values that correspond to a plurality of first ranks of the respective ones of the plurality of web pages;
wherein for one of the plurality of web pages, its corresponding generation time includes: a time at which content of the one of the plurality of web pages was published, a time at which content of the one of the plurality of web pages was released, a time at which the one of the plurality of web pages was crawled, or a time that was mentioned by the content of the one of the plurality of web pages;
identifying an earliest web page among the plurality of web pages that has the earliest generation time; and
determining a second evaluation value of the identified earliest web page based at least in part on the plurality of first evaluation values;
wherein the second evaluation value is used to indicate a copy rank of the identified earliest web page among the plurality of web pages.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11.
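Claim 1 lets the generation time come from any of four sources. One way to pick a single timestamp per page is sketched below. The function name `resolve_generation_time` and the fixed preference order are assumptions made for illustration; the claim lists the sources as alternatives and does not rank them.

```python
from datetime import datetime
from typing import Optional


def resolve_generation_time(
    published: Optional[datetime] = None,
    released: Optional[datetime] = None,
    crawled: Optional[datetime] = None,
    mentioned: Optional[datetime] = None,
) -> datetime:
    """Pick one generation time from the sources named in claim 1."""
    # Assumption: prefer explicit publication/release times, then a time
    # mentioned in the content, and fall back to the crawl time last.
    for candidate in (published, released, mentioned, crawled):
        if candidate is not None:
            return candidate
    raise ValueError("no generation time available for this page")


only_crawled = resolve_generation_time(crawled=datetime(2012, 3, 9))
with_published = resolve_generation_time(
    published=datetime(2012, 3, 1),
    crawled=datetime(2012, 3, 9),
)
```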
12. A method comprising:
obtaining a plurality of web pages with the same or approximately the same content;
determining, using one or more computer processors, a plurality of generation times and a plurality of first evaluation values that correspond to respective ones of the plurality of web pages;
identifying a web page among the plurality of web pages that has the earliest generation time;
determining a second evaluation value of the identified web page according to the plurality of first evaluation values; and
identifying the plurality of web pages with the same or approximately the same content by comparing their respective digital fingerprints;
wherein identifying the plurality of web pages with the same or approximately the same content comprises:
obtaining an intermediate paragraph with longest content in a set of web pages, the intermediate paragraph being neither a first paragraph nor a last paragraph in the respective web page, or obtaining a longest sentence in each of the set of web pages, the longest sentence being neither the first sentence nor the last sentence;
generating a digital fingerprint for each web page based on the obtained intermediate paragraph or the longest sentence for each web page; and
comparing the digital fingerprint of each web page to identify the plurality of web pages among the set of web pages with the same or approximately the same content.
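The fingerprinting steps of claim 12 can be illustrated with the paragraph variant. This is a sketch under stated assumptions: paragraphs are taken to be separated by blank lines, whitespace and case are normalized before hashing, and SHA-256 stands in for the unspecified fingerprint function. The helper names are hypothetical. An exact hash only groups pages whose chosen paragraph is identical after normalization; matching "approximately the same" content would need a similarity-preserving fingerprint instead.

```python
import hashlib
from collections import defaultdict
from typing import Optional


def longest_intermediate_paragraph(text: str) -> Optional[str]:
    """Longest paragraph that is neither the first nor the last."""
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    middle = paragraphs[1:-1]  # drop first and last paragraph
    if not middle:
        return None
    return max(middle, key=len)


def fingerprint(text: str) -> Optional[str]:
    """Hash the chosen paragraph (assumption: SHA-256 as the fingerprint)."""
    paragraph = longest_intermediate_paragraph(text)
    if paragraph is None:
        return None
    normalized = " ".join(paragraph.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def group_by_fingerprint(texts_by_url: dict[str, str]) -> list[list[str]]:
    """Return groups of URLs whose fingerprints match."""
    groups: dict[str, list[str]] = defaultdict(list)
    for url, text in texts_by_url.items():
        fp = fingerprint(text)
        if fp is not None:
            groups[fp].append(url)
    return [urls for urls in groups.values() if len(urls) > 1]


body = "The quick brown fox jumps over the lazy dog near the river bank."
texts = {
    "a": "Site A header\n\n" + body + "\n\nSite A footer",
    "b": "Site B header\n\n" + body.upper() + "\n\nSite B footer",
    "c": "Site C header\n\nAn unrelated middle paragraph.\n\nSite C footer",
}
duplicate_groups = group_by_fingerprint(texts)
```

Skipping the first and last paragraphs keeps site-specific headers and footers out of the fingerprint, which is why pages "a" and "b" land in the same group here despite different boilerplate.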
13. A system, comprising:
one or more computer processors configured to:
obtain a plurality of web pages with the same or approximately the same contents;
determine a plurality of generation times, and a plurality of first evaluation values that correspond to a plurality of first ranks of the respective ones of the plurality of web pages;
wherein for one of the plurality of web pages, its corresponding generation time includes: a time at which content of the one of the plurality of web pages was published, a time at which content of the one of the plurality of web pages was released, a time at which the one of the plurality of web pages was crawled, or a time that was mentioned by the content of the one of the plurality of web pages;
identify an earliest web page among the plurality of web pages that has the earliest generation time; and
determine a second evaluation value of the identified earliest web page based at least in part on the plurality of first evaluation values;
wherein the second evaluation value is used to indicate a copy rank of the identified earliest web page among the plurality of web pages; and
one or more memories coupled to the one or more computer processors, configured to provide the one or more processors with instructions.
Dependent claims: 14, 15, 16, 17, 18, 19, 20, 21, 22, 23.
Specification