Computer method and apparatus for determining site type of a web site
Abstract
Computer method and apparatus identifies content owner of a Web site. A collecting step or element collects candidate names from the subject Web site. For each candidate name, a test module (or testing step) runs tests that provide quantitative/statistical evaluation of the candidate name being the content owner name of the subject Web site. The test results are combined mathematically, such as by a Bayesian network, into an indication of content owner name.
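The abstract says only that the test results are combined mathematically, "such as by a Bayesian network". The sketch below is one way to do that, not the patented method. It swaps in a naive Bayes model, which is a simple special case of a Bayesian network. Each test is treated as a binary check, the tests are assumed conditionally independent, and each test gets hypothetical likelihoods P(pass | owner) and P(pass | not owner). The test names, probabilities, prior, and sample text are all invented for illustration.

```python
import math
from typing import Callable, List, Tuple

# Hypothetical test: (name, predicate(candidate, site_text), P(pass|owner), P(pass|not owner))
Test = Tuple[str, Callable[[str, str], bool], float, float]


def posterior_owner(candidate: str, site_text: str, tests: List[Test], prior: float = 0.1) -> float:
    """Combine binary test results in log-odds space (naive Bayes assumption)."""
    log_odds = math.log(prior / (1.0 - prior))
    for _name, predicate, p_pass_owner, p_pass_other in tests:
        if predicate(candidate, site_text):
            log_odds += math.log(p_pass_owner / p_pass_other)
        else:
            log_odds += math.log((1.0 - p_pass_owner) / (1.0 - p_pass_other))
    return 1.0 / (1.0 + math.exp(-log_odds))  # convert log-odds back to a probability


def best_owner(candidates: List[str], site_text: str, tests: List[Test]) -> Tuple[str, float]:
    """Score every candidate name and return the most probable content owner."""
    scores = {c: posterior_owner(c, site_text, tests) for c in candidates}
    winner = max(scores, key=scores.__getitem__)
    return winner, scores[winner]


# Illustrative tests with made-up likelihoods.
TESTS: List[Test] = [
    ("in_copyright_notice", lambda c, t: f"Copyright {c}" in t, 0.7, 0.05),
    ("mentioned_often", lambda c, t: t.count(c) >= 3, 0.6, 0.2),
]

text = "Acme Corp home. Acme Corp products. About Acme Corp. Copyright Acme Corp 2001. Partner: Widget Inc."
winner, score = best_owner(["Acme Corp", "Widget Inc"], text, TESTS)
print(winner, round(score, 3))  # Acme Corp 0.824
```

Working in log-odds makes combining evidence a simple sum. It also avoids underflow when many tests are multiplied together.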
27 Claims
1. A method of selecting site type of a subject Web site comprising the computer-implemented steps of:
providing a predefined set of potential site types for a subject Web site;
for each potential site type, running tests having test results which enable quantitative evaluation of the potential site type being site type of the subject Web site, the tests including examining at least one of the following:
number of external links, number of internal links, distribution of internal and external links among pages, morphology of site “tree”, morphology of the site's text content, distribution of multimedia elements in the site;
mathematically combining the test results; and
based on the combined test results, selecting one potential site type from the predetermined set as the site type for the subject Web site.
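Claim 1 names the measurements but not how they are computed or combined. The following is a minimal sketch of the claimed steps under stated assumptions, not the patent's implementation. Assumptions:

- A site is a list of hypothetical `Page` records, and `pages[0]` is the home page.
- A link counts as internal when it shares the home page's host.
- Site-tree morphology is reduced to maximum depth.
- Multimedia distribution is reduced to the mean number of media elements per page.
- Text morphology is omitted.
- The results are combined with a per-type weighted sum. This is a simple linear scoring swapped in for illustration, and the `WEIGHTS` values are invented.

```python
from dataclasses import dataclass, field
from statistics import mean
from typing import Dict, List, Tuple
from urllib.parse import urlparse


@dataclass
class Page:
    url: str
    links: List[str] = field(default_factory=list)  # absolute hrefs found on the page
    media_count: int = 0  # images, audio and video elements on the page
    depth: int = 0  # distance from the home page in the site tree


def site_features(pages: List[Page]) -> Dict[str, float]:
    """Run a few of the claimed tests; pages[0] is assumed to be the home page."""
    host = urlparse(pages[0].url).netloc
    internal = external = 0
    for page in pages:
        for link in page.links:
            if urlparse(link).netloc == host:
                internal += 1
            else:
                external += 1
    total = max(internal + external, 1)  # avoid dividing by zero on link-less sites
    return {
        "external_links": float(external),
        "internal_links": float(internal),
        "external_fraction": external / total,  # mix of internal vs external links
        "max_tree_depth": float(max(p.depth for p in pages)),  # crude site-tree shape
        "mean_media_per_page": float(mean(p.media_count for p in pages)),
    }


# Hypothetical per-type weights for three of the listed site types.
WEIGHTS: Dict[str, Dict[str, float]] = {
    "Portal, directory, search": {"external_fraction": 3.0, "max_tree_depth": 0.5, "mean_media_per_page": -0.2},
    "News provider": {"external_fraction": -1.0, "max_tree_depth": 1.0, "mean_media_per_page": 0.5},
    "Personal": {"external_fraction": 0.5, "max_tree_depth": -1.0, "mean_media_per_page": 0.3},
}


def select_site_type(pages: List[Page]) -> Tuple[str, Dict[str, float]]:
    """Combine test results as a weighted sum per potential type, then pick the maximum."""
    features = site_features(pages)
    scores = {
        site_type: sum(w * features[name] for name, w in weights.items())
        for site_type, weights in WEIGHTS.items()
    }
    return max(scores, key=scores.__getitem__), scores


site = [
    Page("http://example.org/", ["http://a.com/", "http://b.com/", "http://c.com/", "http://example.org/about"]),
    Page("http://example.org/about", ["http://d.com/", "http://e.com/"], media_count=1, depth=1),
]
best, scores = select_site_type(site)
print(best)  # Portal, directory, search
```

The example site has mostly outbound links, so the portal score (2.9) beats news (about 0.42) and personal (about -0.43).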
News provider (e.g. on-line News, magazine, newspaper, newsletter, etc)
Specialized information provider (e.g. weather, traffic, movies, etc)
Company, for-profit organization
Educational institution (e.g. School, University, College, etc)
Medical organization (e.g. Hospital, Clinic, Health center, etc)
Law firm
Religious organization, church
Non-profit organization
Professional association
Political organization
City level local government
State level government
Government organization
Military
Retail, catalog
Portal, directory, search
Fan club of sports, music stars, movie stars
Sport team
Conference, symposium, workshop
Travel agency, airline
Sex
ISP (Internet Service Provider)
Gaming, sports, outdoors
Personal
Hotel, resort
Entertainment (theater, restaurant, bar, club, etc)
On-line entertainment (puzzles, jokes, chat rooms, on-line games, etc)
Reference (dictionaries, thesaurus, yellow pages, places, quotes, etc)
Job listings, classifieds
Event (festival, celebration, etc).
8. A method as claimed in claim 7 wherein the step of running tests includes applying tests as a function of potential site type.
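Claim 8 (and claim 21 for the apparatus) applies tests as a function of the potential site type. A simple way to model this is a registry that maps each type to the tests relevant to it. The sketch below assumes this approach. The test functions, the summary-dictionary keys (`external`, `internal`, `media`, `pages`, `depth`), and the type-to-test assignments are all hypothetical.

```python
from typing import Callable, Dict, List

Site = Dict[str, float]  # hypothetical per-site summary counts
TestFn = Callable[[Site], float]

# Pool of available tests, keyed by name.
ALL_TESTS: Dict[str, TestFn] = {
    "external_fraction": lambda s: s["external"] / max(s["external"] + s["internal"], 1),
    "media_density": lambda s: s["media"] / max(s["pages"], 1),
    "tree_depth": lambda s: s["depth"],
}

# Which tests are run for which potential site type (illustrative choice).
TESTS_FOR_TYPE: Dict[str, List[str]] = {
    "Portal, directory, search": ["external_fraction", "tree_depth"],
    "Personal": ["media_density"],
}


def run_tests_for_type(site_type: str, site: Site) -> Dict[str, float]:
    """Apply only the tests registered for this potential site type."""
    return {name: ALL_TESTS[name](site) for name in TESTS_FOR_TYPE.get(site_type, [])}


site = {"external": 40.0, "internal": 10.0, "media": 6.0, "pages": 3.0, "depth": 2.0}
print(run_tests_for_type("Personal", site))  # {'media_density': 2.0}
```

Skipping irrelevant tests saves work. It also keeps a weak, unrelated signal from diluting the combined score for a type.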
9. A method as claimed in claim 1 further comprising the step of, as a function of selected site type for the subject Web site, determining meta structure of the subject Web site.
10. A method as claimed in claim 9 wherein if the selected site type is company, the step of determining meta structure includes determining that the subject Web site has Web pages containing at least one of employment opportunities, press releases, general company information, contact information, products and services information, and management personnel information.
11. A method as claimed in claim 9 wherein if the selected site type is news, the step of determining meta structure includes determining that the subject Web site has Web pages containing at least one of current news, local news, world news, archived news, business news and technology news.
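Claims 9 to 11 determine a site's meta structure from its selected type. The page categories below are taken from claims 10 and 11. How a page gets recognized as, say, "press releases" is not specified, so this sketch assumes a keyword match on page titles. The `KEYWORDS` cues and the function name are hypothetical.

```python
from typing import Dict, List, Set

# Page categories named in claims 10 and 11.
META_STRUCTURE: Dict[str, List[str]] = {
    "company": ["employment opportunities", "press releases", "general company information",
                "contact information", "products and services information",
                "management personnel information"],
    "news": ["current news", "local news", "world news", "archived news",
             "business news", "technology news"],
}

# Hypothetical title keywords used to recognize each category.
KEYWORDS: Dict[str, List[str]] = {
    "employment opportunities": ["careers", "jobs"],
    "press releases": ["press", "newsroom"],
    "general company information": ["about"],
    "contact information": ["contact"],
    "products and services information": ["products", "services"],
    "management personnel information": ["management", "leadership"],
    "current news": ["today", "latest"],
    "local news": ["local"],
    "world news": ["world"],
    "archived news": ["archive"],
    "business news": ["business"],
    "technology news": ["technology", "tech"],
}


def determine_meta_structure(site_type: str, page_titles: List[str]) -> Set[str]:
    """Return the expected page categories for the selected type that the site contains."""
    found: Set[str] = set()
    for category in META_STRUCTURE.get(site_type, []):
        cues = KEYWORDS[category]
        if any(cue in title.lower() for title in page_titles for cue in cues):
            found.add(category)
    return found


titles = ["About Us", "Careers at Acme", "Contact", "Our Products"]
print(sorted(determine_meta_structure("company", titles)))
```

Because the site type is already known, only that type's categories are checked. A company site is never searched for "world news" pages.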
12. A data set formed by the method of claim 1, the data set having indications of plural Web sites and respective site types of the plural Web sites.
13. The method of claim 1 further comprising the step of storing indications of the selected site types per respective Web sites.
14. In a digital processor, computer apparatus for identifying the site type of a subject Web site comprising:
a predefined set of potential site types for Web sites, and a test module utilizing the predefined set and including a plurality of processor-executed tests having test results which enable quantitative evaluation of each potential site type as the site type for the subject Web site, for each potential site type, the test module (i) running at least a subset of the tests, (ii) combining the test results, and (iii) selecting one potential site type as the site type for the subject Web site, the processor-executed tests of the test module including examining at least one of the following:
number of external links, number of internal links, distribution of internal and external links among pages, morphology of site tree, morphology of the site's text content, distribution of multimedia elements in the site.
News provider (e.g. on-line News, magazine, newspaper, newsletter, etc)
Specialized information provider (e.g. weather, traffic, movies, etc)
Company, for-profit organization
Educational institution (e.g. School, University, College, etc)
Medical organization (e.g. Hospital, Clinic, Health center, etc)
Law firm
Religious organization, church
Non-profit organization
Professional association
Political organization
City or local government
State government
Government organization
Military
Retail, catalog
Portal, directory, search
Fan club of sports, music stars, movie stars
Sport team
Conference, symposium, workshop
Travel agency, airline
Sex
ISP (Internet Service Provider)
Gaming, sports, outdoors
Personal
Hotel, resort
Entertainment (theater, restaurant, bar, club, etc)
On-line entertainment (puzzles, jokes, chat rooms, on-line games, etc)
Reference (dictionaries, thesaurus, yellow pages, places, quotes, etc)
Job listings, classifieds
Event (festival, celebration, etc).
21. Apparatus as claimed in claim 14 wherein the test module applies only certain ones of the tests depending on the potential site type being tested.
22. Apparatus as claimed in claim 14 wherein each potential site type corresponds to a respective meta structure, such that as a function of selected site type for the subject Web site, the test module further determines meta structure of the subject Web site.
23. Apparatus as claimed in claim 22 wherein if the selected site type is company, then the test module determines that the meta structure of the subject Web site has Web pages containing employment opportunities, general company information, contact information, products and services information, and management personnel information.
24. Apparatus as claimed in claim 22 wherein if the selected site type is news, then the test module determines that the meta structure of the subject Web site has Web pages containing current news, local news, world news, archived news, business news and technology news.
25. Apparatus as claimed in claim 14 further comprising storage means for receiving and storing indications of site types, per respective Web sites, as selected by the test module, such that the storage means provides indications of corresponding site types for respective Web sites.
26. A method of forming an index of Web sites and corresponding site types comprising the computer implemented steps of:
(a) for each subject Web site to be indexed, identifying site type by:
providing a predefined set of potential site types;
for each potential site type, running tests having test results which enable quantitative evaluation of the potential site type being site type of the subject Web site, the tests including examining at least one of the following:
number of external links, number of internal links, distribution of internal and external links among pages, morphology of site “tree”, morphology of the site's text content, distribution of multimedia elements in the site;
mathematically combining the test results; and
based on the combined test results, selecting one potential site type from the predetermined set as the site type for the subject Web site; and
(b) storing in a data set indications of the subject Web sites and respective site types as determined by the step of identifying site type, the data set forming an index of Web sites and corresponding site types.
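Claim 26 wraps the site-type selection in a loop and stores the results as an index. This is also the subject of claims 12, 13 and 25. The sketch below is a minimal version of that structure. It assumes the data set is an in-memory dictionary from site URL to site type, plus an inverted view for lookups by type. `toy_classifier` is a hypothetical stand-in for step (a); a real system would run the per-type tests instead of checking the domain suffix.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List


def build_site_type_index(site_urls: Iterable[str], classify: Callable[[str], str]) -> Dict[str, str]:
    """Step (a): identify each site's type; step (b): store site -> type in a data set."""
    return {url: classify(url) for url in site_urls}


def sites_by_type(index: Dict[str, str]) -> Dict[str, List[str]]:
    """Invert the index so sites can be looked up by site type."""
    inverted: Dict[str, List[str]] = defaultdict(list)
    for url, site_type in index.items():
        inverted[site_type].append(url)
    return dict(inverted)


def toy_classifier(url: str) -> str:
    """Hypothetical placeholder for the site-type selection of step (a)."""
    return "Educational institution" if url.endswith(".edu") else "Company, for-profit organization"


index = build_site_type_index(["http://mit.edu", "http://acme.com", "http://stanford.edu"], toy_classifier)
print(sites_by_type(index)["Educational institution"])  # ['http://mit.edu', 'http://stanford.edu']
```

Passing the classifier in as a function keeps the indexing loop independent of how site type is actually decided.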
Specification