Computer method and apparatus for determining content types of web pages

  • US 7,356,761 B2
  • Filed: 01/24/2001
  • Issued: 04/08/2008
  • Est. Priority Date: 07/31/2000
  • Status: Expired due to Term
First Claim

1. A computer-implemented method of determining content type of contents of a subject Web page, comprising the steps of:

  • providing a predefined set of potential content types, content types being exclusive of indicating formal language of the content;

    for each potential content type, preparing a distinguishing series of tests, the distinguishing series of tests includes:

    i) at least one binary test, and

    ii) at least one non-binary test,

    the at least one binary test and the at least one non-binary test further including at least one test (a) examining syntax or grammar; or (b) examining page format or style other than position of data or a keyword in the subject Web page;

    for each potential content type, running the distinguishing series of tests having test results which enable quantitative evaluation of at least some contents of the subject Web page being of the potential content type,

    mathematically combining the probabilities from all possible combinations of the test results and hypothesis values with respect to content of Web pages of determined content type with the test results of the subject Web page of undetermined content type using at least one Bayesian network; and

    based on the combined test results, assigning a respective probability, for each potential content type, that some contents of that type exists on the subject Web page, and indicating content type, said indicating being exclusive of indicating language in which content is written.
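
The steps of the claim can be illustrated with a minimal sketch. It is not the patented implementation: it assumes the simplest possible Bayesian network (one hypothesis node per content type, with each test result conditionally independent given that hypothesis), and every name, test, content type, prior, and probability table below is hypothetical and chosen only for illustration. In practice the conditional probabilities would be estimated from Web pages whose content type is already determined.

```python
import re
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence, Tuple


@dataclass
class Test:
    """One test in a distinguishing series (hypothetical structure)."""
    name: str
    run: Callable[[str], int]            # returns the index of the observed outcome
    p_given_present: Sequence[float]     # P(outcome | content type present)
    p_given_absent: Sequence[float]      # P(outcome | content type absent)


# Binary test examining syntax: is there a phone-number-like pattern?
def has_phone_pattern(page: str) -> int:
    return int(bool(re.search(r"\(?\d{3}\)?[-. ]\d{3}[-. ]\d{4}", page)))


# Non-binary test examining page format: how many short lines (3 buckets)?
def short_line_bucket(page: str) -> int:
    lines = [ln for ln in page.splitlines() if ln.strip()]
    short = sum(1 for ln in lines if len(ln.split()) <= 4)
    return 0 if short == 0 else 1 if short <= 2 else 2


# Binary test examining grammar: is there a sentence boundary?
def has_sentence_boundary(page: str) -> int:
    return int(bool(re.search(r"[a-z]\.\s+[A-Z]", page)))


# Non-binary test examining page format: average words per line (3 buckets).
def avg_words_bucket(page: str) -> int:
    lines = [ln for ln in page.splitlines() if ln.strip()]
    avg = sum(len(ln.split()) for ln in lines) / max(len(lines), 1)
    return 0 if avg < 5 else 1 if avg < 12 else 2


# Predefined content types: (prior, distinguishing series of tests).
# All numbers are made up for illustration.
MODEL: Dict[str, Tuple[float, List[Test]]] = {
    "contact_info": (0.2, [
        Test("phone", has_phone_pattern, [0.3, 0.7], [0.9, 0.1]),
        Test("short_lines", short_line_bucket, [0.1, 0.3, 0.6], [0.5, 0.3, 0.2]),
    ]),
    "narrative_text": (0.5, [
        Test("sentences", has_sentence_boundary, [0.2, 0.8], [0.7, 0.3]),
        Test("line_length", avg_words_bucket, [0.1, 0.4, 0.5], [0.6, 0.3, 0.1]),
    ]),
}


def posterior(page: str, prior: float, tests: List[Test]) -> float:
    """P(content type present | test results) under the independence assumption."""
    like_present, like_absent = prior, 1.0 - prior
    for t in tests:
        outcome = t.run(page)
        like_present *= t.p_given_present[outcome]
        like_absent *= t.p_given_absent[outcome]
    return like_present / (like_present + like_absent)


def classify(page: str) -> Dict[str, float]:
    """Assign a probability to each potential content type."""
    return {ctype: posterior(page, prior, tests)
            for ctype, (prior, tests) in MODEL.items()}


sample_page = "Acme Corp\n123 Main St\nPhone: (555) 123-4567\n"
result = classify(sample_page)
for ctype, prob in result.items():
    print(f"{ctype}: {prob:.3f}")
```

Each content type gets its own probability rather than a single winning label, which matches the claim's wording that some contents of each type may exist on the same page. A fuller Bayesian network could also model dependencies between tests, which this sketch deliberately ignores.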
