SYSTEMS AND METHODS FOR CONTENT EXTRACTION FROM A MARK-UP LANGUAGE TEXT ACCESSIBLE AT AN INTERNET DOMAIN

  • US 20170031883A1
  • Filed: 06/20/2016
  • Published: 02/02/2017
  • Est. Priority Date: 03/30/2005
  • Status: Active Grant
First Claim

1. A method for automatically classifying a markup language text that is accessible at an Internet domain comprising:

    (a) retrieving from one or more data repositories, data associated with the Internet domain;

    (b) computing a first identifier for the Internet domain based on at least the data associated with the Internet domain and the markup language text;

    (c) computing a measure of similarity between the computed first identifier and each of a first plurality of previously classified identifiers; and

    (d) assigning the markup language text a classification based on the computed measure of similarity between the computed first identifier and each of the first plurality of previously classified identifiers.
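
Steps (a) through (d) can be illustrated with a minimal sketch. The claim does not specify how the identifier is built, which similarity measure is used, or how the classification is chosen from the similarity values, so everything concrete below is an assumption made for illustration: the identifier is a bag of tokens drawn from the tag-stripped markup and the domain data, the measure is cosine similarity, and the label of the most similar previously classified identifier is assigned. The names `retrieve_domain_data`, `compute_identifier`, `cosine_similarity`, and `classify` are hypothetical and do not come from the patent.

```python
import re
from collections import Counter
from math import sqrt


def retrieve_domain_data(domain, repositories):
    """Step (a): merge whatever each repository (here a plain dict) holds for the domain."""
    data = {}
    for repo in repositories:
        data.update(repo.get(domain, {}))
    return data


def compute_identifier(domain_data, markup_text):
    """Step (b): assumed identifier, a token-count vector over the markup text and domain data."""
    text = re.sub(r"<[^>]+>", " ", markup_text)       # crude tag stripping, for illustration only
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    tokens += [f"{key}={value}".lower() for key, value in domain_data.items()]
    return Counter(tokens)


def cosine_similarity(a, b):
    """Step (c): assumed similarity measure between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def classify(domain, markup_text, repositories, classified):
    """Step (d): assign the label of the most similar previously classified identifier.

    `classified` is a list of (identifier, label) pairs.
    """
    if not classified:
        raise ValueError("no previously classified identifiers to compare against")
    identifier = compute_identifier(retrieve_domain_data(domain, repositories), markup_text)
    scores = [(cosine_similarity(identifier, prev), label) for prev, label in classified]
    best_score, best_label = max(scores, key=lambda pair: pair[0])
    return best_label, best_score
```

A caller would build `classified` from texts whose classification is already known, then pass a new domain and its markup text to `classify`. A nearest-neighbor rule is only one way to base the classification on the similarity values; a threshold or a vote over several neighbors would fit the claim language equally well.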
