Systems and methods for content extraction from a mark-up language text accessible at an internet domain

  • US 10,061,753 B2
  • Filed: 06/20/2016
  • Issued: 08/28/2018
  • Est. Priority Date: 03/30/2005
  • Status: Active Grant
First Claim

1. A method for automatically classifying a markup language text that is accessible at an Internet domain comprising:

    (a) retrieving from one or more data repositories, data associated with the Internet domain;

    (b) computing a first identifier for the Internet domain based on at least the data associated with the Internet domain and the markup language text;

    (c) computing a measure of similarity of content of the computed first identifier and content of each of a first plurality of previously classified identifiers;

    (d) assigning the markup language text a classification based on the computed measure of similarity between the computed first identifier and each of the first plurality of previously classified identifiers;

    (e) computing a second identifier for the markup language text based on the layout of the markup language text;

    (f) computing a measure of similarity between the second identifier and each of a second plurality of previously classified identifiers; and

    (g) assigning the markup language text a classification based on both the first identifier and the second identifier.
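
The steps of claim 1 can be illustrated with a minimal sketch. The claim does not specify how the identifiers are computed, which similarity measure is used, or how the two results are combined, so everything below is an assumption made for illustration: the first identifier is taken to be a set of word tokens drawn from the domain data and the page text, the second identifier is a set of adjacent start-tag pairs standing in for "layout", similarity is Jaccard similarity, and the final classification sums the two scores per label. All function names are hypothetical and do not come from the patent.

```python
import re
from html.parser import HTMLParser


def jaccard(a, b):
    """Jaccard similarity of two sets (an assumed similarity measure)."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if (a or b) else 0.0


def first_identifier(domain_data, markup):
    """Steps (a)-(b): word tokens from the domain data plus the page text."""
    text = re.sub(r"<[^>]+>", " ", markup)  # crude tag stripping
    return frozenset(re.findall(r"[a-z0-9]+", (domain_data + " " + text).lower()))


class _TagCollector(HTMLParser):
    """Records start-tag names in document order."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def second_identifier(markup):
    """Step (e): layout signature as the set of adjacent start-tag pairs."""
    collector = _TagCollector()
    collector.feed(markup)
    collector.close()
    return frozenset(zip(collector.tags, collector.tags[1:]))


def label_scores(identifier, classified):
    """Steps (c)/(f): best similarity per label over (identifier, label) pairs."""
    scores = {}
    for known, label in classified:
        scores[label] = max(scores.get(label, 0.0), jaccard(identifier, known))
    return scores


def classify(domain_data, markup, first_known, second_known):
    """Steps (d) and (g): a first-pass label, then a label using both identifiers."""
    s1 = label_scores(first_identifier(domain_data, markup), first_known)
    s2 = label_scores(second_identifier(markup), second_known)
    # Assumed combination rule: sum the two similarity scores per label.
    combined = {k: s1.get(k, 0.0) + s2.get(k, 0.0) for k in set(s1) | set(s2)}
    return {
        "first_pass": max(s1, key=s1.get) if s1 else None,
        "final": max(combined, key=combined.get) if combined else None,
    }
```

Here `first_known` and `second_known` play the role of the first and second pluralities of previously classified identifiers, each given as a list of `(identifier, label)` pairs built with the same two identifier functions.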
