METHOD OF ESTABLISHING A PLAIN TEXT DOCUMENT FROM A HTML DOCUMENT

  • US 20100146381A1
  • Filed: 12/01/2009
  • Published: 06/10/2010
  • Est. Priority Date: 12/01/2008
  • Status: Active Grant
First Claim

1. A method of establishing a plain text document from a HTML document, comprising the steps of:

    (A) acquiring a HTML document defined by HTML elements, each composed of tags and content between the tags;

    (B) pre-processing the HTML document by omitting some of the tags (including the content between those tags), whereby the rest of the HTML document comprises at least one target tag (including content between the target tags);

    (C) using a data structure to store the remaining tags of the pre-processed HTML document;

    (D) grouping the remaining tags (including the content between the remaining tags) stored in the data structure of the pre-processed HTML document into at least one target group according to the target tag(s); and

    (E) identifying the target group(s) most related to a title of the HTML document by comparing correlation(s) between the target group(s) and the title, and establishing a plain text document having the content of the identified target group.
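
Steps (A) through (E) can be illustrated with a minimal Python sketch. It is not the patented implementation, and the claim leaves several choices open that are filled in here purely as assumptions: which tags are omitted in step (B) (`SKIP_TAGS`), which tags count as target tags (`TARGET_TAGS`), the data structure of step (C) (a plain list), and the correlation measure of step (E) (word overlap between a group and the title, i.e. Jaccard similarity). All names below are hypothetical.

```python
import re
from html.parser import HTMLParser

# Assumptions for illustration only; the claim does not name specific tags.
SKIP_TAGS = {"script", "style", "nav", "footer"}   # omitted in step (B)
TARGET_TAGS = {"div", "td", "article"}             # target tags for step (D)


class GroupingParser(HTMLParser):
    """Collects the title and one text group per target tag."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.groups = []       # step (C): list of (tag, [text parts])
        self._skip_depth = 0   # > 0 while inside an omitted tag
        self._in_title = False
        self._open = []        # stack of indexes of currently open groups

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self._skip_depth += 1
        elif self._skip_depth == 0:
            if tag == "title":
                self._in_title = True
            elif tag in TARGET_TAGS:
                # step (D): each target tag opens a new target group
                self.groups.append((tag, []))
                self._open.append(len(self.groups) - 1)

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS:
            self._skip_depth = max(0, self._skip_depth - 1)
        elif tag == "title":
            self._in_title = False
        elif tag in TARGET_TAGS and self._open and self._skip_depth == 0:
            self._open.pop()

    def handle_data(self, data):
        if self._skip_depth:
            return  # step (B): content of omitted tags is dropped
        if self._in_title:
            self.title += data
        elif self._open and data.strip():
            self.groups[self._open[-1]][1].append(data.strip())


def words(text):
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def correlation(text, title):
    """Assumed measure: Jaccard overlap of the two word sets."""
    a, b = words(text), words(title)
    return len(a & b) / len(a | b) if a | b else 0.0


def html_to_plain_text(html):
    parser = GroupingParser()
    parser.feed(html)          # step (A): the HTML document is the input
    parser.close()
    candidates = [" ".join(parts) for _, parts in parser.groups if parts]
    if not candidates:
        return ""
    # step (E): keep the group most related to the title
    return max(candidates, key=lambda t: correlation(t, parser.title))


SAMPLE = """<html><head><title>Solar panels cut energy costs</title>
<script>var x = "solar panels";</script></head>
<body><nav><div>Home News Sports</div></nav>
<div>Subscribe to our newsletter today</div>
<div>Solar panels cut household energy costs by a third, a new study finds.</div>
<footer><div>Copyright notice</div></footer></body></html>"""

if __name__ == "__main__":
    print(html_to_plain_text(SAMPLE))
```

In this sketch the navigation, footer, and script content never reach the grouping stage, and the newsletter group shares no words with the title, so the article sentence is selected. A different correlation measure or tag selection would fit the claim language equally well.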
