Apparatus and method for text extraction

  • US 8,924,846 B2
  • Filed: 07/03/2009
  • Issued: 12/30/2014
  • Est. Priority Date: 07/03/2009
  • Status: Active Grant
First Claim

1. A method of determining main text in a mark-up document, comprising:

  • removing, by a system having a processor, first predetermined mark-up tags from the mark-up document, and replacing second predetermined mark-up tags in the mark-up document with separation elements, wherein the removing and the replacing cause the mark-up document to contain text paragraphs and the separation elements without the first and second predetermined mark-up tags;

  • determining, by the system, a length of each of the text paragraphs in the mark-up document; and

  • determining, by the system, one or more main paragraphs of the mark-up document based upon the lengths of the text paragraphs in the mark-up document.
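
The three steps of the claim can be illustrated with a minimal Python sketch. This is not the patented implementation. The claim says only that the tag sets are "predetermined" and that selection is "based upon the lengths", so the specific tag lists, the newline separator, the `ratio` threshold rule, and the function names `split_paragraphs` and `main_paragraphs` are all assumptions chosen for illustration.

```python
import re

# Hypothetical tag sets: the claim does not name specific tags.
INLINE_TAGS = ("a", "b", "i", "em", "strong", "span")           # "first" tags: removed
BLOCK_TAGS = ("p", "div", "br", "li", "td", "h1", "h2", "h3")   # "second" tags: replaced
SEPARATOR = "\n"                                                # assumed separation element


def split_paragraphs(markup):
    """Remove inline tags, replace block tags with separators, return text paragraphs."""
    inline = "|".join(INLINE_TAGS)
    block = "|".join(BLOCK_TAGS)
    # Step 1a: drop the first set of tags, keeping the text they wrap.
    text = re.sub(rf"</?(?:{inline})\b[^>]*>", "", markup, flags=re.I)
    # Step 1b: turn the second set of tags into separation elements.
    text = re.sub(rf"</?(?:{block})\b[^>]*>", SEPARATOR, text, flags=re.I)
    return [p.strip() for p in text.split(SEPARATOR) if p.strip()]


def main_paragraphs(markup, ratio=0.5):
    """Return paragraphs at least `ratio` times as long as the longest one.

    The threshold rule is an assumption; the claim only requires that the
    choice depend on paragraph lengths.
    """
    paragraphs = split_paragraphs(markup)
    if not paragraphs:
        return []
    lengths = [len(p) for p in paragraphs]          # Step 2: length of each paragraph
    threshold = ratio * max(lengths)
    # Step 3: keep the paragraphs whose length clears the threshold.
    return [p for p, n in zip(paragraphs, lengths) if n >= threshold]


if __name__ == "__main__":
    page = (
        '<div><a href="/">Home</a></div>'
        "<p>This is the <b>main</b> body of the article, long enough to dominate.</p>"
        "<p>Footer</p>"
    )
    for paragraph in main_paragraphs(page):
        print(paragraph)
```

In this sketch, short navigation and footer fragments fall below the length threshold and are discarded, while the long body paragraph is kept. A regular-expression approach is used only to keep the example short; a real system would more likely use a proper mark-up parser.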
