
Method, apparatus, and computer program product for classification of documents

  • US 10,339,375 B2
  • Filed: 02/27/2017
  • Issued: 07/02/2019
  • Est. Priority Date: 08/16/2012
  • Status: Active Grant
First Claim

1. A computer-implemented method for identifying content to represent web pages and creating thumbnails from the content, the computer-implemented method comprising:

  • retrieving a web document using a uniform resource locator (URL) contained in a dequeued work item, the dequeued work item parsed using a markup language parser;

    determining, from the web document, candidate images for thumbnail creation, wherein the determination of the candidate images for thumbnail creation comprises at least:

    identifying a desired thumbnail size and aspect ratio;

    extracting data content from the parsed markup to determine one or more candidate images for thumbnail creation; and

    utilizing one or more heuristics to discard candidate images having predefined undesirable characteristics, including at least discarding, from among the extracted one or more images, any images failing to meet the desired thumbnail size and aspect ratio; and

    creating a thumbnail image, wherein generation of the thumbnail image comprises at least:

    cropping a chosen image, the chosen image selected from among the candidate images, to each of one or more predefined sizes and encoding the chosen image with predefined compression settings, each in accordance with an environment in which the thumbnails will be used.
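The heuristic filtering and cropping steps of the claim can likewise be sketched. Everything specific here is an assumption for illustration: the `PROFILES` table mapping an environment to output sizes and a JPEG quality value, the 50% relative aspect-ratio tolerance, treating images with unknown dimensions as undesirable, choosing the largest surviving image, and center-cropping. The sketch only plans the crop box and encoder setting for each output size. It does not decode or encode any pixels.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Candidate:
    src: str
    width: Optional[int]
    height: Optional[int]


@dataclass
class Profile:
    sizes: list[tuple[int, int]]  # output (width, height) pairs
    quality: int                  # JPEG quality passed to the encoder


# Hypothetical environments; all sizes in a profile share one aspect ratio (4:3).
PROFILES = {
    "desktop": Profile(sizes=[(400, 300), (200, 150)], quality=85),
    "mobile": Profile(sizes=[(160, 120)], quality=70),
}


def passes_heuristics(c: Candidate, min_w: int, min_h: int,
                      target_ratio: float, tolerance: float = 0.5) -> bool:
    """Reject images that are unsized, too small, or too far from the target ratio."""
    if not c.width or not c.height:
        return False
    if c.width < min_w or c.height < min_h:
        return False
    ratio = c.width / c.height
    return abs(ratio - target_ratio) / target_ratio <= tolerance


def center_crop_box(w: int, h: int, target_ratio: float) -> tuple[int, int, int, int]:
    """Largest centered (left, top, right, bottom) box with the target aspect ratio."""
    if w / h > target_ratio:  # too wide: trim the left and right edges
        new_w = round(h * target_ratio)
        left = (w - new_w) // 2
        return (left, 0, left + new_w, h)
    new_h = round(w / target_ratio)  # too tall (or exact): trim top and bottom
    top = (h - new_h) // 2
    return (0, top, w, top + new_h)


def plan_thumbnails(candidates: list[Candidate], environment: str):
    """Return (chosen src, [(size, crop box, quality), ...]) or None."""
    profile = PROFILES[environment]
    min_w = max(w for w, _ in profile.sizes)
    min_h = max(h for _, h in profile.sizes)
    tw, th = profile.sizes[0]
    target_ratio = tw / th
    survivors = [c for c in candidates
                 if passes_heuristics(c, min_w, min_h, target_ratio)]
    if not survivors:
        return None
    chosen = max(survivors, key=lambda c: c.width * c.height)  # largest area wins
    box = center_crop_box(chosen.width, chosen.height, target_ratio)
    return chosen.src, [(size, box, profile.quality) for size in profile.sizes]


# Usage with sample candidates (e.g. as produced by an extraction step).
candidates = [
    Candidate("https://example.com/img/hero.jpg", 1200, 630),
    Candidate("https://example.com/img/icon.png", 16, 16),
    Candidate("https://example.com/img/banner.jpg", 1600, 200),
]
plan = plan_thumbnails(candidates, "desktop")
```

Here the icon is discarded as too small and the banner as too short, leaving the hero image, which is cropped to the box `(180, 0, 1020, 630)` before resizing. Given a Pillow `Image` opened from the chosen source, each planned step could be carried out with `img.crop(box).resize(size)` followed by `save(fp, "JPEG", quality=q)`. Computing the crop once per aspect ratio and only resizing per output size keeps the framing identical across sizes.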

  • 5 Assignments