
System and method for identifying document genres

  • US 8,260,062 B2
  • Filed: 05/07/2009
  • Issued: 09/04/2012
  • Est. Priority Date: 05/07/2009
  • Status: Active Grant
First Claim

1. A computer-implemented method for generating genre models used to identify genres of a document, comprising:

  • on a computer system having one or more processors executing one or more programs stored on memory of the computer system;

    for each document image in a set of document images that are associated with one or more genres, segmenting the document image into a plurality of tiles, wherein the tiles in the plurality of tiles are sized so that document page features are identifiable; and

    computing features of the document image and the plurality of tiles; and

    training at least one genre classifier to classify document images as being associated with one or more genres based on the features of the document images in the set of document images, the features of the plurality of tiles of the set of document images, and the one or more genres associated with each document image in the set of document images, wherein training the at least one genre classifier to classify document images as being associated with a respective genre in the one or more genres includes:

    training a first genre classifier corresponding to the respective genre based on the features of a first subset of the set of document images and the features of the plurality of tiles associated with the first subset of the set of document images;

    tuning parameters of the first genre classifier using a second subset of the set of document images, wherein the first subset and the second subset of the set of document images are mutually-exclusive sets of document images;

    training a second genre classifier corresponding to the respective genre based on the features of a second subset of the set of document images and the features of the plurality of tiles associated with the second subset of the set of document images; and

    tuning parameters of the second genre classifier using the first subset of the set of document images.
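The tiling and feature-computation steps of the claim can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the claim specifies neither a tile size nor a feature set, so the list-of-lists grayscale image, the fixed tile grid, and the two example features (mean intensity and ink density) are assumptions, and `segment_into_tiles`, `region_features`, and `document_features` are hypothetical names.

```python
from typing import List, Tuple

# Hypothetical representation: a grayscale page as rows of pixel values,
# 0 = black ink, 255 = white paper.
Image = List[List[int]]


def segment_into_tiles(image: Image, tile_h: int, tile_w: int) -> List[Image]:
    """Split an image into non-overlapping tiles in row-major order.

    Tiles on the right and bottom edges may be smaller when the image
    size is not a multiple of the tile size.
    """
    height = len(image)
    width = len(image[0]) if height else 0
    tiles = []
    for top in range(0, height, tile_h):
        for left in range(0, width, tile_w):
            tiles.append([row[left:left + tile_w] for row in image[top:top + tile_h]])
    return tiles


def region_features(region: Image, ink_threshold: int = 128) -> Tuple[float, float]:
    """Illustrative features: mean intensity and fraction of dark ("ink") pixels."""
    pixels = [p for row in region for p in row]
    if not pixels:
        return (0.0, 0.0)
    mean = sum(pixels) / len(pixels)
    ink_density = sum(1 for p in pixels if p < ink_threshold) / len(pixels)
    return (mean, ink_density)


def document_features(image: Image, tile_h: int, tile_w: int) -> List[float]:
    """Concatenate whole-page features with the features of every tile.

    Assumes all pages share the same dimensions, so every vector has the
    same length and the same tile order.
    """
    vector = list(region_features(image))
    for tile in segment_into_tiles(image, tile_h, tile_w):
        vector.extend(region_features(tile))
    return vector


page = [
    [0, 0, 255, 255],
    [0, 0, 255, 255],
    [255, 255, 255, 255],
    [255, 255, 255, 255],
]
print(document_features(page, 2, 2))
# → [191.25, 0.25, 0.0, 1.0, 255.0, 0.0, 255.0, 0.0, 255.0, 0.0]
```

Keeping per-tile features alongside the page-level ones lets a classifier see where ink sits on the page (here, only the top-left tile is dark), which a single whole-page statistic would hide.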

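The two-fold scheme in the claim (train a classifier on one subset and tune it on the other, then swap the roles of the two mutually exclusive subsets) can be sketched for a single genre as below. The claim names neither a classifier type nor the parameters being tuned, so this sketch assumes a hypothetical nearest-centroid classifier whose only tunable parameter is a decision bias chosen by held-out accuracy; `CentroidGenreClassifier`, `train_two_fold`, and the candidate bias values are illustrative, not taken from the patent. The claim also does not say how the two resulting classifiers are combined, so the sketch simply returns both.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def centroid(vectors: List[Vector]) -> List[float]:
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


class CentroidGenreClassifier:
    """Hypothetical one-genre classifier: is a document closer to the
    in-genre centroid than to the out-of-genre centroid, by more than a bias?"""

    def __init__(self) -> None:
        self.pos: List[float] = []
        self.neg: List[float] = []
        self.bias = 0.0  # the tunable parameter

    def fit(self, X: List[Vector], y: List[bool]) -> "CentroidGenreClassifier":
        positives = [x for x, label in zip(X, y) if label]
        negatives = [x for x, label in zip(X, y) if not label]
        if not positives or not negatives:
            raise ValueError("each subset needs in-genre and out-of-genre examples")
        self.pos, self.neg = centroid(positives), centroid(negatives)
        return self

    def score(self, x: Vector) -> float:
        # Positive when x is nearer the in-genre centroid.
        return math.dist(x, self.neg) - math.dist(x, self.pos)

    def predict(self, x: Vector) -> bool:
        return self.score(x) > self.bias

    def tune(self, X: List[Vector], y: List[bool],
             candidates: Sequence[float]) -> "CentroidGenreClassifier":
        """Set the bias to the candidate with the highest accuracy on (X, y);
        ties go to the earliest candidate."""
        def correct(b: float) -> int:
            return sum((self.score(x) > b) == label for x, label in zip(X, y))
        self.bias = max(candidates, key=correct)
        return self


def train_two_fold(X: List[Vector], y: List[bool], split: List[int],
                   candidates: Sequence[float] = (-1.0, -0.5, 0.0, 0.5, 1.0)
                   ) -> Tuple[CentroidGenreClassifier, CentroidGenreClassifier]:
    """split[i] is 0 or 1, placing document i in exactly one of two mutually
    exclusive subsets. Train on one subset, tune on the other, then swap."""
    def pick(indices: List[int], seq: list) -> list:
        return [seq[i] for i in indices]

    a = [i for i, s in enumerate(split) if s == 0]
    b = [i for i, s in enumerate(split) if s == 1]
    first = CentroidGenreClassifier().fit(pick(a, X), pick(a, y))
    first.tune(pick(b, X), pick(b, y), candidates)
    second = CentroidGenreClassifier().fit(pick(b, X), pick(b, y))
    second.tune(pick(a, X), pick(a, y), candidates)
    return first, second


X = [[0, 0], [0, 1], [5, 5], [5, 6], [0, 0.5], [1, 0], [6, 5], [5, 5.5]]
y = [True, True, False, False, True, True, False, False]
split = [0, 0, 0, 0, 1, 1, 1, 1]
first, second = train_two_fold(X, y, split)
print([first.predict(x) for x in X] == y, [second.predict(x) for x in X] == y)  # → True True
```

Because each classifier's parameters are tuned on documents it never saw during training, the tuning step is not fit to the same data used to build the model.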
  • 2 Assignments