SYSTEM AND METHOD FOR IDENTIFYING DOCUMENT GENRES
Abstract
A system, a computer readable storage medium including instructions, and a method for generating genre models used to identify genres of a document. For each document image in a set of document images that are associated with one or more genres, the document image is segmented into a plurality of tiles, wherein the tiles in the plurality of tiles are sized so that document page features are identifiable, and features of the document image and the plurality of tiles are computed. At least one genre classifier is trained to classify document images as being associated with one or more genres based on the features of the document images in the set of document images, the features of the plurality of tiles of the set of document images, and the one or more genres associated with each document image in the set of document images.
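The segmentation and feature computation described in the abstract can be sketched as follows. This is a minimal illustration, not the patented implementation: the grid size, the `ink_density` feature, and all function names are assumptions introduced here, and the image is assumed to be a grid of grayscale values (0 = black, 255 = white) at least as large as the tile grid.

```python
def segment_into_tiles(image, n_rows, n_cols):
    """Split a 2-D grid of pixel values into n_rows x n_cols rectangular tiles."""
    height, width = len(image), len(image[0])
    tiles = []
    for r in range(n_rows):
        top, bottom = r * height // n_rows, (r + 1) * height // n_rows
        for c in range(n_cols):
            left, right = c * width // n_cols, (c + 1) * width // n_cols
            tiles.append([row[left:right] for row in image[top:bottom]])
    return tiles


def ink_density(pixels, threshold=128):
    """Fraction of pixels darker than `threshold` (a hypothetical feature)."""
    flat = [p for row in pixels for p in row]
    return sum(1 for p in flat if p < threshold) / len(flat)


def compute_features(image, n_rows=2, n_cols=2):
    """Feature vector: one page-level value followed by one value per tile."""
    tiles = segment_into_tiles(image, n_rows, n_cols)
    return [ink_density(image)] + [ink_density(tile) for tile in tiles]


# A toy 4x4 "page": a dark block in the top-left corner, one dark pixel bottom-right.
page = [
    [0, 0, 255, 255],
    [0, 0, 255, 255],
    [255, 255, 255, 255],
    [255, 255, 255, 0],
]
features = compute_features(page)
print(features)  # [0.3125, 1.0, 0.0, 0.0, 0.25]
```

The page-level value alone (0.3125) says little about layout, while the per-tile values show where the ink is concentrated, which is the kind of information tiling is meant to expose.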
32 Claims
1. A computer-implemented method for generating genre models used to identify genres of a document, comprising:
on a computer system having one or more processors executing one or more programs stored on memory of the computer system;
for each document image in a set of document images that are associated with one or more genres, segmenting the document image into a plurality of tiles, wherein the tiles in the plurality of tiles are sized so that document page features are identifiable; and
computing features of the document image and the plurality of tiles; and
training at least one genre classifier to classify document images as being associated with one or more genres based on the features of the document images in the set of document images, the features of the plurality of tiles of the set of document images, and the one or more genres associated with each document image in the set of document images.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12
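The training step of claim 1 can be illustrated with a short sketch. The claim does not fix the classifier type; the example below assumes one binary nearest-centroid model per genre, chosen only because it is simple to show. The function names, the genre names, and the feature values are all hypothetical, and each feature vector is assumed to already combine page-level and tile-level features.

```python
def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def train_genre_classifiers(feature_vectors, genre_labels):
    """Train one binary model per genre.

    feature_vectors: list of feature lists, one per training document image.
    genre_labels:    list of sets; each set holds the genres of one image.
    Returns {genre: (centroid of images in genre, centroid of the others)}.
    """
    genres = set().union(*genre_labels)
    models = {}
    for genre in genres:
        pairs = list(zip(feature_vectors, genre_labels))
        positives = [f for f, labels in pairs if genre in labels]
        negatives = [f for f, labels in pairs if genre not in labels]
        if positives and negatives:  # need both sides to separate them
            models[genre] = (centroid(positives), centroid(negatives))
    return models


# Hypothetical training set; the last image belongs to two genres at once.
training_features = [[0.9, 0.9], [0.7, 0.9], [0.1, 0.1], [0.3, 0.1], [0.5, 0.5]]
training_genres = [
    {"advertisement"},
    {"advertisement"},
    {"paper"},
    {"paper"},
    {"advertisement", "paper"},
]
models = train_genre_classifiers(training_features, training_genres)
print(sorted(models))
```

Training an independent model per genre is one way to let a single document image be associated with more than one genre, as the claim language allows.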
13. A computer readable storage medium storing one or more programs configured for execution by a computer, the one or more programs comprising instructions to:
for each document image in a set of document images that are associated with one or more genres, segment the document image into a plurality of tiles, wherein the tiles in the plurality of tiles are sized so that document page features are identifiable; and
compute features of the document image and the plurality of tiles; and
train at least one genre classifier to classify document images as being associated with one or more genres based on the features of the document images in the set of document images, the features of the plurality of tiles of the set of document images, and the one or more genres associated with each document image in the set of document images.
14. A computer-implemented method for identifying genres of a document, comprising:
on a computer system having one or more processors executing one or more programs stored on memory of the computer system;
receiving a document image of the document;
segmenting the document image into a plurality of tiles of the document image, wherein the tiles in the plurality of tiles are sized so that document page features are identifiable;
computing features of the document image and the plurality of tiles; and
identifying one or more genres associated with the document image based on the features of the document image and the features of the plurality of tiles.
Dependent claims: 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27
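The final identifying step of claim 14 can be sketched as below, under stated assumptions: the features of the received document image and its tiles are taken as already computed, and each genre model is assumed to be a pair of centroids (one for images in the genre, one for images outside it). That representation, the function names, and the numbers are hypothetical; the claim itself does not prescribe them.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def identify_genres(features, genre_models):
    """Return every genre whose in-genre centroid is closer than its out-of-genre one.

    features:     combined page-level and tile-level features of one image.
    genre_models: {genre: (in_genre_centroid, out_of_genre_centroid)}.
    """
    return {
        genre
        for genre, (inside, outside) in genre_models.items()
        if squared_distance(features, inside) < squared_distance(features, outside)
    }


# Hypothetical pre-trained models over a two-value feature vector.
genre_models = {
    "advertisement": ([0.7, 0.8], [0.2, 0.1]),
    "paper": ([0.3, 0.2], [0.8, 0.9]),
}

print(identify_genres([0.9, 0.9], genre_models))
print(sorted(identify_genres([0.5, 0.5], genre_models)))
```

Because each genre is decided independently, the result is a set: it may hold one genre, several, or none, which matches the "one or more genres" wording only when at least one model accepts the image.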
28. A computer readable storage medium storing one or more programs configured for execution by a computer, the one or more programs comprising instructions to:
receive a document image of the document;
segment the document image into a plurality of tiles of the document image, wherein the tiles in the plurality of tiles are sized so that document page features are identifiable;
compute features of the document image and the plurality of tiles of the document image; and
identify one or more genres associated with the document image based on the features of the document image and the features of the plurality of tiles of the document image.
29. An imaging system, comprising:
one or more processors;
memory; and
one or more programs stored in the memory, the one or more programs comprising instructions to:
receive a document image of a document;
segment the document image into a plurality of tiles of the document image, wherein the tiles in the plurality of tiles are sized so that document page features are identifiable;
compute features of the document image and the plurality of tiles of the document image; and
identify one or more genres associated with the document image based on the features of the document image and the features of the plurality of tiles of the document image.
Dependent claims: 30, 31, 32
Specification