Method and system for determining page numbers of page images
Abstract
Page numbering of images of pages in a document includes extracting all numbers that differ by exactly one from a number found on an adjacent page, and grouping the extracted numbers into a set of sequences that describe the candidate page numbers in the book. The sequences most likely to contain candidates that represent the actual page numbers are determined by merging the most reliable sequences together to bridge gaps between the sequences, and identifying those gaps where the page numbers have been intentionally omitted. Page images are labeled with the numbers that are determined to be most likely to represent the actual page numbers. Page numbering is abandoned when too few page numbers can be extracted or assigned relative to the total number of pages in the document.
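The grouping step described in the abstract can be sketched as follows. This is a minimal illustration, not the patented implementation: the names `Run` and `build_sequences` are hypothetical, the input is assumed to be one collection of already-extracted integers per page image, and only ascending runs (each number one greater than a number on the previous page) are tracked.

```python
from dataclasses import dataclass, field
from typing import Iterable, List


@dataclass
class Run:
    start_index: int                      # image index where the run begins
    numbers: List[int] = field(default_factory=list)


def build_sequences(candidates: List[Iterable[int]]) -> List[Run]:
    """Group per-page candidate numbers into runs that rise by one per page."""
    open_runs = {}    # last number of a run that reached the previous page -> Run
    finished = []
    for index, found in enumerate(candidates):
        still_open = {}
        for n in sorted(set(found)):
            # Extend a run that ended with n - 1 on the previous page, if any.
            run = open_runs.pop(n - 1, None)
            if run is None:
                run = Run(start_index=index)
            run.numbers.append(n)
            still_open[n] = run
        # Runs that were not extended on this page are closed.
        finished.extend(open_runs.values())
        open_runs = still_open
    finished.extend(open_runs.values())
    # A lone number with no neighbor one away is not kept as a candidate.
    kept = [r for r in finished if len(r.numbers) >= 2]
    return sorted(kept, key=lambda r: (r.start_index, r.numbers[0]))


pages = [{1, 1999}, {2}, {3, 7}, {8}, set(), {6}]
for run in build_sequences(pages):
    print(run.start_index, run.numbers)
# prints "0 [1, 2, 3]" then "2 [7, 8]"
```

In this example the year 1999 and the isolated 6 are dropped because no adjacent page carries a number one away from them.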
20 Citations
35 Claims
1. A computer-implemented method for numbering page images, the method comprising:
obtaining images of pages of content in a document in an order in which the pages appear in the document;
extracting candidate page numbers from the content of the page images;
saving extracted candidate page numbers in a plurality of sequences of numbers wherein for each sequence a candidate page number extracted from a page image is one different than a candidate page number extracted from an adjacent page image in the order in which the pages appear in the document;
merging a first sequence and a second sequence of the plurality of sequences beginning with a more reliable sequence, wherein merging includes comparing the first sequence with the second sequence to determine whether a difference between a highest candidate page number in the first sequence and the lowest candidate page number in the second sequence matches a number of page images spanned by a gap between the first and the second sequences, and if so, assigning consecutive page numbers to the page images in the gap;
determining whether to abandon numbering of page images based on a variable completeness threshold determined based on a method of extracting candidate page numbers; and
labeling the page images in accordance with the extracted and assigned page numbers in the merged sequence.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17
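The gap comparison in the merging step of claim 1 can be illustrated with a short sketch. It takes one reading of the claim language as an assumption: the count of page numbers missing between the two sequences must equal the count of page images lying between them. The function name `bridge_gap` and its parameters are hypothetical, and the ordering of merges by reliability is not modeled.

```python
from typing import Dict, Optional


def bridge_gap(first_end_index: int, first_end_number: int,
               second_start_index: int, second_start_number: int
               ) -> Optional[Dict[int, int]]:
    """Return {image index: assigned page number} for the gap, or None."""
    pages_in_gap = second_start_index - first_end_index - 1
    missing_numbers = second_start_number - first_end_number - 1
    # Merge only when the missing numbers exactly account for the gap images.
    if pages_in_gap < 0 or missing_numbers != pages_in_gap:
        return None
    return {first_end_index + k: first_end_number + k
            for k in range(1, pages_in_gap + 1)}


# First sequence ends with page number 10 on image 12;
# second sequence starts with page number 14 on image 16.
print(bridge_gap(12, 10, 16, 14))
# prints {13: 11, 14: 12, 15: 13}
```

When the counts disagree, the sketch returns `None` and leaves the gap unlabeled.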
18. A computer-implemented system to facilitate numbering images of pages of a document, comprising:
a page image database operable to store page images organized according to an order in which pages appear in the document;
a memory operable to store candidate page numbers extracted from the page images;
a sequence merger component, coupled with the memory and the page image database, operable to merge two sequences of extracted candidate page numbers when a beginning candidate page number of a first sequence and an ending candidate page number of a second sequence differ by an amount that equals a number of page images in a gap between the first sequence and second sequence;
an insert identifier component, coupled with the sequence merger component, operable to identify omitted page numbers forming the gap between the first sequence and second sequence;
an assignment component, coupled with the insert identifier component, operable to assign consecutive page numbers to page images spanned by the gap, and labeling the page images in accordance with the extracted and assigned page numbers; and
a completion threshold component, coupled with the assignment component, operable to determine and apply a completeness threshold to determine whether to abandon the numbering of page images.
Dependent claims: 19, 20, 21, 22, 23, 24, 25, 26
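A completeness check of the kind performed by the completion threshold component might look like the sketch below. The claims state only that a threshold is applied and, in claim 1, that it varies with the extraction method. The method names `"ocr"` and `"embedded_text"`, the threshold values, and the function name `should_abandon` are all assumptions made for illustration.

```python
# Hypothetical per-method thresholds: the fraction of page images that must
# carry an extracted or assigned number for the numbering to be kept.
COMPLETENESS_THRESHOLDS = {
    "ocr": 0.5,            # assumed: noisier extraction, lower bar
    "embedded_text": 0.8,  # assumed: cleaner extraction, higher bar
}


def should_abandon(labeled_pages: int, total_pages: int, method: str) -> bool:
    """Return True when too few pages are numbered for the given method."""
    if total_pages <= 0:
        return True
    completeness = labeled_pages / total_pages
    return completeness < COMPLETENESS_THRESHOLDS[method]


print(should_abandon(40, 100, "ocr"))            # prints True
print(should_abandon(60, 100, "ocr"))            # prints False
print(should_abandon(60, 100, "embedded_text"))  # prints True
```

The same 60 percent coverage is kept under one method and abandoned under the other, which is the sense in which the threshold is variable.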
27. A non-transitory computer-readable medium having encoded thereon instructions to number images of pages of a document, wherein the instructions, when executed by a computing apparatus, cause the computing apparatus to:
obtain images of pages in an order in which the pages occur in a document;
extract a number from at least one of the page images when the number is one different than a number extracted from an image of an adjacent page in the order in which the page occurs in the document;
store the extracted numbers as sequences of consecutive numbers that correspond to the order in which the pages occur in the document;
assign missing numbers to fill in gaps between the sequences when a count of the pages represented by a gap matches a count of the missing numbers that are needed to fill the gap;
assign missing numbers to partially fill in gaps between the sequences when a count of the pages represented by the gap does not match the count of the missing numbers that are needed to fill in the gap, but differs by a substantially small amount;
determine whether to abandon numbering of page images based on a variable completeness threshold; and
label the page images with the extracted and assigned numbers.
Dependent claims: 28, 29, 30, 31, 32, 33, 34, 35
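Claim 27 distinguishes three outcomes for a gap: a full fill when the counts match, a partial fill when they differ by a small amount, and no fill otherwise. The sketch below shows only that decision. The claim does not quantify "a substantially small amount", so the `tolerance` default of 2 is an assumption, as is the function name `gap_action`. Which pages receive numbers in a partial fill is not decided here.

```python
def gap_action(pages_in_gap: int, missing_numbers: int, tolerance: int = 2) -> str:
    """Classify a gap as 'fill', 'partial', or 'skip'.

    tolerance is an assumed bound on how far the two counts may differ
    while still allowing a partial fill.
    """
    difference = abs(pages_in_gap - missing_numbers)
    if difference == 0:
        return "fill"      # counts match: every gap page gets a number
    if difference <= tolerance:
        return "partial"   # close mismatch: only some pages get numbers
    return "skip"          # counts too far apart to bridge


print(gap_action(3, 3))   # prints fill
print(gap_action(5, 3))   # prints partial
print(gap_action(10, 3))  # prints skip
```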
Specification