Method and apparatus for classifying elements of a document
First Claim
1. A computer-implemented method of classifying elements of a document, comprising:
receiving a data file defining the document, wherein the document has at least one page and a plurality of elements, wherein a type, location, and size of each element is defined by the file;
classifying at least one selected element of a selected page into one of a plurality of categories based on at least one of the type, location, size, and dimension of the selected element, wherein the classifying comprises determining whether a location of a center of an element box containing the selected element is within a pre-determined threshold distance from an edge of the selected page and assigning the selected element to a background element category in response to a determination that the location of the center of the element box containing the selected element is within the pre-determined threshold distance from the edge of the selected page;
generating a classification record comprising an indication of the category into which each of the selected elements is respectively classified; and
storing the classification record in a computer-readable storage medium.
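The edge-proximity test recited in claim 1 can be illustrated with a minimal sketch. The `Element` fields, the top-left coordinate convention, the point-based page size, the 36-point threshold, and the fallback label "content" for elements that fail the test are all assumptions made for illustration; the claim itself fixes none of them.

```python
from dataclasses import dataclass


@dataclass
class Element:
    # Hypothetical fields standing in for the type, location, and size
    # that the data file is said to define for each element.
    kind: str
    x: float       # left edge of the element box
    y: float       # top edge of the element box
    width: float
    height: float


def classify_by_edge(el, page_width, page_height, threshold):
    """Return "background" if the box center lies within `threshold` of any page edge."""
    cx = el.x + el.width / 2
    cy = el.y + el.height / 2
    # Distance from the center to the nearest of the four page edges.
    edge_distance = min(cx, page_width - cx, cy, page_height - cy)
    # "content" is an assumed name for the non-background category.
    return "background" if edge_distance <= threshold else "content"


# Assumed example: a US Letter page measured in points (612 x 792).
page_number = Element("text", x=280, y=770, width=40, height=12)
body = Element("text", x=72, y=200, width=468, height=300)

# The classification record maps each selected element to its category.
record = {
    "page_number": classify_by_edge(page_number, 612, 792, threshold=36),
    "body": classify_by_edge(body, 612, 792, threshold=36),
}
```

Here the page-number box has its center 16 points from the bottom edge, so it falls inside the assumed 36-point threshold, while the body box is centered far from every edge.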
2 Assignments
0 Petitions
Abstract
A method of classifying elements of a document includes receiving a file defining a document having at least one page and a plurality of elements. Each selected element is classified into one of a plurality of categories based on at least one of the element type, location, size (area, height, or width), or recurrence throughout the document.
31 Citations
5 Claims
2. A computer-implemented method of classifying elements of a document, comprising:
receiving a data file defining a respective arrangement of elements on each of a plurality of pages of the document, wherein the data file associates respective attribute values with each of the elements;
classifying a selected one of the elements on a selected one of the pages into one of a content category and a background category based on frequency of occurrence of similar elements on one or more pages of the document other than the selected page, wherein the classifying comprises identifying the similar elements on the other pages by comparing one or more attributes of the selected element with corresponding ones of the attributes of the elements on the other pages, and the classifying comprises assigning the selected element to the background category based at least in part on a determination that the frequency of occurrence of similar elements on the other pages exceeds a threshold frequency;
generating a classification record comprising an indication of the category into which the selected element is respectively classified; and
storing the classification record in a computer-readable storage medium.
Dependent claim: 3
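The recurrence test recited in claim 2 can be sketched as follows. This is only an illustration under stated assumptions: the `Element` fields, the `similar` comparison with a 2-unit tolerance, the definition of frequency as the fraction of other pages holding a similar element, and the 0.5 threshold are all hypothetical choices, since the claim requires only that attributes be compared and that the frequency exceed some threshold.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Element:
    # Hypothetical attribute values associated with each element.
    page: int
    kind: str
    x: float
    y: float
    width: float
    height: float


def similar(a, b, tol=2.0):
    """Assumed similarity rule: same type, and position and size within `tol`."""
    return (
        a.kind == b.kind
        and abs(a.x - b.x) <= tol
        and abs(a.y - b.y) <= tol
        and abs(a.width - b.width) <= tol
        and abs(a.height - b.height) <= tol
    )


def classify_by_recurrence(selected, elements, page_count, threshold=0.5):
    """Assign "background" when similar elements recur on enough other pages."""
    # Pages other than the selected element's page that hold a similar element.
    other_pages = {
        e.page for e in elements
        if e.page != selected.page and similar(selected, e)
    }
    frequency = len(other_pages) / max(page_count - 1, 1)
    # The claim requires the frequency to exceed the threshold, hence ">".
    return "background" if frequency > threshold else "content"


# Assumed four-page document: the same logo on every page, and body text
# whose height differs from page to page.
logos = [Element(p, "image", 40, 20, 100, 30) for p in range(1, 5)]
bodies = [Element(p, "text", 72, 100, 468, 100 + 40 * p) for p in range(1, 5)]
elements = logos + bodies

record = {
    "logo_p1": classify_by_recurrence(logos[0], elements, page_count=4),
    "body_p1": classify_by_recurrence(bodies[0], elements, page_count=4),
}
```

The logo on page 1 has a similar element on all three other pages, so its frequency is 1.0; the body text has no match elsewhere, so its frequency is 0.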
4. A computer-implemented method of classifying elements of a document, comprising:
receiving a data file defining a respective arrangement of elements on each of a plurality of pages of the document, wherein the data file associates respective attribute values with each of the elements;
determining for each of the elements a respective content type value, at least one respective size value, and at least one respective page location value from the associated attribute values;
classifying each of the elements into one of a content category and a background category based on frequency of occurrence of similar elements on one or more pages of the document other than the respective page on which the element being classified is arranged, the respective content type value, and at least one of the at least one respective size value and the at least one respective page location value, wherein the classifying comprises classifying each element of a first subset (C1) of the elements into a selected one of the content category and the background category based on the respective content type value and at least one of the at least one respective size value and the at least one respective page location value, classifying each element of a second subset (C2) of the elements into the selected category based on the frequency of occurrence of the similar elements on the other pages of the document, and identifying a third subset (C3) of the elements equal to one of a union of C1 and C2 and an intersection of C1 and C2;
generating a classification record comprising for each of the elements an indication of the category into which the element is respectively classified; and
storing the classification record in a computer-readable storage medium.
Dependent claim: 5
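The combination of subsets C1, C2, and C3 recited in claim 4 can be sketched with plain set operations. The element identifiers, the `mode` parameter, and the choice to give every element of C3 the selected category (and every other element the opposite one) in the final record are assumptions for illustration; the claim states only that C3 equals the union or the intersection of C1 and C2.

```python
import json


def combine_subsets(c1, c2, mode="union"):
    """Return C3 as either the union or the intersection of C1 and C2."""
    if mode == "union":
        return c1 | c2
    if mode == "intersection":
        return c1 & c2
    raise ValueError(f"unknown mode: {mode!r}")


def build_record(element_ids, c1, c2, mode="union", selected="background"):
    """Build a classification record from the two per-criterion subsets."""
    c3 = combine_subsets(set(c1), set(c2), mode)
    # Assumption: elements outside C3 receive the other category.
    other = "content" if selected == "background" else "background"
    return {eid: (selected if eid in c3 else other) for eid in element_ids}


# Hypothetical element identifiers.
element_ids = ["logo", "page_number", "body", "footer_rule"]
c1 = {"logo", "page_number"}          # picked by type, size, or location
c2 = {"page_number", "footer_rule"}   # picked by recurrence on other pages

lenient = build_record(element_ids, c1, c2, mode="union")
strict = build_record(element_ids, c1, c2, mode="intersection")

# Serialising to JSON is one assumed way to store the record.
record_json = json.dumps(strict, sort_keys=True)
```

With the union, an element is treated as background if either criterion selects it; with the intersection, both criteria must agree, so only `page_number` is marked as background in this example.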
Specification