Generation of classification data used for classifying documents
First Claim
1. A computer-implemented method for generating classification data which is used for classifying documents, the method comprising:
reading, in a memory, documents in a form of a spreadsheet;
collecting cell values in each of the documents;
finding, using a processor, in each of common or near cell locations among all or a part of the documents, one or more common cell values among the collected values;
counting, using the processor, for each of the common cell values, a number of the documents having the common cell value;
storing, if the number of the documents is equal to or larger than a predetermined number, the common cell value as a candidate header label in a memory;
calculating, using the processor, a distance between cell locations of the candidate header labels in each of the documents;
choosing, according to the calculated distance, two or more candidate header labels among the candidate header labels for each of the documents; and
storing, in a storage, one or more combinations of the chosen two or more candidate header labels (hereinafter referred to as “header”) as the classification data.
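The finding, counting, and candidate-storing steps of the claim can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes each spreadsheet has already been read into a dictionary mapping a (row, column) location to a cell value, it matches only identical cell locations (not "near" ones), and the names `find_candidate_header_labels` and `min_docs` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Assumption: a document is modeled as {(row, col): cell value}.
Location = Tuple[int, int]
Document = Dict[Location, str]


def find_candidate_header_labels(
    documents: List[Document], min_docs: int
) -> Dict[Tuple[Location, str], int]:
    """Count, per (location, value), how many documents contain that value
    at that location, and keep those seen in at least `min_docs` documents."""
    counts: Dict[Tuple[Location, str], int] = defaultdict(int)
    for doc in documents:
        # Each location occurs once per document, so each increment
        # corresponds to one distinct document.
        for location, value in doc.items():
            text = value.strip()
            if text:  # ignore empty cells
                counts[(location, text)] += 1
    return {key: n for key, n in counts.items() if n >= min_docs}


docs = [
    {(0, 0): "Invoice No", (0, 1): "A-1", (1, 0): "Date", (1, 1): "2020-01-01"},
    {(0, 0): "Invoice No", (0, 1): "B-7", (1, 0): "Date", (1, 1): "2020-02-03"},
    {(0, 0): "Order ID", (0, 1): "77"},
]
candidates = find_candidate_header_labels(docs, min_docs=2)
```

Here `min_docs` plays the role of the "predetermined number" in the claim; with the sample data, only "Invoice No" and "Date" appear in at least two documents at the same location.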
Abstract
Systems and methods are provided for generating classification data which is used for classifying documents. The method includes reading documents in a form of a spreadsheet; collecting cell values in each of the documents; finding one or more common cell values among the collected values; counting, for each of the common cell values, a number of the documents having the common cell value; storing, if the number of the documents is equal to or larger than a predetermined number, the common cell value as a candidate header label in a memory; calculating a distance between cell locations of the candidate header labels in each of the documents; choosing, according to the calculated distance, two or more candidate header labels among the candidate header labels for each of the documents; and storing one or more combinations of the chosen two or more candidate header labels as the classification data.
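The distance and choosing steps summarized above might look like the sketch below. The abstract does not specify the distance metric or the selection rule, so this example assumes Euclidean distance between (row, column) locations and a hypothetical threshold `max_distance`, and it restricts combinations to pairs for brevity. The function names are illustrative only.

```python
import math
from itertools import combinations
from typing import Dict, List, Set, Tuple

Location = Tuple[int, int]


def choose_headers(
    label_locations: Dict[str, Location], max_distance: float
) -> List[Tuple[str, str]]:
    """For one document, return pairs of candidate header labels whose
    cell locations lie within `max_distance` of each other."""
    chosen = []
    # Sorting makes the order of labels inside each pair deterministic.
    for (a, loc_a), (b, loc_b) in combinations(sorted(label_locations.items()), 2):
        distance = math.dist(loc_a, loc_b)  # assumed metric: Euclidean
        if distance <= max_distance:
            chosen.append((a, b))
    return chosen


def build_classification_data(
    per_document_labels: List[Dict[str, Location]], max_distance: float
) -> Set[Tuple[str, str]]:
    """Collect the chosen label combinations across all documents."""
    headers: Set[Tuple[str, str]] = set()
    for label_locations in per_document_labels:
        headers.update(choose_headers(label_locations, max_distance))
    return headers


doc_a = {"Invoice No": (0, 0), "Date": (1, 0), "Total": (20, 5)}
doc_b = {"Invoice No": (0, 0), "Date": (1, 0)}
classification_data = build_classification_data([doc_a, doc_b], max_distance=2.0)
```

In this sample, "Invoice No" and "Date" sit one row apart and are grouped into a header, while "Total" is too far from both to join any combination.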
11 Citations
20 Claims
1. A computer-implemented method for generating classification data which is used for classifying documents, the method comprising:
reading, in a memory, documents in a form of a spreadsheet;
collecting cell values in each of the documents;
finding, using a processor, in each of common or near cell locations among all or a part of the documents, one or more common cell values among the collected values;
counting, using the processor, for each of the common cell values, a number of the documents having the common cell value;
storing, if the number of the documents is equal to or larger than a predetermined number, the common cell value as a candidate header label in a memory;
calculating, using the processor, a distance between cell locations of the candidate header labels in each of the documents;
choosing, according to the calculated distance, two or more candidate header labels among the candidate header labels for each of the documents; and
storing, in a storage, one or more combinations of the chosen two or more candidate header labels (hereinafter referred to as “header”) as the classification data.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10
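The claim allows common values to be found at "common or near" cell locations. One possible reading of "near" is sketched below: a value counts as present if it appears within a fixed number of rows and columns of a reference location. The tolerance rule, the function name `count_docs_with_value_near`, and the dictionary-based document model are all assumptions made for illustration; the claim itself does not define nearness.

```python
from typing import Dict, List, Tuple

Location = Tuple[int, int]
Document = Dict[Location, str]  # assumption: {(row, col): cell value}


def count_docs_with_value_near(
    documents: List[Document], value: str, location: Location, tolerance: int = 1
) -> int:
    """Count documents that hold `value` in a cell at most `tolerance`
    rows and `tolerance` columns away from `location`."""
    row, col = location
    count = 0
    for doc in documents:
        if any(
            v == value and abs(r - row) <= tolerance and abs(c - col) <= tolerance
            for (r, c), v in doc.items()
        ):
            count += 1  # each document is counted at most once
    return count


docs = [
    {(0, 0): "Date"},
    {(1, 0): "Date"},   # shifted down by one row
    {(5, 5): "Date"},   # too far from (0, 0)
]
near_count = count_docs_with_value_near(docs, "Date", (0, 0), tolerance=1)
exact_count = count_docs_with_value_near(docs, "Date", (0, 0), tolerance=0)
```

Setting `tolerance` to 0 reduces this to exact-location matching, which makes the effect of the "near" allowance easy to compare.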
11. A system for generating classification data which is used for classifying documents, comprising:
a memory; and
a processor configured to:
read, in the memory, documents in a form of a spreadsheet and collect cell values in each of the documents;
find, in each of common or near cell locations among all or a part of the documents, one or more common cell values among the collected values;
count, for each of the common cell values, the number of the documents having the common cell value;
store, if the number of the documents is equal to or larger than a predetermined number, the common cell value as a candidate header label in a memory;
calculate a distance between cell locations of the candidate header labels in each of the documents;
choose, according to the calculated distance, two or more candidate header labels among the candidate header labels for each of the documents; and
store one or more combinations of the chosen two or more candidate header labels (hereinafter referred to as “header”) as the classification data in a storage.
Dependent claims: 12, 13, 14
15. A non-transitory computer readable storage medium comprising a computer readable program for generating classification data which is used for classifying documents, wherein the computer readable program when executed on a computer causes the computer to perform the steps of:
reading, in a memory, documents in a form of a spreadsheet and collecting cell values in each of the documents;
finding, in each of common or near cell locations among all or a part of the documents, one or more common cell values among the collected values;
counting, for each of the common cell values, a number of the documents having the common cell value;
storing, if the number of the documents is equal to or larger than a predetermined number, the common cell value as a candidate header label in a memory;
calculating a distance between cell locations of the candidate header labels in each of the documents;
choosing, according to the calculated distance, two or more candidate header labels among the candidate header labels for each of the documents; and
storing one or more combinations of the chosen two or more candidate header labels (hereinafter referred to as “header”) as the classification data in a storage.
Dependent claims: 16, 17, 18, 19, 20
Specification