Method and system for classifying documents that have different scales
First Claim
1. A system for classifying documents that have different scales, the system comprising:
one or more processors; and
a non-transitory computer readable medium storing a plurality of instructions, which when executed, cause the one or more processors to:
count instances for each character size in a first document and instances for each character size in a second document;
select a first plurality of character sizes for the first document and a second plurality of character sizes for the second document, based on a corresponding count of instances associated with each corresponding character size;
calculate a plurality of scales, wherein each scale of the plurality of scales is based on a corresponding ratio of a corresponding one of the first plurality of character sizes relative to a corresponding one of the second plurality of character sizes;
calculate a plurality of scale products based on each corresponding count of instances for each character size range associated with the first plurality of character sizes multiplied by each corresponding count of instances for each corresponding character size range associated with the second plurality of character sizes, wherein the corresponding character size range is based on a corresponding one of the plurality of scales;
calculate a plurality of scale scores based on summing each of the plurality of scale products associated with each corresponding one of the plurality of scales;
select a scale of the plurality of scales based on a highest one of the plurality of scale scores associated with a corresponding one of the plurality of scales;
determine whether the second document is in a class associated with the first document based on a comparison of location information associated with the first document and location information associated with the second document, wherein the location information associated with the second document is based on the scale; and
classify the second document in the class associated with the first document in response to a determination that the second document is in the class associated with the first document.
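The first three steps of claim 1 (counting, selecting character sizes, and forming candidate scales) can be illustrated with a short Python sketch. It assumes that each document has already been reduced to a list of per-character sizes (for example, glyph heights in points from an OCR or PDF extraction step). It also takes "select a plurality of character sizes based on a corresponding count of instances" to mean keeping the `top_n` most frequent sizes. The names `count_sizes`, `select_sizes`, `candidate_scales`, and `top_n` are hypothetical and do not come from the patent.

```python
from collections import Counter
from itertools import product


def count_sizes(char_sizes):
    """Count instances of each character size in one document."""
    return Counter(char_sizes)


def select_sizes(counts, top_n=3):
    """Keep the top_n most frequent sizes (hypothetical selection rule)."""
    return [size for size, _ in counts.most_common(top_n)]


def candidate_scales(first_sizes, second_sizes):
    """Each candidate scale is a first-document size divided by a
    second-document size; rounding collapses duplicate ratios."""
    return sorted({round(a / b, 4) for a, b in product(first_sizes, second_sizes)})


if __name__ == "__main__":
    first = select_sizes(count_sizes([10] * 50 + [12] * 20 + [18] * 5))
    second = select_sizes(count_sizes([5] * 40 + [6] * 25 + [9] * 4))
    print(first, second)                    # [10, 12, 18] [5, 6, 9]
    print(candidate_scales(first, second))  # [1.1111, 1.3333, 1.6667, 2.0, 2.4, 3.0, 3.6]
```

Pairing every selected size with every other selected size yields several spurious ratios alongside the true one (2.0 here). The scale-scoring step exists to separate the true ratio from these spurious candidates.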
Abstract
Classifying documents that have different scales is described. Instances are counted for each character size in a first document and a second document. Character sizes for the first document and the second document are selected based on the instance count for each character size. Scales are calculated based on ratios of each first character size relative to each second character size. Scale products are calculated based on each instance count for each character size range for the first character sizes multiplied by each instance count for each corresponding character size range for the second character sizes. The corresponding character size range is based on a corresponding scale. Scale scores are calculated based on summing each of the scale products for each scale. A scale is selected based on the highest scale score. The second document may be classified with the first document based on a comparison of first document location information and second document location information. The second document location information is based on the selected scale.
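The scale-scoring step summarized above can be sketched as follows. This is one possible reading, not the patented implementation, and it makes three assumptions. First, the "character size range" around each selected first-document size `s` is taken to be `s * (1 ± tol)`. Second, the corresponding second-document range is that interval divided by the candidate scale. Third, the instance counts in each range are summed over every size present in the document. The names `tol`, `range_count`, `scale_score`, and `best_scale` are hypothetical.

```python
from collections import Counter


def range_count(counts, low, high):
    """Total instances whose character size lies in [low, high]."""
    return sum(n for size, n in counts.items() if low <= size <= high)


def scale_score(counts1, counts2, first_sizes, scale, tol=0.05):
    """Sum the scale products for one candidate scale (scale = size1 / size2)."""
    score = 0
    for s in first_sizes:
        low, high = s * (1 - tol), s * (1 + tol)               # range in the first document
        n1 = range_count(counts1, low, high)
        n2 = range_count(counts2, low / scale, high / scale)   # corresponding range in the second
        score += n1 * n2                                       # one scale product
    return score


def best_scale(counts1, counts2, first_sizes, scales, tol=0.05):
    """Return the candidate scale with the highest scale score, plus all scores."""
    scores = {k: scale_score(counts1, counts2, first_sizes, k, tol) for k in scales}
    return max(scores, key=scores.get), scores


if __name__ == "__main__":
    counts1 = Counter({10: 50, 12: 20, 18: 5})
    counts2 = Counter({5: 40, 6: 25, 9: 4})
    scales = [1.1111, 1.3333, 1.6667, 2.0, 2.4, 3.0, 3.6]
    best, scores = best_scale(counts1, counts2, [10, 12, 18], scales)
    print(best, scores[2.0])  # 2.0 2520
```

In these sample histograms the second document is rendered at half the size of the first. Scale 2.0 wins with a score of 50·40 + 20·25 + 5·4 = 2520, while the next best candidate (about 1.67) scores only 1250. Multiplying the counts rewards a scale that aligns the dominant font sizes of both documents at the same time.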
20 Claims
8. A computer-implemented method for classifying documents that have different scales, the method comprising:
counting, by a server computer, instances for each character size in a first document and instances for each character size in a second document;
selecting, by the server computer, a first plurality of character sizes for the first document and a second plurality of character sizes for the second document, based on a corresponding count of instances associated with each corresponding character size;
calculating, by the server computer, a plurality of scales, wherein each scale of the plurality of scales is based on a corresponding ratio of a corresponding one of the first plurality of character sizes relative to a corresponding one of the second plurality of character sizes;
calculating, by the server computer, a plurality of scale products based on each corresponding count of instances for each character size range associated with the first plurality of character sizes multiplied by each corresponding count of instances for each corresponding character size range associated with the second plurality of character sizes, wherein the corresponding character size range is based on a corresponding one of the plurality of scales;
calculating, by the server computer, a plurality of scale scores based on summing each of the plurality of scale products associated with each corresponding one of the plurality of scales;
selecting, by the server computer, a scale of the plurality of scales based on a highest one of the plurality of scale scores associated with a corresponding one of the plurality of scales;
determining, by the server computer, whether the second document is in a class associated with the first document based on a comparison of location information associated with the first document and location information associated with the second document, wherein the location information associated with the second document is based on the scale; and
classifying, by the server computer, the second document in the class associated with the first document in response to a determination that the second document is in the class associated with the first document.
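The determining and classifying steps leave the form of the "location information" open. The sketch below assumes that each document supplies a list of `(x, y)` positions for corresponding text elements, measured from the same page origin. It also assumes that the second document's positions are brought to the first document's scale by multiplying them by the selected scale, which is a first-size/second-size ratio. The matching rule is likewise an assumption: the fraction of first-document locations that have a scaled second-document location within `max_dist`. The names `scale_locations`, `match_fraction`, `classify`, `max_dist`, and `min_fraction` are hypothetical.

```python
import math


def scale_locations(locations, scale):
    """Map second-document coordinates onto the first document's scale."""
    return [(x * scale, y * scale) for x, y in locations]


def match_fraction(first_locs, second_locs, max_dist):
    """Fraction of first-document locations with a second-document location nearby."""
    if not first_locs:
        return 0.0
    hits = sum(
        1 for p in first_locs
        if any(math.dist(p, q) <= max_dist for q in second_locs)
    )
    return hits / len(first_locs)


def classify(first_locs, second_locs, scale, first_class,
             max_dist=5.0, min_fraction=0.8):
    """Return first_class if the scaled layouts agree, else None (hypothetical rule)."""
    scaled = scale_locations(second_locs, scale)
    if match_fraction(first_locs, scaled, max_dist) >= min_fraction:
        return first_class
    return None


if __name__ == "__main__":
    first = [(100, 50), (300, 50), (100, 400)]
    second = [(50, 25), (150, 25), (50, 200)]  # same layout at half size
    print(classify(first, second, 2.0, "invoice"))  # invoice
    print(classify(first, second, 1.0, "invoice"))  # None
```

Here the second document is a half-size copy of the first layout. It is therefore placed in the first document's class at scale 2.0, but not at scale 1.0, where none of its locations line up with the first document's.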
15. A computer program product, comprising a non-transitory computer-readable medium having a computer-readable program code embodied therein to be executed by one or more processors, the program code including instructions to:
count instances for each character size in a first document and instances for each character size in a second document;
select a first plurality of character sizes for the first document and a second plurality of character sizes for the second document, based on a corresponding count of instances associated with each corresponding character size;
calculate a plurality of scales, wherein each scale of the plurality of scales is based on a corresponding ratio of a corresponding one of the first plurality of character sizes relative to a corresponding one of the second plurality of character sizes;
calculate a plurality of scale products based on each corresponding count of instances for each character size range associated with the first plurality of character sizes multiplied by each corresponding count of instances for each corresponding character size range associated with the second plurality of character sizes, wherein the corresponding character size range is based on a corresponding one of the plurality of scales;
calculate a plurality of scale scores based on summing each of the plurality of scale products associated with each corresponding one of the plurality of scales;
select a scale of the plurality of scales based on a highest one of the plurality of scale scores associated with a corresponding one of the plurality of scales;
determine whether the second document is in a class associated with the first document based on a comparison of location information associated with the first document and location information associated with the second document, wherein the location information associated with the second document is based on the scale; and
classify the second document in the class associated with the first document in response to a determination that the second document is in the class associated with the first document.
Specification