Extracting information from symbolically compressed document images
Abstract
A method and apparatus for extracting information from symbolically compressed document images. A deciphering module generates first and second text strings by deciphering respective sequences of template identifiers in first and second symbolically compressed document images. A conditional n-gram module receives the first and second text strings from the deciphering module and extracts n-gram terms therefrom based on a predicate condition. A comparison module generates a measure of similarity between the first and second symbolically compressed document images based on the n-gram terms extracted by the conditional n-gram module.
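The conditional n-gram and comparison steps described in the abstract can be illustrated with a minimal sketch. Several pieces here are assumptions for illustration and are not stated in the abstract: the predicate (keep only characters that begin a word), the n-gram length of 3, the use of cosine similarity as the measure, and the function names `follows_space`, `conditional_ngrams`, and `cosine_similarity`.

```python
from collections import Counter
from math import sqrt


def follows_space(text, i):
    """Hypothetical predicate: true if the character at index i starts a word."""
    return i == 0 or text[i - 1] == " "


def conditional_ngrams(text, n=3, predicate=follows_space):
    """Keep only characters that satisfy the predicate, then count n-grams over them."""
    kept = [c for i, c in enumerate(text) if c != " " and predicate(text, i)]
    return Counter("".join(kept[i:i + n]) for i in range(len(kept) - n + 1))


def cosine_similarity(a, b):
    """Assumed similarity measure: cosine of the angle between two n-gram count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


# Two deciphered strings that differ only inside words still yield the same terms,
# because the assumed predicate looks only at word-initial characters.
first = conditional_ngrams("the quick brown fox jumps")
second = conditional_ngrams("tha quick brovn fox jumps")
score = cosine_similarity(first, second)
```

Under this assumed predicate, deciphering errors that fall inside a word do not change the extracted terms, which is one plausible reason to condition n-gram extraction on a predicate rather than use every character.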
42 Claims
1. A method comprising:
representing an input document image with a sequence of template identifiers;
replacing the template identifiers with alphabet characters according to language statistics to generate a text string representing the input document image;
searching, among a plurality of documents in a database, for at least one of the plurality of documents that matches the input document based on the text string; and
examining whether the at least one matched document satisfies a predetermined security criteria based on an attribute associated with the at least one matched document, to determine whether an operation on the input document is allowed.
Dependent claims: 2-14
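The four steps of claim 1 can be pictured with a minimal sketch. It is an illustration under stated assumptions, not the claimed implementation: deciphering is reduced to frequency-rank substitution against an assumed English letter ordering, matching uses an assumed bigram-overlap score with an arbitrary threshold, and the security criterion is a hypothetical `allowed_ops` attribute on each database record. All function and field names are invented for this example.

```python
from collections import Counter

# Assumed language statistics: characters in rough descending English frequency.
FREQ_ORDER = " etaoinshrdlcumwfgypbvkjxqz"


def decipher(template_ids):
    """Replace each template identifier with the character of the same frequency rank."""
    ranked = [tid for tid, _ in Counter(template_ids).most_common()]
    mapping = {
        tid: FREQ_ORDER[i] if i < len(FREQ_ORDER) else "?"
        for i, tid in enumerate(ranked)
    }
    return "".join(mapping[t] for t in template_ids)


def bigrams(text):
    return Counter(text[i:i + 2] for i in range(len(text) - 1))


def overlap(a, b):
    """Assumed matching score: Dice-style overlap of two bigram count vectors."""
    shared = sum((a & b).values())
    total = sum(a.values()) + sum(b.values())
    return 2 * shared / total if total else 0.0


def find_matches(text, database, threshold=0.8):
    """Return database records whose stored text is close enough to the query text."""
    query = bigrams(text)
    return [doc for doc in database if overlap(query, bigrams(doc["text"])) >= threshold]


def operation_allowed(template_ids, database, operation):
    """Hypothetical criterion: every matched document must permit the operation."""
    matches = find_matches(decipher(template_ids), database)
    # With no matches, all() is True, so an unmatched input is not restricted here.
    return all(operation in doc["allowed_ops"] for doc in matches)
```

In this sketch the database is assumed to store text produced by the same deciphering step, so that a stored document and a fresh scan of it map to comparable strings even when the substitution is imperfect.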
15. A document processing system comprising:
a deciphering module to generate a first text string based on a sequence of template identifiers in a first document and to generate a second text string based on a sequence of template identifiers in a second document;
a comparison module to generate a measure of similarity between the first and the second documents based on the first and second text strings to determine whether the first and second documents are matched; and
a security module to examine whether the second document satisfies a predetermined security criteria based on an attribute associated with the second document to determine whether an operation on the first document is allowed.
Dependent claims: 16-28, 39-42
29. An article of manufacture including one or more computer-readable storage media that embody a program of instructions which, when executed by one or more processors in the processing system, causes the one or more processors to perform a method, the method comprising:
generating a text string from an input document image represented by a sequence of template identifiers;
replacing the template identifiers with alphabet characters according to language statistics to generate a text string representing the input document image;
searching, among a plurality of documents in a database, for at least one of the plurality of documents that matches the input document based on the text string; and
examining whether the at least one matched document satisfies a predetermined security criteria based on an attribute associated with the at least one matched document, to determine whether an operation on the input document is allowed.
Dependent claims: 30-38
Specification