Data embedding and extraction techniques for documents
Abstract
Improved data embedding and extraction techniques provide a way to embed messages in, and extract messages from, the text sections of documents during copying. Extracted text pixels are grouped together to form the text lines of the document. From these text lines, a document layout is constructed and used to embed the message in the text pixels. Each text line is partitioned into blocks, and those blocks that contain at least a threshold percentage of text pixels are identified as valid. Each valid block carries one bit of information, embedded by labeling the text pixels of that block with a predetermined color. Bits are embedded in the valid blocks of a text line in column-wise raster order. Only one message character (which may comprise multiple bits) is embedded in a given text line, although that character may be embedded multiple times in the same line if there are enough valid blocks.
Extracting a message so embedded involves forming a first representation of the document, in which pixels are classified to locate blocks of pixels in which data is embedded, and forming a second representation of the document to extract text lines and identify text pixels. The two representations are compared to identify clusters of color-labeled pixels in each text line, which determines the location of the embedded bits of the message. The clusters in each text line are sorted in accordance with the predetermined embedding order and converted into a sequence of bits, which is decoded to determine the message character embedded in each text line.
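The one-character-per-line rule described in the abstract can be illustrated with a minimal sketch. The 8-bit, most-significant-bit-first encoding, the handling of leftover blocks, and the function names `char_to_bits` and `plan_line_bits` are assumptions made for illustration; the abstract does not specify them.

```python
def char_to_bits(ch, bits_per_char=8):
    """Return the bits of one character, most significant bit first.

    The fixed-width MSB-first encoding is an assumption for this sketch.
    """
    code = ord(ch)
    if code >= 1 << bits_per_char:
        raise ValueError("character does not fit in bits_per_char bits")
    return [(code >> shift) & 1 for shift in range(bits_per_char - 1, -1, -1)]


def plan_line_bits(ch, num_valid_blocks, bits_per_char=8):
    """Assign one bit to each valid block of a single text line.

    The character is repeated as many whole times as the valid blocks allow.
    Leftover blocks are marked None (unused), which is an assumption.
    An empty list means the line cannot hold even one copy.
    """
    copies = num_valid_blocks // bits_per_char
    if copies == 0:
        return []
    bits = char_to_bits(ch, bits_per_char) * copies
    return bits + [None] * (num_valid_blocks - len(bits))
```

For a line with 18 valid blocks, `plan_line_bits("A", 18)` would hold two full copies of the character and leave two blocks unused under these assumptions.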
48 Citations
40 Claims
1. A method for embedding a message in a text-containing document, comprising the steps of:
obtaining a pixel representation of the document;
identifying text pixels of the document;
determining each text line of the document;
partitioning each determined text line into a plurality of blocks;
identifying each block as valid if that block contains at least a predetermined percentage of text pixels and that block is not an immediate neighbor of a block already identified as valid; and
embedding a binary element in each valid block by labeling text pixels within that block with a first color or a second color to embed the message in the document.
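The valid-block test and the labeling step of claim 1 can be sketched as follows. This is a simplified illustration, not the patented implementation: it assumes a text line is given as a grid of booleans (True for a text pixel), that blocks have a fixed width and span the full height of the line in a single row, and that "immediate neighbor" means the block directly to the left. The names `find_valid_blocks` and `embed_bits`, and the labels `"first"` and `"second"`, are hypothetical.

```python
def find_valid_blocks(line_mask, block_width, min_text_fraction):
    """Return the indices of valid blocks in one text line.

    line_mask: list of rows, each a list of booleans (True = text pixel).
    A block is valid if its share of text pixels reaches min_text_fraction
    and the block immediately before it was not already marked valid.
    A partial block at the end of the line is ignored (assumption).
    """
    height = len(line_mask)
    width = len(line_mask[0]) if height else 0
    valid = []
    for index in range(width // block_width):
        start = index * block_width
        text = sum(row[x] for row in line_mask
                   for x in range(start, start + block_width))
        dense_enough = text >= min_text_fraction * block_width * height
        neighbor_valid = bool(valid) and valid[-1] == index - 1
        if dense_enough and not neighbor_valid:
            valid.append(index)
    return valid


def embed_bits(line_mask, valid_blocks, bits, block_width,
               colors=("first", "second")):
    """Label the text pixels of each valid block with the color for its bit.

    Returns a grid of labels; None marks pixels that are left untouched.
    Bit 0 maps to the first color and bit 1 to the second (assumption).
    """
    labels = [[None] * len(row) for row in line_mask]
    for block, bit in zip(valid_blocks, bits):
        for y, row in enumerate(line_mask):
            for x in range(block * block_width, (block + 1) * block_width):
                if row[x]:
                    labels[y][x] = colors[bit]
    return labels
```

Skipping the neighbor of a valid block leaves an unlabeled gap between embedded bits, which is presumably what lets an extractor separate adjacent color clusters.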
9. A method for extracting a message embedded in text of a document, comprising the steps of:
obtaining a pixel representation of the document;
forming a first representation of the document in which pixels are classified to locate blocks of pixels in which data is embedded;
forming a second representation of the document to extract text lines and identify text pixels;
comparing the second representation with the first representation to identify clusters of first and second colored pixels in each text line to determine the location of embedded binary elements of the message;
sorting the identified first and second colored clusters in each text line in accordance with a predetermined embedding order;
converting the sorted first and second colored clusters in each text line into a sequence of binary elements; and
decoding the sequence of binary elements in each text line to determine an embedded character of the message.
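The last three steps of claim 9 (sorting, conversion to bits, decoding) can be sketched as below. Several details are assumptions rather than statements from the claim: each cluster is a dictionary with hypothetical `col`, `row`, and `color` keys, the predetermined order is taken to be column by column, the first color maps to 0, characters are 8 bits wide, and repeated copies in a line are reconciled by a majority vote.

```python
from collections import Counter


def sort_clusters(clusters):
    """Order clusters column by column: left to right, then top to bottom."""
    return sorted(clusters, key=lambda c: (c["col"], c["row"]))


def clusters_to_bits(clusters, first_color="first"):
    """Convert sorted clusters to bits (first color = 0, otherwise 1)."""
    return [0 if c["color"] == first_color else 1
            for c in sort_clusters(clusters)]


def decode_line(bits, bits_per_char=8):
    """Decode the character carried by one text line.

    Each whole group of bits_per_char bits is read as one copy of the
    character (MSB first); the most frequent copy wins. Returns None if
    the line holds no complete copy.
    """
    copies = len(bits) // bits_per_char
    if copies == 0:
        return None
    decoded = []
    for i in range(copies):
        code = 0
        for bit in bits[i * bits_per_char:(i + 1) * bits_per_char]:
            code = (code << 1) | bit
        decoded.append(chr(code))
    return Counter(decoded).most_common(1)[0][0]
```

The majority vote is one simple way to use the redundancy that repeated embedding provides; the claim itself only requires that the bit sequence be decoded into a character.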
15. An apparatus for embedding a message in a text-containing document, the apparatus comprising:
a scanner that outputs a pixel representation of the document;
a text pixel identifying circuit, in communication with the scanner;
a text line determining circuit, in communication with the text pixel identifying circuit;
a block partitioning circuit in communication with the text pixel identifying circuit and the text line determining circuit;
a valid block identifying circuit, in communication with the text pixel identifying circuit and the block partitioning circuit, the valid block identifying circuit being configured to identify each block as valid if that block contains at least a predetermined percentage of text pixels and that block is not an immediate neighbor of a block already identified as valid; and
a binary element embedding circuit in communication with the text pixel identifying circuit and the valid block identifying circuit.
18. An apparatus for extracting a message embedded in text of a document, the apparatus comprising:
a scanner that outputs a pixel representation of the document;
a first circuit, in communication with the scanner, that forms a first representation of the document in which pixels are classified to locate blocks of pixels in which data is embedded;
a second circuit, in communication with the first circuit, that forms a second representation of the document to extract text lines and identify text pixels;
a comparator circuit, in communication with the first and second circuits, that compares the second representation with the first representation to identify clusters of first and second colored pixels in each text line to determine the location of embedded binary elements of the message; and
an extracting circuit, in communication with the comparator circuit that sorts the identified first and second colored clusters in each text line in accordance with a predetermined embedding order, converts the sorted first and second colored clusters in each text line into a sequence of binary elements, and decodes the sequence of binary elements in each text line to determine an embedded character of the message.
21. A machine-readable medium embodying a program of instructions for causing a machine to perform a method of embedding a message in a text-containing document, the program of instructions comprising instructions for:
obtaining a pixel representation of the document;
identifying text pixels of the document;
determining each text line of the document;
partitioning each determined text line into a plurality of blocks;
identifying each block as valid if that block contains at least a predetermined percentage of text pixels and that block is not an immediate neighbor of a block already identified as valid; and
embedding a binary element in each valid block by labeling text pixels within that block with a first color or a second color to embed the message in the document.
29. A machine-readable medium embodying a program of instructions for causing a machine to perform a method of extracting a message embedded in text of a document, the program of instructions comprising instructions for:
obtaining a pixel representation of the document;
forming a first representation of the document in which pixels are classified to locate blocks of pixels in which data is embedded;
forming a second representation of the document to extract text lines and identify text pixels;
comparing the second representation with the first representation to identify clusters of first and second colored pixels in each text line to determine the location of embedded binary elements of the message;
sorting the identified first and second colored clusters in each text line in accordance with a predetermined embedding order;
converting the sorted first and second colored clusters in each text line into a sequence of binary elements; and
decoding the sequence of binary elements in each text line to determine an embedded character of the message.
35. An apparatus for embedding a message in a text-containing document, the apparatus comprising:
means for outputting a pixel representation of the document;
means, in communication with the outputting means, for identifying text pixels of the document;
means, in communication with the identifying means, for determining each text line of the document;
means, in communication with the identifying means and the determining means, for partitioning each determined text line into a plurality of blocks;
means, in communication with the identifying means and the partitioning means, for classifying each block as valid if that block contains at least a predetermined percentage of text pixels and if that block is not an immediate neighbor of a block already identified as valid; and
means, in communication with the identifying means and the classifying means, for embedding a binary element in each valid block by labeling text pixels within that valid block with a first color or a second color to embed the message in the document.
38. An apparatus for extracting a message embedded in text of a document, the apparatus comprising:
means for outputting a pixel representation of the document;
means, in communication with the scanner, for forming a first representation of the document in which pixels are classified to locate blocks of pixels in which data is embedded;
means, in communication with the first representation forming means, for forming a second representation of the document to extract text lines and identify text pixels;
means, in communication with the first and second representation forming means, for comparing the second representation with the first representation to identify clusters of first and second colored pixels in each text line to determine the location of embedded binary elements of the message; and
extracting means, in communication with the comparing means, for sorting the identified first and second colored clusters in each text line in accordance with a predetermined embedding order, converting the sorted first and second colored clusters in each text line into a sequence of binary elements, and decoding the sequence of binary elements in each text line to determine an embedded character of the message.
Specification