Method and apparatus for content identification and categorization of textual data
Abstract
A method and an apparatus for content identification and categorization of textual data is disclosed. Using the Burrows-Wheeler transform in conjunction with mapping techniques and statistical comparison, useful information can be extracted from textual documents. This information can be used to categorize, authenticate, and compare such documents, thereby leading to automated searching of databases of documents.
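The pipeline named in the abstract (transform, divide into intervals, map to a pattern sheet) can be sketched as below. This is a minimal illustration, not the patented implementation: the naive rotation-sort transform, the sentinel character, the `size` and `step` parameters, and especially the choice to count (interval index, character) pairs as pattern sheet entries are all assumptions made for the example, since the text here does not define the mapping.

```python
from collections import Counter


def bwt(text, sentinel="\x00"):
    """Naive Burrows-Wheeler transform built from sorted rotations.

    The sentinel marks the end of the text; it is an assumption of this
    sketch and must not occur in the input.
    """
    if sentinel in text:
        raise ValueError("sentinel must not occur in the text")
    s = text + sentinel
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rotation[-1] for rotation in rotations)


def intervals(data, size, step=None):
    """Split data into windows of `size` characters.

    A `step` smaller than `size` yields overlapping intervals.
    """
    step = size if step is None else step
    if size <= 0 or step <= 0:
        raise ValueError("size and step must be positive")
    return [data[i:i + size] for i in range(0, len(data), step)]


def pattern_sheet(transformed, size, step=None):
    """Hypothetical mapping: count each (interval index, character) pair."""
    sheet = Counter()
    for index, chunk in enumerate(intervals(transformed, size, step)):
        for ch in chunk:
            sheet[(index, ch)] += 1
    return sheet


if __name__ == "__main__":
    transformed = bwt("banana", sentinel="$")
    print(transformed)
    print(pattern_sheet(transformed, size=4))
```

Sorting full rotations is quadratic in memory and is used only for readability; a suffix-array construction would be the usual choice for documents of realistic size.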
24 Claims
-
1. A method of processing textual data comprising:
-
performing a Burrows-Wheeler transform on the textual data to produce transformed textual data;
dividing the transformed textual data into a set of one or more intervals;
mapping the transformed textual data of the set of intervals of transformed textual data thereby producing a first pattern sheet, the first pattern sheet composed of a set of at least one entries; and
comparing the first pattern sheet to a second pattern sheet.
-
2. The method of claim 1 wherein:
the method further comprises normalizing the first pattern sheet for purposes of comparison to the second pattern sheet, thereby producing a first pattern map; and
the step of comparing the first pattern sheet to a second pattern sheet comprises comparing the first pattern map to a second pattern map, the second pattern map derived from the second pattern sheet.
-
3. The method of claim 2 wherein normalizing the first pattern sheet comprises multiplying each entry of the first pattern sheet by a scale factor.
-
4. The method of claim 2 further comprising:
plotting a result of comparing the first pattern sheet to a second pattern sheet as a three dimensional graph.
-
5. The method of claim 2 further comprising:
summing the entries of a third pattern map to produce a number, the third pattern map derived from comparing the first pattern sheet to a second pattern sheet.
-
6. The method of claim 1 wherein the second pattern sheet is derived from documents selected from the group consisting of documents believed to have been written by a single individual, documents representative of a specific language, documents representative of a specific type of document, and documents written at a specific time.
-
7. The method of claim 1 wherein the intervals of the set of one or more intervals overlap.
-
8. The method of claim 1 further comprising:
-
normalizing the first pattern sheet to produce a first pattern map; and
updating a class map by combining the contents of the first pattern map with the contents of the class map.
-
-
9. The method of claim 8 wherein normalizing the first pattern sheet to produce a first pattern map comprises multiplying each entry in the first pattern sheet by a scale factor, the scale factor derived from the first pattern sheet and the class map.
-
10. The method of claim 8 wherein normalizing the first pattern sheet to produce a first pattern map comprises multiplying selected entries in the first pattern sheet by a first scale factor, the first scale factor derived from the first pattern sheet and the class map, and multiplying unselected entries by a second scale factor, the second scale factor also derived from the first pattern sheet and the class map.
-
11. The method of claim 1 wherein comparing the first pattern sheet to a second pattern sheet comprises comparing the magnitude of each entry in the first pattern sheet relative to the total of all entries in the first pattern sheet to the magnitude of a corresponding entry in the second pattern sheet relative to the total of all entries in the second pattern sheet.
-
12. A method of processing textual data comprising:
-
transforming the textual data;
mapping the textual data to produce a first pattern sheet; and
generating a result, the result reflecting a comparison of the first pattern sheet and a second pattern sheet.
-
17. A machine readable medium embodying instructions, the instructions when executed by a machine causing the machine to perform the method comprising:
-
performing a Burrows-Wheeler transform on the textual data to produce transformed textual data;
dividing the transformed textual data into a set of one or more intervals;
mapping the transformed textual data of the set of intervals of transformed textual data thereby producing a first pattern sheet, the first pattern sheet composed of a set of at least one entries; and
comparing the first pattern sheet to a second pattern sheet.
-
18. The machine readable medium of claim 17 wherein:
the method further comprises normalizing the first pattern sheet for purposes of comparison to the second pattern sheet, thereby producing a first pattern map; and
the step of comparing the first pattern sheet to a second pattern sheet comprises comparing the first pattern map to a second pattern map, the second pattern map derived from the second pattern sheet.
-
19. The machine readable medium of claim 17 wherein the second pattern sheet is derived from documents selected from the group consisting of documents believed to have been written by a single individual, documents representative of a specific language, documents representative of a specific type of document, and documents written at a specific time.
-
20. The machine readable medium of claim 17 wherein the intervals of the set of one or more intervals overlap.
-
21. The machine readable medium of claim 17 wherein the method further comprises:
-
normalizing the first pattern sheet to produce a first pattern map; and
updating a class map by combining the contents of the first pattern map with the contents of the class map.
-
22. A system comprising a processor and memory, said processor configured to perform a Burrows-Wheeler transform on the textual data to produce transformed textual data, divide the transformed textual data into a set of one or more intervals, map the transformed textual data of the set of intervals of transformed textual data thereby producing a first pattern sheet, the first pattern sheet composed of a set of at least one entries, and compare the first pattern sheet to a second pattern sheet.
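The normalization, comparison, and class map steps recited in the dependent claims can be illustrated with a short sketch. Everything specific here is an assumption chosen for the example rather than something the claims state: pattern sheets are modeled as plain dictionaries, the default scale factor is the reciprocal of the sheet total (so entries become relative magnitudes), the comparison is a per-entry absolute difference, and the class map is updated by addition. The function names are hypothetical.

```python
def normalize(sheet, scale=None):
    """Multiply each entry by a scale factor to obtain a pattern map.

    Assumption: when no factor is given, use 1/total so that each entry
    becomes its magnitude relative to the total of all entries.
    """
    total = sum(sheet.values())
    if scale is None:
        scale = 1.0 / total if total else 0.0
    return {key: value * scale for key, value in sheet.items()}


def compare(map_a, map_b):
    """Hypothetical comparison: absolute difference of corresponding entries.

    Entries missing from one map are treated as zero.
    """
    keys = set(map_a) | set(map_b)
    return {k: abs(map_a.get(k, 0.0) - map_b.get(k, 0.0)) for k in keys}


def score(result_map):
    """Sum the entries of a comparison result to produce a single number."""
    return sum(result_map.values())


def update_class_map(class_map, pattern_map):
    """Combine a pattern map into a class map (assumed here to be addition)."""
    merged = dict(class_map)
    for key, value in pattern_map.items():
        merged[key] = merged.get(key, 0.0) + value
    return merged


if __name__ == "__main__":
    first = normalize({"x": 1, "y": 3})
    second = normalize({"x": 2, "y": 2})
    print(score(compare(first, second)))
```

Under these assumptions a score of zero means the two sheets have identical relative distributions, and larger values indicate greater divergence; other distance measures would fit the claim language equally well.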
Specification