Method and apparatus for retrieving text using document signatures
Abstract
A method and apparatus for retrieving similar or identical textual passages among different documents is disclosed. Normal discourse structures along with textual content attributes are used to encode a known passage with “marker sequences” that give a characterizing “signature” to the passage. The encoded known passage is then evaluated against similarly encoded passages appearing in a database of documents. If it is determined that there is a possible match between the encoded known passage and an encoded passage in a database document, a sequential string search is performed to determine whether the two passages are likely to be similar or identical. If the sequential string search records a probable match between the known passage and the database passage, the database passage is displayed for further review.
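The two-stage flow described in the abstract (a cheap signature comparison first, then a character-level check on the candidates that survive) might be sketched as follows. This is only an illustration under stated assumptions: the names `signature` and `find_similar`, the two thresholds, and the choice of whitespace and punctuation as markers are hypothetical, and Python's `difflib.SequenceMatcher` stands in for the sequential string search, which the abstract does not detail.

```python
import re
from difflib import SequenceMatcher

# Assumption: markers are whitespace and common punctuation characters.
_MARKERS = re.compile(r"[\s.,;:!?]+")


def signature(passage):
    """Encode a passage as the lengths of its runs of non-marker characters."""
    return tuple(len(run) for run in _MARKERS.split(passage) if run)


def find_similar(known, documents, sig_threshold=0.8, text_threshold=0.8):
    """Return (doc_id, passage, score) for database passages resembling `known`.

    `documents` maps a document id to a list of passages. Both thresholds
    are illustrative values, not figures taken from the patent.
    """
    known_sig = signature(known)
    hits = []
    for doc_id, passages in documents.items():
        for passage in passages:
            # Stage 1: compare the short numeric signatures (cheap filter).
            sig_score = SequenceMatcher(
                None, known_sig, signature(passage), autojunk=False
            ).ratio()
            if sig_score < sig_threshold:
                continue
            # Stage 2: compare the actual characters of the candidate passage.
            text_score = SequenceMatcher(
                None, known, passage, autojunk=False
            ).ratio()
            if text_score >= text_threshold:
                hits.append((doc_id, passage, text_score))
    return hits
```

Passages returned by `find_similar` correspond to the probable matches that would be displayed for further review; the first stage keeps the more expensive character comparison from running on every passage in the database.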
20 Claims
1. A method for retrieving text, comprising:
identifying markers in a first text passage;
representing the number of non-marker characters between said identified markers in said first text passage to generate a first marker sequence;
identifying markers in a second text passage;
representing the number of non-marker characters between said identified markers in said second text passage to generate a second marker sequence; and
comparing said first marker sequence to said second marker sequence.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10
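The steps of claim 1 can be illustrated with a minimal sketch. The claim does not say which characters count as markers, so the marker set below (whitespace and common punctuation) is an assumption, as are the function names `marker_sequence` and `sequences_match`. As simplifications, the sketch also counts the runs before the first marker and after the last one, drops zero-length runs between adjacent markers, and treats "comparing" as a test for exact equality.

```python
import re

# Assumption: any single whitespace or punctuation character is a marker.
MARKER = re.compile(r"[\s.,;:!?]")


def marker_sequence(passage: str) -> list[int]:
    """Count the non-marker characters between successive markers."""
    # Splitting on every marker leaves the runs of non-marker characters;
    # empty runs (two markers in a row) are dropped.
    return [len(run) for run in MARKER.split(passage) if run]


def sequences_match(first: str, second: str) -> bool:
    """Compare the marker sequences of two passages for exact equality."""
    return marker_sequence(first) == marker_sequence(second)
```

For example, `marker_sequence("The quick brown fox.")` returns `[3, 5, 5, 3]`. Because only the counts are kept, `sequences_match("The quick brown fox.", "Tbe qu1ck br0wn f0x!")` is true even though individual characters differ, which is why a further check on the text itself is still needed to confirm a match.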
11. An apparatus for retrieving text, comprising:
a processing unit coupled to an input device and a storage unit for selecting a first text passage, wherein said processor unit identifies markers in said first text passage and represents the number of non-marker characters between said markers in said first text passage to generate a first marker sequence; and
wherein said processing unit identifies markers in a second text passage and represents the number of non-marker characters between markers in said second text passage to generate a second marker sequence; and
wherein said processing unit compares said first marker sequence to said second marker sequence to retrieve text.
Dependent claims: 12, 13, 14, 15, 16, 17, 18, 19, 20
Specification