Method and apparatus for retrieving text using document signatures
First Claim
1. A method for retrieving documents from a database, comprising:
identifying markers in a first text passage, representing the number of non-marker characters between said identified markers in said first text passage to generate a first marker sequence;
identifying markers in a plurality of documents from said database;
representing the number of non-marker characters between identified markers in said plurality of database documents to generate a plurality of database marker sequences; and
evaluating said first marker sequence against said plurality of database marker sequences to retrieve documents from said database.
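The steps of claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation. The claim does not say which characters count as markers, so the punctuation set `MARKERS` is an assumption. Counting every non-marker character (spaces included) and treating a contiguous run of equal counts as a match are also assumptions. The names `marker_sequence`, `contains_run`, and `retrieve` are hypothetical.

```python
# Assumption: markers are common punctuation characters. The claim does not
# specify which characters qualify as markers.
MARKERS = set(".,;:?!")


def marker_sequence(text, markers=MARKERS):
    """Count the non-marker characters between each pair of consecutive markers."""
    counts = []
    run = 0
    seen_marker = False
    for ch in text:
        if ch in markers:
            if seen_marker:
                counts.append(run)  # close the gap opened by the previous marker
            seen_marker = True
            run = 0
        else:
            run += 1
    return counts


def contains_run(haystack, needle):
    """True if needle occurs as a contiguous run inside haystack."""
    n = len(needle)
    if n == 0:
        return False
    return any(haystack[i:i + n] == needle for i in range(len(haystack) - n + 1))


def retrieve(passage, documents):
    """Return ids of documents whose marker sequence contains the passage's sequence."""
    query = marker_sequence(passage)
    return [
        doc_id
        for doc_id, text in documents.items()
        if contains_run(marker_sequence(text), query)
    ]


docs = {
    "a": "Intro. One, two; three. End",
    "b": "Nothing, here. at all",
}
matches = retrieve(", two; three.", docs)
```

Text before the first marker and after the last marker is ignored in this sketch, because the claim speaks only of characters between identified markers.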
Abstract
A method and apparatus for retrieving similar or identical textual passages among different documents is disclosed. Normal discourse structures along with textual content attributes are used to encode a known passage with "marker sequences" that give a characterizing "signature" to the passage. The encoded known passage is then evaluated against similarly encoded passages appearing in a database of documents. If it is determined that there is a possible match between the encoded known passage and an encoded passage in a database document, a sequential string search is performed to determine whether the two passages are likely to be similar or identical. If the sequential string search records a probable match between the known passage and the database passage, the database passage is displayed for further review.
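The two-stage flow described in the abstract (a signature comparison followed by a character-level check) might look like the following Python sketch. It is written under stated assumptions and does not reproduce the disclosed apparatus. The marker set is assumed to be punctuation, and the standard-library `difflib.SequenceMatcher` stands in for the "sequential string search", whose details the abstract does not give. The names `encode` and `find_similar` and the `threshold` value are hypothetical.

```python
from difflib import SequenceMatcher

# Assumption: punctuation characters act as markers.
MARKERS = set(".,;:?!")


def encode(text):
    """Return marker offsets and the non-marker character counts between them."""
    positions = [i for i, ch in enumerate(text) if ch in MARKERS]
    gaps = [b - a - 1 for a, b in zip(positions, positions[1:])]
    return positions, gaps


def find_similar(passage, documents, threshold=0.8):
    """Stage 1 filters by marker sequence; stage 2 compares the actual text."""
    q_pos, query = encode(passage)
    if not query:
        return []
    # Only the span from the first to the last marker is encoded, so compare that span.
    core = passage[q_pos[0]:q_pos[-1] + 1]
    n = len(query)
    hits = []
    for doc_id, text in documents.items():
        positions, gaps = encode(text)
        for i in range(len(gaps) - n + 1):
            if gaps[i:i + n] != query:
                continue  # stage 1: signatures differ, skip the costly comparison
            candidate = text[positions[i]:positions[i + n] + 1]
            # Stage 2: character-level similarity (a stand-in for the string search).
            if SequenceMatcher(None, core, candidate).ratio() >= threshold:
                hits.append((doc_id, candidate))
    return hits


docs = {
    "d1": "Start. Then, alpha beta; gamma. Done",
    "d2": "Then, zzzzz qqqq; wwwww. End",
}
hits = find_similar("x, alpha beta; gamma. y", docs)
```

Both documents share the passage's marker sequence, but only `d1` survives the second stage. This is why a signature match is treated as a possible match rather than a confirmed one.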
10 Claims
1. A method for retrieving documents from a database, comprising:
identifying markers in a first text passage, representing the number of non-marker characters between said identified markers in said first text passage to generate a first marker sequence;
identifying markers in a plurality of documents from said database;
representing the number of non-marker characters between identified markers in said plurality of database documents to generate a plurality of database marker sequences; and
evaluating said first marker sequence against said plurality of database marker sequences to retrieve documents from said database.
Dependent claims: 2, 3, 4, 5
6. An apparatus for retrieving documents from a database, comprising:
an input device coupled to a processing unit for selecting a first text passage;
a storage unit coupled to said processor unit for storing said database documents;
wherein said processor unit identifies markers in said first text passage and represents the number of non-marker characters between markers in said first text passage to generate a first marker sequence;
wherein said processor unit identifies markers in a plurality of documents from said database and represents the number of non-marker characters between markers in said plurality of documents from said database to generate a plurality of database marker sequences; and
wherein said processor evaluates said first marker sequence against said plurality of database marker sequences to retrieve documents from said database.
Dependent claims: 7, 8, 9, 10
Specification