Manual-search restriction on documents not having an ASCII index
Abstract
Images of handwritten cursive records are extracted, and an automated search on the images of the cursive records is performed based on an ASCII query for a record. A cursive equivalent of the ASCII query is matched to the images of the cursive records, and a similarity value is generated to indicate the extent of match between features of the cursive equivalent of the ASCII query and features of each cursive record. The records are sorted based upon their similarity value determined in the matching process. This provides a candidate list of cursive record images to be manually examined by a user for the purpose of making a final determination as to whether any of the cursive records on the candidate list satisfy the query.
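As a rough illustration of the ranking step described in the abstract, the sketch below assumes that the cursive equivalent of the ASCII query and each snippet image have already been reduced to numeric feature vectors. The abstract does not say how features are extracted or compared, so cosine similarity is only a stand-in for the similarity value, and `rank_snippets`, `top_k`, and the snippet IDs are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Assumption: feature extraction from cursive images happens elsewhere;
# here the query and each snippet are plain numeric feature vectors.
Features = Sequence[float]


def cosine_similarity(a: Features, b: Features) -> float:
    """Stand-in similarity value in [-1, 1]; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_snippets(query: Features,
                  snippets: Dict[str, Features],
                  top_k: int = 10) -> List[Tuple[str, float]]:
    """Score every snippet against the query and return the top_k
    (snippet_id, similarity) pairs, best match first."""
    scored = [(sid, cosine_similarity(query, feats))
              for sid, feats in snippets.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


if __name__ == "__main__":
    snippets = {
        "doc1#0": [1.0, 0.0, 0.0],
        "doc2#3": [0.9, 0.1, 0.0],
        "doc3#1": [0.0, 1.0, 0.0],
    }
    for sid, score in rank_snippets([1.0, 0.0, 0.0], snippets, top_k=2):
        print(sid, round(score, 3))
```

The returned pairs play the role of the candidate list: the investigator reads them from the top instead of paging through every document. A cutoff on the similarity value, rather than a fixed `top_k`, would serve the same purpose.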
9 Claims
1. A method for finding a handwritten cursive record on a document using automated searching by a computing system and manual searching by an investigator comprising:
scanning all documents containing the cursive records to provide electronic images of each document to the computing system;
extracting a snippet image of each cursive record on the documents and identifying each snippet image with its document;
automated searching of the snippet images by the computing system to select best matches between a query defined by the investigator and snippet images extracted by the extracting act, the best matches forming a candidate list;
manual review of the candidate list by the investigator to find select snippet images from the candidate list that match the query close enough to warrant manual review of the source document for the select snippet images, whereby the number of documents that must be manually reviewed to find the cursive record is reduced;
wherein the act of extracting comprises:
identifying a source document for each snippet image;
cutting the snippet image out of the document image containing the snippet image; and
storing the snippet image and an identification of its source document in snippet description data;
recognizing whether each snippet image matches entries in a draft dictionary and providing for each snippet image that does match an entry, a recognition answer and a similarity value for the recognition answer; and
adding the lists of recognition answers with associated similarity value to the snippet description data for each snippet image.
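The extracting act of claim 1 can be pictured as building one record of snippet description data per cursive record. This is a minimal sketch under stated assumptions: page images are grids of grayscale pixels, the bounding box of each cursive record is already known (the claim does not describe segmentation), and `recognize` is a hypothetical callable standing in for draft-dictionary recognition that returns (answer, similarity value) pairs, or an empty list when no entry matches. The ID format is also illustrative.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# Assumptions: row-major grayscale page images and known bounding boxes.
Image = List[List[int]]
Box = Tuple[int, int, int, int]  # (top, left, bottom, right); bottom/right exclusive


@dataclass
class SnippetDescription:
    """One entry of the snippet description data."""
    snippet_id: str
    source_document: str          # identification of the source document
    image: Image                  # the cut-out snippet image
    recognition_answers: List[Tuple[str, float]] = field(default_factory=list)


def cut_snippet(page: Image, box: Box) -> Image:
    """Cut the snippet image out of the document image."""
    top, left, bottom, right = box
    return [row[left:right] for row in page[top:bottom]]


def extract_snippets(doc_id: str, page: Image, boxes: List[Box],
                     recognize: Callable[[Image], List[Tuple[str, float]]]
                     ) -> List[SnippetDescription]:
    """Identify, cut, and store each snippet, then attach its recognition
    answers (answer text, similarity value) from the hypothetical recognizer."""
    records = []
    for index, box in enumerate(boxes):
        snippet = cut_snippet(page, box)
        record = SnippetDescription(f"{doc_id}#{index}", doc_id, snippet)
        record.recognition_answers.extend(recognize(snippet))
        records.append(record)
    return records
```

Keeping the source-document identifier inside every record is what lets the investigator jump from a promising snippet straight back to the page it came from.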
2. The method of claim 1 wherein the act of automated searching comprises:
matching the query from the investigator against the draft dictionary to determine if the query was in the draft dictionary;
where the query is not in the draft dictionary, matching the query from the investigator against each snippet image and generating a similarity value indicative of how well the query matches the snippet image and sorting the query matches by similarity value as recognition answers for each snippet image; and
where the query is in the draft dictionary or after the act of matching the query against each snippet image, matching the query against the recognition answers for each snippet image and generating the candidate list from matches between the query and the recognition answers.
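The branching in claim 2 might be sketched as follows. It assumes that recognition answers are kept per snippet as (answer, similarity value) pairs, that matching the query against the recognition answers means exact string equality, and that `similarity` is a hypothetical caller-supplied function scoring the query against a snippet image; none of these details come from the claim.

```python
from typing import Callable, Dict, List, Set, Tuple

# snippet id -> list of (recognition answer, similarity value)
Answers = Dict[str, List[Tuple[str, float]]]


def automated_search(query: str,
                     snippets: Dict[str, object],
                     answers: Answers,
                     draft_dictionary: Set[str],
                     similarity: Callable[[str, object], float]
                     ) -> List[Tuple[str, float]]:
    # Query not in the draft dictionary: match it against every snippet
    # image and store the score as an extra recognition answer (in place).
    if query not in draft_dictionary:
        for sid, image in snippets.items():
            score = similarity(query, image)
            answers.setdefault(sid, []).append((query, score))
            answers[sid].sort(key=lambda a: a[1], reverse=True)
    # In both branches, build the candidate list from recognition answers
    # equal to the query (exact match is an assumption).
    candidates = [(sid, score)
                  for sid, pairs in answers.items()
                  for answer, score in pairs
                  if answer == query]
    candidates.sort(key=lambda c: c[1], reverse=True)
    return candidates
```

Because claim 2 imposes no threshold, a query missing from the draft dictionary ends up as a recognition answer for every snippet; claims 3 and 4 add the thresholds that keep the list short.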
3. The method of claim 2 wherein the act of matching the query against the snippet image sorts the query match as a recognition answer only if the similarity value exceeds a first threshold.
4. The method of claim 3 wherein the act of automated searching further comprises:
a second act of matching the query against each snippet image if there are no matches between recognition answers and the query; and
adding query matches, from the second act of matching, to the candidate list if the similarity value for such query matches exceeds a second threshold lower than the first threshold.
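The two thresholds of claims 3 and 4 act as gates on already-scored matches. In this hedged sketch, matches are (snippet id, similarity value) pairs, `rematch` is a hypothetical callable that performs the second act of matching only when it is needed, and "exceeds" is read as strictly greater than.

```python
from typing import Callable, List, Tuple

Pair = Tuple[str, float]  # (snippet id, similarity value)


def gate_recognition_answers(scored: List[Pair],
                             first_threshold: float) -> List[Pair]:
    """Claim 3: a query match is kept as a recognition answer only if
    its similarity value exceeds the first threshold."""
    return [p for p in scored if p[1] > first_threshold]


def candidates_with_fallback(answer_matches: List[Pair],
                             rematch: Callable[[], List[Pair]],
                             first_threshold: float,
                             second_threshold: float) -> List[Pair]:
    """Claim 4: if no recognition answer matched the query, run a second
    matching pass and accept matches above the lower, second threshold."""
    if not second_threshold < first_threshold:
        raise ValueError("second threshold must be lower than the first")
    if answer_matches:
        chosen = list(answer_matches)
    else:
        # Second act of matching happens only on this branch.
        chosen = [p for p in rematch() if p[1] > second_threshold]
    return sorted(chosen, key=lambda p: p[1], reverse=True)
```

The strict first threshold keeps weak guesses out of the stored recognition answers, while the looser second threshold still returns something for the investigator when the strict pass comes up empty.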
5. Apparatus for restricting manual searching of documents containing handwritten cursive records by electronically searching snippet images of the cursive records with a computing system to provide a candidate list for manual searching by an investigator, the apparatus comprising:
a scanner scanning the documents for handwritten cursive records and storing to the computing system electronic images of the handwritten cursive records as snippet images;
a search module in the computing system matching a query to a snippet image and generating a candidate list of answer pairs, each answer pair containing an identifier for the snippet image matched to the query and a similarity value indicative of a degree of match between query and snippet image;
wherein the search module comprises:
a preprocessing module in the computing system extracting snippet images from the electronic images of the handwritten cursive records;
a matching module in the computing system matching the query to each snippet image and generating a similarity value indicative of the degree of match and providing the answer pair for each snippet image;
a sort module in the computing system sorting the answer pairs into an ordered list by similarity value; and
a select module in the computer selecting the answer pairs having a similarity value above a predetermined threshold and providing such answer pairs to the candidate list.
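One way to read the search module of claim 5 is as a pipeline of small functions. The sketch below assumes the preprocessing module has already produced the snippet images, keyed by an identifier, and takes `similarity` as a hypothetical caller-supplied scorer; the function names mirror the claimed modules but are illustrative only.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class AnswerPair:
    snippet_id: str     # identifier for the snippet image matched to the query
    similarity: float   # degree of match between query and snippet image


def matching_module(query: str, snippets: Dict[str, object],
                    similarity: Callable[[str, object], float]) -> List[AnswerPair]:
    """Match the query to each snippet image, one answer pair per snippet."""
    return [AnswerPair(sid, similarity(query, img)) for sid, img in snippets.items()]


def sort_module(pairs: List[AnswerPair]) -> List[AnswerPair]:
    """Order answer pairs by similarity value, highest first."""
    return sorted(pairs, key=lambda p: p.similarity, reverse=True)


def select_module(ordered: List[AnswerPair], threshold: float) -> List[AnswerPair]:
    """Keep pairs above the threshold; input is sorted, so stop early."""
    selected = []
    for pair in ordered:
        if pair.similarity <= threshold:
            break
        selected.append(pair)
    return selected


def search_module(query: str, snippets: Dict[str, object],
                  similarity: Callable[[str, object], float],
                  threshold: float) -> List[AnswerPair]:
    """Compose matching, sort, and select into the candidate list."""
    return select_module(sort_module(matching_module(query, snippets, similarity)),
                         threshold)
```

Because the sort module runs first, the select module can stop at the first pair whose similarity value does not exceed the threshold.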
6. The apparatus of claim 5 further comprising:
a dictionary of common snippets stored in the computing system;
a recognition module comparing each snippet image to snippets in the dictionary and generating a list of answer pairs for each snippet image, the list containing answer pairs having a similarity value higher than a predetermined threshold.
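The recognition module of claim 6 could be sketched as below. Here each answer pair is read as a (dictionary entry, similarity value) pair for one snippet, dictionary entries are represented by their ASCII text, and `similarity` and `threshold` are hypothetical parameters; the claim itself does not fix how a snippet image is compared with a dictionary snippet.

```python
from typing import Callable, Dict, List, Sequence, Tuple


def recognition_module(snippets: Dict[str, object],
                       dictionary: Sequence[str],
                       similarity: Callable[[str, object], float],
                       threshold: float) -> Dict[str, List[Tuple[str, float]]]:
    """Compare each snippet image to every dictionary entry and keep, per
    snippet, the (answer, similarity) pairs above the threshold, best first."""
    result: Dict[str, List[Tuple[str, float]]] = {}
    for sid, image in snippets.items():
        pairs = [(word, similarity(word, image)) for word in dictionary]
        kept = [p for p in pairs if p[1] > threshold]
        kept.sort(key=lambda p: p[1], reverse=True)
        result[sid] = kept  # may be empty if nothing clears the threshold
    return result
```

Running this once over all snippets ahead of time means that later queries which appear in the dictionary can be answered from the stored lists without touching the images again.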
7. The apparatus of claim 6 wherein the matching module has first, second and third matching modules in the computing system:
the first matching module testing whether the query matches an entry in the dictionary and matching the query against each snippet image to generate a list of answer pairs if the query does not match an entry in the dictionary;
the second matching module testing whether the query matches an answer in any of the lists of answer pairs and creating the candidate list from matches between the query and answers in the lists of answer pairs; and
the third matching module, if the query does not match any answer in the lists of answer pairs, matching the query against each snippet image to create the candidate list of answer pairs.
8. The apparatus of claim 7 wherein the first matching module comprises in addition a similarity value detecting module detecting if the similarity value of an answer pair exceeds a first threshold and adding the answer pair to the list of answer pairs only if the similarity value exceeds the first threshold.
9. The apparatus of claim 8 wherein the third matching module comprises in addition a second similarity value detecting module detecting if the similarity value of an answer pair exceeds a second threshold lower than the first threshold and adding the answer pair to the candidate list only if the similarity value exceeds the second threshold.
Specification