System and methods for data indexing and processing
Abstract
Systems and methods are disclosed that allow for indexing, processing, or both, of information from physical media or electronic media, which may be received from a plurality of sources. In embodiments, a document file may be matched using pattern matching methods, which may include comparisons with a comparison reference database to improve or accelerate the indexing process. In embodiments, information may be presented to a user as potential matches, thereby improving manual indexing processes. In embodiments, one or more additional actions may occur as part of the processing, including, without limitation, associating additional data with a document file, making observations from the document file, notifying individuals, creating composite messages, and billing events. In an embodiment, data from a document file may be associated with a key word, key phrase, or word frequency value that enables adaptive learning, so that unindexed data may be automatically indexed based on user interaction history.
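The abstract does not say how the adaptive learning step works. A minimal sketch, assuming that it means accumulating word-frequency profiles from documents a user has already indexed by hand and then comparing each unindexed document against those profiles by cosine similarity. The function and class names, the 0.5 threshold, and the choice of cosine similarity are all assumptions for illustration, not taken from this patent.

```python
import math
import re
from collections import Counter
from typing import Optional


def word_freq(text: str) -> Counter:
    # Hypothetical word-frequency value for a document: lowercase alphanumeric word counts.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two word-frequency vectors (0.0 if either is empty).
    dot = sum(count * b[w] for w, count in a.items() if w in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


class AdaptiveIndexer:
    def __init__(self, threshold: float = 0.5) -> None:
        self.threshold = threshold  # assumed minimum similarity for automatic indexing
        self.profiles: dict[str, Counter] = {}  # index label -> accumulated word counts

    def learn(self, text: str, label: str) -> None:
        # Called when a user manually indexes a document: fold its words into that label.
        self.profiles.setdefault(label, Counter()).update(word_freq(text))

    def suggest(self, text: str) -> Optional[str]:
        # Return the most similar learned label, or None if nothing is similar enough.
        freq = word_freq(text)
        best_label, best_score = None, 0.0
        for label, profile in self.profiles.items():
            score = cosine(freq, profile)
            if score > best_score:
                best_label, best_score = label, score
        return best_label if best_score >= self.threshold else None
```

Each manual indexing decision passed to learn() refines the stored profile, so later documents with similar vocabulary can be indexed without user input; documents that resemble no profile closely enough are left for manual review.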
22 Claims
1. A method for indexing a document file comprising a plurality of characters arranged into an array of strings, the method comprising:
filtering the array of strings to obtain a set of strings;
for each string in the set of strings, creating a first sequence list comprising a substring starting at a first character position in the string and a second sequence list comprising a substring starting at a second character position in the string;
generating a comparison reference database by querying the first and second sequence lists against a reference database, wherein the reference database comprises a plurality of records and each record comprises a plurality of data fields;
for each record in the comparison reference database, generating a first set of substrings based upon a first set of data fields from the plurality of data fields in the record;
comparing the first set of substrings against the set of strings to identify a longest substring match, if any, for each of the first set of data fields from the record;
filtering the comparison reference database to create a second comparison reference database by selecting each record that has a longest substring match for one or more data fields from the first set of data fields; assigning a point value for each match found in a record and summing the point values for the record; and
responsive to a record having a total point value exceeding a threshold match value, associating the document file with that record.
(Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10)
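The steps of claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation, and several choices are assumptions made for illustration: the document arrives as a list of tokens; the string filter keeps alphanumeric tokens of at least three characters; the first and second character positions are 0 and 1; the sequence-list substrings are three characters long; the "longest substring match" is the longest common substring between a field value and any document string, counted only if it is at least three characters; and each matching field earns one point. All names (Record, filter_strings, sequence_lists, comparison_db, score_record, index_document) are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Record:
    # Hypothetical record shape: an identifier plus named data fields.
    record_id: str
    fields: dict[str, str]


def filter_strings(strings: list[str], min_len: int = 3) -> set[str]:
    # Assumed filter: keep lowercased alphanumeric strings of at least min_len characters.
    return {s.lower() for s in strings if len(s) >= min_len and s.isalnum()}


def sequence_lists(strings: set[str], k: int = 3) -> tuple[set[str], set[str]]:
    # First list: k-character substrings starting at position 0; second: at position 1.
    first = {s[:k] for s in strings if len(s) >= k}
    second = {s[1:1 + k] for s in strings if len(s) >= 1 + k}
    return first, second


def comparison_db(reference: list[Record], first: set[str], second: set[str]) -> list[Record]:
    # Keep only records whose field values contain at least one probe substring.
    probes = first | second
    return [r for r in reference
            if any(p in v.lower() for v in r.fields.values() for p in probes)]


def longest_common_substring(a: str, b: str) -> str:
    # Dynamic programming: cur[j] is the longest common suffix length of a[:i] and b[:j].
    best_len, best_end = 0, 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                cur[j] = prev[j - 1] + 1
                if cur[j] > best_len:
                    best_len, best_end = cur[j], i
        prev = cur
    return a[best_end - best_len:best_end]


def score_record(record: Record, strings: set[str], match_fields: list[str],
                 min_match: int = 3, points: int = 1) -> int:
    # One point per field whose longest match against the document strings is long enough.
    total = 0
    for name in match_fields:
        value = record.fields.get(name, "").lower()
        longest = max((longest_common_substring(value, s) for s in strings),
                      key=len, default="")
        if len(longest) >= min_match:
            total += points
    return total


def index_document(tokens: list[str], reference: list[Record],
                   match_fields: list[str], threshold: int) -> list[Record]:
    strings = filter_strings(tokens)
    first, second = sequence_lists(strings)
    candidates = comparison_db(reference, first, second)
    # Second comparison reference database: records with at least one field match.
    scored = [(r, score_record(r, strings, match_fields)) for r in candidates]
    second_db = [(r, s) for r, s in scored if s > 0]
    # Associate the document with each record whose total exceeds the threshold.
    return [r for r, s in second_db if s > threshold]
```

For example, given the tokens "Invoice", "for", "Jane", "Doe", "Account", "12345" and a reference record with name "Jane Doe" and account "12345", both fields match, so the record scores 2 points and exceeds a threshold of 1; a record named "Janet Fox" passes the substring probe but scores only 1 and is not associated. One reason to probe with short substrings first is that it confines the quadratic longest-common-substring comparison to a small candidate set.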
11. A method for indexing a document file comprising a plurality of characters arranged into an array of strings, the method comprising:
identifying date strings within the array of strings that correspond to a date and selecting a date string that corresponds to the earliest date;
comparing the date string that corresponds to the earliest date against a reference database, wherein the reference database comprises a plurality of records and each record comprises at least one data field, to generate a comparison reference database comprising records from the reference database that possess at least one data field that matches the date string;
responsive to the comparison reference database comprising a plurality of records, performing a matching operation to reduce the number of records that comprise the comparison reference database; and
responsive to the comparison reference database comprising one record, associating the document file with that record.
(Dependent claims: 12, 13, 14, 15, 16, 17, 18)
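Claim 11 narrows candidates by date before anything else. A minimal sketch, assuming two date formats (MM/DD/YYYY and YYYY-MM-DD), records represented as dictionaries of string fields, and dates compared as parsed values rather than raw strings so that different formats can match. The claim does not specify the matching operation used to reduce a plurality of records; the narrow function below, which keeps records whose chosen fields appear among the document strings, is a hypothetical stand-in, as are all function names.

```python
from datetime import date, datetime
from typing import Optional

# Assumed date formats; a real system would likely need many more.
DATE_FORMATS = ("%m/%d/%Y", "%Y-%m-%d")


def parse_date(s: str) -> Optional[date]:
    # Return the parsed date, or None if s matches none of the assumed formats.
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(s, fmt).date()
        except ValueError:
            continue
    return None


def earliest_date(strings: list[str]) -> Optional[date]:
    # Identify the strings that parse as dates and select the earliest one.
    dates = [d for s in strings if (d := parse_date(s)) is not None]
    return min(dates, default=None)


def narrow(candidates: list[dict], strings: list[str], fields: tuple) -> list[dict]:
    # Hypothetical matching operation: keep records whose chosen fields appear
    # verbatim (case-insensitively) among the document strings.
    lowered = {s.lower() for s in strings}
    kept = [r for r in candidates
            if any(r.get(f, "").lower() in lowered for f in fields)]
    return kept or candidates  # never narrow down to zero records


def index_by_earliest_date(strings: list[str], reference: list[dict],
                           narrow_fields: tuple) -> Optional[dict]:
    target = earliest_date(strings)
    if target is None:
        return None
    # Comparison reference database: records with any field equal to the target date.
    candidates = [r for r in reference
                  if any(parse_date(v) == target for v in r.values())]
    if len(candidates) > 1:
        candidates = narrow(candidates, strings, narrow_fields)
    # Associate only when exactly one record remains; otherwise return None.
    return candidates[0] if len(candidates) == 1 else None
```

If two records share the date 07/15/1980 and only one has a last name ("Doe") that also appears on the document, narrowing on last_name leaves a single record to associate; with no narrowing fields the match stays ambiguous and the sketch returns None.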
19. A system for indexing a document file comprising:
a communications services module coupled to receive from a client a document file and a reference database comprising a plurality of records wherein each record comprises at least one data field element;
an extraction services module, communicatively coupled to the communications services module, that obtains from the document file a plurality of characters arranged into an array of strings; and
an indexing services module, communicatively coupled to the extraction services module, that compares a first set of strings from the array of strings against a comparison reference database obtained by filtering the reference database, and that:
responsive to at least a portion of the first set of strings exceeding a threshold match with at least a portion of a record in the comparison reference database, associates the document file with the record; and
responsive to the first set of strings matching a plurality of records in the comparison reference database, provides match information to a user for selection.
(Dependent claims: 20, 21, 22)
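The three modules of claim 19 can be sketched as cooperating Python classes. This is a minimal sketch under assumptions: the document is already available as text (for example after OCR), records are dictionaries with an "id" key, the comparison reference database keeps records that share at least one word with the document, and the threshold match is the fraction of a record's fields whose words all appear in the document (must exceed 0.6 by default). The class names, the IndexResult type, and the way the modules are wired together are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class IndexResult:
    record_id: Optional[str]  # set when exactly one record exceeds the threshold
    candidates: list[str]     # match information offered to the user otherwise


class ExtractionService:
    def extract(self, text: str) -> list[str]:
        # Split already-extracted text into an array of strings.
        return re.findall(r"[A-Za-z0-9/]+", text)


class IndexingService:
    def __init__(self, threshold: float = 0.6) -> None:
        self.threshold = threshold  # assumed: fraction of a record's fields found

    @staticmethod
    def _values(record: dict) -> list[str]:
        return [v.lower() for k, v in record.items() if k != "id"]

    def _fraction(self, lowered: set[str], record: dict) -> float:
        # A field counts as found when every word of its value is a document string.
        values = self._values(record)
        hits = sum(1 for v in values if all(w in lowered for w in v.split()))
        return hits / len(values) if values else 0.0

    def index(self, strings: list[str], reference: list[dict]) -> IndexResult:
        lowered = {s.lower() for s in strings}
        # Comparison reference database: records sharing at least one word.
        comparison = [r for r in reference
                      if any(w in lowered for v in self._values(r) for w in v.split())]
        above = [r for r in comparison if self._fraction(lowered, r) > self.threshold]
        if len(above) == 1:
            return IndexResult(above[0]["id"], [])
        return IndexResult(None, [r["id"] for r in above])


class CommunicationsService:
    def __init__(self, extraction: ExtractionService, indexing: IndexingService) -> None:
        self.extraction = extraction
        self.indexing = indexing

    def receive(self, document_text: str, reference: list[dict]) -> IndexResult:
        # Entry point for a client submitting a document and a reference database.
        strings = self.extraction.extract(document_text)
        return self.indexing.index(strings, reference)
```

A single record exceeding the threshold is associated directly; if several do, their identifiers are returned as match information for the user to choose from, mirroring the last limitation of the claim.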
Specification