Light weight document matcher
Abstract
A lightweight document matcher employs minimal processing and storage. The lightweight document matcher matches new documents to those stored in a database. The matcher lists, in order, those stored documents that are most similar to the new document. The new documents are typically problem statements or queries, and the stored documents are potential solutions such as FAQs (Frequently Asked Questions). Given a set of documents, titles, and possibly keywords, an automatic back-end process constructs a global dictionary of unique keywords and local dictionaries of relevant words for each document. The application front-end uses this information to score the relevance of stored documents to new documents. The scoring algorithm uses the count of matched words as a base score, and then assigns bonuses to words that have high predictive value. It optionally assigns an extra bonus for a match of words in special sections, e.g., titles. The method uses minimal data structures and lightweight scoring algorithms to compute efficiently even in restricted environments, such as mobile or small desktop computers.
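The scoring scheme described in the abstract can be illustrated with a minimal sketch. It is not the patented formula, which is not reproduced in this text. The function names `score_document` and `rank_documents`, the `word_bonus` table, and the `title_bonus` value are all hypothetical, and the sketch assumes each stored document is already reduced to a set of local-dictionary words and a set of title words.

```python
from typing import Dict, List, Set, Tuple


def score_document(
    query_words: Set[str],
    local_dict: Set[str],
    title_words: Set[str],
    word_bonus: Dict[str, float],
    title_bonus: float = 1.0,
) -> float:
    """Score one stored document against the words of a new document."""
    matched = query_words & local_dict
    # Base score: the count of matched words.
    score = float(len(matched))
    # Bonus for matched words assumed to have high predictive value
    # (word_bonus is a hypothetical table; missing words get no bonus).
    score += sum(word_bonus.get(word, 0.0) for word in matched)
    # Optional extra bonus for matches in a special section, here the title.
    score += title_bonus * len(matched & title_words)
    return score


def rank_documents(
    query_words: Set[str],
    stored: Dict[str, Tuple[Set[str], Set[str]]],
    word_bonus: Dict[str, float],
    title_bonus: float = 1.0,
) -> List[Tuple[str, float]]:
    """Return (doc_id, score) pairs, highest score first."""
    scored = [
        (doc_id, score_document(query_words, local, title, word_bonus, title_bonus))
        for doc_id, (local, title) in stored.items()
    ]
    # Sort by descending score; break ties by document id for stable output.
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored


# Illustrative data only: doc_id -> (local dictionary, title words).
stored_docs = {
    "faq1": ({"printer", "paper", "jam"}, {"printer", "jam"}),
    "faq2": ({"printer", "driver", "install"}, {"driver", "install"}),
}
ranking = rank_documents({"printer", "jam", "help"}, stored_docs, {"jam": 0.5})
```

In this example `faq1` matches two words, one of which carries a bonus, and both appear in its title, so it ranks ahead of `faq2`, which matches a single word outside its title. Only set intersections and one sort are needed, which is in keeping with the lightweight design the abstract describes.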
10 Claims
1. A computer implemented document matcher comprising:
a back-end processor receiving input documents and generating a first data structure consisting of a set of local dictionaries of keywords for each document and then generating a second data structure consisting of a global dictionary resulting from the union of all keywords in the first data structure, said back-end processor computing a table of word weights; and
a front-end processor for matching input documents against documents represented by said second data structure, said front-end processor computing a score for the documents, then sorting the documents by score, stored documents being ranked by a relevance scoring scheme according to a formula
Dependent claims: 2, 3, 4, 9
5. A computer implemented process for matching new documents to those stored in a database comprising the steps of:
generating a first data structure consisting of a set of local dictionaries of keywords;
generating a second data structure which is a global dictionary resulting from the union of all keywords in the first data structure;
computing a table of word weights based on frequency of use in input documents;
matching input documents against documents represented by said second data structure; and
accessing the table of word weights, scoring input documents, and ranking stored documents by relevance scoring scheme according to a formula
Dependent claims: 6, 7, 8, 10
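The data-structure steps recited in claim 5 can be sketched as follows. This is an illustration under stated assumptions, not the claimed method: the tokenizer, the names `extract_keywords` and `build_back_end`, and the weight rule (the reciprocal of the number of documents containing a word) are hypothetical stand-ins, since the claimed formula is not reproduced in this text.

```python
import re
from collections import Counter
from typing import Dict, FrozenSet, Set, Tuple


def extract_keywords(text: str, stopwords: FrozenSet[str] = frozenset()) -> Set[str]:
    """Lowercase the text and keep alphanumeric tokens that are not stopwords."""
    return {w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in stopwords}


def build_back_end(
    documents: Dict[str, str],
    stopwords: FrozenSet[str] = frozenset(),
) -> Tuple[Dict[str, Set[str]], Set[str], Dict[str, float]]:
    """Build local dictionaries, the global dictionary, and a word-weight table."""
    # First data structure: one local dictionary of keywords per document.
    local_dicts = {
        doc_id: extract_keywords(text, stopwords)
        for doc_id, text in documents.items()
    }
    # Second data structure: the union of all keywords in the local dictionaries.
    global_dict: Set[str] = set().union(*local_dicts.values())
    # Count how many documents contain each word.
    doc_freq = Counter(word for words in local_dicts.values() for word in words)
    # Assumed weight rule: rarer words weigh more (1 / document frequency).
    weights = {word: 1.0 / doc_freq[word] for word in global_dict}
    return local_dicts, global_dict, weights


# Illustrative input only.
docs = {"a": "Printer paper jam", "b": "Printer driver install"}
local_dicts, global_dict, weights = build_back_end(docs)
```

With these two documents, `printer` appears in both and receives weight 0.5, while words unique to one document receive weight 1.0. Any other frequency-based rule could replace the reciprocal without changing the surrounding structure.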
Specification