System and method for filtering a document stream
Abstract
A system for filtering documents includes a document parser, a profile parser, and a comparator. The document parser accepts incoming documents as input and provides inverted lists of terms contained in the documents as output. The profile parser accepts user queries as input and provides query nets representing the user queries as output. The comparator compares the inverted lists representing the documents against the query nets representing the user queries to determine whether an incoming document matches a user query. A related method for filtering incoming documents includes the steps of receiving an incoming document and parsing it to produce an inverted list of terms contained in the incoming document. The inverted list is then used to retrieve user queries. Any user queries matching fewer than a predetermined number of terms are immediately discarded. The remaining user queries are scored, and user queries having a score less than a predetermined threshold are discarded. The user queries that remain are the queries that the incoming document matches.
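
As an illustration of the document parser described above, the following is a minimal sketch, not the patented implementation. It assumes that an "inverted list" maps each term to the word positions at which it occurs, and that terms are lowercased alphanumeric tokens; the abstract specifies neither detail. The function name `parse_document` is hypothetical.

```python
import re
from collections import defaultdict


def parse_document(text):
    """Return an inverted list: term -> list of word positions (assumed layout)."""
    inverted = defaultdict(list)
    # Assumption: terms are lowercase runs of letters and digits.
    for position, token in enumerate(re.findall(r"[a-z0-9]+", text.lower())):
        inverted[token].append(position)
    return dict(inverted)


inverted = parse_document("Stock prices fell as bond prices rose.")
print(inverted["prices"])  # [1, 5]
```

Keying the structure by term means a later lookup of "does this document contain term X" is a single dictionary access, which is what makes it cheap to go from document terms to candidate queries.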
16 Claims
1. An apparatus for filtering documents as received, the apparatus comprising:
a document parser, said document parser accepting a document as input as it is received and providing an inverted list of terms contained in the document as output;
a profile parser, the profile parser accepting a user query comprised of terms and operators as input and providing a query net representing a parsed user query as output; and
a comparator that compares the inverted list for an incoming document against the query net representing the parsed query and providing as output an indication whether the incoming document matches the parsed query,
wherein, the apparatus is programmed to filter documents using recursive statistical inference in which numerical weights assigned respectively to the query terms are combined as dictated by the query operators and in which evaluation of one level of a query tree pauses while evaluation of another lower level proceeds.
Dependent claims: 2, 3, 4, 5, 6
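
The recursive inference recited in claim 1 can be sketched as a depth-first walk of a query tree. This is a hedged illustration only: the claim does not say how each operator combines weights, so the rules below (product for `and`, complement of the product of complements for `or`, complement for `not`) are assumptions borrowed from common probabilistic retrieval practice. The tuple layout and the name `evaluate` are likewise hypothetical.

```python
from math import prod


def evaluate(node, inverted):
    """Recursively combine term weights as dictated by the operators.

    node is ("term", word, weight) or (operator, child, child, ...).
    """
    kind = node[0]
    if kind == "term":
        _, term, weight = node
        # Assumption: a term contributes its weight if present, else 0.0.
        return weight if term in inverted else 0.0
    # Evaluation of this level pauses here until every lower level returns.
    beliefs = [evaluate(child, inverted) for child in node[1:]]
    if kind == "and":
        return prod(beliefs)
    if kind == "or":
        return 1.0 - prod(1.0 - b for b in beliefs)
    if kind == "not":
        return 1.0 - beliefs[0]
    raise ValueError(f"unknown operator: {kind}")


query = ("and",
         ("term", "bond", 0.8),
         ("or", ("term", "stock", 0.5), ("term", "gold", 0.5)))
inverted = {"bond": [4], "stock": [0]}
print(evaluate(query, inverted))  # 0.4
```

In this sketch the call stack is what "pauses" an upper level: the `and` node cannot combine its children until the nested `or` node has finished evaluating.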
7. A method for filtering incoming documents, the method comprising the steps of:
(a) receiving an incoming document and parsing it to produce an inverted list of terms contained in incoming document;
(b) receiving one or more user queries and parsing same into query nets comprised of query terms and operators;
(c) using the produced inverted list to retrieve the query nets representing user queries by using recursive statistical inference in which numerical weights assigned respectively to the query terms are combined as dictated by the query operators and in which evaluation of one level of a query tree pauses while evaluation of another lower level proceeds;
(d) discarding retrieved query nets matching less than a predetermined number of terms;
(e) scoring remaining profile and discarding profiles having a score less than a predetermined threshold.
Dependent claims: 8, 9, 10
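
Steps (c) through (e) of claim 7 can be sketched as a two-stage filter: a cheap term-count cut followed by scoring against a threshold. The sketch below is an assumption-laden simplification. Profiles are reduced to flat `term -> weight` dictionaries, and the score is the matched share of total weight, standing in for the recursive inference the claim actually recites. The names `build_term_index` and `filter_document` and the default parameter values are hypothetical.

```python
from collections import defaultdict


def build_term_index(profiles):
    """Map each query term to the ids of the profiles that use it."""
    index = defaultdict(set)
    for profile_id, weights in profiles.items():
        for term in weights:
            index[term].add(profile_id)
    return index


def filter_document(inverted, profiles, index, min_terms=2, threshold=0.5):
    # (c) Use the document's terms to retrieve candidate profiles.
    match_counts = defaultdict(int)
    for term in inverted:
        for profile_id in index.get(term, ()):
            match_counts[profile_id] += 1

    matched = {}
    for profile_id, count in match_counts.items():
        # (d) Discard profiles matching fewer than min_terms terms.
        if count < min_terms:
            continue
        # (e) Score the rest; simplified stand-in for recursive inference.
        weights = profiles[profile_id]
        hit = sum(w for term, w in weights.items() if term in inverted)
        score = hit / sum(weights.values())
        if score >= threshold:
            matched[profile_id] = score
    return matched


profiles = {
    "markets": {"stock": 1.0, "bond": 1.0, "gold": 2.0},
    "metals": {"gold": 1.0, "bond": 1.0},
    "rates": {"bond": 1.0, "prices": 1.0, "yield": 4.0},
}
inverted = {"stock": [0], "prices": [1, 5], "bond": [4]}
print(filter_document(inverted, profiles, build_term_index(profiles)))
# {'markets': 0.5}
```

Here "metals" is dropped at step (d) with only one matching term, and "rates" is dropped at step (e) with a score of about 0.33, so only "markets" survives.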
11. An article of manufacture having computer-readable program means for filtering incoming documents, the article comprising:
(a) computer-readable program means for receiving an incoming document and parsing it to produce an inverted list of terms contained in incoming document;
(b) computer-readable program means for receiving user queries and parsing the user queries to produce query nets containing query terms and query operators representing the queries;
(c) computer-readable program means for using the produced inverted list to retrieve query nets representing user queries by using recursive statistical inference in which numerical weights assigned respectively to the query terms are combined as dictated by the query operators and in which evaluation of one level of a query tree pauses while evaluation of another lower level proceeds;
(d) computer-readable program means for discarding retrieved query nets matching less than a predetermined number of terms;
(e) computer-readable program means for scoring remaining profile and discarding profiles having a score less than a predetermined threshold.
Dependent claims: 12, 13, 14
15. An apparatus for filtering documents as received, the apparatus comprising:
a profile parser, the profile parser accepting at least one structured query comprised of terms and operators as input and providing as output a query net representing a parsing of the at least one query;
a document parser, said document parser accepting a document as input as it is received and providing an inverted list of terms contained in the document as output; and
a recursive statistical inference comparator that compares the inverted list for the document against the query net using statistical inference weighting for the query terms and providing as output an indication whether the document matches the at least one query,
wherein, the apparatus is programmed to filter documents using recursive statistical inference in which numerical weights assigned respectively to the query terms are combined as dictated by the query operators and in which evaluation of one level of a query tree pauses while evaluation of another lower level proceeds; and
the apparatus is programmed to count the number of inverted list terms matching the query and to disregard the query if the count falls below a predetermined threshold.
Dependent claims: 16
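
The profile parser of claim 15 turns a structured query into a query net. The claims do not define a query syntax, so the sketch below assumes a hypothetical prefix notation such as `#and(bond #or(stock gold))` and represents the query net as nested tuples. The function name `parse_query` and the uniform default weight are also assumptions made for illustration.

```python
import re

# Assumed syntax: "#operator(" opens a group, ")" closes it, anything else is a term.
TOKEN = re.compile(r"#\w+\(|\)|[^\s()#]+")


def parse_query(text, default_weight=1.0):
    """Parse a prefix-notation query into nested tuples (a simple query net)."""
    tokens = TOKEN.findall(text)
    pos = 0

    def parse_node():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of query")
        token = tokens[pos]
        pos += 1
        if token == ")":
            raise ValueError("unexpected closing parenthesis")
        if token.startswith("#"):
            operator = token[1:-1].lower()
            children = []
            while True:
                if pos >= len(tokens):
                    raise ValueError("missing closing parenthesis")
                if tokens[pos] == ")":
                    pos += 1  # consume ")"
                    break
                children.append(parse_node())
            return (operator, *children)
        return ("term", token.lower(), default_weight)

    node = parse_node()
    if pos != len(tokens):
        raise ValueError("trailing input after query")
    return node


print(parse_query("#and(bond #or(stock gold))"))
# ('and', ('term', 'bond', 1.0), ('or', ('term', 'stock', 1.0), ('term', 'gold', 1.0)))
```

A tree of this shape keeps operators as interior nodes and weighted terms as leaves, which is the structure a recursive comparator would need in order to combine weights level by level.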
Specification