System and method for monitoring web pages by comparing generated abstracts
Abstract
Provided is a computerized method for monitoring the content of documents. A set of documents is stored in memories of server computers. The server computers can be connected to each other by a network such as the Internet. Entries are generated in a search engine for each document of the set. The search engine is also connected to the Internet. The entries are in the form of a full word index of the set of documents. The search engine also maintains a first abstract for each document that is indexed. The abstract is highly dependent on the content of each document; for example, the abstract can be a sketch or a feature vector. Periodically, a query is submitted to the search engine. The query locates a result set of documents that satisfy the query. A second abstract is generated for each document in the result set. The first and second abstracts are compared to identify documents that have changed between the time the set of documents was indexed and the time the result set is generated.
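The abstract leaves open how the per-document feature vector is built and how two abstracts are judged to differ. The following is a minimal Python sketch. It assumes the abstract is a lowercase word-count vector and that comparison uses a cosine-similarity threshold. The threshold of 0.99 is an illustrative value, not one taken from the patent.

```python
import math
import re
from collections import Counter


def feature_vector(text: str) -> Counter:
    # Hypothetical abstract: a sparse vector of lowercase word counts.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    # Cosine of the angle between two sparse term-count vectors.
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        # Two empty documents are identical; an empty and a non-empty one share nothing.
        return 1.0 if norm_a == norm_b else 0.0
    return dot / (norm_a * norm_b)


def has_changed(first: Counter, second: Counter, threshold: float = 0.99) -> bool:
    # Assumed rule: report a change when similarity drops below the threshold.
    return cosine_similarity(first, second) < threshold
```

Because the vector is computed from the whole text, any edit that adds, removes, or replaces words moves it. That is the property the abstract describes as being highly dependent on the content of each document.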
18 Claims
1. A computerized method for monitoring a set of documents resulting from a first query, the documents stored in memories of server computers, comprising the steps of:
submitting the first query to the search engine, the first query generating a result set of documents corresponding to the first query;
generating a first abstract for each member of the set in a search engine, the first abstract including a feature vector and being highly dependent on the features of the document;
submitting a second query to the search engine, the second query generating a result set of documents corresponding to the second query;
generating a second abstract for each member of the result set, the second abstract being a feature vector and being highly dependent on the features of the document; and
comparing the first abstract with the second abstract to identify documents that have changed between the time the set of documents was generated and the time the result set is generated.
2. A computerized method for monitoring a set of documents resulting from a first query, the documents stored in memories of server computers, comprising the steps of:
submitting a first query to the search engine, the first query generating a result set of documents corresponding to the first query;
generating a first abstract for each member of the result set in a search engine, wherein the first abstract is a sketch of the document and is highly dependent on the features of the document;
submitting a second query to the search engine, the second query generating a result set of documents corresponding to the second query;
generating a second abstract for each member of the result set, wherein the second abstract is a sketch of the document and is highly dependent on the features of the document; and
comparing the first abstract with the second abstract to identify documents that have changed between the time the set of documents was generated and the time the result set is generated.
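Claim 2 specifies that each abstract is a sketch of the document, but not which sketch. One well-known construction is a min-wise hashing (MinHash) sketch over word shingles. This example assumes that construction; the claim does not confirm it. The fraction of positions at which two sketches agree estimates the Jaccard resemblance of the two documents' shingle sets. The shingle width of 4 words, the sketch size of 64, and salted BLAKE2b as the hash family are all illustrative choices.

```python
import hashlib


def shingles(text: str, w: int = 4) -> set:
    # Contiguous w-word sequences; a text shorter than w words is one shingle.
    words = text.lower().split()
    if len(words) < w:
        return {" ".join(words)} if words else set()
    return {" ".join(words[i:i + w]) for i in range(len(words) - w + 1)}


def _hash(seed: int, shingle: str) -> int:
    # One member of an assumed hash family: 64-bit BLAKE2b salted with the seed.
    digest = hashlib.blake2b(
        shingle.encode(), digest_size=8, salt=seed.to_bytes(16, "little")
    ).digest()
    return int.from_bytes(digest, "big")


def sketch(text: str, size: int = 64) -> tuple:
    # For each of `size` hash functions, keep the minimum hash over all shingles.
    sh = shingles(text)
    if not sh:
        return tuple()
    return tuple(min(_hash(seed, s) for s in sh) for seed in range(size))


def resemblance(a: tuple, b: tuple) -> float:
    # Fraction of agreeing positions; estimates Jaccard similarity of shingle sets.
    if not a or not b:
        return float(a == b)
    return sum(x == y for x, y in zip(a, b)) / len(a)
```

A monitoring service would then treat a document as changed when the resemblance of its first and second sketches falls below a chosen threshold. An unchanged document always yields 1.0.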
5. A computerized method for monitoring a previously generated set of documents corresponding to a query from an end user, comprising:
submitting the query to a search engine;
receiving from the search engine a result set of documents and a result set of abstracts corresponding to the result set of documents, the abstracts including feature vectors and being highly dependent on the features of the documents;
comparing the result set of abstracts to a previous set of abstracts corresponding to the previously generated set of documents to identify documents that have changed between the time the previous set of abstracts was generated and the time the result set of abstracts was generated; and
notifying the end user of the changed documents.
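The comparison and notification steps of claim 5 can be sketched as follows. The sketch assumes each result set comes back as a mapping from document URL to abstract, and that two abstracts are compared for exact equality. The `search` and `notify` callables are hypothetical stand-ins for the search-engine interface and the user-notification channel; the claim specifies neither.

```python
from typing import Callable, Dict, List

# Assumed representation: URL -> abstract, where an abstract is a comparable tuple.
Abstracts = Dict[str, tuple]


def changed_documents(previous: Abstracts, current: Abstracts) -> List[str]:
    # A document has changed if it appears in both result sets with different abstracts.
    return sorted(
        url for url, abstract in current.items()
        if url in previous and previous[url] != abstract
    )


def monitor_once(
    query: str,
    search: Callable[[str], Abstracts],
    previous: Abstracts,
    notify: Callable[[str, List[str]], None],
) -> Abstracts:
    # Resubmit the query, compare abstracts, notify the end user, and return the
    # new abstracts so the caller can store them for the next round.
    current = search(query)
    changed = changed_documents(previous, current)
    if changed:
        notify(query, changed)
    return current
```

Documents that appear only in the new result set are ignored here, because claim 5 concerns changed documents.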
8. A computerized method for communicating with a query monitoring service comprising:
retrieving documents stored in server computers connected to each other by a network;
generating entries in a search engine, the entries in a form of a full word index, corresponding to the documents;
generating abstracts corresponding to the documents, the abstracts including feature vectors and being highly dependent on the features of the documents;
receiving a query from the query monitoring service;
locating a result set of documents that satisfy the query;
sending entries, the entries in a form of a full word index, corresponding to the result set of documents to the query monitoring service; and
sending abstracts corresponding to the result set of documents to the query monitoring service.
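On the search-engine side (claim 8), a full word index is an inverted index that maps each word to the documents containing it. The following is a toy in-memory sketch. It assumes a simple lowercase tokenizer, conjunctive (all-words) query semantics, and a caller-supplied `make_abstract` function. None of these details come from the claim.

```python
import re
from collections import defaultdict
from typing import Callable, Dict, List, Set


def words(text: str) -> List[str]:
    # Lowercase alphanumeric tokens; a deliberately simple tokenizer.
    return re.findall(r"[a-z0-9]+", text.lower())


class SearchEngine:
    """Toy engine holding a full word (inverted) index and one abstract per URL."""

    def __init__(self, make_abstract: Callable[[str], object]) -> None:
        self.make_abstract = make_abstract
        self.index: Dict[str, Set[str]] = defaultdict(set)  # word -> URLs containing it
        self.doc_words: Dict[str, Set[str]] = {}              # URL -> its indexed words
        self.abstracts: Dict[str, object] = {}                # URL -> abstract

    def add(self, url: str, text: str) -> None:
        # Re-indexing a URL first drops its old postings so the index tracks the latest fetch.
        for w in self.doc_words.pop(url, set()):
            self.index[w].discard(url)
        terms = set(words(text))
        self.doc_words[url] = terms
        for w in terms:
            self.index[w].add(url)
        self.abstracts[url] = self.make_abstract(text)

    def entries(self, url: str) -> List[str]:
        # Index entries (words) for one document, as sent to the monitoring service.
        return sorted(self.doc_words.get(url, set()))

    def query(self, q: str) -> Dict[str, object]:
        # Conjunctive query: URLs containing every query word, mapped to their abstracts.
        terms = words(q)
        if not terms:
            return {}
        hits = set.intersection(*(self.index.get(t, set()) for t in terms))
        return {url: self.abstracts[url] for url in sorted(hits)}
```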
9. A computer system integrating a query monitoring service that communicates with at least one search engine, comprising:
a) a query monitoring service including:
a software portion configured to receive queries from end users, a software portion configured to submit the queries to at least one search engine at predefined intervals, a software portion configured to receive from at least one search engine a result set of documents and a corresponding set of abstracts, the abstracts including feature vectors and being highly dependent on the features of the documents, a software portion configured to compare the set of abstracts to previous sets of abstracts to identify documents that have changed, and a software portion configured to notify end users of the changed documents; and
b) at least one search engine, each search engine having:
a spider that periodically scans a plurality of server computers for changed or new documents, a query interface that processes queries submitted by the query monitoring service, a software portion configured to generate abstracts for each of the documents, and a software portion configured to deliver a result set of documents and a corresponding set of abstracts to the query monitoring service.
10. A query monitoring service computer system, comprising:
a software portion configured to receive queries from end users;
a software portion configured to submit the queries to at least one search engine at predefined intervals;
a software portion configured to receive from at least one search engine a result set of documents and a corresponding set of abstracts, the abstracts including feature vectors and being highly dependent on the features of the documents;
a software portion configured to compare the set of abstracts to previous sets of abstracts to identify documents that have changed; and
a software portion configured to notify end users of the changed documents.
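The query monitoring service of claim 10 can be sketched as a small scheduler, under several assumptions. Each registered query carries its own resubmission interval, and the first submission only records a baseline. `search` returns a URL-to-abstract mapping, and `notify` is a hypothetical delivery hook (for instance, e-mail). The clock is injectable so the scheduling logic can be exercised without waiting.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class MonitoredQuery:
    user: str
    query: str
    interval_s: float  # predefined resubmission interval (assumed per query)
    next_run: float = 0.0
    previous: Dict[str, object] = field(default_factory=dict)


class QueryMonitoringService:
    def __init__(
        self,
        search: Callable[[str], Dict[str, object]],      # hypothetical search-engine call
        notify: Callable[[str, str, List[str]], None],   # hypothetical: (user, query, urls)
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.search = search
        self.notify = notify
        self.clock = clock
        self.queries: List[MonitoredQuery] = []

    def register(self, user: str, query: str, interval_s: float) -> None:
        # The first submission establishes the baseline abstracts; nothing is reported.
        mq = MonitoredQuery(user, query, interval_s)
        mq.previous = self.search(query)
        mq.next_run = self.clock() + interval_s
        self.queries.append(mq)

    def tick(self) -> None:
        # Resubmit every query whose interval has elapsed, compare, and notify.
        now = self.clock()
        for mq in self.queries:
            if now < mq.next_run:
                continue
            current = self.search(mq.query)
            changed = sorted(
                url for url, abstract in current.items()
                if url in mq.previous and mq.previous[url] != abstract
            )
            if changed:
                self.notify(mq.user, mq.query, changed)
            mq.previous = current
            mq.next_run = now + mq.interval_s
```

In a deployment, `tick()` would be called periodically from a timer or loop.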
11. A computer program product comprising:
a computer usable medium having computer readable program code means embodied in the medium for monitoring documents corresponding to a query from an end user, the computer program product including:
computer readable program code means to receive the query from the end user;
computer readable program code means to submit the query to at least one search engine at predefined intervals;
computer readable program code means to receive from at least one search engine a set of documents and a corresponding set of abstracts, the abstracts including feature vectors and being highly dependent on the features of the documents;
computer readable program code means to compare the set of abstracts to previous sets of abstracts to identify documents that have changed; and
computer readable program code means to notify the end user of the changed documents.
a computer usable medium having computer readable program code means embodied in the medium for communicating with a query monitoring service, the computer program product including:
computer readable program code means to periodically scan a plurality of server computers for changed or new documents;
computer readable program code means to process queries submitted by the query monitoring service;
computer readable program code means to generate abstracts for each of the documents, the abstracts including feature vectors and being highly dependent on the features of the documents; and
computer readable program code means to deliver a result set of documents and a corresponding set of abstracts to the query monitoring service.
13. A computerized method for monitoring a set of documents resulting from a first query, the documents stored in memories of server computers, comprising the steps of:
submitting the first query to the search engine, the first query generating a result set of documents corresponding to the first query;
generating a first abstract for each member of the set in a search engine, the first abstract being a feature vector and being highly dependent on the features of the document;
submitting a second query to the search engine, the second query generating a result set of documents corresponding to the second query;
generating a second abstract for each member of the result set, the second abstract being a feature vector and being highly dependent on the features of the document; and
comparing the first abstract with the second abstract to identify documents new in the result set.
14. A computerized method for monitoring a previously generated set of documents corresponding to a query from an end user, comprising:
submitting the query to a search engine;
receiving from the search engine a result set of documents and a result set of abstracts corresponding to the result set of documents, the abstracts including feature vectors and being highly dependent on the features of the documents;
comparing the result set of abstracts to a previous set of abstracts corresponding to the previously generated set of documents to identify documents new in the result set; and
notifying the end user of the new documents.
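Claims 13 and 14 differ from the earlier claims only in what the comparison reports: documents that appear in the current result set but had no abstract in the previous one. Assuming each result set is a URL-to-abstract mapping, detection reduces to a set difference over the keys. The `notify` callable is a hypothetical stand-in for the notification channel.

```python
from typing import Callable, Dict, List


def new_documents(previous: Dict[str, object], current: Dict[str, object]) -> List[str]:
    # URLs returned now that were absent from the previous result set.
    return sorted(current.keys() - previous.keys())


def report_new(
    user: str,
    previous: Dict[str, object],
    current: Dict[str, object],
    notify: Callable[[str, List[str]], None],
) -> List[str]:
    # Notify the end user only when at least one new document was found.
    new = new_documents(previous, current)
    if new:
        notify(user, new)
    return new
```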
15. A computerized method for communicating with a query monitoring service comprising:
retrieving documents stored in server computers connected to each other by a network;
generating entries in a search engine, the entries in a form of a full word index, corresponding to the documents;
generating abstracts corresponding to the documents, the abstracts including feature vectors and being highly dependent on the features of the documents;
receiving a query from the query monitoring service;
locating a result set of documents that satisfy the query;
sending entries, the entries in a form of a full word index, corresponding to the result set of documents to the query monitoring service;
sending abstracts corresponding to the result set of documents to the query monitoring service; and
comparing the first abstract with the second abstract to identify documents new in the result set.
16. A computer system integrating a query monitoring service that communicates with at least one search engine, comprising:
a) a query monitoring service including:
a software portion configured to receive queries from end users, a software portion configured to submit the queries to at least one search engine at predefined intervals, a software portion configured to receive from at least one search engine a result set of documents and a corresponding set of abstracts, the abstracts including feature vectors and being highly dependent on the features of the documents, a software portion configured to compare the set of abstracts to previous sets of abstracts to identify documents new in the result set, and a software portion configured to notify end users of the new documents; and
b) at least one search engine, each search engine having:
a spider that periodically scans a plurality of server computers for changed or new documents, a query interface that processes queries submitted by the query monitoring service, a software portion configured to generate abstracts for each of the documents, and a software portion configured to deliver a result set of documents and a corresponding set of abstracts to the query monitoring service.
17. A query monitoring service computer system, comprising:
a software portion configured to receive queries from end users;
a software portion configured to submit the queries to at least one search engine at predefined intervals;
a software portion configured to receive from at least one search engine a result set of documents and a corresponding set of abstracts, the abstracts including feature vectors and being highly dependent on the features of the documents;
a software portion configured to compare the set of abstracts to previous sets of abstracts to identify documents new in the result set; and
a software portion configured to notify end users of the new documents.
18. A computer program product comprising:
a computer usable medium having computer readable program code means embodied in the medium for monitoring documents corresponding to a query from an end user, the computer program product including:
computer readable program code means to receive the query from the end user;
computer readable program code means to submit the query to at least one search engine at predefined intervals;
computer readable program code means to receive from at least one search engine a set of documents and a corresponding set of abstracts, the abstracts including feature vectors and being highly dependent on the features of the documents;
computer readable program code means to compare the set of abstracts to previous sets of abstracts to identify documents new in the set; and
computer readable program code means to notify the end user of the new documents.
Specification