Method and apparatus for rapidly producing document summaries and document browsing aids
Abstract
Disclosed is a computer-assisted method for generating a summary of or a browsing aid for a document. At an index creation time, information that is relevant to at least one dummy query and is necessary to compile at least one temporary summary for the summary or browsing aid is extracted from a document and cached for later use. The information may be compiled into the summary and saved as such. At a search time, the summary or browsing aid is generated using the information that was cached at index creation time. An apparatus for performing this computer-assisted method is also disclosed.
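The two phases described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the names `build_cache` and `summarize`, the term-overlap relevance measure, the naive sentence splitter, and the in-memory dictionary used as a cache are all hypothetical choices made for the example.

```python
import re

def split_sentences(text):
    # Naive splitter (assumption): break after ., ! or ? followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def terms(text):
    # Lowercased alphanumeric tokens of a string.
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def build_cache(doc_id, text, dummy_queries, cache):
    """Index creation time: for each dummy query, cache only the best sentence."""
    sentences = split_sentences(text)
    if not sentences:
        return
    for query in dummy_queries:
        q = terms(query)
        # Relevance here is plain term overlap (assumption, not from the source).
        best = max(sentences, key=lambda s: len(q & terms(s)))
        cache[(doc_id, query)] = best

def summarize(doc_id, query, cache):
    """Search time: answer from the cache without reopening the document."""
    return cache.get((doc_id, query))

text = "Cats sleep most of the day. Dogs enjoy long walks. Birds sing at dawn."
cache = {}
build_cache("d1", text, ["dogs walks", "birds dawn"], cache)
print(summarize("d1", "dogs walks", cache))
```

The point of the split is that the full document is read only inside `build_cache`; `summarize` touches nothing but the much smaller cached entries.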
37 Claims
-
1. A computer-assisted method for generating a summary of a document, comprising the steps of:
-
at an index creation time, with access to the entire document extracting from the document information that is relevant to at least one dummy query and is necessary to compile at least one temporary summary, and caching at least part of the information comprising substantially less than the entire document; and
at a later search time, generating the summary from the information cached.
the information is extracted by the step of assigning at least one score to each sentence in the document according to the relevance of the sentence to the at least one dummy query, wherein at least the highest scoring sentence is extracted.
-
-
15. The computer-assisted method according to claim 14, wherein a pre-defined number of the highest scoring sentences are extracted.
-
16. The computer-assisted method according to claim 14, wherein the highest scoring sentences that have a score greater than a threshold are extracted.
-
17. The computer-assisted method according to claim 16, wherein up to a pre-defined number of the highest scoring sentences is extracted.
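The sentence-selection variants in the scoring claims above (highest-scoring sentence, a pre-defined number of sentences, a score threshold, and a threshold combined with a cap) can be illustrated with one small helper. This is a sketch only: the functions `score_sentences` and `select`, and the use of shared-term counts as the score, are assumptions for illustration and are not taken from the claims.

```python
def score_sentences(sentences, dummy_query):
    # Score = number of dummy-query terms found in the sentence (assumption).
    q = set(dummy_query.lower().split())
    scored = []
    for pos, s in enumerate(sentences):
        words = set(w.strip(".,;:!?").lower() for w in s.split())
        scored.append((len(q & words), pos, s))
    return scored

def select(scored, max_count=None, threshold=None):
    # Rank by score, highest first; ties keep document order.
    ranked = sorted(scored, key=lambda t: (-t[0], t[1]))
    if threshold is not None:
        # Keep only sentences scoring strictly above the threshold.
        ranked = [t for t in ranked if t[0] > threshold]
    if max_count is not None:
        # Keep at most max_count sentences.
        ranked = ranked[:max_count]
    return [s for _, _, s in ranked]

sentences = [
    "Solar panels convert light into power.",
    "The weather was mild.",
    "Panels lose power in shade.",
    "Solar output peaks at noon.",
]
scored = score_sentences(sentences, "solar panels power")
print(select(scored, max_count=1))               # highest-scoring sentence only
print(select(scored, max_count=2))               # pre-defined number
print(select(scored, threshold=1))               # above a threshold
print(select(scored, threshold=0, max_count=1))  # threshold plus cap
```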
-
18. The computer-assisted method according to claim 14, wherein:
-
the information is extracted by the steps of:
assigning at least one score to each sentence in the document according to the relevance of the sentence to the at least one dummy query, and extracting all of the sentences of a paragraph of the document which contains a number of the highest scoring sentences, and the summary generated is the paragraph.
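One possible reading of the paragraph-level extraction in claim 18 is sketched below: score every sentence, take the top few, and return the whole paragraph that holds the most of them. The function `best_paragraph`, the blank-line paragraph delimiter, the `top_k` parameter, and the term-overlap score are assumptions for illustration; the claim itself does not specify them.

```python
import re

def best_paragraph(document, dummy_query, top_k=3):
    q = set(dummy_query.lower().split())
    # Paragraphs are assumed to be separated by blank lines.
    paragraphs = [p.strip() for p in document.split("\n\n") if p.strip()]
    scored = []  # (sentence score, index of the paragraph containing it)
    for i, para in enumerate(paragraphs):
        for sent in re.split(r"(?<=[.!?])\s+", para):
            words = set(re.findall(r"[a-z0-9]+", sent.lower()))
            scored.append((len(q & words), i))
    # Take the top_k highest-scoring sentences (stable sort keeps document order on ties).
    top = sorted(scored, key=lambda t: -t[0])[:top_k]
    counts = {}
    for score, i in top:
        if score > 0:
            counts[i] = counts.get(i, 0) + 1
    if not counts:
        return None  # no sentence shares a term with the dummy query
    # Paragraph holding the most top sentences; earlier paragraph wins ties.
    winner = max(counts, key=lambda i: (counts[i], -i))
    return paragraphs[winner]
```

Returning the full paragraph trades a longer cached entry for a summary that reads as continuous text.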
-
-
19. The computer-assisted method according to claim 14, wherein the score assigned to each sentence is based upon the similarity of the sentence to sentences of documents in a results document list created by execution of the at least one dummy query, and the dummy query contains terms most likely to be contained in a user query searching for the document.
-
20. The computer-assisted method according to claim 1, wherein the cached information is stored in an inverted document index with a document pointer and document attributes.
-
21. The computer-assisted method according to claim 1, wherein the cached information is stored in a query table.
-
22. The computer-assisted method according to claim 1, further including the steps of:
-
at index creation time, compiling the at least one temporary summary from the extracted information, and caching the at least one temporary summary as part of the information cached, wherein the summary generated is one of the at least one temporary summaries.
-
-
23. The computer-assisted method according to claim 1, further including the steps of:
-
generating a link that associates the information with at least one position within the document to which the information relates; and
caching the link.
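The link described in claim 23 could, for instance, be a pair of character offsets stored alongside each cached sentence. The sketch below assumes that representation; the function `extract_with_links`, the dictionary layout, and the regular-expression sentence matcher are hypothetical and not drawn from the claim.

```python
import re

def extract_with_links(text, dummy_query):
    q = set(dummy_query.lower().split())
    entries = []
    # Assumption: a sentence is a run of text ending in ., ! or ?
    for m in re.finditer(r"[^.!?]+[.!?]", text):
        raw = m.group()
        sentence = raw.strip()
        words = set(re.findall(r"[a-z0-9]+", sentence.lower()))
        if q & words:
            # Skip leading whitespace so the link points at the first character.
            start = m.start() + (len(raw) - len(raw.lstrip()))
            entries.append({"sentence": sentence, "link": (start, m.end())})
    return entries

text = "Rivers carry sediment. Deltas form where rivers slow. Lakes are still."
for entry in extract_with_links(text, "deltas"):
    start, end = entry["link"]
    print(entry["sentence"], "->", text[start:end])
```

With offsets cached, a browsing aid can jump to the sentence inside the original document without rescanning it.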
-
-
24. The computer-assisted method according to claim 1, wherein:
-
the information extracted is relevant to at least two dummy queries each made up of at least one term and is separately cached for each dummy query, and the summary is generated from the information cached for the dummy query that substantially matches a user query having at least one term.
-
-
25. The computer-assisted method according to claim 24, wherein the information extracted is a summary or abstract coded in the document.
-
26. The computer-assisted method according to claim 24, wherein the information extracted is a set of results pages for each dummy query.
-
27. The computer-assisted method according to claim 24, wherein:
-
the information is extracted by steps of assigning at least one score to each sentence in the document according to the relevance of the sentence to the at least one dummy query, and wherein at least the highest scoring sentence is extracted.
-
-
28. The computer-assisted method according to claim 24, wherein:
-
the information extracted includes a label consisting of each term of the corresponding dummy query; and
the summary generated consists of the sentences associated with the dummy query in which the terms of the label substantially match terms of the user query.
-
-
29. The computer-assisted method according to claim 24, wherein:
-
the information cached includes one document summary generated from the information extracted for each of the at least two dummy queries, and a label consisting of each term of the corresponding dummy query, and the summary generated consists of the document summary associated with the dummy query in which the terms of the label substantially match the terms of the user query.
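Claims 24, 28 and 29 describe caching one entry per dummy query, labelled with that query's terms, and choosing the entry whose label substantially matches the user query. The claims do not define "substantially matches"; the sketch below assumes a Jaccard overlap with a cut-off of 0.5, and the names `label_of`, `cache_summaries`, `pick_summary` and `min_overlap` are hypothetical.

```python
def label_of(query):
    # Label = the set of terms in the query.
    return frozenset(query.lower().split())

def cache_summaries(summaries_by_query):
    # Index creation time: key each precomputed summary by its dummy-query label.
    return {label_of(q): s for q, s in summaries_by_query.items()}

def pick_summary(user_query, cached, min_overlap=0.5):
    # Search time: choose the label with the highest Jaccard overlap (assumption).
    user = label_of(user_query)
    best_label, best_score = None, 0.0
    for label in cached:
        union = user | label
        score = len(user & label) / len(union) if union else 0.0
        if score > best_score:
            best_label, best_score = label, score
    if best_label is None or best_score < min_overlap:
        return None  # no dummy query is close enough to the user query
    return cached[best_label]

cached = cache_summaries({
    "battery life": "The battery lasts ten hours.",
    "screen resolution": "The screen is 1080p.",
})
print(pick_summary("laptop battery life", cached))
```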
-
-
30. The computer-assisted method according to claim 24, wherein the information is sentences, and the computer-assisted method further includes the steps of:
-
generating links that associate the dummy query terms with the locations in the documents of the sentences that contain the terms; and
caching the links.
-
-
31. A computer-assisted method for generating a summary of a collection of documents, comprising the steps of:
-
at an index creation time, for one or more documents in the collection of documents with access to the entire document, extracting information that is relevant to at least one dummy query and is necessary to compile one or more temporary summaries for each of the one or more documents, compiling the one or more temporary summaries comprising substantially less than the entire document from the extracted information, caching the one or more temporary summaries; and
at a later search time, generating the summary for the collection of documents from the cached one or more temporary summaries.
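For the collection-level method of claim 31, one simple way to combine cached per-document summaries at search time is plain concatenation, as sketched here. The function `collection_summary`, the `max_docs` parameter, and the bracketed output format are assumptions for illustration; the claim does not say how the temporary summaries are combined.

```python
def collection_summary(doc_ids, summary_cache, max_docs=3):
    # Search time: stitch together the cached temporary summaries of the
    # first max_docs documents; documents with no cached summary are skipped.
    parts = []
    for doc_id in doc_ids[:max_docs]:
        summary = summary_cache.get(doc_id)
        if summary:
            parts.append(f"[{doc_id}] {summary}")
    return "\n".join(parts)

summary_cache = {"a": "Alpha text.", "b": "Beta text."}
print(collection_summary(["a", "c", "b"], summary_cache))
```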
-
-
32. A computer-assisted method for generating a query-biased document browsing aid, comprising the steps of:
-
at an index time, with access to the entire document extracting information that is relevant to at least one dummy query and is necessary to compile the browsing aid from the document;
caching at least part of the information comprising substantially less than the entire document; and
at a later search time, generating the browsing aid from the information cached.
-
-
37. An apparatus to enable a method for generating a summary of at least one document, comprising:
-
a means for extracting information from an entire document that is relevant to at least one dummy query and is necessary to compile at least one temporary summary from the at least one document at an index creation time;
a means for caching at least part of the information comprising substantially less than the entire document at an index creation time; and
a means for generating the summary from the information cached at a search time.
-
Specification