Method and system for generating a document summary
Abstract
A text document is segmented into word and sentence information when the document is first presented and indexed. A memory stream is generated for the document. The memory stream includes document title information, word offsets, sentence offsets, the alternate list, and the contents of the document. The memory stream is used to determine which sentences in the document include query terms. The sentences that include query terms are ranked according to a ranking algorithm. The ranking algorithm determines which sentences include the highest number of query terms and the number of occurrences of the query terms in each sentence. A predetermined number of sentences that together contain as many query terms as possible are selected such that the sentences that are most representative of the document with respect to the query are included in the summary. The summary is generated at query time by concatenating the selected sentences with the query terms highlighted.
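As a rough illustration of the indexing step described above, the following Python sketch builds a simplified memory stream holding the title, the contents, word offsets, and sentence offsets. The names `MemoryStream` and `build_memory_stream`, the regular expressions used for word and sentence boundaries, and the omission of the alternate list are all assumptions made for this example; the abstract does not specify how the word breaker identifies boundaries.

```python
import re
from dataclasses import dataclass, field


@dataclass
class MemoryStream:
    """Hypothetical, simplified memory stream (the alternate list is omitted)."""
    title: str
    contents: str
    word_offsets: list = field(default_factory=list)      # (start, end) per word
    sentence_offsets: list = field(default_factory=list)  # (start, end) per sentence


def build_memory_stream(title, text):
    """Segment text once, at index time, and record the offsets."""
    stream = MemoryStream(title=title, contents=text)
    # Assumed sentence rule: a run of text ending in '.', '!' or '?',
    # or running to the end of the text.
    for m in re.finditer(r"[^.!?]+[.!?]?", text):
        if m.group().strip():
            stream.sentence_offsets.append((m.start(), m.end()))
    # Assumed word rule: any run of word characters.
    for m in re.finditer(r"\w+", text):
        stream.word_offsets.append((m.start(), m.end()))
    return stream


stream = build_memory_stream("Pets", "Cats purr. Dogs bark loudly!")
```

Because the offsets are computed once at index time, a summarizer working at query time can slice sentences out of `contents` without segmenting the document again.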
20 Claims
1. A computer-implemented method for generating a document summary, comprising:
segmenting the document into document information when the document is indexed;
generating a memory stream using the document information;
comparing words in the memory stream to query terms;
ranking the sentences that include a word that matches a query term, wherein the sentences are ranked according to the number of words in each sentence that match a query term and the number of occurrences of the query terms in each sentence; and
generating the summary with a predetermined number of the sentences that together include as many query term matches as possible.
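The ranking and selection steps of claim 1 can be sketched in Python as follows. This is one possible reading, not the patented implementation: the score is assumed to be a pair (number of distinct query terms matched, total occurrences of query terms), and the "as many query term matches as possible" requirement is approximated with a greedy pass that prefers sentences covering query terms not yet covered. The function names `tokenize`, `score`, and `summarize` are hypothetical.

```python
import re


def tokenize(text):
    """Lowercase word tokens; a stand-in for the word breaker."""
    return re.findall(r"\w+", text.lower())


def score(sentence, query_terms):
    """Return (distinct query terms matched, total query-term occurrences)."""
    hits = [w for w in tokenize(sentence) if w in query_terms]
    return len(set(hits)), len(hits)


def summarize(sentences, query, max_sentences=2):
    """Greedily pick up to max_sentences sentences covering the query terms."""
    terms = set(tokenize(query))
    # Only sentences containing at least one query term are candidates.
    candidates = [i for i, s in enumerate(sentences) if score(s, terms)[1] > 0]
    chosen, covered = [], set()
    while candidates and len(chosen) < max_sentences:
        def key(i):
            matched = set(tokenize(sentences[i])) & terms
            distinct, total = score(sentences[i], terms)
            # Prefer new coverage first, then the two ranking counts.
            return (len(matched - covered), distinct, total)
        best = max(candidates, key=key)
        chosen.append(best)
        covered |= set(tokenize(sentences[best])) & terms
        candidates.remove(best)
    # Emit the selected sentences in document order.
    return [sentences[i] for i in sorted(chosen)]


sentences = [
    "The cat sat on the mat.",
    "A dog chased the cat and the cat ran.",
    "Dogs and cats are pets.",
    "The dog barked at the dog next door.",
]
summary = summarize(sentences, "cat dog")
```

In this sketch exact token matching is used, so "Dogs" and "cats" in the third sentence do not match the query terms; handling such variants is the role of the alternate forms recited in claim 18.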
9. A system for generating a document summary, comprising:
a word breaker that is arranged to segment the document into document information when the document is indexed;
a summarization plug-in that is arranged to generate a memory stream using the document information; and
a summarizer that is arranged to:
compare words in the memory stream to query terms;
rank the sentences that include a word that matches a query term, wherein the sentences are ranked according to the number of words in each sentence that match a query term and the number of occurrences of the query terms in each sentence; and
generate the summary with a predetermined number of the sentences that together include as many query term matches as possible.
18. A computer-readable medium having stored thereon a data structure, the data structure comprising:
a first field containing data representing the contents of a document;
a second field containing data representing alternate forms of words in the document; and
a third field containing data representing word offsets of the document, wherein the third field includes an alternate bit that associates the word with an alternate form of the word in the second field when the alternate bit is set.
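The three fields of claim 18 might be laid out as in the Python sketch below. The encoding is an assumption made for illustration: each word-offset entry is treated as a 32-bit value whose top bit serves as the alternate bit, and alternate forms are assumed to be stored in document order so that each flagged word consumes the next entry. The names `ALT_BIT`, `SummaryStream`, and `words_with_alternates` are hypothetical; the claim does not fix a bit position or a lookup scheme.

```python
from dataclasses import dataclass

ALT_BIT = 1 << 31  # assumed position of the alternate bit in an offset entry


@dataclass
class SummaryStream:
    contents: str       # first field: contents of the document
    alternates: list    # second field: alternate forms of words, in document order
    word_offsets: list  # third field: word start offsets, optionally OR-ed with ALT_BIT


def words_with_alternates(stream):
    """Return (start_offset, alternate_or_None) for each word entry."""
    alt_iter = iter(stream.alternates)
    result = []
    for entry in stream.word_offsets:
        start = entry & ~ALT_BIT  # clear the flag to recover the offset
        # A set alternate bit links this word to the next alternate form.
        alt = next(alt_iter) if entry & ALT_BIT else None
        result.append((start, alt))
    return result


stream = SummaryStream(
    contents="the mice ran",
    alternates=["mouse", "run"],
    word_offsets=[0, 4 | ALT_BIT, 9 | ALT_BIT],
)
pairs = words_with_alternates(stream)
```

Packing the flag into the offset entry keeps the third field a flat array of integers, so words without alternates cost no extra storage.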
Specification