METHOD AND APPARATUS FOR GENERATING A LANGUAGE INDEPENDENT DOCUMENT ABSTRACT
First Claim
1. A method of automatic, computer based identification of a significant phrase in a document, the method comprising:
storing the document, a threshold score, a verbosity setting, and a significant phrases data structure in a memory;
accessing the memory to read a sequence of words from the document;
determining by a processing unit a score for each word in the sequence based on the length of each word;
operating the processing unit to compare the score for each word in the sequence against the threshold score;
adding the sequence of words as a significant phrase to the significant phrase data structure if the number of words in the sequence that have the score greater than the threshold score equals or exceeds a predetermined number and the number of words in the sequence satisfies the verbosity setting;
retrieving a sentence from the document in the memory if the sentence contains a significant phrase stored in the significant phrases data structure; and
operating the processing unit to search an abstract of the document to determine whether the sentence is included in the abstract.
3 Assignments
0 Petitions
Abstract
A method of extracting significant phrases from one or more documents stored in a computer-readable medium. A sequence of words is read from the one or more documents, and a score is determined for each word in the sequence based on the length of the word. The score for each word in the sequence is compared against a threshold score. The sequence of words is indicated to be a significant phrase if the number of words in the sequence that have a score greater than the threshold score equals or exceeds a predetermined number. If the sequence of words is a significant phrase, a sentence containing the sequence of words is retrieved from the document. An abstract of the document is searched to determine whether the sentence has previously been included in the abstract. If not, the sentence is added to the abstract.
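The flow summarized in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation. Several details are assumptions made only for illustration: the score is taken to be the raw character count of a word, phrases are fixed-size windows of words, sentences are split naively on terminal punctuation, and the names `word_score`, `is_significant`, `build_abstract` and all default values are hypothetical.

```python
import re

def word_score(word: str) -> int:
    """Assumed scoring rule: the score is the word's character count."""
    return len(word)

def is_significant(words, threshold, min_long_words):
    """True if enough words in the sequence score above the threshold."""
    hits = sum(1 for w in words if word_score(w) > threshold)
    return hits >= min_long_words

def build_abstract(document, threshold=6, min_long_words=2, window=3):
    """Collect each sentence that contains at least one significant phrase."""
    abstract = []
    # Naive sentence split after '.', '!' or '?' (an assumption, not from the source).
    for sentence in re.split(r"(?<=[.!?])\s+", document.strip()):
        words = re.findall(r"\w+", sentence)
        # Slide a fixed-size window of words across the sentence.
        found = any(
            is_significant(words[i:i + window], threshold, min_long_words)
            for i in range(max(1, len(words) - window + 1))
        )
        # Search the abstract first so that a sentence is never added twice.
        if found and sentence not in abstract:
            abstract.append(sentence)
    return abstract
```

Because the test relies only on word length, it needs no dictionary or stop-word list, which is consistent with the "language independent" wording of the title.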
48 Claims
1. A method of automatic, computer based identification of a significant phrase in a document, the method comprising:
storing the document, a threshold score, a verbosity setting, and a significant phrases data structure in a memory;
accessing the memory to read a sequence of words from the document;
determining by a processing unit a score for each word in the sequence based on the length of each word;
operating the processing unit to compare the score for each word in the sequence against the threshold score;
adding the sequence of words as a significant phrase to the significant phrase data structure if the number of words in the sequence that have the score greater than the threshold score equals or exceeds a predetermined number and the number of words in the sequence satisfies the verbosity setting;
retrieving a sentence from the document in the memory if the sentence contains a significant phrase stored in the significant phrases data structure; and
operating the processing unit to search an abstract of the document to determine whether the sentence is included in the abstract.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 35, 36, 37
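Claim 1 adds a second condition to the length-based test: the number of words in the sequence must satisfy the verbosity setting. The sketch below shows one way to combine the two conditions. The claim text here does not define the verbosity setting, so treating it as a maximum phrase length in words (`max_words`) is purely an assumption, as are the names `PhraseConfig`, `qualifies`, `collect_phrases` and the default values.

```python
from dataclasses import dataclass

@dataclass
class PhraseConfig:
    threshold: int = 6       # score a word must exceed to count
    min_long_words: int = 2  # the "predetermined number" (assumed value)
    max_words: int = 4       # hypothetical reading of the verbosity setting

def qualifies(words, cfg: PhraseConfig) -> bool:
    """Apply both conditions of the claim to one sequence of words."""
    # Assumption: a word's score is its character count.
    hits = sum(1 for w in words if len(w) > cfg.threshold)
    return hits >= cfg.min_long_words and len(words) <= cfg.max_words

def collect_phrases(words, cfg: PhraseConfig, size: int):
    """Add every qualifying window of `size` words to a list of phrases."""
    phrases = []
    for i in range(len(words) - size + 1):
        seq = tuple(words[i:i + size])
        if qualifies(seq, cfg) and seq not in phrases:
            phrases.append(seq)
    return phrases
```

Under this reading, a window longer than `max_words` is rejected even when it contains enough long words.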
13. A method of identifying a significant phrase in a document, the method comprising:
storing the document, a threshold score, a verbosity setting, and a significant phrases data structure in a memory;
accessing the memory to read a sequence of words from the document;
determining by a processing unit a score for each word in the sequence based on the length of each word;
operating the processing unit to compare the score for each word in the sequence against a threshold score;
adding the sequence of words as a significant phrase to the significant phrase data structure if the number of words in the sequence that have the score greater than the threshold score equals or exceeds a predetermined number and the number of words in the sequence satisfies the verbosity setting;
storing the sequence of words in the significant phrases data structure and storing the number of words in the sequence in the memory, if the sequence of words is a significant phrase.
Dependent claims: 14, 15, 16, 17
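Claim 13 differs from claim 1 in its final step: the phrase is stored together with its word count. A minimal sketch of that step follows, assuming the significant phrases data structure is a plain dictionary mapping the phrase text to its number of words. The function name `store_if_significant`, the dictionary layout, and the default values (including a verbosity setting read as a maximum word count) are all hypothetical.

```python
def store_if_significant(words, store, threshold=6, min_long_words=2, max_words=4):
    """Store the phrase and its word count when both conditions hold.

    `store` stands in for the significant phrases data structure; a dict
    from phrase text to word count is an assumption, not from the source.
    """
    # Assumption: a word's score is its character count.
    hits = sum(1 for w in words if len(w) > threshold)
    if hits >= min_long_words and len(words) <= max_words:
        store[" ".join(words)] = len(words)
        return True
    return False
```

Keeping the count next to the phrase avoids re-splitting the stored text when the phrase length is needed later.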
18. A tangible computer readable storage medium containing executable instructions which, if executed in a processing system, cause the system to perform a method for identifying a significant phrase in a document, the method comprising:
reading a sequence of words from the document;
determining a score for each word in the sequence based on the length of each word;
comparing the score for each word in the sequence against a threshold score;
indicating that the sequence of words is a significant phrase if the number of words in the sequence that have the score greater than the threshold score equals or exceeds a predetermined number and the number of words in the sequence satisfies the verbosity setting;
retrieving a sentence from the document, the sentence containing the sequence of words, if the sequence of words is a significant phrase; and
searching an abstract of the document to determine whether the sentence is included in the abstract.
Dependent claims: 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29
30. A tangible computer readable storage medium containing executable instructions which, if executed in a processing system, cause the system to perform a method for identifying a significant phrase in a document, the method comprising:
reading a sequence of words from the document;
determining a score for each word in the sequence based on the length of each word;
comparing the score for each word in the sequence against a threshold score;
indicating that the sequence of words is a significant phrase if the number of words in the sequence that have the score greater than the threshold score equals or exceeds a predetermined number and the number of words in the sequence satisfies a verbosity setting;
storing the sequence of words and the number of words in the sequence, if the sequence of words is a significant phrase.
Dependent claims: 31, 32, 33, 34
38. A processing system configured to create an abstract for a document comprising:
a memory storing the document, a threshold score, a verbosity setting, and a significant phrases data structure;
a processing unit;
a bus operably coupling the processing unit to the memory;
the processing unit configured to:
read a sequence of words from the document from the memory via the bus and a document pointer indicating a portion of the memory to read;
determine a score for each word in the sequence based at least in part on the length of each word;
compare the score for each word in the sequence against the threshold score stored in the memory;
add the sequence of words as a significant phrase to the significant phrases data structure if the number of words in the sequence that have the score greater than the threshold score equals or exceeds a predetermined number and the number of words in the sequence satisfies the verbosity setting;
retrieve a sentence from the document from the memory via the bus if the sentence contains a significant phrase stored in the significant phrases data structure; and
search the abstract of the document to determine whether the sentence is included in the abstract.
Dependent claims: 39, 40, 41, 42, 43, 44, 45, 46, 47, 48
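Claim 38 reads words through a document pointer that indicates which portion of the memory to read. The sketch below models that idea in software only, assuming the pointer is a character offset into an in-memory string and that words are runs of word characters. The function `read_sequence` and its return convention are hypothetical and say nothing about the bus or hardware arrangement the claim recites.

```python
import re

_WORD = re.compile(r"\w+")

def read_sequence(document: str, pointer: int, count: int):
    """Read up to `count` words starting at character offset `pointer`.

    Returns the words and the advanced pointer, so the caller can resume
    reading where the previous sequence ended.
    """
    words = []
    pos = pointer
    while len(words) < count:
        match = _WORD.search(document, pos)
        if match is None:
            break  # no more words in the document
        words.append(match.group())
        pos = match.end()
    return words, pos
```

A caller would pass the returned offset back in as `pointer` on the next call, scoring each returned sequence before moving on.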
Specification