Method and apparatus for generating a language independent document abstract
Abstract
A method of extracting significant phrases from one or more documents stored in a computer-readable medium is disclosed. A sequence of words is read from the one or more documents, and a score is determined for each word in the sequence based on the length of the word. The score for each word in the sequence is compared against a threshold score. The sequence of words is indicated to be a significant phrase if the number of words in the sequence that have a score greater than the threshold score equals or exceeds a predetermined number. If the sequence of words is a significant phrase, a sentence containing the sequence is retrieved from the document. The abstract of the document is then searched to determine whether the sentence has already been included; if not, the sentence is added to the abstract.
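The steps in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the function names, the sliding window of three words, the threshold of 5, the minimum count of 2, the use of raw word length as the score, and the regex-based sentence splitter are all assumptions chosen for illustration. The sketch also assumes that a word sequence never crosses a sentence boundary.

```python
import re


def word_score(word: str) -> int:
    # Assumption: the score is simply the word's length in characters.
    return len(word)


def is_significant(words, threshold=5, min_count=2):
    # A sequence is significant when at least `min_count` of its words
    # score strictly above `threshold`.
    high = sum(1 for w in words if word_score(w) > threshold)
    return high >= min_count


def split_sentences(text):
    # Naive splitter (assumption): break after '.', '!' or '?' plus whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def build_abstract(text, window=3, threshold=5, min_count=2):
    abstract = []
    for sentence in split_sentences(text):
        words = re.findall(r"\w+", sentence)
        # Slide a fixed-size window of words across the sentence.
        for i in range(max(1, len(words) - window + 1)):
            if is_significant(words[i:i + window], threshold, min_count):
                # Search the abstract first; add the sentence only if absent.
                if sentence not in abstract:
                    abstract.append(sentence)
                break
    return abstract
```

Because the score depends only on word length, the sketch needs no dictionary or stop-word list, which is consistent with the language-independent aim stated in the title.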
34 Claims
1. A method of identifying a significant phrase in a document, the method comprising:
reading a sequence of words from the document;
determining a score for each word in the sequence based on the length of each word;
comparing the score for each word in the sequence against a threshold score;
indicating that the sequence of words is a significant phrase if the number of words in the sequence that have the score greater than the threshold score equals or exceeds a predetermined number;
retrieving a sentence from the document, the sentence containing the sequence of words, if the sequence of words is a significant phrase; and
searching an abstract of the document to determine whether the sentence is included in the abstract.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12

13. A method of identifying a significant phrase in a document, the method comprising:
reading a sequence of words from the document;
determining a score for each word in the sequence based on the length of each word;
comparing the score for each word in the sequence against a threshold score;
indicating that the sequence of words is a significant phrase if the number of words in the sequence that have the score greater than the threshold score equals or exceeds a predetermined number;
storing the sequence of words and the number of words in the sequence, if the sequence of words is a significant phrase.
Dependent claims: 14, 15, 16, 17
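
The storing step of claim 13 differs from claim 1 in that it keeps the phrase and its word count rather than retrieving a sentence. A hedged sketch of one way to represent that record follows; the `StoredPhrase` type, the function name, and the default threshold and count are hypothetical and do not come from the patent text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StoredPhrase:
    words: tuple      # the sequence of words, in order
    word_count: int   # the number of words in the sequence


def collect_significant_phrases(sequences, threshold=5, min_count=2):
    stored = []
    for seq in sequences:
        # Assumption: the score of a word is its length in characters.
        high = sum(1 for w in seq if len(w) > threshold)
        if high >= min_count:
            # Store both the sequence and its word count.
            stored.append(StoredPhrase(tuple(seq), len(seq)))
    return stored
```

Keeping the count next to the words lets a later stage filter or rank phrases by length without recounting them.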

18. A computer readable medium containing executable instructions which, when executed in a processing system, cause the system to perform a method for identifying a significant phrase in a document, the method comprising:
reading a sequence of words from the document;
determining a score for each word in the sequence based on the length of each word;
comparing the score for each word in the sequence against a threshold score;
indicating that the sequence of words is a significant phrase if the number of words in the sequence that have the score greater than the threshold score equals or exceeds a predetermined number;
retrieving a sentence from the document, the sentence containing the sequence of words, if the sequence of words is a significant phrase; and
searching an abstract of the document to determine whether the sentence is included in the abstract.
Dependent claims: 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29

30. A computer readable medium containing executable instructions which, when executed in a processing system, cause the system to perform a method for identifying a significant phrase in a document, the method comprising:
reading a sequence of words from the document;
determining a score for each word in the sequence based on the length of each word;
comparing the score for each word in the sequence against a threshold score;
indicating that the sequence of words is a significant phrase if the number of words in the sequence that have the score greater than the threshold score equals or exceeds a predetermined number;
storing the sequence of words and the number of words in the sequence, if the sequence of words is a significant phrase.
Dependent claims: 31, 32, 33, 34
Specification