METHOD AND APPARATUS FOR GENERATING A LANGUAGE INDEPENDENT DOCUMENT ABSTRACT
Abstract
A method of extracting significant phrases from one or more documents stored in a computer-readable medium. A sequence of words is read from the one or more documents and a score is determined for each word in the sequence based on the length of the word. The score for each word in the sequence is compared against a threshold score. The sequence of words is indicated to be a significant phrase if the number of words in the sequence that have a score greater than the threshold score equals or exceeds a predetermined number. If the sequence of words is a significant phrase, a sentence containing the sequence of words is retrieved from the document. An abstract of the document is searched to determine whether the sentence has previously been included in the abstract. If not, the sentence is added to the abstract.
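The abstract describes the pipeline only in functional terms. The Python sketch below shows one way the steps could fit together. It assumes, hypothetically, that a word's score is simply its character length, that sentences end at `.`, `!` or `?`, and that candidate sequences are fixed-size word windows. The function names and default parameter values are illustrative and do not come from the patent.

```python
import re


def word_score(word: str) -> int:
    # Assumption: the score is just the word's length in characters.
    return len(word)


def is_significant(words: list[str], threshold: int, min_count: int) -> bool:
    # The abstract counts words whose score is strictly greater than the threshold.
    strong = sum(1 for w in words if word_score(w) > threshold)
    return strong >= min_count


def build_abstract(text: str, window: int = 4, threshold: int = 6,
                   min_count: int = 2) -> list[str]:
    # Hypothetical sentence splitter: break after ., ! or ? followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    abstract: list[str] = []
    for sentence in sentences:
        words = re.findall(r"\w+", sentence)
        # Slide a fixed-size window; a short sentence yields one shorter window.
        for i in range(max(1, len(words) - window + 1)):
            if is_significant(words[i:i + window], threshold, min_count):
                # Retrieve the containing sentence; add it only if not already present.
                if sentence not in abstract:
                    abstract.append(sentence)
                break  # one significant phrase is enough to select the sentence
    return abstract


text = ("The cat sat. Internationalization requires considerable "
        "engineering effort. Dogs run fast.")
print(build_abstract(text))
# ['Internationalization requires considerable engineering effort.']
```

Because the score depends only on word length, nothing in the sketch relies on a dictionary or stop-word list for a particular language.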
43 Claims
1. A method of automatic, computer based creation of a cross-index for a set of documents, the method comprising:
accessing a memory to read at least a sequence of words from a document in the set of documents;
determining by a processing unit a respective score for at least a subset of words in the sequence based at least in part on word length;
operating the processing unit to determine a number of the at least a subset of words in the sequence that have a score greater than or equal to a threshold score;
operating the processing unit to determine whether the sequence of words contains a number of words that satisfies a verbosity setting;
determining that the sequence of words is a significant phrase in response to determining that the number of the at least a subset of words in the sequence that have a score greater than or equal to the threshold score equals or exceeds a predetermined number and determining that the number of words in the sequence satisfies the verbosity setting; and
adding the significant phrase to a cross-index for the set of documents in response to determining that the significant phrase has been found in more than one document in the set of documents.
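Claim 1 adds two conditions beyond the abstract: a verbosity setting on the length of the sequence, and a cross-document requirement before a phrase enters the cross-index. The following is a minimal sketch, assuming (hypothetically) that the verbosity setting acts as a minimum word count, that the score is the word length, and that candidate sequences are fixed-size windows over whitespace-separated words. The function and parameter names are illustrative only.

```python
from collections import defaultdict


def is_significant_phrase(words, threshold, predetermined_number, verbosity):
    # Assumption: the verbosity setting is a minimum number of words in the sequence.
    if len(words) < verbosity:
        return False
    # Claim 1 counts words scoring greater than OR EQUAL to the threshold;
    # the score here is assumed to be the word length.
    strong = sum(1 for w in words if len(w) >= threshold)
    return strong >= predetermined_number


def build_cross_index(documents, window, threshold, predetermined_number, verbosity):
    # Map each significant phrase to the set of document ids that contain it.
    occurrences = defaultdict(set)
    for doc_id, text in documents.items():
        words = text.lower().split()
        for i in range(len(words) - window + 1):
            phrase = tuple(words[i:i + window])
            if is_significant_phrase(phrase, threshold, predetermined_number, verbosity):
                occurrences[phrase].add(doc_id)
    # Only phrases found in more than one document enter the cross-index.
    return {" ".join(p): sorted(ids) for p, ids in occurrences.items() if len(ids) > 1}


docs = {
    "a": "the distributed database replicates data",
    "b": "our distributed database replicates quickly",
    "c": "nothing relevant here",
}
print(build_cross_index(docs, window=3, threshold=8, predetermined_number=2, verbosity=3))
# {'distributed database replicates': ['a', 'b']}
```

Keying the index by phrase and collecting a set of document ids turns the "found in more than one document" test into a simple size check on each set.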
7. A tangible computer readable medium having instructions stored thereon, the instructions configured to cause a computing device to perform operations comprising:
accessing a memory to read at least a sequence of words from a document in the set of documents;
determining a respective score for at least a subset of words in the sequence based at least in part on word length;
determining a number of the at least a subset of words in the sequence that have a score greater than or equal to a threshold score;
determining whether the sequence of words contains a number of words that satisfies a verbosity setting;
determining that the sequence of words is a significant phrase in response to determining that the number of the at least a subset of words in the sequence that have a score greater than or equal to the threshold score equals or exceeds a predetermined number and determining that the number of words in the sequence satisfies the verbosity setting; and
adding the significant phrase to a cross-index for the set of documents in response to determining that the significant phrase has been found in more than one document in the set of documents.
13. A processing system configured to create a cross-index for a set of documents, the processing system comprising:
a memory device configured to store the set of documents, a threshold score, and a verbosity setting;
a processing device;
a bus operably coupling the processing device to the memory;
the processing device configured to:
access the memory to read at least a sequence of words from a document in the set of documents;
determine a respective score for at least a subset of words in the sequence based at least in part on word length;
determine a number of the at least a subset of words in the sequence that have a score greater than or equal to a threshold score;
determine whether the sequence of words contains a number of words that satisfies a verbosity setting;
determine that the sequence of words is a significant phrase in response to determining that the number of the at least a subset of words in the sequence that have a score greater than or equal to the threshold score equals or exceeds a predetermined number and determining that the number of words in the sequence satisfies the verbosity setting; and
add the significant phrase to a cross-index for the set of documents in response to determining that the significant phrase has been found in more than one document in the set of documents.
22. A tangible computer readable medium having instructions stored thereon, the instructions configured to cause a computing device to perform operations comprising:
receiving a threshold score;
reading a sequence of words in the document;
determining a score for respective words in the sequence of words based on at least a length of the respective words;
comparing the score to the threshold score; and
terminating the reading of the sequence of words in response to determining that a phrase delimiter has been reached, wherein the phrase delimiter includes at least one of a word longer than a predetermined length or a sequence of a first predetermined number of words having a score less than the threshold score.
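Claim 22 instead defines where a phrase ends. One possible reading is sketched below: an over-long word acts as a delimiter and is discarded, while a run of low-scoring words acts as a delimiter whose words are trimmed from the phrase. Both treatments, the length-based score, and the parameter names (`max_word_len`, `max_low_run`) are assumptions made for illustration.

```python
def word_score(word: str) -> int:
    # Assumption: score = word length in characters.
    return len(word)


def split_phrases(words, threshold, max_word_len, max_low_run):
    """Split a word stream into phrases at the two delimiters named in claim 22.

    max_low_run (the "first predetermined number") must be at least 1.
    """
    if max_low_run < 1:
        raise ValueError("max_low_run must be positive")
    phrases, current, low_run = [], [], 0
    for w in words:
        if len(w) > max_word_len:
            # Delimiter 1: a word longer than the predetermined length.
            # Assumption: the long word itself is discarded.
            if current:
                phrases.append(current)
            current, low_run = [], 0
            continue
        current.append(w)
        low_run = low_run + 1 if word_score(w) < threshold else 0
        if low_run == max_low_run:
            # Delimiter 2: max_low_run consecutive words scoring below the threshold.
            # Assumption: those low-scoring words are trimmed from the phrase.
            kept = current[:-max_low_run]
            if kept:
                phrases.append(kept)
            current, low_run = [], 0
    if current:
        phrases.append(current)
    return phrases


words = "neural networks learn in a way that resembles pattern recognition".split()
print(split_phrases(words, threshold=5, max_word_len=15, max_low_run=3))
# [['neural', 'networks', 'learn'], ['that', 'resembles', 'pattern', 'recognition']]
```

Here "in a way" is three consecutive words shorter than five characters, so the first phrase ends before them.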
28. A processing system configured to create an abstract for a document, the processing system comprising:
a memory device configured to store the document and a threshold score;
a processing device;
a bus operably coupling the processing device to the memory;
the processing device configured to:
access the memory to retrieve the document and the threshold score;
read a sequence of words in the document;
determine a score for respective words in the sequence of words based on at least a length of the respective words;
compare the score to the threshold score; and
terminate the reading of the sequence of words in response to determining that a phrase delimiter has been reached, wherein the phrase delimiter includes at least one of a word longer than a predetermined length or a sequence of a first predetermined number of words having a score less than the threshold score.
35. A tangible computer readable medium having instructions stored thereon, the instructions configured to cause a computing device to perform operations comprising:
receiving at least a verbosity score, a threshold score, and a sequence threshold score from a user input;
accessing a memory to read at least a sequence of words from the document;
extracting the sequence of words from the document based upon a determination that the sequence includes a number of words greater than or equal to the verbosity score and a number of words having a score greater than or equal to the threshold score is greater than or equal to the sequence threshold score;
searching the abstract to determine whether the sequence of words is included in the abstract; and
adding the sequence of words to the abstract in response to determining that the sequence was not included in the abstract.
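Claim 35 takes all three parameters from user input. The hypothetical sketch below assumes the settings have already been parsed (for example from command-line flags) into a small dataclass, that a word's score is its length, and that "searching the abstract" means a substring check against the sequences already added. The class and function names are illustrative, not taken from the patent.

```python
from dataclasses import dataclass


@dataclass
class AbstractSettings:
    # Hypothetical container for the three user-supplied values in claim 35.
    verbosity_score: int           # minimum number of words in a sequence
    threshold_score: int           # minimum per-word score to count as "strong"
    sequence_threshold_score: int  # minimum number of strong words per sequence


def should_extract(words: list[str], s: AbstractSettings) -> bool:
    # Assumption: a word's score is its length in characters.
    strong = sum(1 for w in words if len(w) >= s.threshold_score)
    return len(words) >= s.verbosity_score and strong >= s.sequence_threshold_score


def update_abstract(abstract: list[str], candidates: list[list[str]],
                    s: AbstractSettings) -> list[str]:
    # Mutates and returns `abstract`.
    for words in candidates:
        if not should_extract(words, s):
            continue
        sequence = " ".join(words)
        # "Searching the abstract": a substring check against entries already added.
        if not any(sequence in entry for entry in abstract):
            abstract.append(sequence)
    return abstract


settings = AbstractSettings(verbosity_score=3, threshold_score=7, sequence_threshold_score=2)
candidates = [["language", "independent", "abstract"], ["a", "short", "one"],
              ["language", "independent", "abstract"], ["independent", "abstract"]]
print(update_abstract([], candidates, settings))
# ['language independent abstract']
```

Treating the check as a substring test means a sequence already covered by a longer entry is not added again; an exact-match test would be an equally plausible reading of the claim.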
39. A processing system configured to create an abstract for a document, the processing system comprising:
a memory device configured to store the document;
a processing device;
a user input;
the processing device configured to:
receive at least a verbosity score, a threshold score, and a sequence threshold score from the user input;
access the memory to read at least a sequence of words from the document;
extract the sequence of words from the document based upon a determination that the sequence includes a number of words greater than or equal to the verbosity score and a number of words having a score greater than or equal to the threshold score is greater than or equal to the sequence threshold score;
search the abstract to determine whether the sequence of words is included in the abstract; and
add the sequence of words to the abstract in response to determining that the sequence was not included in the abstract.
Specification