Methods and systems for extracting keyphrases from natural text for search engine indexing
First Claim
1. A computer implemented method for extracting keyphrases from natural text, characterized in that it comprises:
(a) generating one or more phrases in the natural text based on one or more phrase separators in the natural text, wherein each of the one or more phrase separators comprises one or more words from the natural text;
(b) assigning a weight to each of the one or more phrases in the natural text based on its frequency in the semantic frames of one or more sentences of the natural text, wherein each of the one or more sentences is divided into one or more sub-texts by the one or more phrase separators, and the assigned weight is calculated for each of the one or more phrases based on a frequency of the respective phrase within the one or more sub-texts of each sentence; and
(c) ranking the one or more phrases based on their weights to extract one or more keyphrases having the highest ranks.
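The three steps of the claim can be illustrated with a minimal Python sketch. It is not the patented implementation. The separator list `SEPARATORS`, the punctuation set, the function names, and the choice to treat each maximal run of non-separator words as a candidate phrase are all assumptions made for illustration; the claim only states that separators are words from the text and that weights derive from phrase frequency within the sub-texts of each sentence.

```python
import re
from collections import Counter

# Hypothetical separator words; the claim does not list specific separators.
SEPARATORS = {"a", "an", "the", "of", "in", "on", "for", "and", "or",
              "is", "are", "to", "from", "by", "with"}
# Assumption: punctuation inside a sentence also ends a sub-text.
PUNCTUATION = {",", ";", ":"}


def split_sentences(text):
    """Split text into sentences on ., ! and ? (a deliberately simple rule)."""
    return [s for s in re.split(r"[.!?]+", text) if s.strip()]


def sub_texts(sentence, separators=SEPARATORS):
    """Step (a): divide one sentence into sub-texts at separator words."""
    chunks, current = [], []
    for token in re.findall(r"[a-z0-9']+|[,;:]", sentence.lower()):
        if token in separators or token in PUNCTUATION:
            if current:
                chunks.append(" ".join(current))
                current = []
        else:
            current.append(token)
    if current:
        chunks.append(" ".join(current))
    return chunks


def extract_keyphrases(text, top_n=3, separators=SEPARATORS):
    """Steps (b) and (c): weight phrases by sub-text frequency, then rank."""
    weights = Counter()
    for sentence in split_sentences(text):
        for phrase in sub_texts(sentence, separators):
            weights[phrase] += 1
    # Highest weight first; ties broken alphabetically for a stable order.
    ranked = sorted(weights.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top_n]


text = ("Keyphrase extraction is the basis of search engine indexing. "
        "The search engine indexing of a document is fast. "
        "Keyphrase extraction and search engine indexing are related.")
print(extract_keyphrases(text, top_n=2))
# [('search engine indexing', 3), ('keyphrase extraction', 2)]
```

In this sketch the weight is a raw count across all sentences. The claim leaves room for other frequency-based weightings, so the counting rule here should be read as one possible choice.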
Abstract
The present invention is a method and system for extracting keyphrases from natural text. For the purpose of this document, keyphrases are text segments that represent the main topic of a text. The method may extract keyphrases from text of any length, for example a sentence, a paragraph, a document, or a collection of documents. Phrase separator methods may be applied to the text to extract phrases from it. From these phrases, the present invention may identify the one or more phrases that are integral to the meaning of the text and designate them as the keyphrases of the text. The text may then be indexed using the keyphrases, so that a search based on any of the keyphrases causes search engines and/or text retrieval means to retrieve the text.
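The indexing idea in the abstract, where a search on any keyphrase retrieves the text, can be pictured as an inverted index. The sketch below is only an illustration under stated assumptions: the names `build_index` and `search`, the document identifiers, and the exact-match, case-insensitive lookup are hypothetical and are not described in the patent.

```python
from collections import defaultdict


def build_index(doc_keyphrases):
    """Map each keyphrase to the set of document ids it was extracted from.

    doc_keyphrases: dict of document id -> list of keyphrases (hypothetical input,
    assumed to come from a prior keyphrase extraction step).
    """
    index = defaultdict(set)
    for doc_id, phrases in doc_keyphrases.items():
        for phrase in phrases:
            index[phrase.lower()].add(doc_id)
    return index


def search(index, query):
    """Return the ids of documents indexed under the queried keyphrase."""
    return sorted(index.get(query.lower(), set()))


docs = {
    "doc1": ["search engine indexing", "keyphrase extraction"],
    "doc2": ["keyphrase extraction"],
}
index = build_index(docs)
print(search(index, "Keyphrase Extraction"))  # ['doc1', 'doc2']
print(search(index, "semantic frames"))       # []
```

A production search engine would normally add tokenization, stemming, and relevance scoring on top of such a mapping; none of that is shown here.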
18 Citations
23 Claims
1. A computer implemented method for extracting keyphrases from natural text, characterized in that it comprises:
(a) generating one or more phrases in the natural text based on one or more phrase separators in the natural text, wherein each of the one or more phrase separators comprises one or more words from the natural text;
(b) assigning a weight to each of the one or more phrases in the natural text based on its frequency in the semantic frames of one or more sentences of the natural text, wherein each of the one or more sentences is divided into one or more sub-texts by the one or more phrase separators, and the assigned weight is calculated for each of the one or more phrases based on a frequency of the respective phrase within the one or more sub-texts of each sentence; and
(c) ranking the one or more phrases based on their weights to extract one or more keyphrases having the highest ranks.
Dependent claims: 2-20
21. A computer implemented method for extracting keyphrases from natural text, characterized in that it comprises:
(a) generating one or more phrases in the natural text based on one or more phrase separators in the natural text, wherein each of the one or more phrase separators comprises one or more words from the natural text;
(b) identifying semantic frames that are associated with the one or more phrase separators and analyzing the semantic frames so as to associate with one another phrases that have a related meaning;
(c) assigning a weight to each of the one or more phrases in the natural text based on its frequency in the semantic frames of one or more sentences of the natural text and also based on the associations between each phrase and other phrases based on related meaning, wherein each of the one or more sentences is divided into one or more sub-texts by the one or more phrase separators, and the assigned weight is calculated for each of the one or more phrases based on a frequency of the respective phrase within the one or more sub-texts of each sentence; and
(d) ranking the one or more phrases based on their weights to extract one or more keyphrases having the highest ranks.
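Claim 21 adds a weighting term based on associations between phrases of related meaning. The Python sketch below shows one way such a term could enter the ranking. It rests on strong assumptions that are not in the claim: "related meaning" is approximated by two phrases sharing at least one word, and the `bonus` factor, the additive formula, and the function names are hypothetical. The claim's semantic frame analysis is not reproduced.

```python
def associate(phrases):
    """Assumption: two phrases are 'related' if they share at least one word."""
    words = {p: set(p.split()) for p in phrases}
    return {
        p: {q for q in phrases if q != p and words[p] & words[q]}
        for p in phrases
    }


def weight_with_associations(frequencies, bonus=0.5):
    """Combine a phrase's own frequency with the frequencies of related phrases.

    frequencies: dict of phrase -> sub-text frequency (hypothetical input).
    bonus: hypothetical factor scaling the contribution of related phrases.
    """
    related = associate(list(frequencies))
    weights = {
        phrase: freq + bonus * sum(frequencies[q] for q in related[phrase])
        for phrase, freq in frequencies.items()
    }
    # Highest weight first; ties broken alphabetically for a stable order.
    return sorted(weights.items(), key=lambda kv: (-kv[1], kv[0]))


frequencies = {"search engine": 2, "search engine indexing": 3, "document": 3}
print(weight_with_associations(frequencies))
# [('search engine indexing', 4.0), ('search engine', 3.5), ('document', 3.0)]
```

In this example "search engine" outranks "document" despite its lower raw frequency, because it is associated with another frequent phrase. That is the kind of effect an association-based term could have on the ranking.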
22. A system having a processor and memory adapted to perform a method comprising the steps of:
generating one or more phrases in the natural text based on an identification of one or more phrase separators in the natural text, wherein each of the one or more phrase separators comprises one or more words from the natural text;
identifying semantic frames that are associated with the one or more phrase separators and analyzing the semantic frames so as to associate with one another phrases that have a related meaning;
assigning a weight to each of the one or more phrases in the natural text based on its frequency in the semantic frames of one or more sentences of the natural text and also based on the associations between each phrase and other phrases based on related meaning, wherein each of the one or more sentences is divided into one or more sub-texts by the one or more phrase separators, and the assigned weight is calculated for each of the one or more phrases based on a frequency of the respective phrase within the one or more sub-texts of each sentence; and
ranking the one or more phrases based on their weights to extract one or more keyphrases having the highest ranks.
23. A non-transitory computer readable storage medium storing a set of computer program instructions, which when executed by a processor, causes a computer device to perform a method comprising the steps of:
generating one or more phrases in the natural text based on one or more phrase separators in the natural text, wherein each of the one or more phrase separators comprises one or more words from the natural text;
identifying semantic frames that are associated with the one or more phrase separators and analyzing the semantic frames so as to associate with one another phrases that have a related meaning;
assigning a weight to each of the one or more phrases in the natural text based on its frequency in the semantic frames of one or more sentences of the natural text and also based on the associations between each phrase and other phrases based on related meaning, wherein each of the one or more sentences is divided into one or more sub-texts by the one or more phrase separators, and the assigned weight is calculated for each of the one or more phrases based on a frequency of the respective phrase within the one or more sub-texts of each sentence; and
ranking the one or more phrases based on their weights to extract one or more keyphrases having the highest ranks.
Specification