Abstract generating search method and system
First Claim
1. A method comprising:
receiving, by a computing device, an inquiry word;
segmenting, by the computing device, the inquiry word into one or more keywords;
searching, by the computing device, an inverted index of a group of documents to determine in the group one or more documents in which one or more of the keywords are matched; and
searching, by the computing device, a forward index of a respective document of the determined one or more documents to generate an abstract for the respective document, the searching including:
determining a length limit of the abstract;
identifying a plurality of portions within the respective document, each portion of the plurality of portions including a respective beginning position in the respective document and a respective ending position in the respective document, the identifying including identifying, within the respective document, every portion that is within the length limit by traversing the forward index character-by-character or word-by-word;
finding a portion among the plurality of portions, the portion including a highest number of the one or more keywords between a beginning position and an ending position compared with any other portion of the plurality of portions; and
selecting the found portion to be the abstract of the respective document.
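The portion-selection steps recited above can be illustrated with a minimal sketch. It rests on several assumptions that the claim does not state: the forward index is modeled as a plain list of words in document order, the length limit is counted in words, and "highest number of keywords" is read as total keyword occurrences rather than distinct keywords. The name `best_portion` is hypothetical. Only windows of exactly the limit width are scored, since any shorter portion lies inside such a window and cannot contain more hits.

```python
from typing import List, Set, Tuple


def best_portion(forward_index: List[str], keywords: Set[str],
                 length_limit: int) -> Tuple[int, int]:
    """Return (begin, end) word positions, end exclusive, of the
    window within the length limit that holds the most keyword hits."""
    if length_limit <= 0 or not forward_index:
        return (0, 0)
    width = min(length_limit, len(forward_index))
    # Count keyword hits in the first window.
    hits = sum(1 for tok in forward_index[:width] if tok in keywords)
    best_hits, best_begin = hits, 0
    # Traverse word by word, updating the count incrementally.
    for begin in range(1, len(forward_index) - width + 1):
        if forward_index[begin - 1] in keywords:
            hits -= 1  # word leaving the window on the left
        if forward_index[begin + width - 1] in keywords:
            hits += 1  # word entering the window on the right
        if hits > best_hits:
            best_hits, best_begin = hits, begin
    return (best_begin, best_begin + width)


tokens = ("the quick search engine builds an inverted index "
          "and a forward index for search").split()
begin, end = best_portion(tokens, {"index", "search"}, 4)
abstract = " ".join(tokens[begin:end])
```

In this sketch ties go to the earliest window; the claim does not say how ties are resolved.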
Abstract
The present disclosure provides an information search method and system applicable in an information search system wherein each document has corresponding forward index data, to address the low search efficiency of existing information search techniques. In one aspect, the method may include: receiving an inquiry word and obtaining one or more keywords contained in the inquiry word by segmentation; searching for one or more documents matching the one or more keywords, and for forward index data corresponding to the one or more documents, through the information search system's inverted index data; and determining an abstract of each of the one or more documents according to the corresponding document's forward index data, and outputting the abstract and information of the one or more documents as a search result. The proposed techniques can increase the efficiency of information search and, at the same time, guarantee the accuracy of the search to a certain extent.
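The segmentation and inverted-index lookup described in the abstract might look roughly like the sketch below. It is an illustration under stated assumptions, not the disclosed implementation: segmentation is approximated by lowercasing and splitting on whitespace, the inverted index is modeled as a dictionary from keyword to a set of document ids, and the names `segment` and `match_documents` are hypothetical. A document counts as a match when it contains at least one keyword.

```python
from typing import Dict, List, Set


def segment(inquiry: str) -> List[str]:
    # Stand-in segmentation (assumption): lowercase, split on whitespace.
    return inquiry.lower().split()


def match_documents(inquiry: str,
                    inverted_index: Dict[str, Set[int]]) -> List[int]:
    """Return sorted ids of documents matching at least one keyword."""
    matched: Set[int] = set()
    for keyword in segment(inquiry):
        # Keywords missing from the index contribute no documents.
        matched |= inverted_index.get(keyword, set())
    return sorted(matched)


inverted = {"forward": {1}, "index": {1, 2}, "abstract": {3}}
result = match_documents("Forward Index", inverted)
```

Each matched id would then be used to fetch that document's forward index data for abstract generation.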
20 Claims
1. A method comprising:
receiving, by a computing device, an inquiry word;
segmenting, by the computing device, the inquiry word into one or more keywords;
searching, by the computing device, an inverted index of a group of documents to determine in the group one or more documents in which one or more of the keywords are matched; and
searching, by the computing device, a forward index of a respective document of the determined one or more documents to generate an abstract for the respective document, the searching including:
determining a length limit of the abstract;
identifying a plurality of portions within the respective document, each portion of the plurality of portions including a respective beginning position in the respective document and a respective ending position in the respective document, the identifying including identifying, within the respective document, every portion that is within the length limit by traversing the forward index character-by-character or word-by-word;
finding a portion among the plurality of portions, the portion including a highest number of the one or more keywords between a beginning position and an ending position compared with any other portion of the plurality of portions; and
selecting the found portion to be the abstract of the respective document.
Dependent claims: 2, 3, 4, 11, 12, 19
5. A system comprising:
one or more data processing devices; and
one or more tangible computer-readable storage media having stored thereon computer executable components comprising:
a storage module configured to store an inverted index of documents in the system and forward indices corresponding to each of the documents;
an input module configured to receive an inquiry word; and
a search module configured to segment the inquiry word into one or more keywords, search the inverted index to determine one or more documents in which one or more of the keywords are matched, search a forward index of a respective document of the determined one or more documents to generate an abstract for the respective document, the search including:
determining a length limit of the abstract;
identifying a plurality of portions within the respective document, each portion of the plurality of portions including a respective beginning position in the respective document and a respective ending position in the respective document, the identifying including identifying, within the respective document, every portion that is within the length limit by traversing the forward index character-by-character or word-by-word;
finding a portion among the plurality of portions, the portion including a highest number of the one or more keywords between a beginning position and an ending position compared with any other portion of the plurality of portions; and
selecting the found portion to be the abstract of the respective document.
Dependent claims: 6, 7, 8, 13, 14
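As a rough illustration of the storage module recited in claim 5, the sketch below keeps one inverted index across all documents and one forward index per document. The class name `StorageModule`, the method `add_document`, the whitespace tokenization, and the dictionary layout are all assumptions made for the example; the claim does not prescribe any of them.

```python
from collections import defaultdict
from typing import Dict, List, Set


class StorageModule:
    """Hypothetical store holding both kinds of index."""

    def __init__(self) -> None:
        # keyword -> ids of documents containing it
        self.inverted_index: Dict[str, Set[int]] = defaultdict(set)
        # document id -> words of that document in original order
        self.forward_indices: Dict[int, List[str]] = {}

    def add_document(self, doc_id: int, text: str) -> None:
        # Assumed tokenization: lowercase, split on whitespace.
        tokens = text.lower().split()
        self.forward_indices[doc_id] = tokens
        for token in tokens:
            self.inverted_index[token].add(doc_id)


store = StorageModule()
store.add_document(1, "Forward index data")
store.add_document(2, "Inverted index data")
```

Keeping word order in the forward index is what lets a search module scan a document position by position without re-reading the original text.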
9. One or more tangible computer-readable storage media having computer-executable instructions stored thereon that are configured to program one or more computing devices to perform operations comprising:
receiving an inquiry word;
segmenting the inquiry word into one or more keywords;
searching an inverted index of a group of documents to determine in the group one or more documents in which one or more of the keywords are matched; and
searching a forward index of a respective document of the determined one or more documents to generate an abstract for the respective document, the searching including:
determining a length limit of the abstract;
identifying a plurality of portions within the respective document, each portion of the plurality of portions including a respective beginning position in the respective document and a respective ending position in the respective document, the identifying including identifying, within the respective document, every portion that is within the length limit by traversing the forward index character-by-character or word-by-word;
finding a portion among the plurality of portions, the portion including a highest number of the one or more keywords between a beginning position and an ending position compared with any other portion of the plurality of portions; and
selecting the found portion to be the abstract of the respective document.
Dependent claims: 10, 15, 16, 17, 18, 20
Specification