DOCUMENT PROCESSING METHOD AND SYSTEM
Abstract
A method and system for expanding a document set used as a search data source in the field of business-related search. The present invention provides a method of expanding a seed document in a seed document set. The method includes identifying one or more entity words of the seed document; identifying, based on each identified entity word, one or more topic words related to that entity word in the seed document where the entity word is located; forming an entity word-topic word pair from each identified topic word and the entity word on the basis of which that topic word was identified; and obtaining one or more expanded documents through the web by using the entity word and the topic word in each entity word-topic word pair together as keywords. A system for executing the above method is also provided.
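The four steps summarized above can be illustrated with a minimal Python sketch. It is not the patented implementation. The identification rules are assumptions made only for illustration: entity words are found by lookup in a hypothetical `known_entities` list, topic words are the most frequent non-stopword tokens in sentences that mention the entity word, and `search` is a hypothetical callable standing in for a web search engine. The final check mirrors the requirement in claim 1 that an expanded document contain both words of the pair.

```python
import re

# Assumption: a tiny stopword list, for illustration only.
STOPWORDS = {"the", "a", "an", "of", "and", "in", "to", "is", "for", "on", "with"}

def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def identify_entity_words(doc, known_entities):
    """Assumption: entity words are found by lookup in a known-entity list."""
    tokens = set(tokenize(doc))
    return sorted(e for e in known_entities if e in tokens)

def identify_topic_words(doc, entity, top_k=2):
    """Assumption: topic words are the most frequent non-stopword tokens
    in the sentences of the same document that mention the entity word."""
    counts = {}
    for sentence in re.split(r"[.!?]", doc):
        tokens = tokenize(sentence)
        if entity in tokens:
            for t in tokens:
                if t != entity and t not in STOPWORDS:
                    counts[t] = counts.get(t, 0) + 1
    # Sort by descending count, then alphabetically, so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [word for word, _ in ranked[:top_k]]

def form_pairs(doc, known_entities):
    """Pair each topic word with the entity word it was identified from."""
    return [(e, t)
            for e in identify_entity_words(doc, known_entities)
            for t in identify_topic_words(doc, e)]

def expand(seed_docs, known_entities, search):
    """Query with both words of each pair; keep hits that contain both."""
    expanded = []
    for doc in seed_docs:
        for entity, topic in form_pairs(doc, known_entities):
            for hit in search([entity, topic]):
                tokens = set(tokenize(hit))
                if (entity in tokens and topic in tokens
                        and hit not in expanded and hit not in seed_docs):
                    expanded.append(hit)
    return expanded

# Toy stand-in for the web: an in-memory list and a conjunctive keyword search.
WEB = [
    "Acme announced a merger with Globex.",
    "Acme opened a new office.",
    "The merger market slowed this year.",
]

def toy_search(keywords):
    return [d for d in WEB if all(k in tokenize(d) for k in keywords)]

seed = ["Acme confirmed the merger. The Acme merger needs approval."]
pairs = form_pairs(seed[0], {"acme"})
result = expand(seed, {"acme"}, toy_search)
```

Querying with both words at once is what narrows the result: a search on the entity word alone would also return the unrelated office announcement, and a search on the topic word alone would return the general market story.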
19 Claims
1. A method for expanding a seed document in a seed document set, wherein the seed document set comprises at least one seed document, the method comprising:
identifying one or more entity words of the seed document, wherein the entity words are words indicating focused entities of the seed document;
identifying, based on each identified entity word, one or more topic words related to the based entity word in the seed document where the entity word is located;
forming an entity word-topic word pair from each identified topic word and the entity word as the basis for identifying the each identified topic word; and
obtaining one or more expanded documents through the web by taking the entity word and topic word in each entity word-topic word pair as key words at the same time, wherein the expanded documents comprise not only the entity word in the each entity word-topic word pair but also the topic word in the each entity word-topic word pair.
Dependent claims: 2, 3, 4, 5, 6
7. A method for filtering a candidate document in a candidate document set, wherein the candidate document set comprises at least one candidate document, the method comprising:
receiving one or more entity word-topic word pairs;
identifying one or more entity words of the candidate document, wherein the entity words are words indicating focused entities of the candidate document;
identifying, based on each identified entity word, one or more topic words related to the based entity words in the candidate document where the entity word is located;
determining whether to add the candidate document into a filtered document set using the entity words and topic words in the given entity word-topic word pairs and the identified entity words and topic words in the candidate document; and
adding the candidate document into a filtered document set in response to determining that the candidate document should be added into said filtered document set, wherein;
each of the given entity word-topic word pairs comprise an entity word and a topic word;
all entity words in the entity word-topic word pair form an entity word set; and
all topic words in the entity word-topic word pair where each entity word is form a topic word set corresponding to the entity word.
Dependent claims: 8, 9, 10, 11, 12, 13, 14, 15, 16
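The entity word set and the per-entity topic word sets described in claim 7 can be sketched in Python as follows. The claim does not state the decision rule, so the rule used here is an assumption made only for illustration: a candidate is accepted if at least one entity word-topic word pair identified in it also appears among the given pairs. The function names and the input shape (each candidate mapped to its already-identified pairs) are hypothetical.

```python
def build_topic_sets(given_pairs):
    """Map each entity word to the set of topic words paired with it.
    The keys of the result are the entity word set."""
    topic_sets = {}
    for entity, topic in given_pairs:
        topic_sets.setdefault(entity, set()).add(topic)
    return topic_sets

def should_add(topic_sets, candidate_pairs):
    """Assumed rule: accept if any identified (entity, topic) pair of the
    candidate has its entity in the entity word set and its topic in the
    topic word set corresponding to that entity."""
    return any(topic in topic_sets.get(entity, set())
               for entity, topic in candidate_pairs)

def filter_candidates(given_pairs, candidates):
    """candidates maps a document id to the (entity, topic) pairs
    already identified in that document."""
    topic_sets = build_topic_sets(given_pairs)
    return [doc_id for doc_id, pairs in candidates.items()
            if should_add(topic_sets, pairs)]

given = [("acme", "merger"), ("acme", "approval"), ("globex", "lawsuit")]
candidates = {
    "doc1": [("acme", "merger")],       # entity and topic both match
    "doc2": [("acme", "office")],       # entity matches, topic does not
    "doc3": [("initech", "lawsuit")],   # topic matches a different entity only
}
topic_sets = build_topic_sets(given)
filtered = filter_candidates(given, candidates)
```

Keeping a separate topic word set per entity word is what rejects `doc3` in this sketch: its topic word appears among the given pairs, but only for a different entity word.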
17. A system for expanding a seed document in a seed document set, wherein the seed document set comprises at least one seed document, the system comprising:
entity word identifying means for identifying one or more entity words of the seed document, the entity words being words indicating focused entities of the seed document;
topic word identifying means for identifying, based on each identified entity word, one or more topic words related to the based entity word in the seed document where the entity word is located;
pairing means for forming an entity word-topic word pair from each identified topic word and the entity word based on which the each topic word is identified; and
document expanding means for obtaining one or more expanded documents through the web by taking the entity word and topic word in the each entity word-topic word pair as key words at the same time, the expanded documents comprising not only the entity word in each entity word-topic word pair but also the topic word in the each entity word-topic word pair.
Dependent claims: 18
19. A system for filtering a candidate document in a candidate document set, wherein the candidate document set comprises at least one candidate document, the system comprising:
receiving means for receiving one or more entity word-topic word pairs;
entity word identifying means for identifying one or more entity words of the candidate document, the entity words being words indicating focused entities of the document;
topic word identifying means for identifying, based on the identified each entity word, one or more topic words related to the based entity word in the candidate document where the entity word is located; and
determining means for determining whether to add the candidate document into a filtered document set using the entity words and topic words in the given entity word-topic word pairs and the identified entity words and topic words in the candidate document and for adding the candidate document into a filtered document set in response to a determining result yes, wherein;
each of the given entity word-topic word pairs comprise an entity word and a topic word;
all entities in the entity word-topic word pair form an entity word set; and
all topic words in the entity word-topic word pair where each entity word is located forming a topic word set corresponding to the entity word.
Specification