INFORMATION COLLECTION APPARATUS, SEARCH ENGINE, INFORMATION COLLECTION METHOD, AND PROGRAM
Abstract
The present invention provides an information collection apparatus, an information collection method, and a program capable of effectively collecting information from information resources on a network, as well as a search engine that searches the collected information resources. The information collection apparatus includes: an extraction unit that acquires data from an information resource via the network and extracts a link-destination address included in the data; a calculation unit that, by comparing each link-destination address with a collection rule describing a set of addresses qualified as a collection target, calculates for each link-destination address a score reflecting the distance from the set to the link-destination information resource indicated by that address; and a judgment unit that judges, in accordance with the calculated score, whether the link-destination information resource is to be included in the collection target.
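As a rough illustration of the calculation and judgment units, the following Python sketch scores a link-destination address against a collection rule and then decides whether to collect it. The abstract does not state how the distance is computed, so the hop-count rule used here is an assumption. The regular-expression form of the rule, the `example.com` pattern, the `MAX_DISTANCE` threshold, and all function names are likewise hypothetical.

```python
import re

# Hypothetical collection rule: patterns describing the set of qualifying addresses.
COLLECTION_RULE = [re.compile(r"^https?://([^/]+\.)?example\.com(/|$)")]

# Assumed threshold: collect resources at most this many hops outside the set.
MAX_DISTANCE = 2


def in_rule(address):
    """Return True if the address matches any pattern in the collection rule."""
    return any(pattern.search(address) for pattern in COLLECTION_RULE)


def score(address, source_score):
    """Assumed distance measure: 0 inside the set, otherwise one hop
    further from the set than the resource that linked to this address."""
    return 0 if in_rule(address) else source_score + 1


def is_collection_target(link_score):
    """Judgment step: include the resource only if it is close enough to the set."""
    return link_score <= MAX_DISTANCE
```

Under these assumptions, a link that leads back into the qualifying set resets the score to 0, while a chain of links leading away from the set is followed only until the score exceeds the threshold.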
20 Citations
37 Claims
1-16. (canceled)
17. A method comprising:
providing a collection rule defining a set of qualifying addresses;
acquiring data from an information resource over a network to extract link destination addresses in the data;
determining whether the link destination addresses are included in the set of qualifying addresses;
assigning scores to the link destination addresses based on the determination of whether the link destination addresses are in the set of qualifying addresses;
determining from the scores of the link destination addresses whether the link destination addresses are to be included in a collection target; and
accessing data at the link destination addresses included in the collection target to store for searching.
(Dependent claims: 18-31)
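The steps recited in claim 17 can be sketched as a small crawl loop. This is a minimal sketch under stated assumptions, not the patented implementation. The collection rule is modeled as a set of host names. The score is assumed to be 0 for a qualifying address and otherwise one more than the score of the linking page. The `fetch` callable, `max_score`, and all other names are hypothetical.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkParser(HTMLParser):
    """Collects the href values of <a> tags found in the fed HTML."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(base, html):
    """Extract link destination addresses, resolved against the page address."""
    parser = LinkParser()
    parser.feed(html)
    return [urljoin(base, href) for href in parser.links]


def crawl(seed, fetch, qualifying_hosts, max_score=1):
    """Collect pages reachable from seed; fetch(url) returns HTML or None."""
    store = {}           # collected data, kept for later searching
    scores = {seed: 0}   # assumption: the seed is treated as qualifying
    queue = deque([seed])
    while queue:
        url = queue.popleft()
        html = fetch(url)                      # acquire data from the resource
        if html is None:
            continue
        store[url] = html
        for link in extract_links(url, html):
            # Determine whether the address is in the set of qualifying addresses.
            qualifies = urlparse(link).hostname in qualifying_hosts
            # Assign a score based on that determination (assumed hop count).
            link_score = 0 if qualifies else scores[url] + 1
            # Decide from the score whether the link joins the collection target.
            # Simplification: the first score seen for an address is kept.
            if link_score <= max_score and link not in scores:
                scores[link] = link_score
                queue.append(link)
    return store
```

Passing `fetch` as a parameter keeps the sketch independent of any particular HTTP library; a dictionary's `get` method over canned pages is enough to exercise it.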
32. A computer readable recording medium including a program executed to perform operations, the operations comprising:
providing a collection rule defining a set of qualifying addresses;
acquiring data from an information resource over a network to extract link destination addresses in the data;
determining whether the link destination addresses are included in the set of qualifying addresses;
assigning scores to the link destination addresses based on the determination of whether the link destination addresses are in the set of qualifying addresses;
determining from the scores of the link destination addresses whether the link destination addresses are to be included in a collection target; and
accessing data at the link destination addresses included in the collection target to store for searching.
(Dependent claims: 33, 34)
35. A system in communication with a network having information resources, comprising:
a processor; and
a computer readable medium having a program executed by the processor to perform operations, the operations comprising:
providing a collection rule defining a set of qualifying addresses;
acquiring data from an information resource over a network to extract link destination addresses in the data;
determining whether the link destination addresses are included in the set of qualifying addresses;
assigning scores to the link destination addresses based on the determination of whether the link destination addresses are in the set of qualifying addresses;
determining from the scores of the link destination addresses whether the link destination addresses are to be included in a collection target; and
accessing data at the link destination addresses included in the collection target to store for searching.
(Dependent claims: 36, 37)
Specification