Information collection apparatus, search engine, information collection method, and program
First Claim
1. A method comprising:
providing a collection rule defining a set of qualifying addresses;
acquiring data from an information resource over a network to extract link destination addresses in the data;
determining whether the link destination addresses are included in the set of qualifying addresses by determining whether the link destination addresses comprise addresses in an allowed set explicitly allowed by the collection rule, explicitly excluded by the collection rule, and outside of the allowed set;
assigning scores to the link destination addresses based on the determination of whether the link destination addresses are in the set of qualifying addresses, which is based on the determination of whether the link destination addresses comprise addresses explicitly allowed, explicitly excluded, or outside of the allowed set;
determining from the scores of the link destination addresses whether the link destination addresses are to be included in a collection target; and
accessing data at the link destination addresses included in the collection target to store for searching.
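The determining and scoring steps of claim 1 can be illustrated with a small sketch. It assumes, purely for illustration, that a collection rule is a list of allowed URL prefixes plus a list of explicitly excluded prefixes, that an exclusion overrides an allowance, and that each of the three categories maps to a fixed score. The names `CollectionRule`, `Category`, `SCORES`, and `select_targets`, the score values, and the threshold are all hypothetical and do not come from the claim.

```python
from enum import Enum


class Category(Enum):
    """The three outcomes named in the claim."""
    ALLOWED = "allowed"    # explicitly allowed by the collection rule
    EXCLUDED = "excluded"  # explicitly excluded by the collection rule
    OUTSIDE = "outside"    # neither allowed nor excluded


class CollectionRule:
    """Hypothetical collection rule expressed as URL prefixes."""

    def __init__(self, allow_prefixes, exclude_prefixes):
        self.allow_prefixes = tuple(allow_prefixes)
        self.exclude_prefixes = tuple(exclude_prefixes)

    def classify(self, url):
        # Assumption: an explicit exclusion overrides an explicit allowance.
        if url.startswith(self.exclude_prefixes):
            return Category.EXCLUDED
        if url.startswith(self.allow_prefixes):
            return Category.ALLOWED
        return Category.OUTSIDE


# Hypothetical score per category; the claim does not specify values.
SCORES = {Category.ALLOWED: 1.0, Category.OUTSIDE: 0.5, Category.EXCLUDED: 0.0}


def select_targets(rule, links, threshold=0.5):
    """Return the links whose score meets the (hypothetical) threshold."""
    return [url for url in links if SCORES[rule.classify(url)] >= threshold]
```

With an allowed prefix of `http://example.com/docs/` and an excluded prefix of `http://example.com/docs/private/`, a link to `http://other.org/c.html` falls outside the allowed set but still scores 0.5, so it is kept at the default threshold and dropped at a threshold of 1.0. Giving "outside" an intermediate score, rather than treating it like "excluded", is what lets the score (and not only the rule) decide inclusion.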
Abstract
The present invention provides an information collection apparatus, an information collection method, and a program capable of effectively collecting information from information resources on a network, as well as a search engine that searches the collected information resources. An information collection apparatus of the present invention, which collects information from information resources on a network, includes: an extraction unit that acquires data from an information resource via the network and extracts a link-destination address included in the data; a calculation unit that compares each link-destination address with a collection rule describing a set of addresses qualified for a collection target and thereby calculates, for each link-destination address, a score reflecting the distance from that set to the link-destination information resource indicated by the address; and a judgment unit that judges, in accordance with the score calculated for the link-destination information resource, whether that resource is to be included in the collection target.
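The abstract describes a score that reflects a distance from the qualifying set. One hypothetical way to read this, sketched below, is to count how many consecutive link hops a resource lies outside the set: a link back into the allowed set resets the distance to 0, each hop outside adds 1, and an explicitly excluded address gets an infinite distance. The function names `link_distance` and `judge` and the `max_distance` cutoff are assumptions for illustration, not the patent's definitions.

```python
import math


def link_distance(parent_distance, category):
    """Hypothetical distance score: hops outside the qualifying set.

    category is one of "allowed", "excluded", or "outside".
    """
    if category == "excluded":
        return math.inf            # never collect explicitly excluded addresses
    if category == "allowed":
        return 0                   # back inside the qualifying set
    return parent_distance + 1     # one more hop away from the set


def judge(distance, max_distance=1):
    """Judgment unit: include the resource if it is close enough to the set."""
    return distance <= max_distance
```

Under this reading, a page linked from an allowed page but itself outside the set has distance 1 and is collected with the default cutoff, while a page two hops out is not.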
16 Citations
21 Claims
1. A method comprising the operations recited under First Claim above. Dependent claims: 2 to 15.
16. A non-transitory computer readable recording medium including a program executed to perform operations, the operations comprising:
providing a collection rule defining a set of qualifying addresses;
acquiring data from an information resource over a network to extract link destination addresses in the data;
determining whether the link destination addresses are included in the set of qualifying addresses by determining whether the link destination addresses comprise addresses in an allowed set explicitly allowed by the collection rule, explicitly excluded by the collection rule, and outside of the allowed set;
assigning scores to the link destination addresses based on the determination of whether the link destination addresses are in the set of qualifying addresses, which is based on the determination of whether the link destination addresses comprise addresses explicitly allowed, explicitly excluded, or outside of the allowed set;
determining from the scores of the link destination addresses whether the link destination addresses are to be included in a collection target; and
accessing data at the link destination addresses included in the collection target to store for searching.
Dependent claims: 17 and 18.
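The step of acquiring data and extracting link destination addresses can be sketched with Python's standard `html.parser` and `urllib.parse` modules. This assumes the acquired data is HTML, that link destinations are the `href` attributes of `<a>` tags, and that fragments and non-HTTP schemes should be dropped; `LinkExtractor` and `extract_links` are hypothetical names.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlsplit


class LinkExtractor(HTMLParser):
    """Collects absolute link destination addresses from <a href> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the page address, drop "#..." fragments.
                url, _fragment = urldefrag(urljoin(self.base_url, value))
                # Assumption: only HTTP(S) addresses are candidate collection targets.
                if urlsplit(url).scheme in ("http", "https") and url not in self.links:
                    self.links.append(url)


def extract_links(base_url, html):
    """Return the de-duplicated link destination addresses found in html."""
    parser = LinkExtractor(base_url)
    parser.feed(html)
    parser.close()
    return parser.links
```

For a page at `http://example.com/docs/index.html`, a relative `href="a.html"` resolves to `http://example.com/docs/a.html`, which can then be classified against the collection rule.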
19. A system in communication with a network having information resources, comprising:
a processor; and
a computer readable medium having a program executed by the processor to perform operations, the operations comprising:
providing a collection rule defining a set of qualifying addresses;
acquiring data from an information resource over a network to extract link destination addresses in the data;
determining whether the link destination addresses are included in the set of qualifying addresses by determining whether the link destination addresses comprise addresses in an allowed set explicitly allowed by the collection rule, explicitly excluded by the collection rule, and outside of the allowed set;
assigning scores to the link destination addresses based on the determination of whether the link destination addresses are in the set of qualifying addresses, which is based on the determination of whether the link destination addresses comprise addresses explicitly allowed, explicitly excluded, or outside of the allowed set;
determining from the scores of the link destination addresses whether the link destination addresses are to be included in a collection target; and
accessing data at the link destination addresses included in the collection target to store for searching.
Dependent claims: 20 and 21.
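The operations of the claimed system can be tied together in a breadth-first collection loop. The sketch below is a hypothetical illustration: `fetch`, `extract`, and `classify` are caller-supplied functions (so the loop can run against an in-memory stand-in for the network), the distance-based score follows an assumed "hops outside the allowed set" reading, and the returned dictionary stands in for storage used for searching.

```python
from collections import deque


def crawl(seeds, fetch, extract, classify, max_distance=1, limit=100):
    """Hypothetical breadth-first information collection loop.

    fetch(url) -> page text, or None if unavailable
    extract(url, text) -> list of link destination addresses
    classify(url) -> "allowed", "excluded", or "outside"
    Returns a dict mapping each collected address to its stored text.
    """
    store = {}
    queue = deque((url, 0) for url in seeds)
    seen = set(seeds)
    while queue and len(store) < limit:
        url, distance = queue.popleft()
        text = fetch(url)
        if text is None:
            continue
        store[url] = text  # data stored for later searching
        for link in extract(url, text):
            if link in seen:
                continue
            category = classify(link)
            if category == "excluded":
                continue  # explicitly excluded addresses are never targets
            # Score: 0 inside the allowed set, otherwise one hop further out.
            link_distance = 0 if category == "allowed" else distance + 1
            if link_distance <= max_distance:  # judgment step
                seen.add(link)
                queue.append((link, link_distance))
    return store
```

Passing the fetcher and classifier in as functions keeps the loop independent of any particular HTTP client or rule format; a real system would substitute network access and a persistent index for the dictionaries.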
Specification