System for collecting specific information from several sources of unstructured digitized data
Abstract
A system for collecting specific information from several sources of unstructured digitized data. The system receives at least one instruction governing the collection of the specific information. The system includes a processing unit operative to analyze the contents of several sources of unstructured digitized data to identify therein information elements relevant to the specific information, at least in part on the basis of the received instruction(s). The processing unit extracts the identified information elements from each source of unstructured digitized data where information elements have been identified, and processes the extracted information elements for generating an output signal conveying the specific information.
146 Citations
53 Claims
-
1. A system for collecting specific information from several sources of unstructured digitized data, said system comprising:
-
a) an input for receiving at least one instruction governing the collection of the specific information;
b) a processing unit coupled to said input, said processing unit operative to:
i) establish a data connection with a plurality of sources of unstructured digitized data from which the specific information is to be collected, at least in part on the basis of the at least one instruction;
ii) analyse the contents of each one of said plurality of sources of unstructured digitized data to identify information elements relevant to the specific information;
iii) extract the identified information elements from each source of unstructured digitized data where information elements have been identified;
iv) process the extracted information elements for generating an output signal conveying at least a portion of the specific information, said processing including:
for each source of unstructured digitized data, correlating the information elements extracted therefrom on the basis of predetermined clustering rules for assembling the extracted information elements into coherent information relevant to the specific information;
compiling the coherent information assembled from the plurality of sources of unstructured digitized data into said at least a portion of the specific information; and
discarding redundant information from said at least a portion of the specific information;
c) an output coupled to said processing unit for releasing said output signal from said system.
a) generate a query request on the basis of the at least one search parameter;
b) send the query request to the search engine;
c) receive a response to the query request from the search engine including at least one URL address indicating the address of a WWW page containing information related to the at least one search parameter;
d) process the response and generate an instruction including the at least one URL address returned by the search engine;
e) transmit said instruction to said input of said system.
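For illustration only, steps a) to e) above might be sketched as follows. The search engine is stood in for by a caller-supplied function, and both the query encoding and the instruction format (a dict with a `urls` key) are assumptions; the claims prescribe neither.

```python
from typing import Callable, Dict, List
from urllib.parse import urlencode

# Hypothetical types: the claims do not define a query or instruction format.
SearchFn = Callable[[str], List[str]]   # query string -> list of URL addresses
Instruction = Dict[str, List[str]]


def build_query(search_params: List[str]) -> str:
    """Step a): generate a query request from the search parameter(s)."""
    return urlencode({"q": " ".join(search_params)})


def prospect(search_params: List[str], search: SearchFn,
             system_input: Callable[[Instruction], None]) -> Instruction:
    query = build_query(search_params)      # a) generate the query request
    urls = search(query)                    # b) send it, c) receive the response
    instruction = {"urls": list(urls)}      # d) wrap the returned URL addresses
    system_input(instruction)               # e) transmit to the system input
    return instruction


def fake_search(query: str) -> List[str]:
    """Stand-in search engine so the sketch runs without network access."""
    return ["http://www.example.com/"]


received: List[Instruction] = []
result = prospect(["Acme", "Widgets"], fake_search, received.append)
```

Passing the search engine in as a function keeps the sketch independent of any particular engine or transport.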
-
-
7. A system as defined in claim 6, wherein said processing unit is responsive to the instruction received from said prospector unit for extracting therefrom the at least one URL address returned by the search engine, said processing unit operative to collect the specific information from the WWW pages connected to the at least one URL address returned by the search engine.
-
8. A system as defined in claim 6, wherein said prospector unit receives a response to the query request from the search engine including a plurality of URL addresses, each URL address indicating the address of a WWW page containing information related to the at least one search parameter, said prospector unit being further operative to:
-
a) select a particular URL address from said plurality of URL addresses returned by the search engine on the basis of said at least one search parameter;
b) discard the unselected URL addresses;
c) generate an instruction including the selected URL address;
d) transmit said instruction to said input of said system.
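One way the selection in step a) could work is sketched here, assuming the search parameter is a company name (as in claims 10 and 11). The scoring heuristic, which favours hosts containing the name and short paths, is purely hypothetical and is not taken from the claims.

```python
import re
from typing import Dict, List, Optional
from urllib.parse import urlparse


def normalise(text: str) -> str:
    """Lower-case and keep only letters and digits."""
    return re.sub(r"[^a-z0-9]", "", text.lower())


def score(url: str, company: str) -> int:
    """Hypothetical heuristic: favour hosts containing the name and short paths."""
    parsed = urlparse(url)
    points = 0
    if normalise(company) in normalise(parsed.netloc):
        points += 10
    # A home page usually has an empty or root path, so each segment costs a point.
    points -= len([part for part in parsed.path.split("/") if part])
    return points


def select_url(urls: List[str], company: str) -> Optional[Dict[str, List[str]]]:
    if not urls:
        return None
    best = max(urls, key=lambda u: score(u, company))  # a) select one URL
    # b) the unselected URLs are simply not carried forward
    return {"urls": [best]}                            # c) build the instruction


candidates = [
    "http://directory.example.org/listing/acme-widgets",
    "http://www.acmewidgets.example/",
    "http://www.acmewidgets.example/news/2001/press",
]
instruction = select_url(candidates, "Acme Widgets")
```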
-
-
9. A system as defined in claim 8, wherein the specific information is business information.
-
10. A system as defined in claim 9, wherein the at least one search parameter is the name of a company.
-
11. A system as defined in claim 10, wherein the selected URL address indicates the address of the home page for the company.
-
12. A system as defined in claim 3, wherein said processing unit performs lexical analysis and text interpretation operations for identifying information elements relevant to the specific information in each source of unstructured digitized data.
-
13. A system as defined in claim 12, wherein the lexical analysis and text interpretation operations are performed by said processing unit at least in part on the basis of a plurality of dictionaries.
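A minimal sketch of dictionary-driven lexical analysis, in the spirit of claims 12 and 13, is given here. The dictionary contents, the tag names and the simple tokenizer are invented for illustration; the claims do not specify them.

```python
import re
from typing import Dict, List, Set, Tuple

# Illustrative dictionaries only; a real system would load far larger ones.
DICTIONARIES: Dict[str, Set[str]] = {
    "CITY": {"montreal", "toronto", "boston"},
    "TITLE": {"president", "director", "manager"},
    "STREET_TYPE": {"street", "avenue", "boulevard"},
}


def tokenize(text: str) -> List[str]:
    """Split text into word and number tokens."""
    return re.findall(r"[A-Za-z]+|\d+", text)


def tag_tokens(text: str) -> List[Tuple[str, str]]:
    """Label each token with the first dictionary that contains it."""
    tagged = []
    for token in tokenize(text):
        label = "OTHER"
        for name, words in DICTIONARIES.items():
            if token.lower() in words:
                label = name
                break
        if label == "OTHER" and token.isdigit():
            label = "NUMBER"
        tagged.append((token, label))
    return tagged


tags = tag_tokens("Jane Doe, Director, 12 Main Street, Montreal")
```

The tagged tokens could then feed a text interpretation step that decides, for example, that a number followed by a street type forms part of a postal address.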
-
14. A system as defined in claim 12, wherein, for each source of unstructured digitized data where information elements have been identified, said processing unit is operative to establish relationships between the identified information elements at least in part on the basis of predetermined clustering rules, for assembling the identified information elements into coherent information relevant to the specific information.
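The claims do not say what the predetermined clustering rules are. As one hypothetical example, a proximity rule could group elements that appear close together in the source text; the `Element` type and the 80-character threshold are assumptions made for this sketch.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Element:
    kind: str      # e.g. "name", "phone"
    value: str
    offset: int    # character position in the source text


def cluster(elements: List[Element], max_gap: int = 80) -> List[List[Element]]:
    """Group elements whose offsets are within max_gap of the previous one.

    The proximity rule and the 80-character default are assumptions.
    """
    clusters: List[List[Element]] = []
    for element in sorted(elements, key=lambda e: e.offset):
        if clusters and element.offset - clusters[-1][-1].offset <= max_gap:
            clusters[-1].append(element)
        else:
            clusters.append([element])
    return clusters


found = [
    Element("name", "Acme Widgets", 10),
    Element("phone", "555-0100", 40),
    Element("name", "Acme Europe", 400),
    Element("phone", "555-0199", 430),
]
groups = cluster(found)
```

Here each resulting group pairs a business name with the telephone number found next to it, which is one reading of "coherent information".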
-
15. A system as defined in claim 14, wherein said processing unit is operative to process said coherent information generated from all of the sources of unstructured digitized data in which information elements relevant to the specific information were identified, for removing repetitive information and combining complementary information.
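Removing repetitive information and combining complementary information could, for instance, be done by merging records that describe the same business. The record layout and the choice of `business_name` as the matching key are assumptions for this sketch only.

```python
from typing import Dict, List

Record = Dict[str, str]


def merge_records(records: List[Record], key: str = "business_name") -> List[Record]:
    """Merge records sharing the same key value (case-insensitive).

    Repeated fields are kept once; fields present in only one record are
    added to the merged result. Matching on a single key is an assumption.
    """
    merged: Dict[str, Record] = {}
    for record in records:
        identity = record.get(key, "").strip().lower()
        target = merged.setdefault(identity, {})
        for field, value in record.items():
            target.setdefault(field, value)   # first value seen wins
    return list(merged.values())


per_source = [
    {"business_name": "Acme Widgets", "phone": "555-0100"},
    {"business_name": "ACME WIDGETS", "phone": "555-0100", "city": "Montreal"},
    {"business_name": "Other Co", "email": "info@other.example"},
]
combined = merge_records(per_source)
```

The duplicate telephone number is kept once, while the city found in only one source is carried into the merged record.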
-
16. A system as defined in claim 1, wherein said processing unit generates a data structure holding the specific information, said output signal including the data structure.
-
17. A system as defined in claim 16, wherein said data structure is a list.
-
18. A system as defined in claim 16, wherein said data structure is a table.
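To make the list and table alternatives of claims 17 and 18 concrete, the sketch here holds the collected information as a list of records and renders it as a table. The CSV encoding and the column names are assumptions; the claims do not name a format.

```python
import csv
import io
from typing import Dict, List


def to_table(records: List[Dict[str, str]], columns: List[str]) -> str:
    """Render a list of records as a CSV table with a fixed column order."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns, restval="",
                            extrasaction="ignore", lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)   # missing fields are written as empty cells
    return buffer.getvalue()


records = [
    {"business_name": "Acme Widgets", "phone": "555-0100"},
    {"business_name": "Other Co", "city": "Boston"},
]
table = to_table(records, ["business_name", "phone", "city"])
```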
-
19. A system as defined in claim 1, wherein each source of unstructured digitized data is selected from the group consisting of a WWW page, a database, a server, a memory module, a text file and a digitized document.
-
20. A system as defined in claim 1, wherein the specific information is business information.
-
21. A system as defined in claim 20, wherein the specific information is contact information for prospecting potential clients.
-
22. A system as defined in claim 21, wherein the information elements relevant to the contact information are selected from the group consisting of business name, business description, telephone number, fax number, postal address, street name, city, country, region, postal code, e-mail address, name of a contact person and title of a contact person.
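Some of the information elements listed in claim 22 have recognisable surface forms. The sketch here identifies three of them with regular expressions; the patterns are deliberately simplified assumptions (North American telephone numbers, Canadian-style postal codes) and are not taken from the patent.

```python
import re
from typing import Dict, List

# Simplified patterns for illustration; real-world formats vary widely.
PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "phone": re.compile(r"\(?\d{3}\)?[ .-]\d{3}[ .-]\d{4}"),
    "postal_code": re.compile(r"\b[A-Z]\d[A-Z] ?\d[A-Z]\d\b"),  # Canadian style
}


def extract_contact_elements(text: str) -> Dict[str, List[str]]:
    """Return every match of each pattern, keyed by element kind."""
    return {kind: pattern.findall(text) for kind, pattern in PATTERNS.items()}


page = ("Acme Widgets, 12 Main Street, Montreal H3A 1B2. "
        "Tel: (514) 555-0100. E-mail: sales@acme.example.")
elements = extract_contact_elements(page)
```

Elements without a fixed surface form, such as a business description or the name of a contact person, would need the dictionary-based analysis of claims 12 and 13 rather than patterns like these.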
-
23. A computer readable storage medium containing a program element for execution by a computing apparatus to implement a system for collecting specific information from several sources of unstructured digitized data, said system including:
-
a) an input for receiving at least one instruction governing the collection of the specific information;
b) a processing unit coupled to said input, said processing unit operative to:
i) establish a data connection with a plurality of sources of unstructured digitized data from which the specific information is to be collected, at least in part on the basis of the at least one instruction;
ii) analyse the contents of each one of said plurality of sources of unstructured digitized data to identify information elements relevant to the specific information;
iii) extract the identified information elements from each source of unstructured digitized data where information elements have been identified;
iv) process the extracted information elements for generating an output signal conveying at least a portion of the specific information, said processing including:
for each source of unstructured digitized data, correlating the information elements extracted therefrom on the basis of predetermined clustering rules for assembling the extracted information elements into coherent information relevant to the specific information;
compiling the coherent information assembled from the plurality of sources of unstructured digitized data into said at least a portion of the specific information; and
discarding redundant information from said at least a portion of the specific information;
c) an output coupled to said processing unit for releasing said output signal from said system.
-
-
38. A data processing device for collecting specific information from several sources of unstructured digitized data, said data processing device comprising:
-
a) an input for receiving at least one instruction governing the collection of the specific information;
b) an identification unit coupled to said input, said identification unit operative to:
i) establish a data connection with a plurality of sources of unstructured digitized data from which the specific information is to be collected, at least in part on the basis of the at least one instruction;
ii) analyse the contents of each one of said plurality of sources of unstructured digitized data to identify information elements relevant to the specific information;
c) an extractor unit operative to extract the identified information elements from each source of unstructured digitized data where information elements have been identified;
d) an aggregator unit operative to process the extracted information elements for generating an output signal conveying at least a portion of the specific information, said processing including:
for each source of unstructured digitized data, correlating the information elements extracted therefrom on the basis of predetermined clustering rules for assembling the extracted information elements into coherent information relevant to the specific information;
compiling the coherent information assembled from the plurality of sources of unstructured digitized data into said at least a portion of the specific information; and
discarding redundant information from said at least a portion of the specific information;
e) an output for releasing said output signal from said data processing device.
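The division of labour among the identification, extractor and aggregator units of claim 38 might be sketched as follows. This is a simplified illustration under stated assumptions: sources are in-memory strings rather than live data connections, only e-mail addresses are treated as relevant, and the clustering-rule correlation step is omitted, so the aggregator only compiles results and discards redundant entries. All class and function names are hypothetical.

```python
import re
from typing import Dict, List, Tuple

Span = Tuple[int, int]
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


class IdentificationUnit:
    """Locates relevant elements; here, only e-mail addresses (an assumption)."""
    def identify(self, text: str) -> List[Span]:
        return [match.span() for match in EMAIL.finditer(text)]


class ExtractorUnit:
    """Pulls the identified spans out of the source text."""
    def extract(self, text: str, spans: List[Span]) -> List[str]:
        return [text[start:end] for start, end in spans]


class AggregatorUnit:
    """Compiles per-source results and discards redundant entries."""
    def aggregate(self, per_source: Dict[str, List[str]]) -> List[str]:
        seen, output = set(), []
        for items in per_source.values():
            for item in items:
                if item.lower() not in seen:   # case-insensitive duplicate check
                    seen.add(item.lower())
                    output.append(item)
        return output


def run_device(sources: Dict[str, str]) -> List[str]:
    identifier, extractor, aggregator = (
        IdentificationUnit(), ExtractorUnit(), AggregatorUnit())
    per_source = {
        name: extractor.extract(text, identifier.identify(text))
        for name, text in sources.items()
    }
    return aggregator.aggregate(per_source)


# In-memory stand-ins for the sources of unstructured digitized data.
sources = {
    "page_a": "Contact sales@acme.example or call us.",
    "page_b": "Write to Sales@Acme.example or info@acme.example today.",
}
output_signal = run_device(sources)
```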
-
-
53. A method for collecting specific information from several sources of unstructured digitized data, said method comprising:
-
a) receiving at least one instruction governing the collection of the specific information;
b) establishing a data connection with a plurality of sources of unstructured digitized data from which the specific information is to be collected, at least in part on the basis of the at least one instruction;
c) analyzing the contents of each one of the plurality of sources of unstructured digitized data to identify information elements relevant to the specific information;
d) extracting the identified information elements from each source of unstructured digitized data where information elements have been identified;
e) processing the extracted information elements for generating an output signal conveying at least a portion of the specific information, said processing including:
for each source of unstructured digitized data, correlating the information elements extracted therefrom on the basis of predetermined clustering rules for assembling the extracted information elements into coherent information relevant to the specific information;
compiling the coherent information assembled from the plurality of sources of unstructured digitized data into said at least a portion of the specific information; and
discarding redundant information from said at least a portion of the specific information.
-
Specification