System and method for identifying the owner of a document on the world-wide web
First Claim
1. A method comprising:
maintaining a database of root URLs;
utilizing, using at least one processor, a root URL from the database of root URLs to retrieve a web page;
parsing the text of the web page to identify a company name within the text of the web page;
in response to identifying the company name, querying, using the identified company name, a third-party database to obtain information associated with the web page;
utilizing the obtained information to identify a geographic location associated with the web page; and
updating, within the database of root URLs, information associated with the web page to include the geographic location.
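The steps of claim 1 can be sketched in code. The following is a minimal illustration only, not the patented implementation: the `RootUrlRecord` type, the suffix-based company-name pattern, and the `fetch` and `lookup` callables (stand-ins for the page retrieval and the third-party database query) are all assumptions introduced for this example.

```python
import re
from dataclasses import dataclass
from typing import Callable, Dict, Optional

# Assumed heuristic: capitalized words followed by a corporate suffix.
COMPANY_RE = re.compile(r"\b((?:[A-Z][A-Za-z&']*\s)+(?:Inc|Corp|LLC|Ltd))\b")


@dataclass
class RootUrlRecord:
    """One hypothetical row in the database of root URLs."""
    url: str
    company: Optional[str] = None
    location: Optional[str] = None


def page_text(html: str) -> str:
    """Crudely strip tags so only the page text remains."""
    return re.sub(r"<[^>]+>", " ", html)


def find_company_name(text: str) -> Optional[str]:
    """Return the first company-like name in the text, if any."""
    match = COMPANY_RE.search(text)
    return match.group(1) if match else None


def process_root_url(
    db: Dict[str, RootUrlRecord],
    url: str,
    fetch: Callable[[str], str],
    lookup: Callable[[str], Optional[dict]],
) -> RootUrlRecord:
    """Retrieve the page, parse it, query by company name, update the record."""
    record = db[url]
    company = find_company_name(page_text(fetch(url)))
    if company is None:
        return record  # no company name identified, so no query is made
    info = lookup(company)  # stand-in for the third-party database query
    if info is not None:
        record.company = company
        record.location = f"{info['city']}, {info['state']}"
    return record


# Usage with in-memory stand-ins instead of a live web fetch and directory.
db = {"http://example.com/": RootUrlRecord("http://example.com/")}
pages = {"http://example.com/": "<html><h1>Acme Widgets Inc</h1><p>Quality widgets.</p></html>"}
directory = {"Acme Widgets Inc": {"city": "Springfield", "state": "IL"}}

record = process_root_url(db, "http://example.com/", pages.__getitem__, directory.get)
print(record.company, "->", record.location)
```

Passing `fetch` and `lookup` as parameters keeps the sketch runnable offline; a real system would presumably substitute an HTTP client and a commercial business directory.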
Abstract
A method and search engine for classifying a source publishing a document on a portion of a network include the steps of electronically receiving a document; determining, based on the document, a source which published the document; and assigning a code to the document based on whether data associated with the document published by the source matches data contained in a database. An intelligent geographic- and business-topic-specific resource discovery system facilitates local commerce on the World-Wide Web and also reduces search time by accurately isolating information for end users. Business pages on the Web are distinguished and classified by business category, using Standard Industrial Classification (SIC) codes, through an automatic iterative process.
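The code-assignment step in the abstract can be illustrated with a small sketch. It assumes, purely for illustration, that the "data associated with the document" is the publishing source's name, that the database is a dictionary keyed by normalized name, and that unmatched sources receive a hypothetical `UNMATCHED` sentinel; the abstract does not specify any of these details.

```python
from typing import Dict, Optional

UNMATCHED = "UNMATCHED"  # hypothetical sentinel for sources with no database match


def normalize(name: str) -> str:
    """Lowercase and collapse whitespace so trivially different spellings match."""
    return " ".join(name.lower().split())


def assign_code(source_name: Optional[str], directory: Dict[str, str]) -> str:
    """Return the SIC code of a matching directory entry, else the sentinel."""
    if source_name is None:
        return UNMATCHED
    return directory.get(normalize(source_name), UNMATCHED)


# Illustrative directory with a made-up company; 5812 is the SIC code for eating places.
directory = {"acme diner inc": "5812"}
print(assign_code("Acme  Diner Inc", directory))
print(assign_code("Unknown Co", directory))
```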
29 Claims
1. A method comprising:
maintaining a database of root URLs;
utilizing, using at least one processor, a root URL from the database of root URLs to retrieve a web page;
parsing the text of the web page to identify a company name within the text of the web page;
in response to identifying the company name, querying, using the identified company name, a third-party database to obtain information associated with the web page;
utilizing the obtained information to identify a geographic location associated with the web page; and
updating, within the database of root URLs, information associated with the web page to include the geographic location.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11
12. A non-transitory computer-readable storage medium including a set of instructions that, when executed, cause at least one processor to perform steps comprising:
maintaining a database of root URLs;
utilizing a root URL from the database of root URLs to retrieve a web page;
parsing the text of the web page to identify a company name within the text of the web page;
in response to identifying the company name, utilizing the company name associated with the web page to obtain additional information about the web page;
associating the additional information with the web page; and
updating, within the database of root URLs, to include the information associated with the web page.
Dependent claims: 13, 14, 15, 16, 17, 18, 19, 20
21. A system comprising:
at least one processor; and
a computer-readable storage medium storing instructions thereon that, when executed by the at least one processor, cause the at least one processor to:
maintain a database of root URLs;
utilize a root URL from the database of root URLs to retrieve a web page;
parse the text of the web page to identify a company name within the text of the web page;
in response to identifying the company name, utilize the company name to obtain additional information;
use the additional information to identify geographic location information associated with the company name;
update, within the database of root URLs, information associated with the web page to include the geographic location.
Dependent claims: 22, 23, 24, 25, 26, 27, 28, 29
Specification