System and method for geographically organizing and classifying businesses on the world-wide web
9 Assignments
0 Petitions
Abstract
A method and search engine for classifying a source publishing a document on a portion of a network are provided. The method includes the steps of electronically receiving a document; determining, based on the document, a source which published the document; and assigning a code to the document based on whether data associated with the document published by the source matches data contained in a database. An intelligent geographic- and business topic-specific resource discovery system facilitates local commerce on the World-Wide Web and also reduces search time by accurately isolating information for end-users. Distinguishing and classifying business pages on the Web by business categories using Standard Industrial Classification (SIC) codes is achieved through an automatic iterative process.
240 Citations
18 Claims
1. A method of classifying a document published by a source on a portion of a network, comprising the steps of:
electronically receiving a document; based on the document, determining a source which published the document; and assigning a code to said document based on whether data associated with the document published by the source matches with data contained in a database, wherein said portion of said network comprises a graphical multimedia portion of said network, said source comprises a Web site publishing a home page, and said network comprises the Internet.
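Claim 1 reduces to three steps: receive a document, determine the source that published it, and assign a code depending on whether data tied to that source matches a database. The Python sketch below illustrates that flow under stated assumptions: the source is taken to be the host name in the document's URL, the database is an in-memory dict from host to an SIC code, and the names classify_document, determine_source, SIC_BY_HOST and the "UNCLASSIFIED" fallback are hypothetical, not defined by the patent.

```python
from urllib.parse import urlsplit

# Hypothetical stand-in for the database: host name -> SIC code.
# The single entry is illustrative only (SIC 5461 is "Retail Bakeries").
SIC_BY_HOST = {
    "bakery.example.com": "5461",
}

def determine_source(url: str) -> str:
    """Take the host that served the document as its source (urlsplit lower-cases it)."""
    return urlsplit(url).hostname or ""

def classify_document(url: str) -> str:
    """Assign the matching SIC code, or a fallback code when the source is unknown."""
    return SIC_BY_HOST.get(determine_source(url), "UNCLASSIFIED")

print(classify_document("http://Bakery.Example.com/index.html"))  # 5461
print(classify_document("http://home.example.net/~jdoe/"))        # UNCLASSIFIED
```

Keying on the host alone is the simplest reading of "determining a source"; a fuller version would also compare content-derived data such as phone numbers or addresses against the database.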
6. A method of classifying a document published by a source on a portion of a network, comprising the steps of:
electronically receiving a document; based on the document, determining a source which published the document; and assigning a code to said document based on whether data associated with the document published by the source matches with data contained in a database, wherein said step of determining a source includes: extracting a domain name from a predetermined uniform resources locator (URL) database; querying a registered domain name database for storing registered domain names; and merging addresses from said registered domain name database with predetermined data.
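Claim 6 spells out how the source is determined: extract a domain name from a URL database, look it up in a database of registered domain names, and merge the registrant's address with other predetermined data. A minimal Python sketch, assuming both databases are in-memory dicts; REGISTERED_DOMAINS, BUSINESS_LISTINGS, extract_domain and source_record are hypothetical names, and the "last two labels" domain rule is a deliberate simplification.

```python
from typing import Optional
from urllib.parse import urlsplit

# Hypothetical registered-domain database: domain -> registrant and postal address.
REGISTERED_DOMAINS = {
    "example.com": {"registrant": "Example Bakery", "city": "Springfield", "state": "IL"},
}

# Hypothetical "predetermined data": a business listing keyed by registrant name.
BUSINESS_LISTINGS = {
    "Example Bakery": {"phone": "217-555-0100", "category": "Bakeries"},
}

def extract_domain(url: str) -> str:
    """Keep the last two host labels, e.g. www.shop.example.com -> example.com.

    This is wrong for multi-part suffixes such as .co.uk; a production system
    would consult the Public Suffix List instead.
    """
    host = urlsplit(url).hostname or ""
    return ".".join(host.split(".")[-2:])

def source_record(url: str) -> Optional[dict]:
    """Look the domain up in the registry and merge in any matching listing data."""
    domain = extract_domain(url)
    registration = REGISTERED_DOMAINS.get(domain)
    if registration is None:
        return None
    merged = {"domain": domain, **registration}
    merged.update(BUSINESS_LISTINGS.get(registration["registrant"], {}))
    return merged

# Stand-in for the predetermined URL database.
url_db = ["http://www.example.com/menu.html", "http://unknown.example.org/"]
for url in url_db:
    print(url, source_record(url))
```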
10. A search engine for use on a network for distinguishing between business web pages and personal web pages, comprising:
means for parsing the content of a hyper-text markup language (HTML) at a web address and searching for criteria contained therein; means for analyzing a uniform resources locator (URL) of the web address to determine characteristics of a web page at the web address; means for determining whether said criteria match with data contained in a database; and means for cross-referencing a match, determined by said determining means, to a second database to classify a source which published the web page, wherein said criteria include at least one of an address, a telephone number, a facsimile number, a contact and a key-word contained in said HTML, and wherein the characteristics of said web page include a geographical location and a web page host computer.
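Claim 10 pairs two analyses: parsing a page's HTML for criteria such as telephone numbers and key-words, and reading the URL for the host computer and a location clue. The standard-library sketch below shows one way to do both; the North American phone pattern, the caller-supplied keyword set, and treating a two-letter top-level domain as a country hint are all assumptions made for illustration, and TextExtractor, parse_criteria and analyze_url are hypothetical names.

```python
import re
from html.parser import HTMLParser
from typing import Iterable
from urllib.parse import urlsplit

# Assumed criterion pattern: North American numbers such as (217) 555-0100.
PHONE_RE = re.compile(r"\(?\b(\d{3})\)?[-.\s]?(\d{3})[-.\s](\d{4})\b")

class TextExtractor(HTMLParser):
    """Collect visible text, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.chunks.append(data)

def parse_criteria(html: str, keywords: Iterable[str]) -> dict:
    """Return phone numbers (as NNN-NNN-NNNN) and the key-words present in the page text."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    text = " ".join(parser.chunks)
    lowered = text.lower()
    phones = ["-".join(m.groups()) for m in PHONE_RE.finditer(text)]
    found = sorted(k for k in keywords if k.lower() in lowered)
    return {"phones": phones, "keywords": found}

def analyze_url(url: str) -> dict:
    """Report the host computer and, if the TLD has two letters, a country-code hint."""
    host = urlsplit(url).hostname or ""
    tld = host.rsplit(".", 1)[-1]
    cc = tld.upper() if len(tld) == 2 and tld.isalpha() else None
    return {"host": host, "country_code_tld": cc}

page = "<p>Acme Plumbing, call (217) 555-0100</p>"
print(parse_criteria(page, ["plumbing", "bakery"]))
print(analyze_url("http://www.acme.example.ca/contact"))
```

A generic top-level domain such as .com yields no location hint at all, so in practice the address or telephone area code found in the HTML would likely carry more geographic weight than the URL.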
11. A search engine for use on a network for distinguishing between business web pages and personal web pages, comprising:
means for parsing the content of a hyper-text markup language (HTML) at a web address and searching for criteria contained therein; means for analyzing a uniform resources locator (URL) of the web address to determine characteristics of a web page at the web address; means for determining whether said criteria match with data contained in a database; and means for cross-referencing a match, determined by said determining means, to a second database to classify a source which published the web page, wherein said second database includes a Business Semantic Terminology database having information related to business categories in a Yellow Pages directory.
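Claim 11 names a Business Semantic Terminology database that ties vocabulary to Yellow Pages business categories. One simple way such a table could be applied is term-overlap scoring, sketched below; the two-category table, the token regex, the min_hits threshold and the categorize function are hypothetical stand-ins for a much larger curated vocabulary.

```python
import re
from collections import Counter
from typing import Optional

# Hypothetical miniature Business Semantic Terminology table:
# Yellow Pages-style category -> characteristic terms.
BUSINESS_TERMS = {
    "Bakeries": {"bakery", "bread", "pastry", "cakes"},
    "Plumbing Contractors": {"plumbing", "plumber", "drain", "pipes"},
}

def categorize(text: str, min_hits: int = 2) -> Optional[str]:
    """Return the category whose terms occur most often, if it reaches min_hits."""
    counts = Counter(re.findall(r"[a-z]+", text.lower()))
    scores = {cat: sum(counts[t] for t in terms) for cat, terms in BUSINESS_TERMS.items()}
    best = max(scores, key=scores.get)  # ties go to the first category listed
    return best if scores[best] >= min_hits else None

print(categorize("Fresh bread and pastry from our neighborhood bakery"))  # Bakeries
print(categorize("Photos from my summer vacation"))                      # None
```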
12. A search engine for use on a network for distinguishing between business web pages and personal web pages, comprising:
means for parsing the content of a hyper-text markup language (HTML) at a web address and searching for criteria contained therein; means for analyzing a uniform resources locator (URL) of the web address to determine characteristics of a web page at the web address; means for determining whether said criteria match with data contained in a database; and means for cross-referencing a match, determined by said determining means, to a second database to classify a source which published the web page, wherein said second database includes a Yellow Pages database.
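With a Yellow Pages database as the second database (claim 12), a telephone number found on the page is a natural cross-reference key. A minimal sketch, assuming ten-digit North American numbers and an in-memory dict in place of the directory; YELLOW_PAGES, normalize_phone and cross_reference are hypothetical names.

```python
import re
from typing import Iterable, Optional

# Hypothetical Yellow Pages extract keyed by normalized 10-digit phone number.
YELLOW_PAGES = {
    "2175550100": {"name": "Acme Plumbing", "heading": "Plumbing Contractors", "city": "Springfield"},
}

def normalize_phone(raw: str) -> str:
    """Strip punctuation and a leading North American country code '1'."""
    digits = re.sub(r"\D", "", raw)
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]
    return digits

def cross_reference(phones: Iterable[str]) -> Optional[dict]:
    """Return the first Yellow Pages listing matching any extracted phone number."""
    for raw in phones:
        listing = YELLOW_PAGES.get(normalize_phone(raw))
        if listing is not None:
            return listing
    return None

print(cross_reference(["(217) 555-0199", "+1 217.555.0100"]))
```

Normalizing both sides to bare digits is the design choice that matters here: pages write the same number in many formats, and an exact string comparison would miss most matches.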
13. A search engine for use on a network for distinguishing between business web pages and personal web pages, comprising:
means for parsing the content of a hyper-text markup language (HTML) at a web address and searching for criteria contained therein; means for analyzing a uniform resources locator (URL) of the web address to determine characteristics of a web page at the web address; means for determining whether said criteria match with data contained in a database; and means for cross-referencing a match, determined by said determining means, to a second database to classify a source which published the web page, wherein said web page comprises hyperlinks, and said means for parsing comprises an indexer robot for traversing said hyperlinks in said web page and a web page index database, said indexer robot for indexing a content of said web page into said web index database.
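Claim 13 adds an indexer robot that follows a page's hyperlinks and writes page content into an index database. The sketch below is a breadth-first crawler that builds an inverted index (word to set of URLs); to stay runnable offline it takes a fetch callable, here backed by a dict, instead of making network requests. PageParser, crawl_and_index, fetch and max_pages are hypothetical names.

```python
import re
from collections import defaultdict, deque
from html.parser import HTMLParser
from typing import Callable, Dict, Optional, Set
from urllib.parse import urldefrag, urljoin

class PageParser(HTMLParser):
    """Collect <a href> targets and all text data from one page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

    def handle_data(self, data):
        self.text.append(data)

def crawl_and_index(start_url: str,
                    fetch: Callable[[str], Optional[str]],
                    max_pages: int = 100) -> Dict[str, Set[str]]:
    """Breadth-first traversal of hyperlinks; returns an inverted index word -> URLs."""
    index = defaultdict(set)
    seen = {start_url}
    queue = deque([start_url])
    crawled = 0
    while queue and crawled < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:  # unreachable or non-HTML resource
            continue
        crawled += 1
        parser = PageParser()
        parser.feed(html)
        parser.close()
        for word in re.findall(r"[a-z0-9]+", " ".join(parser.text).lower()):
            index[word].add(url)
        for href in parser.links:
            # Resolve relative links and drop #fragments so one page is queued once.
            link, _fragment = urldefrag(urljoin(url, href))
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return dict(index)

# Offline stand-in for the Web: URL -> HTML.
WEB = {
    "http://shop.example/": '<a href="/menu">Menu</a> Welcome to our bakery',
    "http://shop.example/menu": '<a href="http://shop.example/#top">Home</a> Fresh bread',
}
index = crawl_and_index("http://shop.example/", WEB.get)
print(sorted(index["bread"]))  # ['http://shop.example/menu']
```

A real robot would also honor robots.txt, rate-limit its requests, and persist the index rather than hold it in memory.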
14. A search engine for use on a network for distinguishing between business web pages and personal web pages, comprising:
means for parsing the content of a hyper-text markup language (HTML) at a web address and searching for criteria contained therein; means for analyzing a uniform resources locator (URL) of the web address to determine characteristics of a web page at the web address; means for determining whether said criteria match with data contained in a database; and means for cross-referencing a match, determined by said determining means, to a second database to classify a source which published the web page, wherein said means for analyzing comprises: means for determining whether said URL comprises one of a root URL and a leaf URL.
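Claim 14 has the URL analysis decide whether a URL is a root URL or a leaf URL, but the claim itself does not define the terms. Assuming "root" means a site's top-level page and "leaf" any page below it, a sketch might look like the following; the DEFAULT_DOCUMENTS set and the url_kind function are hypothetical.

```python
from urllib.parse import urlsplit

# Assumed set of default document names that a server typically returns for "/".
DEFAULT_DOCUMENTS = {"index.html", "index.htm", "default.htm", "default.asp", "home.html"}

def url_kind(url: str) -> str:
    """Classify a URL as "root" (site top-level page) or "leaf" (anything deeper)."""
    segments = [s for s in urlsplit(url).path.split("/") if s]
    if not segments:
        return "root"
    if len(segments) == 1 and segments[0].lower() in DEFAULT_DOCUMENTS:
        return "root"
    return "leaf"

for u in ("http://www.example.com", "http://www.example.com/index.html",
          "http://isp.example.net/~jdoe/", "http://www.example.com/products/widgets.html"):
    print(u, url_kind(u))
```

One plausible reason the distinction helps separate business from personal pages: a leaf URL such as /~jdoe/ on a shared host is a weaker sign of a business site than a root URL on its own domain.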
16. A method of indexing textual content on the world-wide web, comprising:
robotically traversing the world-wide web to identify uniform resource locators; and determining whether the identified uniform resource locators are associated with a business or an individual, wherein the determining step comprises: extracting ownership data from content associated with the identified uniform resource locators; querying a business listing database based on the ownership data; and determining that the identified uniform resource locators are associated with businesses if the querying matches the ownership data to a business listing in the business listing database.
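Claim 16 decides business versus individual by extracting ownership data from page content and querying a business listing database. The sketch below assumes the ownership data is the name in a copyright notice, matched with a deliberately crude regular expression, and that any page without a listed owner counts as an individual's; COPYRIGHT_RE, BUSINESS_LISTINGS, ownership_name and classify_owner are hypothetical names.

```python
import re
from typing import Optional

# Hypothetical business listing database keyed by normalized business name.
BUSINESS_LISTINGS = {
    "acme plumbing": {"city": "Springfield", "heading": "Plumbing Contractors"},
}

# Assumed ownership signal: a notice such as "Copyright 2024 Acme Plumbing" or "(c) Acme".
COPYRIGHT_RE = re.compile(r"(?:©|\(c\)|copyright)\s*(?:\d{4}\s*)?([A-Za-z][A-Za-z0-9&' ]*)",
                          re.IGNORECASE)

def ownership_name(text: str) -> Optional[str]:
    """Extract the owner named in the first copyright notice, collapsing whitespace."""
    match = COPYRIGHT_RE.search(text)
    return " ".join(match.group(1).split()) if match else None

def classify_owner(text: str) -> str:
    """Label the page "business" when its owner appears in the listing database."""
    name = ownership_name(text)
    if name is not None and name.lower() in BUSINESS_LISTINGS:
        return "business"
    # Assumption: anything without a listed owner is treated as an individual's page.
    return "individual"

print(classify_owner("Copyright 2024 Acme Plumbing. All rights reserved."))
print(classify_owner("(c) 2023 Jane Doe"))
```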
Specification