
System and method for searching for internet-accessible content

  • US 8,346,753 B2
  • Filed: 11/14/2007
  • Issued: 01/01/2013
  • Est. Priority Date: 11/14/2006
  • Status: Expired due to Fees
First Claim

1. A system for searching for and organizing content on a plurality of computer networks, comprising:

    a plurality of processors;

    a central repository to collect, process, store, and disseminate information from and to a plurality of dedicated local meta servers, and a plurality of standard web browsers, and a plurality of enhanced web browsers;

    a toolbar plug-in for the standard web browsers in order to enhance the operation of the standard web browsers, with the toolbar plug-in maintaining communication with the central repository over the plurality of computer networks;

    a plurality of distributed local meta sites, with each of the local meta sites storing processed content provided by a plurality of host web sites, and with each of the meta sites hosting the dedicated local meta servers, and a dedicated local search engine, with each of the dedicated local search engines having an associated local spider and an associated local scraper, with each of the distributed local meta sites maintaining communication with the central repository over the plurality of computer networks;

    a plurality of plug-ins for the standard web browsers, and for the local scrapers, and for the local spiders, wherein the installation of the plug-in allows the functionality of the standard web browsers and the local scrapers and the local spiders, to be directed and reconfigured, such that the directing is accomplished by reading a plurality of extended robots.txt files, each containing a plurality of extended directives, with each of the extended directives being understood by the corresponding plug-in, and allowing the functionality of the standard web browser, the local spider, and the local scraper, to be reconfigured by a plurality of configuration files, which are understood by the corresponding plug-ins;

    one or more of the plurality of standard web browsers enhanced with one or more of the plug-ins to allow a plurality of users to display and navigate a standard HTML network so as to view content stored on the plurality of host web sites, accessible from the plurality of computer networks, to allow users to display and navigate and view content on a plurality of non-HTML networks on the plurality of host web sites, accessible from the plurality of computer networks, and to allow a plurality of users to select from a list of supported search engines, which search engine to submit a query to and to interact with during a search session, with the list of supported search engines being provided by the toolbar plug-in used to enhance a standard web browser comprising the analysis of a plurality of user interactions with the plurality of supported search engines, using a standard web browser enhanced with the toolbar plug-in for authenticated and secure communication with the central repository, to monitor the plurality of user interactions during a search session, with storage of the results of the monitoring and with analysis of the plurality of user interactions done at the central repository, wherein the central repository stores a global set of Query Language Progressions (QLPs) generated by a plurality of users, with one QLP being generated per user, and one QLP being generated per search session, wherein the sequence of query entries made by the user over a predetermined period of time is deemed to constitute a QLP, the QLPs having been harvested by the toolbar-enhanced standard web browsers, and by the dedicated local search engines hosted on the plurality of dedicated local meta servers;

    with the QLPs being transmitted to the central repository, where each set of QLPs received from the meta servers is merged into the global set of Query Language Progressions, with the global set being periodically transmitted to each of the distributed local meta servers;

    the local spiders to navigate and index standard HTML networks as well as the plurality of non-HTML networks stored on the plurality of host web sites, accessible from a plurality of computer networks;

    the local scrapers to gather content from the standard HTML networks and to gather and process content from the plurality of non-HTML networks, and to translate and merge the non-HTML networks into the standard HTML networks and to process the gathered and newly generated HTML networks into link map data, and to store the link map data and associated scraped content on their respective distributed local meta sites;

    the storage of content by the local scrapers on the local meta sites, and indexing of content by the local spiders, wherein a local index data structure is produced by each of the local meta servers, and stored at each of the corresponding distributed local meta sites, and used by each of the associated local search engines when conducting local searches, which are limited in scope to the associated local meta server and host web site, with each set of the link map data and index data structure periodically transmitted to the central repository, with the central repository integrating all received sets of local link map data into the global link map, and the central repository periodically integrating all received sets of local index data structures into a global index data structure, each of the dedicated local meta servers periodically processing the local set of link map data, with the NodeRank algorithm, to form a local NodeRanked list of HTML links, and periodically transmitting over the computer networks the local NodeRanked list of HTML links to the central repository, with the central repository then merge-sorting the pre-sorted local NodeRanked lists of HTML links into a global NodeRanked list of HTML links, the global NodeRanked list of HTML links having resulted from all previous such merge-sorts and based on periodic application of the NodeRank algorithm at the central repository on the global link map;

    a URL server at each of the distributed local meta sites, with each of the URL servers providing a sequence of URLs which are used to guide the standard web browsers and control the navigation of the local spiders and the local scrapers in order to index, and scrape content from the associated host web site to the shadowing distributed local meta site;

    a URI generator at each of the distributed local meta sites being in communication with the standard web browsers which have been enhanced by the plug-ins, the local spiders, and the local scrapers, in order to convert the non-HTML network links into new standard HTML network links during operations, with the new standard HTML network links being displayed on the plug-in enhanced standard web browsers, and being navigated for purposes of indexing by the local spiders, and being navigated for purposes of scraping content by the local scrapers, with the index and content being stored on each of the corresponding distributed local meta sites;

    the plurality of extended robots.txt files, each of which is stored at an associated host web site, wherein the extended directives contained in each of the extended robots.txt files are read by and used to direct the plug-in enhanced standard web browsers, the local spiders and the local scrapers, wherein one of the extended directives is able to direct the reader to the location of each of the associated configuration files and the other extended directives are able to direct the reader in how to conduct the browsing, spidering, and the scraping, for compatibility and compliance with the plurality of host web sites;

    the configuration files which are tailored to the format of each of the particular host web sites and are read and incorporated by the plug-in enhanced standard web browsers, and by the local spiders, and by the local scrapers, so as to customize their functionalities for operation on each of the particular host web sites in order to convert dynamically generated HTML network links into the static standard HTML network links and to convert the non-HTML network links into the standard HTML network links, with the resulting standard HTML network links and associated content being displayed on the plug-in enhanced standard web browsers, and being indexed by the local spiders, and being stored by the local scrapers on the corresponding distributed local meta site dedicated to each of the associated host web sites;

    the plurality of dedicated local meta servers which taken in combination function as an always-up ultra-peer backbone for a plurality of peer-to-peer networks, by supporting the transfer and caching of peer-to-peer network content at each of the distributed local meta sites and by incorporating peer-to-peer search capability at each of the distributed local meta sites.
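The claim defines a Query Language Progression only as "the sequence of query entries made by the user over a predetermined period of time," one per user per search session, with local sets merged into a global set at the central repository. The patent gives no implementation; the sketch below is one minimal Python reading of that definition, where the session window length, the de-duplication on merge, and all function names are assumptions, not details from the claim.

```python
def harvest_qlp(query_entries, window_seconds=1800):
    """Group a user's timestamped query entries into one Query Language
    Progression (QLP) per search session: entries closer together than
    window_seconds belong to the same session (window is an assumed
    stand-in for the claim's "predetermined period of time")."""
    sessions, current, last_ts = [], [], None
    for ts, query in sorted(query_entries):
        if last_ts is not None and ts - last_ts > window_seconds:
            sessions.append(current)
            current = []
        current.append(query)
        last_ts = ts
    if current:
        sessions.append(current)
    return sessions

def merge_into_global(global_qlps, local_qlps):
    """Merge a meta server's harvested QLPs into the central repository's
    global set; de-duplicating identical progressions is an assumption."""
    merged = {tuple(q) for q in global_qlps} | {tuple(q) for q in local_qlps}
    return sorted(list(q) for q in merged)
```

On this reading, the central repository would call `merge_into_global` once per received local set, and periodically push the result back to each distributed local meta server, as the claim requires.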
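The extended robots.txt files described above carry ordinary robots directives plus extended directives that point the reader at configuration files and govern browsing, spidering, and scraping. The claim names no directive syntax, so the parser below assumes the conventional `Key: value` robots.txt line format and treats any key outside the standard set as extended; example keys such as `Config-File` are illustrative inventions, not names from the patent.

```python
def parse_extended_robots(text):
    """Split an extended robots.txt into standard directives and the
    extended directives the claim describes. Returns two dicts mapping
    lower-cased directive names to lists of values."""
    standard, extended = {}, {}
    known_standard = {"user-agent", "allow", "disallow", "crawl-delay", "sitemap"}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # strip comments and whitespace
        if not line or ":" not in line:
            continue
        key, value = (part.strip() for part in line.split(":", 1))
        target = standard if key.lower() in known_standard else extended
        target.setdefault(key.lower(), []).append(value)
    return standard, extended
```

A plug-in built along these lines would first look up the directive giving the configuration-file location, fetch that file, and only then begin directed browsing, spidering, or scraping of the host web site.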
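The claim has each local meta server send a pre-sorted NodeRanked list of HTML links to the central repository, which merge-sorts them into a global list without re-sorting. The NodeRank algorithm itself is not specified; the sketch below shows only the merge step, assuming each entry is a `(rank, url)` pair sorted highest-rank first, and using a standard k-way merge.

```python
import heapq

def merge_noderanked(local_lists):
    """Merge pre-sorted local NodeRanked lists (highest rank first) into
    one global NodeRanked list, exploiting that each input is already
    sorted so the merge runs in O(n log k) for k lists."""
    # heapq.merge yields ascending order, so negate ranks to get a
    # highest-rank-first result without re-sorting any list.
    keyed = [[(-rank, url) for rank, url in lst] for lst in local_lists]
    return [(-neg, url) for neg, url in heapq.merge(*keyed)]
```

The claim's "all previous such merge-sorts" phrasing suggests the global list is itself one of the inputs on each periodic cycle, which this function supports by passing the current global list alongside the newly received local lists.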
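The URI generator element converts non-HTML network links into standard HTML links served from the distributed local meta site, so that enhanced browsers can display them and the spiders and scrapers can navigate them. The claim gives no URL scheme for this mapping; the one-liner below assumes a hypothetical `/view` endpoint with a `src` query parameter on the meta site, purely for illustration.

```python
from urllib.parse import quote

def to_html_uri(meta_site_base, non_html_link):
    """Map a non-HTML network link (e.g. an ftp or peer-to-peer address)
    to a standard HTML URL on the local meta site. The /view path and
    'src' parameter are assumed, not taken from the patent."""
    return f"{meta_site_base}/view?src={quote(non_html_link, safe='')}"
```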

  • 2 Assignments