System, apparatus, method, and computer program product for indexing a file
Abstract
A search engine manages the indexing of web page contents and accepts user selection criteria to find and report hits that meet the search criteria. The inventive search engine has an associated crawler function wherein display images of the web pages are rendered and stored as snapshots, preferably when the pages are indexed. The search engine reports search results by composing an HTML page with links to the corresponding page hits and containing reduced-size snapshot images showing the web pages as they appeared when fetched and stored as snapshots.
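The reporting step described in the abstract can be illustrated with a minimal sketch. It is not taken from the patent: the `Hit` record, its field names, and the `render_report` function are hypothetical, and the page layout is only one plausible way to pair each link with its stored snapshot.

```python
from dataclasses import dataclass
from html import escape


@dataclass
class Hit:
    url: str            # address of the page that matched the query
    title: str          # hypothetical display title for the link
    snapshot_path: str  # hypothetical location of the stored reduced-size image


def render_report(query: str, hits: list[Hit]) -> str:
    """Compose an HTML results page with one link and one snapshot per hit."""
    rows = []
    for hit in hits:
        url = escape(hit.url, quote=True)
        src = escape(hit.snapshot_path, quote=True)
        rows.append(
            f'<li><a href="{url}">{escape(hit.title)}</a><br>'
            f'<img src="{src}" alt="Snapshot of {url}"></li>'
        )
    return (
        f"<html><body><h1>Results for {escape(query)}</h1>"
        f"<ul>{''.join(rows)}</ul></body></html>"
    )
```

Escaping every interpolated value keeps query text and URLs from breaking the generated markup, which matters because both come from outside the search engine.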
44 Claims
1. A method for processing data files stored at distributed addresses on a data processing network, at least some of the data files having text and graphic content, the method comprising:
analyzing at least a subset of the data files to produce a database of information characterizing aspects of the data files that tend to distinguish the data files from one another, and cross referencing said information to addresses of the data files;
generating an image of at least a portion of the subset of data files, and storing a graphic file of said image in a manner cross referenced to the addresses of the data files, whereby the graphic file represents an image of the data files at a time of generation;
receiving search queries and applying the search queries to the database for selecting a hit list from among the data files;
reporting the hit list in a search report including the addresses of each of the data files selected and the image corresponding to the data files in the hit list at the respective time of generation.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10
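The four steps of claim 1 (analyze, store a snapshot, search, report) can be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: the word-level inverted index, the `SearchStore` and `Snapshot` names, and the all-terms-must-match query rule are all hypothetical choices, and the image is represented by placeholder bytes rather than a real rendering.

```python
import re
import time
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Snapshot:
    image_bytes: bytes   # stand-in for the stored graphic file
    generated_at: float  # time of generation, seconds since the epoch


@dataclass
class SearchStore:
    # term -> set of addresses (the "database of information")
    index: dict = field(default_factory=lambda: defaultdict(set))
    # address -> Snapshot (graphic file cross referenced to the address)
    snapshots: dict = field(default_factory=dict)

    def analyze(self, address: str, text: str) -> None:
        """Index the distinguishing terms of one data file under its address."""
        for term in set(re.findall(r"[a-z0-9]+", text.lower())):
            self.index[term].add(address)

    def store_snapshot(self, address: str, image_bytes: bytes, now=None) -> None:
        """Record the image together with its time of generation."""
        stamp = time.time() if now is None else now
        self.snapshots[address] = Snapshot(image_bytes, stamp)

    def search(self, query: str) -> list:
        """Return (address, snapshot) pairs for files matching every query term."""
        terms = re.findall(r"[a-z0-9]+", query.lower())
        if not terms:
            return []
        hits = set.intersection(*(self.index.get(t, set()) for t in terms))
        return [(addr, self.snapshots.get(addr)) for addr in sorted(hits)]
```

A hit whose page was indexed but never rendered is reported with `None` in place of a snapshot, so the report can still list the address.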
11. A network search engine for managing user selection of information contained on data files stored at distributed network addresses on a global information processing network wherein distributed users have control over associated data files accessible by other users, each of said data files having at least some associated text and each of the data files having at least one mode of graphic presentation, comprising:
a crawler having at least one processor operable to address and load successive data files comprising at least a subset of said data files stored at said distributed network addresses, the crawler being operable to produce and store a database of information characterizing aspects of the data files that tend to distinguish the data files from one another, cross referenced to addresses of the data files; and
wherein the crawler is further operable to produce graphic image files representing at least some of the data files, the graphic image files each corresponding to content of corresponding said data files at a point in time, and wherein the crawler is operable to store the graphic image file so as to cross reference the graphic image file to the data files in the database.
Dependent claims: 12, 13, 14, 15, 16, 17, 18
19. An improved Internet search engine for managing user search and selection of web pages stored at distributed systems coupled at network addresses to the Internet, the search engine having an associated web crawler operable to address and load successive web pages, and to index text data associated with said successive web pages so as to obtain parameter information that distinguishes at least groups of the web pages from one another, the crawler storing the parameter information and associated addresses of the web pages, and the search engine being operable responsive to user submitted search criteria to search the parameter information and to report at least the associated addresses of web pages that met the search criteria when indexed, wherein the improvement comprises:
said crawler being operable in conjunction with obtaining the parameter information for at least a subset of said successive web pages to generate a graphic image file containing a visual image that is substantially identical to an appearance of said web pages, for display in a size proportionally smaller than said web pages; and
wherein the search engine is operable when reporting the associated addresses of web pages that met the search criteria to include a representation of the graphic image file in said proportionally smaller size.
Dependent claims: 20, 21, 22, 23, 24
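The "size proportionally smaller" language in claim 19 amounts to scaling both dimensions by the same factor. A minimal worked sketch follows; the `thumbnail_size` function and its `max_width` parameter are hypothetical, since the claim does not say how the reduced size is chosen.

```python
def thumbnail_size(width: int, height: int, max_width: int) -> tuple:
    """Scale (width, height) by one common factor so the width fits max_width.

    Pages already narrower than max_width are left at their original size.
    """
    if width <= 0 or height <= 0 or max_width <= 0:
        raise ValueError("dimensions must be positive")
    scale = min(1.0, max_width / width)
    # Clamp to at least one pixel so very tall or wide pages stay displayable.
    return max(1, round(width * scale)), max(1, round(height * scale))
```

For example, a 1024 by 768 rendering limited to a width of 256 is scaled by 0.25 in both directions, giving 256 by 192.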
25. A system comprising:
a fetching agent configured to receive a website file via at least one network interface, wherein the website file is associated with a web page;
a rendering agent configured to generate, based on the website file, a visual representation file that represents a rendered appearance of the web page that is substantially identical to an appearance of the web page and to compress the visual representation file of the web page into a reduced image file, wherein the reduced image file represents a reduced-size rendered appearance of the web page for display in a size proportionally smaller than the web page, and wherein the rendering agent is further configured to limit a dynamic aspect of dynamic content in the website file to a static display, wherein the static display comprises an image from the dynamic content in the web page at a fixed time;
a memory, configured to store the reduced image file and at least one network address associated with a network location of the website file, wherein the memory is further configured to cross reference the reduced image file with the at least one network address; and
a first plurality of fetching agents and a second plurality of rendering agents, and wherein a ratio of the first plurality of fetching agents to the second plurality of rendering agents is modified during processing of website files to maintain a consumption of the memory within a range of fractions of a capacity of the memory.
Dependent claims: 26, 27, 28, 29, 30, 31, 32, 33
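The last element of claim 25 describes adjusting the fetcher-to-renderer ratio to hold memory consumption within a band. One way this could work is sketched below. The claim does not specify a policy, so everything here is an assumption: that fetched but not yet rendered files are what occupy memory, that the total agent count stays fixed, and that the `rebalance` function and its `low` and `high` thresholds exist at all.

```python
def rebalance(fetchers: int, renderers: int,
              memory_used: float, memory_capacity: float,
              low: float = 0.25, high: float = 0.75) -> tuple:
    """Return adjusted (fetchers, renderers) counts; the total is unchanged.

    Assumed model: fetchers fill memory with pending files and renderers
    drain it, so shifting one agent between roles nudges consumption back
    toward the [low, high] band of fractions of capacity.
    """
    if memory_capacity <= 0:
        raise ValueError("memory_capacity must be positive")
    fraction = memory_used / memory_capacity
    if fraction > high and fetchers > 1:
        return fetchers - 1, renderers + 1  # too full: render faster
    if fraction < low and renderers > 1:
        return fetchers + 1, renderers - 1  # too empty: fetch faster
    return fetchers, renderers              # inside the band: no change
```

Calling this periodically with current memory figures moves one agent at a time, which avoids large swings; at least one agent always remains in each role.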
34. A method comprising:
receiving, at a computer, a website file via at least one network interface, wherein the website file is associated with a web page;
generating, at the computer, a visual representation file that represents a rendered appearance of the web page that is based on the website file, wherein the generating limits a dynamic aspect of dynamic content in the website file to a static display, wherein the static display comprises an image from the dynamic content in the web page at a fixed time;
compressing, at the computer, the visual representation file of the web page into a reduced image file, wherein the reduced image file represents a reduced-size rendered appearance of the web page;
storing, at the computer, the reduced image file and at least one network address associated with a network location of the website file, wherein storing the reduced image file and the at least one network address comprises cross referencing the reduced image file with the at least one network address; and
modifying a ratio of first plurality of fetching agents to second plurality of rendering agents during processing of website files to maintain a consumption of memory within a range of fractions of a capacity of the memory.
Dependent claims: 35, 36
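The generating step of claim 34 reduces dynamic content to "an image from the dynamic content in the web page at a fixed time". For an animation, that can be read as picking the frame on screen at a chosen instant. The sketch below shows one such reading; the `frame_at` function, the (frame, duration) representation, and the assumption that the animation loops are all hypothetical and not stated in the claim.

```python
def frame_at(frames: list, t_ms: int):
    """Return the frame visible at t_ms in a looping animation.

    frames is a list of (frame, duration_ms) pairs in display order.
    """
    total = sum(duration for _, duration in frames)
    if total <= 0:
        raise ValueError("animation must have a positive total duration")
    t = t_ms % total  # assumed looping playback
    elapsed = 0
    for frame, duration in frames:
        elapsed += duration
        if t < elapsed:
            return frame
    return frames[-1][0]  # unreachable for valid input; kept as a safe default
```

Choosing the same `t_ms` on every crawl would make successive snapshots of an unchanged animation identical, which is one reason a fixed time is useful.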
37. A non-transitory computer-readable storage medium having instructions stored thereon, the instructions comprising:
instructions for receiving a website file via at least one network interface, wherein the website file is associated with a web page,
instructions for generating, based on the website file, a visual representation file that represents a rendered appearance of the web page, wherein the generating limits a dynamic aspect of dynamic content in the website file to a static display, and wherein the static display comprises an image from the dynamic content in the web page at a fixed time;
instructions for compressing the visual representation file of the web page into a reduced image file, wherein the reduced image file represents a reduced-size rendered appearance of the web page;
instructions for storing the reduced image file and at least one network address associated with a network location of the website file, wherein the instructions for storing the reduced image file comprise instructions for cross referencing the reduced image file with the at least one network address; and
instructions for modifying a ratio of first plurality of fetching agents to second plurality of rendering agents during processing of website files to maintain a consumption of memory within a range of fractions of a capacity of the memory.
Dependent claims: 38, 39, 40, 41, 42
43. A non-transitory computer-readable storage medium having instructions stored thereon, the instructions comprising:
instructions for receiving a file via at least one network interface, wherein the file includes formatting information;
instructions for generating, based on the formatting information, a visual representation file that represents a rendered appearance of the file, wherein the generating limits a dynamic aspect of dynamic content in the file to a static display, wherein the static display comprises an image from the dynamic content in a web page at a fixed time;
instructions for compressing the visual representation file into a reduced image file, wherein the reduced image file represents a reduced-size rendered appearance of the file;
instructions for storing the reduced image file and at least one network address associated with a network location of the file, wherein the instructions for storing the reduced image file comprise instructions for cross referencing the reduced image file with the at least one network address; and
instructions for modifying a ratio of first plurality of fetching agents to second plurality of rendering agents during processing of website files to maintain a consumption of memory within a range of fractions of a capacity of the memory.
Dependent claims: 44
Specification