Source editing, internationalization, advanced configuration wizard, and summary page selection for information automation systems
First Claim
1. An improved information automation system for deep web harvesting of data, the information automation system including computer hardware and being configured to connect with individual ones of a plurality of separately-searchable deep web sources and, for each deep web source, to (a) automatically analyze a search query entry form and search result output format;
(b) based on that analysis, automatically develop a source-specific query format and search result collection process corresponding to that deep web source; and
(c) automatically store each source-specific query format as a source record associated with a corresponding deep web source;
wherein for performing a new deep web harvest, the system is configured to accept input of new search criteria and, based on a plurality of source records, formulate a plurality of distinct queries based on a corresponding plurality of source records and on the new search criteria, the improvement comprising:
a user interface device configured to display a visual output and to accept user input including the input of the new search criteria, the user interface device being programmed to:
display, via the visual output, contents of a first automatically-configured source record in a user-readable form;
facilitate user editing of certain portions of the first source record via the interaction with user-readable form; and
facilitate functionality testing, via user interaction with the user-readable form, of a sample automated interaction between the information automation system and the deep web source that corresponds to the first source record, with the automatic interaction being conducted based on the first source record being edited;
wherein the sample automated interaction is user-initiatable via the user-readable form; and
wherein in response to an initiation of the sample automated interaction, at least a portion of a harvesting operation is automatically performed, and a sample result set of the sample automated interaction is displayed via the user interface device for review by the user.
Abstract
A source manager includes an editor program that can be used to edit an existing source record via a graphical user interface (GUI). Test Action and Test Source functions allow a user to test-enter a query and to test a source expeditiously. A conversion tool converts existing sources to the design and format to reconcile data scattered among the source engine data and source partition record. For handling internationalization issues, aspects of the invention include persistently storing the source's encoding type during the configuration process, and then using that encoding type later during the deep harvest phase. According to another aspect of the invention, a solution for selecting a summary passage for a particular source is provided. Other aspects of the invention include solutions for character encoding, “Next Links” recognition and “Next Results” handling.
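The internationalization aspect described in the abstract (store the source's encoding type at configuration time, reuse it at harvest time) can be illustrated with a minimal Python sketch. It is not the patented implementation: the `SourceRecord` fields, the function names, the candidate encoding list, and the example URL are all hypothetical and chosen only for illustration.

```python
import json
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass
class SourceRecord:
    # Hypothetical record layout; the source does not specify field names.
    name: str
    query_url: str
    encoding: str  # chosen once during configuration, then persisted


def configure_source(name: str, query_url: str, sample_page: bytes,
                     declared_charset: Optional[str] = None) -> SourceRecord:
    """Configuration phase: pick the first encoding that decodes the sample."""
    candidates = [declared_charset] if declared_charset else []
    candidates += ["utf-8", "shift_jis", "latin-1"]  # assumed fallback order
    for enc in candidates:
        try:
            sample_page.decode(enc)
        except (UnicodeDecodeError, LookupError):
            continue  # wrong or unknown encoding, try the next candidate
        return SourceRecord(name, query_url, enc)
    raise ValueError("no candidate encoding could decode the sample page")


def save_record(record: SourceRecord) -> str:
    """Persist the record (here, as a JSON string) so the encoding survives."""
    return json.dumps(asdict(record))


def load_record(serialized: str) -> SourceRecord:
    return SourceRecord(**json.loads(serialized))


def decode_result_page(record: SourceRecord, raw: bytes) -> str:
    """Harvest phase: reuse the stored encoding instead of re-detecting it."""
    return raw.decode(record.encoding, errors="replace")


# Usage: configure once, persist, then decode later harvest results.
sample = "検索結果".encode("shift_jis")
stored = save_record(configure_source("example", "http://example.invalid/search", sample))
record = load_record(stored)
page_text = decode_result_page(record, "次へ".encode("shift_jis"))
```

Storing the encoding in the record means the harvest phase does not need to guess again for each result page, which is the point the abstract makes about reusing the encoding type.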
20 Claims
19. In a computer-implemented information automation system for deep web harvesting of data in which the information automation system includes computer hardware and is configured to connect with individual ones of a plurality of separately-searchable deep web sources and, for each deep web source, to (a) automatically analyze a search query entry form and search result output format;
(b) based on that analysis, automatically develop a source-specific query format and search result collection process corresponding to that deep web source; and
(c) automatically store each source-specific query format as a source record associated with a corresponding deep web source;
wherein for performing a new deep web harvest, the system is configured to accept input of new search criteria and, based on a plurality of source records, formulate a plurality of distinct queries based on a corresponding plurality of source records and on the new search criteria, a method for interactively editing source records, the method comprising:
displaying, via a user interface device, contents of a first automatically-configured source record, the contents being displayed in a user-readable form;
providing, via the user-readable form, editable fields for modifying content of the first source record;
providing, via the user-readable form, a functionality testing initiation control, such that, when activated, the functionality testing initiation control initiates a sample automated interaction between the information automation system and the deep web source that corresponds to the first source record, with the automatic interaction being conducted based on the first source record being edited;
in response to an initiation of the sample automated interaction, automatically performing at least a portion of a harvesting operation, and displaying, via the user interface device, a sample result set of the sample automated interaction.
Specification