Method for processing a file to generate a database
Abstract
A method for automatically processing a file, such as a web page or an ASCII file, is provided to treat the file as a database with one or more database tables. An HTML page is retrieved from a user specified URL or from a disk file and is parsed for any HTML tables or text blocks that are translated into a database table in a database representation of the HTML page. ASCII files can also be parsed to identify data blocks to be represented as a database table.
-
32 Claims
-
1. A method of processing data in a file, comprising the steps of:
-
reading the file, the file having data in a plurality of data formats;
identifying one or more blocks of data within the file;
extracting a plurality of data items from the one or more blocks of data;
automatically generating one or more database tables to correspond to the one or more blocks of data; and
loading the plurality of data items into the one or more database tables.
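As an illustration only, the last two steps of claim 1 (generating tables and loading data items) might be sketched as follows. The sketch assumes the blocks have already been identified and extracted as lists of rows, and it uses an in-memory SQLite database as a stand-in for the database. The function name `load_blocks` and the `table_N` / `col_N` naming scheme are hypothetical and do not come from the patent.

```python
import sqlite3

def load_blocks(conn, blocks):
    """Create one table per block and load its rows.

    blocks: non-empty lists of rows; each row is a list of strings.
    Returns the generated table names (hypothetical naming scheme).
    """
    names = []
    for n, rows in enumerate(blocks, start=1):
        name = f"table_{n}"
        width = max(len(row) for row in rows)
        # Generate a table whose column count matches the widest row.
        cols = ", ".join(f"col_{i} TEXT" for i in range(1, width + 1))
        conn.execute(f"CREATE TABLE {name} ({cols})")
        # Pad short rows with NULLs so every row fits the table.
        padded = [list(row) + [None] * (width - len(row)) for row in rows]
        placeholders = ", ".join("?" * width)
        conn.executemany(f"INSERT INTO {name} VALUES ({placeholders})", padded)
        names.append(name)
    return names

conn = sqlite3.connect(":memory:")
tables = load_blocks(conn, [[["1", "bolt"], ["2", "nut"]], [["x"]]])
```

Each block becomes its own table, so a file with several unrelated blocks yields several independent tables rather than one merged table.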
-
-
4. A method for generating a database table from a HTML file, comprising the steps of:
-
retrieving a source HTML document;
retrieving a source HTML frameset document and merging the source HTML frameset document into the source HTML document if the source HTML document references the HTML frameset document;
creating a database object for the source HTML document;
identifying a HTML table in the source HTML document;
creating a datatable object for the HTML table;
parsing the HTML table to extract data for storage in the datatable object; and
loading the extracted data into the datatable object.
translating an HTML escape sequence to a corresponding ASCII representation;
deleting a carriage return;
deleting a line feed;
deleting leading and trailing white spaces;
compressing internal white spaces into a white space; and
deleting remaining HTML tags.
-
-
16. The method according to claim 12, further comprising the step of, when the predetermined HTML tag includes a start of table tag, determining if a nested HTML table has been identified.
-
17. The method according to claim 16, wherein if a nested HTML table is identified, the steps of creating a datatable object, parsing the HTML table and loading the extracted data are repeated for each nested HTML table.
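The handling of nested HTML tables described in claims 16 and 17 can be illustrated with a small sketch built on Python's standard `html.parser` module. This is one possible reading, not the patented implementation: a stack tracks the currently open tables, so a start-of-table tag seen inside a cell opens a new table and the same extraction steps repeat for it. The names `TableExtractor` and `extract_tables` are hypothetical.

```python
from html.parser import HTMLParser

class TableExtractor(HTMLParser):
    """Collects every HTML table, including nested ones, as lists of rows."""

    def __init__(self):
        super().__init__()
        self.tables = []   # finished tables, in the order they close
        self._stack = []   # open tables; the innermost is last

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            # A start-of-table tag inside an open table means a nested table.
            self._stack.append({"rows": [], "row": None, "cell": None})
            return
        if not self._stack:
            return
        top = self._stack[-1]
        if tag == "tr":
            top["row"] = []
        elif tag in ("td", "th"):
            top["cell"] = []

    def handle_endtag(self, tag):
        if not self._stack:
            return
        top = self._stack[-1]
        if tag in ("td", "th") and top["cell"] is not None and top["row"] is not None:
            top["row"].append("".join(top["cell"]).strip())
            top["cell"] = None
        elif tag == "tr" and top["row"] is not None:
            top["rows"].append(top["row"])
            top["row"] = None
        elif tag == "table":
            self.tables.append(self._stack.pop()["rows"])

    def handle_data(self, data):
        # Text belongs to the innermost open cell only.
        if self._stack and self._stack[-1]["cell"] is not None:
            self._stack[-1]["cell"].append(data)

def extract_tables(html_text):
    parser = TableExtractor()
    parser.feed(html_text)
    parser.close()
    return parser.tables
```

Because a nested table closes before its parent does, nested tables appear earlier in the returned list than the table that contains them.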
-
18. A method for generating a database table from a text table in a source HTML document, comprising the steps of:
-
retrieving a source HTML document;
creating a database object for the source HTML document;
identifying text data in the source HTML document to translate into a database table, the text data not tagged as HTML table data;
creating a datatable object for the text data; and
loading the text data into the datatable object.
reading the source HTML document to identify a predetermined number of lines in the source HTML document having a matching length; and
identifying at least one column break in the predetermined number of lines.
-
-
20. The method according to claim 19, wherein the step of identifying at least one column break includes:
-
generating a blank pattern array;
merging each non-space character of each line of the predetermined number of lines into the pattern array; and
identifying any spaces in the merged pattern array, each space in the merged pattern array identifying a column break.
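The pattern-array technique of claim 20 can be sketched as follows. The sketch assumes the candidate lines have equal length, as the preceding step requires, and that a later non-space character simply overwrites an earlier one at the same position, since only the space or non-space distinction matters. The function names `merge_pattern` and `column_breaks` are hypothetical.

```python
def merge_pattern(lines):
    """Merge the non-space characters of every line into one pattern."""
    width = max(len(line) for line in lines)
    pattern = [" "] * width              # the blank pattern array
    for line in lines:
        for i, ch in enumerate(line):
            if ch != " ":
                pattern[i] = ch          # any non-space marks the position as used
    return "".join(pattern)

def column_breaks(pattern):
    """Positions that stayed blank in every line are column breaks."""
    return [i for i, ch in enumerate(pattern) if ch == " "]

lines = [
    "ID  NAME   QTY",
    "1   bolt   40 ",
    "22  washer 7  ",
]
print(column_breaks(merge_pattern(lines)))   # [2, 3, 10]
```

A position counts as a break only if it is a space in every sampled line, which is why position 1 (blank in the second line but used in the others) is not reported.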
-
-
21. The method according to claim 20, wherein a start location, a stop location and a length of each column in the text table are determined as a function of the column break.
-
22. The method according to claim 21, wherein the loading step is performed as a function of the start location, stop location and length of each column in the text table.
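Claims 21 and 22 can be illustrated by deriving each column's start, stop and length from a merged pattern and then slicing rows with those values. In this sketch, which is an assumption rather than the patented implementation, a column is a run of non-space positions in the pattern and the stop location is exclusive, matching Python slice conventions. The names `column_spans` and `split_row` are hypothetical, and the pattern string is an example value.

```python
def column_spans(pattern):
    """Return (start, stop, length) for each run of non-space positions."""
    spans = []
    start = None
    for i, ch in enumerate(pattern + " "):   # trailing space closes the last run
        if ch != " " and start is None:
            start = i
        elif ch == " " and start is not None:
            spans.append((start, i, i - start))
            start = None
    return spans

def split_row(line, spans):
    """Cut one text line into cells using the column spans."""
    return [line[start:stop].strip() for start, stop, _length in spans]

pattern = "22  washer 70Y"                   # merged pattern for a 3-column table
spans = column_spans(pattern)
print(spans)                                 # [(0, 2, 2), (4, 10, 6), (11, 14, 3)]
print(split_row("1   bolt   40 ", spans))    # ['1', 'bolt', '40']
```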
-
23. The method according to claim 22, wherein prior to the loading step, a content of the text table is cleaned.
-
24. The method according to claim 23, wherein the cleaning includes at least one of:
-
translating an HTML escape sequence to a corresponding ASCII representation;
deleting a carriage return;
deleting a line feed;
deleting leading and trailing white spaces;
compressing internal white spaces into a white space; and
deleting remaining HTML tags.
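The cleaning operations listed in claim 24 might look like the sketch below, which applies all six even though the claim requires only at least one. The order is a design choice made here, not something the claim fixes: tags are removed before escape sequences are translated so that an escaped `&lt;` in the text is not later mistaken for markup. The tag pattern is a simple regular expression that would not cope with every HTML edge case, and `clean_cell` is a hypothetical name.

```python
import html
import re

def clean_cell(text):
    """Apply the cleaning operations to one cell of extracted text."""
    text = re.sub(r"<[^>]*>", "", text)                # delete remaining HTML tags
    text = html.unescape(text)                         # translate escape sequences
    text = text.replace("\r", "").replace("\n", "")    # delete CR and LF
    text = text.strip()                                # delete leading/trailing white space
    return re.sub(r"\s+", " ", text)                   # compress internal white space

print(clean_cell("  <b>AT&amp;T</b>\r\n   Labs  "))    # AT&T Labs
```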
-
-
25. A method for generating a database table from a text table in an ASCII file, comprising the steps of:
-
retrieving an ASCII file;
creating a database object for the ASCII file;
identifying one or more text tables in the ASCII file;
creating one or more datatable objects to correspond to the one or more text tables; and
automatically loading data from the one or more text tables into the corresponding one or more datatable objects.
reading the ASCII file to identify a predetermined number of lines in the ASCII file having a matching length; and
identifying at least one column break in the predetermined number of lines.
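The step of finding a predetermined number of lines with a matching length can be sketched as a scan for runs of consecutive equal-length lines. This is an assumption about how such a step could work, not the patent's procedure: the threshold `min_lines` stands in for the "predetermined number", blank runs are ignored, and the name `find_text_tables` is hypothetical.

```python
def find_text_tables(lines, min_lines=3):
    """Return (start, stop) index pairs for runs of equal-length lines.

    stop is exclusive. Runs shorter than min_lines, and runs of blank
    lines, are not treated as candidate text tables.
    """
    runs = []
    start = 0
    for i in range(1, len(lines) + 1):
        # A run ends at the end of input or when the line length changes.
        if i == len(lines) or len(lines[i]) != len(lines[start]):
            if i - start >= min_lines and lines[start].strip():
                runs.append((start, i))
            start = i
    return runs

text = [
    "Report",
    "",
    "ID  NAME   QTY",
    "1   bolt   40 ",
    "22  washer 7  ",
    "",
    "end",
]
print(find_text_tables(text))   # [(2, 5)]
```

Each returned range can then be passed on to column-break detection, since every line in it has the same length.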
-
-
27. The method according to claim 25, wherein the step of identifying at least one column break includes:
-
generating a blank pattern array;
merging each non-space character of each line of the predetermined number of lines into the pattern array; and
identifying any spaces in the merged pattern array, each space in the merged pattern array identifying a column break.
-
-
28. The method according to claim 27, wherein a start location, a stop location and a length of each column in the text table are determined as a function of the column break.
-
29. The method according to claim 28, wherein the loading step is performed as a function of the start location, stop location and length of each column in the text table.
-
30. The method according to claim 29, wherein prior to the loading step, a content of the text table is cleaned.
-
31. The method according to claim 30, wherein the cleaning includes at least one of:
-
deleting a carriage return;
deleting a line feed;
deleting leading and trailing white spaces; and
compressing internal white spaces into a white space.
-
-
32. A method for generating a database table from a HTML file, comprising the steps of:
-
retrieving a source HTML document;
creating a database object for the source HTML document;
identifying a HTML table in the source HTML document;
creating a first datatable object for the HTML table;
parsing the HTML table to extract data for storage in the first datatable object;
loading the extracted data into the first datatable object;
identifying text data in the source HTML document to translate into a database table, the text data not tagged as HTML table data;
creating a second datatable object for the text data; and
loading the text data into the second datatable object.
-
Specification