AUTOMATIC GENERATION OF STRUCTURED DATA FROM SEMI-STRUCTURED DATA
First Claim
1. A method for generating structured data from semi-structured data, the method comprising:
reading a plurality of records from a data file comprising semi-structured data;
obtaining aligned delimiters in a list for every record that has been read;
selecting a most occurring delimiter from the list;
constructing a regular expression using the selected delimiter to split the records into different fields;
reconstructing the records for the regular expression to fit and split into fields; and
displaying the records split into the fields.
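The first two substantive steps of this claim, obtaining a list of delimiters for every record read and selecting the most occurring one, can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the patented implementation. The candidate set (ASCII punctuation plus tab), the helper names `delimiter_lists` and `most_occurring_delimiter`, and the reading of "aligned" as "present across the records" are all hypothetical.

```python
from collections import Counter
import string

# Hypothetical candidate set: ASCII punctuation plus tab. The source does not
# state which characters may act as delimiters.
CANDIDATES = frozenset(string.punctuation) | {"\t"}


def delimiter_lists(records):
    """Return one Counter per record mapping candidate delimiter -> occurrences."""
    return [Counter(ch for ch in rec if ch in CANDIDATES) for rec in records]


def most_occurring_delimiter(per_record):
    """Pick the candidate found in the most records; break ties by total count.

    Treating "aligned" as "present across records" is an assumption.
    """
    presence = Counter()  # number of records that contain each candidate
    total = Counter()     # total occurrences of each candidate over all records
    for counts in per_record:
        presence.update(counts.keys())
        total.update(counts)
    if not presence:
        raise ValueError("no candidate delimiter found")
    return max(presence, key=lambda ch: (presence[ch], total[ch]))


records = ["1.5|x|y", "2.0|z|w"]
print(most_occurring_delimiter(delimiter_lists(records)))  # prints |
```

Here both `|` and `.` appear in every record, so the total count decides: `|` occurs four times and `.` only twice.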
Abstract
A method and system for generating structured data from semi-structured data are provided. The method includes reading a plurality of records from a data file that includes semi-structured data and obtaining, for every record that has been read, a list of aligned delimiters. A most occurring delimiter is then selected from the list, and a regular expression is constructed using the selected delimiter to split the records into different fields. The records are reconstructed so that the regular expression fits them and splits them into fields. Finally, the records split into the fields are displayed.
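Once a delimiter has been selected, constructing the regular expression and splitting the records might look like the sketch below. The function names `build_splitter` and `split_records` are hypothetical, and absorbing whitespace around the delimiter is an assumption rather than something the source states. `re.escape` is used so that delimiters that are also regex metacharacters, such as `|` or `.`, are matched literally.

```python
import re


def build_splitter(delimiter):
    """Compile a regex that matches the delimiter literally.

    Absorbing surrounding whitespace is an assumption; it suits non-whitespace
    delimiters and would merge empty fields if the delimiter were a tab.
    """
    return re.compile(r"\s*" + re.escape(delimiter) + r"\s*")


def split_records(records, delimiter):
    """Split every record into fields with the compiled pattern."""
    pattern = build_splitter(delimiter)
    return [pattern.split(rec.strip()) for rec in records]


print(split_records(["a | b|c", " d|e |f "], "|"))
# prints [['a', 'b', 'c'], ['d', 'e', 'f']]
```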
20 Claims
1. A method for generating structured data from semi-structured data, the method comprising:
reading a plurality of records from a data file comprising semi-structured data;
obtaining aligned delimiters in a list for every record that has been read;
selecting a most occurring delimiter from the list;
constructing a regular expression using the selected delimiter to split the records into different fields;
reconstructing the records for the regular expression to fit and split into fields; and
displaying the records split into the fields.
8. A method for generating structured data from semi-structured data, the method comprising:
reading a plurality of records from a data file comprising semi-structured data;
obtaining aligned delimiters in a list for every record that has been read;
selecting a most occurring delimiter from the list;
constructing a regular expression using the selected delimiter to split the records into different fields;
identifying missing delimiters and missing values;
reconstructing the records for the regular expression to fit and split into fields, and subsequently filling in NULL for the missing values; and
displaying the records in a split tabulated form.
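Claim 8 adds identifying missing delimiters and missing values, reconstructing the records, filling in NULL, and displaying the result in tabulated form. The sketch below is one hedged reading of those steps. It assumes that the record with the most delimiters defines the expected field count and that missing delimiters belong at the end of a record; the helpers `reconstruct`, `split_with_nulls`, and `tabulate` and the literal string "NULL" are hypothetical choices, not taken from the source.

```python
import re

NULL = "NULL"  # hypothetical placeholder for a missing value


def reconstruct(record, delimiter, expected):
    """Append missing delimiters so the record contains `expected` of them.

    Assumes missing delimiters belong at the end of the record.
    """
    missing = expected - record.count(delimiter)
    return record + delimiter * max(missing, 0)


def split_with_nulls(records, delimiter):
    """Split records into fields, filling NULL for empty or missing values."""
    pattern = re.compile(re.escape(delimiter))
    # Assumed: the record with the most delimiters defines the field count.
    expected = max(rec.count(delimiter) for rec in records)
    table = []
    for rec in records:
        fields = pattern.split(reconstruct(rec, delimiter, expected))
        # An empty field (two adjacent delimiters) is a missing value.
        table.append([f.strip() or NULL for f in fields])
    return table


def tabulate(table):
    """Render rows as left-aligned columns separated by ' | '."""
    widths = [max(len(row[i]) for row in table) for i in range(len(table[0]))]
    return "\n".join(
        " | ".join(cell.ljust(w) for cell, w in zip(row, widths)).rstrip()
        for row in table
    )


rows = split_with_nulls(["alice,30,paris", "bob,,london", "carol"], ",")
print(tabulate(rows))
```

Because every record is padded to the same delimiter count before splitting, each row ends up with the same number of fields, which is what allows `tabulate` to align the columns.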
13. A system comprising:
a processor; and
a memory coupled to the processor, the memory storing instructions which, when executed by the processor, cause the system to perform a method for providing information to a user, the method comprising:
reading a plurality of records from a data file comprising semi-structured data;
obtaining aligned delimiters in a list for every record that has been read;
selecting a most occurring delimiter from the list;
constructing a regular expression using the selected delimiter to split the records into different fields;
reconstructing the records for the regular expression to fit and split into fields; and
displaying the records split into the fields.
Specification