Automatic file structure and field data type detection
First Claim
1. A computer-implemented method for determining data structure and field types of a data source that is to be processed by an application, the method being executed using one or more processors and comprising:
receiving the data source;
providing base data associated with the data source;
determining, by the one or more processors, a number of fields of the data source and, for each field, a field type based on the data source and the base data, wherein determining the number of fields of the data source comprises:
selecting a field separator from a plurality of field separators;
for each sample row in a set of sample rows, determining an estimated number of fields based on the field separator; and
iterating over all separators in the plurality of field separators and determining that, for each field separator, the estimated number of fields for each sample row is unequal across two or more sample rows in the set of sample rows, and in response, setting the number of fields equal to one;
generating, by the one or more processors, data structure data, the data structure data comprising the number of fields and field types; and
providing, by the one or more processors, the data structure data to the application.
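The field-counting steps of the claim can be illustrated with a minimal sketch. It is not the patented implementation: the function name `estimate_field_count`, the candidate separator list, and the naive `str.count` split (which ignores quoting and escaping) are assumptions made for illustration. The claim also does not say what happens when a separator never occurs in any row; this sketch assumes such a separator is skipped.

```python
from typing import Optional, Sequence, Tuple

# Hypothetical candidates; the claim only requires "a plurality of field separators".
CANDIDATE_SEPARATORS = (",", ";", "\t", "|")


def estimate_field_count(
    sample_rows: Sequence[str],
    separators: Sequence[str] = CANDIDATE_SEPARATORS,
) -> Tuple[Optional[str], int]:
    """Return (separator, number_of_fields) for a set of sample rows.

    For each candidate separator, estimate the number of fields in every
    sample row. A separator is accepted when all rows agree on the estimate.
    If every separator yields unequal estimates across rows, fall back to a
    single field, as the claim recites.
    """
    for sep in separators:
        # Naive estimate: occurrences of the separator plus one.
        counts = {row.count(sep) + 1 for row in sample_rows}
        if len(counts) == 1:
            n_fields = counts.pop()
            # Assumption: a separator that never occurs (one field in every
            # row) is not treated as a match.
            if n_fields > 1:
                return sep, n_fields
    # No separator gave a consistent estimate: treat each row as one field.
    return None, 1
```

For example, `estimate_field_count(["a;b;c", "1;2;3"])` selects `";"` with three fields, while rows such as `["a,b", "1,2,3"]` disagree for every candidate and fall back to a single field.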
Abstract
Methods, systems, and computer-readable storage media for determining data structure and field types of a data source that is to be processed by an application. Actions include receiving the data source, providing base data associated with the data source, determining a number of fields of the data source and, for each field, a field type based on the data source and the base data, generating data structure data, the data structure data comprising the number of fields and field types, and providing the data structure data to the application.
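The abstract's remaining steps (a field type for each field, then data structure data) can be sketched as follows. The abstract does not define the content of the base data or the set of field types, so everything specific here is an assumption: the base data is modeled as a dictionary with a hypothetical `date_formats` key, the type names are illustrative, and the separator is taken as already known.

```python
from datetime import datetime
from typing import Callable, Dict, List, Sequence


def infer_field_type(values: Sequence[str], date_formats: Sequence[str]) -> str:
    """Pick the narrowest illustrative type that parses every sample value."""

    def all_parse(parser: Callable[[str], object]) -> bool:
        try:
            for value in values:
                parser(value)
            return True
        except ValueError:
            return False

    if not values:
        return "string"
    if all_parse(int):
        return "integer"
    if all_parse(float):
        return "float"
    for fmt in date_formats:
        if all_parse(lambda v, fmt=fmt: datetime.strptime(v, fmt)):
            return "date"
    return "string"


def build_data_structure(
    sample_rows: Sequence[str],
    separator: str,
    base_data: Dict[str, List[str]],
) -> Dict[str, object]:
    """Build data structure data: the number of fields and their types.

    Assumes every sample row splits into the same number of fields;
    zip() would silently truncate to the shortest row otherwise.
    """
    columns = list(zip(*(row.split(separator) for row in sample_rows)))
    date_formats = base_data.get("date_formats", [])  # hypothetical key
    field_types = [infer_field_type(col, date_formats) for col in columns]
    return {"number_of_fields": len(field_types), "field_types": field_types}
```

The returned dictionary stands in for the data structure data that would be handed to the consuming application.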
15 Claims
14. A non-transitory computer-readable storage medium coupled to one or more processors and having instructions stored thereon which, when executed by the one or more processors, cause the one or more processors to perform operations for determining data structure and field types of a data source that is to be processed by an application, the operations comprising:
receiving the data source;
providing base data associated with the data source;
determining a number of fields of the data source and, for each field, a field type based on the data source and the base data, wherein determining the number of fields of the data source comprises:
selecting a field separator from a plurality of field separators;
for each sample row in a set of sample rows, determining an estimated number of fields based on the field separator; and
iterating over all separators in the plurality of field separators and determining that, for each field separator, the estimated number of fields for each sample row is unequal across two or more sample rows in the set of sample rows, and in response, setting the number of fields equal to one;
generating data structure data, the data structure data comprising the number of fields and field types; and
providing the data structure data to the application.
15. A system, comprising:
a computing device including at least one processor; and
a non-transitory computer-readable storage device coupled to the at least one processor and having instructions stored thereon which, when executed by the at least one processor, cause the at least one processor to perform operations for determining data structure and field types of a data source that is to be processed by an application, the operations comprising:
receiving the data source;
providing base data associated with the data source;
determining a number of fields of the data source and, for each field, a field type based on the data source and the base data, wherein determining the number of fields of the data source comprises:
selecting a field separator from a plurality of field separators;
for each sample row in a set of sample rows, determining an estimated number of fields based on the field separator; and
iterating over all separators in the plurality of field separators and determining that, for each field separator, the estimated number of fields for each sample row is unequal across two or more sample rows in the set of sample rows, and in response, setting the number of fields equal to one;
generating data structure data, the data structure data comprising the number of fields and field types; and
providing the data structure data to the application.
Specification