Dynamic visual profiling and visualization of high volume datasets and real-time smart sampling and statistical profiling of extremely large datasets
1 Assignment
0 Petitions
Abstract
The present disclosure relates generally to a data enrichment service that automatically profiles data sets and provides visualizations of the profiles using a visual-interactive model within a client application (such as a web browser or mobile app). The visual profiling can be refined through end-user interaction with the visualization objects and can guide exploratory data visualization and discovery. Additionally, heterogeneous data streams can be sampled during ingestion to extract statistical attributes from multi-columnar data (e.g., standard deviation, median, mode, correlation coefficient, and histogram). Sampling can continue in real time as the data sources are updated.
65 Citations
22 Claims
1. A method comprising:
analyzing, by a computer system of a data enrichment service, one or more columns of a data set obtained from one or more data sources;
identifying, by the computer system of the data enrichment service, one or more patterns in the one or more columns of the data set obtained from one or more data sources, wherein the one or more patterns comprises a repeated type of data;
matching the one or more identified patterns to entity information from a knowledge system;
identifying a category to which the one or more patterns in the one or more columns of the data set belongs;
generating a graphical visualization based on the one or more identified categories of the one or more identified patterns matching the entity information;
causing the graphical visualization to be displayed in a user interface;
receiving input related to the graphical visualization through the user interface; and
updating the graphical visualization based on the input.
Dependent claims: 2, 3, 4, 5, 6, 7, 21, 22.
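Claim 1 does not specify how patterns are detected or how the knowledge system is queried. As a minimal sketch, assuming that a "pattern" is a value's character shape (letter runs become `A`, digit runs become `9`) and that the knowledge system can be approximated by an in-memory mapping from category to known entities, the pattern, matching, and category steps might look like this. `KNOWLEDGE`, `shape`, `dominant_pattern`, `match_category`, and the 0.5 threshold are all hypothetical.

```python
import re
from collections import Counter
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical stand-in for the "knowledge system": category -> known entities.
KNOWLEDGE: Dict[str, Set[str]] = {
    "US state": {"california", "texas", "ohio", "utah", "new york"},
    "fruit": {"apple", "banana", "cherry"},
}


def shape(value: str) -> str:
    """Collapse a value to a pattern: letter runs -> 'A', digit runs -> '9'."""
    s = re.sub(r"[A-Za-z]+", "A", value)
    return re.sub(r"[0-9]+", "9", s)


def dominant_pattern(column: List[str]) -> Tuple[str, float]:
    """Most frequent shape (the repeated type of data) and its share of values."""
    counts = Counter(shape(v) for v in column)
    pattern, n = counts.most_common(1)[0]
    return pattern, n / len(column)


def match_category(column: List[str], min_share: float = 0.5) -> Optional[str]:
    """Category whose known entities cover the largest share of the values.

    Returns None when no category covers at least `min_share` of the column.
    """
    values = [v.strip().lower() for v in column]
    best, best_share = None, 0.0
    for category, entities in KNOWLEDGE.items():
        share = sum(v in entities for v in values) / len(values)
        if share > best_share:
            best, best_share = category, share
    return best if best_share >= min_share else None
```

For a column such as `["Texas", "Ohio", "California", "Utah"]`, `dominant_pattern` returns `("A", 1.0)` and `match_category` returns `"US state"`. Values with mixed shapes, such as `"CA-94105"`, reduce to `"A-9"`, so columns of postal codes or phone numbers would also collapse to a small number of repeated patterns.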
8. A system comprising:
a plurality of data sources;
a plurality of data targets; and
a cloud computing infrastructure system comprising:
one or more processors communicatively coupled to the plurality of data sources and communicatively coupled to the plurality of data targets, over at least one communication network; and
a memory coupled to the one or more processors, the memory storing instructions to provide a data enrichment service, wherein the instructions, when executed by the one or more processors, cause the one or more processors to:
analyze one or more columns of a data set obtained from one or more data sources;
identify one or more patterns in the one or more columns of the data set obtained from one or more data sources, wherein the one or more patterns comprises a repeated type of data;
match the one or more identified patterns to entity information from a knowledge system;
identify a category to which the one or more patterns in the one or more columns of the data set belongs;
generate a graphical visualization based on the one or more identified categories of the one or more identified patterns matching the entity information;
cause the graphical visualization to be displayed in a user interface;
receive input related to the graphical visualization through the user interface; and
update the graphical visualization based on the input.
Dependent claims: 9, 10, 11, 12, 13, 14.
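Claim 8 recasts the method as a system that ends with the same two steps: receive input related to the graphical visualization and update it. The sketch below illustrates only that feedback loop, assuming the visualization is a rendering-agnostic bar chart counting columns per identified category and that user input arrives as hide/show events on a category. `BarChart`, `build_chart`, `apply_input`, and the event format are hypothetical and not part of the claimed system.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class BarChart:
    """Minimal, rendering-agnostic model of a category bar chart."""
    title: str
    bars: Dict[str, int]
    hidden: Set[str] = field(default_factory=set)

    def visible_bars(self) -> Dict[str, int]:
        """Bars the client should currently draw."""
        return {k: v for k, v in self.bars.items() if k not in self.hidden}


def build_chart(column_categories: Dict[str, str]) -> BarChart:
    """Count how many columns were assigned to each identified category."""
    counts = Counter(column_categories.values())
    return BarChart(title="Columns by category", bars=dict(counts))


def apply_input(chart: BarChart, event: Dict[str, str]) -> BarChart:
    """Update the chart in place from a (hypothetical) UI event."""
    kind, category = event["kind"], event["category"]
    if kind == "hide":
        chart.hidden.add(category)
    elif kind == "show":
        chart.hidden.discard(category)
    else:
        raise ValueError(f"unsupported event kind: {kind}")
    return chart
```

Hiding `"fruit"` on a chart built from `{"state": "US state", "home_state": "US state", "snack": "fruit"}` leaves `{"US state": 2}` visible; a client application would then redraw from `visible_bars()`. Keeping the model separate from rendering lets the same update logic serve a web browser or a mobile app.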
15. A non-transitory computer readable storage medium including instructions stored thereon which, when executed by one or more processors, cause the one or more processors to:
analyze, by a computer system of a data enrichment service, one or more columns of a data set obtained from one or more data sources;
identify, by the computer system of the data enrichment service, one or more patterns in the one or more columns of the data set obtained from one or more data sources, wherein the one or more patterns comprises a repeated type of data;
match the one or more identified patterns to entity information from a knowledge system;
identify a category to which the one or more patterns in the one or more columns of the data set belongs;
generate a graphical visualization based on the one or more identified categories of the one or more identified patterns matching the entity information;
cause the graphical visualization to be displayed in a user interface;
receive input related to the graphical visualization through the user interface; and
update the graphical visualization based on the input.
Dependent claims: 16, 17, 18, 19, 20.
Specification