Optimized data visualization according to natural language query
First Claim
1. A method comprising:
accessing by a computer a corpus of numeric data;
priming a synonym list with a plurality of absolute weights and a plurality of proportional weights, wherein each absolute weight is associated with a symbol and one of a plurality of numeric visualization formats, wherein each proportional weight is associated with a symbol and one of a plurality of numeric visualization formats, wherein values assigned to the proportional weights and to the absolute weights reflect greater suitability for a symbol to be visualized by a corresponding visualization format;
receiving, from a user input device, by a computer, a query about a data corpus comprising a natural language expression;
identifying, by a computer, using natural language processing, one or more symbols provided within the expression;
removing, by a computer, one or more aliased meanings by translating uncontrolled language expressed by the one or more identified symbols within the expression to controlled language using one or more normalized symbols according to the primed synonym list;
inferring, by a computer, using natural language processing of the translated controlled language, using one or more of a language dictionary, a model and an ontology, identified symbols, at least one characteristic, property or relationship within the data corpus about which the user is querying but which is not explicitly stated by the user in the expression;
scoring, by a computer, each of the plurality of numeric data visualization formats according to the absolute weights and the proportional weights for each of the different numeric data visualization formats across all of the normalized symbols, wherein the different visualization formats comprise at least a plurality of different formats of charts selected from the group consisting of pie charts, bar graphs, stacked bar charts, time series plots, parts-of-the-whole illustrations, distribution charts, scattergrams, line charts, box plots, correlation charts, comparison charts, and heat maps; and
generating, by a computer on a user interface device, a numeric data visualization of the corpus having a format according to the greatest scoring, wherein the format does not rely on any explicit user chart format or feature selections.
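The normalizing and scoring steps recited above can be illustrated with a minimal sketch. Everything in it is assumed for illustration only: the synonym entries, the weight values, the names SYNONYMS, WEIGHTS, normalize, score_formats and best_format, and in particular the rule for combining the two kinds of weight (absolute weights are summed, proportional weights are multiplied together, and the two results are multiplied). The claim does not state how the weights are combined.

```python
import re

# Hypothetical primed synonym list: uncontrolled term -> normalized symbol.
SYNONYMS = {
    "share": "proportion",
    "percentage": "proportion",
    "trend": "time",
    "growth": "time",
}

# Hypothetical weights: normalized symbol -> {format: (absolute, proportional)}.
WEIGHTS = {
    "proportion": {"pie chart": (5.0, 1.5), "stacked bar chart": (3.0, 1.2)},
    "time": {"time series plot": (5.0, 1.5), "line chart": (4.0, 1.3)},
}


def normalize(query):
    """Translate uncontrolled query terms to normalized symbols."""
    tokens = re.findall(r"[a-z]+", query.lower())
    return [SYNONYMS[t] for t in tokens if t in SYNONYMS]


def score_formats(symbols):
    """Score each format across all normalized symbols.

    Assumed combination rule (not taken from the claim):
    score = (sum of absolute weights) * (product of proportional weights).
    """
    absolute = {}
    proportional = {}
    for symbol in symbols:
        for fmt, (a, p) in WEIGHTS.get(symbol, {}).items():
            absolute[fmt] = absolute.get(fmt, 0.0) + a
            proportional[fmt] = proportional.get(fmt, 1.0) * p
    return {fmt: absolute[fmt] * proportional[fmt] for fmt in absolute}


def best_format(query):
    """Return the greatest-scoring format, or None if no symbol matched."""
    scores = score_formats(normalize(query))
    return max(scores, key=scores.get) if scores else None


print(best_format("What share of sales came from each region?"))
```

In this sketch the user never names a chart type; the format falls out of the weights attached to the normalized symbols, which mirrors the final limitation of the claim.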
Abstract
An optimal visualization format for a data corpus is automatically selected and generated based upon a natural language query or statement about the data corpus from a user by accessing the subject data corpus; receiving the query or statement from the user as a natural language expression; identifying symbols in the query or statement through natural language processing; mapping the symbols to weights for a plurality of visualization formats; scoring the visualization formats; and generating a visualization of the subject data corpus according to the scores. Optional metadata, such as row and column labels, database field labels, and XML DTD's, may be mined for symbols as well. The new tool may generate the visualization as a digital image file, a digital document file, a digital movie file, a digital three-dimensional model file, or a combination of these.
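As a rough illustration of mining metadata for symbols, the sketch below splits column or field labels into lowercase tokens. The function name mine_symbols and the tokenizing rules (splitting on camelCase boundaries and on non-alphanumeric characters) are assumptions made for this example; the abstract does not say how labels are parsed.

```python
import re


def mine_symbols(labels):
    """Collect candidate symbols from row, column, or field labels."""
    symbols = set()
    for label in labels:
        # Insert a space at camelCase boundaries, e.g. "unitPrice" -> "unit Price".
        spaced = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", " ", label)
        # Split on anything that is not a letter or digit.
        for token in re.split(r"[^A-Za-z0-9]+", spaced):
            if token:
                symbols.add(token.lower())
    return symbols


print(sorted(mine_symbols(["order_date", "unitPrice", "Region"])))
```

Symbols mined this way could then be normalized and weighted in the same manner as symbols identified in the user's query.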
18 Claims
11. A computer program product comprising:
one or more computer readable data storage devices; and
program instructions stored by the data storage device for causing a processor to:
access a corpus of numeric data;
prime a synonym list with a plurality of absolute weights and a plurality of proportional weights, wherein each absolute weight is associated with a symbol and one of a plurality of numeric visualization formats, wherein each proportional weight is associated with a symbol and one of a plurality of numeric visualization formats, wherein values assigned to the proportional weights and to the absolute weights reflect greater suitability for a symbol to be visualized by a corresponding visualization format;
receive, from a user input device, a query about a data corpus comprising a natural language expression;
identify, using natural language processing, one or more symbols provided within the expression;
remove one or more aliased meanings by translating uncontrolled language expressed by the one or more identified symbols within the expression to controlled language using one or more normalized symbols according to the primed synonym list;
infer, using natural language processing of the translated controlled language, using one or more of a language dictionary, a model and an ontology, at least one characteristic, property or relationship within the data corpus about which the user is querying but which is not explicitly stated by the user in the expression;
score each of the plurality of numeric data visualization formats according to the absolute weights and the proportional weights for each of the different numeric data visualization formats across all of the normalized symbols, wherein the different visualization formats comprise at least a plurality of different formats of charts selected from the group consisting of pie charts, bar graphs, stacked bar charts, time series plots, parts-of-the-whole illustrations, distribution charts, scattergrams, line charts, box plots, correlation charts, comparison charts, and heat maps; and
generate, on a user interface device, a numeric data visualization of the corpus having a format according to the greatest scoring, wherein the format does not rely on any explicit user chart format or feature selections.
15. A system comprising:
a processor for executing program instructions; and
one or more computer readable data storage devices storing program instructions for causing a processor to:
access a corpus of numeric data;
prime a synonym list with a plurality of absolute weights and a plurality of proportional weights, wherein each absolute weight is associated with a symbol and one of a plurality of numeric visualization formats, wherein each proportional weight is associated with a symbol and one of a plurality of numeric visualization formats, wherein values assigned to the proportional weights and to the absolute weights reflect greater suitability for a symbol to be visualized by a corresponding visualization format;
receive, from a user input device, a query about a data corpus comprising a natural language expression;
identify, using natural language processing, one or more symbols provided within the expression;
remove one or more aliased meanings by translating uncontrolled language expressed by the one or more identified symbols within the expression to controlled language using one or more normalized symbols according to the primed synonym list;
infer, using natural language processing of the translated controlled language, using one or more of a language dictionary, a model and an ontology, at least one characteristic, property or relationship within the data corpus about which the user is querying but which is not explicitly stated by the user in the expression;
score each of the plurality of numeric data visualization formats according to the absolute weights and the proportional weights for each of the different numeric data visualization formats across all of the normalized symbols, wherein the different visualization formats comprise at least a plurality of different formats of charts selected from the group consisting of pie charts, bar graphs, stacked bar charts, time series plots, parts-of-the-whole illustrations, distribution charts, scattergrams, line charts, box plots, correlation charts, comparison charts, and heat maps; and
generate, on a user interface device, a numeric data visualization of the corpus having a format according to the greatest scoring, wherein the format does not rely on any explicit user chart format or feature selections.
Specification