Method, system and computer program product for visually approximating scattered data using color to represent values of a categorical variable
Abstract
A method, system, and computer program product provide a data visualization tool that determines distribution weights representing the values of a categorical variable and maps a distinct color to each weight, so that the different values of the categorical variable (or data attribute) are visually represented in a scatter plot. The distinct colors of a splat are based on the distribution of categorical variable values in the corresponding bin, and that distribution is represented by a vector. The vector contains as many locations as there are different values of the categorical variable. The value stored in each location is typically a weight or percentage for that particular value of the categorical variable. Each location in the vector is also associated with a distinct color. Coloring a single splat with multiple colors involves looping through the vector locations and, based on the weight stored in each location, randomly selecting the same percentage of triangles in the splat for the color associated with that location. A threshold helps reduce confusion and decrease processing time: all weights below the threshold are summed, and the sum is assigned a single neutral color. A slider or other controller can be used to vary the value of the threshold.
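The coloring step described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the function name `color_triangles`, its parameters, the `"gray"` neutral color, and the shuffle-then-slice way of picking a random subset of triangles are all assumptions made for illustration.

```python
import random


def color_triangles(weights, colors, n_triangles, threshold=0.0,
                    neutral="gray", seed=None):
    """Return one color per triangle, in proportion to the weights.

    Hypothetical helper: weights[i] is the weight of category i and
    colors[i] is the color associated with that vector location.
    """
    rng = random.Random(seed)

    # Keep weights at or above the threshold; sum the rest into a
    # single neutral entry, as the abstract describes.
    entries = [(colors[i], w) for i, w in enumerate(weights) if w >= threshold]
    other = sum(w for w in weights if w < threshold)
    if other > 0:
        entries.append((neutral, other))

    total = sum(w for _, w in entries)
    if total == 0:
        return [neutral] * n_triangles

    # Shuffle the triangle indices so each color receives a random subset.
    order = list(range(n_triangles))
    rng.shuffle(order)

    result = [neutral] * n_triangles
    start, cumulative = 0, 0.0
    for k, (color, w) in enumerate(entries):
        cumulative += w
        # The last entry takes the remainder so rounding never drops a triangle.
        is_last = k == len(entries) - 1
        end = n_triangles if is_last else round(n_triangles * cumulative / total)
        for t in order[start:end]:
            result[t] = color
        start = end
    return result
```

Calling `color_triangles([0.6, 0.3, 0.06, 0.04], ["red", "blue", "green", "yellow"], 100, threshold=0.1)` would fold the two smallest weights into the neutral color, which is the effect a threshold slider would control.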
28 Claims
1. A method for visually approximating a scatter plot of data points, comprising the steps of:
grouping the data points into spatial bins;
determining a position for each bin;
determining a count of data points in each bin;
for each bin, determining a distribution of a variable associated with the data points in a respective bin, wherein the variable has multiple categorical values; and
rendering splats at bin positions of corresponding bins, wherein said rendering step renders at least one splat with multiple colors representative of the distribution determined for a corresponding bin.
determining n number of values in the variable;
creating a vector having n number of locations;
associating each value with one of the vector locations;
calculating a weight for each value; and
storing the calculated percentage of total bin weight for each value in its associated vector location.
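As an illustration only, the vector-building steps above, together with the binning and counting steps of claim 1, might be sketched in Python as follows. The function name `bin_distributions`, the fixed square `bin_size`, and the dictionary layout of the result are assumptions made for this sketch and do not come from the claims.

```python
from collections import defaultdict


def bin_distributions(points, bin_size, categories):
    """Group (x, y, category) points into square bins and compute,
    for each bin, a count and a vector of per-category weights.

    Hypothetical helper: `categories` lists the n categorical values,
    so each vector has n locations, one per value.
    """
    # Associate each categorical value with one vector location.
    location = {cat: i for i, cat in enumerate(categories)}

    # Per-bin vector of raw counts, keyed by integer bin coordinates.
    counts = defaultdict(lambda: [0] * len(categories))
    for x, y, cat in points:
        key = (int(x // bin_size), int(y // bin_size))
        counts[key][location[cat]] += 1

    result = {}
    for key, vec in counts.items():
        total = sum(vec)
        # Store each value's share of the bin total in its location.
        result[key] = {"count": total, "weights": [c / total for c in vec]}
    return result
```

Here the weight is a fraction of the bin total; storing the raw counts in `vec` instead would correspond to the count-based weight of claim 5.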
4. The method of claim 3, wherein the weight is a percentage.
5. The method of claim 3, wherein the weight is a count.
6. The method of claim 1, further comprising the step of assigning a visually distinct color to each of the vector locations.
7. The method of claim 6, wherein said rendering step uses a splat, wherein the splat is divided into multiple regions and wherein said rendering step shades the multiple regions such that the shaded areas of the multiple regions has a distribution approximately the same as the distribution determined in said step of determining a distribution of a variable.
8. The method of claim 7, wherein the region takes the shape of a triangle, wherein each triangle covers the same area in the splat as all the other triangles.
9. The method of claim 7, wherein said rendering step renders each splat with multiple distinct colors that are a function of the weights stored in the vector locations.
10. The method of claim 9, wherein said rendering step, prior to rendering each splat with multiple distinct colors, sums all the weights in the vector locations below a threshold and assigns a new value to the weights, wherein the splat gets colored a neutral color that is a function of the summed weight.
11. The method of claim 10, further comprising the step of:
globally scaling the threshold of each splat.
12. A system for visually approximating a scatter plot of data points, comprising:
means for grouping the data points into spatial bins;
means for determining a position for each bin;
means for determining a count of data points in each bin;
for each bin, means for determining a distribution of a variable associated with the data points in a respective bin, wherein the variable has multiple categorical values; and
means for rendering splats at bin positions of corresponding bins, wherein said means for rendering renders at least one splat with multiple colors representative of the distribution determined for a corresponding bin.
means for determining n number of values in the variable;
means for creating a vector having n number of locations;
means for associating each value with one of the vector locations;
means for calculating a weight for each value; and
means for storing the calculated percentage of total bin weight for each value in its associated vector location.
15. The system of claim 14, wherein the weight is a percentage.
16. The system of claim 14, wherein the weight is a count.
17. The system of claim 12, further comprising means for assigning a visually distinct color to each of the vector locations.
18. The system of claim 17, wherein said rendering means uses a splat, wherein the splat is divided into multiple regions and wherein said means for rendering shades the multiple regions such that the shaded areas of the multiple regions has a distribution approximately the same as the distribution determined by said means for determining a distribution of a variable.
19. The system of claim 18, wherein the region takes the shape of a triangle, wherein each triangle covers the same area in the splat as all the other triangles.
20. The system of claim 18, wherein said rendering means renders each splat with multiple distinct colors that are a function of the weights stored in the vector locations.
21. The system of claim 20, wherein said rendering means, prior to rendering each splat with multiple distinct colors, sums all the weights in the vector locations below a threshold and assigns a new value to the weights, wherein the splat gets colored a neutral color that is a function of the summed weight.
22. The system of claim 21, further comprising means for globally scaling the threshold of each splat.
23. A computer program product comprising a computer useable medium having computer program logic recorded thereon for enabling a graphics processor in a computer system to visually approximate a scatter plot of data points, the computer program logic comprising:
means for enabling the graphics processor to bin the data points into bins;
means for enabling the graphics processor to determine a bin position for each bin;
means for enabling the graphics processor to determine a count of data points in each bin;
for each bin, means for enabling the graphics processor to determine a distribution of a variable associated with the data points in a respective bin, wherein the variable has multiple values; and
means for enabling the graphics processor to render splats at bin positions of corresponding bins, each splat having an opacity that is a function of the count of data points in a corresponding bin, whereby, a splat plot can be displayed that visually approximates the scatter plot of data points, and wherein said rendering step renders each splat with respective distinct colors that is a function of the distribution determined for a corresponding bin.
means for enabling the graphics processor to determine the n number of values in the variable;
means for enabling the graphics processor to create a vector having n number of locations;
means for enabling the graphics processor to associate each value with one of the vector locations;
means for enabling the graphics processor to calculate a weight for each value; and
means for enabling the graphics processor to store the weight for each value in its associated vector location, wherein said means for enabling the graphics processor to render splats renders each splat with multiple distinct colors that are a function of the weights stored in the vector locations.
25. A method for visualizing information related to a large number of bins where distinct colors are mapped to represent different values of a categorical variable, comprising the steps of:
determining distribution weights that represent the different values of the categorical variable;
mapping a distinct color to each of the weights; and
rendering splats for each bin, wherein said rendering step renders at least one splat with multiple colors representative of the distribution weights determined for the values of the categorical variable.
26. A system for visualizing information related to a large number of bins where distinct colors are mapped to represent different values of a categorical variable, comprising the steps of:
means for determining distribution weights that represent the different values of the categorical variable;
means for mapping a distinct color to each of the weights; and
means for rendering splats for each bin, wherein said means for rendering renders at least one splat with multiple colors representative of the distribution weights determined for the values of the categorical variable.
27. A method of interpolating data for animating an external query attribute of a scatter plot of data points in a computer system capable of displaying a plurality of colors, comprising the steps of:
(1) determining adjacent data structures corresponding to a position of a first external querying device that queries the data attribute, wherein the adjacent data structures include a first data structure and a second data structure, and wherein the data structures comprise a plurality of processed bins of data points, wherein the first and second data structures each have a vector, wherein each vector has multiple locations storing values representing a distribution of a categorical variable;
(2) merging the first adjacent data structure vector with the second adjacent data structure vector, wherein the values in the same location in the vectors are merged together;
(3) aggregating the first adjacent data structure vector with the second adjacent data structure vector, wherein the values in the same location in the vectors are aggregated together using a spatial column of the data structure as a unique key;
(4) interpolating the first adjacent data structure vector with the second adjacent data structure vector generating an interpolated vector, wherein the values in the same location in the vectors are interpolated together;
(5) mapping to color the interpolated vector, wherein values in the interpolated vector are weighted by count; and
(6) rendering a data visualization representative of the interpolated vector.
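The location-by-location interpolation in steps (4) and (5) can be sketched as below. The claim does not say how values are interpolated, so linear interpolation, the parameter `t` (the query position expressed as a fraction between the two adjacent data structures), and the function names are all assumptions made for this sketch.

```python
def interpolate_vectors(first, second, t):
    """Linearly interpolate two per-category count vectors.

    Hypothetical helper: t = 0 returns `first`, t = 1 returns `second`;
    values in the same location of the two vectors are combined.
    """
    if len(first) != len(second):
        raise ValueError("vectors must have the same number of locations")
    return [(1 - t) * a + t * b for a, b in zip(first, second)]


def to_weights(vector):
    """Normalize an interpolated count vector into color weights."""
    total = sum(vector)
    if total == 0:
        return [0.0] * len(vector)
    return [v / total for v in vector]
```

Because the inputs are counts rather than percentages, the data structure with more data points contributes more to the resulting weights, which is one reading of "weighted by count" in step (5).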
28. A system of interpolating data for animating an external query attribute of a scatter plot of data points in a computer system capable of displaying a plurality of colors, comprising the steps of:
means for determining adjacent data structures corresponding to a position of a first external querying device that queries the data attribute, wherein the adjacent data structures include a first data structure and a second data structure, and wherein the data structures comprise a plurality of processed bins of data points, wherein the first and second data structures each have a vector, wherein each vector has multiple locations storing values representing a distribution of a categorical variable;
means for merging the first adjacent data structure vector with the second adjacent data structure vector, wherein the values in the same location in the vectors are merged together;
means for aggregating the first adjacent data structure vector with the second adjacent data structure vector, wherein the values in the same location in the vectors are aggregated together using a spatial column of the data structure as a unique key;
means for interpolating the first adjacent data structure vector with the second adjacent data structure vector generating an interpolated vector, wherein the values in the same location in the vectors are interpolated together;
means for mapping to color the interpolated vector, wherein values in the interpolated vector are weighted by count; and
means for rendering a data visualization representative of the interpolated vector.
Specification