Interactive visualization of big data sets and models including textual data

  • US 9,501,540 B2
  • Filed: 09/25/2014
  • Issued: 11/22/2016
  • Est. Priority Date: 11/04/2011
  • Status: Active Grant
First Claim

1. A method comprising:

  • accessing a set of sample data instances, each instance comprising a corresponding value for at least some of a plurality of data fields and at least one of the data fields characterized as a text data type;

    processing the sample data instances so as to form a dataset, the processing including analyzing the sample data instances to recognize a data type for each of the plurality of data fields, the recognition including selecting the data type from a predetermined set of data types that includes at least a numeric data type, a categorical data type, and a text data type;

    generating a visual summary of the dataset on a computing device, the visual summary comprising a tabular presentation including a series of rows and columns of information, each row corresponding to one of the data fields of the sample data, and each column displaying a corresponding parameter in each of the rows, wherein the displayed column parameters include a data field name, a type of the data field named in the row, and a count of sample data instances in the data set that include a value in the named field;

    in response to recognizing a text data type for one of the data fields of a sample data instance, matching the values of the text data field to a human language;

    based on the matched human language, tokenizing a value of each text data field to form a corresponding token;

    incorporating the corresponding token as a new value for the corresponding text data field in the dataset; and

    displaying parameters of the text data field in a corresponding row of the visual summary;

    wherein processing the sample data further includes applying a selected tokenization process to form a set of tokens based on the values in the text data fields;

    for a given row in the visual summary corresponding to a text data field in the sample data set, tokenizing all of the respective values of the text data field found in the sample data set to form a set of tokens for the given row;

    counting a respective number of occurrences of each one of the tokens; and

    storing the counted numbers of occurrences.
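
The steps recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation; it only shows one plausible way to recognize a field type, build the name/type/count summary rows, and count token occurrences for text fields. The function names (infer_type, tokenize, summarize), the whitespace-based heuristic for detecting text, and the sample records are all assumptions made for illustration. The language-matching step is omitted, and a simple lowercase English word tokenizer stands in for the "selected tokenization process".

```python
import re
from collections import Counter


def infer_type(values):
    """Pick 'numeric', 'categorical', or 'text' for one field.

    Assumed heuristic: all numbers -> numeric; any string containing
    more than one word -> text; everything else -> categorical.
    """
    present = [v for v in values if v is not None]
    if present and all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in present
    ):
        return "numeric"
    if any(isinstance(v, str) and len(v.split()) > 1 for v in present):
        return "text"
    return "categorical"


def tokenize(value):
    """Assumed tokenizer: lowercase runs of letters and apostrophes."""
    return re.findall(r"[a-z']+", value.lower())


def summarize(instances):
    """Return (summary rows, per-field token counts) for the sample instances."""
    # Collect field names in first-seen order; instances may omit fields.
    field_names = []
    for instance in instances:
        for name in instance:
            if name not in field_names:
                field_names.append(name)

    rows = []
    token_counts = {}
    for name in field_names:
        present = [inst[name] for inst in instances if inst.get(name) is not None]
        field_type = infer_type(present)
        # One row per field: name, recognized type, count of instances with a value.
        rows.append({"name": name, "type": field_type, "count": len(present)})
        if field_type == "text":
            # Tokenize every value of the text field and store occurrence counts.
            counts = Counter()
            for value in present:
                counts.update(tokenize(value))
            token_counts[name] = counts
    return rows, token_counts


# Hypothetical sample data; the third record has no "plan" value.
samples = [
    {"age": 34, "plan": "basic", "review": "Great product, great price"},
    {"age": 51, "plan": "premium", "review": "Too slow"},
    {"age": 29, "review": "Great support"},
]

rows, token_counts = summarize(samples)
for row in rows:
    print(f"{row['name']:<8}{row['type']:<13}{row['count']}")
```

In this sketch each summary row corresponds to one data field, and the stored Counter for a text field holds the number of occurrences of each token across all values of that field. A real system would likely replace the heuristic in infer_type and the regular-expression tokenizer with language-aware components.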
