
Systems and methods for ingesting and parsing datasets generated from disparate data sources

  • US 10,444,945 B1
  • Filed: 10/04/2017
  • Issued: 10/15/2019
  • Est. Priority Date: 10/10/2016
  • Status: Active Grant
First Claim

1. A computer-implemented method comprising:

    receiving, by a computer, a plurality of text files from a plurality of data sources, each text file associated with a respective contact event via a respective data source, wherein the contact event corresponds to an electronic telecommunication session;

    for each text file in the plurality of text files, removing, by the computer, a set of words satisfying a stop word list;

    generating, by the computer, a vocabulary file for each text file from the plurality of text files containing a set of words extracted from the plurality of text files, wherein the set of words extracted from the plurality of text files are extracted, by the computer, based on a frequency of occurrence associated with each word satisfying a threshold value;

    generating, by the computer, a vector for each text file in the plurality of text files based upon the set of words extracted from each respective text file, wherein a value corresponding to each dimension of the vector is determined by a frequency of occurrence associated with each word in the set of words;

    generating, by the computer, a matrix corresponding to the generated vectors;

    determining, by the computer, a set of topics for the plurality of text files by decomposing the matrix using a non-negative matrix factorization algorithm;

    determining, by the computer, a distance value for each text file in the plurality of text files relative to other text files in the plurality of text files, wherein the distance value between two text files is determined based upon a similarity between two vectors corresponding to the two text files;

    generating, by the computer, a graphical user interface displaying a plurality of images representing each respective contact event based upon the distance value determined for each respective text file of each respective contact event;

    displaying, by the computer, the graphical user interface on a user device operated by a user; and

    in response to receiving from the user device a selection of a subset of the images representing contact events, generating, by the computer, a second graphical user interface containing a plurality of data fields associated with each text file associated with the contact events of the selection, wherein at least one data field contains one or more extracts of a portion of each text file and the corresponding topic from the set of topics, and wherein the user selects the subset of the images by interacting with the graphical user interface displayed on the user device.
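
The stop word removal, vocabulary, vector, matrix, and distance steps recited above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the stop word list, the threshold `MIN_COUNT`, the sample transcripts, and every function name are hypothetical; a single vocabulary shared by all text files is assumed; and cosine distance is assumed as the similarity-based distance, since the claim does not name a specific measure.

```python
import math
import re
from collections import Counter

# Hypothetical stop word list and frequency threshold (not from the claim).
STOP_WORDS = {"the", "a", "an", "to", "is", "my", "i", "and", "of", "for"}
MIN_COUNT = 2


def tokenize(text):
    """Lowercase the text, split it into words, and drop stop words."""
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOP_WORDS]


def build_vocabulary(token_lists, min_count=MIN_COUNT):
    """Keep words whose total frequency across all files meets the threshold."""
    counts = Counter(w for tokens in token_lists for w in tokens)
    return sorted(w for w, c in counts.items() if c >= min_count)


def vectorize(tokens, vocabulary):
    """One dimension per vocabulary word; the value is the word's frequency."""
    counts = Counter(tokens)
    return [counts[w] for w in vocabulary]


def cosine_distance(u, v):
    """1 - cosine similarity; defined as 1.0 when either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 if norm == 0 else 1.0 - dot / norm


# Hypothetical transcripts standing in for text files of contact events.
transcripts = [
    "I want to cancel my account",
    "Please cancel the account today",
    "My bill is wrong and the bill is late",
]

token_lists = [tokenize(t) for t in transcripts]
vocabulary = build_vocabulary(token_lists)
# One row per text file: the matrix corresponding to the generated vectors.
matrix = [vectorize(tokens, vocabulary) for tokens in token_lists]

# Pairwise distances, which a user interface could use to place similar
# contact events near each other.
distances = [[cosine_distance(u, v) for v in matrix] for u in matrix]
```

In this sketch the first two transcripts share the same retained words and so end up at a distance of approximately zero, while the third shares none and ends up at a distance of one.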

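The topic determination step recites decomposing the matrix with a non-negative matrix factorization algorithm but does not specify which one. The sketch below assumes the classic multiplicative update rules for the squared-error objective, applied to a small hypothetical term-count matrix; the function names, iteration count, seed, and matrix values are illustrative assumptions only.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def nmf(v, n_topics, n_iter=500, seed=0, eps=1e-9):
    """Approximate v (files x words) as w (files x topics) times h (topics x words).

    Uses multiplicative updates, which keep every entry non-negative
    as long as the starting values are positive.
    """
    rng = random.Random(seed)
    n_docs, n_words = len(v), len(v[0])
    w = [[rng.random() + eps for _ in range(n_topics)] for _ in range(n_docs)]
    h = [[rng.random() + eps for _ in range(n_words)] for _ in range(n_topics)]
    for _ in range(n_iter):
        # h <- h * (w^T v) / (w^T w h), element-wise
        wt = transpose(w)
        num = matmul(wt, v)
        den = matmul(matmul(wt, w), h)
        h = [[h[k][j] * num[k][j] / (den[k][j] + eps) for j in range(n_words)]
             for k in range(n_topics)]
        # w <- w * (v h^T) / (w h h^T), element-wise
        ht = transpose(h)
        num = matmul(v, ht)
        den = matmul(w, matmul(h, ht))
        w = [[w[i][k] * num[i][k] / (den[i][k] + eps) for k in range(n_topics)]
             for i in range(n_docs)]
    return w, h


# Hypothetical term-count matrix: rows are text files, columns are the
# vocabulary words "account", "bill", "cancel" (assumed for illustration).
v = [
    [1, 0, 1],
    [1, 0, 1],
    [0, 2, 0],
]

w, h = nmf(v, n_topics=2)

# Each row of h weights the vocabulary words for one topic; each row of w
# weights the topics for one text file. The largest weight gives a
# dominant topic per file.
dominant = [max(range(len(row)), key=lambda k: row[k]) for row in w]
```

Each row of `h` can be read as a topic (a weighting over vocabulary words), and the dominant entry in each row of `w` gives one way to associate a text file with "the corresponding topic" that the second interface displays.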