Data clustering

  • US 10,452,702 B2
  • Filed: 05/18/2017
  • Issued: 10/22/2019
  • Est. Priority Date: 05/18/2017
  • Status: Active Grant
First Claim

1. A computer implemented method, comprising:

    receiving a plurality of documents, each of the plurality of documents represented by a vector of words and associated with a point in time, wherein the received plurality of documents is received and processed in chronological order;

    dividing the received plurality of documents into first time slices using a first time interval to form a plurality of consecutive sets of documents;

    sub-dividing each of the plurality of consecutive sets of documents into second time slices using respective second time intervals to form one or more subsets of documents;

    identifying a plurality of topics in each of the plurality of consecutive sets of documents and the one or more subsets of documents, each of the plurality of topics represented by a set of most relevant topic keywords;

    clustering each of the plurality of consecutive sets of documents and the one or more subsets of documents in accordance with each of the identified plurality of topics;

    comparing each of the identified plurality of topics with respect to each of the plurality of consecutive sets of documents and the one or more subsets of documents to detect patterns of changes in the set of most relevant topic keywords over time, wherein comparing each of the identified plurality of topics with respect to each of the plurality of consecutive sets of documents and the one or more subsets of documents to detect patterns of changes in the set of most relevant topic keywords over time, comprises:

    identifying each of the plurality of topics from each of the plurality of consecutive sets of documents and the one or more subsets of documents of the overlapping time slices to detect patterns of changes in the set of most relevant topic keywords over time;

    identifying a topic drift based on the detected patterns of changes in the set of most relevant topic keywords over time; and

    identifying a topic convergence based on the detected patterns of changes in the set of most relevant topic keywords over time;

    redefining each of the clustered plurality of consecutive sets of documents and the one or more subsets of documents to form homogenous clusters based on the identified topic convergence;

    redefining each of the clustered plurality of consecutive sets of documents and the one or more subsets of documents to form homogenous clusters based on the identified topic drift;

    outputting the redefined clustered plurality of consecutive sets of documents and the one or more subsets of documents; and

    defining a template based on the outputted redefined clustered plurality of consecutive sets of documents and the one or more subsets of documents.
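
The slicing and keyword-comparison steps recited above can be illustrated with a minimal Python sketch. It is not the patented implementation: the claim does not name a topic model or a similarity measure, so plain term frequency stands in for topic identification and Jaccard overlap stands in for the keyword comparison. The function names (`slice_by_interval`, `top_keywords`, `keyword_overlaps`), the intervals, and the sample documents are all hypothetical and chosen only for illustration.

```python
from collections import Counter
from datetime import datetime, timedelta

def slice_by_interval(docs, start, interval):
    """Group (timestamp, words) documents into consecutive time slices."""
    slices = {}
    for ts, words in sorted(docs, key=lambda d: d[0]):  # chronological order
        index = (ts - start) // interval  # timedelta // timedelta yields an int
        slices.setdefault(index, []).append((ts, words))
    return [slices[i] for i in sorted(slices)]

def top_keywords(docs, k=3):
    """Assumed stand-in for topic identification: the k most frequent words."""
    counts = Counter(w for _, words in docs for w in words)
    # Sort by count (descending), then alphabetically so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return {word for word, _ in ranked[:k]}

def jaccard(a, b):
    """Overlap of two keyword sets, from 0.0 (disjoint) to 1.0 (identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0

def keyword_overlaps(slices, k=3):
    """Overlap between the keyword sets of each pair of consecutive slices."""
    keys = [top_keywords(s, k) for s in slices]
    return [jaccard(a, b) for a, b in zip(keys, keys[1:])]

# Made-up sample documents: (point in time, vector of words).
docs = [
    (datetime(2017, 1, 1), ["storm", "rain", "wind"]),
    (datetime(2017, 1, 2), ["storm", "rain", "flood"]),
    (datetime(2017, 1, 8), ["flood", "rescue", "aid"]),
    (datetime(2017, 1, 9), ["flood", "aid", "rescue"]),
]
start = datetime(2017, 1, 1)

# First time slices (assumed one-week interval).
slices = slice_by_interval(docs, start, timedelta(days=7))
# Second time slices inside the first slice (assumed one-day interval).
sub_slices = slice_by_interval(slices[0], start, timedelta(days=1))
# Low overlap between consecutive slices hints at drift; high overlap
# between slices that previously differed hints at convergence.
overlaps = keyword_overlaps(slices)
```

In this sketch a falling overlap value would be read as a candidate topic drift and a rising one as a candidate convergence. Any threshold for either, and the later re-clustering and template steps, are left open because the claim does not specify them.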
