Identifying categories within textual data

  • US 10,157,178 B2
  • Filed: 02/05/2016
  • Issued: 12/18/2018
  • Est. Priority Date: 02/06/2015
  • Status: Active Grant
First Claim

1. A computer-implemented method, comprising:

  • identifying a plurality of documents associated with a predetermined subject, where:

      each of the plurality of documents contains textual data, and

      the predetermined subject includes one or more terms identifying common subject matter shared by each of the plurality of documents;

  • analyzing the textual data of each of the plurality of documents to identify one or more categories within the plurality of the documents, the analyzing including:

      refining the textual data by removing one or more words from the textual data that have a predetermined frequency and a predetermined significance, to create refined textual data,

      transforming the refined textual data into an array, and

      determining the one or more categories from the array, where each of the one or more categories includes a plurality of topic vectors that each include one or more identified keywords and a frequency of the one or more keywords within the refined textual data;

  • linking each of the one or more categories to the predetermined subject;

  • returning the one or more categories identified within the plurality of the documents as categories indicative of the predetermined subject; and

  • classifying additional textual data, utilizing the one or more categories, including comparing the additional textual data to the one or more categories to determine a probability that the additional textual data is associated with the predetermined subject linked to the one or more categories.
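
The steps recited in claim 1 can be illustrated with a small Python sketch. This is not the patented implementation: the claim names no specific algorithm, so everything below is an assumption made for illustration. The word list LOW_SIGNIFICANCE, the function names (tokenize, refine, to_array, determine_category, score), the thresholds, the sample documents, and the subject label "phone complaints" are all hypothetical. A plain term-count ranking stands in for whatever topic-modeling technique produces the topic vectors, each (keyword, frequency) pair plays the role of a toy topic vector, and the final score is a simple keyword-overlap ratio rather than a calibrated probability.

```python
import re
from collections import Counter

# Hypothetical list of low-significance words; the claim does not enumerate one.
LOW_SIGNIFICANCE = {"the", "a", "an", "of", "and", "to", "is", "in", "for", "on"}

def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())

def refine(docs, min_doc_fraction=0.5):
    """Remove words that are both frequent across documents and low in significance."""
    tokenized = [tokenize(d) for d in docs]
    # Document frequency: in how many documents each word appears.
    doc_freq = Counter(w for toks in tokenized for w in set(toks))
    threshold = min_doc_fraction * len(docs)
    removable = {w for w, n in doc_freq.items()
                 if n >= threshold and w in LOW_SIGNIFICANCE}
    return [[w for w in toks if w not in removable] for toks in tokenized]

def to_array(refined_docs):
    """Build a document-term count array and the vocabulary indexing its columns."""
    vocab = sorted({w for toks in refined_docs for w in toks})
    index = {w: i for i, w in enumerate(vocab)}
    array = [[0] * len(vocab) for _ in refined_docs]
    for row, toks in zip(array, refined_docs):
        for w in toks:
            row[index[w]] += 1
    return array, vocab

def determine_category(array, vocab, top_k=3):
    """Return the top_k (keyword, frequency) pairs by total count as one category."""
    totals = [sum(col) for col in zip(*array)]
    ranked = sorted(zip(vocab, totals), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top_k]

def score(text, category):
    """Share of the text's tokens that match a category keyword (a crude proxy)."""
    toks = tokenize(text)
    if not toks:
        return 0.0
    keywords = {kw for kw, _ in category}
    return sum(w in keywords for w in toks) / len(toks)

# Made-up documents sharing a hypothetical subject.
docs = [
    "The battery drains fast and the battery overheats",
    "Battery life is short and the screen cracks",
    "The screen flickers and the battery swells",
]
refined = refine(docs)
array, vocab = to_array(refined)
category = determine_category(array, vocab)

# Link the category to the subject, then compare new text against it.
subject_categories = {"phone complaints": [category]}
new_text_score = score("My battery died and the screen is dim", category)
```

In this sketch the removal condition requires both criteria at once (frequent and on the low-significance list), mirroring the claim's "a predetermined frequency and a predetermined significance". A fuller implementation would likely swap the count ranking for a real topic model and the overlap ratio for a proper probability estimate.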
