Conceptual document analysis and characterization

  • US 9,886,488 B2
  • Filed: 07/20/2016
  • Issued: 02/06/2018
  • Est. Priority Date: 04/27/2015
  • Status: Active Grant
First Claim

1. A method comprising:

    receiving, by at least one data processor, a plurality of data files from a plurality of data sources that comprise textual content;

    categorizing, by the at least one data processor, the plurality of data files into a taxonomy of categories in which each category has associated sample textual content defining a concept for the category and each category associated with a memory-optimized structure that comprises a collection of at least one identification corresponding to at least one of the plurality of data files, the categorizing comprising, for each category;

    comparing, by the at least one data processor, for each of the plurality of data files, the textual content of the data file with the sample textual content for the category;

    calculating, by the at least one data processor, based on the comparing and for each of the plurality of data files, a file score corresponding to the degree of similarity between the defined concept of the category and a determined concept for the data file; and

    generating, by the at least one data processor, the identification stored in the memory-optimized structure that comprises the collection by at least associating, for each of the plurality of data files, the data file with the category if the file score is equal to or greater than a pre-determined minimum score for the category; and

    providing, by the at least one data processor, at least a portion of the data file and/or the associated file score.
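The steps of claim 1 can be illustrated with a minimal sketch. The claim does not name a similarity measure or a concrete data structure, so everything specific here is an assumption made for illustration: cosine similarity over word counts stands in for the "degree of similarity" between concepts, a compact `array` of unsigned integers stands in for the "memory-optimized structure", and the names `Category`, `term_counts`, `cosine_similarity`, and `categorize` are hypothetical, not taken from the patent.

```python
import math
import re
from array import array
from collections import Counter
from dataclasses import dataclass, field


def term_counts(text):
    """Lower-case word counts; an assumed stand-in for a 'concept'."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine similarity between two word-count vectors, in [0, 1]."""
    dot = sum(count * b[term] for term, count in a.items())
    norm = (math.sqrt(sum(c * c for c in a.values()))
            * math.sqrt(sum(c * c for c in b.values())))
    return dot / norm if norm else 0.0


@dataclass
class Category:
    name: str
    sample_text: str   # sample textual content defining the concept
    min_score: float   # pre-determined minimum score for the category
    # Assumed "memory-optimized structure": a packed array of file IDs.
    file_ids: array = field(default_factory=lambda: array("I"))


def categorize(files, categories):
    """Score every file against every category; keep IDs that qualify.

    files maps an integer file ID to its textual content.
    Returns {(category name, file ID): file score} for associated files.
    """
    scores = {}
    for category in categories:
        concept = term_counts(category.sample_text)
        for file_id, text in files.items():
            # Compare the file's text with the category's sample text.
            score = cosine_similarity(concept, term_counts(text))
            # Associate the file only if it meets the minimum score.
            if score >= category.min_score:
                category.file_ids.append(file_id)
                scores[(category.name, file_id)] = score
    return scores


# Usage: two files, one category with a minimum score of 0.5.
files = {
    1: "interest rates and bank loans",
    2: "football match results",
}
finance = Category("finance", "bank loans interest rates credit", 0.5)
scores = categorize(files, [finance])
print(list(finance.file_ids), scores)
```

In this sketch the category keeps only file identifiers rather than file contents, which is one plausible reading of a collection of "identifications"; a production system might use a bitmap or other compressed index instead.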

  • 10 Assignments