Method and system for content classification

  • US 9,916,381 B2
  • Filed: 12/30/2008
  • Issued: 03/13/2018
  • Est. Priority Date: 12/30/2008
  • Status: Active Grant
First Claim

1. A processing method for classification of contents in a domain represented through a taxonomy, comprising:

    generating a first digital mathematical representation of a pre-existing version of the taxonomy, wherein the taxonomy represents a hierarchical relationship of elements in a category which includes nodes of specific elements that fall under a node of a broader element, the nodes existing independently of the contents themselves and representing at least one concept;

    generating a second digital mathematical representation of text documents different from said contents and comprising keywords, the text documents including at least one journalistic publication written about the contents;

    processing the first and second digital mathematical representation to enrich the taxonomy, from an initial state to an enriched state, by associating keywords of the text documents with the first digital mathematical representation, such that at least one keyword is extracted from one of the text documents to provide a new label to be associated with a node in the enriched taxonomy, wherein the at least one keyword is present in the text documents but not present in the initial taxonomy, and wherein enriching the first digital mathematical representation with keywords of the text documents comprises:

    implementing a similarity function in order to classify the context documents to the taxonomy by associating the text documents with concepts of the taxonomy;

    selecting representative keywords associated with said context documents to be used for enrichment; and

    vectorial summing of vectors corresponding to nodes in the taxonomy and vectors of selected keywords associated with the context documents, in accordance with factors taking into account representative statistical significance, to generate enriched vectors for the nodes in the taxonomy;

    generating a third digital mathematical representation of the contents from text descriptors of the contents which are separate and different from the text documents and the taxonomy; and

    processing a first enriched digital mathematical representation and third digital mathematical representation for classifying the contents in the enriched taxonomy, wherein the at least one keyword that is present in the text documents but not present in the initial taxonomy is associated with another keyword within the text documents that is the same as the node in the initial taxonomy, wherein:

    said first digital mathematical representation comprises weights indicating a contribution to a node of the taxonomy supplied by a plurality of concepts of the taxonomy, and a plurality of weights for at least one node in the enriched taxonomy is different than a plurality of weights for the same at least one node in the pre-existing version of the taxonomy;

    said second digital mathematical representation comprises respective weights associated with keywords of the text documents, each indicating a relevance of a respective keyword;

    said third digital mathematical representation comprises additional weights associated with additional keywords of the contents, each indicating a corresponding relevance of a respective additional keyword, and

    generating the first digital mathematical representation comprises:

    defining a propagation matrix by adjacency associated with the taxonomy and representative of contributions of concepts toward other concepts of the taxonomy;

    defining a matrix of concept vectors representative of the taxonomy;

    defining a matrix of propagation effect depending upon the matrix by adjacency of the matrix of concept vectors; and

    calculating a matrix representing the taxonomy obtained from the matrix of propagation by adjacency from the matrix of propagation effect and from the matrix of concept vectors.
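
The enrichment and classification steps recited in the claim can be illustrated with a minimal sketch. It is not the patented implementation: the claim does not fix a similarity function, a keyword-selection rule, or the weighting factors, so cosine similarity, a "top_k heaviest keywords" rule, and a single scalar `factor` are assumptions chosen for illustration. The function names (`cosine`, `best_node`, `enrich`) and the toy data are hypothetical, and the propagation-matrix construction of the first representation is omitted.

```python
import math

def cosine(u, v):
    """Cosine similarity between two sparse keyword -> weight vectors."""
    dot = sum(w * v.get(k, 0.0) for k, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def best_node(vec, node_vectors):
    """Return the taxonomy node whose vector is most similar to vec."""
    return max(node_vectors, key=lambda n: cosine(vec, node_vectors[n]))

def enrich(node_vectors, context_docs, top_k=2, factor=0.5):
    """Vector-sum selected keywords of each context document into its best node.

    Assumptions: top_k selects the heaviest keywords, and `factor` stands in
    for the claim's statistical-significance factors.
    """
    enriched = {n: dict(v) for n, v in node_vectors.items()}
    for doc in context_docs:
        # Classify the context document against the initial taxonomy.
        node = best_node(doc, node_vectors)
        # Select representative keywords (assumed rule: largest weights).
        selected = sorted(doc.items(), key=lambda kv: kv[1], reverse=True)[:top_k]
        # Weighted vector sum into the node's vector.
        for kw, w in selected:
            enriched[node][kw] = enriched[node].get(kw, 0.0) + factor * w
    return enriched

# First representation: initial taxonomy node vectors (toy data).
taxonomy = {
    "rock": {"rock": 1.0, "guitar": 0.5},
    "jazz": {"jazz": 1.0, "saxophone": 0.5},
}
# Second representation: context documents (e.g. reviews) as keyword weights.
reviews = [
    {"rock": 0.6, "grunge": 0.9, "guitar": 0.3},
    {"jazz": 0.7, "bebop": 0.8},
]
# Third representation: text descriptor of a content item to classify.
item = {"grunge": 1.0}

enriched = enrich(taxonomy, reviews)
print(best_node(item, enriched))
```

In this toy data, "grunge" is absent from the initial taxonomy but co-occurs with "rock" in a review, so it becomes a new label on the "rock" node. The item described only by "grunge" has zero similarity to every node of the initial taxonomy and can be matched only after enrichment.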
