System and method for providing multi-core and multi-level topical organization in social indexes

  • US 9,031,944 B2
  • Filed: 04/30/2010
  • Issued: 05/12/2015
  • Est. Priority Date: 04/30/2010
  • Status: Expired due to Fees
First Claim

1. A computer-implemented system for providing multi-core topic indexing in electronically-stored social indexes, comprising:

  • a storage device comprising:

    a corpus of articles each comprised of online textual materials and topics;

    a finite state pattern for each topic, each finite state pattern defining a fine-grained topic model that is used to identify the articles that are potentially on-topic; and

    on-topic training examples and off-topic training examples from the articles for each topic;

    one or more of distinct core meanings for the topic by assigning at least one of the on-topic training examples and the off-topic training examples;

    a set of average on-topic articles, comprising:

    a training module configured to provide a set of random training examples from the corpus;

    a match module configured to match the set of random training examples to the finite state pattern for the topic;

    an off-topic elimination module configured to eliminate an article that is similar to the off-topic training examples; and

    an on-topic addition module configured to add the on-topic training examples into the set of the random training examples; and

    an average on-topic core meaning based on the set of the average on-topic articles;

    a social indexing system, comprising:

    a characteristic words selector configured to specify characteristic words for each of the on-topic training examples, the off-topic training examples, and the set of average on-topic articles, and to assign scores to the characteristic words that were specified for the on-topic training examples, off-topic training examples, and the set of average on-topic articles;

    a characteristic words organizer configured to specify on-topic characteristic word term vectors, each on-topic characteristic word term vector comprising the scores of the characteristic words that were specified for each topic for each of the on-topic training examples;

    a characteristic words scorer configured to specify off-topic characteristic word term vectors, each off-topic characteristic word term vector comprising the scores of the characteristic words that were specified for each topic for each of the off-topic training examples;

    a characteristic words specifier configured to specify average on-topic characteristic word term vectors, each average on-topic characteristic word term vector comprising the scores of the characteristic words that were specified for each topic for the set of average on-topic articles;

    an information collector configured to obtain a new article;

    a finite state pattern matcher configured to match the new article to the finite state pattern of each of the topics to designate the new article as a candidate article for each topic to which the finite state pattern was matched;

    a candidate article characteristic words selector configured to specify characteristic words extracted from the candidate article;

    a candidate article characteristic words scorer configured to assign candidate article scores to the characteristic words of the candidate article;

    a topic comparer configured to compare the candidate article scores to the off-topic characteristic word term vectors of each topic and to form an off-topic score for each topic, and to discard the candidate article as off-topic for each topic in which the off-topic score for that topic exceeds an off-topic threshold; and

    a similarity score comparer configured to compare the candidate article scores to the on-topic characteristic word term vectors and the average on-topic characteristic word term vectors of each topic and to form an on-topic score for each topic, and configured to select as candidate on-topic articles only the candidate articles for which the on-topic score for that topic exceeds an on-topic threshold; and

    a display configured to present the candidate on-topic articles.
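
The classification path recited in the claim (pattern match, off-topic rejection, on-topic selection) can be illustrated with a minimal sketch. It is not the patented implementation, and several choices are assumptions the claim does not specify: a regular expression stands in for the finite state pattern, characteristic word scores are plain relative word frequencies, similarity is cosine similarity with the best match taken across training vectors, and the threshold values are arbitrary. The names `Topic`, `term_vector`, `cosine`, and `classify` are hypothetical.

```python
import math
import re
from collections import Counter


def term_vector(text):
    """Score each word by relative frequency (assumed scoring scheme)."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(words)
    total = sum(counts.values()) or 1
    return {word: count / total for word, count in counts.items()}


def cosine(a, b):
    """Cosine similarity between two sparse term vectors (assumed measure)."""
    dot = sum(score * b.get(word, 0.0) for word, score in a.items())
    norm_a = math.sqrt(sum(s * s for s in a.values()))
    norm_b = math.sqrt(sum(s * s for s in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class Topic:
    """Hypothetical container for one topic's pattern and term vectors."""

    def __init__(self, name, pattern, on_examples, off_examples, average_articles):
        self.name = name
        # A regex stands in for the claim's finite state pattern.
        self.pattern = re.compile(pattern, re.IGNORECASE)
        self.on_vectors = [term_vector(t) for t in on_examples]
        self.off_vectors = [term_vector(t) for t in off_examples]
        self.avg_vectors = [term_vector(t) for t in average_articles]


def classify(article, topics, off_threshold=0.5, on_threshold=0.3):
    """Return the names of topics for which the article is a candidate on-topic article."""
    selected = []
    for topic in topics:
        # Step 1: the pattern match designates the article as a candidate.
        if not topic.pattern.search(article):
            continue
        scores = term_vector(article)
        # Step 2: discard when the off-topic score exceeds its threshold.
        off_score = max((cosine(scores, v) for v in topic.off_vectors), default=0.0)
        if off_score > off_threshold:
            continue
        # Step 3: keep only when the on-topic score exceeds its threshold.
        on_score = max(
            (cosine(scores, v) for v in topic.on_vectors + topic.avg_vectors),
            default=0.0,
        )
        if on_score > on_threshold:
            selected.append(topic.name)
    return selected


# Illustrative topic with made-up training text.
cars = Topic(
    name="jaguar cars",
    pattern=r"\bjaguar\b",
    on_examples=["jaguar car engine sedan luxury"],
    off_examples=["jaguar cat jungle predator rainforest"],
    average_articles=["jaguar dealership car sale"],
)
car_result = classify("the new jaguar sedan has a powerful engine", [cars])
cat_result = classify("a jaguar is a predator cat of the rainforest jungle", [cars])
```

In this sketch both example articles match the pattern, but the second is rejected at the off-topic step because it resembles the off-topic training example more closely than the off-topic threshold allows. Checking off-topic similarity before on-topic similarity mirrors the order of the topic comparer and the similarity score comparer in the claim.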
