Category-sensitive ranking for text
First Claim
1. A method comprising:
receiving a plurality of documents of text, wherein each document is associated with one or more category labels and includes one or more sequences of one or more words;
determining a plurality of topics from the plurality of documents, wherein each topic represents a subdivision of a respective category label;
performing a plurality of sampling iterations to generate a category-topic model that represents co-occurrence relationships between sequences and topics and co-occurrence relationships between topics and categories, wherein performing each of the plurality of sampling iterations comprises, for each sequence in each of the plurality of documents:
sampling a category label for the sequence from the category labels associated with the document that includes the sequence;
sampling a topic for the sequence; and
updating current values of representations of the co-occurrence relationships based on the category label and the topic sampled for the sequence.
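The three per-sequence steps recited above (sample a category label, sample a topic, update the co-occurrence values) can be illustrated with a small sketch. This is not the sampler disclosed in the specification. It is a simplified, Gibbs-style illustration under several assumptions: co-occurrence relationships are stored as plain count tables, each topic is keyed as a hypothetical `(category, k)` pair so that it is a subdivision of exactly one category, and the function name `train_category_topic_model`, the smoothing parameters `alpha` and `beta`, and the sampling weights are all invented for this example.

```python
import random
from collections import defaultdict


def train_category_topic_model(docs, topics_per_category=2, iterations=20,
                               alpha=0.1, beta=0.01, seed=0):
    """docs: list of (category_labels, sequences) pairs (hypothetical layout)."""
    rng = random.Random(seed)
    vocab_size = len({s for _, seqs in docs for s in seqs})
    # Assumption: each topic subdivides one category, so key it as (category, k).
    topics = {c: [(c, k) for k in range(topics_per_category)]
              for labels, _ in docs for c in labels}

    seq_topic = defaultdict(int)  # (sequence, topic) co-occurrence counts
    cat_topic = defaultdict(int)  # (category, topic) co-occurrence counts
    doc_cat = defaultdict(int)    # (document index, category) counts
    assigned = {}                 # (document index, position) -> (category, topic)

    def update(d, seq, cat, topic, delta):
        seq_topic[(seq, topic)] += delta
        cat_topic[(cat, topic)] += delta
        doc_cat[(d, cat)] += delta

    for _ in range(iterations):
        for d, (labels, seqs) in enumerate(docs):
            for i, seq in enumerate(seqs):
                # Withdraw this sequence's previous sample before resampling.
                if (d, i) in assigned:
                    old_cat, old_topic = assigned[(d, i)]
                    update(d, seq, old_cat, old_topic, -1)
                # Step 1: sample a category label from the document's own labels.
                cat_weights = [doc_cat[(d, c)] + alpha for c in labels]
                cat = rng.choices(labels, weights=cat_weights)[0]
                # Step 2: sample a topic among that category's subdivisions.
                candidates = topics[cat]
                topic_weights = [
                    (cat_topic[(cat, t)] + alpha)
                    * (seq_topic[(seq, t)] + beta)
                    / (cat_topic[(cat, t)] + beta * vocab_size)
                    for t in candidates
                ]
                topic = rng.choices(candidates, weights=topic_weights)[0]
                # Step 3: update the co-occurrence counts with the new sample.
                assigned[(d, i)] = (cat, topic)
                update(d, seq, cat, topic, +1)
    return seq_topic, cat_topic, assigned


# Toy input: two documents, the second carrying two category labels.
docs = [
    (["sports"], ["goal", "match", "goal"]),
    (["sports", "finance"], ["match", "stock", "bond"]),
]
seq_topic, cat_topic, assigned = train_category_topic_model(docs)
```

Because the category for a sequence is drawn only from the labels attached to its document, a sequence in the first toy document can never be assigned a `finance` topic, which mirrors the constraint in the sampling step of the claim.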
Abstract
Methods, systems, and apparatus, including computer program products, are provided for generating topic models for text summarization. In one aspect, a method includes receiving a first document of text that is associated with one or more category labels and that includes one or more sequences of one or more words; determining a category label that represents a first category associated with the first document; sampling the one or more sequences to determine a topic and a co-occurrence relationship between the topic and the category label, where a topic represents a subdivision within a category; sampling the one or more sequences to determine a co-occurrence relationship between a sequence in the first document and the topic; and generating a category-topic model that represents the co-occurrence relationships.
16 Citations
20 Claims
1. A method comprising:
receiving a plurality of documents of text, wherein each document is associated with one or more category labels and includes one or more sequences of one or more words;
determining a plurality of topics from the plurality of documents, wherein each topic represents a subdivision of a respective category label;
performing a plurality of sampling iterations to generate a category-topic model that represents co-occurrence relationships between sequences and topics and co-occurrence relationships between topics and categories, wherein performing each of the plurality of sampling iterations comprises, for each sequence in each of the plurality of documents:
sampling a category label for the sequence from the category labels associated with the document that includes the sequence;
sampling a topic for the sequence; and
updating current values of representations of the co-occurrence relationships based on the category label and the topic sampled for the sequence.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12
13. A system comprising:
one or more computer-readable storage devices including computer program instructions; and
one or more computers operable to execute the instructions to perform operations comprising:
receiving a plurality of documents of text, wherein each document is associated with one or more category labels and includes one or more sequences of one or more words;
determining a plurality of topics from the plurality of documents, wherein each topic represents a subdivision of a respective category label;
performing a plurality of sampling iterations to generate a category-topic model that represents co-occurrence relationships between sequences and topics and co-occurrence relationships between topics and categories, wherein performing each of the plurality of sampling iterations comprises, for each sequence in each of the plurality of documents:
sampling a category label for the sequence from the category labels associated with the document that includes the sequence;
sampling a topic for the sequence; and
updating current values of representations of the co-occurrence relationships based on the category label and the topic sampled for the sequence.
Dependent claims: 14, 15, 16
17. One or more non-transitory computer-readable storage media encoded with instructions that, when executed by one or more computers, cause the one or more computers to perform operations comprising:
receiving a plurality of documents of text, wherein each document is associated with one or more category labels and includes one or more sequences of one or more words;
determining a plurality of topics from the plurality of documents, wherein each topic represents a subdivision of a respective category label;
performing a plurality of sampling iterations to generate a category-topic model that represents co-occurrence relationships between sequences and topics and co-occurrence relationships between topics and categories, wherein performing each of the plurality of sampling iterations comprises, for each sequence in each of the plurality of documents:
sampling a category label for the sequence from the category labels associated with the document that includes the sequence;
sampling a topic for the sequence; and
updating current values of representations of the co-occurrence relationships based on the category label and the topic sampled for the sequence.
Dependent claims: 18, 19, 20
Specification