CATEGORY-SENSITIVE RANKING FOR TEXT
Abstract
Methods, systems, and apparatus, including computer program products, are provided for generating topic models for text summarization. In one aspect, a method includes receiving a first document of text that is associated with one or more category labels and that includes one or more sequences of one or more words; determining a category label that represents a first category associated with the first document; sampling the one or more sequences to determine a topic and a co-occurrence relationship between the topic and the category label, where a topic represents a subdivision within a category; sampling the one or more sequences to determine a co-occurrence relationship between a sequence in the first document and the topic; and generating a category-topic model that represents the co-occurrence relationships.
18 Claims
1. A method comprising:
receiving a first document of text that is associated with one or more category labels and that includes one or more sequences of one or more words;
determining a category label that represents a first category associated with the first document;
sampling the one or more sequences to determine a topic and a co-occurrence relationship between the topic and the category label, where a topic represents a subdivision within a category;
sampling the one or more sequences to determine a co-occurrence relationship between a sequence in the first document and the topic; and
generating a category-topic model that represents the co-occurrence relationships.
Dependent claims: 2-15.
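The steps of claim 1 can be illustrated with a small sketch. The claim only says "sampling"; the collapsed Gibbs-style sampler below is an assumed technique chosen for illustration, not the method the patent discloses. The function name `build_category_topic_model`, the parameters `topics_per_category`, `alpha`, and `beta`, the use of a single category label per document, and the choice of single words as sequences are all hypothetical.

```python
import random
from collections import defaultdict


def build_category_topic_model(docs, topics_per_category=2, iterations=50,
                               alpha=0.1, beta=0.01, seed=0):
    """Illustrative sketch only (assumed Gibbs-style sampler).

    docs: list of (category_label, [sequence, ...]) pairs.
    A topic is identified by (category_label, index), so each topic is a
    subdivision within exactly one category.
    """
    rng = random.Random(seed)
    vocab_size = len({seq for _, seqs in docs for seq in seqs})
    category_topic = defaultdict(lambda: [0] * topics_per_category)
    topic_sequence = defaultdict(lambda: defaultdict(int))
    topic_total = defaultdict(int)

    def update(label, k, seq, delta):
        # Keep both co-occurrence tables in step with one assignment.
        category_topic[label][k] += delta
        topic_sequence[(label, k)][seq] += delta
        topic_total[(label, k)] += delta

    # Start from a random topic for every sequence occurrence.
    assignments = []
    for label, seqs in docs:
        doc_topics = [rng.randrange(topics_per_category) for _ in seqs]
        for seq, k in zip(seqs, doc_topics):
            update(label, k, seq, +1)
        assignments.append(doc_topics)

    # Resample each assignment given all the others.
    for _ in range(iterations):
        for doc_topics, (label, seqs) in zip(assignments, docs):
            for i, seq in enumerate(seqs):
                update(label, doc_topics[i], seq, -1)
                weights = [
                    (category_topic[label][k] + alpha)
                    * (topic_sequence[(label, k)][seq] + beta)
                    / (topic_total[(label, k)] + beta * vocab_size)
                    for k in range(topics_per_category)
                ]
                doc_topics[i] = rng.choices(
                    range(topics_per_category), weights=weights)[0]
                update(label, doc_topics[i], seq, +1)

    # The model is the pair of co-occurrence count tables.
    return {
        "category_topic": dict(category_topic),
        "topic_sequence": {t: dict(c) for t, c in topic_sequence.items()},
    }


docs = [
    ("sports", ["goal", "match", "goal", "referee"]),
    ("finance", ["stock", "bond", "stock"]),
]
model = build_category_topic_model(docs)
```

In this sketch `model["category_topic"]` holds the topic and category label co-occurrence counts, and `model["topic_sequence"]` holds the sequence and topic co-occurrence counts. A fixed `seed` keeps the toy run repeatable.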
16. A method comprising:
receiving a textual input;
receiving a category-topic model that represents co-occurrence relationships between topics and category labels and co-occurrence relationships between sequences of one or more words and topics;
determining in the textual input one or more sequences of one or more words that are each associated with a respective category label;
ranking each of the determined sequences using the category-topic model;
identifying one or more of the sequences that have a ranking greater than a threshold ranking value; and
generating a summary of the textual input that includes the identified sequences.
Dependent claim: 17.
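The ranking and thresholding steps of claim 16 can be sketched as follows. The claim does not say how a ranking is computed; the score used here, the sum over a category's topics of P(topic | category) times a smoothed P(sequence | topic), is an assumption made for illustration. The names `score_sequence` and `summarize`, the smoothing parameter `beta`, the dictionary layout of the model, and the toy counts are all hypothetical.

```python
def score_sequence(model, label, seq, beta=0.01):
    """Assumed ranking score: sum_k P(topic k | label) * P(seq | topic k)."""
    topic_counts = model["category_topic"].get(label)
    if not topic_counts or sum(topic_counts) == 0:
        return 0.0
    total = sum(topic_counts)
    vocab_size = len({s for counts in model["topic_sequence"].values()
                      for s in counts})
    score = 0.0
    for k, n_label_topic in enumerate(topic_counts):
        seq_counts = model["topic_sequence"].get((label, k), {})
        denominator = sum(seq_counts.values()) + beta * vocab_size
        if denominator == 0:
            continue
        p_topic = n_label_topic / total
        p_seq = (seq_counts.get(seq, 0) + beta) / denominator
        score += p_topic * p_seq
    return score


def summarize(model, labeled_sequences, threshold):
    """Keep, in input order, the sequences whose score exceeds threshold."""
    return [seq for label, seq in labeled_sequences
            if score_sequence(model, label, seq) > threshold]


# Hypothetical toy model: one category with two topics.
toy_model = {
    "category_topic": {"sports": [3, 1]},
    "topic_sequence": {
        ("sports", 0): {"goal": 2, "match": 1},
        ("sports", 1): {"referee": 1},
    },
}
labeled_input = [("sports", "goal"), ("sports", "weather"), ("sports", "match")]
summary = summarize(toy_model, labeled_input, threshold=0.1)
print(summary)  # ['goal', 'match']
```

Here "weather" never co-occurs with any sports topic in the toy counts, so its smoothed score falls below the threshold and it is left out of the summary.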
18. A system comprising:
a machine-readable storage device including a program product; and
one or more processors operable to execute the program product and perform operations comprising:
receiving a textual input;
receiving a category-topic model that represents co-occurrence relationships between topics and category labels and co-occurrence relationships between sequences of one or more words and topics;
determining in the textual input one or more sequences of one or more words that are each associated with a respective category label;
ranking each of the determined sequences using the category-topic model;
identifying one or more of the sequences that have a ranking greater than a threshold ranking value; and
generating a summary of the textual input that includes the identified sequences.
Specification