METHOD OF AUTOMATED DISCOVERY OF TOPICS RELATEDNESS
First Claim
1. A method comprising:
generating, via a first topic model computer, a first term vector identifying a first topic in a plurality of documents in a document corpus;
generating, via a second topic model computer, a second term vector identifying a second topic in the plurality of documents in the document corpus;
linking, via a topic detection computer, each of the first and second topics across the plurality of documents in the document corpus;
assigning, via the topic detection computer, a relatedness score to each of the linked first and second topics based on co-occurrence of each of the linked first and second topics across the plurality of documents in the document corpus; and
determining, via the topic detection computer, whether the first and second linked topics are related across the plurality of documents in the document corpus based at least in part on the relatedness score.
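The generating and linking steps recited above can be illustrated with a minimal Python sketch. It is not the patented implementation. The term vectors are hard-coded stand-ins for what a topic model would learn, and the names `topic_weight`, `link_topic`, and the `min_weight` cutoff are hypothetical choices made only for this example.

```python
from collections import Counter

# Hypothetical term vectors (term -> weight). A real topic model would learn
# these from the corpus; they are hard-coded here purely for illustration.
FIRST_TOPIC = {"goal": 0.5, "league": 0.3, "match": 0.2}
SECOND_TOPIC = {"transfer": 0.6, "contract": 0.4}


def topic_weight(term_vector, text):
    """Dot product of a topic's term vector with the document's term counts."""
    counts = Counter(text.lower().split())
    return sum(weight * counts[term] for term, weight in term_vector.items())


def link_topic(term_vector, corpus, min_weight=0.2):
    """Return IDs of documents in which the topic is treated as present.

    The min_weight cutoff is an assumption, not something the claim specifies.
    """
    return {
        doc_id
        for doc_id, text in corpus.items()
        if topic_weight(term_vector, text) >= min_weight
    }


corpus = {
    "d1": "late goal wins the match after record transfer",
    "d2": "club confirms contract and transfer rules",
    "d3": "city council approves new budget",
    "d4": "league match ends without a goal",
}

first_links = link_topic(FIRST_TOPIC, corpus)    # documents linked to topic 1
second_links = link_topic(SECOND_TOPIC, corpus)  # documents linked to topic 2
```

In this toy corpus both topics are linked to `d1`, which is the kind of shared document a co-occurrence-based relatedness score would count.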
Abstract
A computer system and method for automated discovery of topic relatedness are disclosed. According to an embodiment, topics within documents from a corpus may be discovered by applying multiple topic identification (ID) models, such as multi-component latent Dirichlet allocation (MC-LDA) or similar methods. The topic models may differ in their number of topics. Each discovered topic may be linked to its associated documents. Relatedness between discovered topics may be determined by analyzing co-occurring topic IDs from the different models and assigning topic relatedness scores; related topics may then be used for matching or linking a feature of interest. The disclosed method may increase disambiguation precision and may allow documents to be matched and linked using the discovered relationships.
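The co-occurrence analysis described in the abstract can be sketched as follows. The abstract does not state how the relatedness score is computed, so this example assumes a Jaccard index over the documents linked to each topic and an arbitrary threshold of 0.5. The topic IDs, document assignments, and function names are all hypothetical.

```python
from collections import defaultdict
from itertools import product

# Hypothetical output of two topic models run on the same corpus:
# document ID -> set of topic IDs found in that document.
# Model A uses 2 topics and model B uses 3, so the models differ in topic count.
model_a = {"d1": {"A0"}, "d2": {"A0"}, "d3": {"A1"}, "d4": {"A0", "A1"}}
model_b = {"d1": {"B0"}, "d2": {"B0", "B2"}, "d3": {"B1"}, "d4": {"B1"}}


def relatedness_scores(a, b):
    """Score every cross-model topic pair by document co-occurrence.

    Assumption: score = (docs containing both topics) / (docs containing either),
    i.e. the Jaccard index. The source does not prescribe this formula.
    """
    docs_by_topic = defaultdict(set)
    for assignment in (a, b):
        for doc_id, topics in assignment.items():
            for topic in topics:
                docs_by_topic[topic].add(doc_id)

    a_topics = sorted({t for topics in a.values() for t in topics})
    b_topics = sorted({t for topics in b.values() for t in topics})

    scores = {}
    for ta, tb in product(a_topics, b_topics):
        shared = docs_by_topic[ta] & docs_by_topic[tb]
        either = docs_by_topic[ta] | docs_by_topic[tb]
        scores[(ta, tb)] = len(shared) / len(either)
    return scores


def related_pairs(scores, threshold=0.5):
    """Topic pairs whose score meets the (assumed) relatedness threshold."""
    return sorted(pair for pair, score in scores.items() if score >= threshold)


scores = relatedness_scores(model_a, model_b)
pairs = related_pairs(scores)
```

With these toy assignments, `A1` and `B1` appear in exactly the same documents, so they score highest. `A0` and `B0` share two of three documents and also pass the assumed threshold.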
20 Claims
1. A method comprising:
generating, via a first topic model computer, a first term vector identifying a first topic in a plurality of documents in a document corpus;
generating, via a second topic model computer, a second term vector identifying a second topic in the plurality of documents in the document corpus;
linking, via a topic detection computer, each of the first and second topics across the plurality of documents in the document corpus;
assigning, via the topic detection computer, a relatedness score to each of the linked first and second topics based on co-occurrence of each of the linked first and second topics across the plurality of documents in the document corpus; and
determining, via the topic detection computer, whether the first and second linked topics are related across the plurality of documents in the document corpus based at least in part on the relatedness score.
(Dependent claims: 2, 3, 4, 5, 6, 7)
8. A system comprising:
a first topic model computer comprising a processor configured to generate a first term vector identifying a first topic in a plurality of documents in a document corpus;
a second topic model computer comprising a processor configured to generate a second term vector identifying a second topic in the plurality of documents in the document corpus; and
a topic detection computer comprising a processor configured to:
(a) link each of the first and second topics across the plurality of documents in the document corpus,
(b) assign a relatedness score to each of the linked first and second topics based on co-occurrence of each of the linked first and second topics across the plurality of documents in the document corpus, and
(c) determine whether the first and second linked topics are related across the plurality of documents in the document corpus based at least in part on the relatedness score.
(Dependent claims: 9, 10, 11, 12, 13, 14)
15. A non-transitory computer readable medium having stored thereon computer executable instructions comprising:
generating, via a first topic model computer module of a computer, a first term vector identifying a first topic in a plurality of documents in a document corpus;
generating, via a second topic model computer module of the computer, a second term vector identifying a second topic in the plurality of documents in the document corpus;
linking, via a topic detection computer module of the computer, each of the first and second topics across the plurality of documents in the document corpus;
assigning, via the topic detection computer module of the computer, a relatedness score to each of the linked first and second topics based on co-occurrence of each of the linked first and second topics across the plurality of documents in the document corpus; and
determining, via the topic detection computer module of the computer, whether the first and second linked topics are related across the plurality of documents in the document corpus based at least in part on the relatedness score.
(Dependent claims: 16, 17, 18, 19, 20)
Specification