Method of automated discovery of new topics
Abstract
The present disclosure relates to a method for automated discovery of new topics from an unlimited number of documents in any subject domain. The method employs a multi-component extension of Latent Dirichlet Allocation (MC-LDA) topic models to discover related topics in a corpus. The resulting data may contain millions of term vectors from any subject domain, identifying the most distinguished co-occurring topics that users may be interested in. These term vectors are used to periodically build new topic ID models from new content. Each new model may be compared one by one with the existing model to measure the significance of changes, using term vector differences that have no correlation with a Periodic New Model, for periodic updates of the automated discovery of new topics. The results may be used to build a new topic ID model in an in-memory database, allowing query-time linking on massive data sets for automated discovery of new topics.
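The selection step can be illustrated with a minimal Python sketch. It is not the disclosed MC-LDA implementation. It assumes each topic is a sparse term vector (term mapped to weight), uses cosine similarity as a stand-in for "correlation", and treats a topic as new when its similarity to every master topic is at or below a threshold. The names `cosine`, `select_new_topics`, `master_model`, and `periodic_model`, as well as the sample weights, are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two sparse term vectors (term -> weight)."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an empty vector is treated as uncorrelated
    return dot / (norm_a * norm_b)


def select_new_topics(master, periodic, threshold=0.0):
    """Return ids of periodic topics with no master topic above the threshold.

    Assumption: "no correlation" is approximated as a best-match cosine
    similarity <= threshold; the source does not specify the measure.
    """
    new_ids = []
    for topic_id, vec in periodic.items():
        best = max((cosine(vec, m) for m in master.values()), default=0.0)
        if best <= threshold:
            new_ids.append(topic_id)
    return new_ids


# Hypothetical master model built from the existing corpus.
master_model = {
    "m1": {"battery": 0.6, "charge": 0.3, "cell": 0.1},
    "m2": {"election": 0.5, "vote": 0.4, "poll": 0.1},
}

# Hypothetical periodic model built from newly arrived content.
periodic_model = {
    "p1": {"battery": 0.5, "charge": 0.5},   # overlaps with m1
    "p2": {"drone": 0.7, "delivery": 0.3},   # shares no terms with the master
}

new_topics = select_new_topics(master_model, periodic_model)
print(new_topics)  # ['p2']
```

A strict threshold of 0.0 only flags topics that share no terms with the master model. A small positive threshold would also flag weakly overlapping topics, which may be closer to how "significance of changes" would be measured in practice.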
18 Claims
1. A method comprising:
automatically extracting, by a database source computer, from a document corpus, data associated with a plurality of co-occurring topics;
in response to automatically extracting the plurality of co-occurring topics, extracting, by a synchronizing framework computer, a plurality of topic identifiers from the plurality of co-occurring topics;
creating, by the synchronizing framework computer, a master topic computer model for the document corpus from a first plurality of term vectors;
creating, by the synchronizing framework computer, a periodic new topic computer model by comparing topic significance among the plurality of topic identifiers, the periodic new topic computer model including a second plurality of term vectors; and
selecting, by the synchronizing framework computer, one or more new topics by identifying one or more term vectors from the second plurality of term vectors in the periodic new topic computer model that have no correlation with the first plurality of term vectors in the master topic computer model.
Dependent claims: 2, 3, 4, 5, 6
7. A system comprising:
a database source computer module configured to extract data associated with a plurality of co-occurring topics in a document corpus; and
a synchronizing framework computer module configured to:
extract a plurality of topic identifiers from the plurality of co-occurring topics;
create a master topic computer model for the document corpus from a first plurality of term vectors;
create a periodic new topic computer model by comparing topic significance among the plurality of topic identifiers, the periodic new topic computer model including a second plurality of term vectors; and
select one or more new topics by identifying one or more term vectors from the second plurality of term vectors in the periodic new topic computer model that have no correlation with the first plurality of term vectors in the master topic computer model.
Dependent claims: 8, 9, 10, 11, 12
13. A non-transitory computer readable medium having stored thereon computer executable instructions executed by a processor comprising:
automatically extracting, by a processor executing a database source computer module, from a document corpus data associated with a plurality of co-occurring topics;
in response to automatically extracting the plurality of co-occurring topics, extracting, by the processor executing a synchronizing framework computer module, a plurality of topic identifiers from the plurality of co-occurring topics;
creating, by the processor executing the synchronizing framework computer, a master topic computer model for the document corpus from a first plurality of term vectors;
creating, by the processor executing the synchronizing framework computer, a periodic new topic computer model by comparing topic significance among the plurality of topic identifiers, the periodic new topic computer model including a second plurality of term vectors; and
selecting, by the processor executing the synchronizing framework computer, one or more new topics by identifying one or more term vectors from the second plurality of term vectors in the periodic new topic computer model that have no correlation with the first plurality of term vectors in the master topic computer model.
Dependent claims: 14, 15, 16, 17, 18
Specification