Method of automated discovery of new topics

US 9,177,262 B2
Filed: 12/02/2014
Issued: 11/03/2015
Est. Priority Date: 12/02/2013
Status: Active Grant

Assignments: 2
Petitions: 0

Abstract

The present disclosure relates to a method for automated discovery of new topics in an unbounded set of documents from any subject domain. The method employs a multi-component extension of Latent Dirichlet Allocation (MC-LDA) topic models to discover related topics in a corpus. The resulting data may contain millions of term vectors, from any subject domain, identifying the most distinguished co-occurring topics that users may be interested in. New topic ID models are built periodically from new content and may be compared one by one with the existing model to measure the significance of changes, using term vectors of the Periodic New Model that have no correlation with the existing model (as recited in claim 1). These periodic updates may be used to build a new topic ID model in an in-memory database, allowing query-time linking on massive data sets for automated discovery of new topics.
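
The abstract's last step, holding a topic ID model in memory so that queries can be linked to topics at query time, might look roughly like the sketch below. The patent does not describe the database layout; the inverted index, the whitespace tokenization, the scoring rule, and every name here (TOPIC_MODEL, build_index, link_query) are assumptions made for illustration only.

```python
from collections import defaultdict

# Hypothetical topic ID model: topic ID -> sparse term vector (term -> weight).
TOPIC_MODEL = {
    "T1": {"vaccine": 0.6, "trial": 0.3, "dose": 0.1},
    "T2": {"election": 0.5, "ballot": 0.4, "trial": 0.1},
}

def build_index(model):
    """Invert the model into term -> [(topic_id, weight), ...], kept in memory."""
    index = defaultdict(list)
    for topic_id, vector in model.items():
        for term, weight in vector.items():
            index[term].append((topic_id, weight))
    return index

def link_query(index, query):
    """Score each topic by the summed weight of the query terms it contains."""
    scores = defaultdict(float)
    for term in query.lower().split():
        # .get() avoids creating empty entries for unknown terms.
        for topic_id, weight in index.get(term, []):
            scores[topic_id] += weight
    # Highest-scoring topics first.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

index = build_index(TOPIC_MODEL)
ranked = link_query(index, "vaccine trial results")
```

Here `ranked` lists the topic IDs that share at least one term with the query, best match first. Because the index is a plain dictionary, a lookup costs time proportional to the number of query terms rather than the number of topics, which is one plausible reading of "query-time linking".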

Citations: 92

18 Claims

1. A method comprising:
- automatically extracting, by a database source computer, from a document corpus, data associated with a plurality of co-occurring topics;
- in response to automatically extracting the plurality of co-occurring topics, extracting, by a synchronizing framework computer, a plurality of topic identifiers from the plurality of co-occurring topics;
- creating, by the synchronizing framework computer, a master topic computer model for the document corpus from a first plurality of term vectors;
- creating, by the synchronizing framework computer, a periodic new topic computer model by comparing topic significance among the plurality of topic identifiers, the periodic new topic computer model including a second plurality of term vectors; and
- selecting, by the synchronizing framework computer, one or more new topics by identifying one or more term vectors from the second plurality of term vectors in the periodic new topic computer model that have no correlation with the first plurality of term vectors in the master topic computer model.

Dependent claims (2, 3, 4, 5, 6):
- 2. The method of claim 1 further comprising adding, via the synchronizing framework computer module, the one or more new topics to the master topic computer model.
- 3. The method of claim 1 further comprising:
  - receiving, via the database source computer module, an indication of a topic of interest to a user; and
  - automatically extracting the data associated with the plurality of co-occurring topics based on the topic of interest.
- 4. The method of claim 1 wherein the master topic computer model is a multi-component extension of a Latent Dirichlet Allocation (MC-LDA) topic model.
- 5. The method of claim 1 wherein the periodic new topic computer model is a multi-component extension of a Latent Dirichlet Allocation (MC-LDA) topic model.
- 6. The method of claim 1 wherein comparing topic significance among the plurality of topic identifiers is based on a predetermined significance threshold.
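
The selection step of claim 1, combined with the significance threshold of claim 6 and the master-model update of claim 2, can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the claims do not say how correlation is measured, so cosine similarity over sparse term vectors stands in for it, and the names select_new_topics, min_significance, and max_correlation, along with all sample data, are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity of two sparse term vectors (term -> weight).
    Assumption: stands in for the unspecified correlation measure."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm = (math.sqrt(sum(w * w for w in u.values()))
            * math.sqrt(sum(w * w for w in v.values())))
    return dot / norm if norm else 0.0

def select_new_topics(master, periodic, significance,
                      min_significance=0.5, max_correlation=0.0):
    """Keep periodic topics that are significant enough (claim 6) and
    correlate with no term vector in the master model (claim 1)."""
    new_topics = {}
    for topic_id, vector in periodic.items():
        if significance.get(topic_id, 0.0) < min_significance:
            continue  # below the predetermined significance threshold
        if all(cosine(vector, m) <= max_correlation for m in master.values()):
            new_topics[topic_id] = vector
    return new_topics

# Hypothetical models: topic ID -> term vector with non-negative weights.
master = {"M1": {"vaccine": 0.7, "dose": 0.3}}
periodic = {
    "P1": {"vaccine": 0.5, "booster": 0.5},  # shares a term with M1
    "P2": {"drone": 0.6, "delivery": 0.4},   # shares no term with M1
    "P3": {"quantum": 0.9, "qubit": 0.1},    # novel, but low significance
}
significance = {"P1": 0.9, "P2": 0.8, "P3": 0.2}

new_topics = select_new_topics(master, periodic, significance)
master.update(new_topics)  # claim 2: add the new topics to the master model
```

With non-negative weights, a cosine of exactly zero means the two vectors share no terms, so only P2 is selected here. In practice a small positive max_correlation would probably be needed, since real term vectors rarely have strictly disjoint vocabularies.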

7. A system comprising:
- a database source computer module configured to extract data associated with a plurality of co-occurring topics in a document corpus; and
- a synchronizing framework computer module configured to:
  - extract a plurality of topic identifiers from the plurality of co-occurring topics;
  - create a master topic computer model for the document corpus from a first plurality of term vectors;
  - create a periodic new topic computer model by comparing topic significance among the plurality of topic identifiers, the periodic new topic computer model including a second plurality of term vectors; and
  - select one or more new topics by identifying one or more term vectors from the second plurality of term vectors in the periodic new topic computer model that have no correlation with the first plurality of term vectors in the master topic computer model.

Dependent claims (8, 9, 10, 11, 12):
- 8. The system of claim 7 wherein the synchronizing framework computer module is further configured to add the one or more new topics to the master topic computer model.
- 9. The system of claim 7 wherein the database source computer module is further configured to receive an indication of a topic of interest to a user for extraction of the data associated with the plurality of co-occurring topics based on the topic of interest.
- 10. The system of claim 7 wherein the master topic computer model is a multi-component extension of a Latent Dirichlet Allocation (MC-LDA) topic model.
- 11. The system of claim 7 wherein the periodic new topic computer model is a multi-component extension of a Latent Dirichlet Allocation (MC-LDA) topic model.
- 12. The system of claim 7 wherein the synchronizing framework computer module is further configured to compare topic significance among the plurality of topic identifiers based on a predetermined significance threshold.
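
Claim 9 (like claims 3 and 15) has the database source module extract co-occurrence data based on a topic of interest supplied by a user. One simple reading is sketched below: count the terms that appear in the same documents as the topic of interest. The claims do not define how co-occurrence is computed, so document-level co-occurrence, whitespace tokenization, and the names extract_cooccurring and top_n are all assumptions, and the corpus is invented sample data.

```python
from collections import Counter

def extract_cooccurring(corpus, topic_of_interest, top_n=3):
    """Count terms that appear in the same document as the topic of interest."""
    target = topic_of_interest.lower()
    counts = Counter()
    for doc in corpus:
        terms = set(doc.lower().split())  # each term counted once per document
        if target in terms:
            counts.update(terms - {target})
    return counts.most_common(top_n)

# Hypothetical document corpus.
corpus = [
    "solar panel efficiency record",
    "solar storage battery cost",
    "battery recycling plant opens",
    "solar battery subsidy announced",
]

pairs = extract_cooccurring(corpus, "solar")
```

In this sample, "battery" appears in two of the three documents that mention "solar", so it ranks first; the remaining terms tie at one document each, and their relative order is not meaningful. A real system would presumably feed such counts into the topic model rather than use them directly.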

13. A non-transitory computer readable medium having stored thereon computer executable instructions executed by a processor comprising:
- automatically extracting, by a processor executing a database source computer module, from a document corpus, data associated with a plurality of co-occurring topics;
- in response to automatically extracting the plurality of co-occurring topics, extracting, by the processor executing a synchronizing framework computer module, a plurality of topic identifiers from the plurality of co-occurring topics;
- creating, by the processor executing the synchronizing framework computer module, a master topic computer model for the document corpus from a first plurality of term vectors;
- creating, by the processor executing the synchronizing framework computer module, a periodic new topic computer model by comparing topic significance among the plurality of topic identifiers, the periodic new topic computer model including a second plurality of term vectors; and
- selecting, by the processor executing the synchronizing framework computer module, one or more new topics by identifying one or more term vectors from the second plurality of term vectors in the periodic new topic computer model that have no correlation with the first plurality of term vectors in the master topic computer model.

Dependent claims (14, 15, 16, 17, 18):
- 14. The computer readable medium of claim 13, wherein the instructions further comprise adding, by the processor executing the synchronizing framework computer module, the one or more new topics to the master topic computer model.
- 15. The computer readable medium of claim 13, wherein the instructions further comprise:
  - receiving, by the processor executing the database source computer module, an indication of a topic of interest to a user; and
  - automatically extracting the data associated with the plurality of co-occurring topics based on the topic of interest.
- 16. The computer readable medium of claim 13, wherein the master topic computer model is a multi-component extension of a Latent Dirichlet Allocation (MC-LDA) topic model.
- 17. The computer readable medium of claim 13, wherein the periodic new topic computer model is a multi-component extension of a Latent Dirichlet Allocation (MC-LDA) topic model.
- 18. The computer readable medium of claim 13, wherein comparing topic significance among the plurality of topic identifiers is based on a predetermined significance threshold.

Current Assignee: Finch Computing LLC (Qbase, LLC)
Original Assignee: Qbase, LLC
Inventors: Lightner, Scott; Weckesser, Franz; Boddhu, Sanjay; Flagg, Robert
Primary Examiner(s): Chaki, Kakali
Assistant Examiner(s): Sitiriche, Luis A
Application Number: US14/558,076
Publication Number: US 20150154148A1
Time in Patent Office: 336 Days
Field of Search: None
US Class Current: 1/1

CPC Class Codes
- G06F 16/285: Clustering or classification
- G06F 16/93: Document management systems
- G06F 40/00: Handling natural language d...
- G06F 40/205: Parsing
- G06N 20/00: Machine learning
- G06N 5/022: Knowledge engineering; Know...
- G06N 5/04: Inference or reasoning models
