METHOD OF AUTOMATED DISCOVERY OF TOPICS RELATEDNESS

US 20150154305A1
Filed: 12/02/2014
Published: 06/04/2015
Est. Priority Date: 12/02/2013
Status: Active Grant

Abstract

A computer system and method for automated discovery of topic relatedness are disclosed. According to an embodiment, topics within documents from a corpus may be discovered by applying multiple topic identification (ID) models, such as multi-component latent Dirichlet allocation (MC-LDA) or similar methods. Each topic model may differ in a number of topics. Discovered topics may be linked to the associated document. Relatedness between discovered topics may be determined by analyzing co-occurring topic IDs from the different models, assigning topic relatedness scores, where related topics may be used for matching/linking a feature of interest. The disclosed method may have an increased disambiguation precision, and may allow the matching and linking of documents using the discovered relationships.
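
The co-occurrence scoring described in the abstract can be illustrated with a minimal Python sketch. The abstract does not give a scoring formula, so the Jaccard overlap of document sets used here is an assumption, as are the function name `relatedness_scores`, the topic IDs, the toy document assignments, and the 0.5 threshold.

```python
from collections import defaultdict
from itertools import product


def relatedness_scores(doc_topics_a, doc_topics_b):
    """Score each co-occurring (topic_a, topic_b) pair.

    Inputs map a document ID to the set of topic IDs that one topic
    model assigned to it. The score is the Jaccard overlap of the two
    topics' document sets (an assumed measure, not the patent's).
    """
    docs_a = defaultdict(set)
    docs_b = defaultdict(set)
    # Invert each model's output: topic ID -> documents linked to it.
    for doc_id, topics in doc_topics_a.items():
        for topic in topics:
            docs_a[topic].add(doc_id)
    for doc_id, topics in doc_topics_b.items():
        for topic in topics:
            docs_b[topic].add(doc_id)

    scores = {}
    for ta, tb in product(docs_a, docs_b):
        shared = docs_a[ta] & docs_b[tb]
        if shared:  # only topics that co-occur in at least one document
            scores[(ta, tb)] = len(shared) / len(docs_a[ta] | docs_b[tb])
    return scores


# Hypothetical output of two topic models run over the same corpus.
model_a = {"d1": {"A0"}, "d2": {"A0"}, "d3": {"A1"}}
model_b = {"d1": {"B0"}, "d2": {"B0", "B1"}, "d3": {"B1"}}

scores = relatedness_scores(model_a, model_b)
# Treat a pair as related when its score reaches an assumed threshold.
related = {pair for pair, score in scores.items() if score >= 0.5}
```

Any symmetric co-occurrence measure, such as pointwise mutual information, could be substituted for the Jaccard overlap without changing the overall structure.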

20 Claims

1. A method comprising:
- generating, via a first topic model computer, a first term vector identifying a first topic in a plurality of documents in a document corpus;
  
  generating, via a second topic model computer, a second term vector identifying a second topic in the plurality of documents in the document corpus;
  
  linking, via a topic detection computer, each of the first and second topics across the plurality of documents in the document corpus;
  
  assigning, via the topic detection computer, a relatedness score to each of the linked first and second topics based on co-occurrence of each of the linked first and second topics across the plurality of documents in the document corpus; and
  
  determining, via the topic detection computer, whether the first and second linked topics are related across the plurality of documents in the document corpus based at least in part on the relatedness score.
- Dependent claims (2, 3, 4, 5, 6, 7):
  - 2. The method of claim 1, wherein the topic detection computer determines one or more differences between the first and second topics to identify at least one new topic and adds the at least one new topic as a model parameter to the first topic model computer.
  - 3. The method of claim 1, wherein the first topic model computer executes a master topic computer model based on a multi-component extension of latent Dirichlet allocation having a first set of model parameters.
  - 4. The method of claim 3, wherein the second topic model computer executes a periodic new topic computer model based on the multi-component extension of latent Dirichlet allocation having a second set of model parameters different from the first set of model parameters.
  - 5. The method of claim 1, further comprising generating, via the topic detection computer, a hierarchy of related topics in the corpus of documents.
  - 6. The method of claim 1, wherein the first topic model computer and the second topic model computer execute respective first and second topic computer models having one or more parameters selected from the group consisting of a multi-document component, a vocabulary size, and a parameter setting for a prior Dirichlet distribution on a topic term.
  - 7. The method of claim 1, wherein the linking step further comprising generating a graphical representation of co-occurring linked topics across the plurality of documents in the corpus.
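
Claim 2 turns on finding differences between the topics of the two models in order to identify a new topic. A minimal Python sketch of one way to do this follows, assuming each topic is a sparse term vector (term to weight) and that a periodic-model topic counts as new when its best cosine similarity against every master-model topic falls below a threshold. The comparison measure, the threshold of 0.5, the function names, and the sample vectors are all assumptions for illustration; the claims do not specify them.

```python
import math


def cosine(u, v):
    """Cosine similarity between two sparse term vectors (dicts)."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def find_new_topics(master_topics, periodic_topics, threshold=0.5):
    """Return periodic-model topics with no close match in the master model."""
    new_topics = []
    for name, vector in periodic_topics.items():
        best = max(
            (cosine(vector, m) for m in master_topics.values()),
            default=0.0,
        )
        if best < threshold:
            new_topics.append(name)
    return new_topics


# Hypothetical term vectors from a master model and a periodic model.
master = {"M0": {"engine": 0.6, "car": 0.4}}
periodic = {
    "P0": {"car": 0.5, "engine": 0.5},
    "P1": {"vaccine": 0.7, "trial": 0.3},
}

new_topics = find_new_topics(master, periodic)
```

In this toy data, `P0` closely matches `M0`, while `P1` shares no terms with any master topic and is therefore flagged as new.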

8. A system comprising:
- a first topic model computer comprising a processor configured to generate a first term vector identifying a first topic in a plurality of documents in a document corpus;
  
  a second topic model computer comprising a processor configured to generate a second term vector identifying a second topic in the plurality of documents in the document corpus; and
  
  a topic detection computer comprising a processor configured to:
  
  (a) link each of the first and second topics across the plurality of documents in the document corpus,
  (b) assign a relatedness score to each of the linked first and second topics based on co-occurrence of each of the linked first and second topics across the plurality of documents in the document corpus, and
  (c) determine whether the first and second linked topics are related across the plurality of documents in the document corpus based at least in part on the relatedness score.
- Dependent claims (9, 10, 11, 12, 13, 14):
  - 9. The system of claim 8, wherein the topic detection computer is further configured to determine one or more differences between the first and second topics to identify at least one new topic and add the at least one new topic as a model parameter to the first topic model computer.
  - 10. The system of claim 8, wherein the first topic model computer executes a master topic computer model based on a multi-component extension of latent Dirichlet allocation having a first set of model parameters.
  - 11. The system of claim 10, wherein the second topic model computer executes a periodic new topic computer model based on the multi-component extension of latent Dirichlet allocation having a second set of model parameters different from the first set of model parameters.
  - 12. The system of claim 8, wherein the topic detection computer is further configured to generate a hierarchy of related topics in the corpus of documents.
  - 13. The system of claim 8, wherein the first topic model computer and the second topic model computer are configured to execute respective first and second topic computer models having one or more parameters selected from the group consisting of a multi-document component, a vocabulary size, and a parameter setting for a prior Dirichlet distribution on a topic term.
  - 14. The system of claim 8, wherein the topic detection computer is further configured to generate a graphical representation of co-occurring linked topics across the plurality of documents in the corpus.
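
Claim 12 recites generating a hierarchy of related topics. One possible reading, sketched below in Python, uses two models that differ in their number of topics: each topic from the finer model is attached to the coarser-model topic it co-occurs with most often. This parent-selection rule, the single-topic-per-document simplification, the function name `build_hierarchy`, and the sample labels are assumptions, not details taken from the claims.

```python
from collections import Counter, defaultdict


def build_hierarchy(coarse_doc_topic, fine_doc_topic):
    """Map each coarse topic to the fine topics that mostly co-occur with it.

    Both inputs map a document ID to the single dominant topic that a
    model assigned to that document.
    """
    # For every fine topic, count the coarse topics of its documents.
    counts = defaultdict(Counter)
    for doc_id, fine in fine_doc_topic.items():
        coarse = coarse_doc_topic.get(doc_id)
        if coarse is not None:
            counts[fine][coarse] += 1

    hierarchy = defaultdict(list)
    for fine in sorted(counts):
        # The parent is the coarse topic sharing the most documents.
        parent, _ = counts[fine].most_common(1)[0]
        hierarchy[parent].append(fine)
    return dict(hierarchy)


# Hypothetical dominant-topic assignments from a coarse and a fine model.
coarse = {"d1": "sports", "d2": "sports", "d3": "health", "d4": "health"}
fine = {"d1": "soccer", "d2": "tennis", "d3": "vaccines", "d4": "vaccines"}

tree = build_hierarchy(coarse, fine)
```

The resulting parent-to-children mapping doubles as an adjacency list, so it could also feed the graphical representation of co-occurring topics recited in claim 14.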

15. A non-transitory computer readable medium having stored thereon computer executable instructions comprising:
- generating, via a first topic model computer module of a computer, a first term vector identifying a first topic in a plurality of documents in a document corpus;
  
  generating, via a second topic model computer module of the computer, a second term vector identifying a second topic in the plurality of documents in the document corpus;
  
  linking, via a topic detection computer module of the computer, each of the first and second topics across the plurality of documents in the document corpus;
  
  assigning, via the topic detection computer module of the computer, a relatedness score to each of the linked first and second topics based on co-occurrence of each of the linked first and second topics across the plurality of documents in the document corpus; and
  
  determining, via the topic detection computer module of the computer, whether the first and second linked topics are related across the plurality of documents in the document corpus based at least in part on the relatedness score.
- Dependent claims (16, 17, 18, 19, 20):
  - 16. The computer readable medium of claim 15 wherein the instructions further comprise determining, via the topic detection module, one or more differences between the first and second topics to identify at least one new topic and adds the at least one new topic as a model parameter to the first topic model computer module.
  - 17. The computer readable medium of claim 15 wherein the instructions further comprise executing, via the first topic model computer module, a master topic computer model based on a multi-component extension of latent Dirichlet allocation having a first set of model parameters.
  - 18. The computer readable medium of claim 17 wherein the instructions further comprise executing, via the second topic model computer module, a periodic new topic computer model based on the multi-component extension of latent Dirichlet allocation having a second set of model parameters different from the first set of model parameters.
  - 19. The computer readable medium of claim 15 wherein the instructions further comprise generating, via the topic detection computer module, a hierarchy of related topics in the corpus of documents.
  - 20. The computer readable medium of claim 15 wherein the instructions further comprise executing, via the first and second topic model computer modules, respective first and second topic computer models having one or more parameters selected from the group consisting of a multi-document component, a vocabulary size, and a parameter setting for a prior Dirichlet distribution on a topic term.

Current Assignee
Finch Computing LLC (Qbase, LLC)
Original Assignee
Qbase, LLC
Inventors
LIGHTNER, Scott, WECKESSER, Franz, BODDHU, Sanjay, DAVE, Rakesh, FLAGG, Robert

Granted Patent

US 9,542,477 B2
US Class Current

1/1
CPC Class Codes

G06F 16/10   File systems; File servers

G06F 16/3334   Selection or weighting of t...

G06F 16/3347   using vector based model

G06F 16/93   Document management systems

G06F 16/951   Indexing; Web crawling tech...

G06F 16/9535   Search customisation based ...

G06F 16/9538   Presentation of query results
