Method and system for Latent Dirichlet Allocation computation using approximate counters

US 10,147,044 B2
Filed: 08/06/2015
Issued: 12/04/2018
Est. Priority Date: 02/04/2015
Status: Active Grant

First Claim

1. A method for identifying sets of correlated words comprising:

running an uncollapsed Gibbs sampler over a Dirichlet distribution of a plurality of words in a set of documents to produce sampler result data, further comprising:

representing one or more counts in the uncollapsed Gibbs sampler using one or more approximate counters, and using one or more probabilistic techniques to increment the one or more approximate counters; and

determining, from the sampler result data, one or more sets of correlated words;

wherein the method is performed by one or more computing devices.

Abstract

Herein is described a data-parallel algorithm for topic modeling in which the memory requirements are streamlined for implementation on a highly-parallel architecture, such as a GPU. Specifically, approximate counters are used in a large mixture model or clustering algorithm (e.g., an uncollapsed Gibbs sampler) to decrease memory usage over what is required when conventional counters are used. The decreased memory usage of the approximate counters allows a highly-parallel architecture with limited memory to process more computations for the large mixture model more efficiently. Embodiments describe binary Morris approximate counters, general Morris approximate counters, and Csűrös approximate counters in the context of an uncollapsed Gibbs sampler, and, more specifically, for a Greedy Gibbs sampler.
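
The probabilistic increment behind a binary Morris approximate counter can be illustrated with a minimal sketch. This is the textbook form of the counter, not code from the patent; the function names `morris_increment`, `morris_estimate`, and `count_events` are hypothetical. The counter stores only an exponent `x`, bumps it with probability 2^-x, and reports 2^x - 1 as the estimated count.

```python
import random


def morris_increment(x: int, rng: random.Random) -> int:
    """Return the new stored exponent after one probabilistic increment."""
    # The stored value grows with probability 2**-x, so larger counts
    # are bumped less and less often.
    if rng.random() < 2.0 ** -x:
        return x + 1
    return x


def morris_estimate(x: int) -> int:
    """Estimate of the true count represented by stored exponent x."""
    return 2 ** x - 1


def count_events(n: int, rng: random.Random) -> int:
    """Feed n events through one counter and return the stored exponent."""
    x = 0
    for _ in range(n):
        x = morris_increment(x, rng)
    return x


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed, chosen only for repeatability
    x = count_events(100_000, rng)
    print("stored exponent:", x, "estimated count:", morris_estimate(x))
```

Because only the exponent is stored, the register grows roughly with the logarithm of the count, which is where the memory saving comes from. A single counter is noisy; the estimate is unbiased only on average.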

20 Claims

1. A method for identifying sets of correlated words comprising:
- running an uncollapsed Gibbs sampler over a Dirichlet distribution of a plurality of words in a set of documents to produce sampler result data, further comprising:
  - representing one or more counts in the uncollapsed Gibbs sampler using one or more approximate counters, and
  - using one or more probabilistic techniques to increment the one or more approximate counters; and
- determining, from the sampler result data, one or more sets of correlated words;
- wherein the method is performed by one or more computing devices.
- Dependent claims (2-10):
  - 2. The method of claim 1, wherein the one or more approximate counters comprise binary Morris approximate counters.
  - 3. The method of claim 1, wherein the one or more approximate counters comprise general Morris approximate counters.
  - 4. The method of claim 1, wherein the one or more approximate counters comprise Csűrös approximate counters.
  - 5. The method of claim 1, wherein running the uncollapsed Gibbs sampler over the Dirichlet distribution of the plurality of words in the set of documents to produce sampler result data, further comprises representing one or more other counts in the uncollapsed Gibbs sampler using one or more conventional counters.
  - 6. The method of claim 1, wherein the one or more approximate counters comprise two or more of:
    - binary Morris approximate counters, general Morris approximate counters, Csűrös approximate counters, or conventional counters.
  - 7. The method of claim 1, wherein:
    - running the uncollapsed Gibbs sampler over the Dirichlet distribution comprises computing in parallel a plurality of values, including the one or more counts, for the uncollapsed Gibbs sampler; and
    - the plurality of values are computed in a plurality of parallel Single Program Multiple Data (SPMD) units on a graphics processing unit (GPU).
  - 8. The method of claim 1, further comprising representing each of the one or more approximate counters using eight bits or fewer than eight bits.
  - 9. The method of claim 1, wherein the uncollapsed Gibbs sampler has at least one variable uncollapsed.
  - 10. The method of claim 1, wherein the uncollapsed Gibbs sampler is a Greedy Gibbs sampler.
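
The other counter variants named in claims 3 and 4, together with the eight-bit limit of claim 8, can be sketched in their commonly published forms. This is an illustration under assumptions, not the patent's embodiment: the base `q`, the mantissa width `d`, the saturation at 255, and all function names are hypothetical choices made for the example.

```python
import random

MAX_8BIT = 255  # largest value an 8-bit register can hold (assumed limit)


def general_morris_increment(x: int, q: float, rng: random.Random) -> int:
    """General Morris counter: increment with probability q**-x, base q > 1."""
    if x < MAX_8BIT and rng.random() < q ** -x:
        return x + 1
    return x  # either the coin flip failed or the register is saturated


def general_morris_estimate(x: int, q: float) -> float:
    """Estimated count for stored value x; each step from x is worth q**x."""
    return (q ** x - 1) / (q - 1)


def csuros_increment(x: int, d: int, rng: random.Random) -> int:
    """Csuros counter: the bits above the d-bit mantissa form the exponent t,
    and the register is incremented with probability 2**-t."""
    t = x >> d
    if x < MAX_8BIT and rng.random() < 2.0 ** -t:
        return x + 1
    return x


def csuros_estimate(x: int, d: int) -> int:
    """Estimated count (2**d + u) * 2**t - 2**d, where u is the mantissa."""
    m = 1 << d
    t, u = x >> d, x & (m - 1)
    return (m + u) * (1 << t) - m


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed, chosen only for repeatability
    x = 0
    for _ in range(50_000):
        x = csuros_increment(x, 4, rng)
    print("register:", x, "estimate:", csuros_estimate(x, 4))
```

With `d = 4`, the Csűrös register counts exactly for its first 16 values and an eight-bit register tops out at an estimate of roughly one million. A base `q` closer to 1 in the general Morris counter trades range for lower variance.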

11. One or more non-transitory computer-readable media storing one or more sequences of instructions for identifying sets of correlated words, wherein said one or more sequences of instructions, when executed by one or more processors, cause:
- running an uncollapsed Gibbs sampler over a Dirichlet distribution of a plurality of words in a set of documents to produce sampler result data, further comprising:
  - representing one or more counts in the uncollapsed Gibbs sampler using one or more approximate counters, and
  - using one or more probabilistic techniques to increment the one or more approximate counters; and
- determining, from the sampler result data, one or more sets of correlated words.
- Dependent claims (12-20):
  - 12. The one or more non-transitory computer-readable media of claim 11, wherein the one or more approximate counters comprise binary Morris approximate counters.
  - 13. The one or more non-transitory computer-readable media of claim 11, wherein the one or more approximate counters comprise general Morris approximate counters.
  - 14. The one or more non-transitory computer-readable media of claim 11, wherein the one or more approximate counters comprise Csűrös approximate counters.
  - 15. The one or more non-transitory computer-readable media of claim 11, wherein running the uncollapsed Gibbs sampler over the Dirichlet distribution of the plurality of words in the set of documents to produce sampler result data, further comprises representing one or more other counts in the uncollapsed Gibbs sampler using one or more conventional counters.
  - 16. The one or more non-transitory computer-readable media of claim 11, wherein the one or more approximate counters comprise two or more of:
    - binary Morris approximate counters, general Morris approximate counters, Csűrös approximate counters, or conventional counters.
  - 17. The one or more non-transitory computer-readable media of claim 11, wherein:
    - running the uncollapsed Gibbs sampler over the Dirichlet distribution comprises computing in parallel a plurality of values, including the one or more counts, for the uncollapsed Gibbs sampler; and
    - the one or more sequences of instructions include instructions, that when executed by one or more processors, cause the plurality of values to be computed in a plurality of parallel Single Program Multiple Data (SPMD) units on a graphics processing unit (GPU).
  - 18. The one or more non-transitory computer-readable media of claim 11, wherein the one or more sequences of instructions include instructions, that when executed by one or more processors, cause representing each of the one or more approximate counters using eight bits or fewer than eight bits.
  - 19. The one or more non-transitory computer-readable media of claim 11, wherein the uncollapsed Gibbs sampler has at least one variable uncollapsed.
  - 20. The one or more non-transitory computer-readable media of claim 11, wherein the uncollapsed Gibbs sampler is a Greedy Gibbs sampler.
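
The overall flow recited in the independent claims can be sketched as a toy, sequential example. It assumes a standard uncollapsed Gibbs sampler for LDA in which the per-document topic weights and per-topic word weights are drawn explicitly from Dirichlet distributions, and in which the document-topic and topic-word counts are held in binary Morris counters. All names, the hyperparameters `alpha` and `beta`, and the choice of which counts to approximate are assumptions for illustration; the patent's parallel GPU implementation is not reproduced here.

```python
import random


def morris_inc(x, rng):
    # Binary Morris counter: bump the stored exponent with probability 2**-x.
    return x + 1 if rng.random() < 2.0 ** -x else x


def morris_est(x):
    # Estimated count represented by stored exponent x.
    return 2 ** x - 1


def dirichlet(params, rng):
    # Draw from a Dirichlet distribution by normalizing gamma variates.
    draws = [rng.gammavariate(a, 1.0) for a in params]
    total = sum(draws)
    return [g / total for g in draws]


def uncollapsed_gibbs(docs, vocab_size, num_topics, iters,
                      alpha=0.5, beta=0.5, seed=0):
    """docs is a list of documents, each a list of integer word ids."""
    rng = random.Random(seed)
    # Random initial topic assignment for every token.
    z = [[rng.randrange(num_topics) for _ in doc] for doc in docs]
    phi = []
    for _ in range(iters):
        # Rebuild the counts from the current assignments, storing only
        # Morris exponents instead of exact integers.
        doc_topic = [[0] * num_topics for _ in docs]
        topic_word = [[0] * vocab_size for _ in range(num_topics)]
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]
                doc_topic[d][k] = morris_inc(doc_topic[d][k], rng)
                topic_word[k][w] = morris_inc(topic_word[k][w], rng)
        # Uncollapsed step: sample theta and phi explicitly from the
        # Dirichlet posteriors built on the estimated counts.
        theta = [dirichlet([alpha + morris_est(x) for x in row], rng)
                 for row in doc_topic]
        phi = [dirichlet([beta + morris_est(x) for x in row], rng)
               for row in topic_word]
        # Resample each token's topic; given theta and phi the tokens are
        # independent, which is what makes this step data-parallel.
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                weights = [theta[d][k] * phi[k][w] for k in range(num_topics)]
                z[d][i] = rng.choices(range(num_topics), weights=weights)[0]
    return phi


def top_words(phi, vocab, n=3):
    # The highest-weight words of each topic form one set of correlated words.
    return [[vocab[w] for w in sorted(range(len(vocab)),
                                      key=lambda w: -row[w])[:n]]
            for row in phi]


if __name__ == "__main__":
    vocab = ["gpu", "kernel", "topic", "word"]
    docs = [[0, 1, 0, 1], [2, 3, 2, 3], [0, 1, 1, 0], [3, 2, 3, 2]]
    phi = uncollapsed_gibbs(docs, len(vocab), num_topics=2, iters=50)
    print(top_words(phi, vocab, n=2))
```

On a corpus this small the Morris estimates are coarse, so the groupings vary from run to run; the sketch is meant to show where the approximate counters sit in the loop rather than to produce stable topics.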

Current Assignee: Oracle International Corporation (Oracle Corporation)
Original Assignee: Oracle International Corporation (Oracle Corporation)
Inventors: Steele, Jr., Guy L.; Tristan, Jean-Baptiste
Primary Examiner(s): Leroux, Etienne P
Assistant Examiner(s): Samara, Husam Turki

Application Number: US14/820,169
Publication Number: US 20160224900A1
Time in Patent Office: 1,216 Days
CPC Class Codes:
- G06F 16/355   Class or cluster creation o...
- G06F 40/284   Lexical analysis, e.g. toke...
- G06F 40/30   Semantic analysis
- G06F 9/5066   Algorithms for mapping a pl...
- G06N 7/01   Probabilistic graphical mod...
