
Document processing employing probabilistic topic modeling of documents represented as text words transformed to a continuous space

  • US 9,430,563 B2
  • Filed: 02/02/2012
  • Issued: 08/30/2016
  • Est. Priority Date: 02/02/2012
  • Status: Active Grant
First Claim

1. An apparatus comprising:

  • an electronic data processing device configured to:

    perform a modeling method including:

    constructing a set of word embedding transforms by operations including generating a term-document matrix whose elements represent occurrence frequencies for text words in documents of a set of documents and include inverse document frequency (IDF) scaling;

    applying the set of word embedding transforms to transform text words of a set of documents into K-dimensional word vectors in order to generate sets or sequences of word vectors representing the documents of the set of documents where K is an integer greater than or equal to two; and

    learning a probabilistic topic model comprising a mixture model including M mixture components representing M topics using the sets or sequences of word vectors representing the documents of the set of documents wherein the learned probabilistic topic model operates to assign probabilities for the topics of the probabilistic topic model to an input set or sequence of K-dimensional embedded word vectors; and

    perform a document processing method including:

    applying the set of word embedding transforms to transform text words of an input document into K-dimensional word vectors in order to generate a set or sequence of word vectors representing the input document; and

    applying the learned mixture model to the set or sequence of word vectors representing the input document in order to generate one of (1) a vector or histogram of topic probabilities representing the input document or (2) one or more Fisher vectors representing the input document.
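The claim leaves several choices open: it does not say how the word embedding transforms are derived from the IDF-scaled term-document matrix, what kind of mixture model is used, or how Fisher vectors are computed. The NumPy sketch below is one plausible reading, not the patentee's implementation. It assumes (1) a truncated SVD of the IDF-scaled term-document matrix (LSI-style) as the embedding transform, (2) a diagonal-covariance Gaussian mixture with M components trained by EM as the topic model, (3) the mean per-word posterior as the "histogram of topic probabilities", and (4) a Fisher vector built only from gradients with respect to the component means. All function names (`tfidf_matrix`, `word_embeddings`, `fit_gmm`, `topic_histogram`, `fisher_vector`, `embed`) are hypothetical.

```python
import numpy as np


def tfidf_matrix(docs, vocab):
    """Terms x documents matrix of occurrence counts scaled by a smoothed IDF (assumed variant)."""
    index = {w: i for i, w in enumerate(vocab)}
    counts = np.zeros((len(vocab), len(docs)))
    for j, doc in enumerate(docs):
        for w in doc:
            if w in index:
                counts[index[w], j] += 1
    df = np.count_nonzero(counts, axis=1)            # document frequency of each term
    idf = np.log((1 + len(docs)) / (1 + df)) + 1.0   # smoothed IDF, always >= 1
    return counts * idf[:, None]


def word_embeddings(A, K):
    """Assumed embedding transform: rank-K SVD; row i is the K-dim vector of vocab[i]."""
    U, S, _ = np.linalg.svd(A, full_matrices=False)
    return U[:, :K] * S[:K]


def posteriors(X, weights, means, var):
    """Responsibility of each diagonal Gaussian component for each row of X, shape (n, M)."""
    diff = X[:, None, :] - means[None, :, :]                     # (n, M, K)
    log_pdf = -0.5 * (np.sum(diff ** 2 / var, axis=2)
                      + np.sum(np.log(2 * np.pi * var), axis=1))
    log_p = np.log(weights) + log_pdf
    log_p -= log_p.max(axis=1, keepdims=True)                    # log-sum-exp stabilization
    p = np.exp(log_p)
    return p / p.sum(axis=1, keepdims=True)


def fit_gmm(X, M, iters=50, seed=0, floor=1e-3):
    """EM for an M-component diagonal-covariance GMM over word vectors (the 'topics')."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    means = X[rng.choice(n, M, replace=False)]
    var = np.tile(X.var(axis=0) + floor, (M, 1))
    weights = np.full(M, 1.0 / M)
    for _ in range(iters):
        gamma = posteriors(X, weights, means, var)               # E-step
        nk = gamma.sum(axis=0) + 1e-10                           # M-step
        weights = nk / n
        means = gamma.T @ X / nk[:, None]
        var = np.maximum(gamma.T @ X ** 2 / nk[:, None] - means ** 2, floor)
    return weights, means, var


def embed(doc, vocab_index, E):
    """Map a document's in-vocabulary words to their K-dim vectors (OOV words are skipped)."""
    return np.array([E[vocab_index[w]] for w in doc if w in vocab_index])


def topic_histogram(W, weights, means, var):
    """Option (1): average topic posterior over the document's word vectors."""
    return posteriors(W, weights, means, var).mean(axis=0)


def fisher_vector(W, weights, means, var):
    """Option (2), simplified: normalized gradient of the log-likelihood w.r.t. the means."""
    gamma = posteriors(W, weights, means, var)                   # (T, M)
    diff = (W[:, None, :] - means[None]) / np.sqrt(var)          # (T, M, K)
    g = (gamma[:, :, None] * diff).sum(axis=0)                   # (M, K)
    g /= len(W) * np.sqrt(weights)[:, None]
    return g.ravel()                                             # length M * K


if __name__ == "__main__":
    docs = [["gene", "protein", "cell"], ["protein", "cell", "dna"],
            ["stock", "market", "price"], ["market", "price", "trade"]]
    vocab = sorted({w for d in docs for w in d})
    E = word_embeddings(tfidf_matrix(docs, vocab), K=2)
    model = fit_gmm(E, M=2)
    index = {w: i for i, w in enumerate(vocab)}
    W = embed(["gene", "dna", "cell"], index, E)
    print(topic_histogram(W, *model))        # two topic probabilities summing to 1
    print(fisher_vector(W, *model).shape)    # (4,): M * K entries
```

A full Fisher vector would typically also include gradients with respect to the mixture weights and variances, plus power and L2 normalization; they are left out here to keep the sketch short.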

  • 6 Assignments