Mining geographic knowledge using a location aware topic model
Abstract
Mining geographic knowledge using a location aware topic model is provided. A location system estimates topics and locations associated with documents based on a location aware topic (“LAT”) model. The location system generates the model from a collection of documents that are labeled with their associated locations. The location system generates collection level parameters based on an LDA-style model. To generate the collection level parameters, the location system estimates probabilities of latent topics, locations, and words of the collection. After the model is generated, the location system uses the collection level parameters to estimate probabilities of topics and locations being associated with target documents.
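The final step described above, using the collection level parameters to estimate a location for a target document, can be illustrated with a minimal sketch. It assumes the per-topic location probabilities P(location | topic) have already been learned from the labeled collection and that the topic mixture P(topic | document) for the target document has already been inferred; both are supplied here as hypothetical toy numbers. The function names are illustrative, not taken from the patent, and the "aggregation" over topics is assumed to be a plain sum.

```python
from typing import Dict, List


def location_posterior(
    topic_given_doc: List[float],
    location_given_topic: List[Dict[str, float]],
) -> Dict[str, float]:
    """Aggregate over topics: P(l | d) = sum_z P(l | z) * P(z | d).

    The sum is an assumed reading of the aggregation step.
    """
    scores: Dict[str, float] = {}
    for p_topic, p_locations in zip(topic_given_doc, location_given_topic):
        for location, p_location in p_locations.items():
            scores[location] = scores.get(location, 0.0) + p_location * p_topic
    return scores


def most_likely_location(scores: Dict[str, float]) -> str:
    """Select the location with the highest estimated probability."""
    return max(scores, key=scores.get)


# Hypothetical toy parameters: two latent topics, three candidate locations.
location_given_topic = [
    {"Seattle": 0.7, "Honolulu": 0.1, "Denver": 0.2},  # topic 0
    {"Seattle": 0.1, "Honolulu": 0.8, "Denver": 0.1},  # topic 1
]
# Hypothetical inferred topic mixture for the target document.
topic_given_doc = [0.25, 0.75]

scores = location_posterior(topic_given_doc, location_given_topic)
best = most_likely_location(scores)
print(best, scores)
```

With these toy numbers the document leans toward topic 1, so the location that topic 1 favors receives the highest score. In a real system the two inputs would come from the trained model rather than being written by hand.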
14 Claims
1. A method in a computing device for identifying a location associated with a first document, the method comprising:
providing a collection of documents of words, each document labeled with an associated location, the collection not including the first document;
generating by the computing device collection level parameters for a latent Dirichlet allocation style model for the collection of documents that is based on latent topics and the location of each document, the collection level parameters indicating a probability that a document in the collection relates to each latent topic, a probability that each word of the collection relates to each latent topic, and a probability that each location of the collection relates to each latent topic, wherein a variational expectation maximization algorithm is used to estimate the collection level parameters that are a maximization of a lower bound on the collection level parameters represented by a summation for each document in the collection of the log of the conditional probability of the document and its location given the collection level parameters;
for each location, estimating, using the collection level parameters, a probability that the location is associated with the first document based on an aggregation of, for each topic, the conditional probability of the location given the topic and the conditional probability of the topic given the document, the conditional probabilities being derived from the collection of documents in which each document is labeled with an associated location; and
selecting the location with the highest probability as the location associated with the first document.
Dependent claims: 2, 3, 4, 5, 6.
7. A computer-readable storage medium that is not a signal encoded with instructions for controlling a computing device to estimate topics and locations associated with target documents, by a method comprising:
providing a collection of documents of words, each word of a document associated with a location;
generating collection level parameters for a latent Dirichlet allocation style model for the collection of documents based on latent topics and the location of each document, the collection level parameters relating to probabilities of latent topics, locations, and words of the collection, the collection level parameters indicating a probability that a document in the collection relates to each latent topic, a probability that each word of the collection relates to each latent topic, and a probability that each location of the collection relates to each latent topic, wherein a variational expectation maximization algorithm is used to estimate the collection level parameters that are a maximization of a lower bound on the collection level parameters represented by a summation for each document in the collection of the log of the conditional probability of the document and its location given the collection level parameters; and
estimating using the collection level parameters probabilities of topics and locations being associated with each of the target documents; and
for each target document, selecting the location with the highest estimated probability as the location associated with that target document.
Dependent claims: 8, 9, 10, 11, 12.
13. A computing device for determining topics and locations associated with a target document, comprising:
a document store having a collection of documents of words, each word of a document associated with a location;
a memory storing computer-implemented instructions of a component that generates collection level parameters for a latent Dirichlet allocation model for the collection of documents based on latent topics and the location of each document, the collection level parameters relating to probabilities of latent topics, locations, and words of the collection including a probability that each location of the collection relates to each latent topic, the collection level parameters indicating a probability that a document in the collection relates to each latent topic, a probability that each word of the collection relates to each latent topic, and a probability that each location of the collection relates to each latent topic, wherein a variational expectation maximization algorithm is used to estimate the collection level parameters that are a maximization of a lower bound on the collection level parameters represented by a summation for each document in the collection of the log of the conditional probability of the document and its location given the collection level parameters; and
a component that estimates using the collection level parameters probabilities of topics and locations being associated with the target document; and
a component that selects the location with the highest estimated probability for a target document as the location associated with that target document; and
a processor for executing the computer-implemented instructions stored in memory.
Dependent claim: 14.
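The two quantitative steps recited in the independent claims can be written compactly as follows. The notation is an assumption introduced here for illustration and does not appear in the claims: D is the labeled collection, w_d and l_d are the words and location of document d, Θ stands for the collection level parameters, q for the variational distribution, K for the number of latent topics, and d* for the target document. The "aggregation" over topics is read as a sum, which is the natural interpretation but is not stated explicitly.

```latex
% Training: variational EM maximizes a lower bound on the collection log-likelihood.
\mathcal{L}(q, \Theta) \;\le\; \sum_{d \in D} \log p(\mathbf{w}_d, l_d \mid \Theta),
\qquad
\hat{\Theta} = \arg\max_{\Theta} \, \max_{q} \, \mathcal{L}(q, \Theta)

% Inference: score each location for the target document, then pick the best one.
P(l \mid d^{*}) = \sum_{z=1}^{K} P(l \mid z)\, P(z \mid d^{*}),
\qquad
\hat{l} = \arg\max_{l} \, P(l \mid d^{*})
```

Here P(l | z) comes from the collection level parameters, while P(z | d*) must be inferred for each target document.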
Specification