Systems and methods for extracting patterns from graph and unstructured data
First Claim
1. A computer-implemented system for discovering communities, the system comprising:
a computer-implemented topic modeling module for receiving documents as inputs, extracting topics in the documents, and constructing a first graph representing content similarities between the documents, said constructing comprising generating values of two or more parameters for topic modeling;
a computer-implemented community modeling module for uncovering relationships between entities associated with the documents and constructing a second graph representing the relationships between entities, said constructing comprising generating values of two or more community modeling parameters; and
a computer-implemented link modeling module for communicating with the topic-modeling module and the community modeling module, and predicting whether an edge in the first graph will be formed based on the uncovered relationship, predicting whether an edge in the second graph will be formed based on the extracted topics, and constructing an entity relationship graph based on the first graph and the second graph, wherein a community grouping for an entity is a variable hidden from a data collection process, said constructing said entity relationship graph comprising:
implementing a model that iterates until convergence;
estimating expected values of community hidden variables given current parameters for topic modeling and community modeling parameter;
updating values of said parameters for said topic modeling; and
updating values of said parameters for edge generation, using said model for said edge predicting, wherein resulting edges of said entity relationship graph associated with an entity friendship or partnering relationship are distinguished from edges of said entity relationship graph associated with topic similarity.
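The step of estimating expected values of community hidden variables can be illustrated with a minimal sketch. It assumes a plain mixture form in which each entity belongs to one of K communities; the claim does not state this form, and the names `e_step`, `priors`, and `likelihoods` are hypothetical. Given current parameters, the expected membership of an entity in a community is the prior for that community times the likelihood of the entity's data under it, normalized over all communities.

```python
def e_step(priors, likelihoods):
    """Return expected community memberships for each entity.

    priors[k]          -- current probability of community k (assumed parameter)
    likelihoods[i][k]  -- probability of entity i's data under community k,
                          computed from the current model parameters
    Assumes every row has at least one positive likelihood.
    """
    responsibilities = []
    for row in likelihoods:
        weighted = [p * l for p, l in zip(priors, row)]
        total = sum(weighted)
        responsibilities.append([w / total for w in weighted])
    return responsibilities


# Two communities, two entities (illustrative numbers only).
responsibilities = e_step([0.5, 0.5], [[0.9, 0.1], [0.2, 0.2]])
```

Each row of `responsibilities` sums to one, and these expected values would then feed the parameter updates for topic modeling and edge generation.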
Abstract
A computing system receives input data having both graph and unstructured data and computes a current log likelihood of the input data. The computing system compares the current log likelihood with a previous log likelihood of the input data. If the current log likelihood is larger than the previous log likelihood, the computing system updates topic modeling parameters, community modeling parameters, and the link generation parameter until the computing system obtains a maximal value of the log likelihood of the input data. Then, the computing system creates a graph indicating topic similarity between the input data based on the topic modeling parameters, creates another graph indicating community similarity between entities associated with the input data based on the community modeling parameters, and predicts a link existence between input data or entities based on the link generation parameter, the topic modeling parameter and the community modeling parameter.
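The stopping rule described in the abstract can be sketched as a generic loop: keep updating parameters while the log likelihood increases, and stop once it no longer does. This is only an illustration under stated assumptions. `fit_while_improving`, `log_likelihood`, `update`, and the `max_iter` guard are hypothetical names, and the toy objective below stands in for the joint likelihood of the text and graph data, which is not reproduced here.

```python
import math


def fit_while_improving(params, log_likelihood, update, max_iter=1000):
    """Update params for as long as the log likelihood keeps increasing.

    log_likelihood(params) -> float   (hypothetical model likelihood)
    update(params) -> new params      (hypothetical parameter update)
    max_iter is a safety guard added for this sketch.
    """
    previous = -math.inf
    current = log_likelihood(params)
    iterations = 0
    while current > previous and iterations < max_iter:
        params = update(params)
        previous, current = current, log_likelihood(params)
        iterations += 1
    return params, current, iterations


# Toy stand-in: a concave "log likelihood" with its maximum at 3.0,
# and an update that moves the parameter halfway toward that maximum.
toy_log_likelihood = lambda x: -(x - 3.0) ** 2
toy_update = lambda x: x + 0.5 * (3.0 - x)

fitted, final_ll, n_iter = fit_while_improving(0.0, toy_log_likelihood, toy_update)
```

In the described system, the parameters returned by such a loop would be the ones used to build the topic similarity graph and the community similarity graph and to predict links.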
10 Citations
14 Claims
2. A computer system for discovering a relationship between entities comprising:
a memory;
a processor in communications with the memory, wherein the computer system is configured for performing a method comprising:
receiving input data W representing a word vector matrix and input data G representing a link graph matrix having values that capture relationships amongst user or business entities;
computing a current log likelihood of the input data W and G, the current likelihood of the input data being a probability distribution function of topic modeling parameters and community modeling parameters, the parameters representing topic similarity between unstructured texts of the entities and the community modeling parameters represent community similarity between the entities;
comparing the current log likelihood of the input data and a previous log likelihood of the input data computed previously;
updating values of the parameters, if the current log likelihood is larger than the previous log likelihood;
repeating the comparing and the updating until the current log likelihood becomes less than or equal to the previous log likelihood; and
constructing at least one graph based on the updated values of the parameters, if the current log likelihood is less than or equal to the previous log likelihood, the at least one graph indicating the relationship between the entities.
View Dependent Claims (3, 4, 5, 6, 7, 8, 9)
10. A computer readable medium embodying computer program instructions being executed by a processor for causing a computer to perform method steps for clustering entities, said medium not a propagating signal, said method steps comprising:
receiving documents as inputs, extracting topics in the documents, and constructing a first graph representing content similarities between the documents, said constructing comprising generating values of two or more parameters for topic modeling;
uncovering relationships between entities associated with the documents and constructing a second graph representing the relationships between entities, said constructing comprising generating values of two or more community modeling parameters; and
predicting whether an edge in the first graph will be formed based on the uncovered relationship, and predicting whether an edge in the second graph will be formed based on the extracted topics, and constructing an entity relationship graph based on the first graph and the second graph, wherein a community grouping for an entity is a variable hidden from a data collection process, said constructing said entity relationship graph comprising:
implementing a model that iterates until convergence;
estimating expected values of community hidden variables given current parameters for topic modeling and community modeling parameter;
updating values of said parameters for said topic modeling; and
updating values of said parameters for edge generation, using said model for said edge predicting, wherein resulting edges of said entity relationship graph associated with an entity friendship or partnering relationship are distinguished from edges of said entity relationship graph associated with topic similarity.
11. A computer program product comprising computer usable medium having computer readable program code means embodied therein for causing functions to cluster entities, said medium not a propagating signal, the computer program code means in said computer program product for causing a computer to perform a method comprising:
receiving input data W representing a word vector matrix and input data G representing a link graph matrix having values that capture relationships amongst user or business entities;
computing a current log likelihood of the input data W and G, the current likelihood of the input data being a probability distribution function of topic modeling parameters and community modeling parameters, the parameters representing topic similarity between unstructured texts of the entities and the community modeling parameters represent community similarity between the entities;
comparing the current log likelihood of the input data and a previous log likelihood of the input data computed previously;
updating values of the parameters, if the current log likelihood is larger than the previous log likelihood;
repeating the comparing and the updating until the current log likelihood becomes less than or equal to the previous log likelihood; and
constructing at least one graph based on the updated values of the parameters, if the current log likelihood is less than or equal to the previous log likelihood, the at least one graph indicating the relationship between the entities.
View Dependent Claims (12, 13, 14)
Specification