Holistic disambiguation for entity name spotting
Abstract
A method resolves ambiguous spotted entity names in a data corpus by determining an activation level value for each of a plurality of nodes corresponding to a single ambiguous entity name. The activation levels for each of the nodes may be modified by inputting outside domain knowledge corresponding to the nodes to increase the activation value of the nodes, spotting entity names corresponding to the nodes to increase the activation value of the nodes, searching the data corpus to spot newly posted entity names to increase the activation value of the nodes, and searching the data corpus to reduce or deactivate the activation value of the nodes by eliminating false positives. The ambiguous entity name is assigned to the node determined to have the highest activation level and is then outputted to a user.
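As a rough illustration of the final selection step described in the abstract, the sketch below picks, among the nodes an ambiguous name could refer to, the one with the highest activation level. The function name `resolve`, the node labels, and the activation values are hypothetical and do not come from the patent. Breaking ties by candidate order is likewise an assumption.

```python
def resolve(candidates, activation):
    """Return the candidate node with the highest activation level.

    `candidates` lists the node labels the ambiguous name could match;
    `activation` maps node label -> current activation level.
    Both structures are illustrative assumptions, not the patent's data model.
    """
    if not candidates:
        raise ValueError("an ambiguous name needs at least one candidate node")
    # max() returns the first maximal element, so ties go to the earlier candidate.
    return max(candidates, key=lambda node: activation[node])


# Hypothetical example: the spotted name "Mercury" matches two nodes.
activation = {"Mercury (planet)": 0.2, "Mercury (element)": 0.7}
chosen = resolve(["Mercury (planet)", "Mercury (element)"], activation)
print(chosen)
```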
12 Claims
1. A computer implemented method for reducing ambiguities in entity name spotting, comprising:
performing an entity name spotting process in a data corpus;
identifying, based on said entity name spotting process, an ambiguous entity name, said ambiguous entity name comprising an entity name corresponding to at least two categorical nodes in a predefined domain of an activation network, said activation network comprising a plurality of predefined categorical nodes, each predefined categorical node having an initial activation level, and each said predefined categorical node having edges established between said predefined categorical nodes, said edges indicating a relationship between said predefined categorical nodes, said relationship between said predefined categorical nodes defining a direction of influence between said predefined categorical nodes, said predefined domain representing at least one predefined category of items of interest within said data corpus;
determining an updated activation level for each of said categorical nodes in said predefined domain, said updated activation level for each said predefined categorical node in said activation network being based on every edge between said predefined categorical nodes, each said updated activation level for each said predefined categorical node in said activation network being updated based on said relationship between said predefined categorical nodes, said updated activation level being based at least on information in said data corpus and context of said information, wherein for each categorical node, said determining said updated activation level comprising;
receiving metadata related to entity names in said data corpus to modify said activation level of said categorical nodes in said predefined domain;
analyzing said metadata to determine related categorical nodes to modify said activation level of said categorical nodes in said predefined domain;
analyzing semantic information of user comments to modify said activation level of said categorical nodes in said predefined domain; and
analyzing an ontology customized for said activation network to modify said activation level of said categorical nodes in said predefined domain, said entity names comprising one of proper names, credentials and identifications;
selecting a most activated categorical node of said categorical nodes in said predefined domain, said most activated categorical node having a highest updated activation level of each categorical node;
assigning said ambiguous entity name to said most activated categorical node; and
outputting said most activated categorical node to a user to replace said ambiguous entity name.
Dependent claims: 2, 3, 4
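Claim 1 describes nodes with initial activation levels joined by edges that define a direction of influence, but it does not give an update formula. The sketch below is one possible reading: a single synchronous pass of additive, weighted spreading along directed edges. The class name `ActivationNetwork`, the edge weights, and the node labels are all hypothetical.

```python
from collections import defaultdict


class ActivationNetwork:
    """Toy directed activation network (an assumed structure, not the patented one)."""

    def __init__(self):
        self.activation = {}                # node label -> activation level
        self.out_edges = defaultdict(list)  # source -> [(target, weight), ...]

    def add_node(self, name, initial=0.0):
        self.activation[name] = initial

    def add_edge(self, source, target, weight=0.5):
        # The edge direction encodes which node influences which.
        if source not in self.activation or target not in self.activation:
            raise KeyError("both endpoints must be added as nodes first")
        self.out_edges[source].append((target, weight))

    def propagate(self):
        """One synchronous pass: each source adds weight * its level to each target."""
        incoming = defaultdict(float)
        for source, edges in self.out_edges.items():
            for target, weight in edges:
                incoming[target] += weight * self.activation[source]
        # Apply all increments after the pass so update order does not matter.
        for node, delta in incoming.items():
            self.activation[node] += delta


# Hypothetical domain: context evidence for "chemistry" influences one reading only.
net = ActivationNetwork()
net.add_node("chemistry", initial=1.0)
net.add_node("Mercury (element)")
net.add_node("Mercury (planet)")
net.add_edge("chemistry", "Mercury (element)", weight=0.5)
net.propagate()
print(net.activation)
```

Computing all increments before applying them keeps the result independent of the order in which edges are visited, which is a design choice of this sketch rather than something the claim requires.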
5. A computer implemented method for reducing ambiguities in entity name spotting, comprising:
creating one node for each entity name of interest of a data corpus comprising a plurality of entity names to create nodes in an activation network, each of said nodes having an entity name and an initial activation level;
establishing edges between said nodes, said edges indicating a relationship between said nodes, said relationship between said nodes defining a direction of influence between said nodes;
determining an updated activation level for each of said nodes, said updated activation level being based at least on information in said data corpus;
inputting outside domain knowledge to change said updated activation level of said nodes;
searching a user forum to spot existing entity names;
modifying said updated activation level on nodes corresponding to said existing entity names spotted in said user forum during said searching by utilizing semantic information comprising one of a domain specific forum and a domain specific blog, said updated activation level for each node in said activation network being modified based on said relationship between said nodes and context of said information;
searching said user forum for newly posted entity names;
modifying said updated activation level on nodes corresponding to said newly posted entity names by utilizing semantic information comprising one of a domain specific forum and a domain specific blog;
identifying an ambiguous entity name from said data corpus, said ambiguous entity name corresponding to a plurality of potentially matching nodes;
selecting a most activated node from said plurality of potentially matching nodes, said most activated node having a highest updated activation level;
assigning said ambiguous entity name to said most activated node;
once all ambiguous entity names have been assigned, searching said data corpus for a next newly posted entity name; and
outputting said most activated node to a user to replace said ambiguous entity name, each of said plurality of entity names comprising one of proper names, credentials and identifications.
Dependent claims: 6, 7, 8
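The forum-searching steps of claim 5 can be pictured with a small sketch: spot known entity names in forum posts, then raise the activation of the matching nodes. The claim does not say how spotting or boosting is done, so the whole-word, case-insensitive matching and the fixed `step` increment are assumptions. The names `spot_entity_names` and `boost` and the sample posts are hypothetical.

```python
import re
from collections import Counter


def spot_entity_names(posts, known_names):
    """Count whole-word, case-insensitive mentions of each known name in the posts."""
    counts = Counter()
    for name in known_names:
        pattern = re.compile(r"\b" + re.escape(name) + r"\b", re.IGNORECASE)
        for post in posts:
            counts[name] += len(pattern.findall(post))
    return counts


def boost(activation, counts, step=0.1):
    """Return new activation levels raised by `step` per spotted mention (assumed rule)."""
    return {name: level + step * counts.get(name, 0)
            for name, level in activation.items()}


# Hypothetical forum posts and entity names.
posts = ["Just tried Aurora and loved it", "aurora beats Nimbus for me"]
counts = spot_entity_names(posts, ["Aurora", "Nimbus"])
levels = boost({"Aurora": 0.0, "Nimbus": 0.0}, counts, step=0.5)
print(counts, levels)
```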
9. A computer implemented method for reducing entity name spotting ambiguities in an activation network, said method comprising:
providing an activation network comprising nodes representing predefined categories of items of interest within a domain, edges being established between said nodes, said edges indicating a relationship between said nodes, said relationship between said nodes defining a direction of influence between said nodes based on sources of information, each of said nodes comprising an initial activation level;
determining an updated activation level for each of said nodes based at least on information in said domain, and each said updated activation level for each said node in said activation network being updated based on said relationship between said nodes and context of said information;
gathering domain specific text from a plurality of wide area network accessible sources to provide a data corpus comprising a plurality of entity names;
searching said data corpus to spot known entity names corresponding to at least one node;
modifying said updated activation level of said nodes based on said known entity names spotted during said searching, said modifying comprising one of reduction and deactivation of said updated activation level by eliminating false positives using one of stop words and heuristics;
identifying an ambiguous entity name corresponding to potentially matching nodes;
selecting a most activated node of said potentially matching nodes, said most activated node having a highest updated activation level of each potentially matching node;
assigning said ambiguous entity name to said most activated node; and
outputting said most activated node to a user to replace said ambiguous entity name, each of said plurality of entity names comprising one of proper names, credentials and identifications.
Dependent claims: 10, 11, 12
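Claim 9 recites reducing or deactivating activation levels to eliminate false positives using stop words or heuristics. The sketch below shows one way this might look for the stop-word case: a node whose name coincides with a common word is either zeroed or scaled down. The function name `suppress_false_positives`, the `reduction` parameter, and the sample data are hypothetical, and the matching rule (lowercased exact match) is an assumption.

```python
def suppress_false_positives(activation, stop_words, reduction=None):
    """Lower the activation of nodes whose names are stop words.

    With reduction=None the node is deactivated (level set to 0.0);
    otherwise its level is multiplied by `reduction` (expected in [0, 1]).
    """
    updated = {}
    for name, level in activation.items():
        if name.lower() in stop_words:
            updated[name] = 0.0 if reduction is None else level * reduction
        else:
            updated[name] = level
    return updated


# Hypothetical case: "Today" is an entity name but also an everyday word.
activation = {"Today": 2.0, "Aurora": 1.5}
deactivated = suppress_false_positives(activation, {"today"})
reduced = suppress_false_positives(activation, {"today"}, reduction=0.5)
print(deactivated, reduced)
```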
Specification