HOLISTIC DISAMBIGUATION FOR ENTITY NAME SPOTTING
Abstract
A method resolves ambiguous spotted entity names in a data corpus by determining an activation level value for each of a plurality of nodes corresponding to a single ambiguous entity name. The activation levels for each of the nodes may be modified by inputting outside domain knowledge corresponding to the nodes to increase the activation value of the nodes, spotting entity names corresponding to the nodes to increase the activation value of the nodes, searching the data corpus to spot newly posted entity names to increase the activation value of the nodes, and searching the data corpus to reduce or deactivate the activation value of the nodes by eliminating false positives. The ambiguous entity name is assigned to the node determined to have the highest activation level and is then outputted to a user.
20 Claims
1. A computer implemented method for reducing ambiguities in entity name spotting, comprising:
performing an entity name spotting process in a data corpus;
identifying, based on said entity name spotting process, an ambiguous entity name, said ambiguous entity name comprising an entity name corresponding to at least two potentially matching categorical nodes of an activation network, said activation network comprising a plurality of predefined categorical nodes;
determining an activation level for each of said potentially matching categorical nodes;
selecting a most activated categorical node of said potentially matching categorical nodes, said most activated categorical node having a highest activation level of each potentially matching categorical node; and
outputting said most activated categorical node to a user to replace said ambiguous entity name.
Dependent claims: 2, 3, 4, 5

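The selecting step of claim 1 can be illustrated with a minimal sketch. The node labels, activation values, and the function name `select_most_activated` are hypothetical and not taken from the patent. The sketch assumes activation levels are plain numbers stored per node, and it resolves ties in favor of the first candidate listed, a case the claim does not address.

```python
from typing import Dict, Iterable


def select_most_activated(candidates: Iterable[str],
                          activation: Dict[str, float]) -> str:
    """Return the candidate node with the highest activation level.

    Nodes missing from `activation` are treated as having level 0.0
    (an assumption of this sketch).
    """
    nodes = list(candidates)
    if len(nodes) < 2:
        raise ValueError("an ambiguous name needs at least two candidate nodes")
    # max() returns the first maximal element, so ties go to the earlier node.
    return max(nodes, key=lambda node: activation.get(node, 0.0))


# Hypothetical activation levels for two nodes that share the name "Jaguar".
activation = {"Jaguar (car maker)": 3.0, "Jaguar (animal)": 0.5}
candidates = ["Jaguar (car maker)", "Jaguar (animal)"]

chosen = select_most_activated(candidates, activation)
print(chosen)
```

Here the ambiguous name would be replaced by the car-maker node because its activation level is higher.
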
6. A computer implemented method for reducing ambiguities in entity name spotting, comprising:
creating one node for each entity name of interest of a data corpus comprising a plurality of entity names, each of said nodes having an activation level;
inputting outside domain knowledge to change activation levels of said nodes;
searching a user forum to spot entity names;
modifying said activation levels on nodes corresponding to entity names spotted in said forum during said searching;
searching said user forum for newly posted entity names;
additionally modifying activation levels on nodes corresponding to said newly posted entity names;
identifying ambiguous entity names from said data corpus, said ambiguous entity names each corresponding to a plurality of potentially matching nodes;
selecting a most activated node from said potentially matching nodes, said most active node having a highest activation level;
assigning said ambiguous entity name to said most activated node;
once all ambiguous entity names have been assigned, continue searching said data corpus for next newly posted entity names; and
outputting said most activated categorical node to a user to replace said ambiguous entity name, wherein said entity name comprises one of proper names, credentials and identifications.
Dependent claims: 7, 8, 9, 10

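The sequence recited in claim 6 (create nodes, apply outside domain knowledge, spot names in forum posts, then assign ambiguous names) can be sketched as follows. Everything here is an illustrative assumption: the class `ActivationNetwork`, its method names, the case-insensitive substring matching, the fixed boost of 1.0, and the sample posts are not described in the claim.

```python
from collections import defaultdict


class ActivationNetwork:
    """Toy activation network: node id -> level, surface name -> node ids."""

    def __init__(self):
        self.activation = {}
        self.names = defaultdict(set)

    def add_node(self, node_id, surface_names, level=0.0):
        self.activation[node_id] = level
        for name in surface_names:
            self.names[name.lower()].add(node_id)

    def adjust(self, node_id, delta):
        """Change a node's activation, e.g. from outside domain knowledge."""
        self.activation[node_id] += delta

    def spot(self, text, boost=1.0):
        """Boost nodes whose unambiguous names occur in `text`.

        Returns the ambiguous names seen, for later resolution.
        Naive substring matching is an assumption of this sketch.
        """
        lowered = text.lower()
        ambiguous = []
        for name, nodes in self.names.items():
            if name in lowered:
                if len(nodes) == 1:
                    self.adjust(next(iter(nodes)), boost)
                else:
                    ambiguous.append(name)
        return ambiguous

    def resolve(self, name):
        """Assign an ambiguous name to its most activated node."""
        nodes = self.names.get(name.lower())
        if not nodes:
            raise KeyError(name)
        # sorted() makes tie-breaking deterministic (alphabetical).
        return max(sorted(nodes), key=lambda n: self.activation[n])


net = ActivationNetwork()
net.add_node("jaguar_car", ["jaguar", "xj6"])
net.add_node("jaguar_animal", ["jaguar", "panthera onca"])
net.adjust("jaguar_car", 0.5)  # outside knowledge: the forum is about cars

existing_posts = ["My XJ6 needs new brakes."]
new_posts = ["Is the Jaguar reliable in winter?"]

pending = []
for post in existing_posts + new_posts:
    pending.extend(net.spot(post))

resolved = {name: net.resolve(name) for name in pending}
print(resolved)
```

The unambiguous mention of "XJ6" raises the car node's activation, so the later ambiguous mention of "Jaguar" is assigned to that node.
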
11. A computer implemented method for reducing entity name spotting ambiguities in an activation network comprising a plurality of nodes representing predefined categories of items of interest within a domain, each of said nodes comprising an activation level, said method comprising:
gathering domain specific text from a plurality of wide area network accessible sources to provide a data corpus, said data corpus comprising a plurality of entity names;
searching said data corpus to spot entity names, said entity names corresponding to at least one node of said plurality of nodes;
modifying said activation level of said nodes based on entity names spotted during said searching;
identifying an ambiguous entity name, said ambiguous entity name corresponding to a plurality of potentially matching nodes;
selecting a most activated node of said potentially matching nodes, said most activated node having a highest activation level of each potentially matching node; and
outputting said most activated categorical node to a user to replace said ambiguous entity name, wherein said entity name comprises one of proper names, credentials and identifications.
Dependent claims: 12, 13, 14, 15

16. A computer implemented method for reducing ambiguities in entity name spotting within an activation network, comprising:
creating one node for each entity name of interest of a data corpus comprising a plurality of entity names, each of said nodes having an activation level;
modifying a value of said activation level of said nodes by one of;
inputting outside domain knowledge corresponding to said nodes to increase said activation value of said nodes;
searching said data corpus to spot entity names corresponding to said nodes to increase said activation value of said nodes; and
searching said data corpus to spot newly posted entity names to increase said activation value of said nodes; and
searching said data corpus to one of reduce and deactivate said activation value of said nodes by eliminating false positives;
identifying ambiguous entity names from said data corpus, said ambiguous entity names each corresponding to a plurality of potentially matching nodes;
assigning said ambiguous entity name to said most activated node; and
outputting said most activated categorical node to a user to replace said ambiguous entity name, wherein said entity name comprises one of proper names, credentials and identifications.
Dependent claims: 17, 18, 19, 20

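Claim 16 adds a step that reduces or deactivates a node's activation when false positives are eliminated. The claim does not say how the reduction is computed, so the following is only one possible reading: a fixed penalty is subtracted, and the node is deactivated (set to 0.0) once its level falls to or below a cutoff. The function name `penalize_false_positive`, the `penalty` and `cutoff` parameters, and the sample values are all hypothetical.

```python
from typing import Dict


def penalize_false_positive(activation: Dict[str, float], node: str,
                            penalty: float = 1.0,
                            cutoff: float = 0.0) -> float:
    """Lower a node's activation after a false-positive match.

    If the reduced level is at or below `cutoff`, the node is
    deactivated by setting its level to 0.0 (an assumed convention).
    """
    level = activation.get(node, 0.0) - penalty
    activation[node] = level if level > cutoff else 0.0
    return activation[node]


# Hypothetical activation levels before false positives are eliminated.
levels = {"jaguar_car": 2.5, "jaguar_animal": 0.5}

penalize_false_positive(levels, "jaguar_car")     # reduced: 2.5 -> 1.5
penalize_false_positive(levels, "jaguar_animal")  # deactivated: 0.5 -> 0.0
print(levels)
```

After these adjustments, a later assignment step would still pick the node with the highest remaining activation level.
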
Specification