METHOD FOR DISAMBIGUATED FEATURES IN UNSTRUCTURED TEXT
First Claim
1. A method comprising:
searching, by a node of a system hosting an in-memory database, a set of candidate records to identify one or more candidates matching one or more extracted features, wherein an extracted feature that matches a candidate is a primary feature;
associating, by the node, each of the extracted features with one or more machine-generated topic identifiers (“topic IDs”);
disambiguating, by the node, each of the primary features from one another based on relatedness of topic IDs;
identifying, by the node, a set of secondary features associated with each primary feature based upon the relatedness of topic IDs;
disambiguating, by the node, each of the primary features from each of the secondary features in the associated set of secondary features based on relatedness of topic IDs;
linking, by the node, each primary feature to the associated set of secondary features to form a new cluster;
determining, by the node, whether the new cluster matches an existing knowledgebase cluster, wherein, when there is a match, determining, by the disambiguation module of the in-memory database server computer, an existing unique identifier (“unique ID”) corresponding to each matching primary feature in the knowledgebase cluster and updating the knowledgebase cluster to include the new cluster; and
when there is no match, creating, by the node, a new knowledgebase cluster and assigning a new unique ID to the primary feature of the new knowledgebase cluster; and
transmitting, by the node, one of the existing unique ID and the new unique ID for the primary feature.
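The steps recited in claim 1 can be pictured with a minimal Python sketch. It is an illustration under stated assumptions, not the patented implementation: the names `Cluster`, `KnowledgeBase`, `build_clusters`, and `jaccard` are hypothetical, the claim does not specify how "relatedness of topic IDs" is measured (Jaccard overlap is assumed here), and the thresholds are arbitrary.

```python
from dataclasses import dataclass
from itertools import count


@dataclass
class Cluster:
    primary: str          # feature that matched a candidate record
    secondary: frozenset  # related features linked to the primary
    topic_ids: frozenset  # topic IDs of the primary and its secondaries


def jaccard(a, b):
    """Assumed relatedness measure between two sets of topic IDs."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def build_clusters(features, candidates, min_relatedness=0.3):
    """features maps each extracted feature to its set of topic IDs.

    A feature found in `candidates` is treated as a primary feature; any
    other extracted feature with sufficiently related topic IDs becomes
    one of its secondary features.
    """
    clusters = []
    for name, topics in features.items():
        if name not in candidates:
            continue
        secondary = frozenset(
            other for other, other_topics in features.items()
            if other != name and jaccard(topics, other_topics) >= min_relatedness
        )
        all_topics = set(topics)
        for other in secondary:
            all_topics |= features[other]
        clusters.append(Cluster(name, secondary, frozenset(all_topics)))
    return clusters


class KnowledgeBase:
    def __init__(self, threshold=0.5):
        self.clusters = {}        # unique ID -> Cluster
        self.threshold = threshold
        self._ids = count(1)

    def resolve(self, new):
        """Return the existing unique ID on a match, else assign a new one."""
        best_id, best_score = None, 0.0
        for uid, existing in self.clusters.items():
            if existing.primary != new.primary:
                continue
            score = jaccard(existing.topic_ids, new.topic_ids)
            if score > best_score:
                best_id, best_score = uid, score
        if best_id is not None and best_score >= self.threshold:
            old = self.clusters[best_id]
            # Update the knowledge base cluster to include the new cluster.
            self.clusters[best_id] = Cluster(
                old.primary,
                old.secondary | new.secondary,
                old.topic_ids | new.topic_ids,
            )
            return best_id
        uid = f"E{next(self._ids)}"
        self.clusters[uid] = new
        return uid
```

In this sketch, two documents that both mention the same surface form resolve to the same unique ID only when their clusters share enough topic IDs; otherwise the second mention receives a new ID.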
Abstract
A method for disambiguating features in unstructured text is provided. The disclosed method may not require pre-existing links to be present. The method may use co-occurring features derived from both the source document and a large document corpus. The disclosed method may include multiple modules, including a linking module that links the features derived from the source document to the co-occurring features of an existing knowledge base. The method may allow unique entities to be identified from a knowledge base in which each entity has a unique set of co-occurring features. This in turn may increase precision in knowledge discovery and search results by applying advanced analytical methods over a massive corpus, using a combination of entities, co-occurring entities, topic IDs, and other derived features.
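The role of co-occurring features described in the abstract can be illustrated with a short sketch. It assumes a simple scoring rule that the abstract does not specify: each known entity has a profile of features counted over a corpus, and a mention is assigned to the entity whose profile carries the largest share of the features seen in the source document. The function names and the sample entity IDs used below are hypothetical.

```python
from collections import Counter, defaultdict


def cooccurrence_profiles(corpus):
    """Build entity ID -> Counter of features that co-occur with it.

    `corpus` is an iterable of (entity_id, features_in_document) pairs.
    """
    profiles = defaultdict(Counter)
    for entity_id, features in corpus:
        profiles[entity_id].update(set(features))  # count each feature once per document
    return profiles


def disambiguate(doc_features, candidate_ids, profiles):
    """Pick the candidate whose profile best covers the document's features."""
    def score(entity_id):
        profile = profiles[entity_id]
        total = sum(profile.values())
        if total == 0:
            return 0.0
        # Fraction of the profile's weight accounted for by features in the document.
        return sum(profile[f] for f in doc_features) / total
    return max(candidate_ids, key=score)
```

For example, with profiles built for two entities that share the same name, a document mentioning "NBA" and "Chicago" would be assigned to the entity whose corpus profile contains those features rather than to the one associated with "Berkeley" and "statistics".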
20 Claims
15. A non-transitory computer readable medium having stored thereon computer executable instructions comprising:
searching, by a node of a system hosting an in-memory database, a set of candidate records to identify one or more candidates matching one or more extracted features, wherein an extracted feature that matches a candidate is a primary feature;
associating, by the node, each of the extracted features with one or more machine-generated topic identifiers (“topic IDs”);
disambiguating, by the node, each of the primary features from one another based on relatedness of topic IDs;
identifying, by the node, a set of secondary features associated with each primary feature based upon the relatedness of topic IDs;
disambiguating, by the node, each of the primary features from each of the secondary features in the associated set of secondary features based on relatedness of topic IDs;
linking, by the node, each primary feature to the associated set of secondary features to form a new cluster;
determining, by the node, whether the new cluster matches an existing knowledgebase cluster, wherein, when there is a match, determining, by the node, an existing unique identifier (“unique ID”) corresponding to each matching primary feature in the knowledgebase cluster and updating the knowledgebase cluster to include the new cluster; and
when there is no match, creating a new knowledgebase cluster and assigning a new unique ID to the primary feature of the new knowledgebase cluster; and
transmitting, by the node, one of the existing unique ID and the new unique ID for the primary feature.
Specification