Method for disambiguated features in unstructured text
Abstract
A method for disambiguating features in unstructured text is provided. The disclosed method may operate without pre-existing links. It may use co-occurring features derived from both the source document and a large document corpus. The method may include multiple modules, including a linking module that links features derived from the source document to the co-occurring features of an existing knowledge base. Because each entity in the knowledge base has a unique set of co-occurring features, the method may identify unique entities from that knowledge base. This in turn may increase precision in knowledge discovery and search results by applying advanced analytical methods over a massive corpus and by using a combination of entities, co-occurring entities, topic IDs, and other derived features.
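The abstract does not say how the linking module scores a match between a source document and the knowledge base. The following is a minimal Python sketch under an assumed scoring rule: a mention is linked to the same-named entity whose co-occurring features overlap most with the features derived from the source document. `KnowledgeBaseEntity`, `link_mention`, the overlap-count score, and the example data are all hypothetical and are not taken from the patent.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class KnowledgeBaseEntity:
    # Hypothetical knowledge base record: one disambiguated entity plus the
    # features observed to co-occur with it across the corpus.
    unique_id: str
    name: str
    co_occurring: set = field(default_factory=set)


def link_mention(name: str, doc_features: set, kb: list) -> Optional[str]:
    """Link a mention to the knowledge base entity whose co-occurring
    features overlap most with features derived from the source document.

    Assumption: the score is a plain overlap count; ties keep the first
    candidate, and zero overlap means no link (None).
    """
    best_id, best_overlap = None, 0
    for entity in kb:
        if entity.name != name:
            continue  # only entities sharing the surface form are candidates
        overlap = len(entity.co_occurring & doc_features)
        if overlap > best_overlap:
            best_id, best_overlap = entity.unique_id, overlap
    return best_id


kb = [
    KnowledgeBaseEntity("E1", "Paris", {"France", "Seine", "Louvre"}),
    KnowledgeBaseEntity("E2", "Paris", {"Texas", "Lamar County"}),
]
print(link_mention("Paris", {"Louvre", "Seine", "museum"}, kb))  # E1
print(link_mention("Paris", {"Texas", "rodeo"}, kb))  # E2
```

Two entities share the name "Paris", and they are told apart only by which co-occurring features appear alongside the mention. This mirrors the abstract's point that a unique set of co-occurring features makes each entity identifiable. A production system would more likely weight features, for example by corpus frequency, than count them equally.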
20 Claims
1. A method comprising:
in response to receiving a search query from an end user device:
searching, by a node of a system, a set of candidate records including co-occurring features to identify one or more candidate records matching one or more extracted features, wherein an extracted feature that matches a candidate record is a primary feature, wherein the node comprises a main memory hosting an in-memory database, and wherein the in-memory database stores a knowledge base of clusters, each cluster comprising a disambiguated primary feature with a unique identifier (“unique ID”) and a set of associated secondary features;
associating, by the node, each of the extracted features with one or more machine-generated topic identifiers (“topic IDs”);
disambiguating, by the node, each of the primary features from one another based on relatedness of topic IDs;
identifying, by the node, a set of secondary features associated with each primary feature based upon the relatedness of topic IDs;
disambiguating, by the node, each of the primary features from each of the secondary features in the associated set of secondary features based on relatedness of topic IDs;
linking, by the node, in real time, as data is retrieved from the knowledge base in the in-memory database, each primary feature to the associated set of secondary features to form a new cluster;
determining, by a disambiguation module of the in-memory database of the node, whether the new cluster matches an existing knowledge base cluster by assigning relative matching scores to existing knowledge base clusters with disambiguated primary features, wherein, when there is a match, determining an existing unique ID corresponding to each matching primary feature in the existing knowledge base cluster and updating the existing knowledge base cluster to include the new cluster;
when there is no match, creating a new knowledge base cluster and assigning a new unique ID to the primary feature of the new knowledge base cluster; and
transmitting one of the existing unique ID and the new unique ID for the primary feature to the end user device.
Dependent claims: 2 through 14.
15. A non-transitory computer readable medium having stored thereon computer executable instructions that, when executed by a processor, perform functions comprising:
in response to receiving a search query from an end user device:
searching, by a node of a system, a set of candidate records including co-occurring features to identify one or more candidate records matching one or more extracted features, wherein the node comprises a main memory hosting an in-memory database, and wherein the in-memory database stores a knowledge base of clusters, each cluster comprising a disambiguated primary feature with a unique identifier (“unique ID”) and a set of associated secondary features;
associating, by the node, each of the extracted features with one or more machine-generated topic identifiers (“topic IDs”);
disambiguating, by the node, each of the primary features from one another based on relatedness of topic IDs;
identifying, by the node, a set of secondary features associated with each primary feature based upon the relatedness of topic IDs;
disambiguating, by the node, each of the primary features from each of the secondary features in the associated set of secondary features based on relatedness of topic IDs;
linking, by the node, in real time, as data is retrieved from the knowledge base in the in-memory database, each primary feature to the associated set of secondary features to form a new cluster;
determining, by a disambiguation module of the in-memory database of the node, whether the new cluster matches an existing knowledge base cluster by assigning relative matching scores to existing knowledge base clusters with disambiguated primary features, wherein, when there is a match, determining an existing unique ID corresponding to each matching primary feature in the existing knowledge base cluster and updating the existing knowledge base cluster to include the new cluster;
when there is no match, creating a new knowledge base cluster and assigning a new unique ID to the primary feature of the new knowledge base cluster; and
transmitting one of the existing unique ID and the new unique ID for the primary feature to the end user device.
Dependent claims: 16 through 20.
Specification