Methods and systems for mapping data items to sparse distributed representations
First Claim
1. A method performed by at least one computer processor of each of a plurality of computing devices executing computer program instructions stored on at least one non-transitory computer-readable medium, wherein the computer program instructions are executable by the at least one computer processor to perform a method for enhancing a computing network including a full-text search system through enhancement of queries based upon determining similarities between data items mapped to sparse distributed representations, the method comprising:
clustering in a two-dimensional metric space, by a reference map generator, executing on a first computing device, a set of data documents selected according to at least one criterion, generating a semantic map;
associating, by the semantic map, a coordinate pair with each of the set of data documents;
generating, by a parser executing on the first computing device, an enumeration of data items occurring in the set of data documents;
determining, by a representation generator executing on the first computing device, for each data item in the enumeration, occurrence information including:
(i) a number of data documents in which the data item occurs, (ii) a number of occurrences of the data item in each data document, and (iii) the coordinate pair associated with each data document in which the data item occurs;
generating, by the representation generator, a distributed representation using the occurrence information;
receiving, by a sparsifying module executing on the first computing device, an identification of a maximum level of sparsity;
reducing, by the sparsifying module, a total number of set bits within the distributed representation based on the maximum level of sparsity to generate a sparse distributed representation (SDR) having a normative fillgrade;
generating, by the representation generator and the sparsifying module, at least one SDR for each data item in the enumeration of data items occurring in the set of data documents;
storing, in an SDR database, each of the generated SDRs;
receiving, by a query expansion module executing on a second computing device, from a third computing device, a first term;
determining, by a similarity engine executing on a fourth computing device, a level of semantic similarity between a first SDR generated based on the first term and a second SDR of a second term, the second SDR retrieved from the SDR database;
transmitting, by the query expansion module, to a full-text search system, using the first term and the second term, a query for an identification of each of a subset of a second set of documents containing at least one term similar to at least one of the first term and the second term; and
transmitting, by the query expansion module, to the third computing device, the identification received from the full-text search system of each of the subset of the second set of documents containing at least one term similar to at least one of the first term and the second term.
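The clustering step can be illustrated with a small self-organizing map. The claim does not name a clustering algorithm (claim 11 calls the component a semantic neural network), so the following is a minimal sketch under stated assumptions, not the patented implementation: each document is assumed to already be a numeric vector (for example, term counts), the map is assumed to be a `width` by `height` grid, and the function name `build_semantic_map` and its training schedule are hypothetical. It returns one coordinate pair per document.

```python
import math
import random


def build_semantic_map(doc_vectors, width, height, epochs=20, seed=0):
    """Hypothetical sketch: train a tiny self-organizing map and return one
    (x, y) grid cell per document.

    doc_vectors: list of equal-length lists of floats (e.g. term counts).
    """
    rng = random.Random(seed)
    dim = len(doc_vectors[0])
    # One randomly initialised weight vector per grid cell.
    weights = {(x, y): [rng.random() for _ in range(dim)]
               for x in range(width) for y in range(height)}

    def best_cell(vec):
        # Best-matching unit: the cell whose weights are closest (squared Euclidean).
        return min(weights, key=lambda c: sum((w - v) ** 2 for w, v in zip(weights[c], vec)))

    total_steps = epochs * len(doc_vectors)
    step = 0
    for _ in range(epochs):
        order = list(range(len(doc_vectors)))
        rng.shuffle(order)
        for i in order:
            vec = doc_vectors[i]
            bx, by = best_cell(vec)
            frac = step / total_steps
            lr = 0.5 * (1 - frac)                                # learning rate decays linearly
            radius = max(width, height) / 2 * (1 - frac) + 0.5   # neighbourhood shrinks
            for (x, y), w in weights.items():
                d2 = (x - bx) ** 2 + (y - by) ** 2
                influence = math.exp(-d2 / (2 * radius ** 2))
                # Pull each cell toward the document, weighted by grid distance.
                for k in range(dim):
                    w[k] += lr * influence * (vec[k] - w[k])
            step += 1
    # Associate each document with the coordinate pair of its best-matching cell.
    return [best_cell(vec) for vec in doc_vectors]


docs = [[3, 0, 0], [2, 1, 0], [0, 0, 4], [0, 1, 3]]
print(build_semantic_map(docs, width=4, height=4))
```

A fixed `seed` makes the map reproducible, so the same document set always receives the same coordinate pairs.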
Abstract
A method of mapping data items to sparse distributed representations (SDRs) includes clustering in a two-dimensional metric space, by a reference map generator, a set of data documents selected according to at least one criterion, generating a semantic map. The semantic map associates a coordinate pair with each of the set of data documents. A parser generates an enumeration of data items occurring in the set of data documents. A representation generator determines, for each data item in the enumeration, occurrence information. The representation generator generates a distributed representation using the occurrence information. A sparsifying module receives an identification of a maximum level of sparsity. The sparsifying module reduces a total number of set bits within the distributed representation based on the maximum level of sparsity to generate an SDR having a normative fillgrade.
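The occurrence-information, distributed-representation, and sparsification steps described above can be sketched as follows. This is a hypothetical illustration, not the patented implementation. It assumes that the semantic map is a `width` by `height` grid whose cells are numbered `y * width + x`, that a data item's distributed representation adds its per-document occurrence count to the cell of each document containing it, and that the maximum level of sparsity is supplied as the largest allowed fraction of set bits (`max_fill`). Sparsification keeps the highest-weighted cells up to that cap. All function names are hypothetical.

```python
from collections import Counter, defaultdict


def occurrence_info(documents, doc_coords):
    """Per data item: (i) document frequency, (ii) per-document counts,
    (iii) coordinate pairs of the documents containing it.

    documents: list of token lists; doc_coords: one (x, y) pair per document.
    """
    per_word_counts = defaultdict(dict)  # word -> {doc_index: count}
    for i, tokens in enumerate(documents):
        for word, n in Counter(tokens).items():
            per_word_counts[word][i] = n
    return {
        word: {
            "doc_frequency": len(counts),               # (i)
            "counts": counts,                           # (ii)
            "coords": [doc_coords[i] for i in counts],  # (iii)
        }
        for word, counts in per_word_counts.items()
    }


def distributed_representation(entry, doc_coords, width):
    """Accumulate the item's occurrence counts onto grid cells: {bit_index: weight}."""
    weights = Counter()
    for doc_index, n in entry["counts"].items():
        x, y = doc_coords[doc_index]
        weights[y * width + x] += n
    return weights


def sparsify(weights, width, height, max_fill):
    """Keep only the heaviest bits so the fill grade does not exceed max_fill."""
    max_bits = int(max_fill * width * height)
    # Highest weight first; ties broken by lower bit index for determinism.
    ranked = sorted(weights.items(), key=lambda kv: (-kv[1], kv[0]))
    return frozenset(bit for bit, _ in ranked[:max_bits])


docs = [["cat", "dog"], ["cat", "mouse", "cat"], ["car", "road"], ["car", "dog"]]
coords = [(0, 0), (1, 0), (3, 3), (2, 3)]  # e.g. output of the semantic map
info = occurrence_info(docs, coords)
cat_sdr = sparsify(distributed_representation(info["cat"], coords, 4), 4, 4, max_fill=0.125)
print(sorted(cat_sdr))  # [0, 1]
```

With a 4 by 4 map and `max_fill=0.125`, at most two of the 16 bits are set. An item that touches fewer cells keeps fewer bits, so this sketch enforces an upper bound on the fill grade rather than an exact one.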
11 Claims
11. A system for enhancing a computing network including a full-text search through expansion of queries based upon determining similarities between data items mapped to sparse distributed representations, the system comprising:
a semantic neural network executing on a first computing device, clustering, in a two-dimensional metric space, a set of data documents selected according to at least one criterion, generating a semantic map, and associating a coordinate pair with each of the set of data documents;
a parser executing on the first computing device and generating an enumeration of data items occurring in the set of data documents;
a representation generator executing on the first computing device, determining, for each data item in the enumeration, occurrence information including:
(i) a number of data documents in which the data item occurs, (ii) a number of occurrences of the data item in each data document, and (iii) the coordinate pair associated with each data document in which the data item occurs, and generating a distributed representation using the occurrence information;
a sparsifying module executing on the first computing device, receiving an identification of a maximum level of sparsity, and reducing a total number of set bits within the distributed representation based on the maximum level of sparsity to generate a sparse distributed representation (SDR) having a normative fillgrade;
wherein the representation generator and the sparsifying module generate at least one SDR for each data item in the enumeration of data items occurring in the set of data documents;
a query expansion module executing on a second computing device and receiving, from a third computing device, a first term; and
a similarity engine executing on a fourth computing device, receiving the first term from the query expansion module, and determining a level of semantic similarity between a first SDR generated based on the first term and a second SDR of a second term, the second SDR retrieved from the SDR database;
wherein the query expansion module transmits, to a full-text search system, using the first term and the second term, a query for an identification of each of a subset of a second set of documents containing at least one term similar to at least one of the first term and the second term, and
wherein the query expansion module transmits, to the third computing device, the identification received from the full-text search system of each of the subset of the second set of documents containing at least one term similar to at least one of the first term and the second term.
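The similarity engine and query expansion module can be sketched in the same way. The claims do not specify a similarity measure, so this sketch assumes SDRs stored as sets of set-bit indices and scores similarity as the number of shared set bits divided by the size of the smaller SDR. The full-text search system is replaced by an in-memory keyword match. The names `overlap_similarity`, `expand_query`, `search`, `threshold`, and `max_terms` are hypothetical, and the first term is assumed to already have an SDR in the database.

```python
def overlap_similarity(sdr_a, sdr_b):
    """Shared set bits divided by the size of the smaller SDR (0.0 if either is empty)."""
    if not sdr_a or not sdr_b:
        return 0.0
    return len(sdr_a & sdr_b) / min(len(sdr_a), len(sdr_b))


def expand_query(first_term, sdr_db, threshold=0.5, max_terms=3):
    """Return the first term plus up to max_terms stored terms scoring >= threshold."""
    first_sdr = sdr_db[first_term]
    scored = [(overlap_similarity(first_sdr, sdr), term)
              for term, sdr in sdr_db.items() if term != first_term]
    # Most similar first; ties broken alphabetically for determinism.
    ranked = sorted(scored, key=lambda s: (-s[0], s[1]))
    similar = [term for score, term in ranked if score >= threshold][:max_terms]
    return [first_term] + similar


def search(expanded_terms, documents):
    """Stand-in for the full-text search system: ids of documents containing any term."""
    wanted = set(expanded_terms)
    return [doc_id for doc_id, text in documents.items() if wanted & set(text.split())]


sdr_db = {
    "car": frozenset({1, 2, 3, 4}),
    "automobile": frozenset({2, 3, 4, 5}),
    "vehicle": frozenset({3, 4, 9, 10}),
    "banana": frozenset({20, 21}),
}
terms = expand_query("car", sdr_db)
print(terms)  # ['car', 'automobile', 'vehicle']
docs = {"d1": "a red automobile", "d2": "fresh banana bread", "d3": "vehicle inspection rules"}
print(search(terms, docs))  # ['d1', 'd3']
```

Raising `threshold` narrows the expansion: at 0.6, only "automobile" (overlap 0.75) is added, while "vehicle" (overlap 0.5) is dropped.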
Specification