METHODS AND SYSTEMS FOR MAPPING DATA ITEMS TO SPARSE DISTRIBUTED REPRESENTATIONS
First Claim
1. A computer-implemented method for identifying a level of similarity between a user-provided data item and a data item within a set of data documents, the method comprising:
clustering, by a reference map generator executing on a first computing device, in a two-dimensional metric space, a set of data documents selected according to at least one criterion, generating a semantic map;
associating, by the semantic map, a coordinate pair with each of the set of data documents;
generating, by a parser executing on the first computing device, an enumeration of terms occurring in the set of data documents;
determining, by a representation generator executing on the first computing device, for each term in the enumeration, occurrence information including:
(i) a number of data documents in which the term occurs, (ii) a number of occurrences of the term in each data document, and (iii) the coordinate pair associated with each data document in which the term occurs;
generating, by the representation generator, for each term in the enumeration, a sparse distributed representation (SDR) using the occurrence information;
storing, in an SDR database, each of the generated SDRs;
receiving, by a filtering module executing on a second computing device, from a third computing device, a filtering criterion;
generating, by the representation generator, for the filtering criterion, at least one SDR;
receiving, by the filtering module, a plurality of streamed documents from a data source;
generating, by the representation generator, for a first of the plurality of streamed documents, a compound SDR for a first of the plurality of streamed documents;
determining, by a similarity engine executing on the second computing device, a distance between the filtering criterion SDR and the generated compound SDR for the first of the plurality of streamed documents; and
acting, by the filtering module, on the first streamed document, based upon the determined distance.
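The representation-generation steps of claim 1 can be illustrated with a minimal sketch. It assumes that the semantic map is a grid of width x height cells, that each document has already been assigned a coordinate pair by the clustering step, and that a term's SDR sets the bits for the map cells where the term occurs most often. The function name build_term_sdrs, the sparsity parameter, and the top-weight selection rule are hypothetical choices made for illustration; the claim does not specify them.

```python
from collections import Counter, defaultdict


def build_term_sdrs(doc_terms, doc_coords, width, height, sparsity=0.02):
    """Build one sparse bit set per term from occurrence information.

    doc_terms:  {doc_id: [term, ...]}   parsed terms of each document
    doc_coords: {doc_id: (x, y)}        coordinate pair from the semantic map
    Returns {term: frozenset of set-bit indices}, index = y * width + x.
    """
    # For each term, sum its occurrence counts per map cell.
    weights = defaultdict(Counter)
    for doc_id, terms in doc_terms.items():
        x, y = doc_coords[doc_id]
        cell = y * width + x
        for term, count in Counter(terms).items():
            weights[term][cell] += count

    # Assumed sparsification rule: keep at most a fixed fraction of all cells.
    max_bits = max(1, int(width * height * sparsity))
    sdrs = {}
    for term, cells in weights.items():
        # Highest weight first; ties broken by cell index for determinism.
        ranked = sorted(cells.items(), key=lambda kv: (-kv[1], kv[0]))
        sdrs[term] = frozenset(cell for cell, _ in ranked[:max_bits])
    return sdrs


# Toy corpus on a 4 x 4 map (all values are illustrative).
doc_terms = {"d1": ["cat", "dog", "cat"], "d2": ["cat"], "d3": ["car"]}
doc_coords = {"d1": (0, 0), "d2": (1, 0), "d3": (3, 3)}
sdrs = build_term_sdrs(doc_terms, doc_coords, width=4, height=4, sparsity=0.25)
```

In this toy corpus, "cat" and "dog" share the bit for the cell of document d1, while "car" sets only a distant cell, so bit overlap reflects co-occurrence in nearby documents.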
Abstract
A method enables identification of a similarity level between a user-provided data item and a data item within a set of data documents. The method includes a representation generator determining, for each term in an enumeration of terms, occurrence information. The representation generator generates, for each term, a sparse distributed representation (SDR) using the occurrence information. The method includes receiving, by a filtering module, a filtering criterion. The method includes generating, by the representation generator, for the filtering criterion, at least one SDR. The method includes generating, by the representation generator, for a first of a plurality of streamed documents received from a data source, a compound SDR. The method includes determining, by a similarity engine executing on a second computing device, a distance between the filtering criterion SDR and the generated compound SDR. The method includes acting on the first streamed document, based upon the determined distance.
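The filtering steps summarized in the abstract can be sketched as follows. The sketch assumes term SDRs are stored as sets of bit indices, that a compound SDR is formed by counting set bits across a document's terms and keeping the most frequent ones, and that the distance is a Jaccard distance compared against a threshold. The names compound_sdr, jaccard_distance, and filter_stream, as well as the threshold and max_bits parameters, are hypothetical; the source does not name a distance measure or an aggregation rule.

```python
from collections import Counter


def compound_sdr(term_sdrs, terms, max_bits):
    """Combine the SDRs of a document's terms into one sparse bit set."""
    counts = Counter()
    for term in terms:
        for bit in term_sdrs.get(term, ()):  # unknown terms contribute nothing
            counts[bit] += 1
    # Assumed sparsification: keep the max_bits most frequently set bits.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return frozenset(bit for bit, _ in ranked[:max_bits])


def jaccard_distance(a, b):
    """Return 0.0 for identical bit sets and 1.0 for disjoint ones."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def filter_stream(criterion_sdr, streamed_docs, term_sdrs, threshold, max_bits):
    """Yield (document terms, distance, keep) for each streamed document."""
    for terms in streamed_docs:
        doc_sdr = compound_sdr(term_sdrs, terms, max_bits)
        distance = jaccard_distance(criterion_sdr, doc_sdr)
        yield terms, distance, distance <= threshold


# Toy SDR database and stream (all values are illustrative).
term_sdrs = {
    "cat": frozenset({0, 1}),
    "dog": frozenset({0}),
    "car": frozenset({15}),
}
criterion = term_sdrs["cat"]  # filtering criterion given as a single term
stream = [["cat", "dog"], ["car"]]
results = list(filter_stream(criterion, stream, term_sdrs, threshold=0.5, max_bits=4))
```

Here the keep flag stands in for "acting on the streamed document": a filtering module might forward documents whose distance falls under the threshold and drop the rest.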
22 Citations
31 Claims
Specification