Methods and systems for mapping data items to sparse distributed representations

US 10,394,851 B2
Filed: 08/03/2015
Issued: 08/27/2019
Est. Priority Date: 08/07/2014
Status: Active Grant

Abstract

A method of mapping data items to sparse distributed representations (SDRs) includes clustering in a two-dimensional metric space, by a reference map generator, a set of data documents selected according to at least one criterion, generating a semantic map. The semantic map associates a coordinate pair with each of the set of data documents. A parser generates an enumeration of data items occurring in the set of data documents. A representation generator determines, for each data item in the enumeration, occurrence information. The representation generator generates a distributed representation using the occurrence information. A sparsifying module receives an identification of a maximum level of sparsity. The sparsifying module reduces a total number of set bits within the distributed representation based on the maximum level of sparsity to generate an SDR having a normative fillgrade.
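
The pipeline in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: it assumes the semantic map already exists (the clustering step is skipped and each document simply arrives with a coordinate pair), it treats whitespace-separated tokens as data items, and it sparsifies by keeping the highest-count cells up to a fill budget. The names `occurrence_info`, `distributed_representation`, `sparsify`, `grid_size`, and `max_fill` are hypothetical.

```python
from collections import Counter, defaultdict


def occurrence_info(docs, coords):
    """Collect, per term, the document count, per-document counts, and coordinates.

    docs:   {doc_id: list of tokens}
    coords: {doc_id: (x, y)} as assigned by an (assumed) semantic map
    """
    info = defaultdict(lambda: {"doc_count": 0, "per_doc": {}, "coords": {}})
    for doc_id, tokens in docs.items():
        for term, n in Counter(tokens).items():
            entry = info[term]
            entry["doc_count"] += 1               # (i) documents containing the term
            entry["per_doc"][doc_id] = n          # (ii) occurrences in each document
            entry["coords"][doc_id] = coords[doc_id]  # (iii) coordinate pair
    return dict(info)


def distributed_representation(entry):
    """Accumulate a term's per-document counts at each document's coordinate pair."""
    cells = Counter()
    for doc_id, n in entry["per_doc"].items():
        cells[entry["coords"][doc_id]] += n
    return cells


def sparsify(cells, grid_size, max_fill):
    """Keep at most max_fill * grid_size set bits, preferring the highest counts.

    Assumption: top-k selection stands in for the reduction step; ties are
    broken by coordinate so the result is deterministic.
    """
    budget = int(max_fill * grid_size)
    ranked = sorted(cells.items(), key=lambda kv: (-kv[1], kv[0]))
    return {xy for xy, _ in ranked[:budget]}


docs = {"d1": ["cat", "dog", "cat"], "d2": ["cat"], "d3": ["dog", "fish"]}
coords = {"d1": (0, 0), "d2": (0, 1), "d3": (3, 3)}
info = occurrence_info(docs, coords)
cat_sdr = sparsify(distributed_representation(info["cat"]), grid_size=16, max_fill=0.0625)
```

With a 16-cell grid and a maximum fill of 6.25%, the budget is one set bit, so `cat_sdr` keeps only the coordinate where "cat" occurs most often.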

11 Claims

1. A method performed by at least one computer processor of each of a plurality of computing devices executing computer program instructions stored on at least one non-transitory computer-readable medium, wherein the computer program instructions are executable by the at least one computer processor to perform a method for enhancing a computing networking including a full-text search system through enhancement of queries based upon determining similarities between data items mapped to sparse distributed representations, the method comprising:
- clustering in a two-dimensional metric space, by a reference map generator, executing on a first computing device, a set of data documents selected according to at least one criterion, generating a semantic map;
  
  associating, by the semantic map, a coordinate pair with each of the set of data documents;
  
  generating, by a parser executing on the first computing device, an enumeration of data items occurring in the set of data documents;
  
  determining, by a representation generator executing on the first computing device, for each data item in the enumeration, occurrence information including;
  
  (i) a number of data documents in which the data item occurs, (ii) a number of occurrences of the data item in each data document, and (iii) the coordinate pair associated with each data document in which the data item occurs;
  
  generating, by the representation generator, a distributed representation using the occurrence information;
  
  receiving, by a sparsifying module executing on the first computing device, an identification of a maximum level of sparsity;
  
  reducing, by the sparsifying module, a total number of set bits within the distributed representation based on the maximum level of sparsity to generate a sparse distributed representation (SDR) having a normative fillgrade;
  
  generating, by the representation generator and the sparsifying module, at least one SDR for each data item in the enumeration of data items occurring in the set of data documents;
  
  storing, in an SDR database, each of the generated SDRs;
  
  receiving, by a query expansion module executing on a second computing device, from a third computing device, a first term;
  
  determining, by a similarity engine executing on a fourth computing device, a level of semantic similarity between a first SDR generated based on the first term and a second SDR of a second term, the second SDR retrieved from the SDR database;
  
  transmitting, by the query expansion module, to a full-text search system, using the first term and the second term, a query for an identification of each of a subset of a second set of documents containing at least one term similar to at least one of the first term and the second term; and
  
  transmitting, by the query expansion module, to the third computing device, the identification received from the full-text search system of each of the subset of the second set of documents containing at least one term similar to at least one of the first term and the second term.
- Dependent claims (2, 3, 4, 5, 6, 7, 8, 9, 10)
- - 2. The method of claim 1 further comprising determining a pattern representative of semantic contexts in which a data item in the set of data documents occurs.
  - 3. The method of claim 2, wherein a spatial distribution of coordinate pairs in the pattern reflects at least one semantic region in a context of which the data item occurred.
  - 4. The method of claim 1, wherein the sparse distributed representation provides a binary fingerprint of a semantic meaning of the data item.
  - 5. The method of claim 1, wherein associating further comprises associating a coordinate pair that identifies a location of a point representing one of the set of data documents in the two-dimensional metric space.
  - 6. The method of claim 1, wherein generating the distributed representation further comprises proportionally reducing the total number of set bits, proportional to the number of occurrences of each data item in the distributed representation.
  - 7. The method of claim 1, wherein generating further comprises proportionally reducing the total number of set bits, proportional to the number of occurrences of the data in a subset of the data documents.
  - 8. The method of claim 1, wherein generating further comprises generating a topological map of the sparse distributed representation.
  - 9. The method of claim 1 further comprising:
    - combining a first semantic fingerprint of a first data item and a second semantic fingerprint of a second data item to form a compound fingerprint;
      
      adding a number of set bits at each location within the compound fingerprint;
      
      proportionally reducing a total number of set bits using a threshold resulting in a normative fillgrade.
  - 10. The method of claim 9, wherein proportionally reducing further comprises applying a weighting scheme to reduce the total number of set bits, the weighting scheme also evaluating a number of bits surrounding a particular set bit.
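
Claims 9 and 10 describe combining fingerprints, summing set bits per location, and reducing the result with a threshold that may also weigh neighboring bits. The following Python sketch shows one way this could look. It is an illustration under stated assumptions, not the claimed method: fingerprints are modeled as sets of (x, y) positions, the threshold is expressed as a fixed bit budget, and the neighborhood is the eight surrounding cells. The names `compound_fingerprint`, `budget`, and `neighbor_weight` are hypothetical.

```python
from collections import Counter


def compound_fingerprint(fingerprints, budget, neighbor_weight=0.0):
    """Combine fingerprints and keep the `budget` strongest positions.

    fingerprints:    iterable of sets of (x, y) set-bit positions
    budget:          number of set bits to keep (assumed stand-in for the
                     threshold that yields the normative fillgrade)
    neighbor_weight: how much the counts in the eight surrounding cells
                     add to a position's score (0.0 disables weighting)
    """
    counts = Counter()
    for fp in fingerprints:
        counts.update(fp)  # adds 1 for every set bit in this fingerprint

    def score(xy):
        x, y = xy
        neighbors = sum(
            counts.get((x + dx, y + dy), 0)
            for dx in (-1, 0, 1)
            for dy in (-1, 0, 1)
            if (dx, dy) != (0, 0)
        )
        return counts[xy] + neighbor_weight * neighbors

    # Highest score first; ties broken by coordinate for determinism.
    ranked = sorted(counts, key=lambda xy: (-score(xy), xy))
    return set(ranked[:budget])


a = {(0, 0), (0, 1), (5, 5)}
b = {(0, 1), (1, 1), (5, 5)}
plain = compound_fingerprint([a, b], budget=2)
weighted = compound_fingerprint([a, b], budget=2, neighbor_weight=0.5)
```

In this example `plain` keeps the two positions shared by both fingerprints, while `weighted` drops the isolated bit at (5, 5) in favor of a bit inside the denser cluster around (0, 1).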

11. A system for enhancing a computing networking including a full-text search through expansion of queries based upon determining similarities between data items mapped to sparse distributed representations, the system comprising:
- a semantic neural network executing on a first computing device, clustering, in a two-dimensional metric space, a set of data documents selected according to at least one criterion, generating a semantic map, and associating a coordinate pair with each of the set of data documents;
  
  a parser executing on the first computing device and generating an enumeration of data items occurring in the set of data documents;
  
  a representation generator executing on the first computing device, determining, for each data item in the enumeration, occurrence information including;
  
  (i) a number of data documents in which the data item occurs, (ii) a number of occurrences of the data item in each data document, and (iii) the coordinate pair associated with each data document in which the data item occurs, and generating, a distributed representation using the occurrence information;
  
  a sparsifying module executing on the first computing device, receiving an identification of a maximum level of sparsity, and reducing a total number of set bits within the distributed representation based on the maximum level of sparsity to generate a sparse distributed representation (SDR) having a normative fillgrade;
  
  wherein the representation generator and the sparsifying module generate at least one SDR for each data item in the enumeration of data items occurring in the set of data documents;
  
  a query expansion module executing on a second computing device and receiving from a third computing device, a first term; and
  
  a similarity engine executing on a fourth computing device, receiving the first term from the query expansion module, and determining a level of semantic similarity between a first SDR generated based on the first term and a second SDR of a second term, the second SDR retrieved from the SDR database;
  
  wherein the query expansion module transmits to a full-text search system, using the first term and the second term, a query for an identification of each of a subset of a second set of documents containing at least one term similar to at least one of the first term and the second term, and wherein the query expansion module transmits, to the third computing device, the identification received from the full-text search system of each of the subset of the second set of documents containing at least one term similar to at least one of the first term and the second term.
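
The similarity and query expansion steps recited in claims 1 and 11 can be sketched as follows. The claims do not name a similarity measure, so this example assumes a simple overlap ratio between sets of set-bit positions. It also assumes the first term's SDR can be looked up in the same in-memory dictionary that stands in for the SDR database, and that the full-text search system accepts an `OR`-joined query string. The names `overlap_similarity`, `expand_query`, `min_similarity`, and `limit` are hypothetical.

```python
def overlap_similarity(a, b):
    """Share of set bits in the smaller SDR that also appear in the other one."""
    if not a or not b:
        return 0.0
    return len(a & b) / min(len(a), len(b))


def expand_query(first_term, sdr_db, min_similarity=0.5, limit=3):
    """Return the first term followed by up to `limit` semantically similar terms.

    sdr_db: {term: set of set-bit positions}, an assumed stand-in for the SDR database
    """
    first_sdr = sdr_db.get(first_term)
    if first_sdr is None:
        return [first_term]  # nothing to compare against, so leave the query as is

    scored = [
        (overlap_similarity(first_sdr, sdr), term)
        for term, sdr in sdr_db.items()
        if term != first_term
    ]
    similar = sorted(
        (pair for pair in scored if pair[0] >= min_similarity),
        key=lambda pair: (-pair[0], pair[1]),
    )
    return [first_term] + [term for _, term in similar[:limit]]


sdr_db = {
    "car": {1, 2, 3, 4},
    "automobile": {2, 3, 4, 5},
    "banana": {7, 8, 9, 10},
}
terms = expand_query("car", sdr_db)
query = " OR ".join(terms)  # hypothetical query syntax for the search system
```

Here "automobile" shares three of four set bits with "car" and is added to the query, while "banana" shares none and is left out.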

Current Assignee: Cortical IO AG
Original Assignee: Cortical IO AG
Inventors: De Sousa Webber, Francisco Eduardo
Primary Examiner(s): Beausoliel, Jr., Robert W
Assistant Examiner(s): Khakhar, Nirav K
Application Number: US14/816,133
Publication Number: US 20160042053A1
Time in Patent Office: 1,485 Days

CPC Class Codes

G06F 16/285   Clustering or classification

G06F 16/3338   Query expansion

G06F 16/35   Clustering; Classification

G06F 16/93   Document management systems
