METHODS AND SYSTEMS FOR MAPPING DATA ITEMS TO SPARSE DISTRIBUTED REPRESENTATIONS

US 20190332619A1
Filed: 07/12/2019
Published: 10/31/2019
Est. Priority Date: 08/07/2014
Status: Active Application

Abstract

A method enables identification of a similarity level between a user-provided data item and a data item within a set of data documents. The method includes a representation generator determining, for each term in an enumeration of terms, occurrence information. The representation generator generates, for each term, a sparse distributed representation (SDR) using the occurrence information. The method includes receiving, by a filtering module, a filtering criterion. The method includes generating, by the representation generator, for the filtering criterion, at least one SDR. The method includes generating, by the representation generator, for a first of a plurality of streamed documents received from a data source, a compound SDR. The method includes determining, by a similarity engine executing on a second computing device, a distance between the filtering criterion SDR and the generated compound SDR. The method includes acting on the first streamed document, based upon the determined distance.
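The pipeline summarized in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation. It assumes the clustering step has already assigned a coordinate pair to each document, and every name and parameter in it (`GRID_W`, `MAX_ACTIVE`, `term_sdr`, `compound_sdr`, `distance`, `act`, the Jaccard distance, the 0.5 threshold, and the toy documents) is hypothetical and chosen only for illustration.

```python
from collections import Counter

GRID_W, GRID_H = 8, 8   # hypothetical size of the two-dimensional semantic map
MAX_ACTIVE = 4          # hypothetical cap on active cells, keeps each SDR sparse


def sparsify(weights, max_active=MAX_ACTIVE):
    """Keep the max_active heaviest cells; ties are broken by cell index."""
    ranked = sorted(weights.items(), key=lambda kv: (-kv[1], kv[0]))
    return frozenset(cell for cell, _ in ranked[:max_active])


def term_sdr(term, documents, coords):
    """Build a term SDR from occurrence information.

    documents maps doc_id -> list of tokens, coords maps doc_id -> (x, y).
    Each document containing the term activates the cell at its coordinate
    pair, weighted by the number of occurrences in that document.
    """
    weights = Counter()
    for doc_id, tokens in documents.items():
        count = tokens.count(term)
        if count:
            x, y = coords[doc_id]
            weights[y * GRID_W + x] += count
    return sparsify(weights)


def compound_sdr(tokens, sdr_db):
    """Combine the stored SDRs of a document's terms into one sparse SDR."""
    weights = Counter()
    for token in tokens:
        for cell in sdr_db.get(token, ()):
            weights[cell] += 1
    return sparsify(weights)


def distance(a, b):
    """Jaccard distance between two SDRs (an assumed choice of metric)."""
    union = a | b
    if not union:
        return 1.0
    return 1.0 - len(a & b) / len(union)


def act(doc_tokens, criterion_sdr, sdr_db, threshold=0.5):
    """Forward the streamed document if it is close enough to the criterion."""
    d = distance(criterion_sdr, compound_sdr(doc_tokens, sdr_db))
    return "forward" if d <= threshold else "drop"


# Toy reference set; the coordinate pairs stand in for the clustering output.
documents = {
    "d1": ["fox", "den", "fox"],
    "d2": ["fox", "wolf"],
    "d3": ["stock", "bond"],
    "d4": ["bond", "yield"],
}
coords = {"d1": (0, 0), "d2": (1, 0), "d3": (6, 7), "d4": (7, 7)}

vocabulary = sorted({t for tokens in documents.values() for t in tokens})
sdr_db = {t: term_sdr(t, documents, coords) for t in vocabulary}

criterion = sdr_db["fox"]
print(act(["wolf", "den"], criterion, sdr_db))
print(act(["bond", "yield"], criterion, sdr_db))
```

In this toy data the animal-related documents sit in one corner of the map and the finance-related documents in the opposite corner, so a streamed document about wolves and dens shares active cells with the "fox" criterion while one about bonds does not.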

31 Claims

1. A computer-implemented method for identifying a level of similarity between a user-provided data item and a data item within a set of data documents, the method comprising:
- clustering, by a reference map generator executing on a first computing device, in a two-dimensional metric space, a set of data documents selected according to at least one criterion, generating a semantic map;
  
  associating, by the semantic map, a coordinate pair with each of the set of data documents;
  
  generating, by a parser executing on the first computing device, an enumeration of terms occurring in the set of data documents;
  
  determining, by a representation generator executing on the first computing device, for each term in the enumeration, occurrence information including:
  
  (i) a number of data documents in which the term occurs, (ii) a number of occurrences of the term in each data document, and (iii) the coordinate pair associated with each data document in which the term occurs;
  
  generating, by the representation generator, for each term in the enumeration, a sparse distributed representation (SDR) using the occurrence information;
  
  storing, in an SDR database, each of the generated SDRs;
  
  receiving, by a filtering module executing on a second computing device, from a third computing device, a filtering criterion;
  
  generating, by the representation generator, for the filtering criterion, at least one SDR;
  
  receiving, by the filtering module, a plurality of streamed documents from a data source;
  
  generating, by the representation generator, for a first of the plurality of streamed documents, a compound SDR;
  
  determining, by a similarity engine executing on the second computing device, a distance between the filtering criterion SDR and the generated compound SDR for the first of the plurality of streamed documents; and
  
  acting, by the filtering module, on the first streamed document, based upon the determined distance.
- Dependent claims:
  - 2. The method of claim 1, wherein receiving, by the filtering module, the filtering criterion, further comprises receiving at least one brand-related term.
  - 3. The method of claim 1, wherein receiving, by the filtering module, the filtering criterion, further comprises receiving at least one security-related term.
  - 4. The method of claim 1, wherein receiving, by the filtering module, the filtering criterion, further comprises receiving at least one virus signature.
  - 5. The method of claim 1, wherein receiving, by the filtering module, the filtering criterion, further comprises receiving at least one SDR.
  - 6. The method of claim 1, wherein generating, for the filtering criterion, the SDR, further comprises:
    - determining whether the filtering criterion is an SDR; and
      
      generating the SDR based upon a determination that the filtering criterion is not an SDR.
  - 7. The method of claim 1, wherein generating, for the filtering criterion, the SDR, further comprises determining not to generate the SDR based upon a determination that the filtering criterion is an SDR.
  - 8. The method of claim 1, wherein receiving, by the filtering module, the plurality of streamed documents further comprises receiving, by the filtering module, a plurality of social media text documents.
  - 9. The method of claim 1, wherein receiving, by the filtering module, the plurality of streamed documents further comprises receiving, by the filtering module, a plurality of network packets.
  - 10. The method of claim 1, wherein generating the compound SDR further comprises generating, by the representation generator, the compound SDR for the first of the plurality of streamed documents before receiving a second of the plurality of streamed documents.
  - 11. The method of claim 1, wherein acting further comprises forwarding, by the filtering module, to the third computing device, the streamed document.
  - 12. The method of claim 1, wherein acting further comprises determining, by the filtering module, not to forward the streamed document to the third computing device.
  - 13. The method of claim 1, wherein acting further comprises determining, by the filtering module, whether to transmit an alert to the third computing device, based upon the determined distance.
  - 14. The method of claim 13 further comprising determining, by the filtering module, whether to transmit an alert to the third computing device, based upon the determined distance and the filtering criterion.
  - 15. The method of claim 1 further comprising:
    - receiving, by the filtering module, a second plurality of streamed documents from a second data source;
      
      generating, for a first of the second plurality of streamed documents, a compound SDR;
      
      determining, by the similarity engine, a distance between the generated compound SDR for the first of the second plurality of streamed documents and the generated compound SDR for the first of the first plurality of streamed documents; and
      
      determining, by the filtering module, whether to forward, to the third computing device, the first of the second plurality of streamed documents, based upon the determined distance.
  - 16. The method of claim 1, wherein generating the enumeration of terms further comprises generating an enumeration of virus signatures occurring in the set of data documents.
  - 17. The method of claim 16, wherein determining the occurrence information further comprises determining, for each virus signature in the enumeration, occurrence information including:
    - (i) a number of data documents in which the virus signature occurs, (ii) a number of occurrences of the virus signature in each data document, and (iii) the coordinate pair associated with each data document in which the virus signature occurs.
  - 18. The method of claim 17, wherein generating, for each term in the enumeration, the SDR further comprises generating, for each virus signature in the enumeration, the SDR.
  - 19. The method of claim 16 further comprising decomposing each virus signature in the enumeration into a plurality of sub-units, based upon a protocol.
  - 20. The method of claim 19 further comprising decomposing each sub-unit in the enumeration into at least one value.
  - 21. The method of claim 20 further comprising determining, for each value of each of the plurality of sub-units of the virus signature in the enumeration, occurrence information including:
    - (i) a number of data documents in which the value occurs, (ii) a number of occurrences of the value in each data document, and (iii) the coordinate pair associated with each data document in which the value occurs.
  - 22. The method of claim 21, wherein generating, for each term in the enumeration, the SDR further comprises generating, for each value in the enumeration, the SDR.
  - 23. The method of claim 21, wherein generating, for each term in the enumeration, the SDR further comprises generating, for each sub-unit in the enumeration, the SDR.
  - 24. The method of claim 23, wherein generating, for each term in the enumeration, the SDR further comprises generating a compound SDR for each virus signature in the enumeration, based on the generated sub-unit SDRs.
  - 25. The method of claim 1, wherein acting further comprises forwarding the first of the plurality of streamed documents to a client agent executing on the third machine.
  - 26. The method of claim 25, wherein the client agent executes on a router.
  - 27. The method of claim 25, wherein the client agent executes on a web server.
  - 28. The method of claim 25, wherein the client agent executes on a network device.
  - 29. The method of claim 1, wherein acting further comprises adding the first of the plurality of streamed documents to a sub-stream of streamed documents.
  - 30. The method of claim 29 further comprising storing the sub-stream in a database accessible by a client agent executing on the third machine.
  - 31. The method of claim 30 further comprising responding to a polling request from the client agent by transmitting the sub-stream to the client agent.
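
The filtering-module behavior recited in claims 5 to 7 (a criterion that may already be an SDR) and claims 29 to 31 (a sub-stream served on polling) can be sketched as follows. This is an illustrative sketch under stated assumptions, not the claimed implementation: `TOY_TERM_SDRS`, `text_to_sdr`, `ensure_sdr`, `FilteringModule`, the Jaccard distance, and the 0.5 threshold are all hypothetical, and the lookup table merely stands in for the representation generator and SDR database.

```python
from dataclasses import dataclass, field

# Hypothetical stand-in for the SDR database: term -> set of active cells.
TOY_TERM_SDRS = {
    "malware": frozenset({1, 2, 3}),
    "alert": frozenset({3, 4}),
    "weather": frozenset({40, 41}),
    "report": frozenset({41, 42}),
}


def text_to_sdr(text):
    """Hypothetical representation generator: union of the known term SDRs."""
    cells = set()
    for word in text.lower().split():
        cells |= TOY_TERM_SDRS.get(word, frozenset())
    return frozenset(cells)


def ensure_sdr(criterion):
    """Generate an SDR only when the criterion is not already one."""
    if isinstance(criterion, frozenset):
        return criterion
    return text_to_sdr(criterion)


def jaccard_distance(a, b):
    """Assumed distance measure between two SDRs."""
    union = a | b
    if not union:
        return 1.0
    return 1.0 - len(a & b) / len(union)


@dataclass
class FilteringModule:
    criterion: object            # text or a ready-made SDR (frozenset)
    threshold: float = 0.5       # hypothetical cut-off for "similar enough"
    sub_stream: list = field(default_factory=list)

    def __post_init__(self):
        self.criterion_sdr = ensure_sdr(self.criterion)

    def receive(self, document):
        """Handle one streamed document; return True if it joins the sub-stream."""
        d = jaccard_distance(self.criterion_sdr, text_to_sdr(document))
        if d <= self.threshold:
            self.sub_stream.append(document)
            return True
        return False

    def poll(self):
        """Answer a client agent's polling request and clear the sub-stream."""
        batch, self.sub_stream = self.sub_stream, []
        return batch


module = FilteringModule("malware alert")
module.receive("malware alert today")
module.receive("weather report")
print(module.poll())
```

Each document is handled as it arrives, so the decision for the first streamed document does not wait on later ones, and the client agent retrieves matches only when it polls.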

Current Assignee: Cortical IO AG
Original Assignee: Cortical IO AG
Inventors: De Sousa Webber, Francisco Eduardo
Application Number: US16/510,108
Publication Number: US 20190332619A1

CPC Class Codes

G06F 16/313   Selection or weighting of t...

G06F 16/3329   Natural language query form...

G06F 16/3346   using probabilistic model

G06F 16/35   Clustering; Classification

G06F 18/2136   based on sparsity criteria,...

G06F 18/22   Matching criteria, e.g. pro...

G06F 21/564   by virus signature recognition

G06F 40/221   Parsing markup language str...

G06F 40/30   Semantic analysis

G06V 10/761   Proximity, similarity or di...
