Methods and systems for exact data match filtering

US 9,092,636 B2
Filed: 11/18/2009
Issued: 07/28/2015
Est. Priority Date: 11/18/2008
Status: Active Grant

Abstract

A technique for efficiently preventing unauthorized disclosure of exact data words (“entities”) is disclosed. Protect agents installed at various egress points identify candidate entities in digital information that a user attempts to disclose. The candidate entities are compared against registered entities stored in a lightweight entity database (LWED). If a candidate entity matches a registered entity in the LWED, the protect agent initiates a security action. Alternatively, the protect agent transmits the matching candidate entity to a global entity database (GED) server to receive additional confirmation of whether the candidate entity matches a registered entity. In some instances, the protect agent also receives (from the GED server) metadata information associated with the matching candidate entity. The protect agent uses the metadata information to initiate suitable security actions.
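
The two-tier check described in the abstract can be pictured with a small sketch. This is illustrative only and is not taken from the patent: the class names `GedServer` and `ProtectAgent`, the use of SHA-256, and the truncated-hash set standing in for the LWED are all assumptions made to keep the example short.

```python
import hashlib
from dataclasses import dataclass, field


def entity_hash(word: str) -> str:
    """Hash a word; SHA-256 is an assumption, the source names no hash function."""
    return hashlib.sha256(word.encode("utf-8")).hexdigest()


@dataclass
class GedServer:
    """Stand-in for the remote GED server: exact hashes mapped to metadata."""
    records: dict = field(default_factory=dict)

    def register(self, word: str, **metadata) -> None:
        self.records[entity_hash(word)] = metadata

    def confirm(self, word: str):
        """Return (matched, metadata) for a candidate sent by a protect agent."""
        meta = self.records.get(entity_hash(word))
        return (meta is not None, meta)


@dataclass
class ProtectAgent:
    """Stand-in for a protect agent at one egress point."""
    lwed: set          # truncated hashes: compact, but may collide
    server: GedServer

    @staticmethod
    def short_hash(word: str) -> str:
        return entity_hash(word)[:4]   # 16 bits, deliberately lossy

    def check(self, text: str) -> list:
        """Return (word, metadata) pairs confirmed by the GED server."""
        confirmed = []
        for word in text.split():
            if self.short_hash(word) not in self.lwed:
                continue                      # definitely not registered
            matched, meta = self.server.confirm(word)   # second-tier check
            if matched:
                confirmed.append((word, meta))
        return confirmed


server = GedServer()
server.register("4111111111111111", entity_type="credit_card")
agent = ProtectAgent(lwed={ProtectAgent.short_hash("4111111111111111")}, server=server)

hits = agent.check("please charge 4111111111111111 today")
action = "block" if hits else "allow"
print(action, hits)
```

The local lookup filters out most words cheaply; only words that pass it are sent to the server, which has the final say and supplies the metadata used to choose an action.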

46 Claims

1. A method for preventing unauthorized disclosure of secure information, the method comprising:
- receiving, by a protect agent installed at a first egress point, digital information including a first text, the first text including a plurality of words;
  
  identifying, by the protect agent, a first candidate entity, the first candidate entity corresponding to a particular word of the plurality of words;
  
  comparing, by the protect agent, the first candidate entity against a plurality of compressed registered entities stored in a lightweight entity database (LWED) stored locally to the protect agent;
  
  upon determining, by the protect agent, that the first candidate entity matches against a first registered entity in the LWED, transmitting the first candidate entity to a remote server that can access a global entity database (GED), wherein the GED includes the plurality of registered entities in an uncompressed form, and wherein the remote server automatically generates a confirmation on whether the first candidate entity matches against a second registered entity in the GED;
  
  receiving from the remote server, by the protect agent, the confirmation; and
  
  performing, by the protect agent, a security action when the first candidate entity matches against a particular registered entity of the registered entities in the GED, wherein the security action is performed on the first text before the first text is disclosed through the first egress point.
- - 2. The method of claim 1, wherein the identification of the first candidate entity from the plurality of words further comprises:
    - utilizing an entity format matcher to identify a first word from the plurality of words that matches a particular word-pattern.
  - 3. The method of claim 1, wherein the identification of the first candidate entity from the plurality of words further comprises:
    - utilizing a heuristics engine to skip over one or more words from the plurality of words based on a heuristic rule.
  - 4. The method of claim 3, wherein the heuristic rule includes one or more of:
    - skipping over a first word from the plurality of words when the first word matches a first stop word of a plurality of stop words;
      
      skipping over a second word from the plurality of words when the second word has a word-length that is shorter than a first word-length of a shortest registered entity of the plurality of compressed registered entities;
      
      or skipping over a third word from the plurality of words when the third word has a word-length that is longer than a second word-length of a longest registered entity of the plurality of registered entities.
  - 5. The method of claim 1, wherein the identification of the first candidate entity from the plurality of words further comprises identifying a first entity type associated with the candidate entity.
  - 6. The method of claim 5, wherein the plurality of compressed registered entities stored in the LWED are categorized according to an entity type associated with each of the plurality of compressed registered entities.
  - 7. The method of claim 6, wherein the comparison of the first candidate entity against the plurality of compressed registered entities further comprises:
    - identifying a subplurality of registered entities that are categorized based on the first entity type;
      
      comparing the first candidate entity against the subplurality of registered entities that are categorized based on the first entity type.
  - 8. The method of claim 1, wherein the first candidate entity is converted to a canonical format prior to being compared against the plurality of compressed registered entities, wherein the canonical format causes the protect agent to be impervious to differences in digital format and character encoding of the first candidate entity.
  - 9. The method of claim 1, wherein the registered entities stored in the GED correspond to entity words that are desired to be secured from unauthorized distribution.
  - 10. The method of claim 9, wherein the LWED is generated by:
    - generating hash-values for each registered entity of the plurality of registered entities; and
      
      storing the generated hash-values in a probabilistic data structure.
  - 11. The method of claim 10, wherein the probabilistic data structure is a Bloom filter.
  - 12. The method of claim 1, wherein the GED is stored in association with a remote server, and wherein the protect agent at the first egress point communicates with the GED utilizing a network.
  - 13. The method of claim 12, wherein each of the plurality of uncompressed registered entities in the GED is represented using a corresponding hash-value.
  - 14. The method of claim 1, wherein each of the registered entities in the GED includes metadata information, the metadata information for a given registered entity including one or more of:
    - an entity type associated with the given registered entity;
      
      a location of the given registered entity within a particular document;
      
      or an origin information of a particular document.
  - 15. The method of claim 1, wherein the security action includes one or more of:
    - preventing the first text from being disclosed through the first egress point;
      
      logging transmission of the first text as a security violation;
      
      requiring a password from a user to allow the first text to be disclosed;
      
      blocking access by a user who transmitted the first text to the first text;
      
      sending out a security alert;
      
      or integration of the first text with rights management information.
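
As an illustration of the candidate identification recited in claims 2 to 4 (an entity format matcher plus heuristic skipping), here is a minimal sketch. It is not part of the claims. The word-patterns in `PATTERNS`, the stop-word list, and the function name `identify_candidates` are hypothetical; the claims do not enumerate concrete formats or stop words.

```python
import re

# Hypothetical word-patterns; the claims do not list concrete formats.
PATTERNS = {
    "ssn": re.compile(r"\d{3}-\d{2}-\d{4}"),
    "card": re.compile(r"\d{16}"),
}
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "is"}


def identify_candidates(text, min_len, max_len):
    """Yield (word, entity_type) pairs worth checking against the LWED.

    min_len / max_len are the lengths of the shortest and longest
    registered entities, as in the heuristic rules of claim 4.
    """
    for raw in text.split():
        word = raw.strip(".,;:!?()\"'")
        if not word or word.lower() in STOP_WORDS:
            continue                      # stop-word rule
        if len(word) < min_len or len(word) > max_len:
            continue                      # length rules
        entity_type = next(
            (name for name, pat in PATTERNS.items() if pat.fullmatch(word)),
            None,                         # no pattern matched: untyped candidate
        )
        yield word, entity_type


candidates = list(identify_candidates(
    "Send the SSN 123-45-6789 to accounting.", min_len=9, max_len=16))
print(candidates)
```

Words that survive the heuristics but match no pattern are kept as untyped candidates here; an implementation could equally choose to drop them.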

16. A method for preventing unauthorized disclosure of secure information, the method comprising:
- receiving, by a protect agent installed at a first egress point, digital information including a first text, the first text including a plurality of words;
  
  identifying, by the protect agent, a plurality of candidate entities, each of the plurality of candidate entities corresponding to a particular word of the plurality of words;
  
  identifying, by the protect agent, one or more matching candidate entities from the plurality of candidate entities that match against one of a plurality of lightweight entities stored in a lightweight entity database (LWED);
  
  upon determining one or more matching candidate entities, transmitting, by the protect agent, the one or more matching candidate entities to a remote server that can access a global entity database (GED), the GED including a plurality of registered entities identified to be secured against unauthorized disclosure, wherein the remote server performs a comparison of the one or more matching candidate entities with the plurality of registered entities in the GED to reduce a possibility that the identified one or more matching candidate entities are false positives, and wherein the remote server automatically generates an acknowledgement of whether each of the one or more matching candidate entities matches against one of the plurality of registered entities included in the GED;
  
  receiving, from the remote server, the acknowledgement; and
  
  performing, by the protect agent, a security action, when at least one of the one or more matching entities matches against one of the plurality of registered entities included in the GED, wherein the security action is performed on the first text before the first text is disclosed through the first egress point.
- - 17. The method of claim 16, wherein each of the plurality of registered entities included in the GED is generated by:
    - generating a hash-value for a first entity that an organization protects from unauthorized disclosure; and
      
      associating the hash-value with metadata related to the first entity.
  - 18. The method of claim 17, wherein the metadata related to the first entity includes one or more of:
    - an entity type associated with the first entity;
      
      a location of the first entity within a particular document;
      
      or an origin information of a particular document.
  - 19. The method of claim 17, wherein the LWED is a compressed version of the GED.
  - 20. The method of claim 19, wherein the compressed version is generated by:
    - stripping metadata information associated with each of the hash-values corresponding to the plurality of registered entities in the GED; and
      
      storing the stripped hash-values in a probabilistic data structure.
  - 21. The method of claim 20, wherein the probabilistic data structure is a Bloom filter.
  - 22. The method of claim 16, wherein the identification of a particular candidate entity from the plurality of words further comprises:
    - utilizing an entity format matcher to identify a first word from the plurality of words that matches a particular word-pattern.
  - 23. The method of claim 16, wherein the security action includes one or more of:
    - preventing the first text from being disclosed through the first egress point;
      
      logging transmission of the first text as a security violation;
      
      requiring a password from a user to allow the first text to be disclosed;
      
      blocking access by a user who transmitted the first text to the first text;
      
      sending out a security alert;
      
      or integration of the first text with rights management information.
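
Claims 17 to 21 describe building the GED from hash-values with metadata, then deriving the LWED by stripping the metadata and storing the hashes in a Bloom filter. The following sketch shows one way that could look. It is an assumption-laden illustration, not the patented implementation: SHA-256, the filter size, the number of hash functions, and the sample document names are all invented for the example.

```python
import hashlib


def entity_hash(entity: str) -> bytes:
    # SHA-256 is an assumption; the claims only say "hash-value".
    return hashlib.sha256(entity.encode("utf-8")).digest()


class BloomFilter:
    """Minimal Bloom filter over byte strings (sizes are illustrative)."""

    def __init__(self, num_bits: int = 1 << 16, num_hashes: int = 4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8)

    def _positions(self, item: bytes):
        # Derive k bit positions by salting the item with the index i.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(bytes([i]) + item).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item: bytes) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: bytes) -> bool:
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))


def build_ged(entities):
    """GED: hash-value -> metadata (entity type, location, document origin)."""
    return {entity_hash(word): meta for word, meta in entities}


def build_lwed(ged) -> BloomFilter:
    """LWED: drop the metadata and keep only the hashes, in a Bloom filter."""
    lwed = BloomFilter()
    for hash_value in ged:        # iterating a dict yields keys only
        lwed.add(hash_value)
    return lwed


# Sample records; the document names are hypothetical.
ged = build_ged([
    ("123-45-6789", {"type": "ssn", "location": 42, "origin": "hr/payroll.docx"}),
    ("4111111111111111", {"type": "card", "location": 7, "origin": "billing.xlsx"}),
])
lwed = build_lwed(ged)
print(entity_hash("123-45-6789") in lwed)
```

A Bloom filter never reports a registered entity as absent, but it can report an unregistered word as present, which is why the claims pair it with a confirmation step against the GED.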

24. A system for preventing unauthorized disclosure of secure information, the system having a processor and comprising:
- a receiving module to receive digital information including a first text, the first text including a plurality of words;
  
  a candidate ID module to identify a first candidate entity, the first candidate entity corresponding to a particular word of the plurality of words;
  
  a comparison module to compare the first candidate entity against a plurality of compressed registered entities stored in a lightweight entity database (LWED) stored locally;
  
  a communication module to transmit the first candidate entity to a remote server that can access a global entity database (GED), wherein the GED includes the plurality of registered entities in an uncompressed form, and wherein the remote server automatically generates a confirmation on whether the first candidate entity matches against a second registered entity in the GED;
  
  the communication module to receive from the remote server the confirmation; and
  
  a security action module to perform a security action when the first candidate entity matches against a particular registered entity of the registered entities in the GED, wherein the security action is performed on the first text before the first text is disclosed through the system.
- - 25. The system of claim 24, wherein the candidate ID module includes:
    - an entity format matcher to identify a first word from the plurality of words that matches a particular word-pattern.
  - 26. The system of claim 24, wherein the candidate ID module includes:
    - a heuristics engine to skip over one or more words from the plurality of words based on a heuristic rule.
  - 27. The system of claim 26, wherein the heuristic rule includes one or more of:
    - skipping over a first word from the plurality of words when the first word matches a first stop word of a plurality of stop words;
      
      skipping over a second word from the plurality of words when the second word has a word-length that is shorter than a first word-length of a shortest registered entity of the plurality of compressed registered entities;
      
      or skipping over a third word from the plurality of words when the third word has a word-length that is longer than a second word-length of a longest compressed registered entity of the plurality of registered entities.
  - 28. The system of claim 24, wherein the identification of the first candidate entity from the plurality of words by the candidate ID module further comprises identifying a first entity type associated with the candidate entity.
  - 29. The system of claim 28, wherein the plurality of compressed registered entities stored in the LWED are categorized according to an entity type associated with each of the plurality of compressed registered entities.
  - 30. The system of claim 29, wherein the comparison of the first candidate entity against the plurality of compressed registered entities by the comparison module further comprises:
    - identifying a subplurality of registered entities that are categorized based on the first entity type;
      
      comparing the first candidate entity against the subplurality of registered entities that are categorized based on the first entity type.
  - 31. The system of claim 24, wherein the first candidate entity is converted to a canonical format prior to being compared against the plurality of compressed registered entities, wherein the canonical format causes the comparison module to be impervious to differences in digital format and character encoding of the first candidate entity.
  - 32. The system of claim 24, wherein the registered entities stored in the GED correspond to entity words that are desired to be secured from unauthorized distribution.
  - 33. The system of claim 24, wherein the compressed registered entities stored in the LWED are generated by:
    - generating hash-values for each registered entity of the plurality of registered entities in the GED; and
      
      storing the generated hash-values in a probabilistic data structure.
  - 34. The system of claim 33, wherein the probabilistic data structure is a Bloom filter.
  - 35. The system of claim 24, wherein the GED is stored in association with a remote server, and wherein the communication module communicates with the GED utilizing a network.
  - 36. The system of claim 35, wherein each of the registered entities in the GED is represented using a corresponding hash-value.
  - 37. The system of claim 36, wherein each of the registered entities in the GED includes metadata information, the metadata information for a given registered entity including one or more of:
    - an entity type associated with the given registered entity;
      
      a location of the given registered entity within a particular document;
      
      or an origin information of a particular document.
  - 38. The system of claim 24, wherein the security action performed by the security module includes one or more of:
    - preventing the first text from being disclosed through the system;
      
      logging transmission of the first text as a security violation;
      
      requiring a password from a user to allow the first text to be disclosed;
      
      blocking access by a user who transmitted the first text to the first text;
      
      sending out a security alert;
      
      or integration of the first text with rights management information.
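
Claims 28 to 31 combine two ideas: converting a candidate to a canonical format before comparison, and comparing it only against registered entities of the same entity type. A rough sketch of both follows. The canonicalization rules (UTF-8 decoding, Unicode NFKC normalization, case folding, dropping separators) and the class name `TypedLwed` are assumptions; the claims do not specify what the canonical format is, and plain sets stand in for the compressed structure.

```python
import hashlib
import unicodedata


def canonicalize(raw) -> str:
    """Reduce a candidate (str or bytes) to one canonical form before hashing.

    The exact rules are assumptions: decode bytes as UTF-8, apply Unicode
    NFKC normalization, case-fold, and drop separators such as '-' and ' '.
    """
    if isinstance(raw, bytes):
        raw = raw.decode("utf-8")
    text = unicodedata.normalize("NFKC", raw).casefold()
    return "".join(ch for ch in text if ch.isalnum())


def entity_hash(entity) -> str:
    return hashlib.sha256(canonicalize(entity).encode("utf-8")).hexdigest()


class TypedLwed:
    """LWED split into one bucket per entity type.

    Plain sets stand in for the compressed structure here.
    """

    def __init__(self):
        self.buckets = {}

    def register(self, entity_type: str, entity) -> None:
        self.buckets.setdefault(entity_type, set()).add(entity_hash(entity))

    def matches(self, entity_type: str, candidate) -> bool:
        # Only the bucket for the candidate's type is consulted.
        return entity_hash(candidate) in self.buckets.get(entity_type, set())


lwed = TypedLwed()
lwed.register("ssn", "123-45-6789")

# Same entity written with different separators, as full-width digits,
# and as raw bytes; the last lookup uses the wrong type bucket.
full_width = "\uff11\uff12\uff13-\uff14\uff15-\uff16\uff17\uff18\uff19"
print(lwed.matches("ssn", "123 45 6789"))
print(lwed.matches("ssn", full_width))
print(lwed.matches("ssn", b"123-45-6789"))
print(lwed.matches("card", "123-45-6789"))
```

Restricting the lookup to one bucket keeps each comparison small and avoids cross-type matches, at the cost of depending on the candidate's type being identified correctly.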

39. A system for preventing unauthorized disclosure of secure information, the system comprising:
- a processor;
  
  a network interface through which to communicate with one or more remote servers over a network;
  
  a memory storing code which, when executed by the processor, causes the processor to perform a plurality of operations, including:
  
  receiving, by a protect agent installed at a first egress point, digital information including a first text, the first text including a plurality of words;
  
  identifying, by the protect agent, a plurality of candidate entities, each of the plurality of candidate entities corresponding to a particular word of the plurality of words;
  
  identifying, by the protect agent, one or more matching candidate entities from the plurality of candidate entities that match against one of a plurality of lightweight entities stored in a lightweight entity database (LWED);
  
  upon determining one or more matching candidate entities, transmitting, by the protect agent, the one or more matching candidate entities to a remote server that can access a global entity database (GED), the GED including a plurality of registered entities identified to be secured against unauthorized disclosure, wherein the remote server performs a comparison of the one or more matching candidate entities with the plurality of registered entities in the GED to reduce a possibility that the identified one or more matching candidate entities are false positives, and wherein the remote server automatically generates an acknowledgement of whether each of the one or more matching candidate entities matches against one of the plurality of registered entities included in the GED;
  
  receiving, from the remote server, the acknowledgement; and
  
  performing, by the protect agent, a security action, when at least one of the one or more matching entities matches against one of the plurality of registered entities included in the GED, wherein the security action is performed on the first text before the first text is disclosed through the first egress point.
- - 40. The system of claim 39, wherein each of the plurality of registered entities included in the GED is generated by:
    - generating a hash-value for a first entity that an organization protects from unauthorized disclosure; and
      
      associating the hash-value with metadata related to the first entity.
  - 41. The system of claim 40, wherein the metadata related to the first entity includes one or more of:
    - an entity type associated with the first entity;
      
      a location of the first entity within a particular document;
      
      or an origin information of a particular document.
  - 42. The system of claim 40, wherein the LWED is a compressed version of the GED.
  - 43. The system of claim 42, wherein the compressed version is generated by:
    - stripping metadata information associated with each of the hash-values corresponding to the plurality of registered entities in the GED; and
      
      storing the stripped hash-values in a probabilistic data structure.
  - 44. The system of claim 43, wherein the probabilistic data structure is a Bloom filter.
  - 45. The system of claim 39, wherein the identification of a particular candidate entity from the plurality of words further comprises:
    - utilizing an entity format matcher to identify a first word from the plurality of words that matches a particular word-pattern.
  - 46. The system of claim 39, wherein the security action includes one or more of:
    - preventing the first text from being disclosed through the first egress point;
      
      logging transmission of the first text as a security violation;
      
      requiring a password from a user to allow the first text to be disclosed;
      
      blocking access by a user who transmitted the first text to the first text;
      
      sending out a security alert;
      
      or integration of the first text with rights management information.
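
The server acknowledgement and the resulting security action recited in claims 39 and 46 might be wired together roughly as below. This is a sketch under stated assumptions, not the claimed system: the `Action` enum, the `POLICY` mapping from entity type to action, the severity ordering, and the sample entity `ORION-7` are all hypothetical.

```python
import hashlib
from enum import Enum


class Action(Enum):
    ALLOW = "allow"
    LOG = "log"
    REQUIRE_PASSWORD = "require_password"
    BLOCK = "block"


def entity_hash(entity: str) -> str:
    return hashlib.sha256(entity.encode("utf-8")).hexdigest()


def acknowledge(ged: dict, candidates: list) -> dict:
    """Server side: report, per candidate, whether it is truly registered.

    Returns candidate -> metadata for real matches and candidate -> None
    for LWED false positives.
    """
    return {word: ged.get(entity_hash(word)) for word in candidates}


# Hypothetical policy: which action each entity type triggers.
POLICY = {"ssn": Action.BLOCK, "project_code": Action.REQUIRE_PASSWORD}
SEVERITY = [Action.ALLOW, Action.LOG, Action.REQUIRE_PASSWORD, Action.BLOCK]


def choose_action(acknowledgement: dict) -> Action:
    """Agent side: pick the most severe action among confirmed matches."""
    chosen = Action.ALLOW
    for metadata in acknowledgement.values():
        if metadata is None:
            continue                          # false positive, ignore
        action = POLICY.get(metadata["type"], Action.LOG)
        if SEVERITY.index(action) > SEVERITY.index(chosen):
            chosen = action
    return chosen


ged = {
    entity_hash("123-45-6789"): {"type": "ssn"},
    entity_hash("ORION-7"): {"type": "project_code"},
}
ack = acknowledge(ged, ["ORION-7", "accounting"])
print(choose_action(ack))
```

Here the action is applied once per text, driven by the most sensitive confirmed entity, which mirrors the claim language that the security action is performed on the first text before it leaves the egress point.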

Current Assignee: Freedom Solutions Group, LLC
Original Assignee: Workshare Technology Incorporated
Inventors: More, Scott; Beyer, Ilya
Primary Examiner(s): Gelagay, Shewaye
Assistant Examiner(s): Nguyen, Trong H
Application Number: US12/621,429
Publication Number: US 20100299727A1
Time in Patent Office: 2,078 Days
Field of Search: 726/2-4, 726/11-12, 726/16-17, 726/21, 726/26-30, 1/1
US Class Current: 1/1
CPC Class Codes:
G06F 21/554   involving event detection a...
G06F 21/62   Protecting access to data v...
G06F 21/6245   Protecting personal data, e...
H04L 51/212   using filtering or selectiv...
H04L 63/083   using passwords cryptograph...
H04L 63/10   for controlling access to d...