System and method for keyword spotting using representative dictionary
First Claim
1. A method for searching input data for textual phrases, the method comprising:
providing a system having an external memory containing a first dictionary of first textual phrases and a cache memory containing a second dictionary of second textual phrases, wherein the cache memory has a faster access speed than the external memory, and wherein the second dictionary represents the first dictionary but has a smaller data size than the first dictionary because the second textual phrases are sub-strings derived from the first textual phrases that are shorter than the first textual phrases;
receiving input data using the system;
searching the input data with the second dictionary;
in response to identifying in the input data a second textual phrase from the second dictionary, locating in the input data a first textual phrase from the first dictionary corresponding to the identified second textual phrase; and
using the located first textual phrase to perform one of data leakage prevention, intrusion detection, intrusion prevention, spam e-mail detection, or detection of inappropriate content.
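The two-stage search recited in the claim can be illustrated with a minimal sketch. It is not the patented implementation: the function names, the `key_len` parameter, and the choice of each phrase's leading characters as its sub-string are assumptions made for illustration, and ordinary Python containers stand in for the cache memory and the external memory. The final step (acting on a located phrase for data leakage prevention, intrusion detection, and so on) is not modeled.

```python
from collections import defaultdict


def build_dictionaries(phrases, key_len):
    """Return (first_dictionary, second_dictionary) for the given phrases.

    first_dictionary: key -> full phrases (stand-in for the large dictionary
    in external memory). second_dictionary: the set of keys only (stand-in
    for the small dictionary in cache memory).
    """
    first_dictionary = defaultdict(list)
    for phrase in phrases:
        if len(phrase) <= key_len:
            # The claim requires sub-strings shorter than the full phrase.
            raise ValueError(f"phrase {phrase!r} is not longer than key_len")
        # Assumption: the leading key_len characters serve as the sub-string.
        first_dictionary[phrase[:key_len]].append(phrase)
    second_dictionary = frozenset(first_dictionary)
    return dict(first_dictionary), second_dictionary


def spot_keywords(input_data, first_dictionary, second_dictionary, key_len):
    """Return (offset, phrase) pairs for every full phrase found in input_data."""
    hits = []
    for offset in range(len(input_data) - key_len + 1):
        window = input_data[offset:offset + key_len]
        if window not in second_dictionary:
            continue  # common case: only the small dictionary is consulted
        # Rare case: consult the large dictionary and verify the full phrase.
        for phrase in first_dictionary[window]:
            if input_data.startswith(phrase, offset):
                hits.append((offset, phrase))
    return hits


# Hypothetical example data: e-mail addresses and a URL as phrases of interest.
phrases = ["alice@example.com", "alice@example.org", "http://evil.test/login"]
first, second = build_dictionaries(phrases, key_len=6)
text = "mail from alice@example.org about http://evil.test/login"
hits = spot_keywords(text, first, second, key_len=6)
```

In this sketch most input positions are rejected by the small key set alone, and the larger dictionary is consulted only when a key matches, which is the motivation the claim gives for keeping the second dictionary in faster memory.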
Abstract
Methods and systems for keyword spotting, i.e., for identifying textual phrases of interest in input data. In the embodiments described herein, the input data comprises communication packets exchanged in a communication network. The disclosed keyword spotting techniques can be used, for example, in applications such as Data Leakage Prevention (DLP), Intrusion Detection Systems (IDS) or Intrusion Prevention Systems (IPS), and spam e-mail detection. A keyword spotting system holds a dictionary of textual phrases for searching input data. In a communication analytics system, for example, the dictionary defines textual phrases to be located in communication packets—such as e-mail addresses or Uniform Resource Locators (URLs).
184 Citations
20 Claims
1. A method for searching input data for textual phrases, the method comprising:
providing a system having an external memory containing a first dictionary of first textual phrases and a cache memory containing a second dictionary of second textual phrases, wherein the cache memory has a faster access speed than the external memory, and wherein the second dictionary represents the first dictionary but has a smaller data size than the first dictionary because the second textual phrases are sub-strings derived from the first textual phrases that are shorter than the first textual phrases;
receiving input data using the system;
searching the input data with the second dictionary;
in response to identifying in the input data a second textual phrase from the second dictionary, locating in the input data a first textual phrase from the first dictionary corresponding to the identified second textual phrase; and
using the located first textual phrase to perform one of data leakage prevention, intrusion detection, intrusion prevention, spam e-mail detection, or detection of inappropriate content.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9
10. A system for searching input data for textual phrases, the system comprising:
an external memory containing a first dictionary of first textual phrases;
a cache memory containing a second dictionary of second textual phrases, wherein the cache memory has a faster access speed than the external memory, and wherein the second dictionary represents the first dictionary but has a smaller data size than the first dictionary because the second textual phrases are sub-strings derived from the first textual phrases that are shorter than the first textual phrases;
a network interface card (NIC) that receives input data from a network; and
a processor that is communicatively coupled to the external memory, the cache memory, and the NIC, wherein the processor is configured by software to:
receive the input data from the NIC,
search the input data with the second dictionary,
in response to identifying in the input data a second textual phrase from the second dictionary, locating in the input data a first textual phrase from the first dictionary corresponding to the identified second textual phrase, and
using the located first textual phrase to perform one of data leakage prevention, intrusion detection, intrusion prevention, spam e-mail detection, or detection of inappropriate content.
Dependent claims: 11, 12, 13, 14, 15, 16, 17, 18, 19, 20
Specification