SYSTEM AND METHOD FOR KEYWORD SPOTTING USING REPRESENTATIVE DICTIONARY
Abstract
Methods and systems for keyword spotting, i.e., for identifying textual phrases of interest in input data. In the embodiments described herein, the input data comprises communication packets exchanged in a communication network. The disclosed keyword spotting techniques can be used, for example, in applications such as Data Leakage Prevention (DLP), Intrusion Detection Systems (IDS) or Intrusion Prevention Systems (IPS), and spam e-mail detection. A keyword spotting system holds a dictionary of textual phrases for searching input data. In a communication analytics system, for example, the dictionary defines textual phrases to be located in communication packets—such as e-mail addresses or Uniform Resource Locators (URLs).
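
As a rough illustration of the keyword spotting summarized above, the sketch below scans a packet payload for every phrase in a single dictionary. The function name `spot_keywords`, the sample dictionary, and the sample payload are hypothetical and do not come from the patent; this is a naive baseline, not the claimed two-dictionary method.

```python
from typing import Iterable


def spot_keywords(payload: bytes, dictionary: Iterable[str]) -> list[tuple[str, int]]:
    """Return (phrase, offset) for every dictionary phrase found in the payload."""
    # Assumption: payloads are text-like; undecodable bytes are replaced.
    text = payload.decode("utf-8", errors="replace")
    hits = []
    for phrase in dictionary:
        start = text.find(phrase)
        while start != -1:
            hits.append((phrase, start))
            start = text.find(phrase, start + 1)
    # Report hits in the order they appear in the payload.
    return sorted(hits, key=lambda hit: hit[1])


# Hypothetical dictionary of e-mail addresses and URLs of interest.
DICTIONARY = ["alice@example.com", "http://example.org/login"]

packet = b"From: alice@example.com\r\nSee http://example.org/login now"
print(spot_keywords(packet, DICTIONARY))
```

Scanning with the full dictionary on every packet is what motivates a smaller, representative dictionary when the full one is too large for fast memory.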
20 Claims
1. A method for searching input data for textual phrases, the method comprising:
- providing a system having an external memory containing a first dictionary of first textual phrases and a cache memory containing a second dictionary of second textual phrases, wherein the cache memory has a faster access speed than the external memory, and wherein the second dictionary represents the first dictionary but has a smaller data size than the first dictionary because the second textual phrases are sub-strings derived from the first textual phrases that are shorter than the first textual phrases;
- receiving input data using the system;
- searching the input data with the second dictionary; and
- in response to identifying in the input data a second textual phrase from the second dictionary, locating in the input data a first textual phrase from the first dictionary corresponding to the identified second textual phrase.

10. A system for searching input data for textual phrases, the system comprising:
- an external memory containing a first dictionary of first textual phrases;
- a cache memory containing a second dictionary of second textual phrases, wherein the cache memory has a faster access speed than the external memory, and wherein the second dictionary represents the first dictionary but has a smaller data size than the first dictionary because the second textual phrases are sub-strings derived from the first textual phrases that are shorter than the first textual phrases;
- a network interface card (NIC) that receives input data from a network; and
- a processor that is communicatively coupled to the external memory, the cache memory, and the NIC, wherein the processor is configured by software to:
  - receive the input data from the NIC,
  - search the input data with the second dictionary, and
  - in response to identifying in the input data a second textual phrase from the second dictionary, locate in the input data a first textual phrase from the first dictionary corresponding to the identified second textual phrase.
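
The two-stage search recited in the independent claims can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the names `build_representative` and `search` are hypothetical, the sub-string is assumed to be a fixed-length prefix of each phrase (the claims only require a shorter derived sub-string), and plain Python dictionaries stand in for the cache memory and the external memory, since Python gives no control over where data is placed.

```python
def build_representative(first_dictionary, sub_len=4):
    """Map each short sub-string to the full phrases it was derived from.

    Assumption: the sub-string is the first `sub_len` characters of the
    phrase. Every phrase must be longer than `sub_len` so that the derived
    sub-string is strictly shorter than the phrase it represents.
    """
    second = {}
    for phrase in first_dictionary:
        if len(phrase) <= sub_len:
            raise ValueError(f"phrase too short to shorten: {phrase!r}")
        second.setdefault(phrase[:sub_len], []).append(phrase)
    return second


def search(data, second_dictionary):
    """Search with the small dictionary; check full phrases only on a hit."""
    matches = []
    for sub, phrases in second_dictionary.items():
        pos = data.find(sub)
        while pos != -1:
            # Stage 2: a sub-string hit triggers a lookup of the full
            # phrases (standing in for the slower external memory), which
            # are then verified at this offset in the input data.
            for phrase in phrases:
                if data.startswith(phrase, pos):
                    matches.append((phrase, pos))
            pos = data.find(sub, pos + 1)
    return sorted(matches, key=lambda match: match[1])


# Hypothetical first dictionary of full phrases.
first = ["alice@example.com", "alice@example.org", "http://example.org/login"]
second = build_representative(first)

data = "mail alice@example.org or visit http://example.net/"
print(search(data, second))
```

In this sample input the sub-string `http` is found, but the full URL does not match at that offset, so only the e-mail address is reported. A sub-string hit is therefore a candidate that must be confirmed against the first dictionary.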
Specification