Systems and methods for efficient keyword spotting in communication traffic
Abstract
Methods and systems related to keyword searching processes. A list of keywords may be first represented by a set of short substrings. The substrings are selected such that an occurrence of a substring indicates a possible occurrence of one or more of the keywords. Input data may be initially pre-processed, so as to identify locations in the input data in which the substrings occur. Then, the identified locations are searched for occurrences of the actual keywords. The pre-processing scheme enables the keyword search process to search only in the identified locations of the substrings instead of over the entire input data.
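The two-stage scheme described in the abstract can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes each keyword is represented by its first two bytes, and the names `build_substrings`, `candidate_locations`, and `find_keywords` are hypothetical.

```python
from typing import Dict, List, Tuple


def build_substrings(keywords: List[bytes], n: int = 2) -> Dict[bytes, List[bytes]]:
    """Map each n-byte prefix (an assumed choice of substring) to its keywords."""
    table: Dict[bytes, List[bytes]] = {}
    for kw in keywords:
        if len(kw) < n:
            raise ValueError(f"keyword {kw!r} is shorter than {n} bytes")
        table.setdefault(kw[:n], []).append(kw)
    return table


def candidate_locations(data: bytes, table: Dict[bytes, List[bytes]], n: int = 2) -> List[int]:
    """Pre-processing pass: positions where any substring occurs."""
    return [i for i in range(len(data) - n + 1) if data[i:i + n] in table]


def find_keywords(data: bytes, keywords: List[bytes], n: int = 2) -> List[Tuple[int, bytes]]:
    """Second pass: check full keywords only at the candidate positions."""
    table = build_substrings(keywords, n)
    hits = []
    for i in candidate_locations(data, table, n):
        for kw in table[data[i:i + n]]:
            if data.startswith(kw, i):
                hits.append((i, kw))
    return hits


data = b"xx alpha yy beta alp"
print(find_keywords(data, [b"alpha", b"beta"]))
```

In this example the trailing `alp` produces a candidate location but no keyword match, which shows why a substring occurrence indicates only a possible keyword occurrence.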
14 Claims
1. A method, comprising:
- identifying substrings from within keywords, wherein each keyword comprises a string;
- caching a set of flags, each flag indicating whether a respective substring from the identified substrings occurs in one or more of the keywords, in an internal cache memory of a processor device;
- identifying, using the processor device, locations in input data in which the substrings occur by comparing the input data with the cached flags; and
- searching at the identified locations for occurrences of the keywords, so as to find at least one of the keywords in the input data,
- wherein each flag indicates whether the respective substring occurs in at least one of multiple predefined offsets within the one or more keywords,
- wherein the input data comprises received communication network traffic,
- wherein the input data further comprises multiple data packets,
- wherein identifying the locations comprises identifying a subset of the data packets in which the substrings occur, and wherein searching at the identified locations comprises searching in the identified subset of the data packets; and
- further wherein incoming Real Time Protocol (RTP) traffic, traffic that is identified as being encrypted, or traffic associated with any other suitable application or protocol is discarded.
Dependent claims: 2, 3, 4, 5, 6, 7
8. Apparatus, comprising:
- an interface, which is configured to receive input data; and
- a hardware processor, which comprises an internal cache memory and is configured to identify substrings from within keywords, wherein each keyword comprises a string, to cache in the internal cache memory a set of flags, such that each flag indicates whether a respective substring occurs in one or more of the keywords, to identify locations in the input data in which the substrings occur by comparing the input data with the cached flags, and to search at the identified locations for occurrences of the keywords, so as to find at least one of the keywords in the input data,
- wherein each flag indicates whether the respective substring occurs in at least one of multiple predefined offsets within the one or more keywords,
- wherein the input data comprises received communication network traffic,
- wherein the input data further comprises multiple data packets,
- wherein identifying the locations comprises identifying a subset of the data packets in which the substrings occur, and wherein searching at the identified locations comprises searching in the identified subset of the data packets, and
- further wherein incoming Real Time Protocol (RTP) traffic, traffic that is identified as being encrypted, or traffic associated with any other suitable application or protocol is discarded.
Dependent claims: 9, 10, 11, 12, 13, 14
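The flag table and packet-subset steps recited in claims 1 and 8 can be illustrated with a short sketch. Everything specific here is an assumption made for illustration: two-byte substrings, the offsets `(0, 1, 2)`, a 64 KiB `bytearray` as the flag set, and the function names. The claims do not say how RTP or encrypted traffic is recognized, so the `discard` predicate is left to the caller. Python also gives no control over cache placement; the table is merely small enough that a low-level implementation could plausibly keep it in on-chip cache.

```python
from typing import Callable, List, Tuple

SUB_LEN = 2          # assumed substring length
OFFSETS = (0, 1, 2)  # assumed predefined offsets within a keyword


def build_flags(keywords: List[bytes]) -> bytearray:
    """One flag per possible 2-byte substring, set if the substring occurs
    at one of the predefined offsets in at least one keyword."""
    flags = bytearray(1 << (8 * SUB_LEN))
    for kw in keywords:
        if len(kw) < SUB_LEN:
            raise ValueError(f"keyword {kw!r} is shorter than {SUB_LEN} bytes")
        for off in OFFSETS:
            sub = kw[off:off + SUB_LEN]
            if len(sub) == SUB_LEN:
                flags[int.from_bytes(sub, "big")] = 1
    return flags


def packet_has_candidate(packet: bytes, flags: bytearray) -> bool:
    """Compare the packet against the flags, two bytes at a time."""
    for i in range(len(packet) - SUB_LEN + 1):
        if flags[int.from_bytes(packet[i:i + SUB_LEN], "big")]:
            return True
    return False


def search_packets(
    packets: List[bytes],
    keywords: List[bytes],
    discard: Callable[[bytes], bool] = lambda p: False,
) -> List[Tuple[int, int, bytes]]:
    """Return (packet index, position, keyword) for every keyword occurrence,
    searching only packets that pass the discard check and the flag check."""
    flags = build_flags(keywords)
    hits = []
    for idx, pkt in enumerate(packets):
        if discard(pkt):  # e.g. traffic classified as RTP or encrypted
            continue
        if not packet_has_candidate(pkt, flags):
            continue      # packet is outside the identified subset
        for kw in keywords:
            pos = pkt.find(kw)
            while pos != -1:
                hits.append((idx, pos, kw))
                pos = pkt.find(kw, pos + 1)
    return hits


packets = [b"hello world", b"\x80RTP alpha", b"nothing here", b"an alpha and beta"]
is_rtp_like = lambda p: p[:1] == b"\x80"  # toy stand-in for a real classifier
print(search_packets(packets, [b"alpha", b"beta"], discard=is_rtp_like))
```

Because offset 0 is among the assumed offsets, every keyword occurrence sets off at least one flag, so the flag check cannot drop a packet that contains a keyword. It can still pass packets that contain only a substring, which the full search then rejects.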
Specification