System and method for detecting and decoding semantically encoded natural language messages
First Claim
1. A method for detecting semantically encoded natural language in textual input data, comprising:
segmenting the textual input data into a plurality of token linguistic units to define a linguistic event;
assigning a context to one or more of the token linguistic units in the linguistic event;
computing a score for the one or more token linguistic units in the linguistic event that are assigned a context;
applying a predetermined threshold to the score of the one or more token linguistic units to determine whether their use in the linguistic event in their assigned contexts is implausible;
estimating covert meanings of token linguistic units identified as being below the predetermined threshold; and
detecting semantically encoded natural language based on replacing all occurrences, within the linguistic event, of the token linguistic units identified as being below the predetermined threshold with the estimated covert meaning of the token linguistic units identified as being below the predetermined threshold;
wherein the computed scores for the one or more token linguistic units indicate whether the token linguistic units are expected to appear with their assigned contexts in the linguistic event; and
wherein each context, which is assigned to a token linguistic unit in the linguistic event, has at least one linguistic relation that relates the token linguistic unit in the linguistic event and at least one other token linguistic unit in the linguistic event.
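The first four steps of the claim (segmenting, assigning contexts, scoring, and thresholding) can be illustrated with a minimal Python sketch. Everything specific in it is an assumption made for illustration and is not taken from the patent: the tiny `REFERENCE` corpus, the `STOPWORDS` list, the choice of same-sentence co-occurrence as the linguistic relation, the score (the fraction of a token's relations that were observed in the reference corpus), and the threshold of 0.5.

```python
from itertools import combinations

# Hypothetical stopword list and reference corpus (illustrative assumptions only).
STOPWORDS = {"the", "will", "from", "to", "by", "with", "each", "in"}
REFERENCE = [
    "the courier will ship the parcel from the warehouse",
    "ship the parcel to the customer by truck",
    "the warehouse loaded the truck with each parcel",
    "the baker put the cake in the oven",
]

def segment(text):
    """Split text into token linguistic units (here: lowercase non-stopwords)."""
    return [w for w in text.lower().split() if w not in STOPWORDS]

def build_model(corpus):
    """Collect every unordered pair of tokens that co-occur in a reference sentence."""
    seen = set()
    for sentence in corpus:
        for a, b in combinations(set(segment(sentence)), 2):
            seen.add(frozenset((a, b)))
    return seen

def assign_contexts(tokens):
    """Context of a token: its co-occurrence relations with the other tokens in the event."""
    units = set(tokens)
    return {t: units - {t} for t in units}

def score(token, context, seen):
    """Fraction of the token's relations that were observed in the reference corpus."""
    hits = sum(frozenset((token, other)) in seen for other in context)
    return hits / len(context)

def detect(text, seen, threshold=0.5):
    """Score every token that has a context and flag those scoring below the threshold."""
    contexts = assign_contexts(segment(text))
    scores = {t: score(t, c, seen) for t, c in contexts.items() if c}
    flagged = sorted(t for t, s in scores.items() if s < threshold)
    return scores, flagged

model = build_model(REFERENCE)
scores, flagged = detect("ship the cake from the warehouse by truck", model)
print(flagged)  # ['cake']
```

In this toy event, "cake" never co-occurs with "ship", "warehouse", or "truck" in the reference sentences, so its score is 0 and it falls below the threshold. The other three tokens each score 2/3 and are treated as plausible.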
Abstract
A system detects and decodes semantic camouflage in natural language messages. The system is adapted to identify entities such as words or phrases in overt messages that are being used to disguise different and unrelated entities or concepts. The system automatically determines the semantic plausibility of the overt message and identifies entities that appear in implausible contexts. In addition, the system automatically estimates covert meanings for the entities identified in the overt message that appear in implausible contexts.
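As a rough illustration of the covert-meaning estimation described in the abstract, the sketch below picks, for a flagged entity, the replacement word that fits the remaining context best and then substitutes it for every occurrence. The `SEEN` table of plausible word pairs, the function names, and the "best-fitting vocabulary word" heuristic are all hypothetical choices for this example. The source does not specify how the estimate is computed.

```python
# Hypothetical table of unordered word pairs assumed to co-occur in normal usage.
SEEN = {frozenset(p) for p in [
    ("ship", "parcel"), ("ship", "warehouse"), ("ship", "truck"),
    ("parcel", "warehouse"), ("parcel", "truck"), ("warehouse", "truck"),
    ("courier", "ship"), ("courier", "parcel"), ("courier", "warehouse"),
    ("baker", "cake"), ("cake", "oven"), ("baker", "oven"),
]}

def plausibility(word, context):
    """Fraction of context words that the candidate is known to co-occur with."""
    if not context:
        return 0.0
    return sum(frozenset((word, c)) in SEEN for c in context) / len(context)

def estimate_covert_meaning(flagged, tokens):
    """Pick the vocabulary word, not already in the message, that best fits the context."""
    context = set(tokens) - {flagged}
    vocabulary = {w for pair in SEEN for w in pair}
    candidates = sorted(vocabulary - set(tokens))
    return max(candidates, key=lambda w: plausibility(w, context))

def decode(tokens, flagged_units):
    """Replace all occurrences of each flagged unit with its estimated covert meaning."""
    mapping = {f: estimate_covert_meaning(f, tokens) for f in flagged_units}
    return [mapping.get(t, t) for t in tokens]

message = ["ship", "cake", "truck", "warehouse", "cake"]
print(decode(message, ["cake"]))
# ['ship', 'parcel', 'truck', 'warehouse', 'parcel']
```

Here "parcel" co-occurs with all three context words, while "courier" matches two and "baker" and "oven" match none, so "parcel" is chosen. A fuller version might re-score the decoded message to check that the substitution actually makes it plausible.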
16 Claims
1. A method for detecting semantically encoded natural language in textual input data, comprising:
segmenting the textual input data into a plurality of token linguistic units to define a linguistic event;
assigning a context to one or more of the token linguistic units in the linguistic event;
computing a score for the one or more token linguistic units in the linguistic event that are assigned a context;
applying a predetermined threshold to the score of the one or more token linguistic units to determine whether their use in the linguistic event in their assigned contexts is implausible;
estimating covert meanings of token linguistic units identified as being below the predetermined threshold; and
detecting semantically encoded natural language based on replacing all occurrences, within the linguistic event, of the token linguistic units identified as being below the predetermined threshold with the estimated covert meaning of the token linguistic units identified as being below the predetermined threshold;
wherein the computed scores for the one or more token linguistic units indicate whether the token linguistic units are expected to appear with their assigned contexts in the linguistic event; and
wherein each context, which is assigned to a token linguistic unit in the linguistic event, has at least one linguistic relation that relates the token linguistic unit in the linguistic event and at least one other token linguistic unit in the linguistic event.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9
10. An apparatus for detecting semantically encoded natural language in textual input data, comprising:
a segmentation module for segmenting the textual input data into a plurality of token linguistic units to define a linguistic event;
a context computation module for assigning a context to one or more of the token linguistic units in the linguistic event;
a score computation module for computing a score for the one or more token linguistic units in the linguistic event that are assigned a context;
a context verification module for applying a predetermined threshold to the score of the one or more token linguistic units to determine whether their use in the linguistic event in their assigned contexts is implausible; and
a covert meaning estimation module for estimating covert meanings of token linguistic units identified as being below the predetermined threshold and the covert meaning estimation module detecting semantically encoded natural language based on replacing all occurrences, within the linguistic event, of the token linguistic units identified as being below the predetermined threshold with the estimated covert meaning of the token linguistic units identified as being below the predetermined threshold;
wherein the computed scores for the one or more token linguistic units indicate whether the token linguistic units are expected to appear with their assigned contexts in the linguistic event; and
wherein each context, which is assigned to a token linguistic unit in the linguistic event by the context computation module, has at least one linguistic relation that relates the token linguistic unit in the linguistic event and at least one other token linguistic unit in the linguistic event.
Dependent claims: 11, 12, 13, 14, 15
16. A memory device for storing a set of program instructions executable on a data processing device and usable for detecting semantically encoded natural language in textual input data, the set of program instructions comprising instructions for:
segmenting the textual input data into a plurality of token linguistic units to define a linguistic event;
assigning a context to one or more of the token linguistic units in the linguistic event;
computing a score for the one or more token linguistic units in the linguistic event that are assigned a context;
applying a predetermined threshold to the score of the one or more token linguistic units to determine whether their use in the linguistic event in their assigned contexts is implausible;
estimating covert meanings of token linguistic units identified as being below the predetermined threshold; and
detecting semantically encoded natural language based on replacing all occurrences, within the linguistic event, of the token linguistic units identified as being below the predetermined threshold with the estimated covert meaning of the token linguistic units identified as being below the predetermined threshold;
wherein the computed scores for the one or more token linguistic units indicate whether the token linguistic units are expected to appear with their assigned contexts in the linguistic event; and
wherein each context, which is assigned to a token linguistic unit in the linguistic event, has at least one linguistic relation that relates the token linguistic unit in the linguistic event and at least one other token linguistic unit in the linguistic event.
Specification