Data mining system for screening terrorist attack event crime groups

  • CN 109,657,011 B
  • Filed: 11/26/2018
  • Issued: 10/01/2021
  • Est. Priority Date: 11/26/2018
  • Status: Active Grant
First Claim

1. A data mining system for screening criminal groups in terrorist attack events, characterized by comprising:

  • a first processing module;

    processing the historical data of each historical event to obtain a plurality of historical data points;

    each historical event has a unique number;

    the first processing module comprises:

    a feature extraction submodule for the historical data, a hazard level classification submodule for the historical data, and a crime motivation quantization submodule for the historical data;

    each historical data point is an N × 1 vector formed from the feature values of the features extracted from the historical data of the corresponding historical event, the classification level, and the crime motivation quantization value;

    the crime motivation quantization submodule for the historical data of the first processing module comprises:

    a first word segmentation unit;

    collecting, as the crime motivation, historical data recorded in the form of English text, segmenting the crime motivation of the historical data into words, and deleting non-text content from the historical data using a regular expression in Python, to obtain a preliminarily processed set F1;
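
The segmentation and cleaning step above might be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `segment_motivation`, the sample sentence, and the choice to treat everything other than ASCII letters as non-text content are all assumptions.

```python
import re

def segment_motivation(text):
    """Split English motivation text into word tokens, dropping non-text content."""
    # Assumption: digits, punctuation and symbols count as "non-text" here,
    # so every character that is not an ASCII letter or whitespace becomes a space.
    cleaned = re.sub(r"[^A-Za-z\s]", " ", text)
    # str.split() with no arguments splits on runs of whitespace
    # and never yields empty strings.
    return cleaned.split()

# Hypothetical motivation text, used only for illustration.
F1 = segment_motivation("Retaliation for the 2003 arrests; claimed by e-mail (unverified).")
print(F1)
```

Note that this simple rule also splits hyphenated words such as "e-mail" into two tokens; a real system would need to decide how such cases are handled.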

    a first spell check correction unit;

    checking, using the pyenchant package in Python, whether the words in set F1 are spelled correctly, replacing each misspelled word with the correctly spelled word, and finally obtaining a correctly spelled data set F2;
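
The check-and-correct idea can be illustrated with a small sketch. The claim names the pyenchant package; to stay free of external dependencies, the sketch below swaps in `difflib` from the Python standard library together with a tiny hypothetical vocabulary, so it shows the principle rather than the pyenchant API. The names `VOCABULARY` and `correct_spelling` and the similarity cutoff of 0.8 are assumptions.

```python
import difflib

# Hypothetical reference vocabulary; a real system would use a full dictionary.
VOCABULARY = ["attack", "government", "protest", "retaliation", "against"]

def correct_spelling(words, vocabulary=VOCABULARY, cutoff=0.8):
    """Replace each word missing from the vocabulary by its closest match, if any."""
    corrected = []
    for word in words:
        lower = word.lower()
        if lower in vocabulary:
            corrected.append(word)  # already spelled correctly
            continue
        matches = difflib.get_close_matches(lower, vocabulary, n=1, cutoff=cutoff)
        # Keep the original word when no candidate is similar enough.
        corrected.append(matches[0] if matches else word)
    return corrected

F2 = correct_spelling(["retaliaton", "against", "goverment", "policy"])
print(F2)
```

Words with no sufficiently close candidate, such as "policy" here, are left unchanged rather than forced onto an unrelated entry.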

    a first lemmatization unit;

    using the WordNetLemmatizer class in the nltk package in Python to restore adjacent characters in data set F2 that can be combined into a word to the corresponding word, then using the API of the TextBlob library in Python to convert all words to lower case, and outputting a set F3 of restored single words;

    a first keyword extraction unit;

    vectorizing the data in set F3, and extracting keywords from the vectorization result using a K-means clustering algorithm;

    taking the historical data of each historical event as a data object, counting in turn the frequency of each distinct word in each data object as the corresponding feature value, and outputting the result in the form (P_i, b_ij, c), wherein P_i represents the i-th historical event in the set, b_ij represents the j-th word in the i-th historical event, and c represents the word frequency of the j-th word in the i-th historical event, the word frequencies of all words of each historical event being represented by a one-dimensional vector V_m;
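
A minimal sketch of this word-frequency step is given below. It assumes each event has already been reduced to a list of tokens; the function name `word_frequencies`, the 1-based event index standing in for P_i, and the use of a shared sorted vocabulary to fix the vector positions are assumptions made for illustration.

```python
from collections import Counter

def word_frequencies(events):
    """events: one token list per event. Returns (triples, vocabulary, vectors)."""
    counts = [Counter(tokens) for tokens in events]
    # Triples (i, word, c): 1-based event index, word, and its frequency in that event.
    triples = [(i, word, c)
               for i, counter in enumerate(counts, start=1)
               for word, c in sorted(counter.items())]
    # Assumption: a shared, sorted vocabulary fixes each word's position in every vector.
    vocabulary = sorted(set().union(*counts))
    # One frequency vector per event; a Counter returns 0 for absent words.
    vectors = [[counter[word] for word in vocabulary] for counter in counts]
    return triples, vocabulary, vectors

# Hypothetical token lists for two events.
events = [["attack", "government", "attack"], ["protest", "government"]]
triples, vocabulary, vectors = word_frequencies(events)
print(triples)
print(vectors)
```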

    performing K-means clustering on the one-dimensional vectors generated for all historical events, determining the optimal classification by repeatedly adjusting the value of K, finally generating K clusters, sorting the K clusters by cluster radius from largest to smallest, and then assigning values to the sorted clusters in order, wherein the assigned values are successively decreasing natural numbers, and each assigned value represents the crime motivation value of the historical data of the corresponding historical event;
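
The clustering and scoring step might look like the sketch below, written in plain Python so that it is self-contained. Several points are assumptions rather than details from the claim: K is passed in as a fixed value instead of being tuned, the first K points seed the centroids, the cluster radius is taken as the largest member-to-centroid distance, and the assigned values run from K down to 1.

```python
import math

def kmeans(points, k, iterations=50):
    """Plain Lloyd's algorithm; the first k points seed the centroids (a simplification)."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                new_centroids.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centroids.append(centroids[c])  # keep the centroid of an empty cluster
        if new_centroids == centroids:
            break  # converged
        centroids = new_centroids
    return labels, centroids

def motivation_scores(points, k):
    """Score each point by the radius rank of its cluster: largest radius gets k."""
    labels, centroids = kmeans(points, k)
    # Assumption: radius = largest distance from a member to its centroid.
    radii = [max((math.dist(p, centroids[c])
                  for p, label in zip(points, labels) if label == c), default=0.0)
             for c in range(k)]
    # Sort clusters from largest to smallest radius and assign k, k-1, ..., 1.
    order = sorted(range(k), key=lambda c: radii[c], reverse=True)
    score_of_cluster = {c: k - rank for rank, c in enumerate(order)}
    return [score_of_cluster[label] for label in labels]

# Hypothetical two-dimensional frequency vectors for four events.
scores = motivation_scores([[0, 0], [10, 0], [0, 4], [10, 1]], k=2)
print(scores)
```

In this toy input the cluster containing the first and third points has the larger radius, so those events receive the higher value.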

    a second processing module;

    processing the data to be detected of the event to be predicted to obtain a data point to be detected;

    the second processing module comprises:

    a feature extraction submodule for the data to be detected, a hazard level classification submodule for the data to be detected, and a crime motivation quantization submodule for the data to be detected;

    the data point to be detected is an N × 1 vector formed from the feature values of the features extracted from the data to be detected of the event to be predicted, the classification level, and the crime motivation quantization value;

    the crime motivation quantization submodule for the data to be detected of the second processing module comprises:

    a second word segmentation unit;

    segmenting into words the crime motivation of the historical data and of the data to be detected, respectively, and deleting non-text content in the historical data using a regular expression in Python, to obtain a preliminarily processed set F1;

    a second spell check correction unit;

    checking, using the pyenchant package in Python, whether the words in set F1 are spelled correctly, and finally obtaining a correctly spelled data set F2;

    a second lemmatization unit;

    using the WordNetLemmatizer class in the nltk package in Python to restore adjacent characters in data set F2 that can be combined into a word to the corresponding word, then using the API of the TextBlob library in Python to convert all words to lower case, and outputting a set F3 of restored single words;

    a second keyword extraction unit;

    vectorizing the data in set F3, and extracting keywords from the vectorization result using a K-means clustering algorithm;

    taking the crime motivation data of each event as a data object, counting in turn the frequency of each distinct word in each data object as the corresponding feature value, and outputting the result in the form (P_i, b_ij, c), wherein P_i represents the i-th event in the set, b_ij represents the j-th word in the i-th event, and c represents the word frequency of the j-th word in the i-th event, the word frequencies of all words of each event being represented by a one-dimensional vector V_m;

    performing K-means clustering on the one-dimensional vectors generated for all events, determining the optimal classification by repeatedly adjusting the value of K, finally generating K clusters, sorting the K clusters by cluster radius from largest to smallest, and then assigning values to the sorted clusters in order, wherein the assigned values are successively decreasing natural numbers, and each assigned value represents the crime motivation value of the corresponding event;

    finally, outputting the crime motivation score of the event to be detected;

    a dimension reduction module;

    performing dimensionality reduction processing on all historical data points obtained by the first processing module;

    performing subspace clustering on the result subjected to the dimensionality reduction processing, and obtaining historical data clusters with different dimensionalities through the subspace clustering;

    an output module;

    calculating the distance between the data point to be detected and each data object in each historical data cluster, and taking the distance from the data point to be detected to the nearest data object in a cluster as the distance from the data point to be detected to that cluster, wherein if the distance from the data point to be detected to the cluster is within a set range, the matching succeeds, and otherwise the matching fails; and

    finally, for the data points to be detected that are successfully matched, sorting them by their distance to the cluster from smallest to largest, outputting the names of the top M crime groups as output values, thereby mining the closest crime groups from the historical crime groups, and outputting them to the relevant security department, so as to provide auxiliary data support for solving terrorist attack cases as quickly as possible.
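
The matching and ranking performed by the output module can be sketched as follows. The sketch assumes that each historical cluster is labelled with a single crime group name and that Euclidean distance is used; neither is stated in the claim. The function name `rank_groups`, the group names, and the sample points are hypothetical.

```python
import math

def rank_groups(query, clusters, max_distance, top_m):
    """clusters: dict mapping a group name to the non-empty list of its data points."""
    matches = []
    for name, members in clusters.items():
        # Distance to a cluster = distance to its nearest member.
        distance = min(math.dist(query, p) for p in members)
        if distance <= max_distance:  # within the set range: the match succeeds
            matches.append((distance, name))
    # Sort successful matches from smallest to largest distance, keep the top M names.
    matches.sort()
    return [name for _, name in matches[:top_m]]

# Hypothetical clusters and query point.
clusters = {
    "Group A": [[0, 0], [1, 1]],
    "Group B": [[5, 5]],
    "Group C": [[2, 0]],
}
top = rank_groups([0.5, 0], clusters, max_distance=2.0, top_m=2)
print(top)
```

Here "Group B" lies outside the set range and is dropped before ranking, while the two remaining groups are returned in order of increasing distance.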
