Self-organizing neural network for plain text categorization
Abstract
A method and system for natural language processing including using a trained neural network having a plurality of baseline nodes. A connection weight between any selected pair of baseline nodes is described by text strings within the selected pair of nodes. A plurality of text messages are received from a preselected source. For each received message, a non-baseline node associated with a selected one of the baseline nodes is created. A connection weight between any non-baseline node and the associated baseline node is described by the text string within the baseline node and the received text message.
25 Claims
1. A method for natural language processing comprising the steps of:
training a neural network comprising a plurality of baseline nodes, wherein a connection weight between any selected pair of baseline nodes is determined from text strings within the selected pair of nodes;
receiving a plurality of text messages from a preselected source;
for each received message, creating a non-baseline node associated with a selected one of the baseline nodes wherein a connection weight between any non-baseline node and the associated baseline node is determined from the text string within the baseline node and the received text message; and
identifying atypical received messages based upon the connection weight between any non-baseline node and the associated baseline node.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11
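The steps of claim 1 can be illustrated with a minimal Python sketch. The claim does not fix a formula for the connection weight, so this sketch assumes a stand-in similarity score (`difflib.SequenceMatcher.ratio`) and a hypothetical cutoff of 0.5; the names `NonBaselineNode`, `connection_weight`, `attach`, and `atypical` are illustrative only and do not come from the patent.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass
class NonBaselineNode:
    text: str      # the received message
    baseline: str  # text string of the associated baseline node
    weight: float  # connection weight to that baseline node


def connection_weight(baseline_text: str, message: str) -> float:
    # Assumed stand-in: a similarity in [0, 1] computed from the two strings.
    return SequenceMatcher(None, baseline_text, message).ratio()


def attach(message: str, baselines: list[str]) -> NonBaselineNode:
    # Associate the message with the baseline node it is most similar to.
    best = max(baselines, key=lambda b: connection_weight(b, message))
    return NonBaselineNode(message, best, connection_weight(best, message))


def atypical(nodes: list[NonBaselineNode], threshold: float = 0.5) -> list[NonBaselineNode]:
    # Assumed rule: a low weight to the associated baseline marks a message as atypical.
    return [n for n in nodes if n.weight < threshold]


baselines = ["server restarted normally", "user login succeeded"]
messages = ["user login succeeded", "kernel panic at 0xdeadbeef"]
nodes = [attach(m, baselines) for m in messages]
flagged = atypical(nodes)
```

In this sketch, a message identical to a baseline string gets weight 1.0 and is never flagged, while a message unlike every baseline string falls below the cutoff.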
12. A method for natural language processing comprising the steps of:
training a neural network comprising a plurality of baseline nodes, wherein a connection weight between any selected pair of baseline nodes is determined from text strings within the selected pair of nodes;
receiving a plurality of text messages from a preselected source;
for each received message, creating a non-baseline node associated with a selected one of the baseline nodes wherein a connection weight between any non-baseline node and the associated baseline node is determined from the text string within the baseline node and the received text message; and
identifying atypical received text messages by calculating a distance between each non-baseline node and its associated baseline node such that greater calculated distances indicate atypical data.
Dependent claims: 13
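The distance-based identification in claim 12 can be sketched as follows. The claim does not state the distance formula, so the sketch assumes a normalized distance built from the longest common subsequence (1 minus the LCS length divided by the longer string length); `lcs_length`, `lcs_distance`, and `rank_by_distance` are hypothetical names.

```python
def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence, by row-wise dynamic programming."""
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def lcs_distance(baseline_text: str, message: str) -> float:
    # Assumed distance: 0.0 for identical strings, 1.0 for strings sharing no characters.
    longest = max(len(baseline_text), len(message))
    if longest == 0:
        return 0.0
    return 1.0 - lcs_length(baseline_text, message) / longest


def rank_by_distance(pairs):
    """Sort (message, baseline_text) pairs so the greatest distances come first."""
    scored = [(lcs_distance(baseline, message), message) for message, baseline in pairs]
    return sorted(scored, reverse=True)


pairs = [
    ("disk usage at 42 percent", "disk usage at 40 percent"),
    ("unauthorized root shell opened", "disk usage at 40 percent"),
]
ranked = rank_by_distance(pairs)
```

Messages at the head of `ranked` are the candidates for atypical data, since greater calculated distances indicate atypical data under the claim.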
14. A method for natural language processing comprising the steps of:
training a neural network comprising a plurality of baseline nodes, wherein a connection weight between any selected pair of baseline nodes is determined from text strings within the selected pair of nodes;
receiving a plurality of text messages from a preselected source;
for each received message, creating a non-baseline node associated with a selected one of the baseline nodes wherein a connection weight between any non-baseline node and the associated baseline node is determined from the text string within the baseline node and the received text message; and
analyzing atypical received text messages by determining a magnitude of a frequency of associations of non-baseline nodes with a selected baseline node.
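One possible reading of the frequency analysis in claim 14 is a simple count of how many non-baseline nodes attach to each baseline node. The sketch below assumes that reading; the claim does not say whether the magnitude is a raw count or a share of all messages, so both are shown. The function names and the `(message, baseline_id)` pair layout are hypothetical.

```python
from collections import Counter


def association_counts(assignments):
    """Count how many non-baseline nodes attach to each baseline node.

    `assignments` is an iterable of (message, baseline_id) pairs.
    """
    return Counter(baseline_id for _, baseline_id in assignments)


def association_share(assignments, baseline_id):
    """Fraction of all received messages attached to one selected baseline node."""
    counts = association_counts(assignments)
    total = sum(counts.values())
    return counts[baseline_id] / total if total else 0.0


assignments = [("m1", 0), ("m2", 0), ("m3", 1), ("m4", 0)]
counts = association_counts(assignments)
share_of_node_0 = association_share(assignments, 0)
```

A count or share that departs sharply from what the selected baseline node usually attracts could then be treated as a signal worth inspecting.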
15. A computer program product comprising:
a computer usable medium having computer readable code embodied therein for natural language processing, the computer program product comprising:
computer readable program code devices configured to cause a computer to effect a trained neural network comprising a plurality of baseline nodes, wherein each baseline node has a weight that is represented by a text string within the node;
computer readable program code devices configured to cause a computer to effect receiving a plurality of text messages from a preselected source;
computer readable program code devices configured to cause a computer to effect for each received message, creating a non-baseline node having a weight that is represented by the received text message;
computer readable program code devices configured to cause a computer to effect associating each non-baseline node with a selected one of the baseline nodes; and
computer readable program code devices configured to cause a computer to effect identifying atypical received text based upon a difference between the non-baseline node associated with the received message and the baseline node to which the non-baseline node is associated.
Dependent claims: 16
17. A computer program product comprising:
a computer usable medium having computer readable code embodied therein for natural language processing, the computer program product comprising:
computer readable program code devices configured to cause a computer to effect a trained neural network comprising a plurality of baseline nodes, wherein each baseline node has a weight that is represented by a text string within the node;
computer readable program code devices configured to cause a computer to effect receiving a plurality of text messages from a preselected source;
computer readable program code devices configured to cause a computer to effect for each received message, creating a non-baseline node having a weight that is represented by the received text message;
computer readable program code devices configured to cause a computer to effect associating each non-baseline node with a selected one of the baseline nodes;
computer readable program code devices configured to cause a computer to effect a self-organizing neural network architecture comprising a one-dimensional circular array of baseline nodes;
computer readable program code devices configured to cause a computer to effect randomizing the weight within each of the baseline nodes;
computer readable program code devices configured to cause a computer to effect receiving a baseline text message from the preselected source;
computer readable program code devices configured to cause a computer to effect associating the received baseline text message with a closest one of the randomized baseline nodes wherein closeness is measured by a longest common sub-sequence algorithm;
computer readable program code devices configured to cause a computer to effect combining the received baseline text message with the associated baseline node using a genetic algorithm; and
computer readable program code devices configured to cause a computer to effect iteratively executing the program code effecting the receiving, the associating, and the combining until the neural network achieves a desired state of training.
Dependent claims: 18
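The training loop recited in claim 17 (randomize, receive, associate by longest common subsequence, combine, repeat) can be sketched roughly as below. Several details are assumptions, not taken from the claim: the genetic combination is reduced to a single-point crossover, a fixed number of epochs stands in for "a desired state of training", and the one-dimensional circular array is held as a plain list whose ring topology is not otherwise used. All function names are hypothetical.

```python
import random
import string


def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence of two strings."""
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if ch_a == ch_b else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def random_text(rng: random.Random, length: int) -> str:
    # Randomized initial weight: an arbitrary string of lowercase letters and spaces.
    return "".join(rng.choice(string.ascii_lowercase + " ") for _ in range(length))


def crossover(node_text: str, message: str, rng: random.Random) -> str:
    # Assumed genetic operator: head of the message joined to the tail of the node text.
    cut = rng.randint(0, len(message))
    return message[:cut] + node_text[cut:]


def train(messages, n_nodes=4, epochs=20, seed=0):
    rng = random.Random(seed)
    nodes = [random_text(rng, 12) for _ in range(n_nodes)]
    for _ in range(epochs):  # stand-in for "until a desired state of training"
        for msg in messages:
            # Closest node: the one sharing the longest common subsequence with the message.
            winner = max(range(n_nodes), key=lambda i: lcs_length(nodes[i], msg))
            nodes[winner] = crossover(nodes[winner], msg, rng)
    return nodes


trained = train(["disk usage normal", "user login ok"], n_nodes=3, epochs=2, seed=1)
```

Seeding the random generator makes a run repeatable, which helps when comparing different crossover rules or stopping criteria.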
19. A system for processing information to identify atypical information content of received digitized text information with respect to prior digitized text information, the system comprising:
a digital computer;
communication means for electronically collecting text information into the computer, the text organized in messages;
software operable in the computer for representing the prior digitized text information in a neural network array, the array comprising a plurality of baseline nodes wherein each baseline node occupies a unique location in a text hyperspace and each baseline node has a weight represented by text tokens within the baseline node wherein the text tokens are determined from the prior digitized text;
software operable in the computer for representing each received message as a non-baseline node having a position in the text hyperspace;
software operable in the computer for directly or indirectly associating each non-baseline node with one baseline node; and
software operable in the computer for evaluating relative positions of the non-baseline nodes in the text hyperspace to identify atypical received messages.
Dependent claims: 20
21. A system for processing information to identify atypical information content of received digitized text information with respect to prior digitized text information, the system comprising:
a digital computer;
communication means for electronically collecting text information into the computer, the text organized in messages;
software operable in the computer for representing the prior digitized text information in a neural network array, the array comprising a plurality of baseline nodes wherein each baseline node occupies a unique location in a text hyperspace and each baseline node has a weight represented by text tokens within the baseline node;
software operable in the computer for representing each received message as a non-baseline node having a position in the text hyperspace;
software operable in the computer for directly or indirectly associating each non-baseline node with one baseline node; and
software operable in the computer for evaluating relative positions of the non-baseline nodes in the text hyperspace to identify atypical received messages, wherein the software operable in the computer for evaluating relative positions calculates a metric score using the equation:
##EQU4##
wherein len(s1) is the length of text within the baseline node, len(s2) is the length of text within the non-baseline node, and len(LCS) is the length of the longest common sequence between s1 and s2.
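The three quantities named in claim 21, len(s1), len(s2), and len(LCS), can be computed as in the sketch below. The equation itself is not reproduced in this text, so the sketch deliberately stops at the inputs and does not guess how they are combined into the metric score. The function names are hypothetical.

```python
def lcs_table(s1: str, s2: str) -> list[list[int]]:
    """Full dynamic-programming table; table[i][j] is the LCS length of s1[:i] and s2[:j]."""
    table = [[0] * (len(s2) + 1) for _ in range(len(s1) + 1)]
    for i, ch1 in enumerate(s1, start=1):
        for j, ch2 in enumerate(s2, start=1):
            if ch1 == ch2:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])
    return table


def metric_inputs(s1: str, s2: str) -> tuple[int, int, int]:
    """Return (len(s1), len(s2), len(LCS)) for a baseline text s1 and a message text s2."""
    return len(s1), len(s2), lcs_table(s1, s2)[-1][-1]


inputs = metric_inputs("ABCBDAB", "BDCABA")
```

The table costs time and memory proportional to len(s1) times len(s2); a two-row variant gives the same length with less memory when only len(LCS) is needed.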
22. A system for processing information to identify atypical information content of received digitized text information with respect to prior digitized text information, the system comprising:
a digital computer;
communication means for electronically collecting text information into the computer, the text organized in messages;
software operable in the computer for representing the prior digitized text information in a neural network array, the array comprising a plurality of baseline nodes wherein each baseline node occupies a unique location in a text hyperspace and each baseline node has a weight represented by text tokens within the baseline node, wherein the software operable in the computer for representing the prior digitized text information in a neural network array further comprises software operable to implement a text domain one dimensional neural network array and the software operable for associating each non-baseline node with one baseline node operates to directly associate each non-baseline node with one of the baseline nodes;
software operable in the computer for representing each received message as a non-baseline node having a position in the text hyperspace;
software operable in the computer for directly or indirectly associating each non-baseline node with one baseline node; and
software operable in the computer for evaluating relative positions of the non-baseline nodes in the text hyperspace to identify atypical received messages.
23. A system for processing information to identify atypical information content of received digitized text information with respect to prior digitized text information, the system comprising:
a digital computer;
communication means for electronically collecting text information into the computer, the text organized in messages;
software operable in the computer for representing the prior digitized text information in a neural network array, the array comprising a plurality of baseline nodes wherein each baseline node occupies a unique location in a text hyperspace and each baseline node has a weight represented by text tokens within the baseline node, wherein the software operable in the computer for representing the prior digitized text information in a neural network array further comprises software operable to implement a text domain extensible minimum-spanning-tree (MST) neural network array and the software operable for associating each non-baseline node with one baseline node is operable to associate each non-baseline node with a baseline node or non-baseline node having the smallest relative position;
software operable in the computer for representing each received message as a non-baseline node having a position in the text hyperspace;
software operable in the computer for directly or indirectly associating each non-baseline node with one baseline node; and
software operable in the computer for evaluating relative positions of the non-baseline nodes in the text hyperspace to identify atypical received messages.
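The extensible tree of claim 23, in which a new node may attach to either a baseline node or an earlier non-baseline node, can be sketched as a greedy nearest-node attachment. This is an illustration under assumptions: it does not compute a full minimum spanning tree, the relative position is approximated by 1 minus `difflib.SequenceMatcher.ratio`, and the `TextTree` class is hypothetical.

```python
from difflib import SequenceMatcher


def distance(a: str, b: str) -> float:
    # Assumed stand-in for the relative position of two nodes in the text hyperspace.
    return 1.0 - SequenceMatcher(None, a, b).ratio()


class TextTree:
    def __init__(self, baselines):
        self.texts = list(baselines)
        self.parent = [None] * len(self.texts)  # baseline nodes are the roots

    def add(self, message: str) -> int:
        """Attach a message to the nearest existing node, baseline or non-baseline."""
        nearest = min(range(len(self.texts)), key=lambda i: distance(self.texts[i], message))
        self.texts.append(message)
        self.parent.append(nearest)
        return len(self.texts) - 1

    def baseline_of(self, index: int) -> int:
        """Follow parent links to the baseline node a message is indirectly associated with."""
        while self.parent[index] is not None:
            index = self.parent[index]
        return index


tree = TextTree(["alpha beta", "gamma delta"])
first = tree.add("alpha beta!")    # nearest node is the baseline "alpha beta"
second = tree.add("alpha beta!!")  # nearest node is the earlier message "alpha beta!"
```

Here the second message is directly linked to a non-baseline node and only indirectly to a baseline node, which is the "directly or indirectly associating" case the claim recites.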
24. A system for processing information to identify atypical information content of received digitized text information with respect to prior digitized text information, the system comprising:
a digital computer;
communication means for electronically collecting text information into the computer, the text organized in messages;
software operable in the computer for representing the prior digitized text information in a neural network array, the array comprising a plurality of baseline nodes wherein each baseline node occupies a unique location in a text hyperspace and each baseline node has a weight represented by text tokens within the baseline node;
software operable in the computer for representing each received message as a non-baseline node having a position in the text hyperspace;
software operable in the computer for directly or indirectly associating each non-baseline node with one baseline node;
software operable in the computer for evaluating relative positions of the non-baseline nodes in the text hyperspace to identify atypical received messages; and
software operable in the computer for combining the contents of a non-baseline node with the contents of a selected baseline node in order to train the neural network.
Dependent claims: 25
Specification