Method for identifying word patterns in text
First Claim
1. A method for identifying objects referenced in a stream of text, the method comprising:
receiving an incoming stream of text;
tokenizing the stream of text into individual words;
constructing word patterns of one or more consecutive words from the stream of text;
consulting a semantic network to automatically find a match between one or more word patterns in the incoming stream of text and a word pattern in the semantic network, such that each word in the incoming stream is searched once in the semantic network; and
referencing a known object within the semantic network based on an identified word pattern from the stream of text, the known object identified by a word pattern of the semantic network.
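The steps of claim 1 can be illustrated with a minimal Python sketch. It assumes the semantic network is modelled as a word-level trie (the hypothetical `Node` and `build_network`), that tokenizing means lower-casing and splitting on runs of letters, digits and apostrophes, and that "searched once" is read as a single left-to-right pass in which each incoming word advances every partial match still alive at that point. These names and choices are illustrative only, not the patent's implementation.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field


@dataclass
class Node:
    """One node of a hypothetical word-pattern trie."""
    children: dict[str, Node] = field(default_factory=dict)
    object_id: str | None = None  # set when the pattern ending here names an object


def build_network(patterns: dict[str, str]) -> Node:
    """Build a trie mapping word patterns (e.g. 'new york city') to object ids."""
    root = Node()
    for pattern, object_id in patterns.items():
        node = root
        for word in pattern.lower().split():
            node = node.children.setdefault(word, Node())
        node.object_id = object_id
    return root


def tokenize(text: str) -> list[str]:
    """Split text into lower-case word tokens (a deliberately simple tokenizer)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def find_objects(text: str, root: Node) -> list[tuple[str, str]]:
    """Single left-to-right pass: each token is read once and used to advance
    every partial pattern match that is still alive, plus a fresh match
    starting at the root."""
    matches: list[tuple[str, str]] = []
    active: list[tuple[int, Node]] = []  # (start index, current trie node)
    tokens = tokenize(text)
    for i, word in enumerate(tokens):
        next_active: list[tuple[int, Node]] = []
        for start, node in active + [(i, root)]:
            child = node.children.get(word)
            if child is None:
                continue  # this partial pattern cannot be extended by `word`
            if child.object_id is not None:
                matches.append((" ".join(tokens[start:i + 1]), child.object_id))
            next_active.append((start, child))
        active = next_active
    return matches
```

With the patterns "new york", "new york city" and "york", the text "I moved to New York City." yields all three overlapping matches in the order they complete; a real system would still need a policy for choosing among overlapping patterns.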
Abstract
A method for identifying word patterns in text is conducted in real time and is highly suitable for network and Internet use. The method involves receiving a stream of text, breaking the stream of text into a plurality of threads, tokenizing the words in each thread, and comparing the words to identified words in the semantic network. Recognized words are then examined, together with surrounding words in the text, to determine whether the words are part of a word pattern. Word patterns are located at nodes in the semantic network in a hierarchical structure, and certain word patterns correspond to objects of the semantic network. When all word patterns involving a word are located, links are followed to the objects corresponding to those word patterns. Several nodes may point to a single object, but each object is represented only once in the semantic network. Referenced objects may thus be identified in real time, as the text streams through the text analysis module.
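The abstract's point that several nodes may point to a single object, which is itself stored only once, can be sketched as below. Everything here is hypothetical: patterns are keyed by word tuples rather than a full hierarchy, `KnownObject` stands in for an object record, and "threads" is read as sentence-sized chunks of the stream scanned concurrently with Python's `ThreadPoolExecutor`.

```python
from __future__ import annotations

import re
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass(frozen=True)
class KnownObject:
    """Hypothetical object record; each object is created exactly once."""
    object_id: str
    label: str


NYC = KnownObject("place:nyc", "New York City")

# Several pattern entries (nodes) link to the same single object instance.
PATTERN_TO_OBJECT: dict[tuple[str, ...], KnownObject] = {
    ("new", "york", "city"): NYC,
    ("nyc",): NYC,
    ("big", "apple"): NYC,
}
MAX_LEN = max(len(p) for p in PATTERN_TO_OBJECT)


def scan_chunk(chunk: str) -> set[KnownObject]:
    """Return the objects referenced anywhere in one chunk of text."""
    words = re.findall(r"[a-z0-9']+", chunk.lower())
    found: set[KnownObject] = set()
    for i in range(len(words)):
        # Try every pattern length that still fits in the remaining words.
        for n in range(1, min(MAX_LEN, len(words) - i) + 1):
            obj = PATTERN_TO_OBJECT.get(tuple(words[i:i + n]))
            if obj is not None:
                found.add(obj)
    return found


def scan_stream(text: str, workers: int = 4) -> set[KnownObject]:
    """Split text into sentence-sized chunks and scan them concurrently."""
    chunks = re.split(r"(?<=[.!?])\s+", text)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(scan_chunk, chunks))
    return set().union(*results)
```

Splitting at sentence boundaries means a pattern that straddles two chunks would be missed, and under CPython's global interpreter lock these lookups do not run in parallel on the CPU, so the pool only illustrates the structure of the approach.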
10 Claims
1. A method for identifying objects referenced in a stream of text, the method comprising:
receiving an incoming stream of text;
tokenizing the stream of text into individual words;
constructing word patterns of one or more consecutive words from the stream of text;
consulting a semantic network to automatically find a match between one or more word patterns in the incoming stream of text and a word pattern in the semantic network, such that each word in the incoming stream is searched once in the semantic network; and
referencing a known object within the semantic network based on an identified word pattern from the stream of text, the known object identified by a word pattern of the semantic network.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9
10. A method for identifying objects referenced in a stream of text, the method comprising:
loading a semantic network substantially entirely into a common RAM memory space of a processor, the semantic network comprised of recognized words and patterns of words in a hierarchical order;
receiving an incoming stream of text comprised of words;
tokenizing the stream of text into individual words;
examining the individual words in the stream of text in a sequential order as the words are received by consulting the semantic network within the RAM memory to automatically identify one or more word patterns in the incoming stream of text, such that each word in the incoming stream is searched once in the semantic network in the order that the individual words are received, examining the individual words comprising:
  finding a match between an individual word in the stream of text and an identified word in the semantic network and comparing the individual word and an adjacent word of the stream of text to a word pattern in the semantic network, and
  continually adding words of the stream of text to recognized word patterns and comparing the result to other word patterns in the semantic network until no more word patterns containing the individual word are located;
referencing a known object within the semantic network, the known object identified by a word pattern of the semantic network; and
formatting the stream of text to represent identified objects without persistently storing the stream of text.
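Claim 10 adds sequential examination of the stream, extension of a candidate pattern until it can no longer grow, and output formatting without persisting the stream. A minimal Python sketch under these assumptions: the network is a nested `dict` held in memory, a private sentinel key marks nodes that name an object, the longest recognized pattern wins, and words left over after it are passed through rather than re-examined, so the scan never backtracks. The `[words|object_id]` output format and all function names (`build_network`, `words_of`, `annotate`) are hypothetical.

```python
from __future__ import annotations

import re
from collections.abc import Iterable, Iterator

# Sentinel key marking a node whose pattern names a known object. Using a
# private object (not a string) means no word token can collide with it.
_OBJ = object()


def build_network(patterns: dict[str, str]) -> dict:
    """Load all patterns into one in-memory nested-dict trie."""
    root: dict = {}
    for pattern, object_id in patterns.items():
        node = root
        for word in pattern.lower().split():
            node = node.setdefault(word, {})
        node[_OBJ] = object_id
    return root


def words_of(text: str) -> Iterator[str]:
    """Lazily yield word tokens, preserving their original case."""
    return (m.group() for m in re.finditer(r"[A-Za-z0-9']+", text))


def _flush(buffer: list[str], best: tuple[int, str] | None) -> list[str]:
    """Render a finished candidate as output pieces."""
    if best is None:
        return list(buffer)  # nothing recognized: pass the words through
    length, object_id = best
    tagged = f"[{' '.join(buffer[:length])}|{object_id}]"
    return [tagged] + buffer[length:]  # leftovers are not re-searched


def annotate(words: Iterable[str], root: dict) -> Iterator[str]:
    """Examine words in arrival order and yield formatted output pieces.

    Only the words of the pattern currently being extended are buffered;
    every other word is yielded immediately and not kept.
    """
    node, buffer, best = root, [], None
    for word in words:
        key = word.lower()
        child = node.get(key)
        if child is None and buffer:
            # The candidate cannot grow any further: emit it, then let this
            # word try to start a new pattern from the root.
            yield from _flush(buffer, best)
            node, buffer, best = root, [], None
            child = root.get(key)
        if child is None:
            yield word  # not the start of any known pattern
            continue
        node = child
        buffer.append(word)
        if _OBJ in node:
            best = (len(buffer), node[_OBJ])  # longest recognized prefix so far
    if buffer:
        yield from _flush(buffer, best)
```

For example, with the patterns "new york" and "new york city hall", the input "I visited New York City today" is rendered as "I visited [New York|place:new_york] City today". "City" stays plain text because "new york city hall" could not be completed; that is the recall cost of the no-backtracking simplification.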
Specification