METHOD AND SYSTEM FOR FAST, GENERIC, ONLINE AND OFFLINE, MULTI-SOURCE TEXT ANALYSIS AND VISUALIZATION
Abstract
Methods and systems for text data analysis and visualization enable a user to specify a set of text data sources and visualize the content of the text data sources in an overview of salient features in the form of a network of words. A user may focus on one or more words to provide a visualization of connections specific to the focused word(s). The visualization may include clustering of relevant concepts within the network of words. Upon selection of a word, the context thereof, e.g., links to articles where the word appears, may be provided to the user. Analyzing may include textual statistical correlation models for assigning weights to words and links between words. Displaying the network of words may include a force-based network layout algorithm. Extracting clusters for display may include identifying “communities of words” as if the network of words were a social network.
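The abstract says clusters are extracted by treating the word network as a social network, but it does not name a community-detection method. As one possibility, the following sketch uses weighted label propagation, a standard community-detection technique from social-network analysis; it is a swapped-in technique, not necessarily the one the specification describes. It assumes the links arrive as a dict mapping word pairs to link weights, and it uses an alphabetical visiting order and tie-break (an assumption) so the result is reproducible. The name `label_propagation` is illustrative.

```python
from collections import defaultdict


def label_propagation(links, max_rounds=20):
    """Weighted label propagation on an undirected word graph.

    `links` maps (word_a, word_b) -> link weight. Every word starts in its own
    community; in each round a word adopts the label carrying the largest total
    link weight among its neighbours (ties -> alphabetically smallest label).
    Words with no links do not appear in the result.
    """
    neighbours = defaultdict(dict)
    for (a, b), w in links.items():
        neighbours[a][b] = w
        neighbours[b][a] = w

    label = {word: word for word in neighbours}
    for _ in range(max_rounds):
        changed = False
        for word in sorted(neighbours):  # deterministic visiting order
            score = defaultdict(float)
            for nb, w in neighbours[word].items():
                score[label[nb]] += w
            best = min(score, key=lambda lab: (-score[lab], lab))
            if best != label[word]:
                label[word] = best  # update in place (asynchronous LPA)
                changed = True
        if not changed:  # converged: no word switched community
            break

    communities = defaultdict(set)
    for word, lab in label.items():
        communities[lab].add(word)
    return sorted(communities.values(), key=lambda c: sorted(c))
```

On two tightly linked word triangles joined by a single weak link, this returns one community per triangle, which is the kind of "community of words" grouping the abstract describes.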
12 Claims
1. In a computer system having at least one user interface including at least one output device and at least one input device, a method comprising:
a) receiving from a user through at least one input device an identification of at least one text source;
b) from each said identified text source, retrieving at least one text passage;
c) for each said retrieved text passage, parsing the said passage into words, identifying multi-word expressions in the said passage and applying a stemming algorithm to the said passage;
d) for each word from the said text passages, determining a number of times the said word appears in the said passages; and
e) causing to be displayed on an output device a predetermined number of words from the said text passages,
wherein distances between the said predetermined number of words in a display on the said output device are determined at least in part by a word weight for each said displayed word and by a link weight for each pair of said displayed words, and
wherein the word weight for each said displayed word is determined at least in part by a number of times the said word appears in the said passages; and
wherein the link weight for each said pair of said displayed words is determined at least in part by the number of times each said word appears in the said passages and by a number of times the said word pair appears in a same window in the said passages.
Dependent claims: 2, 3, 4, 5, 6.
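Steps (c) and (d) and the weights in step (e) leave the concrete choices open. The claim does not say how multi-word expressions are found, which stemming algorithm is applied, how large the co-occurrence window is, or how the counts are combined into a link weight. The following Python sketch fills those gaps with assumptions: a hypothetical hard-coded list of two-word expressions (`MWES`), a toy suffix-stripping stemmer standing in for a real one such as Porter's, a default window of five tokens, the raw frequency as the word weight, and a Dice-style coefficient as the link weight. All function names are illustrative, not taken from the patent.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical multi-word expression list; the claim does not say how
# multi-word expressions are identified (a lexicon or collocation
# statistics are common choices).
MWES = {("machine", "learning"), ("new", "york")}


def tokenize(passage):
    """Lower-case a passage and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", passage.lower())


def merge_mwes(tokens, mwes=MWES):
    """Greedily join known two-word expressions into one token, e.g. 'new_york'."""
    out, i = [], 0
    while i < len(tokens):
        if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) in mwes:
            out.append(tokens[i] + "_" + tokens[i + 1])
            i += 2
        else:
            out.append(tokens[i])
            i += 1
    return out


def stem(word):
    """Toy suffix stripper standing in for a real stemmer such as Porter's."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and not word.endswith("ss") and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def analyze(passages, window=5):
    """Word counts plus counts of word pairs sharing a window.

    Two tokens share a window when their positions differ by less than `window`.
    """
    word_counts, pair_counts = Counter(), Counter()
    for passage in passages:
        # Merge multi-word expressions first so they are not stemmed apart.
        tokens = [t if "_" in t else stem(t) for t in merge_mwes(tokenize(passage))]
        word_counts.update(tokens)
        for i, a in enumerate(tokens):
            for b in tokens[i + 1 : i + window]:
                if a != b:
                    pair_counts[tuple(sorted((a, b)))] += 1
    return word_counts, pair_counts


def top_words(word_counts, n):
    """The n most frequent words; the raw count serves as the word weight."""
    return dict(word_counts.most_common(n))


def link_weights(words, word_counts, pair_counts):
    """Dice-style weight 2*c(a,b) / (c(a) + c(b)) for each co-occurring pair."""
    links = {}
    for a, b in combinations(sorted(words), 2):
        c_ab = pair_counts.get((a, b), 0)
        if c_ab:
            links[(a, b)] = 2 * c_ab / (word_counts[a] + word_counts[b])
    return links
```

For example, `analyze(["Machine learning helps search engines.", "Search engines use machine learning models."])` counts `machine_learning`, `search` and `engine` twice each, and the pair `("engine", "search")` shares a window twice, giving it a link weight of 1.0.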
7. A computer-readable medium having computer-readable instructions stored thereon which, as a result of being executed in a computer system having at least one user interface including at least one output device and at least one input device, instruct the computer system to perform a method, comprising:
a) receiving from a user through at least one input device an identification of at least one text source;
b) from each said identified text source, retrieving at least one text passage;
c) for each said retrieved text passage, parsing the said passage into words, identifying multi-word expressions in the said passage and applying a stemming algorithm to the said passage;
d) for each word from the said text passages, determining a number of times the said word appears in the said passages; and
e) causing to be displayed on an output device a predetermined number of words from the said text passages,
wherein distances between the said predetermined number of words in a display on the said output device are determined at least in part by a word weight for each said displayed word and by a link weight for each pair of said displayed words, and
wherein the word weight for each said displayed word is determined at least in part by a number of times the said word appears in the said passages; and
wherein the link weight for each said pair of said displayed words is determined at least in part by the number of times each said word appears in the said passages and by a number of times the said word pair appears in a same window in the said passages.
Dependent claims: 8, 9, 10, 11, 12.
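Claims 1 and 7 require only that on-screen distances depend at least in part on word weights and link weights; they do not name a layout algorithm. The abstract mentions a force-based network layout, so the following is a minimal spring-electrical sketch, loosely in the style of Fruchterman and Reingold (each move capped by a cooling "temperature"), but with assumed force laws: every pair of words repels in proportion to the product of their normalized word weights, and linked words attract in proportion to their link weight. The function name `force_layout` and the constants (`repulsion=0.05`, starting step 0.1, cooling factor 0.98) are hypothetical choices for illustration.

```python
import math
import random


def force_layout(word_weights, links, iterations=200, repulsion=0.05, seed=0):
    """Spring-electrical layout sketch; returns {word: (x, y)}.

    Each pair repels with strength repulsion * w_a * w_b / d (w = word weight
    normalized to [0, 1]); each linked pair also attracts with strength
    link * d. A linked pair therefore settles near
    d* = sqrt(repulsion * w_a * w_b / link).
    """
    rng = random.Random(seed)  # fixed seed -> reproducible layout
    words = list(word_weights)
    top = max(word_weights.values())
    norm = {w: word_weights[w] / top for w in words}
    pos = {w: [rng.uniform(-1, 1), rng.uniform(-1, 1)] for w in words}
    step = 0.1  # maximum move per iteration (the "temperature")

    for _ in range(iterations):
        disp = {w: [0.0, 0.0] for w in words}
        for i, a in enumerate(words):
            for b in words[i + 1:]:
                dx, dy = pos[a][0] - pos[b][0], pos[a][1] - pos[b][1]
                d = math.hypot(dx, dy) or 1e-9
                link = links.get((a, b), links.get((b, a), 0.0))
                # Positive f pushes a and b apart; negative f pulls them together.
                f = repulsion * norm[a] * norm[b] / d - link * d
                ux, uy = dx / d, dy / d
                disp[a][0] += f * ux
                disp[a][1] += f * uy
                disp[b][0] -= f * ux
                disp[b][1] -= f * uy
        for w in words:
            dx, dy = disp[w]
            length = math.hypot(dx, dy)
            if length > step:  # cap the move at the current temperature
                dx, dy = dx * step / length, dy * step / length
            pos[w][0] += dx
            pos[w][1] += dy
        step *= 0.98  # cool down so the layout settles

    return {w: (p[0], p[1]) for w, p in pos.items()}
```

With these force laws, heavier words are spaced farther apart and strongly linked words are drawn closer, so `force_layout({"engine": 2, "search": 2, "help": 1}, {("engine", "search"): 1.0})` places the linked pair `engine`/`search` closer together than either is to the unlinked `help`. Unlinked words feel only repulsion and drift apart until the shrinking step freezes the layout; a production layout would typically add a weak pull toward the centre.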
Specification