Methods for analyzing text
First Claim
1. A method for analyzing text, comprising:
providing a processor on a first computer, wherein the processor runs a content acquisition system to obtain a text document over a computer network, wherein the text document comprises a hashtag immediately preceded by one or more words and immediately followed by one or more words;
storing the text document obtained from the content acquisition system on a storage device;
providing a processor on the first computer which runs a text analytics engine, wherein the text analytics engine comprises a hashtag detector, a sentiment recognizer, a named entity recognizer, and a sentiment assignor;
using the text analytics engine to access the text document and perform an analysis to generate metadata, comprising;
having the hashtag detector recognize a hashtag and the one or more words immediately following the hashtag;
having the named entity recognizer identify the one or more words immediately following the hashtag;
having the sentiment recognizer select words from the one or more words immediately preceding the hashtag which have sentiment;
assigning the one or more words having sentiment to the one or more words immediately following the hashtag;
providing a processor on the first computer which runs a weighting multiplier; and
measuring a repetition of letters in the one or more words having sentiment, comprising;
preprocessing a dictionary of terms into a tree such that each letter in a word corresponds to a node in the tree, with subsequent letters corresponding to branchings in the tree, and wherein leaves of the tree point to the term preprocessed;
where lookup is accomplished by processing of each letter in a word discovered in a novel text being processed where processing is accomplished by following the branches in the tree corresponding to subsequent letters, except that repeated letters do not follow branches but instead increment a counter recording the number of letters repeated, where the successful arrival at a leaf returns both the term discovered and the letter repetition counter; and
the letter repetition counter is used to calculate a weight multiplier greater or equal to 1 for the term;
storing the metadata in a database;
providing a processor from a second computer that runs an application which accesses the database.
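The letter-repetition lookup recited in claim 1 can be illustrated with a small trie. The following is a minimal sketch under stated assumptions, not the patented implementation: the names `TrieNode`, `collapse_repeats`, `build_trie`, `lookup`, and `weight_multiplier` are hypothetical, dictionary terms are stored with consecutive duplicate letters collapsed so that ordinary double letters (as in "good") still match, and the linear weight formula with its `step` parameter is an assumption, since the claim only requires a multiplier greater than or equal to 1.

```python
from typing import Dict, Iterable, Optional, Tuple


class TrieNode:
    def __init__(self) -> None:
        self.children: Dict[str, "TrieNode"] = {}
        self.term: Optional[str] = None  # set on nodes that end a dictionary term


def collapse_repeats(word: str) -> str:
    """Drop every letter that equals the letter right before it."""
    out = []
    for ch in word:
        if not out or out[-1] != ch:
            out.append(ch)
    return "".join(out)


def build_trie(terms: Iterable[str]) -> TrieNode:
    """Preprocess the dictionary: one node per letter, end nodes point to the term."""
    root = TrieNode()
    for term in terms:
        node = root
        # Assumption: keys are stored collapsed, so "good" lives on the path g-o-d.
        for ch in collapse_repeats(term.lower()):
            node = node.children.setdefault(ch, TrieNode())
        node.term = term.lower()
    return root


def lookup(root: TrieNode, word: str) -> Optional[Tuple[str, int]]:
    """Return (term, repetition counter), or None if the word is not in the trie."""
    node = root
    repeats = 0
    prev = None
    for ch in word.lower():
        if ch == prev:
            repeats += 1  # a repeated letter increments the counter, no branch is followed
            continue
        child = node.children.get(ch)
        if child is None:
            return None
        node = child
        prev = ch
    if node.term is None:
        return None
    return node.term, repeats


def weight_multiplier(term: str, repeats: int, step: float = 0.25) -> float:
    """Hypothetical weighting: 1.0 plus `step` for each repeat beyond the term's own spelling."""
    natural = len(term) - len(collapse_repeats(term))
    extra = max(0, repeats - natural)
    return 1.0 + step * extra


trie = build_trie(["so", "good", "love"])
print(lookup(trie, "goooood"))           # ('good', 4)
print(weight_multiplier("good", 4))      # 1.75
```

One limitation of storing collapsed keys is that terms differing only in doubled letters (for example "god" and "good") share a path, so a fuller version would need to keep a list of candidate terms at each end node.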
Abstract
A method for analyzing sentiment in media, topic targeted sentiment, and semantic vector calculation explanation functions.
2 Claims
2. A method for analyzing text, comprising:
providing a processor on a first computer, wherein the processor runs a content acquisition system to obtain a text document over a computer network, wherein the text document comprises a hashtag immediately preceded by one or more words and immediately followed by one or more words that form a run-on;
storing the text document obtained from the content acquisition system on a storage device;
providing a processor on the first computer which runs a text analytics engine, wherein the text analytics engine comprises a hashtag detector, a sentiment recognizer, a named entity recognizer, and a sentiment assignor;
using the text analytics engine to access the text document and perform an analysis to generate metadata, comprising;
having the hashtag detector recognize the hashtag and the run-on;
having the sentiment recognizer identify individual words that form the run-on and determine whether such word(s) have sentiment, comprising;
providing a backtracking algorithm; and
having the backtracking algorithm examine the words immediately following the hashtag in a backwards direction, examining each span of consecutive letters from end to front to create one or more spans, and looking up each span in a dictionary of common words reversed, where each dictionary match creates a branching point, wherein in one branch the matched term is considered a unique word, and repeating this process in order to find word matches for all of the letters in the one or more words immediately following the hashtag; and
wherein in another branch the string matched is considered so far as only a portion of a longer word, and continuing the same process until the next branch point, where the first branch that successfully assigns all letters into a known word causes acceptance of that parsing as the correct one, and where no successful groupings of letters into terms causes the entire hashtag to be considered as an individual term;
having the named entity recognizer identify the one or more words immediately preceding the hashtag that identify one or more named entities; and
assigning any one or more of the one or more words immediately preceding the hashtag to any one or more of the one or more words identified as having sentiment by the sentiment recognizer;
storing the metadata in a database; and
providing a processor from a second computer that runs an application which accesses the database.
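The backward, backtracking split of a run-on hashtag described in claim 2 can be sketched as follows. This is an illustrative sketch only, under stated assumptions: the function name `segment_hashtag` and the `common_words` parameter are hypothetical, the reversed dictionary is modeled as a plain Python set, and no memoization is used, so worst-case running time can grow quickly on long inputs.

```python
from typing import List, Optional, Set


def segment_hashtag(run_on: str, common_words: Set[str]) -> List[str]:
    """Split the letters following a hashtag into words, scanning from end to front."""
    text = run_on.lower()
    # Dictionary of common words, each stored reversed, as the claim describes.
    reversed_dict = {w.lower()[::-1] for w in common_words}

    def backtrack(end: int) -> Optional[List[str]]:
        # text[:end] still has to be split; end == 0 means every letter is assigned.
        if end == 0:
            return []
        span = ""  # letters read backwards from text[end - 1], so the span is reversed
        for start in range(end - 1, -1, -1):
            span += text[start]
            if span in reversed_dict:
                # Branch 1: treat the match as a complete word and split the rest.
                rest = backtrack(start)
                if rest is not None:
                    return rest + [span[::-1]]
                # Branch 2: the match is only part of a longer word, so keep extending.
        return None

    words = backtrack(len(text))
    # No successful grouping: the whole hashtag is treated as a single term.
    return words if words is not None else [text]


print(segment_hashtag("nowhere", {"no", "here", "where"}))  # ['no', 'where']
```

In the usage line, the span "here" matches first but leaves "now", which cannot be split, so the search falls through to the second branch and extends the span to "where".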
Specification