Methods for analyzing text

US 9,336,192 B1
Filed: 11/26/2013
Issued: 05/10/2016
Est. Priority Date: 11/28/2012
Status: Active Grant

4 Assignments

0 Petitions

Abstract

A method for analyzing sentiment in media, topic targeted sentiment, and semantic vector calculation explanation functions.

108 Citations

2 Claims

1. A method for analyzing text, comprising:
  providing a processor on a first computer, wherein the processor runs a content acquisition system to obtain a text document over a computer network, wherein the text document comprises a hashtag immediately preceded by one or more words and immediately followed by one or more words;
  
  storing the text document obtained from the content acquisition system on a storage device;
  
  providing a processor on the first computer which runs a text analytics engine, wherein the text analytics engine comprises a hashtag detector, a sentiment recognizer, a named entity recognizer, and a sentiment assignor;
  
  using the text analytics engine to access the text document and perform an analysis to generate metadata, comprising;
  
  having the hashtag detector recognize a hashtag and the one or more words immediately following the hashtag;
  
  having the named entity recognizer identify the one or more words immediately following the hashtag;
  
  having the sentiment recognizer select words from the one or more words immediately preceding the hashtag which have sentiment;
  
  assigning the one or more words having sentiment to the one or more words immediately following the hashtag;
  
  providing a processor on the first computer which runs a weighting multiplier; and
  
  measuring a repetition of letters in the one or more words having sentiment, comprising;
  
  preprocessing a dictionary of terms into a tree such that each letter in a word corresponds to a node in the tree, with subsequent letters corresponding to branchings in the tree, and wherein leaves of the tree point to the term preprocessed;
  
  where lookup is accomplished by processing of each letter in a word discovered in a novel text being processed where processing is accomplished by following the branches in the tree corresponding to subsequent letters, except that repeated letters do not follow branches but instead increment a counter recording the number of letters repeated, where the successful arrival at a leaf returns both the term discovered and the letter repetition counter; and
  
  the letter repetition counter is used to calculate a weight multiplier greater or equal to 1 for the term;
  
  storing the metadata in a database;
  
  providing a processor from a second computer that runs an application which accesses the database.
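
The letter-repetition lookup recited in claim 1 can be illustrated with a small sketch. This is not the patentee's implementation. The names `TrieNode`, `build_trie`, `lookup`, and `weight_multiplier` are hypothetical, and so is the formula `1 + step * repeats`, since the claim requires only that the multiplier be at least 1. The sketch also assumes that a repeated letter follows a branch when the dictionary term itself contains that double letter (as in "good") and increments the counter only when no such branch exists. Terms are marked on the node where they end, which need not be a leaf if one term is a prefix of another.

```python
from typing import Dict, Iterable, Optional, Tuple


class TrieNode:
    """One node per letter; `term` is set where a dictionary term ends."""

    def __init__(self) -> None:
        self.children: Dict[str, "TrieNode"] = {}
        self.term: Optional[str] = None


def build_trie(terms: Iterable[str]) -> TrieNode:
    """Preprocess a dictionary of terms into a letter tree."""
    root = TrieNode()
    for term in terms:
        node = root
        for letter in term.lower():
            node = node.children.setdefault(letter, TrieNode())
        node.term = term.lower()
    return root


def lookup(root: TrieNode, word: str) -> Optional[Tuple[str, int]]:
    """Return (term, repetition count), or None if the word is unknown."""
    node = root
    repeats = 0
    previous = None
    for letter in word.lower():
        if letter in node.children:
            # Normal case: follow the branch for this letter.
            node = node.children[letter]
        elif letter == previous:
            # Repeated letter with no branch: stay put and count it.
            repeats += 1
        else:
            return None
        previous = letter
    if node.term is None:
        return None
    return node.term, repeats


def weight_multiplier(repeats: int, step: float = 0.1) -> float:
    """Hypothetical formula; always >= 1 for a non-negative count."""
    return 1.0 + step * repeats


trie = build_trie(["love", "good", "so"])
print(lookup(trie, "loooove"))  # ('love', 3)
print(lookup(trie, "gooood"))   # ('good', 2)
```

Under these assumptions, "loooove" resolves to the dictionary term "love" with three extra letters, which the sentiment weighting step could then scale up.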

2. A method for analyzing text, comprising:
  providing a processor on a first computer, wherein the processor runs a content acquisition system to obtain a text document over a computer network, wherein the text document comprises a hashtag immediately preceded by one or more words and immediately followed by one or more words that form a run-on;
  
  storing the text document obtained from the content acquisition system on a storage device;
  
  providing a processor on the first computer which runs a text analytics engine, wherein the text analytics engine comprises a hashtag detector, a sentiment recognizer, a named entity recognizer, and a sentiment assignor;
  
  using the text analytics engine to access the text document and perform an analysis to generate metadata, comprising;
  
  having the hashtag detector recognize the hashtag and the run-on;
  
  having the sentiment recognizer identify individual words that form the run-on and determine whether such word(s) have sentiment, comprising;
  
  providing a backtracking algorithm; and
  
  having the backtracking algorithm examine the words immediately following the hashtag in a backwards direction, examining each span of consecutive letters from end to front to create one or more spans, and looking up each span in a dictionary of common words reversed, where each dictionary match creates a branching point, wherein in one branch the matched terms is considered a unique word, and repeating this process in order to find word matches for all of the letters in the one or more words immediately following the hashtag; and
  
  wherein in another branch the string matched is considered so far as only a portion of a longer word, and continuing the same process until the next branch point, where the first branch that successfully assigns all letters into a known word causes acceptance of that parsing as the correct one, and where no successful groupings of letters into terms causes the entire hashtag to be considered as an individual term;
  
  having the named entity recognizer identify the one or more words immediately preceding the hashtag that identify one or more named entities; and
  
  assigning any one or more of the one or words immediately preceding the hashtag to any one or more of the one or more words identified as having sentiment by the sentiment recognizer;
  
  storing the metadata in a database; and
  
  providing a processor from a second computer that runs an application which accesses the database.
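
The backtracking segmentation of a run-on hashtag recited in claim 2 can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation. The function names `build_reversed_dictionary` and `split_run_on` are hypothetical, the dictionary is a plain set of reversed words, and the example word list is invented for the demonstration.

```python
from typing import Iterable, List, Optional, Set


def build_reversed_dictionary(words: Iterable[str]) -> Set[str]:
    """Store each common word reversed, so end-to-front spans match directly."""
    return {word.lower()[::-1] for word in words}


def split_run_on(run_on: str, reversed_dict: Set[str]) -> List[str]:
    """Split a run-on into dictionary words, scanning from end to front."""
    text = run_on.lower()

    def solve(end: int) -> Optional[List[str]]:
        # All letters before `end` still need to be assigned to words.
        if end == 0:
            return []
        key = ""  # letters of the current span, read back to front
        for start in range(end - 1, -1, -1):
            key += text[start]
            if key in reversed_dict:
                # Branch 1: treat the match as a complete word.
                rest = solve(start)
                if rest is not None:
                    return rest + [text[start:end]]
                # Branch 2: the match is part of a longer word; keep going.
        return None

    words = solve(len(text))
    # No full parse: treat the whole run-on as one term.
    return words if words is not None else [text]


rd = build_reversed_dictionary(["love", "this", "his", "is", "movie"])
print(split_run_on("lovethismovie", rd))  # ['love', 'this', 'movie']
print(split_run_on("xyzzy", rd))          # ['xyzzy']
```

In the first call the scan matches "is" and then "his" before either leads to a full parse, so both are abandoned and the span is extended to "this", which is the backtracking step the claim describes. The recursion is not memoized, which is acceptable for short hashtags but could be slow on long inputs.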

Current Assignee: Lexalytics, Inc. (InMoment, Inc.)
Original Assignee: Lexalytics, Inc. (InMoment, Inc.)
Inventors: Barba, Paul F.; Marshall, Michael W.; Lambrecht, Carl J.
Primary Examiner(s): Colucci, Michael
Application Number: US14/090,271
Time in Patent Office: 896 Days
Field of Search: 704/9, 709/224, 709/206, 707/812, 706/27, 706/20, 705/14.23, 705/14.17, 434/157
US Class Current: 1/1
CPC Class Codes:
G06F 40/284   Lexical analysis, e.g. toke...
G06F 40/295   Named entity recognition
G06F 40/30   Semantic analysis
