Method and apparatus for processing sentiment-bearing text
First Claim
1. A computer-implemented method of processing text included in multiple product reviews of a single product, comprising:
utilizing a computer processor that is a component of the computer to cluster sub-document linguistic units included in a collection of relevant documents into a set of clusters based on pre-defined clustering criteria, wherein each relevant document in the collection contains text that is a review of the single product, and wherein each cluster in the set represents a different attribute of the single product, and wherein the pre-defined clustering criteria is a listing of key words defined before the computer processor clusters the sub-document linguistic units into the set of clusters, and wherein the listing of key words includes a separate group of key words for each said different attribute of the single product such that when the processor clusters the sub-document linguistic units into the set of clusters it does so by determining which of the listing of key words are included in which sub-document linguistic units;
assigning a sentiment and a confidence measure to each sub-document linguistic unit, wherein for each sub-document linguistic unit the confidence measure is a measurement of a confidence with which the sentiment was assigned;
generating a display including a direct indication of the sub-document linguistic units, the cluster in the set to which each sub-document linguistic unit was clustered by the computer processor, and the sentiment assigned to each sub-document linguistic unit;
wherein generating the display further comprises generating the display so as to also include a user input mechanism that receives user-initiated selection of a minimum confidence level that the confidence measure attributed to each sub-document linguistic unit must exceed for a sub-document linguistic unit to be included by the computer processor within any of the clusters;
excluding a particular one of the sub-document linguistic units from being included in any cluster in the set based on a determination that the confidence measure assigned to the particular sub-document linguistic unit is less than the minimum confidence level received by the user input mechanism; and
wherein generating the display further comprises generating the display so as to also include an indication of which of the listing of key words were used by the computer processor as a basis for clustering the sub-document linguistic units.
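The clustering and exclusion steps of claim 1 can be illustrated with a minimal sketch. It is not the patented implementation: the key-word listing, the attribute names, the `Unit` type, and the regular-expression tokenizer are all hypothetical, and sentiment and confidence are assumed to have been assigned by an earlier step. The claim says the confidence must "exceed" the minimum for inclusion and that a unit is excluded when its confidence is "less than" the minimum; how an exactly equal value is handled is an assumption here.

```python
import re
from typing import NamedTuple

# Hypothetical key-word listing, defined before clustering:
# one separate group of key words per product attribute.
KEYWORDS = {
    "battery": {"battery", "charge"},
    "screen": {"screen", "display"},
}

class Unit(NamedTuple):
    text: str          # a sub-document linguistic unit, e.g. one sentence
    sentiment: str     # assumed to be assigned upstream
    confidence: float  # confidence with which the sentiment was assigned

def cluster_units(units, keywords, min_confidence):
    """Return (clusters, matched): units per attribute and the key words that matched."""
    clusters = {attr: [] for attr in keywords}
    matched = {attr: set() for attr in keywords}
    for unit in units:
        # Exclude units whose confidence is less than the user-selected minimum.
        # Keeping a unit whose confidence equals the minimum is an assumption.
        if unit.confidence < min_confidence:
            continue
        tokens = set(re.findall(r"[a-z']+", unit.text.lower()))
        for attr, words in keywords.items():
            hits = tokens & words
            if hits:
                clusters[attr].append(unit)
                matched[attr] |= hits  # recorded so a display can show the basis
    return clusters, matched
```

The `matched` mapping corresponds to the final element of the claim, the indication of which key words were used as the basis for clustering.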
2 Assignments
0 Petitions
Abstract
The present invention provides a system for identifying, extracting, clustering and analyzing sentiment-bearing text. In one embodiment, the invention implements a pipeline capable of accessing raw text and presenting it in a highly usable and intuitive way.
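As a rough illustration of the first stage of such a pipeline, the sketch below breaks raw review text into smaller units. Treating a sentence as the sub-document linguistic unit, and splitting on end punctuation, are assumptions made only for this example; the abstract does not say how units are identified, and the function name `split_into_units` is hypothetical.

```python
import re

def split_into_units(review_text):
    """Split raw review text into sentence-like units.

    Assumption: a unit ends at '.', '!' or '?' followed by whitespace.
    """
    parts = re.split(r"(?<=[.!?])\s+", review_text.strip())
    return [part for part in parts if part]
```

A production system would likely need a more robust splitter, since abbreviations and decimal numbers defeat this simple rule.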
46 Citations
9 Claims
1. Set out in full under First Claim above. Dependent claims: 2, 3, 4, 5.
6. A computer-implemented method of generating a display that presents text associated with multiple product reviews of a single product, comprising:
receiving from a user an indication of features of the single product for which the user desires sentiment analysis;
utilizing a computer processor that is a component of the computer to cluster sub-document linguistic units of relevant documents into clusters, wherein each of the clusters corresponds to one of the features for which the user desires sentiment analysis;
assigning a sentiment and a confidence measure to each sub-document linguistic unit, wherein for each sub-document linguistic unit the confidence measure is a measurement of a confidence with which the sentiment was assigned;
generating the display so as to include an indication of the sub-document linguistic units, a cluster to which each sub-document linguistic unit was assigned by the computer processor, a sentiment attributed to each sub-document linguistic unit, an overall sentiment attributed to each of the clusters, and the features of the single product for which the user desires a sentiment analysis; and
wherein generating the display further comprises generating the display so as to include a user input mechanism that receives a user-initiated selection of a minimum confidence level that the confidence measure attributed to each sub-document linguistic unit must exceed for a sub-document linguistic unit to be included by the computer processor within any of the clusters, and wherein generating the display further comprises generating the display so as to include a plurality of boxes, wherein at least one of the plurality of boxes changes sizes in response to receipt by the user input mechanism of the minimum confidence level.
Dependent claims: 7, 8, 9
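One way to picture the resizing boxes of claim 6 is sketched below. The claim does not say how a box is sized or how a cluster's overall sentiment is computed, so both rules here are assumptions: each box's area is proportional to the cluster's share of retained units, and overall sentiment is the mean of numeric scores in the range -1 to 1. The function name `box_sizes` and the `total_area` parameter are hypothetical.

```python
def box_sizes(clusters, min_confidence, total_area=100.0):
    """Compute a box area and an overall sentiment for each feature cluster.

    clusters maps a feature name to a list of (sentiment_score, confidence)
    pairs, one pair per sub-document linguistic unit.
    """
    # Keep only units that meet the user-selected minimum confidence.
    # Keeping a unit whose confidence equals the minimum is an assumption.
    kept = {
        feature: [(s, c) for s, c in units if c >= min_confidence]
        for feature, units in clusters.items()
    }
    total = sum(len(units) for units in kept.values())
    boxes = {}
    for feature, units in kept.items():
        # Assumed sizing rule: area follows the share of retained units.
        area = total_area * len(units) / total if total else 0.0
        # Assumed aggregation rule: mean sentiment score of retained units.
        overall = sum(s for s, _ in units) / len(units) if units else 0.0
        boxes[feature] = {"area": area, "overall_sentiment": overall}
    return boxes
```

Calling the function again with a new threshold, for example whenever the user moves a slider, yields new areas, so at least one box changes size as the minimum confidence level changes.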
Specification