SYSTEM AND METHOD FOR SUGGESTION MINING
First Claim
1. A method for extraction of suggestions for improvement comprising:
providing a structured terminology for a topic, the structured terminology including a set of semantic classes, each of a plurality of the semantic classes including a finite set of terms;
providing a thesaurus of terms relating to suggestions of improvement;
receiving a corpus of text documents, each document comprising a text string in a natural language;
labeling text elements in the text strings which are instances of terms in the structured terminology with the corresponding semantic class;
labeling text elements in the text strings which are instances of terms in the thesaurus;
with a processor, applying a set of patterns to the labeled text strings to identify suggestions of improvement expressions, the patterns each defining a syntactic relation between text elements, the patterns including:
for each of the semantic classes in the set, at least one pattern which specifies a syntactic relation in which one of the text elements in the relation is labeled as an instance of the semantic class, and wherein at least one of the patterns specifies a syntactic relation in which one of the text elements in the relation is labeled as an instance of one of the terms in the thesaurus; and
outputting a set of suggestions for improvements based on the identified suggestions of improvement expressions.
Abstract
A system and method for extraction of suggestions for improvement from a corpus of documents, such as customer reviews, are disclosed. A structured terminology provided for a topic includes a set of semantic classes, each including a set of terms. A thesaurus of terms relating to suggestions of improvement is provided. Text elements of text strings in the documents which are instances of terms in the structured terminology are labeled with the corresponding semantic class, and text elements which are instances of terms in the thesaurus are also labeled. A set of patterns is applied to the labeled text strings to identify suggestions of improvement expressions. The patterns define syntactic relations between text elements, some of which are required to be instances of one of the terms in a particular semantic class or thesaurus. A set of suggestions for improvements is output based on the identified suggestions of improvement expressions.
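The labeling step described in the abstract can be illustrated with a minimal sketch. Everything concrete here is an assumption for illustration only: the printer topic, the class names `PRINTER_PART` and `PRINTER_FEATURE`, the term lists, the `SUGGESTION_TERM` label, and the `label_text` helper do not come from the patent. The sketch also matches exact surface strings, whereas a real parser would normally work on lemmas.

```python
import re

# Hypothetical structured terminology: semantic class -> finite set of terms.
TERMINOLOGY = {
    "PRINTER_PART": {"paper tray", "cartridge", "display"},
    "PRINTER_FEATURE": {"duplex printing", "wifi setup"},
}

# Hypothetical thesaurus of terms relating to suggestions of improvement.
THESAURUS = {"improve", "lack", "need", "wish", "should"}


def find_terms(text, terms, label):
    """Yield (start, end, surface, label) for each whole-word term match."""
    for term in terms:
        pattern = r"\b" + re.escape(term) + r"\b"
        for match in re.finditer(pattern, text):
            yield (match.start(), match.end(), match.group(), label)


def label_text(text):
    """Label terminology instances with their semantic class and
    thesaurus instances with a generic suggestion label."""
    lowered = text.lower()
    spans = []
    for semantic_class, terms in TERMINOLOGY.items():
        spans.extend(find_terms(lowered, terms, semantic_class))
    spans.extend(find_terms(lowered, THESAURUS, "SUGGESTION_TERM"))
    return sorted(spans)


for span in label_text("The paper tray should be bigger."):
    print(span)
```

The labeled spans are only the input to the pattern stage; on their own they say nothing about whether the terms are syntactically related.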
23 Claims
1. A method for extraction of suggestions for improvement comprising:
providing a structured terminology for a topic, the structured terminology including a set of semantic classes, each of a plurality of the semantic classes including a finite set of terms;
providing a thesaurus of terms relating to suggestions of improvement;
receiving a corpus of text documents, each document comprising a text string in a natural language;
labeling text elements in the text strings which are instances of terms in the structured terminology with the corresponding semantic class;
labeling text elements in the text strings which are instances of terms in the thesaurus;
with a processor, applying a set of patterns to the labeled text strings to identify suggestions of improvement expressions, the patterns each defining a syntactic relation between text elements, the patterns including:
for each of the semantic classes in the set, at least one pattern which specifies a syntactic relation in which one of the text elements in the relation is labeled as an instance of the semantic class, and wherein at least one of the patterns specifies a syntactic relation in which one of the text elements in the relation is labeled as an instance of one of the terms in the thesaurus; and
outputting a set of suggestions for improvements based on the identified suggestions of improvement expressions.
Dependent claims: 2 through 20.
21. A system for extraction of suggestions for improvement comprising:
memory which stores:
a structured terminology for a topic, the structured terminology including a set of semantic classes, each of a plurality of the semantic classes including a finite set of terms,
a thesaurus of terms relating to suggestions of improvement, and
a set of patterns for identifying suggestions of improvement expressions in input text, the patterns each defining a syntactic relation between two text elements, the patterns including, for each of the semantic classes in the set of semantic classes, at least one pattern which specifies a syntactic relation in which one of the text elements in the relation is labeled as an instance of the semantic class, and wherein at least one of the patterns specifies a syntactic relation in which one of the text elements in the relation is labeled as an instance of a term in the thesaurus;
a parser configured for labeling text elements in input text strings, which are instances of terms in the structured terminology, with the corresponding semantic class and for labeling text elements in the text strings which are instances of terms in the thesaurus;
a suggestion review component for extracting suggestions for improvement expressions by applying the set of patterns to the input text strings and outputting suggestions for improvement based on the extracted expressions; and
a processor for implementing the parser and the suggestion review component.
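As an illustration of how a stored pattern over a two-element syntactic relation might be applied, the following sketch assumes the parser has already produced labeled dependency relations. The `Token`, `Relation`, and `Pattern` types, the relation names `SUBJ` and `OBJ`, the labels, and the `extract_suggestions` function are all hypothetical and are not taken from the patent. No real parser is invoked, so the relations are written by hand.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Token:
    text: str
    label: Optional[str] = None  # semantic class or thesaurus label, if any


@dataclass(frozen=True)
class Relation:
    name: str          # syntactic relation name, e.g. "OBJ"
    head: Token
    dependent: Token


@dataclass(frozen=True)
class Pattern:
    relation: str
    head_label: Optional[str]       # None means "any label"
    dependent_label: Optional[str]  # None means "any label"

    def matches(self, rel: Relation) -> bool:
        return (
            rel.name == self.relation
            and self.head_label in (None, rel.head.label)
            and self.dependent_label in (None, rel.dependent.label)
        )


# Hypothetical pattern set: one pattern per semantic class, each also
# requiring a thesaurus-labeled element.
PATTERNS = [
    Pattern("SUBJ", "SUGGESTION_TERM", "PRINTER_PART"),
    Pattern("OBJ", "SUGGESTION_TERM", "PRINTER_FEATURE"),
]


def extract_suggestions(relations: List[Relation],
                        patterns: List[Pattern]) -> List[Relation]:
    """Return the relations matched by at least one pattern."""
    return [r for r in relations if any(p.matches(r) for p in patterns)]


# Hand-written relations for "They should improve the wifi setup."
improve = Token("improve", "SUGGESTION_TERM")
relations = [
    Relation("SUBJ", improve, Token("They")),
    Relation("OBJ", improve, Token("wifi setup", "PRINTER_FEATURE")),
]
for rel in extract_suggestions(relations, PATTERNS):
    print(rel.name, rel.head.text, "->", rel.dependent.text)
```

In this sketch, keeping patterns as plain data separate from the matching function loosely mirrors the claim's split between stored patterns and the suggestion review component that applies them.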
22. A method for forming a system for extraction of suggestions for improvement comprising:
generating a structured terminology for a topic, the structured terminology including a set of semantic classes, each of a plurality of the semantic classes including a finite set of terms related to the respective semantic class, the terms in the structured terminology including nouns or noun phrases;
generating a thesaurus of terms relating to suggestions of improvement, the terms in the thesaurus including verbs;
deriving a set of patterns to be applied to text strings to identify suggestions of improvement expressions, the patterns each defining a syntactic relation between two text elements, the patterns including:
for each of the semantic classes in the set, at least one pattern which specifies a syntactic relation in which one of the text elements in the syntactic relation is labeled as an instance of the semantic class, and wherein at least one of the patterns specifies a syntactic relation in which one of the text elements in the relation is labeled as an instance of a term in the thesaurus; and
with a processor, testing the patterns on a corpus of text documents to evaluate the performance of the system.
Dependent claims: 23.
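The claim does not say how performance is evaluated. One common choice, assumed here purely for illustration, is to compare the sentences flagged by the patterns against a manually annotated test corpus and compute precision, recall, and F1. The `evaluate` function and the sentence identifiers below are hypothetical.

```python
def evaluate(predicted_ids, gold_ids):
    """Compare sentence IDs flagged by the patterns with gold annotations."""
    predicted, gold = set(predicted_ids), set(gold_ids)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    denominator = precision + recall
    f1 = 2 * precision * recall / denominator if denominator else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Hypothetical run: patterns flagged sentences 1-4; annotators marked 2, 3, 5.
scores = evaluate(predicted_ids=[1, 2, 3, 4], gold_ids=[2, 3, 5])
print(scores)
```

Low precision would point to patterns that are too permissive, while low recall would point to missing patterns or gaps in the terminology or thesaurus.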
Specification