SVO-based taxonomy-driven text analytics
First Claim
1. A computer program product for classifying data, the computer program product comprising a computer readable storage device having program code embodied therewith, the program code being executable by a processor to:
receive textual data, and analyze the received data, wherein the analysis comprises program code to;
identify at least one sentence from the received data;
parse parts of speech of one or more words of the identified sentence using a linguistic parser, including parsing a verb from the at least one parsed sentence, and identifying a verb usage pattern in the at least one identified sentence;
form a low level subject-verb-object (SVO) triplet for the at least one parsed sentence, including identifying a subject, verb, and object of the at least one parsed sentence, wherein the identification of the subject, verb, and object comprises joining the identified verb usage pattern with a form of the identified verb to ascertain linguistic taxonomy;
form a high level SVO triplet for the at least one parsed sentence, including determining a subject category for the subject, a verb category for the verb, and an object category for the object based on the taxonomy; and
classify the at least one parsed sentence based on the high level SVO triplet; and
summarize the analysis of the stored data, including;
produce an analysis report reflective of the analysis; and
convert the produced analysis report into a summary report reflective of the analysis report, including cluster the received textual data into one or more statement clusters, wherein the summary report comprises the statement clusters.
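The flow recited in claim 1 (identify sentences, form a low level SVO triplet, lift it to a high level triplet through a taxonomy, then cluster statements) can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration and does not come from the patent: the `TAXONOMY` contents, the function names, and the regex-based sentence splitter. The linguistic parser and the verb usage pattern step are not implemented; the low level triplets are supplied by hand as if a parser had produced them.

```python
import re
from collections import defaultdict

# Hypothetical domain taxonomy (illustrative only): low level term -> category.
TAXONOMY = {
    "subject": {"battery": "HARDWARE", "charger": "HARDWARE", "support": "SERVICE"},
    "verb": {"drain": "DEGRADE", "resolve": "FIX"},
    "object": {"charge": "POWER", "issue": "PROBLEM"},
}


def split_sentences(text):
    """Naive sentence identification: split after '.', '!' or '?' plus whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def to_high_level(triplet):
    """Lift a low level (subject, verb, object) triplet to taxonomy categories."""
    subject, verb, obj = triplet
    return (
        TAXONOMY["subject"].get(subject, "UNKNOWN"),
        TAXONOMY["verb"].get(verb, "UNKNOWN"),
        TAXONOMY["object"].get(obj, "UNKNOWN"),
    )


def cluster_statements(parsed):
    """Group sentences into statement clusters keyed by high level triplet."""
    clusters = defaultdict(list)
    for sentence, triplet in parsed:
        clusters[to_high_level(triplet)].append(sentence)
    return dict(clusters)


text = (
    "The battery drains its charge. "
    "The charger drains the charge. "
    "Support resolved the issue."
)
sentences = split_sentences(text)

# Hand-written low level triplets standing in for linguistic parser output.
low_level = [
    ("battery", "drain", "charge"),
    ("charger", "drain", "charge"),
    ("support", "resolve", "issue"),
]
clusters = cluster_statements(zip(sentences, low_level))
```

In this toy run the first two sentences share the high level triplet `("HARDWARE", "DEGRADE", "POWER")` and land in one statement cluster, while the third forms its own cluster. A real system would replace the hand-written triplets with parser output and the dictionary with a domain taxonomy.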
Abstract
Textual data is organized into statement clusters. Sentences are extracted from textual data and parsed. A verb usage pattern is identified and an SVO triplet is determined. The SVO triplet is compared to a taxonomy associated with the domain of the data and a sentiment is derived. A statement cluster is constructed comprising a higher level SVO triplet sensitive to the taxonomy and verb usage pattern, as well as the derived sentiment. Accordingly, the statement clusters may be organized by grouping.
16 Claims
9. A system comprising:
a processing unit in communication with data storage;
a functional unit having memory and in communication with the processing unit, the functional unit having tools to support data classification, the tools comprising;
an extraction manager in communication with data storage, the extraction manager to receive textual data, and analyze the received data;
an identification manager in communication with the extraction manager, the identification manager to;
identify at least one sentence from the received data;
parse parts of speech of one or more words of the identified sentence using a linguistic parser, including parse a verb from the at least one parsed sentence, and identify a verb usage pattern in the at least one identified sentence; and
form a low level subject-verb-object (SVO) triplet for the at least one parsed sentence, including identify a subject, verb, and object of the at least one parsed sentence, wherein the identification of the subject, verb, and object comprises a join of the identified verb usage pattern with a form of the identified verb to ascertain linguistic taxonomy; and
an organization manager in communication with the identification manager, the organization manager to;
form a high level SVO triplet for the at least one parsed sentence, including determine a subject category for the subject, a verb category for the verb, and an object category for the object based on the taxonomy; and
classify the at least one parsed sentence based on the high level SVO triplet; and
summarize the analysis of the stored data, including;
produce an analysis report reflective of the analysis; and
convert the produced analysis report into a summary report reflective of the analysis report, including cluster the received textual data into one or more statement clusters, wherein the summary report comprises the statement clusters.
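The chain of managers in the system claim (extraction manager, then identification manager, then organization manager) can be pictured as three small components wired in sequence. The sketch below is only one possible reading. The class layout, the in-memory list standing in for data storage, the flat `taxonomy` dictionary, and the `three_word_parser` helper are all hypothetical and are not taken from the patent. The toy parser handles only three-word sentences and stands in for a real linguistic parser.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Triplet = Tuple[str, str, str]


@dataclass
class ExtractionManager:
    """Receives textual data; a list stands in for data storage (assumption)."""
    storage: List[str]

    def receive(self) -> List[str]:
        return list(self.storage)


@dataclass
class IdentificationManager:
    """Forms a low level SVO triplet per sentence via an injected parser."""
    extraction: ExtractionManager
    parse_svo: Callable[[str], Triplet]  # stand-in for the linguistic parser

    def low_level(self) -> List[Tuple[str, Triplet]]:
        return [(s, self.parse_svo(s)) for s in self.extraction.receive()]


@dataclass
class OrganizationManager:
    """Lifts triplets to taxonomy categories and clusters the sentences."""
    identification: IdentificationManager
    taxonomy: Dict[str, str]

    def classify(self) -> Dict[Triplet, List[str]]:
        clusters: Dict[Triplet, List[str]] = {}
        for sentence, (subj, verb, obj) in self.identification.low_level():
            high = (
                self.taxonomy.get(subj, "UNKNOWN"),
                self.taxonomy.get(verb, "UNKNOWN"),
                self.taxonomy.get(obj, "UNKNOWN"),
            )
            clusters.setdefault(high, []).append(sentence)
        return clusters


def three_word_parser(sentence: str) -> Triplet:
    """Toy parser: assumes the sentence is exactly 'subject verb object.'"""
    subj, verb, obj = sentence.lower().rstrip(".").split()
    return (subj, verb, obj)


# Hypothetical taxonomy and sample data, for illustration only.
taxonomy = {
    "battery": "HARDWARE", "charger": "HARDWARE", "support": "SERVICE",
    "drains": "DEGRADE", "resolves": "FIX",
    "charge": "POWER", "issue": "PROBLEM",
}
storage = ["Battery drains charge.", "Charger drains charge.", "Support resolves issue."]

organizer = OrganizationManager(
    IdentificationManager(ExtractionManager(storage), three_word_parser),
    taxonomy,
)
summary = organizer.classify()
```

Passing the parser in as a callable keeps the identification step independent of any particular parsing library, so the toy parser can be swapped for a real one without touching the other two components.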
Specification