Automated analysis and summarization of comments in survey response data
First Claim
1. A computer-implemented method for summarizing free-form comments in survey response data, the method comprising:
receiving survey responses from a plurality of respondents to a survey, the survey including a series of entries to identify respondent information and a free-form comment area;
extracting, by a computing device, text from the free-form comment area of a survey response of each respondent, and storing the text as a respondent document in a survey database including a plurality of respondent documents representing respondents' answers to the survey;
identifying, by the computing device, a plurality of topic words from the text of the free-form comment area of each survey response in the survey database;
computing, by the computing device, a weight for each of the plurality of topic words, wherein the weight indicates a relevance of the topic word in the free-form comment area of each survey response in the survey database;
assigning, by the computing device, one or more of the plurality of topic words to each respondent document in the survey database;
identifying, by the computing device, one or more discrete topics associated with certain combinations of topic words, each combination of topic words based upon a certain proximity between the topic words within the respondent document, a grammatical use of the topic words, and a high document weight for each of those topic words;
for each of the identified one or more discrete topics, computing, by the computing device, a count of number of respondent documents in the survey database associated with the each of the identified one or more discrete topics and based upon document weights computed for each topic word in an associated combination of topic words that exceed a threshold value; and
generating, by the computing device, a report comprising an indication of a relative importance of each of the identified one or more discrete topics based upon the count of the number of respondent documents in the survey database computed for each of the identified one or more discrete topics.
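The combination, counting, and reporting steps of the claim can be illustrated with a minimal sketch. It is not the patented implementation: the window size, the threshold, the hand-assigned weights, and all function names are hypothetical, topics are reduced to pairs of topic words, and the "grammatical use" criterion is left out entirely.

```python
from collections import Counter
from itertools import combinations

def tokenize(text):
    """Lowercase the text and strip simple punctuation from each word."""
    return [w.strip(".,;:!?").lower() for w in text.split()]

def nearby_pairs(tokens, topic_words, window=3):
    """Return the topic-word pairs that occur within `window` tokens of each other."""
    positions = [(i, t) for i, t in enumerate(tokens) if t in topic_words]
    pairs = set()
    for (i, a), (j, b) in combinations(positions, 2):
        if a != b and abs(i - j) <= window:
            pairs.add(tuple(sorted((a, b))))
    return pairs

def count_documents_per_topic(documents, weights, topic_words, window=3, threshold=0.5):
    """Count documents per pair, keeping only pairs whose words all exceed the threshold."""
    counts = Counter()
    for text, doc_weights in zip(documents, weights):
        for pair in nearby_pairs(tokenize(text), topic_words, window):
            if all(doc_weights.get(w, 0.0) > threshold for w in pair):
                counts[pair] += 1
    return counts

def report(counts):
    """Rank topics by document count, most frequent first."""
    return [f"{' + '.join(pair)}: {n} document(s)" for pair, n in counts.most_common()]

# Hypothetical respondent documents and hand-assigned per-document weights.
documents = [
    "The wait time at the clinic was long",
    "Long wait and rude staff",
    "Staff were friendly",
]
weights = [
    {"wait": 0.9, "time": 0.8},
    {"wait": 0.7, "staff": 0.4},   # "staff" falls below the threshold here
    {"staff": 0.9},
]
counts = count_documents_per_topic(documents, weights, {"wait", "time", "staff"})
lines = report(counts)
```

In this toy data only the "time + wait" combination is counted: the second document contains "wait" and "staff" close together, but the weight of "staff" does not exceed the threshold.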
Abstract
Technologies are described herein for providing automated analysis and summarization of free-form comments in survey response data. A number of topic words are identified from the survey response comments, and a numeric weight is calculated for each topic word that reflects the relevance of the topic word to each comment. Each topic word is associated with one or more topics, and the comments relevant to each topic are then determined based on the weights of the associated topic words in each comment. A report is generated which summarizes the topics and their relative importance in the survey response comments based upon the number of comments relevant to each.
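The flow described in the abstract (weight each topic word per comment, map topic words to topics, count the relevant comments, rank the topics) can be sketched as follows. The abstract does not say how the weight is calculated; this sketch assumes a plain TF-IDF weight as a stand-in, and the topic-to-word mapping, the threshold, and the function names are hypothetical.

```python
import math
from collections import Counter

def tf_idf_weights(comments):
    """Return one dict per comment mapping word -> TF-IDF weight (an assumed weighting)."""
    tokenized = [c.lower().split() for c in comments]
    n = len(tokenized)
    doc_freq = Counter(w for tokens in tokenized for w in set(tokens))
    weights = []
    for tokens in tokenized:
        tf = Counter(tokens)
        weights.append({w: (tf[w] / len(tokens)) * math.log(n / doc_freq[w]) for w in tf})
    return weights

def summarize(comments, topics, threshold=0.0):
    """Count the comments relevant to each topic and rank topics by that count."""
    weights = tf_idf_weights(comments)
    counts = {
        topic: sum(
            1 for w in weights
            if max((w.get(t, 0.0) for t in words), default=0.0) > threshold
        )
        for topic, words in topics.items()
    }
    return sorted(counts.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical comments and a hand-built mapping of topics to topic words.
comments = [
    "parking is expensive",
    "parking lot is full",
    "great staff",
    "staff is helpful",
    "parking fees are too high",
]
topics = {"parking": ["parking", "lot"], "staff": ["staff"]}
summary = summarize(comments, topics)
```

A comment counts toward a topic when the largest weight among that topic's words exceeds the threshold. With TF-IDF, a word that appears in every comment receives a weight of zero, so it never makes a comment relevant on its own.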
74 Citations
18 Claims
1. A computer-implemented method for summarizing free-form comments in survey response data, the method comprising:
receiving survey responses from a plurality of respondents to a survey, the survey including a series of entries to identify respondent information and a free-form comment area;
extracting, by a computing device, text from the free-form comment area of a survey response of each respondent, and storing the text as a respondent document in a survey database including a plurality of respondent documents representing respondents' answers to the survey;
identifying, by the computing device, a plurality of topic words from the text of the free-form comment area of each survey response in the survey database;
computing, by the computing device, a weight for each of the plurality of topic words, wherein the weight indicates a relevance of the topic word in the free-form comment area of each survey response in the survey database;
assigning, by the computing device, one or more of the plurality of topic words to each respondent document in the survey database;
identifying, by the computing device, one or more discrete topics associated with certain combinations of topic words, each combination of topic words based upon a certain proximity between the topic words within the respondent document, a grammatical use of the topic words, and a high document weight for each of those topic words;
for each of the identified one or more discrete topics, computing, by the computing device, a count of number of respondent documents in the survey database associated with the each of the identified one or more discrete topics and based upon document weights computed for each topic word in an associated combination of topic words that exceed a threshold value; and
generating, by the computing device, a report comprising an indication of a relative importance of each of the identified one or more discrete topics based upon the count of the number of respondent documents in the survey database computed for each of the identified one or more discrete topics.
Dependent claims: 2, 3, 4, 5, 6, 7, 8
9. A non-transitory computer storage medium having computer executable instructions stored thereon that, when executed by a computer, will cause the computer to:
create a plurality of respondent documents, wherein each respondent document comprises text of a comment from survey responses from a plurality of respondents to a survey, the survey including a series of entries to identify respondent information and a free-form comment area, the plurality of respondent documents representing respondents' answers to a same question;
store the plurality of respondent documents in a survey database;
extract a plurality of terms from the plurality of respondent documents;
construct a term-document matrix, wherein each entry in the term-document matrix comprises a frequency of occurrence of one of the plurality of terms in one of the plurality of respondent documents;
transform the term-document matrix utilizing a matrix decomposition into a transformed matrix, wherein each entry in the transformed matrix comprises a weight for one of the plurality of terms in one of the plurality of respondent documents;
identify one or more discrete topics associated with certain combinations of topic words, each combination of topic words based upon a certain proximity between the topic words within a respondent document of the plurality of respondent documents, a grammatical use of the topic words, and a high document weight for each of those topic words;
for each of the identified one or more discrete topics, compute a count of number of respondent documents in the survey database associated with the each of the identified one or more discrete topics and based upon document weights computed for each topic word in an associated combination of topic words that exceed a threshold value; and
generate a report comprising an indication of a relative importance of each of the identified one or more discrete topics based upon the count of the number of respondent documents in the survey database computed for each of the identified one or more discrete topics.
Dependent claims: 10, 11, 12, 13
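The term-document matrix and its transformation in claim 9 can be illustrated with a small sketch. The claim does not name a specific matrix decomposition; as an assumption, this sketch uses a rank-one truncated singular value decomposition computed by power iteration in plain Python. The function names and the sample documents are hypothetical.

```python
import math

def term_document_matrix(documents):
    """Rows are terms (sorted), columns are documents, entries are term frequencies."""
    tokenized = [d.lower().split() for d in documents]
    terms = sorted({w for tokens in tokenized for w in tokens})
    matrix = [[tokens.count(term) for tokens in tokenized] for term in terms]
    return terms, matrix

def leading_singular_triplet(matrix, iterations=100):
    """Power iteration on A^T A for the largest singular value and its vectors."""
    rows, cols = len(matrix), len(matrix[0])
    v = [1.0 / math.sqrt(cols)] * cols
    for _ in range(iterations):
        u = [sum(matrix[i][j] * v[j] for j in range(cols)) for i in range(rows)]  # u = A v
        w = [sum(matrix[i][j] * u[i] for i in range(rows)) for j in range(cols)]  # w = A^T u
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    u = [sum(matrix[i][j] * v[j] for j in range(cols)) for i in range(rows)]
    sigma = math.sqrt(sum(x * x for x in u))
    u = [x / sigma for x in u]
    return sigma, u, v

def rank_one_weights(matrix):
    """Transformed matrix: entry (i, j) is the weight of term i in document j."""
    sigma, u, v = leading_singular_triplet(matrix)
    return [[sigma * ui * vj for vj in v] for ui in u]

# Hypothetical respondent documents answering the same question.
documents = ["slow service", "slow slow checkout", "friendly staff"]
terms, matrix = term_document_matrix(documents)
sigma, u, v = leading_singular_triplet(matrix)
weights = rank_one_weights(matrix)
```

In the transformed matrix the dominant theme ("slow", together with "service" and "checkout") keeps substantial weights, while terms that appear only in the unrelated third document fall to nearly zero. A practical system would more likely keep several singular components and use a numerical library; the rank-one case is shown only because it fits in a few lines.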
14. A system for performing automated analysis of comments in survey response data, the system comprising:
a processor;
a memory; and
a text mining application residing in the memory and executing on the processor, the text mining application configured to:
extract demographic data regarding each respondent of a plurality of respondents, each respondent providing a comment for each survey response, wherein the survey response data includes comments provided by the plurality of respondents,
identify a plurality of topic words from text of the comments,
compute a weight for each of the plurality of topic words in each of the comments, wherein the weight indicates a relevance of the topic word in the comment,
identify one or more discrete topics associated with certain combinations of topic words, each combination of topic words based upon a proximity between the topic words within a comment, a grammatical use of the topic words, and a high document weight for each of those topic words,
specify one or more demographic groups from the extracted demographic data,
for each of the one or more specified demographic groups and each of the one or more discrete topics, compute a count value of a number of comments in the survey response data associated with each of the one or more specified demographic groups and relevant to each of the one or more discrete topics, and based upon document weights computed for each topic word in an associated combination of topic words that exceed a threshold value, and
generate a report for each of the one or more specified demographic groups, the report comprising an indication of a relative importance of each of the one or more discrete topics based upon the count value of the number of comments in the survey response data computed for each of the one or more specified demographic groups and each of the one or more discrete topics.
Dependent claims: 15, 16, 17, 18
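The per-group counting and reporting in claim 14 can be sketched as below. This is an illustration under stated assumptions, not the claimed system: the response record layout, the "department" demographic field, the threshold, the precomputed weights, and the function names are all hypothetical.

```python
from collections import defaultdict

def count_by_group(responses, topics, group_key, threshold=0.5):
    """Count, per demographic group, the comments relevant to each topic.

    A comment is treated as relevant to a topic only when every topic word in
    the topic's combination has a weight above the threshold in that comment.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for response in responses:
        group = response["demographics"].get(group_key, "unknown")
        for topic, words in topics.items():
            if all(response["weights"].get(w, 0.0) > threshold for w in words):
                counts[group][topic] += 1
    return {group: dict(topic_counts) for group, topic_counts in counts.items()}

def group_reports(counts):
    """Build one ranked report per group, most frequently raised topic first."""
    reports = {}
    for group, topic_counts in counts.items():
        ranked = sorted(topic_counts.items(), key=lambda kv: (-kv[1], kv[0]))
        reports[group] = [f"{topic}: {n}" for topic, n in ranked]
    return reports

# Hypothetical survey responses with precomputed topic-word weights per comment.
responses = [
    {"demographics": {"department": "sales"},
     "weights": {"pay": 0.9, "raise": 0.8}},
    {"demographics": {"department": "sales"},
     "weights": {"pay": 0.7, "raise": 0.6, "commute": 0.9}},
    {"demographics": {"department": "engineering"},
     "weights": {"commute": 0.8, "pay": 0.3}},
]
topics = {"compensation": ["pay", "raise"], "commute": ["commute"]}

counts = count_by_group(responses, topics, "department")
reports = group_reports(counts)
```

Here the engineering comment mentions "pay" only weakly (0.3), so it is not counted toward the compensation topic, and the two groups end up with different topic rankings.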
Specification