Automated analysis and summarization of comments in survey response data

US 8,577,884 B2
Filed: 05/13/2008
Issued: 11/05/2013
Est. Priority Date: 05/13/2008
Status: Active Grant

Abstract

Technologies are described herein for providing automated analysis and summarization of free-form comments in survey response data. A number of topic words are identified from the survey response comments, and a numeric weight is calculated for each topic word that reflects the relevance of the topic word to each comment. Each topic word is associated with one or more topics, and the comments relevant to each topic are then determined based on the weights of the associated topic words in each comment. A report is generated that summarizes the topics and their relative importance in the survey response comments based upon the number of comments relevant to each topic.
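
As a rough illustration of the counting step described in the abstract, the following Python sketch assumes that per-comment topic-word weights have already been computed. The function name `count_relevant_comments`, the sample weights, and the threshold of 0.5 are hypothetical and not taken from the patent. The sketch also assumes one possible reading of relevance: a comment counts toward a topic when every topic word associated with that topic has a weight above the threshold in that comment.

```python
from typing import Dict, List


def count_relevant_comments(
    weights: List[Dict[str, float]],
    topics: Dict[str, List[str]],
    threshold: float,
) -> Dict[str, int]:
    """Count, per topic, the comments in which every associated topic word
    has a weight above `threshold` (an assumed reading of "relevant")."""
    counts = {topic: 0 for topic in topics}
    for comment_weights in weights:
        for topic, words in topics.items():
            if all(comment_weights.get(w, 0.0) > threshold for w in words):
                counts[topic] += 1
    return counts


# Hypothetical weights for three comments (illustrative values only).
weights = [
    {"parking": 0.9, "cost": 0.7, "cafeteria": 0.1},
    {"parking": 0.8, "cost": 0.6},
    {"cafeteria": 0.6, "food": 0.9},
]
# Hypothetical topics, each tied to a combination of topic words.
topics = {
    "parking cost": ["parking", "cost"],
    "cafeteria food": ["cafeteria", "food"],
}

counts = count_relevant_comments(weights, topics, threshold=0.5)
# Topics ordered by how many comments are relevant to each.
ranked = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
```

Sorting the counts gives a simple indication of relative importance; a real report would likely present the same numbers graphically.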

74 Citations

18 Claims

1. A computer-implemented method for summarizing free-form comments in survey response data, the method comprising:
- receiving survey responses from a plurality of respondents to a survey, the survey including a series of entries to identify respondent information and a free-form comment area;
  
  extracting, by a computing device, text from the free-form comment area of a survey response of each respondent, and storing the text as a respondent document in a survey database including a plurality of respondent documents representing respondents' answers to the survey;
  
  identifying, by the computing device, a plurality of topic words from the text of the free-form comment area of each survey response in the survey database;
  
  computing, by the computing device, a weight for each of the plurality of topic words, wherein the weight indicates a relevance of the topic word in the free-form comment area of each survey response in the survey database;
  
  assigning, by the computing device, one or more of the plurality of topic words to each respondent document in the survey database;
  
  identifying, by the computing device, one or more discrete topics associated with certain combinations of topic words, each combination of topic words based upon a certain proximity between the topic words within the respondent document, a grammatical use of the topic words, and a high document weight for each of those topic words;
  
  for each of the identified one or more discrete topics, computing, by the computing device, a count of number of respondent documents in the survey database associated with the each of the identified one or more discrete topics and based upon document weights computed for each topic word in an associated combination of topic words that exceed a threshold value; and
  
  generating, by the computing device, a report comprising an indication of a relative importance of each of the identified one or more discrete topics based upon the count of the number of respondent documents in the survey database computed for each of the identified one or more discrete topics.
- Dependent claims: 2, 3, 4, 5, 6, 7, 8
- - 2. A computer-implemented method of claim 1, wherein identifying the plurality of topic words from the text of the free-form comment area of each survey response in the survey database and computing the weight for each of the plurality of topic words further comprising:
    - extracting, by the computing device, a plurality of terms from text of the survey of the plurality of respondent documents;
      
      constructing, by the computing device, a term-document matrix, wherein each entry in the term-document matrix comprises a frequency of occurrence of one of the plurality of terms in a particular respondent document of the plurality of respondent documents;
      
      transforming, by the computing device, the term-document matrix by applying a matrix decomposition into a transformed matrix, wherein each entry in the transformed matrix comprises a weight for one of the plurality of terms and for each of the plurality of respondent documents; and
      
      identifying, by the computing device, the plurality of topic words from a subset of the plurality of terms each having a weight exceeding the threshold value.
  - 3. The computer-implemented method of claim 2, wherein transforming the term-document matrix is accomplished using a truncated two-sided orthogonal decomposition.
  - 4. The computer-implemented method of claim 2, wherein extracting the plurality of terms from the text of the plurality of respondent documents comprises:
    - extracting, by the computing device, each word from the text of each of the plurality of respondent documents;
      
      determining, by the computing device, if the extracted word is a stop word; and
      
      one of including, by the computing device, the extracted word in the plurality of terms when the extracted word is not a stop word;
      
      or excluding, by the computing device, the extracted word from the plurality of terms when the extracted word is a stop word.
  - 5. The computer-implemented method of claim 1, further comprising:
    - receiving demographic data as a respondent provides information;
      
      specifying one or more demographic groups based on the received demographic data;
      
      for each of the one or more specified demographic groups and each of the one or more discrete topics, computing, by the computing device, a second count of a number of respondent documents in the survey database associated with each of the one or more specified demographic groups and relevant to each of the one or more discrete topics, and based upon computed document weights for each topic word in an associated combination of topic words, wherein the computed document weights exceed the threshold value; and
      
      generating, by the computing device, a second report for each of the one or more specified demographic groups, the second report comprising a second indication of the relative importance of each of the one or more discrete topics based upon the second count of the number of respondent documents in the survey database computed for each of the one or more discrete topics and each of the one or more specified demographic groups.
  - 6. The computer-implemented method of claim 1, further comprising identifying at least one group of related topic words from the plurality of topic words based on topic words that represent a same topic, and storing a relationship representing the at least one group of related topic words with corresponding weights for each of the plurality of topic words.
  - 7. The computer-implemented method of claim 1, wherein the report comprises a chart illustrating a relationship between the count of the number of respondent documents computed for each of the one or more discrete topics.
  - 8. The computer-implemented method of claim 7, wherein the chart comprises a type of Pareto chart.
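
The term extraction, stop-word filtering, and term-document matrix steps recited in claims 2 and 4 can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the stop-word list, the regular-expression tokenizer, and the names `extract_terms` and `term_document_matrix` are all assumptions made for the example.

```python
import re
from collections import Counter
from typing import List, Tuple

# Hypothetical stop-word list; a real system would use a much larger one.
STOP_WORDS = {"the", "is", "a", "and", "too", "to", "of"}


def extract_terms(text: str) -> List[str]:
    """Lowercase the text, split it into words, and drop stop words."""
    words = re.findall(r"[a-z']+", text.lower())
    return [w for w in words if w not in STOP_WORDS]


def term_document_matrix(docs: List[str]) -> Tuple[List[str], List[List[int]]]:
    """Return (terms, matrix), where matrix[i][j] is the frequency of
    terms[i] in docs[j]."""
    per_doc = [Counter(extract_terms(d)) for d in docs]
    terms = sorted(set().union(*per_doc))
    matrix = [[c[t] for c in per_doc] for t in terms]
    return terms, matrix


# Three made-up respondent documents.
docs = [
    "The parking is too expensive",
    "Parking and the cafeteria",
    "Cafeteria food is expensive",
]
terms, matrix = term_document_matrix(docs)
```

Each row of `matrix` corresponds to a term and each column to a respondent document, matching the "frequency of occurrence" entries the claim describes.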

9. A non-transitory computer storage medium having computer executable instructions stored thereon that, when executed by a computer, will cause the computer to:
- create a plurality of respondent documents, wherein each respondent document comprises text of a comment from survey responses from a plurality of respondents to a survey, the survey including a series of entries to identify respondent information and a free-form comment area, the plurality of respondent documents representing respondents' answers to a same question;
  
  store the plurality of respondent documents in a survey database;
  
  extract a plurality of terms from the plurality of respondent documents;
  
  construct a term-document matrix, wherein each entry in the term-document matrix comprises a frequency of occurrence of one of the plurality of terms in one of the plurality of respondent documents;
  
  transform the term-document matrix utilizing a matrix decomposition into a transformed matrix, wherein each entry in the transformed matrix comprises a weight for one of the plurality of terms in one of the plurality of respondent documents;
  
  identify one or more discrete topics associated with certain combinations of topic words, each combination of topic words based upon a certain proximity between the topic words within a respondent document of the plurality of respondent documents, a grammatical use of the topic words, and a high document weight for each of those topic words;
  
  for each of the identified one or more discrete topics, compute a count of number of respondent documents in the survey database associated with the each of the identified one or more discrete topics and based upon document weights computed for each topic word in an associated combination of topic words that exceed a threshold value; and
  
  generate a report comprising an indication of a relative importance of each of the identified one or more discrete topics based upon the count of the number of respondent documents in the survey database computed for each of the identified one or more discrete topics.
- Dependent claims: 10, 11, 12, 13
- - 10. The non-transitory computer storage medium of claim 9, wherein transforming the term-document matrix is accomplished using a truncated two-sided orthogonal decomposition.
  - 11. The non-transitory computer storage medium of claim 9, having further computer executable instructions stored thereon that, when executed by the computer, will cause the computer to:
    - receive demographic data regarding a respondent providing a comment;
      
      associate the demographic data with the respondent document in the database that contains the text of the comment from the survey database;
      
      store the demographic data associated with the respondent document in the survey database;
      
      specify one or more demographic groups;
      
      for each of the one or more specified demographic groups and each of the plurality of terms, compute a second number of respondent documents associated with each of the one or more demographic groups and relevant to each of the plurality of terms, wherein document weight of the term for each of the second number of respondent documents exceeds the threshold value; and
      
      generate a report for each of the one or more specified demographic groups, the report comprising an indication of a relative importance of each of the plurality of terms based upon the second number of respondent documents computed for each of the plurality of terms and each of the one or more specified demographic groups.
  - 12. The non-transitory computer storage medium of claim 11, wherein the report comprises a chart illustrating a relationship between the second number of respondent documents computed for each of the plurality of terms.
  - 13. The non-transitory computer storage medium of claim 12, wherein the chart comprises a type of Pareto chart.
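
Claims 9 and 10 recite transforming the term-document matrix with a matrix decomposition, but this listing does not spell out the algorithm. As a stand-in, the sketch below uses a truncated singular value decomposition, which is one well-known two-sided orthogonal decomposition; it may differ from what the patent actually uses. The functions `top_singular_triplet` and `truncated_svd`, the power-iteration approach, and the sample matrix are assumptions made for illustration.

```python
import math
import random
from typing import List, Tuple

Matrix = List[List[float]]


def top_singular_triplet(
    a: Matrix, rng: random.Random, iters: int = 200
) -> Tuple[float, List[float], List[float]]:
    """Power iteration for the dominant singular value and vectors of `a`."""
    m, n = len(a), len(a[0])
    v = [rng.gauss(0.0, 1.0) for _ in range(n)]  # random start vector
    sigma, u = 0.0, [0.0] * m
    for _ in range(iters):
        u = [sum(a[i][j] * v[j] for j in range(n)) for i in range(m)]
        norm_u = math.sqrt(sum(x * x for x in u))
        if norm_u < 1e-12:  # matrix is (numerically) zero
            return 0.0, u, v
        u = [x / norm_u for x in u]
        w = [sum(a[i][j] * u[i] for i in range(m)) for j in range(n)]
        sigma = math.sqrt(sum(x * x for x in w))
        v = [x / sigma for x in w]
    return sigma, u, v


def truncated_svd(a: Matrix, k: int, seed: int = 0) -> Matrix:
    """Rank-k approximation of `a`, built one component at a time by
    subtracting each extracted component from the residual (deflation)."""
    rng = random.Random(seed)
    m, n = len(a), len(a[0])
    residual = [list(map(float, row)) for row in a]
    approx = [[0.0] * n for _ in range(m)]
    for _ in range(k):
        sigma, u, v = top_singular_triplet(residual, rng)
        if sigma == 0.0:  # no further components to extract
            break
        for i in range(m):
            for j in range(n):
                term = sigma * u[i] * v[j]
                approx[i][j] += term
                residual[i][j] -= term
    return approx


# Hypothetical 4-term by 3-document frequency matrix.
tdm = [[0, 1, 1], [1, 0, 1], [0, 0, 1], [1, 1, 0]]
weights = truncated_svd(tdm, k=2)  # weights[i][j]: term i in document j
```

Under this reading, the entries of the rank-reduced matrix play the role of the per-term, per-document weights. Pure-Python power iteration is used only to keep the example dependency-free; a numerical library would normally be preferred.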

14. A system for performing automated analysis of comments in survey response data, the system comprising:
- a processor;
  
  a memory; and
  
  a text mining application residing in the memory and executing on the processor, the text mining application configured to:
  
  extract demographic data regarding each respondent of a plurality of respondents, each respondent providing a comment for each survey response, wherein the survey response data includes comments provided by the plurality of respondents,
  
  identify a plurality of topic words from text of the comments,
  
  compute a weight for each of the plurality of topic words in each of the comments, wherein the weight indicates a relevance of the topic word in the comment,
  
  identify one or more discrete topics associated with certain combinations of topic words, each combination of topic words based upon a proximity between the topic words within a comment, a grammatical use of the topic words, and a high document weight for each of those topic words,
  
  specify one or more demographic groups from the extracted demographic data,
  
  for each of the one or more specified demographic groups and each of the one or more discrete topics, compute a count value of a number of comments in the survey response data associated with each of the one or more specified demographic groups and relevant to each of the one or more discrete topics, and based upon document weights computed for each topic word in an associated combination of topic words that exceed a threshold value, and
  
  generate a report for each of the one or more specified demographic groups, the report comprising an indication of a relative importance of each of the one or more discrete topics based upon the count value of the number of comments in the survey response data computed for each of the one or more specified demographic groups and each of the one or more discrete topics.
- Dependent claims: 15, 16, 17, 18
- - 15. The system of claim 14, wherein identifying the plurality of topic words from the text of the comments and computing the weight regarding each of the plurality of topic words further comprises:
    - extracting a plurality of terms from the text of the comments;
      
      constructing a term-document matrix, wherein each entry in the term-document matrix comprises a frequency of occurrence of one of the plurality of terms in the text of one of the comments;
      
      transforming the term-document matrix utilizing a matrix decomposition into a transformed matrix, wherein each entry in the transformed matrix comprises a weight for one of the plurality of terms in one of the comments; and
      
      identifying the plurality of topic words from a subset of the plurality of terms each having a weight regarding at least one of the comments exceeding the threshold value.
  - 16. The system of claim 14, further comprises:
    - identifying at least one group of related topic words from the plurality of topic words based on topic words that represent a same topic, and storing a relationship representing the at least one group of related topic words with corresponding weights for each of the plurality of topic words.
  - 17. The system of claim 14, wherein the report comprises a chart illustrating a relationship between the number of comments of the survey response data computed for each of the one or more specified demographic groups and each of the one or more discrete topics.
  - 18. The system of claim 17, wherein the chart comprises a type of Pareto chart.
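
The per-group counting and Pareto-style reporting recited in claims 14, 17, and 18 might look roughly like the sketch below. It assumes each comment has already been tagged with a demographic group and with the set of topics it is relevant to; the names `counts_by_group` and `pareto_rows`, the group labels, and the sample data are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple


def counts_by_group(
    comments: List[Tuple[str, Set[str]]],
) -> Dict[str, Dict[str, int]]:
    """`comments` holds (demographic_group, relevant_topics) pairs.
    Returns group -> topic -> number of relevant comments."""
    result: Dict[str, Dict[str, int]] = defaultdict(lambda: defaultdict(int))
    for group, topics in comments:
        for topic in topics:
            result[group][topic] += 1
    return {g: dict(t) for g, t in result.items()}


def pareto_rows(counts: Dict[str, int]) -> List[Tuple[str, int, float]]:
    """Sort topics by descending count and attach the cumulative
    percentage, i.e. the data behind a Pareto chart."""
    total = sum(counts.values())
    if total == 0:
        return []
    rows, running = [], 0
    # Ties are broken alphabetically so the output is deterministic.
    for topic, n in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        running += n
        rows.append((topic, n, 100.0 * running / total))
    return rows


# Made-up comments tagged with a group and their relevant topics.
comments = [
    ("engineering", {"parking"}),
    ("engineering", {"parking", "cafeteria"}),
    ("engineering", {"benefits"}),
    ("finance", {"cafeteria"}),
    ("engineering", {"parking"}),
]
by_group = counts_by_group(comments)
rows = pareto_rows(by_group["engineering"])
```

Each tuple in `rows` carries the topic, its count, and the running cumulative percentage, which is the bar-plus-line data a Pareto chart plots for one demographic group.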

Current Assignee: The Boeing Co.
Original Assignee: The Boeing Co.
Inventors: Poteet, Stephen R.; Kao, Anne; Luh, Shan
Primary Examiner(s): Cao, Phuong Thao
Application Number: US12/119,697
Publication Number: US 20090287642A1
Time in Patent Office: 2,002 Days
Field of Search: 707/999.003, 707/705, 707/738, 707/739, 707/776
US Class Current: 707/737
CPC Class Codes: G06Q 30/02 Marketing; Price estimation...