Computer-implemented methods and systems for clustering user reviews and ranking clusters

US 9,928,233 B2
Filed: 11/12/2014
Issued: 03/27/2018
Est. Priority Date: 11/12/2014
Status: Active Grant

First Claim


1. A method of organizing user reviews for data analysis, comprising the steps, each implemented in a computer system, of:

(a) sampling a plurality of reviews from one or more sources;

(b) identifying unique terms in each of said plurality of reviews;

(c) determining frequency values indicating the number of times each of the unique terms appears in each of said plurality of reviews, wherein determining the frequency values includes generating a term/review matrix having a plurality of rows and columns, in which each row identifies a review and each column identifies a unique term, and wherein the matrix specifies the frequency values for respective unique terms;

(d) adjusting the frequency values for each unique term to account for the rarity of that unique term across the plurality of reviews, wherein adjusting the frequency values includes normalizing the values in the matrix based on values associated with the rarity of the unique term across the plurality of reviews being analyzed;

(e) calculating similarities between the reviews based on the frequency values as adjusted in step (d); and

(f) grouping the reviews into clusters based on the similarities of the reviews determined in step (e).
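
Steps (b) through (d) can be illustrated with a minimal Python sketch. It is not the patentee's implementation. The regex tokenizer and the `crude_stem` suffix stripper are hypothetical stand-ins, because dependent claim 3 recites stemming but names no particular stemmer. The weighting uses the idf = log(N / df) form recited in dependent claim 4.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and keep runs of letters/apostrophes as words."""
    return re.findall(r"[a-z']+", text.lower())


def crude_stem(word: str) -> str:
    """Hypothetical suffix stripper standing in for a real stemmer (e.g. Porter)."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def term_review_matrix(reviews: list[str]) -> tuple[list[str], list[list[int]]]:
    """Steps (b)-(c): one row per review, one column per unique term, raw counts in cells."""
    counts = [Counter(crude_stem(w) for w in tokenize(r)) for r in reviews]
    terms = sorted(set().union(*counts))
    matrix = [[c[t] for t in terms] for c in counts]
    return terms, matrix


def tfidf(matrix: list[list[int]]) -> list[list[float]]:
    """Step (d): weight each raw count tf by idf = log(N / df), where N is the number
    of reviews and df is the number of reviews containing the term (claim 4)."""
    n_reviews = len(matrix)
    n_terms = len(matrix[0]) if matrix else 0
    df = [sum(1 for row in matrix if row[j] > 0) for j in range(n_terms)]
    return [[row[j] * math.log(n_reviews / df[j]) for j in range(n_terms)] for row in matrix]
```

For example, `term_review_matrix(["App crashes on launch", "Crashed again on launch", "Love the new design"])` maps "crashes" and "Crashed" to the same `crash` column. After `tfidf`, a term present in every sampled review receives weight log(1) = 0, which is how the adjustment accounts for rarity.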


4 Assignments

0 Petitions

Abstract

Computer-implemented methods and systems are disclosed for organizing user reviews, especially computer app reviews, into clusters and ranking the clusters so that the reviews may be more meaningfully analyzed.

34 Claims

1. A method of organizing user reviews for data analysis, comprising the steps, each implemented in a computer system, of:

(a) sampling a plurality of reviews from one or more sources;

(b) identifying unique terms in each of said plurality of reviews;

(c) determining frequency values indicating the number of times each of the unique terms appears in each of said plurality of reviews, wherein determining the frequency values includes generating a term/review matrix having a plurality of rows and columns, in which each row identifies a review and each column identifies a unique term, and wherein the matrix specifies the frequency values for respective unique terms;

(d) adjusting the frequency values for each unique term to account for the rarity of that unique term across the plurality of reviews, wherein adjusting the frequency values includes normalizing the values in the matrix based on values associated with the rarity of the unique term across the plurality of reviews being analyzed;

(e) calculating similarities between the reviews based on the frequency values as adjusted in step (d); and

(f) grouping the reviews into clusters based on the similarities of the reviews determined in step (e).

- Dependent claims (3, 4, 5, 6, 7, 8, 15, 16, 17, 18, 19, 20, 21, 29, 30, 33):
  - 3. The method of claim 1, wherein step (b) comprises tokenizing text in each of said plurality of reviews into a set of words and stemming each word not already in root form to generate a set of terms for each review, from which said unique terms are identified.
  - 4. The method of claim 1, wherein step (d) comprises calculating tf/idf values for each review, wherein tf=the raw frequency of a term within a review, and idf=log(total number of reviews/total number of reviews containing the term), and wherein the tf/idf values comprise the frequency values adjusted in step (d).
  - 5. The method of claim 1, wherein similarities between the reviews are calculated in step (e) using a standard cosine similarity calculation.
  - 6. The method of claim 1, wherein steps (e) and (f) comprise dividing the reviews into a given number of chunks, calculating the similarity of each review against every other review in each chunk, and for each pair of reviews whose similarity exceeds a given threshold, clustering said reviews.
  - 7. The method of claim 1, further comprising ranking the clusters.
  - 8. The method of claim 1, further comprising displaying the clusters to a user on a computer display.
  - 15. The method of claim 4, further comprising normalizing the term/review matrix.
  - 16. The method of claim 6, further comprising using the transitive property for clustering reviews.
  - 17. The method of claim 6, further comprising merging clusters with similarly sufficient reviews.
  - 18. The method of claim 6, further comprising performing a post merge process to collapse clusters across chunks.
  - 19. The method of claim 7, wherein the clusters are ranked based on a plurality of factors.
  - 20. The method of claim 7, further comprising displaying the clusters to a user on a computer device, wherein the clusters are displayed in order of ranking.
  - 21. The method of claim 8, wherein information displayed for each cluster includes one or more of:
    - text representative of similar text found in the reviews in the cluster, the total number of reviews in the cluster, the time period over which the reviews in the cluster were posted, and the average rating of the reviews in the cluster.
  - 29. The method of claim 17, wherein the clusters are merged based on cluster centroids within each chunk.
  - 30. The method of claim 19, wherein said factors include one or more of:
    - the date or time of the most recent review in a cluster, the average rating of the reviews in a cluster, the number of reviews in a cluster, and the average similarity score of the reviews in a cluster.
  - 33. The method of claim 30, wherein the factors are weighted such that clusters with more recent reviews, lower average ratings, higher number of reviews, and higher average similarity scores are assigned higher weights.
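
Steps (e) and (f), as narrowed by claims 5, 6 and 16, can be sketched as follows. The sketch assumes contiguous chunks and uses a union-find structure to realize the transitive linking. `post_merge` is one hypothetical reading of the cross-chunk merge in claim 18, borrowing the centroid idea from claim 29. All function names and the greedy merge order are illustrative assumptions, not the patent's disclosed code.

```python
import math


def cosine(u: list[float], v: list[float]) -> float:
    """Standard cosine similarity (claim 5); 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def cluster_in_chunks(vectors: list[list[float]], n_chunks: int, threshold: float) -> list[list[int]]:
    """Claims 6 and 16 (sketch): split reviews into contiguous chunks, compare every
    pair within a chunk, and link pairs whose similarity exceeds `threshold`.
    Union-find makes the linking transitive. Returns sorted lists of review indices."""
    parent = list(range(len(vectors)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps trees shallow
            i = parent[i]
        return i

    chunk_size = max(1, math.ceil(len(vectors) / n_chunks))
    for start in range(0, len(vectors), chunk_size):
        chunk = range(start, min(start + chunk_size, len(vectors)))
        for a in chunk:
            for b in chunk:
                if a < b and cosine(vectors[a], vectors[b]) > threshold:
                    parent[find(a)] = find(b)  # union the two clusters

    groups: dict[int, list[int]] = {}
    for i in range(len(vectors)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


def centroid(vectors: list[list[float]], members: list[int]) -> list[float]:
    """Component-wise mean of the member vectors."""
    dim = len(vectors[members[0]])
    return [sum(vectors[i][d] for i in members) / len(members) for d in range(dim)]


def post_merge(vectors: list[list[float]], clusters: list[list[int]], threshold: float) -> list[list[int]]:
    """Hypothetical cross-chunk merge (claim 18): fold each cluster into the first
    already-kept cluster whose centroid is more similar than `threshold`."""
    merged: list[list[int]] = []
    for members in clusters:
        c = centroid(vectors, members)
        for target in merged:
            if cosine(centroid(vectors, target), c) > threshold:
                target.extend(members)
                break
        else:  # no sufficiently similar cluster found: keep it on its own
            merged.append(list(members))
    return [sorted(m) for m in merged]
```

Union-find makes the transitive property cheap: linking A with B and B with C puts A and C in one cluster even when their direct similarity is below the threshold. Chunking bounds the quadratic pairwise comparison to each chunk, which is why a separate cross-chunk merge is needed at all.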

2. A computer system, comprising:

at least one processor;

memory associated with the at least one processor; and

a program supported in the memory for organizing user reviews for data analysis, the program containing a plurality of instructions which, when executed by the at least one processor, cause the at least one processor to:

(a) sample a plurality of reviews from one or more sources;

(b) identify unique terms in each of said plurality of reviews;

(c) determine frequency values indicating the number of times each of the unique terms appears in each of said plurality of reviews, wherein determining the frequency values includes generating a term/review matrix having a plurality of rows and columns, in which each row identifies a review and each column identifies a unique term, and wherein the matrix specifies the frequency values for respective unique terms;

(d) adjust the frequency values for each unique term to account for the rarity of that unique term across the plurality of reviews, wherein adjusting the frequency values includes normalizing the values in the matrix based on values associated with the rarity of the unique term across the plurality of reviews being analyzed;

(e) calculate similarities between the reviews based on the frequency values as adjusted in (d); and

(f) group the reviews into clusters based on the similarities of the reviews determined in (e).

- Dependent claims (9, 10, 11, 12, 13, 14, 22, 23, 24, 25, 26, 27, 28, 31, 32, 34):
  - 9. The computer system of claim 2, wherein (b) comprises tokenizing text in each of said plurality of reviews into a set of words and stemming each word not already in root form to generate a set of terms for each review, from which said unique terms are identified.
  - 10. The computer system of claim 2, wherein (d) comprises calculating tf/idf values for each review, wherein tf=the raw frequency of a term within a review, and idf=log(total number of reviews/total number of reviews containing the term), and wherein the tf/idf values comprise the frequency values adjusted in (d).
  - 11. The computer system of claim 2, wherein similarities between the reviews are calculated in (e) using a standard cosine similarity calculation.
  - 12. The computer system of claim 2, wherein (e) and (f) comprise dividing the reviews into a given number of chunks, calculating the similarity of each review against every other review in each chunk, and for each pair of reviews whose similarity exceeds a given threshold, clustering said reviews.
  - 13. The computer system of claim 2, wherein the program further comprises instructions for ranking the clusters.
  - 14. The computer system of claim 2, wherein the program further comprises instructions for displaying the clusters to a user on a computer display.
  - 22. The computer system of claim 10, wherein the program further comprises instructions for normalizing the term/review matrix.
  - 23. The computer system of claim 12, wherein the program further comprises instructions for using the transitive property for clustering reviews.
  - 24. The computer system of claim 12, wherein the program further comprises instructions for merging clusters with similarly sufficient reviews.
  - 25. The computer system of claim 12, wherein the program further comprises instructions for performing a post merge process to collapse clusters across chunks.
  - 26. The computer system of claim 13, wherein the clusters are ranked based on a plurality of factors.
  - 27. The computer system of claim 13, wherein the program further comprises instructions for displaying the clusters to a user on a computer device, wherein the clusters are displayed in order of ranking.
  - 28. The computer system of claim 14, wherein information displayed for each cluster includes one or more of:
    - text representative of similar text found in the reviews in the cluster, the total number of reviews in the cluster, the time period over which the reviews in the cluster were posted, and the average rating of the reviews in the cluster.
  - 31. The computer system of claim 24, wherein the clusters are merged based on cluster centroids within each chunk.
  - 32. The computer system of claim 26, wherein said factors include one or more of:
    - the date or time of the most recent review in a cluster, the average rating of the reviews in a cluster, the number of reviews in a cluster, and the average similarity score of the reviews in a cluster.
  - 34. The computer system of claim 32, wherein the factors are weighted such that clusters with more recent reviews, lower average ratings, higher number of reviews, and higher average similarity scores are assigned higher weights.
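
Claims 33 and 34 fix only the direction in which each ranking factor pushes a cluster (more recent, lower-rated, larger and more internally similar clusters rank higher), not how the factors are combined. The sketch below is one possible combination under stated assumptions: the `ClusterStats` fields, the min-max scaling of each factor to [0, 1], and the equal default weights are all hypothetical choices, not taken from the patent.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class ClusterStats:
    """Per-cluster factors listed in claims 30/32 (field names are hypothetical)."""
    name: str
    latest_review: date      # date of the most recent review in the cluster
    avg_rating: float        # e.g. on a 1-5 star scale
    n_reviews: int
    avg_similarity: float    # mean similarity score of the cluster's reviews


def _minmax(values: list[float]) -> list[float]:
    """Scale values to [0, 1]; all-equal inputs map to 0.0."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) if hi > lo else 0.0 for v in values]


def rank_clusters(clusters: list[ClusterStats],
                  weights: tuple[float, float, float, float] = (1.0, 1.0, 1.0, 1.0)) -> list[ClusterStats]:
    """Return clusters ordered by a weighted score (highest first). Weights apply to
    (recency, low rating, size, similarity); scaling and defaults are assumptions."""
    if not clusters:
        return []
    recency = _minmax([c.latest_review.toordinal() for c in clusters])
    low_rating = [1.0 - r for r in _minmax([c.avg_rating for c in clusters])]  # lower rating -> higher score
    size = _minmax([c.n_reviews for c in clusters])
    tightness = _minmax([c.avg_similarity for c in clusters])
    w_rec, w_low, w_size, w_tight = weights
    scores = [w_rec * a + w_low * b + w_size * s + w_tight * t
              for a, b, s, t in zip(recency, low_rating, size, tightness)]
    order = sorted(range(len(clusters)), key=lambda i: scores[i], reverse=True)
    return [clusters[i] for i in order]
```

Scaling each factor to a common range keeps one factor (for example a raw review count in the thousands) from swamping the others before the weights are applied; per claim 20, the resulting order is the order in which clusters would be displayed.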

Current Assignee
Applause App Quality, Inc.
Original Assignee
Applause App Quality, Inc.
Inventors
Young, Heidi A., Stredwick, Jason M., Mavianakere, Yashas
Primary Examiner(s)
AL HASHEMI, SANA A

Application Number

US14/539,623
Publication Number

US 20160132504A1
Time in Patent Office

1,231 Days
Field of Search

707/607, 707/608, 707/687, 707/705, 707/790, 707/813, 707/821
CPC Class Codes

G06F 16/355   Class or cluster creation o...

G06F 40/284   Lexical analysis, e.g. toke...

G06Q 30/0282   Rating or review of busines...
