Computer-implemented methods and systems for clustering user reviews and ranking clusters
First Claim
1. A method of organizing user reviews for data analysis, comprising the steps, each implemented in a computer system, of:
(a) sampling a plurality of reviews from one or more sources;
(b) identifying unique terms in each of said plurality of reviews;
(c) determining frequency values indicating the number of times each of the unique terms appears in each of said plurality of reviews, wherein determining the frequency values includes generating a term/review matrix having a plurality of rows and columns, in which each row identifies a review and each column identifies a unique term, and wherein the matrix specifies the frequency values for respective unique terms;
(d) adjusting the frequency values for each unique term to account for the rarity of that unique term across the plurality of reviews, wherein adjusting the frequency values includes normalizing the values in the matrix based on values associated with the rarity of the unique term across the plurality of reviews being analyzed;
(e) calculating similarities between the reviews based on the frequency values as adjusted in step (d); and
(f) grouping the reviews into clusters based on the similarities of the reviews determined in step (e).
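Steps (b) through (d) can be illustrated with a short Python sketch. This is only one possible reading of the claim language, not the patented implementation: the tokenizer, the function names `term_review_matrix` and `adjust_for_rarity`, and the choice of a logarithmic inverse-document-frequency weight for the "rarity" adjustment are all assumptions made for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Hypothetical tokenizer: lowercase runs of letters, digits, apostrophes.
    return re.findall(r"[a-z0-9']+", text.lower())


def term_review_matrix(reviews):
    """Steps (b)-(c): one row per review, one column per unique term,
    each cell holding the raw count of that term in that review."""
    counts = [Counter(tokenize(r)) for r in reviews]
    terms = sorted(set().union(*counts))
    matrix = [[c[t] for t in terms] for c in counts]
    return terms, matrix


def adjust_for_rarity(matrix):
    """Step (d), assumed here to be an IDF-style weighting: each column is
    scaled by log(N / df), where df is the number of reviews containing
    the term, so terms found in every review are weighted to zero."""
    n_reviews = len(matrix)
    n_terms = len(matrix[0]) if matrix else 0
    doc_freq = [sum(1 for row in matrix if row[j] > 0) for j in range(n_terms)]
    idf = [math.log(n_reviews / df) if df else 0.0 for df in doc_freq]
    return [[row[j] * idf[j] for j in range(n_terms)] for row in matrix]
```

Under this weighting, a term such as "app" that occurs in every sampled review contributes nothing to later similarity calculations, while a term confined to a few reviews carries more weight.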
Abstract
Computer-implemented methods and systems are disclosed for organizing user reviews, especially computer app reviews, into clusters and ranking the clusters so that the reviews may be more meaningfully analyzed.
34 Claims
1. A method of organizing user reviews for data analysis, comprising the steps, each implemented in a computer system, of:
(a) sampling a plurality of reviews from one or more sources;
(b) identifying unique terms in each of said plurality of reviews;
(c) determining frequency values indicating the number of times each of the unique terms appears in each of said plurality of reviews, wherein determining the frequency values includes generating a term/review matrix having a plurality of rows and columns, in which each row identifies a review and each column identifies a unique term, and wherein the matrix specifies the frequency values for respective unique terms;
(d) adjusting the frequency values for each unique term to account for the rarity of that unique term across the plurality of reviews, wherein adjusting the frequency values includes normalizing the values in the matrix based on values associated with the rarity of the unique term across the plurality of reviews being analyzed;
(e) calculating similarities between the reviews based on the frequency values as adjusted in step (d); and
(f) grouping the reviews into clusters based on the similarities of the reviews determined in step (e).
Dependent claims: 3, 4, 5, 6, 7, 8, 15, 16, 17, 18, 19, 20, 21, 29, 30, 33
2. A computer system, comprising:
at least one processor;
memory associated with the at least one processor; and
a program supported in the memory for organizing user reviews for data analysis, the program containing a plurality of instructions which, when executed by the at least one processor, cause the at least one processor to:
(a) sample a plurality of reviews from one or more sources;
(b) identify unique terms in each of said plurality of reviews;
(c) determine frequency values indicating the number of times each of the unique terms appears in each of said plurality of reviews, wherein determining the frequency values includes generating a term/review matrix having a plurality of rows and columns, in which each row identifies a review and each column identifies a unique term, and wherein the matrix specifies the frequency values for respective unique terms;
(d) adjust the frequency values for each unique term to account for the rarity of that unique term across the plurality of reviews, wherein adjusting the frequency values includes normalizing the values in the matrix based on values associated with the rarity of the unique term across the plurality of reviews being analyzed;
(e) calculate similarities between the reviews based on the frequency values as adjusted in (d); and
(f) group the reviews into clusters based on the similarities of the reviews determined in (e).
Dependent claims: 9, 10, 11, 12, 13, 14, 22, 23, 24, 25, 26, 27, 28, 31, 32, 34
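Elements (e) and (f) leave the similarity measure and the grouping procedure open. The sketch below shows one way they could be realized, assuming cosine similarity between rows of the rarity-adjusted matrix and a simple threshold-linkage grouping (reviews connected by a chain of sufficiently similar pairs share a cluster). The function names and the `threshold` parameter are hypothetical and do not come from the claims.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two weighted term vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # a review with no weighted terms matches nothing
    return dot / (norm_u * norm_v)


def cluster_reviews(weighted, threshold=0.5):
    """Group review indices so that any two reviews whose similarity is at
    least `threshold` end up in the same cluster (union-find over pairs)."""
    n = len(weighted)
    parent = list(range(n))

    def find(i):
        # Follow parent links to the cluster root, shortening the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if cosine_similarity(weighted[i], weighted[j]) >= threshold:
                parent[find(j)] = find(i)

    clusters = {}
    for i in range(n):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())
```

The pairwise loop is quadratic in the number of reviews, which is acceptable for a small sample; a larger corpus would likely call for a different clustering method.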
Specification