Systems and methods for identifying key phrase clusters within documents
Abstract
Systems and methods are disclosed for key phrase clustering of documents. In accordance with one implementation, a method is provided for key phrase clustering of documents. The method includes obtaining a first plurality of documents based at least on a user input, obtaining a statistical model based at least on the user input, and obtaining, from content of the first plurality of documents, a plurality of segments. The method also includes identifying a plurality of clusters of segments from the plurality of segments, determining statistical significance of the plurality of clusters based at least on the statistical model and the content, and providing for display a representative cluster from the plurality of tokens, the representative cluster being determined based at least on the statistical significance. The method further includes determining a label for the representative cluster based at least on the plurality of clusters and the statistical significance.
20 Claims
1. An electronic device comprising:
a computer display;
computer-readable storage media; and
one or more processors configured to execute instructions to cause the electronic device to:
obtain, based on a first user input, documents and a statistical model;
segment contents of the documents into segments;
determine frequencies at which the segments occur within the contents of the documents and store the frequencies in the computer-readable storage media;
with the statistical model, determine modeled frequencies for the segments;
compare the frequencies with the modeled frequencies;
based on the comparison, determine statistical significance values for the segments;
identify representative segments from the segments having statistical significance values exceeding a predetermined threshold value;
cluster the documents into clusters, each cluster having identical or substantially identical representative segments;
determine a label for each cluster;
display within a graphical user interface a representation of the documents;
receive a second user input and identify a set of clusters, from the clusters, associated with the second user input; and
based on the received second user input, modify the graphical user interface to further include a representation of the second user input, and for each of the clusters of the set of clusters:
an indication of the label associated with the cluster, and
an indication of the documents associated with the cluster,
wherein the clusters of the set of clusters are grouped and displayed in separate portions of the graphical user interface.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10
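The frequency-comparison steps recited in claim 1 (segment, count, compare against modeled frequencies, threshold) can be illustrated with a minimal sketch. The claim does not say what a segment is, what form the statistical model takes, or which significance statistic is used, so everything specific here is an assumption made for illustration: segments are word bigrams, the model is a plain dictionary of expected segment probabilities with an arbitrary fallback (`floor`), and significance is a simple z-like score. The function names are hypothetical.

```python
import math
import re
from collections import Counter

def segment(text, n=2):
    """Split text into lowercase word n-grams (an assumed notion of 'segment')."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]

def significance_values(documents, model, n=2, floor=0.05):
    """Score each segment by how far its observed count exceeds the modeled count.

    `model` maps a segment to its expected probability; `floor` is an
    arbitrary fallback probability for segments the model does not know.
    """
    segments = [s for doc in documents for s in segment(doc, n)]
    observed = Counter(segments)          # observed frequencies
    total = len(segments)
    scores = {}
    for seg, count in observed.items():
        expected = model.get(seg, floor) * total   # modeled frequency
        scores[seg] = (count - expected) / math.sqrt(expected)
    return scores

def representative_segments(scores, threshold):
    """Keep segments whose significance value exceeds the threshold."""
    return {seg for seg, z in scores.items() if z > threshold}

docs = [
    "wire transfer flagged for review",
    "wire transfer approved",
    "the report was filed",
]
model = {"wire transfer": 0.01, "the report": 0.2}  # hypothetical background model
scores = significance_values(docs, model)
representative = representative_segments(scores, threshold=2.0)
print(representative)
```

In this toy input, "wire transfer" occurs twice although the model expects it rarely, so it is the only segment that clears the threshold. A real system would likely use a better-founded statistic and a model estimated from a reference corpus.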
11. A method performed by one or more processors, the method comprising:
obtaining, based on a first user input, documents and a statistical model;
segmenting contents of the documents into segments;
determining frequencies at which the segments occur within the contents of the documents;
with the statistical model, determining modeled frequencies for the segments;
comparing the determined frequencies with the modeled frequencies;
based on the comparison, determining statistical significance values for the segments;
identifying representative segments from the segments based on a comparison of the statistical significance values with a predetermined threshold value;
clustering the documents into clusters, each cluster having related representative segments;
determining a label for each cluster;
displaying within a graphical user interface a representation of the documents;
receiving a second user input and identifying a set of clusters, from the clusters, associated with the second user input; and
based on the received second user input, modifying the graphical user interface to further include a representation of the second user input, and for each of the clusters of the set of clusters:
an indication of the label associated with the cluster, and
an indication of the documents associated with the cluster,
wherein the clusters of the set of clusters are grouped and displayed in separate portions of the graphical user interface.
Dependent claims: 12, 13, 14, 15, 16, 17, 18, 19, 20
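The clustering, labeling, and cluster-selection steps of claim 11 can be sketched in the same spirit. The claim only requires that each cluster have "related" representative segments. The sketch below assumes, purely for illustration, that relatedness is Jaccard similarity above a cutoff, that a label is the most frequent representative segment in the cluster, and that the second user input is a text query matched against labels. All names and the `min_similarity` value are hypothetical.

```python
from collections import Counter

def jaccard(a, b):
    """Jaccard similarity of two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def cluster_documents(doc_segments, min_similarity=0.5):
    """Greedily group documents whose representative segments are related.

    `doc_segments` maps a document id to its set of representative segments.
    """
    clusters = []
    for doc_id, segs in doc_segments.items():
        for cluster in clusters:
            if jaccard(segs, cluster["segments"]) >= min_similarity:
                cluster["docs"].append(doc_id)
                cluster["segments"] |= segs
                break
        else:
            # No sufficiently similar cluster exists, so start a new one.
            clusters.append({"segments": set(segs), "docs": [doc_id]})
    return clusters

def label_clusters(clusters, doc_segments):
    """Label each cluster with its most frequent segment (ties: alphabetical)."""
    for cluster in clusters:
        counts = Counter(s for d in cluster["docs"] for s in doc_segments[d])
        cluster["label"] = min(counts, key=lambda s: (-counts[s], s)) if counts else ""
    return clusters

def clusters_for_query(clusters, query):
    """Select clusters whose label contains the (second) user input."""
    q = query.lower()
    return [c for c in clusters if q in c["label"].lower()]

doc_segments = {
    "d1": {"wire transfer", "account closed"},
    "d2": {"wire transfer"},
    "d3": {"quarterly report"},
}
clusters = label_clusters(cluster_documents(doc_segments), doc_segments)
selected = clusters_for_query(clusters, "wire")
for c in selected:
    # Each selected cluster would occupy its own portion of the interface.
    print(c["label"], c["docs"])
```

Greedy single-pass grouping depends on document order and is used here only because it is short. Setting `min_similarity=1.0` approximates the "identical" end of the "identical or substantially identical" wording of claim 1.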
Specification