Systems and methods for identifying key phrase clusters within documents
Abstract
Systems and methods are disclosed for key phrase clustering of documents. In accordance with one implementation, a method is provided for key phrase clustering of documents. The method includes obtaining a first plurality of documents based at least on a user input, obtaining a statistical model based at least on the user input, and obtaining, from content of the first plurality of documents, a plurality of segments. The method also includes identifying a plurality of clusters of segments from the plurality of segments, determining statistical significance of the plurality of clusters based at least on the statistical model and the content, and providing for display a representative cluster from the plurality of clusters, the representative cluster being determined based at least on the statistical significance. The method further includes determining a label for the representative cluster based at least on the plurality of clusters and the statistical significance.
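The central scoring step in the abstract, determining the statistical significance of segments against a statistical model, can be illustrated with the minimal Python sketch below. It rests on assumptions the patent does not state: a "segment" is taken to be a lowercase word n-gram, the "statistical model" is taken to be background n-gram counts from a reference corpus, and significance is a Laplace-smoothed log ratio of foreground to background relative frequency. The helpers extract_segments and significance are hypothetical names. Positive scores mark segments that are over-represented in the selected documents relative to the background.

```python
import math
import re
from collections import Counter


def extract_segments(text: str, max_n: int = 2) -> list[str]:
    """Split text into lowercase word n-grams (n = 1..max_n).

    ASSUMPTION: "segment" is modelled as a word n-gram; the patent does not say so.
    """
    words = re.findall(r"[a-z0-9']+", text.lower())
    segments = []
    for n in range(1, max_n + 1):
        for i in range(len(words) - n + 1):
            segments.append(" ".join(words[i:i + n]))
    return segments


def significance(foreground: Counter, background: Counter,
                 alpha: float = 1.0) -> dict[str, float]:
    """Score each foreground segment by a smoothed log ratio of its relative
    frequency in the selected documents versus the background model.

    ASSUMPTION: the "statistical model" is a background Counter of segments.
    """
    fg_total = sum(foreground.values())
    bg_total = sum(background.values())
    vocab_size = len(set(foreground) | set(background))
    scores = {}
    for seg, count in foreground.items():
        # Laplace (add-alpha) smoothing keeps unseen background segments finite.
        p_fg = (count + alpha) / (fg_total + alpha * vocab_size)
        p_bg = (background.get(seg, 0) + alpha) / (bg_total + alpha * vocab_size)
        scores[seg] = math.log(p_fg / p_bg)
    return scores


if __name__ == "__main__":
    docs = ["Merger talks stall", "Merger talks resume after the vote"]
    fg = Counter(seg for d in docs for seg in extract_segments(d))
    bg = Counter(extract_segments("the vote was held after the meeting"))
    for seg, score in sorted(significance(fg, bg).items(), key=lambda kv: -kv[1])[:5]:
        print(f"{score:+.3f}  {seg}")
```

A log-likelihood ratio (G-test) or a chi-squared statistic would be equally plausible choices of significance measure; the smoothed log ratio is used here only because it is short and easy to check by hand.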
12 Claims
1. An electronic device comprising:
a computer display;
one or more computer-readable storage media configured to store instructions; and
one or more processors configured to execute the instructions to cause the electronic device to at least:
obtain a first plurality of documents based at least in part on a user input;
obtain a statistical model based at least in part on the user input;
obtain, from content of the first plurality of documents, a plurality of segments;
determine statistical significance for the obtained plurality of segments based at least in part on the obtained statistical model;
determine, for each document in the first plurality of documents, representative segments from the obtained plurality of segments, the representative segments being determined based at least in part on the determined statistical significance;
cluster documents from the obtained first plurality of documents based at least in part on the determined representative segments;
receive a selection of a date range;
for a cluster of documents associated with a date within the date range, automatically associate a label with the cluster of documents based at least in part on the determined representative segments; and
display within a graphical user interface on the computer display a representation of the date range, the label, and contents of and/or links to documents in the cluster of documents.
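Claim 1's step of determining, for each document, representative segments from the significance scores could look like the following sketch. It assumes, beyond what the claims say, that a significance score has already been computed for each segment, that a document's representative segments are its k highest-scoring distinct segments above a cutoff, and that ties are broken alphabetically. The function representative_segments and its parameters k and min_score are hypothetical.

```python
def representative_segments(doc_segments: dict[str, list[str]],
                            scores: dict[str, float],
                            k: int = 3,
                            min_score: float = 0.0) -> dict[str, list[str]]:
    """For each document id, keep up to k distinct segments whose significance
    score exceeds min_score, ordered from most to least significant.

    ASSUMPTION: top-k selection with a score cutoff stands in for the
    unspecified selection rule in the claim.
    """
    result = {}
    for doc_id, segments in doc_segments.items():
        # Segments missing from the score table are treated as insignificant.
        candidates = {s for s in segments if scores.get(s, float("-inf")) > min_score}
        # Highest score first; alphabetical order breaks ties deterministically.
        ranked = sorted(candidates, key=lambda s: (-scores[s], s))
        result[doc_id] = ranked[:k]
    return result
```

Deduplicating segments before ranking keeps a phrase that repeats within one document from occupying several of its k slots.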
5. A method performed by one or more processors, the method comprising:
obtaining a first plurality of documents based at least on a user input;
obtaining a statistical model based at least on the user input;
obtaining, from content of the first plurality of documents, a plurality of segments;
determining statistical significance for the obtained plurality of segments based at least on the obtained statistical model;
determining representative segments from the obtained plurality of segments for each document in the first plurality of documents, the representative segments being determined based at least in part on the determined statistical significance;
clustering documents from the obtained first plurality of documents based at least in part on the determined representative segments;
receiving a selection of a date range;
for a cluster of documents associated with a date within the date range, automatically associating a label with the cluster of documents based at least in part on the determined representative segments; and
providing for display within a graphical user interface a representation of the date range, the label, and at least one of contents of and links to documents in the cluster of documents.
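The claims do not name a clustering algorithm. One simple way to cluster documents based on their representative segments, sketched below as an assumption rather than as the patented method, is single-link clustering: two documents are linked when the Jaccard similarity of their representative-segment sets reaches a threshold, and linked documents are merged with a union-find structure. The names jaccard and cluster_documents and the default threshold are hypothetical.

```python
from itertools import combinations


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity |a & b| / |a | b|, defined as 0.0 for two empty sets."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def cluster_documents(reps: dict[str, list[str]],
                      threshold: float = 0.3) -> list[set[str]]:
    """Single-link clustering of documents by representative segments.

    ASSUMPTION: documents whose representative-segment sets have Jaccard
    similarity >= threshold are placed in the same cluster.
    """
    parent = {d: d for d in reps}

    def find(x: str) -> str:
        # Walk to the root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    sets = {d: set(segs) for d, segs in reps.items()}
    for a, b in combinations(reps, 2):
        if jaccard(sets[a], sets[b]) >= threshold:
            parent[find(a)] = find(b)  # union the two components

    clusters: dict[str, set[str]] = {}
    for d in reps:
        clusters.setdefault(find(d), set()).add(d)
    # Deterministic order: sort clusters by their sorted member ids.
    return sorted(clusters.values(), key=lambda c: sorted(c))
```

Single-link merging is transitive, so a chain of moderately similar documents can collapse into one large cluster; a stricter linkage or a higher threshold trades that recall for tighter clusters.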
9. A non-transitory computer-readable medium storing a set of instructions that are executable by one or more electronic devices, each having one or more processors, to cause the one or more electronic devices to perform a method, the method comprising:
obtaining a first plurality of documents associated with a user input;
obtaining a statistical model associated with the user input;
obtaining, from content of the first plurality of documents, a plurality of segments;
determining statistical significance for the plurality of segments based at least on the statistical model;
determining, for each document in the first plurality of documents, representative segments from the plurality of segments, the representative segments being determined based at least in part on the statistical significance;
clustering documents from the first plurality of documents based at least in part on the representative segments;
receiving a selection of a date range;
for a cluster of documents associated with a date within the date range, automatically associating a label with the cluster of documents based at least in part on the determined representative segments; and
providing for display within a graphical user interface a representation of the date range, the label, and contents of documents in the cluster of documents, or links to documents in the cluster of documents, or a combination thereof.
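The date-range and labeling steps shared by claims 1, 5 and 9 can be sketched as below. Two interpretations are assumed here rather than taken from the claims: a cluster counts as "associated with a date within the date range" when at least one member document is dated inside the inclusive range, and its label is the representative segment shared by the most member documents, with ties broken alphabetically. The function label_clusters_in_range and its parameters are hypothetical.

```python
from collections import Counter
from datetime import date


def label_clusters_in_range(clusters: list[set[str]],
                            doc_dates: dict[str, date],
                            reps: dict[str, list[str]],
                            start: date,
                            end: date) -> list[tuple[str, set[str]]]:
    """Return (label, cluster) pairs for clusters in the selected date range.

    ASSUMPTION: a cluster is "in range" if any member's date lies in
    [start, end]; the label is the representative segment appearing in the
    most member documents, ties broken alphabetically.
    """
    labelled = []
    for cluster in clusters:
        if not any(start <= doc_dates[d] <= end for d in cluster):
            continue
        # Count each segment at most once per document.
        counts = Counter(seg for d in cluster for seg in set(reps.get(d, [])))
        if not counts:
            continue  # no representative segments to label from
        label = min(counts, key=lambda s: (-counts[s], s))
        labelled.append((label, cluster))
    return labelled
```

The returned pairs carry what the display step needs (a label plus the member document ids, from which contents or links can be rendered alongside the selected range).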
Specification