Computer-implemented system and method for visually suggesting classification for inclusion-based cluster spines

US 9,542,483 B2
Filed: 04/28/2014
Issued: 01/10/2017
Est. Priority Date: 07/28/2009
Status: Active Grant

Abstract

A computer-implemented system and method for visually suggesting classification for inclusion-based document cluster spines are provided. A set of reference documents each associated with a classification code is designated. A different set of un-coded documents is obtained. One or more of the coded reference documents are combined with a plurality of un-coded documents into a combined document set. The documents in the combined document set are grouped into clusters. The clusters are organized along one or more spines, each spine including a vector. A visual suggestion for assigning one of the classification codes to one of the spines is provided, including visually representing each of the reference concepts in the clusters along that spine.

16 Claims

1. A computer-implemented system for visually suggesting classification for inclusion-based document cluster spines, comprising:
- a non-transitory computer readable storage medium comprising program code; and
  
  a computer processor coupled to the storage medium, wherein the processor is configured to execute the program code to perform steps to:
  
  designate a set of reference documents each associated with a classification code;
  
  obtain a different set of uncoded documents;
  
  combine one or more of the coded reference documents with a plurality of uncoded documents into a combined document set;
  
  group the documents in the combined document set into clusters;
  
  organize the clusters along one or more spines, each spine comprising a vector;
  
  provide a visual suggestion for assigning one of the classification codes to one of the spines comprising visually representing each of the reference concepts in the clusters along that spine;
  
  identify one of the documents as a center of one of the clusters;
  
  generate a score vector for the cluster center;
  
  compare the score vector for the cluster center to score vectors associated with one or more of the reference documents;
  
  identify a neighborhood of similar reference documents for the cluster based on the comparison; and
  
  assign one of the classification codes to the cluster based on the neighborhood, comprising:
  
  determine a distance between the cluster center and the reference documents in the neighborhood; and
  
  generate the classification code for assignment to the cluster, comprising at least one of:
  
  identify the reference document with the closest distance to the cluster center and assign the classification code of the reference document with the closest distance as the generated classification code for the cluster;
  
  calculate an average of the distances between the cluster center and the reference documents associated with each of the classification codes and assign the classification code with the closest average distance as the generated classification code of the cluster; and
  
  count the reference documents in the neighborhood for each of the classification codes, weigh each count based on the distance between the reference documents with the classification code and the cluster center, and assign the classification code with the highest weighted count as the generated classification code of the cluster.
- - 2. The system according to claim 1, the steps further comprising:
    - provide at least one of a presence and an absence of the documents with each of the classification codes in the clusters along that spine; and
      
      a number of the documents with each of the classification codes in the clusters along that spine, wherein the suggestion includes the number and at least one of the presence and the absence.
  - 3. The system according to claim 2, the steps further comprising:
    - provide a visual classification suggestion for at least one of the clusters and one or more un-coded documents in that cluster, the suggestion comprising at least one of the presence and the absence and the number for that cluster.
  - 4. The system according to claim 1, the steps further comprising:
    - receive a user-selection of parameters for defining one or more of sources, custodians, and the classification codes of the reference documents; and
      
      receive a user-selection of parameters for defining one or more of commands relating to the reference documents and the un-coded documents, thresholds for the clustering, and automatically assigning one of the classification codes to one of the un-coded documents.
  - 5. The system according to claim 4, wherein the sources comprise those of the reference documents for which the associated classification codes have been verified, those of the reference documents that have been analyzed, and those of the reference documents associated with one of a plurality of document review projects.
  - 6. The system according to claim 1, the steps further comprising:
    - provide a compass within which one or more of the clusters organized along the spines are displayed on a display;
      
      display different one or more of the clusters in the compass upon receiving a user command, wherein the clusters are emphasized when displayed within the compass and deemphasized when displayed outside of the compass.
  - 7. The system according to claim 6, the steps further comprising:
    - associate a label with each of the spines, each label associated with one or more concepts from the documents in the clusters along that spine; and
      
      display the labels circumferentially outside of the compass, wherein the displayed labels do not overlap.
  - 8. The system according to claim 1, wherein the visual representation of one of the reference documents associated with one of the classification codes comprises at least one of a symbol, shape, and color different from the visual representations of the reference documents with the remaining classification codes.
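The three alternatives that claim 1 lists for generating a cluster's classification code (closest reference document, closest average distance per code, highest distance-weighted count) can be illustrated with a minimal sketch. Several details here are assumptions that the claim does not fix: Euclidean distance between score vectors, a neighborhood defined as the `k` closest reference documents, the weight `1 / (1 + distance)`, and the example code names. All function and variable names are hypothetical.

```python
import math
from collections import defaultdict

def euclidean(a, b):
    # Assumed distance measure; the claim only requires "a distance".
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def neighborhood(center, references, k):
    # Assumption: the neighborhood is the k reference documents closest
    # to the cluster center, returned as (distance, code) pairs.
    scored = sorted((euclidean(center, vec), code) for vec, code in references)
    return scored[:k]

def code_by_nearest(neigh):
    # Option 1: code of the single closest reference document.
    return min(neigh)[1]

def code_by_average(neigh):
    # Option 2: code whose reference documents have the smallest
    # average distance to the cluster center.
    by_code = defaultdict(list)
    for dist, code in neigh:
        by_code[code].append(dist)
    return min(by_code, key=lambda c: sum(by_code[c]) / len(by_code[c]))

def code_by_weighted_count(neigh):
    # Option 3: count documents per code, weighting each by distance.
    # The weight 1 / (1 + distance) is an illustrative assumption.
    weights = defaultdict(float)
    for dist, code in neigh:
        weights[code] += 1.0 / (1.0 + dist)
    return max(weights, key=weights.get)

# Hypothetical score vectors and classification codes.
center = (1.0, 0.0)
references = [
    ((1.0, 0.5), "privileged"),
    ((4.0, 0.0), "privileged"),
    ((0.0, 0.0), "responsive"),
    ((1.0, -1.0), "responsive"),
    ((10.0, 10.0), "non-responsive"),
]
neigh = neighborhood(center, references, k=4)

print(code_by_nearest(neigh))
print(code_by_average(neigh))
print(code_by_weighted_count(neigh))
```

With this sample data the nearest-document option picks a different code than the other two, which shows why the claim treats them as distinct alternatives.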

9. A computer-implemented method for visually suggesting classification for inclusion-based document cluster spines, comprising the steps of:
- designating a set of reference documents each associated with a classification code;
  
  obtaining a different set of un-coded documents;
  
  combining one or more of the coded reference documents with a plurality of un-coded documents into a combined document set;
  
  grouping the documents in the combined document set into clusters;
  
  organizing the clusters along one or more spines, each spine comprising a vector; and
  
  providing a visual suggestion for assigning one of the classification codes to one of the spines comprising visually representing each of the reference concepts in the clusters along that spine;
  
  identifying one of the documents as a center of one of the clusters;
  
  generating a score vector for the cluster center;
  
  comparing the score vector for the cluster center to score vectors associated with one or more of the reference documents;
  
  identifying a neighborhood of similar reference documents for the cluster based on the comparison; and
  
  assigning one of the classification codes to the cluster based on the neighborhood, further comprising:
  
  determining a distance between the cluster center and the reference documents in the neighborhood; and
  
  generating the classification code for assignment to the cluster, comprising at least one of:
  
  identifying the reference document with the closest distance to the cluster center and assigning the classification code of the reference document with the closest distance as the generated classification code for the cluster;
  
  calculating an average of the distances between the cluster center and the reference documents associated with each of the classification codes and assigning the classification code with the closest average distance as the generated classification code of the cluster; and
  
  counting the reference documents in the neighborhood for each of the classification codes, weighing each count based on the distance between the reference documents with the classification code and the cluster center, and assigning the classification code with the highest weighted count as the generated classification code of the cluster, wherein the steps are performed by a suitably programmed computer.
- - 10. The method according to claim 9, further comprising:
    - providing at least one of a presence and an absence of the documents with each of the classification codes in the clusters along that spine; and
      
      providing a number of the documents with each of the classification codes in the clusters along that spine, wherein the suggestion includes the number and at least one of the presence and the absence.
  - 11. The method according to claim 10, further comprising:
    - providing a visual classification suggestion for at least one of the clusters and one or more un-coded documents in that cluster, the suggestion comprising the at least one of the presence and the absence and the number for that cluster.
  - 12. The method according to claim 9, further comprising:
    - receiving a user-selection of parameters for defining one or more of sources, custodians, and the classification codes of the reference documents; and
      
      receiving a user-selection of parameters for defining one or more of commands relating to the reference documents and the un-coded documents, thresholds for the clustering, and automatically assigning one of the classification codes to one of the un-coded documents.
  - 13. The method according to claim 12, wherein the sources comprise those of the reference documents for which the associated classification codes have been verified, those of the reference documents that have been analyzed, and those of the reference documents associated with one of a plurality of document review projects.
  - 14. The method according to claim 9, further comprising:
    - providing a compass within which one or more of the clusters organized along the spines are displayed on a display;
      
      displaying different one or more of the clusters in the compass upon receiving a user command, wherein the clusters are emphasized when displayed within the compass and deemphasized when displayed outside of the compass.
  - 15. The method according to claim 14, further comprising:
    - associating a label with each of the spines, each label associated with one or more concepts from the documents in the clusters along that spine; and
      
      displaying the labels circumferentially outside of the compass, wherein the displayed labels do not overlap.
  - 16. The method according to claim 9, wherein the visual representation of one of the reference documents associated with one of the classification codes comprises at least one of a symbol, shape, and color different from the visual representations of the reference documents with the remaining classification codes.
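The presence, absence, and number of documents per classification code along a spine, as recited in claims 2 and 10, amount to a tally over the coded documents in that spine's clusters. The sketch below is one possible reading, not the patented implementation. It assumes each cluster is a list of classification codes with `None` marking an un-coded document; the function name, data layout, and code names are hypothetical.

```python
from collections import Counter

def spine_suggestion(spine_clusters, all_codes):
    # Tally coded (reference) documents across every cluster on the spine;
    # un-coded documents are represented as None and skipped.
    counts = Counter(
        code
        for cluster in spine_clusters
        for code in cluster
        if code is not None
    )
    # For each classification code, report presence or absence and the count.
    return {
        code: {"present": counts[code] > 0, "count": counts[code]}
        for code in all_codes
    }

# Hypothetical spine with three clusters.
spine = [
    ["privileged", None, None],
    [None, "responsive", "privileged"],
    [None],
]
codes = ["privileged", "responsive", "non-responsive"]
suggestion = spine_suggestion(spine, codes)
print(suggestion)
```

A display layer could map each entry to a symbol, shape, or color per code, in the manner of claims 8 and 16.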

Current Assignee: Nuix North America Inc. (Nuix Ltd.)
Original Assignee: FTI Consulting Incorporated
Inventors: Knight, William C.; Nussbaum, Nicholas I.
Primary Examiner(s): Choi, Yuk Ting
Application Number: US14/263,934
Publication Number: US 20140236947A1
Time in Patent Office: 988 Days
Field of Search
US Class Current

1/1
CPC Class Codes

G06F 16/287   Visualization; Browsing

G06F 16/3322   using system suggestions G0...

G06F 16/334   Query execution G06F16/335 ...

G06F 16/35   Clustering; Classification

G06F 16/353   into predefined classes

G06F 16/355   Class or cluster creation o...

G06F 16/358   Browsing; Visualisation the...

G06F 16/93   Document management systems

G06F 16/954   Navigation, e.g. using cate...

G06N 20/00   Machine learning

G06N 5/02   Knowledge representation; S...

G06N 5/047   Pattern matching networks; ...

G06N 7/01   Probabilistic graphical mod...
