Computer-implemented system and method for generating document groupings for display

US 9,619,551 B2
Filed: 11/23/2015
Issued: 04/11/2017
Est. Priority Date: 08/31/2001
Status: Expired due to Term

Abstract

A computer-implemented system and method for generating document groupings is provided. A lexicon of terms extracted from a set of documents is generated. The lexicon includes a frequency of each extracted term within each document in the set. Concepts each having two or more of the extracted terms are generated. A subset of the documents in the set is selected based on the term frequencies. The subset of documents is grouped into clusters based on the concepts. A similarity of each document cluster is calculated with one or more documents based on a distance by summing the frequency of each term in that document and a weight of the cluster for each of the terms. The weights are updated until a rate of change for each cluster becomes constant.
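
As an illustration of the lexicon and similarity steps described in the abstract, the following is a minimal sketch and not the patented implementation. It assumes the "summing" of term frequencies and cluster weights is an inner product (the dependent claims refer to "inner products"). The tokenizer and the names `build_lexicon`, `similarity`, and `weights` are hypothetical and do not appear in the patent text.

```python
import re
from collections import Counter


def build_lexicon(documents):
    """Map each document id to a Counter of term frequencies."""
    lexicon = {}
    for doc_id, text in documents.items():
        # Assumption: terms are lowercase alphabetic runs; the patent
        # does not specify a tokenizer in this excerpt.
        terms = re.findall(r"[a-z]+", text.lower())
        lexicon[doc_id] = Counter(terms)
    return lexicon


def similarity(doc_freqs, cluster_weights):
    """Sum, over the document's terms, of frequency times cluster weight.

    Assumption: this inner-product reading of the claim language.
    Terms with no weight in the cluster contribute zero.
    """
    return sum(freq * cluster_weights.get(term, 0.0)
               for term, freq in doc_freqs.items())


documents = {
    "d1": "clusters group documents; clusters share terms",
    "d2": "terms have frequency",
}
lexicon = build_lexicon(documents)

# Hypothetical weights of one cluster for two terms.
weights = {"clusters": 0.5, "terms": 1.0}
score = similarity(lexicon["d1"], weights)
```

Storing the lexicon as one `Counter` per document keeps the per-document frequencies that the abstract requires while leaving the cluster weights as a separate mapping that can be updated independently.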

20 Claims

1. A computer-implemented system for generating document groupings, comprising:
- a database to store a set of documents, a lexicon of terms extracted from the set of documents and comprising a frequency of each extracted term within each document, and concepts each comprising two or more of the extracted terms; and
  
  a server comprising a central processing unit, memory, an input port to receive the documents, lexicon and concepts from the database, and an output port, wherein the central processing unit is configured to:
  
  select a subset of the documents in the set based on the term frequencies;
  
  group the subset of documents into clusters based on the concepts;
  
  calculate a similarity of each document cluster with at least one document based on a distance by summing the frequency of each term in that document and a weight of the cluster for each of the terms; and
  
  update the weights until a rate of change for each cluster becomes constant.
- Dependent claims (2, 3, 4, 5, 6, 7, 8, 9, 10)
  - 2. A system according to claim 1, wherein the central processing unit further identifies one or more of the clusters as meaningful by defining a variance, selects those clusters having one or more documents with inner products that fall within the variance as meaningful, and presents those documents identified as meaningful.
  - 3. A system according to claim 1, wherein the central processing unit further represents semantic content of each document by mapping the terms in that document in order of decreasing frequency.
  - 4. A system according to claim 1, wherein the central processing unit further represents latent semantics of the document set by mapping each term in the document set against a total frequency occurrence, wherein the total frequency occurrence is calculated as a sum of the term frequencies within each document in the set.
  - 5. A system according to claim 4, wherein the central processing unit further selects a median value and edge conditions for the total frequency occurrences of the terms and generates the subset of documents from those documents in the set that satisfy the edge conditions.
  - 6. A system according to claim 5, wherein the central processing unit further re-centers the median value and generates a different subset of documents for grouping.
  - 7. A system according to claim 5, wherein the central processing unit further sets the edge conditions based on a size of the documents.
  - 8. A system according to claim 7, wherein larger documents have tighter edge conditions than shorter documents.
  - 9. A system according to claim 1, wherein the central processing unit further calculates the distance using the following equation:
  - 10. A system according to claim 1, wherein the central processing unit further calculates the rate of change by determining a first derivative of the inner products over successive iterations.
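
The subset selection of claims 4 and 5 can be sketched as follows. This is one possible reading under stated assumptions, not the patented method: the edge conditions are modeled as a band around the median of the total term frequencies, and a document qualifies when it contains at least one term inside that band. The names `total_frequencies`, `select_subset`, `lower`, and `upper` are hypothetical.

```python
from collections import Counter
from statistics import median


def total_frequencies(lexicon):
    """Total frequency occurrence of each term: the sum of its
    per-document frequencies across the whole document set."""
    totals = Counter()
    for freqs in lexicon.values():
        totals.update(freqs)  # Counter.update adds counts
    return totals


def select_subset(lexicon, lower, upper):
    """Return ids of documents holding at least one in-band term.

    Assumption: the edge conditions are the interval
    [median - lower, median + upper] over total term frequencies.
    """
    totals = total_frequencies(lexicon)
    mid = median(totals.values())
    in_band = {term for term, count in totals.items()
               if mid - lower <= count <= mid + upper}
    return sorted(doc_id for doc_id, freqs in lexicon.items()
                  if any(term in in_band for term in freqs))


lexicon = {
    "d1": {"the": 5, "cluster": 2},
    "d2": {"the": 4, "cluster": 1, "lexicon": 1},
    "d3": {"the": 6, "zygote": 1},
}
subset = select_subset(lexicon, lower=0.5, upper=1)
```

Re-centering the median (claim 6) would correspond, in this sketch, to replacing `mid` with a different value before computing the band; sizing `lower` and `upper` per document would correspond to claims 7 and 8.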

11. A computer-implemented method for generating document groupings, comprising:
- generating a lexicon of terms extracted from a set of documents and comprising a frequency of each extracted term within each document;
  
  generating concepts each comprising two or more of the extracted terms;
  
  selecting a subset of the documents in the set based on the term frequencies;
  
  grouping the subset of documents into clusters based on the concepts;
  
  calculating a similarity of each document cluster with at least one document based on a distance by summing the frequency of each term in that document and a weight of the cluster for each of the terms; and
  
  updating the weights until a rate of change for each cluster becomes constant.
- Dependent claims (12, 13, 14, 15, 16, 17, 18, 19, 20)
  - 12. A method according to claim 11, further comprising:
    - identifying one or more of the clusters as meaningful, comprising:
      
      defining a variance; and
      
      selecting those clusters having one or more documents with inner products that fall within the variance as meaningful; and
      
      displaying those documents identified as meaningful.
  - 13. A method according to claim 11, further comprising:
    - representing semantic content of each document by mapping the terms in that document in order of decreasing frequency.
  - 14. A method according to claim 11, further comprising:
    - representing latent semantics of the document set by mapping each term in the document set against a total frequency occurrence, wherein the total frequency occurrence is calculated as a sum of the term frequencies within each document in the set.
  - 15. A method according to claim 14, further comprising:
    - selecting a median value and edge conditions for the total frequency occurrences of the terms; and
      
      generating the subset of documents from those documents in the set that satisfy the edge conditions.
  - 16. A method according to claim 15, further comprising:
    - re-centering the median value; and
      
      generating a different subset of documents for grouping.
  - 17. A method according to claim 15, further comprising:
    - setting the edge conditions based on a size of the documents.
  - 18. A method according to claim 17, wherein larger documents have tighter edge conditions than shorter documents.
  - 19. A method according to claim 11, further comprising:
    - calculating the distance using the following equation:
  - 20. A method according to claim 11, further comprising:
    - calculating the rate of change by determining a first derivative of the inner products over successive iterations.
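
The stopping rule of claims 10 and 20 can be illustrated with a short sketch. It assumes the "first derivative" is approximated by the difference between inner products on successive iterations, and that the rate is "constant" once two consecutive differences agree within a tolerance. The claims in this excerpt do not state how the weights are updated, so `step` is a hypothetical stand-in for one update pass; `tol` and `max_iters` are likewise assumptions.

```python
def first_differences(values):
    """Discrete first derivative: change between successive iterations."""
    return [later - earlier for earlier, later in zip(values, values[1:])]


def rate_is_constant(history, tol=1e-6):
    """True once the last two first differences agree within tol."""
    diffs = first_differences(history)
    if len(diffs) < 2:
        return False  # need at least three observations
    return abs(diffs[-1] - diffs[-2]) <= tol


def iterate_until_steady(step, start, max_iters=100, tol=1e-6):
    """Apply step repeatedly, recording each value, until the rate of
    change becomes constant or max_iters is reached."""
    history = [start]
    for _ in range(max_iters):
        history.append(step(history[-1]))
        if rate_is_constant(history, tol):
            break
    return history


# Hypothetical update: move halfway toward a target inner product of 10.
history = iterate_until_steady(lambda x: x + (10 - x) * 0.5,
                               start=0.0, tol=0.01)
```

Tracking the full history rather than only the latest value makes it straightforward to inspect how quickly a cluster settled, at the cost of memory proportional to the number of iterations.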

Current Assignee: Nuix North America Inc. (Nuix Ltd.)
Original Assignee: FTI Consulting Technology LLC (FTI Consulting Incorporated)
Inventors: Gallivan, Dan; Kawai, Kenji
Primary Examiner(s): Spieler, William
Application Number: US14/949,829
Publication Number: US 20160078126A1
Time in Patent Office: 505 Days
CPC Class Codes

G06F 16/23   Updating

G06F 16/24575   using context

G06F 16/285   Clustering or classification

G06F 16/313   Selection or weighting of t...

G06F 16/35   Clustering; Classification

G06F 16/355   Class or cluster creation o...

G06F 16/93   Document management systems

G06F 16/955   using information identifie...

G06F 3/0641   De-duplication techniques

Y10S 707/99932   Access augmentation or opti...

Y10S 707/99936   Pattern matching access

Y10S 707/99943   Generating database or data...

Y10S 707/99945   Object-oriented database st...
