SYSTEMS AND METHODS FOR METRIC DATA SMOOTHING

US 20150127650A1
Filed: 11/04/2014
Published: 05/07/2015
Est. Priority Date: 11/04/2013
Status: Active Grant

Abstract

An exemplary method may comprise receiving a matrix for a set of documents, each cell of the matrix including a frequency value indicating a number of instances of a corresponding text segment in a corresponding document, receiving an indication of a relationship between two text segments, each of the two text segments associated with a first column and a second column, respectively, of the matrix, adjusting, for each document, a frequency value of the second column based on the frequency value of the first column, projecting each frequency value into a reference space to generate a set of projection values, identifying a plurality of subsets of the reference space, clustering, for each subset of the plurality of subsets, at least some documents that correspond to projection values, and generating a graph of nodes, each of the nodes identifying one or more of the documents corresponding to each cluster.
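The adjusting step can be illustrated with a small sketch. The abstract does not say how the second column's frequency is derived from the first, so the additive rule and the `weight` parameter below are assumptions made purely for illustration, as are the function name and the example terms. The sketch either overwrites the second column or appends the adjusted values as a new column and leaves the original untouched.

```python
from typing import List


def smooth_column(
    matrix: List[List[float]],
    first_col: int,
    second_col: int,
    weight: float = 0.5,
    as_new_column: bool = False,
) -> List[List[float]]:
    """Return a copy of `matrix` with `second_col` adjusted from `first_col`.

    Assumed rule (not taken from the source): for every document row,
    adjusted = second + weight * first.
    """
    smoothed = []
    for row in matrix:
        adjusted = row[second_col] + weight * row[first_col]
        new_row = list(row)  # copy so the caller's matrix is not mutated
        if as_new_column:
            new_row.append(adjusted)  # original second column stays unchanged
        else:
            new_row[second_col] = adjusted
        smoothed.append(new_row)
    return smoothed


# Rows are documents; columns are the related terms "car" and "automobile".
counts = [[3, 0], [0, 2], [1, 1]]
smoothed = smooth_column(counts, first_col=0, second_col=1, weight=0.5)
```

With this rule, a document that mentions only "car" still receives a non-zero "automobile" value, so documents that use related terms end up closer together in the matrix.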

27 Claims

1. A method comprising:
- receiving a matrix for a set of documents, each row of the matrix corresponding to each document of the set of documents and each column of the matrix corresponding to a text segment that may be in any of the set of documents, each cell of the matrix including a frequency value indicating a number of instances of a corresponding text segment in a corresponding document;
  
  receiving an indication of a relationship between two text segments, each of the two text segments associated with a first column and a second column, respectively, of the matrix;
  
  adjusting, for each document, a frequency value of the second column based on the frequency value of the first column;
  
  projecting each frequency value of the matrix into a reference space to generate a set of projection values in the reference space;
  
  identifying a plurality of subsets of the reference space, at least some of the plurality of subsets including at least some of the projection values;
  
  clustering, for each subset of the plurality of subsets, at least some documents of the set of documents that correspond to projection values that are members of that subset to generate clusters of one or more documents; and
  
  generating a graph of nodes, each of the nodes identifying one or more of the documents corresponding to each cluster.
- Dependent claims (3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13):
  - 3. The method of claim 1, wherein generating the graph of nodes comprises generating a graphical representation of the graph of nodes.
  - 4. The method of claim 1, further comprising generating a link between at least two nodes of the graph of nodes, each node corresponding to different clusters, a first document of the set of documents being a member of the different clusters.
  - 5. The method of claim 4, wherein generating the graph of nodes comprises generating a graphical representation of the graph of nodes and generating the link comprises generating an edge between the at least two nodes.
  - 6. The method of claim 1, wherein the plurality of subsets of the reference space have a non-empty intersection.
  - 7. The method of claim 1, wherein adjusting, for each document, a frequency value comprises generating a third column of the matrix, each cell of the third column containing the adjusted frequency value for the corresponding document and the second column of frequency values remains unchanged.
  - 8. The method of claim 1, wherein projecting each frequency value comprises projecting each frequency value, including each of the adjusted frequency values, into the reference space to generate the set of projection values in the reference space.
  - 9. The method of claim 8, wherein the second column remains unchanged.
  - 10. The method of claim 1, wherein the text segments are from at least one dictionary of text segments.
  - 11. The method of claim 1, wherein one or more of the text segments are words.
  - 12. The method of claim 1, wherein one or more of the text segments are n-grams.
  - 13. The method of claim 1, wherein each frequency value is a term frequency-inverse document frequency for the corresponding text segment and the corresponding document.
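The projecting, subset-identifying, clustering, and graph-generating steps of claim 1 can be sketched as follows. The claims leave the details open, so this sketch makes several assumptions: the reference space is the real line, the projection is each document's row sum, the subsets are overlapping intervals, and clustering links documents whose Euclidean distance is within a threshold. All function and parameter names are hypothetical.

```python
import math
from itertools import combinations


def project(matrix):
    # Assumed projection: total frequency of each document (one value per row).
    return [sum(row) for row in matrix]


def make_cover(values, n_intervals, overlap):
    # Overlapping intervals covering [min, max] of the projection values.
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_intervals or 1.0  # avoid zero width if all equal
    pad = width * overlap
    return [(lo + i * width - pad, lo + (i + 1) * width + pad)
            for i in range(n_intervals)]


def cluster(matrix, members, threshold):
    # Group documents into connected components: two documents are linked
    # when their Euclidean distance is at most `threshold`.
    clusters, unvisited = [], set(members)
    while unvisited:
        seed = unvisited.pop()
        component, frontier = {seed}, [seed]
        while frontier:
            current = frontier.pop()
            near = {d for d in unvisited
                    if math.dist(matrix[current], matrix[d]) <= threshold}
            unvisited -= near
            component |= near
            frontier.extend(near)
        clusters.append(frozenset(component))
    return clusters


def build_graph(matrix, n_intervals=2, overlap=0.25, threshold=2.5):
    values = project(matrix)
    nodes = []
    for lo, hi in make_cover(values, n_intervals, overlap):
        members = [i for i, v in enumerate(values) if lo <= v <= hi]
        nodes.extend(cluster(matrix, members, threshold))
    # Link two nodes when their clusters share at least one document.
    edges = [(a, b) for a, b in combinations(range(len(nodes)), 2)
             if nodes[a] & nodes[b]]
    return nodes, edges


docs = [[1, 0], [2, 0], [3, 1], [5, 1], [6, 2]]
nodes, edges = build_graph(docs)
```

In this example the two intervals overlap around document 2, so it lands in both clusters and the two nodes are joined by an edge, which mirrors the link described in claim 4.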

2. The method of claim 1, wherein clustering, for each subset of the plurality of subsets, at least some documents of the set of documents comprises:
- determining a distance between at least two documents of the set of documents corresponding to at least two projection values in a first subset of the plurality of subsets;
  
  comparing the distance to a threshold value; and
  
  clustering each of the at least two documents in two different clusters or one cluster based on the comparison.
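The distance comparison in this claim can be sketched for a single pair of documents. The claim does not name a distance measure or a threshold value; cosine distance is assumed here because it is a common choice for term-frequency vectors, and the function names and the threshold of 0.1 are hypothetical.

```python
import math


def cosine_distance(a, b):
    # 1 - cosine similarity; treated as maximally distant if either vector is zero.
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm if norm else 1.0


def cluster_pair(doc_a, doc_b, threshold):
    # Documents are identified by position: 0 for doc_a, 1 for doc_b.
    if cosine_distance(doc_a, doc_b) <= threshold:
        return [{0, 1}]   # close enough: one shared cluster
    return [{0}, {1}]     # too far apart: two separate clusters


same = cluster_pair([1, 0], [2, 0], threshold=0.1)
different = cluster_pair([1, 0], [0, 1], threshold=0.1)
```

The first pair points in the same direction in term space and ends up in one cluster; the second pair shares no terms and is split into two.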

14. A system comprising:
- an input module configured to receive a matrix for a set of documents, each row of the matrix corresponding to each document of the set of documents and each column of the matrix corresponding to a text segment that may be in any of the set of documents, each cell of the matrix including a frequency value indicating a number of instances of a corresponding text segment in a corresponding document;
  
  a smoothing module configured to receive an indication of a relationship between two text segments, each of the two text segments associated with a first column and a second column, respectively, of the matrix and to adjust, for each document, a frequency value of the second column based on the frequency value of the first column; and
  
  an analysis module configured to project each frequency value of the matrix into a reference space to generate a set of projection values in the reference space, to identify a plurality of subsets of the reference space, at least some of the plurality of subsets including at least some of the projection values, to cluster, for each subset of the plurality of subsets, at least some documents of the set of documents that correspond to projection values that are members of that subset to generate clusters of one or more documents, and to generate a graph of nodes, each of the nodes identifying one or more of the documents corresponding to each cluster.
- Dependent claims (15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26):
  - 15. The system of claim 14, wherein the analysis module configured to cluster, for each subset of the plurality of subsets, at least some documents of the set of documents comprises the analysis module configured to:
    - determine a distance between at least two documents of the set of documents corresponding to at least two projection values in a first subset of the plurality of subsets;
      
      compare the distance to a threshold value; and
      
      cluster each of the at least two documents in two different clusters or one cluster based on the comparison.
  - 16. The system of claim 14, further comprising a visualization module configured to generate a graphical representation of the graph of nodes.
  - 17. The system of claim 14, wherein the analysis module is further configured to generate a link between at least two nodes of the graph of nodes, each node corresponding to different clusters, a first document of the set of documents being a member of the different clusters.
  - 18. The system of claim 17, further comprising a visualization module configured to generate an edge between the at least two nodes.
  - 19. The system of claim 14, wherein the plurality of subsets of the reference space have a non-empty intersection.
  - 20. The system of claim 14, wherein the smoothing module configured to adjust, for each document, a frequency value comprises the smoothing module configured to generate a third column of the matrix, each cell of the third column containing the adjusted frequency value for the corresponding document and the second column of frequency values remains unchanged.
  - 21. The system of claim 14, wherein the analysis module configured to project each frequency value comprises the analysis module configured to project each frequency value, including each of the adjusted frequency values, into the reference space to generate the set of projection values in the reference space.
  - 22. The system of claim 21, wherein the second column remains unchanged.
  - 23. The system of claim 14, wherein the text segments are from at least one dictionary of text segments.
  - 24. The system of claim 14, wherein one or more of the text segments are words.
  - 25. The system of claim 14, wherein one or more of the text segments are n-grams.
  - 26. The system of claim 14, wherein each frequency value is a term frequency-inverse document frequency for the corresponding text segment and the corresponding document.

27. A computer readable medium comprising instructions, the instructions being executable by a processor to perform a method, the method comprising:
- receiving a matrix for a set of documents, each row of the matrix corresponding to each document of the set of documents and each column of the matrix corresponding to a text segment that may be in any of the set of documents, each cell of the matrix including a frequency value indicating a number of instances of a corresponding text segment in a corresponding document;
  
  receiving an indication of a relationship between two text segments, each of the two text segments associated with a first column and a second column, respectively, of the matrix;
  
  adjusting, for each document, a frequency value of the second column based on the frequency value of the first column;
  
  projecting each frequency value of the matrix into a reference space to generate a set of projection values in the reference space;
  
  identifying a plurality of subsets of the reference space, at least some of the plurality of subsets including at least some of the projection values;
  
  clustering, for each subset of the plurality of subsets, at least some documents of the set of documents that correspond to projection values that are members of that subset to generate clusters of one or more documents; and
  
  generating a graph of nodes, each of the nodes identifying one or more of the documents corresponding to each cluster.

Current Assignee
SymphonyAI Sensa LLC (Fortive Corp.)
Original Assignee
Ayasdi, Inc. (Fortive Corp.)
Inventors
Carlsson, Gunnar; Kloke, Jennifer; Sexton, Harlan; Bak, Anthony; Mann, Benjamin

Granted Patent

US 10,114,823 B2
US Class Current

707/737
CPC Class Codes

G06F 16/3334 Selection or weighting of t...

G06F 16/93 Document management systems
