METHOD AND SYSTEM FOR CONSTRUCTING A DOCUMENT REDUNDANCY GRAPH

US 20110029952A1
Filed: 07/31/2009
Published: 02/03/2011
Est. Priority Date: 07/31/2009
Status: Active Grant

Abstract

A system and method for constructing a document redundancy graph with respect to a document set. The redundancy graph can be constructed with a node for each paragraph associated with the document set such that each node in the redundancy graph represents a unique cluster of information. The nodes can be linked in an order with respect to the information provided in the document set and bundles of redundant information from the document set can be mapped to individual nodes. A data structure (e.g., a hash table) of a paragraph identifier associated with a probability value can be constructed for eliminating inconsistencies with respect to node redundancy. Additionally, a sequence of unique nodes can also be integrated into the graph construction process. The nodes can be connected to the paragraphs associated with the document set via a hyperlink and/or via a label with respect to each node.
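The merging step described in the abstract can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `merge_redundant`, the `(document, index)` paragraph identifier format, and the rule used to reject inconsistent matches (a cluster may not hold two paragraphs from the same document) are all assumptions made for the example.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

# Assumed identifier format: (document name, paragraph index).
ParagraphId = Tuple[str, int]
Match = Tuple[ParagraphId, ParagraphId, float]  # (id_a, id_b, probability)


def merge_redundant(paragraphs: List[ParagraphId],
                    matches: List[Match]) -> List[FrozenSet[ParagraphId]]:
    """Group paragraphs into clusters, processing the most certain matches first."""
    # Hash table from paragraph identifier to its current cluster.
    # Every paragraph starts as its own node.
    cluster_of: Dict[ParagraphId, Set[ParagraphId]] = {p: {p} for p in paragraphs}

    # Sort matches in order of decreasing certainty.
    for a, b, _prob in sorted(matches, key=lambda m: m[2], reverse=True):
        ca, cb = cluster_of[a], cluster_of[b]
        if ca is cb:
            continue  # already merged by a more certain match
        docs_a = {doc for doc, _ in ca}
        docs_b = {doc for doc, _ in cb}
        if docs_a & docs_b:
            continue  # assumed consistency rule: skip if the clusters share a document
        ca |= cb
        for p in cb:
            cluster_of[p] = ca  # update the table after combining the pair

    # Collect each distinct cluster once, in first-appearance order.
    clusters, seen = [], set()
    for p in paragraphs:
        c = cluster_of[p]
        if id(c) not in seen:
            seen.add(id(c))
            clusters.append(frozenset(c))
    return clusters
```

Because less certain matches are examined only after more certain ones have been applied, a weak match that would contradict an earlier merge is simply skipped.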

20 Claims

1. A method for constructing a document redundancy graph, said method comprising:
- representing at least one paragraph associated with a document set as a node among a plurality of nodes, wherein each node among said plurality of nodes with respect to said redundancy graph represents a unique cluster of information;
- merging said plurality of nodes associated with redundant information by configuring a data structure with respect to a pair of information identifiers in association with a probability value, wherein said probability value sorts a plurality of information matches in an order of decreasing certainty; and
- combining said plurality of nodes unique to a single document by comparing each information identifier among said pair of information identifiers to an entry in said data structure in an order wherein said data structure eliminates inconsistency associated with said plurality of information matches.
- Dependent claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12):
  - 2. The method of claim 1 wherein at least one information identifier among said pair of information identifiers identifies a paragraph.
  - 3. The method of claim 1 further comprising configuring at least one information identifier among said pair of information identifiers to include a list of identifiers associated with at least one information element.
  - 4. The method of claim 1 wherein merging said plurality of nodes associated with said redundant information further comprises:
    - combining said plurality of nodes into a single node if an intersection of said document set reachable from each node is empty.
  - 5. The method of claim 1 wherein merging said plurality of nodes associated with said redundant information further comprises:
    - updating said data structure that describes information combinations after combining a pair of nodes.
  - 6. The method of claim 1 wherein combining said plurality of nodes unique to said single document further comprises:
    - setting a flag to indicate said node is a combined node if said data structure comprises said node.
  - 7. The method of claim 1 wherein combining said plurality of nodes unique to said single document further comprises:
    - initiating a chain node if said node follows said combined node by checking said flag in order to thereafter clear said flag.
  - 8. The method of claim 1 wherein combining said plurality of nodes unique to said single document further comprises:
    - adding said node to said chain node if said paragraph does not follow said combined node.
  - 9. The method of claim 1 further comprising adding an edge to said redundant graph for every transition from said chain node to said combined node and vice versa.
  - 10. The method of claim 1 further comprising linking said plurality of nodes with respect to said at least one paragraph via a hyperlink.
  - 11. The method of claim 1 further comprising linking said plurality of nodes with respect to said at least one paragraph via a label.
  - 12. The method of claim 11 wherein said label comprises at least one of the following types of data:
    - a cryptic paragraph identifier;
    - a summary associated with said paragraph; or
    - a paragraph content.
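The flag-based chaining in claims 6 through 9 can be sketched as below. This is one possible reading under stated assumptions, not the patent's own code: the function name `chain_unique_paragraphs`, the `combined` mapping from paragraph identifier to combined-node name, and the choice to link every pair of consecutive nodes (which covers each chain-to-combined and combined-to-chain transition) are hypothetical.

```python
from typing import Dict, List, Tuple


def chain_unique_paragraphs(ordered_ids: List[str],
                            combined: Dict[str, str]):
    """Walk one document in order, grouping unique paragraphs into chain nodes."""
    path: list = []          # [kind, payload] entries in document order
    follows_combined = True  # flag; also treated as set at the document start
    for pid in ordered_ids:
        if pid in combined:
            # The paragraph belongs to a combined node: record it and set the flag.
            path.append(["combined", combined[pid]])
            follows_combined = True
        elif follows_combined:
            # First unique paragraph after a combined node: initiate a chain node
            # and clear the flag.
            path.append(["chain", [pid]])
            follows_combined = False
        else:
            # Paragraph does not follow a combined node: extend the current chain.
            path[-1][1].append(pid)

    # Freeze chain payloads so nodes are hashable.
    nodes: List[Tuple[str, object]] = [
        (kind, tuple(payload) if kind == "chain" else payload)
        for kind, payload in path
    ]
    # One edge per transition between consecutive nodes in this document.
    edges = list(zip(nodes, nodes[1:]))
    return nodes, edges
```

For a document `["a1", "a2", "a3", "a4", "a5"]` in which only `a3` matched another document, this sketch yields a chain node for `a1` and `a2`, the combined node, and a second chain node for `a4` and `a5`, joined by two edges.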

13. A method for navigating information in a document set, said method comprising:
- constructing a document redundancy graph for said document set wherein matching information elements across documents associated with said document are combined into single nodes;
- presenting said document redundancy graph to a user; and
- permitting said user to access information regarding information elements associated with at least one node of said document redundancy graph.
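One way to present such a graph to a user is to emit it as Graphviz DOT text, with a label on each node and an optional hyperlink back to the source paragraph. The sketch below is illustrative only: the source does not name a rendering format, and the function `to_dot`, its parameters, and the example URLs are assumptions. The `label` and `URL` node attributes are standard DOT attributes.

```python
from typing import Dict, Iterable, Optional, Tuple


def to_dot(labels: Dict[str, str],
           edges: Iterable[Tuple[str, str]],
           urls: Optional[Dict[str, str]] = None) -> str:
    """Render nodes (with labels and optional hyperlinks) and edges as DOT text."""
    urls = urls or {}
    lines = ["digraph redundancy {"]
    for node_id, label in labels.items():
        # The label could be an identifier, a summary, or the paragraph content.
        attrs = ['label="{}"'.format(label.replace('"', r'\"'))]
        if node_id in urls:
            # Hyperlink from the node back to the paragraph it represents.
            attrs.append('URL="{}"'.format(urls[node_id]))
        lines.append('  "{}" [{}];'.format(node_id, ", ".join(attrs)))
    for src, dst in edges:
        lines.append('  "{}" -> "{}";'.format(src, dst))
    lines.append("}")
    return "\n".join(lines)


# Hypothetical two-node graph; the URL is a made-up anchor for illustration.
print(to_dot({"n1": "Intro", "n2": "Setup"},
             [("n1", "n2")],
             {"n1": "doc_a.html#p1"}))
```

Rendering the output to SVG with Graphviz makes nodes that carry a `URL` attribute clickable, which gives the user access to the underlying paragraphs.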

14. A system for constructing a document redundancy graph, said system comprising:
- a processor;
- a data bus coupled to said processor; and
- a computer-usable medium embodying computer code, said computer-usable medium being coupled to said data bus, said computer program code comprising instructions executable by said processor and configured for:
  - representing at least one paragraph associated with a document set as a node among a plurality of nodes, wherein each node among said plurality of nodes with respect to said redundancy graph represents a unique cluster of information;
  - merging said plurality of nodes associated with redundant information by configuring a data structure with respect to a pair of information identifiers in association with a probability value, wherein said probability value sorts a plurality of information matches in an order of decreasing certainty; and
  - combining said plurality of nodes unique to a single document by comparing each information identifier among said pair of information identifiers to an entry in said data structure in an order wherein said data structure eliminates inconsistency associated with said plurality of information matches.
- Dependent claims (15, 16, 17, 18, 19, 20):
  - 15. The system of claim 14 wherein at least one information identifier among said pair of information identifiers identifies a paragraph.
  - 16. The system of claim 14 wherein said instructions are further configured for modifying at least one information identifier among said pair of information identifiers to include a list of identifiers associated with at least one information element.
  - 17. The system of claim 14 wherein said instructions are further configured for adding an edge to said redundant graph for every transition from said chain node to said combined node and vice versa.
  - 18. The system of claim 14 wherein said instructions are further configured for linking said plurality of nodes with respect to said at least one paragraph via a hyperlink.
  - 19. The system of claim 14 wherein said instructions are further configured for linking said plurality of nodes with respect to said at least one paragraph via a label.
  - 20. The system of claim 19 wherein said label comprises at least one of the following types of data:
    - a cryptic paragraph identifier;
    - a summary associated with said paragraph; or
    - a paragraph content.

Current Assignee: Xerox Corporation (Xerox Holdings Corp.)
Original Assignee: Xerox Corporation (Xerox Holdings Corp.)
Inventors: Harrington, Steven J.
Granted Patent: US 8,914,720 B2
US Class Current: 717/123
CPC Class Codes:
G06F 16/345   Summarisation for human users
G06F 40/131   Fragmentation of text files...
G06F 40/194   Calculation of difference b...
