Method and system for constructing a document redundancy graph
Abstract
A system and method for constructing a document redundancy graph with respect to a document set. The redundancy graph can be constructed with a node for each paragraph associated with the document set such that each node in the redundancy graph represents a unique cluster of information. The nodes can be linked in an order with respect to the information provided in the document set and bundles of redundant information from the document set can be mapped to individual nodes. A data structure (e.g., a hash table) of a paragraph identifier associated with a probability value can be constructed for eliminating inconsistencies with respect to node redundancy. Additionally, a sequence of unique nodes can also be integrated into the graph construction process. The nodes can be connected to the paragraphs associated with the document set via a hyperlink and/or via a label with respect to each node.
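The starting point described in the abstract (one node per paragraph, nodes linked in document order, and a table of paragraph-identifier pairs with probability values) can be sketched as follows. This is only an illustrative sketch under stated assumptions: the `Node` class, the `doc#pN` identifier scheme, the truncated-text label, and the example probabilities are all hypothetical and are not taken from the patent.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One cluster of information; initially it holds a single paragraph."""
    paragraph_ids: set            # identifiers of the paragraphs mapped to this node
    label: str                    # hypothetical label pointing back at the paragraph
    successors: set = field(default_factory=set)  # nodes that follow in document order


def build_initial_graph(documents):
    """Create one node per paragraph and link nodes in reading order.

    `documents` maps a document name to its list of paragraph strings.
    The identifier format "<doc>#p<index>" is an assumption for this sketch.
    """
    nodes = {}
    for doc_name, paragraphs in documents.items():
        previous = None
        for index, text in enumerate(paragraphs):
            pid = f"{doc_name}#p{index}"
            nodes[pid] = Node(paragraph_ids={pid}, label=text[:40])
            if previous is not None:
                nodes[previous].successors.add(pid)
            previous = pid
    return nodes


def ranked_matches(match_table):
    """Sort (pair, probability) entries by decreasing certainty of common content."""
    return sorted(match_table.items(), key=lambda item: item[1], reverse=True)


documents = {
    "manual_a": ["Install the tray.", "Load paper into the tray."],
    "manual_b": ["Load paper into the tray.", "Press start."],
}
nodes = build_initial_graph(documents)

# Hash table keyed by a pair of paragraph identifiers; the values are made-up
# probabilities standing in for whatever similarity measure is actually used.
match_table = {
    ("manual_a#p0", "manual_b#p1"): 0.12,
    ("manual_a#p1", "manual_b#p0"): 0.97,
}
ranking = ranked_matches(match_table)
```

In this sketch the highest-probability pair comes first in `ranking`, so it would be the first candidate considered for merging.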
18 Claims
1. A method for constructing a document redundancy graph, said method comprising:
representing each paragraph associated with a document set as a node among a plurality of nodes, wherein each node among said plurality of nodes with respect to said redundancy graph represents a unique cluster of information related to said each paragraph;
providing said each paragraph with a unique paragraph identifier;
constructing a hash table of all paragraph identifiers comprising identifiers of all paragraphs reachable from said each paragraph;
merging said plurality of nodes associated with redundant information by configuring said hash table with respect to a pair of paragraph identifiers in association with a probability value, wherein said probability value sorts a plurality of information matches in an order of decreasing certainty of common content, wherein a pair of said paragraph identifiers associated with an increased certainty of common content are selected to merge; and
combining said plurality of nodes unique to a single document by expressing a pair of nodes with overlapping common content as a combined node, wherein said combined node comprises an empty intersection of said pair of nodes and comparing each paragraph identifier among said pair of paragraph identifiers to a probability value associated with an entry in said hash table in an order wherein said hash table eliminates inconsistency associated with said plurality of information matches.
(Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11)
12. A system for constructing a document redundancy graph, said system comprising:
a processor;
a data bus coupled to said processor; and
a computer-usable mass storage device embodying computer code, said computer-usable medium being coupled to said data bus, said computer program code comprising instructions executable by said processor and configured for;
representing each paragraph associated with a document set as a node among a plurality of nodes, wherein each node among said plurality of nodes with respect to said redundancy graph represents a unique cluster of information related to said each paragraph;
providing said each paragraph with a unique paragraph identifier;
constructing a hash table of all paragraph identifiers comprising identifiers of all paragraphs reachable from said each paragraph;
merging said plurality of nodes associated with redundant information by configuring said hash table with respect to a pair of paragraph identifiers in association with a probability value, wherein said probability value sorts a plurality of information matches in an order of decreasing certainty of common content, wherein a pair of said paragraph identifiers associated with an increased certainty of common content are selected to merge; and
combining said plurality of nodes unique to a single document by expressing a pair of nodes with overlapping common content as a combined node, wherein said combined node comprises an empty intersection of said pair of nodes and comparing each paragraph identifier among said pair of paragraph identifiers to a probability value associated with an entry in said hash table in an order wherein said hash table eliminates inconsistency associated with said plurality of information matches.
(Dependent claims: 13, 14, 15, 16, 17)
18. A computer-usable mass storage for constructing a document redundancy graph, said computer-usable mass storage storing computer program code, said computer program code comprising program instructions executable by a processor, said program instructions comprising:
program instructions to represent each paragraph associated with a document set as a node among a plurality of nodes, wherein each node among said plurality of nodes with respect to said redundancy graph represents a unique cluster of information related to said each paragraph;
program instructions to provide said each paragraph with a unique paragraph identifier;
program instructions to construct a hash table of all paragraph identifiers comprising identifiers of all paragraphs reachable from said each paragraph;
program instructions to merge said plurality of nodes associated with redundant information by configuring said hash table with respect to a pair of paragraph identifiers in association with a probability value, wherein said probability value sorts a plurality of information matches in an order of decreasing certainty of common content, wherein a pair of said paragraph identifiers associated with an increased certainty of common content are selected to merge; and
program instructions to combine said plurality of nodes unique to a single document by expressing a pair of nodes with overlapping common content as a combined node, wherein said combined node comprises an empty intersection of said pair of nodes and comparing each paragraph identifier among said pair of paragraph identifiers to a probability value associated with an entry in said hash table in an order wherein said hash table eliminates inconsistency associated with said plurality of information matches.
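The merge step recited in the independent claims (processing matches in order of decreasing probability while a hash table of reachable paragraph identifiers rules out inconsistent matches) might be sketched as below. This is one possible reading, not the claimed implementation: it assumes that an "inconsistent" match is one whose merge would make a node reachable from itself, that is, contradict the reading order already fixed by earlier, more certain merges. The function name `merge_consistent`, the identifiers, and the probabilities are hypothetical.

```python
def merge_consistent(successors, matches):
    """Merge matched paragraphs in order of decreasing probability.

    successors: dict mapping each paragraph id to the set of ids that
                directly follow it (every id must appear as a key).
    matches:    dict mapping (id_a, id_b) to a probability of common content.
    Returns a dict mapping each paragraph id to the id representing its node.
    """
    succ = {pid: set(nxt) for pid, nxt in successors.items()}
    rep = {pid: pid for pid in succ}  # representative node of each paragraph

    def reachable(start):
        """Ids of all nodes reachable from `start` by following order links."""
        seen, stack = set(), list(succ[start])
        while stack:
            node = stack.pop()
            if node not in seen:
                seen.add(node)
                stack.extend(succ[node])
        return seen

    ordered = sorted(matches.items(), key=lambda item: item[1], reverse=True)
    for (a, b), _probability in ordered:
        ra, rb = rep[a], rep[b]
        if ra == rb:
            continue  # already in the same node
        # Hash table of reachable identifiers for the two candidate nodes.
        reach_table = {ra: reachable(ra), rb: reachable(rb)}
        if rb in reach_table[ra] or ra in reach_table[rb]:
            continue  # assumed inconsistency: merging would create a cycle
        # Fold node rb into node ra and redirect links that pointed at rb.
        succ[ra] |= succ.pop(rb)
        for nxt in succ.values():
            if rb in nxt:
                nxt.discard(rb)
                nxt.add(ra)
        for pid, current in list(rep.items()):
            if current == rb:
                rep[pid] = ra
    return rep


# Two documents, A and B, each with three paragraphs in reading order.
successors = {
    "A0": {"A1"}, "A1": {"A2"}, "A2": set(),
    "B0": {"B1"}, "B1": {"B2"}, "B2": set(),
}
# Made-up probabilities; the second match crosses the first one.
matches = {("A0", "B1"): 0.95, ("A1", "B0"): 0.80, ("A2", "B2"): 0.60}
rep = merge_consistent(successors, matches)
```

With these inputs, `A0` and `B1` merge first. The lower-probability match between `A1` and `B0` is then skipped, because `B0` already precedes the merged node and `A1` follows it. The match between `A2` and `B2` remains consistent and is merged.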
Specification