Device, System and Method for Identifying Sections of Documents
First Claim
1. A method for determining document structure, comprising using computerized storage, processing and programming embodied on a non-transitory computerized storage medium for:
identifying potential section markers of an input document;
identifying similar types of said potential section markers;
distinguishing between references and section markers and weeding out references, thereby identifying real section markers among said potential section markers;
said computerized programming automatically identifying legitimate and illegitimate numbering sequences, lettering sequences, or combined numbering and lettering sequences of said real section markers without operator intervention;
said computerized programming automatically identifying structural inclusion relations among said real section markers which are identified to adhere to said legitimate numbering sequences, lettering sequences, or combined numbering and lettering sequences without operator intervention; and
said computerized programming automatically generating a structured table of contents from said real, legitimately-sequenced section markers without operator intervention.
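As a rough illustration of the first three steps of claim 1 (finding potential section markers, grouping them by type, and separating references from real markers), a minimal Python sketch follows. The regular expressions, the `Candidate` class, the function names, and the rule that a marker counts as real only when it begins its line are all assumptions made for illustration; the claim does not prescribe any of them.

```python
import re
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical marker syntaxes; real documents would need a richer set.
MARKER = re.compile(
    r"(?P<section_word>\b(?:Section|Article)\s+\d+(?:\.\d+)*)"
    r"|(?P<decimal>(?<![\w.])(?:\d+\.)+\d*(?=\s|$))"
    r"|(?P<paren_letter>\([a-z]\))",
    re.IGNORECASE,
)


@dataclass
class Candidate:
    line_no: int
    text: str
    kind: str           # which pattern matched ("similar type" of marker)
    is_reference: bool  # True when the marker is judged to be a cross-reference


def find_candidates(lines):
    """Collect every potential section marker and flag likely references.

    Assumed heuristic: a marker preceded by other text on its line is a
    reference to a section, not the start of one.
    """
    found = []
    for line_no, line in enumerate(lines):
        for m in MARKER.finditer(line):
            starts_line = line[: m.start()].strip() == ""
            found.append(Candidate(line_no, m.group(), m.lastgroup, not starts_line))
    return found


def group_real_markers(candidates):
    """Group the surviving (non-reference) markers by their type."""
    groups = defaultdict(list)
    for c in candidates:
        if not c.is_reference:
            groups[c.kind].append(c.text)
    return dict(groups)
```

Applied to a short contract-like text, `find_candidates` would flag an in-line "Section 2.1" as a reference while keeping a line-initial "2.1" as a real marker. A line-start test alone is unlikely to be robust against the scanning noise mentioned in the abstract, so it should be read as a placeholder for whatever cues an actual implementation uses.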
Abstract
A method for identifying sections of contracts. This method works well with documents that originated from scanned images, i.e., documents that could possibly include noise and misleading cues.
18 Claims
1. A method for determining document structure, comprising using computerized storage, processing and programming embodied on a non-transitory computerized storage medium for:
identifying potential section markers of an input document;
identifying similar types of said potential section markers;
distinguishing between references and section markers and weeding out references, thereby identifying real section markers among said potential section markers;
said computerized programming automatically identifying legitimate and illegitimate numbering sequences, lettering sequences, or combined numbering and lettering sequences of said real section markers without operator intervention;
said computerized programming automatically identifying structural inclusion relations among said real section markers which are identified to adhere to said legitimate numbering sequences, lettering sequences, or combined numbering and lettering sequences without operator intervention; and
said computerized programming automatically generating a structured table of contents from said real, legitimately-sequenced section markers without operator intervention.
Dependent claims: 2, 3, 4, 5, 6
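The sequencing, inclusion and table-of-contents steps of claim 1 can be sketched as follows. This is only one possible reading, under stated assumptions: markers are limited to plain numbers ("1.") and single letters ("(a)"), a sequence is treated as legitimate when each marker is exactly one greater than its predecessor of the same type, and a new marker type nests under the most recently accepted marker only if it starts at 1 or "a". The names `Node`, `parse_marker`, `build_toc` and `render` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    label: str
    kind: str
    ordinal: int
    children: list = field(default_factory=list)


def parse_marker(label):
    """Map a marker label to (kind, ordinal); assumes two simple styles."""
    core = label.strip("().")
    if core.isdigit():
        return "number", int(core)
    if len(core) == 1 and core.isalpha():
        return "letter", ord(core.lower()) - ord("a") + 1
    raise ValueError(f"unsupported marker style: {label!r}")


def build_toc(labels):
    """Return (root, rejected): a nested tree plus out-of-sequence markers."""
    root = Node("", "root", 0)
    stack = [root]  # stack[1:] holds the currently open marker at each level
    rejected = []
    for label in labels:
        kind, ordinal = parse_marker(label)
        placed = False
        # Look for an open sequence of the same type, innermost level first.
        for depth in range(len(stack) - 1, 0, -1):
            if stack[depth].kind == kind:
                if ordinal == stack[depth].ordinal + 1:
                    node = Node(label, kind, ordinal)
                    stack[depth - 1].children.append(node)
                    del stack[depth:]  # close any deeper levels
                    stack.append(node)
                    placed = True
                break  # same type found: either continued or illegitimate
        else:
            # No open sequence of this type: it may open a nested level,
            # but only if it begins the sequence.
            if ordinal == 1:
                node = Node(label, kind, ordinal)
                stack[-1].children.append(node)
                stack.append(node)
                placed = True
        if not placed:
            rejected.append(label)
    return root, rejected


def render(node, depth=0):
    """Flatten the tree into indented table-of-contents lines."""
    lines = []
    for child in node.children:
        lines.append("  " * depth + child.label)
        lines.extend(render(child, depth + 1))
    return lines
```

With the input `["1.", "(a)", "(b)", "2.", "4.", "(a)", "3."]`, this sketch drops "4." as out of sequence and nests each lettered marker under the preceding numbered one. Simply rejecting a marker that breaks the sequence is a design simplification; a system meant for noisy scanned input might instead try to repair or reinterpret such markers.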
7. A structured table of contents product-by-process in the form of computerized text produced by a computerized device and represented in a non-transitory computerized storage medium, produced by a method for determining document structure, said method comprising:
identifying potential section markers of an input document;
identifying similar types of said potential section markers;
distinguishing between references and section markers and weeding out references, thereby identifying real section markers among said potential section markers;
said computerized device identifying legitimate and illegitimate numbering sequences, lettering sequences, or combined numbering and lettering sequences of said real section markers without operator intervention;
said computerized device identifying structural inclusion relations among said real section markers which are identified to adhere to said legitimate numbering sequences, lettering sequences, or combined numbering and lettering sequences without operator intervention; and
said computerized device generating a structured table of contents from said real, legitimately-sequenced section markers without operator intervention.
Dependent claims: 8, 9, 10, 11, 12
13. An apparatus for determining document structure, comprising computerized storage, processing and programming embodied on a non-transitory computerized storage medium for:
identifying potential section markers of an input document;
identifying similar types of said potential section markers;
distinguishing between references and section markers and weeding out references, thereby identifying real section markers among said potential section markers;
said computerized programming automatically identifying legitimate and illegitimate numbering sequences, lettering sequences, or combined numbering and lettering sequences of said real section markers without operator intervention;
said computerized programming automatically identifying structural inclusion relations among said real section markers which are identified to adhere to said legitimate numbering sequences, lettering sequences, or combined numbering and lettering sequences without operator intervention; and
said computerized programming automatically generating a structured table of contents from said real, legitimately-sequenced section markers without operator intervention.
Dependent claims: 14, 15, 16, 17, 18
Specification