Reconstruction of lists in a document
Abstract
Some embodiments provide a method for analyzing a document that includes several primitive elements. The method identifies that a set of primitive elements includes an implicit list in the document based on the location and appearance of the set of primitive elements. The method defines the identified implicit list as an explicit list. The method stores the explicit list as a structure associated with the document.
71 Citations
25 Claims
1. A method for analyzing a document comprising a plurality of primitive elements, the method comprising:
- identifying a first set of hierarchically-organized lists in a first column and a second set of hierarchically-organized lists in a second column subsequent to the first column in the document, each of the first and second sets of hierarchically-organized lists comprising one or more list items identified by a list label;
- determining that a first list in the first set of hierarchically-organized lists continues as a second list in the second set of hierarchically-organized lists based on an analysis of the list labels of a last list item in the first list and a first list item in the second list;
- storing the first list and the second list as a single list structure associated with the document.
Dependent claims: 2, 3, 4, 5, 6
7. A machine readable medium storing a program which when executed by at least one processing unit analyzes a document comprising a plurality of primitive elements, the program comprising sets of instructions for:
- identifying a first set of hierarchically-organized lists in a first column and a second set of hierarchically-organized lists in a second column subsequent to the first column in the document, each of the first and second sets of hierarchically-organized lists comprising one or more list items identified by a list label;
- determining that a first list in the first set of hierarchically-organized lists continues as a second list in the second set of hierarchically-organized lists based on an analysis of the list labels of a last list item in the first list and a first list item in the second list; and
- storing the first list and the second list as a single list structure associated with the document.
Dependent claims: 8, 9, 10, 11, 12
13. A non-transitory machine readable medium storing a program which when executed by at least one processing unit analyzes a document comprising a plurality of primitive elements, the program comprising sets of instructions for:
- identifying different sets of lists for different columns of the document, each column ordered within the document based on a reading order;
- identifying a first list in a first column of the document that has an open end state;
- identifying a second list, in a second column of the document subsequent to the first column in the reading order, that has an open start state;
- determining that the first list in the first column continues as the second list in the second column of the document; and
- storing the first list and the second list as a single list structure associated with the document.
Dependent claims: 14, 15, 16, 17, 18, 19, 20, 21
22. A method for analyzing a document comprising a plurality of primitive elements, the method comprising:
- identifying different sets of lists for different columns of the document, each column ordered within the document based on a reading order;
- identifying a first list in a first column of the document that has an open end state;
- identifying a second list, in a second column of the document subsequent to the first column in the reading order, that has an open start state;
- determining that the first list in the first column continues as the second list in the second column of the document; and
- storing the first list and the second list as a single list structure associated with the document.
Dependent claims: 23, 24, 25
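As an illustration only, the cross-column continuation recited in the independent claims can be sketched in Python. This is not the patented implementation. The names `ListItem`, `DocList`, `label_ordinal`, `continues`, `has_open_start`, and `merge_across_columns` are hypothetical, the lists are flat rather than hierarchically organized, and the reading of "open start state" as "the first label is not the first of its sequence" is an assumption, since the claims do not define that term.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ListItem:
    label: str  # e.g. "3." or "c)"
    text: str


@dataclass
class DocList:
    items: List[ListItem] = field(default_factory=list)


def _core(label: str) -> str:
    """Strip surrounding whitespace and punctuation: '(a)' -> 'a', '3.' -> '3'."""
    return label.strip().rstrip(".)").lstrip("(")


def label_ordinal(label: str) -> Optional[int]:
    """Return the 1-based position of a numeric or single-letter label, else None."""
    core = _core(label)
    if not core.isascii():
        return None
    if core.isdigit():
        return int(core)
    if len(core) == 1 and core.isalpha():
        return ord(core.lower()) - ord("a") + 1
    return None


def continues(first: DocList, second: DocList) -> bool:
    """True if the first label of `second` directly follows the last label of `first`."""
    if not first.items or not second.items:
        return False
    last, nxt = first.items[-1].label, second.items[0].label
    a, b = label_ordinal(last), label_ordinal(nxt)
    if a is None or b is None:
        return False
    # Both labels must use the same style (numeric vs. alphabetic).
    same_kind = _core(last).isdigit() == _core(nxt).isdigit()
    return same_kind and b == a + 1


def has_open_start(lst: DocList) -> bool:
    """Assumption: a list has an open start if its first label is mid-sequence."""
    if not lst.items:
        return False
    n = label_ordinal(lst.items[0].label)
    return n is not None and n > 1


def merge_across_columns(col1: List[DocList], col2: List[DocList]) -> List[DocList]:
    """Join the last list of column 1 with the first list of column 2 if labels continue."""
    if col1 and col2 and continues(col1[-1], col2[0]):
        merged = DocList(col1[-1].items + col2[0].items)
        return col1[:-1] + [merged] + col2[1:]
    return col1 + col2


col1 = [DocList([ListItem("1.", "first"), ListItem("2.", "second")])]
col2 = [
    DocList([ListItem("3.", "third"), ListItem("4.", "fourth")]),
    DocList([ListItem("a)", "unrelated")]),
]
result = merge_across_columns(col1, col2)
```

In this sketch the list ending with "2." in the first column and the list starting with "3." in the second column are stored as one four-item list, while the unrelated "a)" list stays separate. A fuller treatment would also compare nesting levels and label styles such as Roman numerals, which are left out here.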
Specification