NUMBER SEQUENCES DETECTION SYSTEMS AND METHODS
Abstract
Numbered sequences detection includes (i) extracting one or more numbered item token patterns from a document comprising an ordered sequence of text units, each numbered item token pattern including an incremental portion and a fixed portion that matches at least one text unit of the document and (ii) identifying at least one numbered sequence in the document conforming with a matching numbered item token pattern of the extracted one or more numbered item token patterns. The identified at least one numbered sequence comprises an ordered sub-sequence of text units of the document that match the matching numbered item token pattern. The detection may further comprise determining that a second type of numbered sequence nests in the document between consecutive text units belonging to a numbered sequence of a first type, and optimizing one or more numbered sequences of the second type based on information provided by the determining.
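The two operations summarized above can be illustrated with a minimal sketch. It is not the claimed implementation: the regular expression `ITEM_RE`, the function names `extract_patterns` and `identify_sequences`, and the choice to model a pattern as a (fixed prefix, fixed suffix) pair around an integer are all assumptions made for illustration. The incremental portion is limited to Arabic numerals that increase by one.

```python
import re
from collections import defaultdict

# Hypothetical pattern shape: fixed prefix + incrementing integer + fixed suffix,
# anchored at the start of a text unit (e.g. "1.", "(2)", "Section 3").
ITEM_RE = re.compile(
    r"^(?P<prefix>[A-Za-z]*\s?[(\[]?)(?P<num>\d+)(?P<suffix>[.)\]:]?)(?=\s|$)"
)

def extract_patterns(units):
    """Map each (prefix, suffix) pattern to the [(unit_index, number), ...] matching it."""
    matches = defaultdict(list)
    for i, text in enumerate(units):
        m = ITEM_RE.match(text)
        if m:
            matches[(m["prefix"], m["suffix"])].append((i, int(m["num"])))
    return dict(matches)

def identify_sequences(matches, min_len=2):
    """Split each pattern's matches into runs whose numbers increase by exactly 1."""
    sequences = []
    for pattern, hits in matches.items():
        run = []
        for idx, num in hits:
            if run and num != run[-1][1] + 1:
                # The increment is broken: close the current run and start a new one.
                if len(run) >= min_len:
                    sequences.append((pattern, [i for i, _ in run]))
                run = []
            run.append((idx, num))
        if len(run) >= min_len:
            sequences.append((pattern, [i for i, _ in run]))
    return sequences
```

For the text units `["Contents", "1. Scope", "Some body text", "2. Terms", "(1) first", "(2) second", "3. Notes"]`, this sketch returns `[(("", "."), [1, 3, 6]), (("(", ")"), [4, 5])]`: each identified sequence is an ordered sub-sequence of unit indices that share one pattern.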
20 Claims
1. A numbered sequences detector comprising:
a digital processor configured to perform a method including the operations of (i) extracting one or more numbered item token patterns from a document comprising an ordered sequence of text units, each numbered item token pattern including an incremental portion and a fixed portion that matches at least one text unit of the document and (ii) identifying at least one numbered sequence in the document conforming with a matching numbered item token pattern of the extracted one or more numbered item token patterns, the identified at least one numbered sequence comprising an ordered sub-sequence of text units of the document that match the matching numbered item token pattern.
13. A method comprising:
extracting one or more numbered item token patterns from a document comprising an ordered sequence of text units, each numbered item token pattern including an incremental portion and a fixed portion that matches at least one text unit of the document;
identifying at least one numbered sequence in the document conforming with a matching numbered item token pattern of the extracted one or more numbered item token patterns, the identified at least one numbered sequence comprising an ordered sub-sequence of text units of the document that match the matching numbered item token pattern; and
generating a structured document based on the document comprising an ordered sequence of text units and structured in accordance with the identified at least one numbered sequence.
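The generating step of claim 13 can be pictured with a short sketch. The claim does not specify an output format, so the XML layout here (`document`, `item`, `head`, and `unit` elements) and the function name `build_structured_document` are hypothetical. The sketch assumes a single numbered sequence has already been identified and is given as a sorted list of text-unit indices.

```python
import xml.etree.ElementTree as ET

def build_structured_document(units, item_indices):
    """Wrap each numbered item and the units that follow it in an <item> element.

    item_indices: indices of the text units forming one numbered sequence
    (assumed to come from an earlier detection step).
    """
    root = ET.Element("document")
    starts = set(item_indices)
    current = root  # units before the first numbered item stay at top level
    for i, text in enumerate(units):
        if i in starts:
            # A numbered item opens a new structural element.
            current = ET.SubElement(root, "item")
            ET.SubElement(current, "head").text = text
        else:
            # Any other unit attaches to the most recently opened item.
            ET.SubElement(current, "unit").text = text
    return root
```

Calling `ET.tostring(build_structured_document(units, [1, 3]), encoding="unicode")` on `["Contents", "1. Scope", "Body", "2. Terms"]` yields a document in which "Body" is grouped under the item headed "1. Scope".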
17. A storage medium storing instructions executable by a digital processor to perform a method including identifying numbered sequences of at least two different sequence types in a document comprising an ordered sequence of text units wherein each numbered sequence comprises an ordered sub-sequence of text units of the document in which the text units of the numbered sequence have an incremental portion indicative of the sequence type, identifying a numbered sequence of a first type having consecutive text units bounding a nested numbered sequence of a second type different from the first type, and adjusting at least one numbered sequence of the second type based on the constraint that it must be bounded by consecutive text units of a numbered sequence of the first type.
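One possible reading of the adjusting step in claim 17 is sketched below. It is an illustration under stated assumptions rather than the claimed method: the function name `split_nested` is hypothetical, both sequences are given as sorted lists of text-unit indices, and the region after the last first-type item is treated as a gap bounded by the end of the document. A second-type sequence that spans a first-type item is split so that each piece sits between two consecutive first-type items.

```python
from bisect import bisect_right

def split_nested(inner, outer):
    """Split an inner sequence so each piece lies between two consecutive outer items.

    inner, outer: sorted lists of text-unit indices for the second-type and
    first-type sequences respectively.
    """
    pieces = {}
    for idx in inner:
        # The count of outer items before idx identifies the enclosing gap.
        gap = bisect_right(outer, idx)
        pieces.setdefault(gap, []).append(idx)
    return [pieces[g] for g in sorted(pieces)]
```

With first-type items at indices `[1, 3, 8]` and a candidate second-type sequence at `[4, 5, 9, 10]`, the sketch returns `[[4, 5], [9, 10]]`, because the first-type item at index 8 cannot lie inside a single nested sequence.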
Specification