Number sequences detection systems and methods
Abstract
Numbered sequences detection includes (i) extracting one or more numbered item token patterns from a document comprising an ordered sequence of text units, each numbered item token pattern including an incremental portion and a fixed portion and matching at least one text unit of the document, and (ii) identifying at least one numbered sequence in the document conforming with a matching numbered item token pattern of the extracted one or more numbered item token patterns. The identified at least one numbered sequence comprises an ordered sub-sequence of text units of the document that match the matching numbered item token pattern. The detection may further comprise determining that a numbered sequence of a second type nests in the document between consecutive text units belonging to a numbered sequence of a first type, and optimizing one or more numbered sequences of the second type based on information provided by the determining.
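The extracting step (i) can be illustrated with a minimal sketch. It assumes that a text unit is a single line, that the incremental portion is an Arabic number, a lower-case Roman numeral, or a single letter appearing among the first two tokens, and that the fixed portion is any token before the index plus one trailing punctuation mark. The names `match_unit` and `extract_patterns`, the index classes, and the `min_support` threshold are hypothetical illustrations, not the patented implementation.

```python
import re
from collections import Counter

# Hypothetical index classes for the incremental portion of a pattern.
# Lower-case Roman numerals (i to xxxix) are tried before single letters,
# so an ambiguous token such as "i" or "v" is read as Roman here.
INDEX_CLASSES = [
    ("NUM", re.compile(r"^\d+$")),
    ("ROMAN", re.compile(r"^x{0,3}(ix|iv|v?i{0,3})$")),
    ("LOWER", re.compile(r"^[a-z]$")),
    ("UPPER", re.compile(r"^[A-Z]$")),
]
# Tokens are runs of word characters or single punctuation marks.
TOKEN_RE = re.compile(r"\w+|[^\w\s]")


def index_class(token):
    """Return the name of the index class a token belongs to, or None."""
    for name, regex in INDEX_CLASSES:
        if regex.match(token):
            return name
    return None


def match_unit(text_unit):
    """Return (pattern, index_value) for one text unit, or None.

    The incremental portion is the first index-like token among the first two
    tokens; the fixed portion is any token before it plus one trailing
    punctuation token, e.g. "Chapter 2: Methods" -> (("Chapter", "<NUM>", ":"), "2").
    """
    tokens = TOKEN_RE.findall(text_unit)
    for i, token in enumerate(tokens[:2]):
        cls = index_class(token)
        if cls is not None:
            suffix = tokens[i + 1:i + 2]
            if suffix and re.match(r"\w", suffix[0]):
                suffix = []  # a following word is content, not part of the fixed portion
            pattern = tuple(tokens[:i]) + ("<" + cls + ">",) + tuple(suffix)
            return pattern, token
    return None


def extract_patterns(text_units, min_support=2):
    """Return the patterns matched by at least min_support text units."""
    counts = Counter()
    for unit in text_units:
        matched = match_unit(unit)
        if matched is not None:
            counts[matched[0]] += 1
    return [pattern for pattern, n in counts.items() if n >= min_support]
```

For the text units "Chapter 1: Intro", "Some text.", "1. first point", "2. second point" and "Chapter 2: Methods", this sketch extracts the two patterns ("Chapter", "<NUM>", ":") and ("<NUM>", "."). Because "i" and "v" are read as Roman numerals, an alphabetical list reaching "i)" would be misclassified; resolving that ambiguity is outside this sketch.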
20 Claims
1. A numbered sequences detector comprising:
a digital processor configured to perform a method operating on a document comprising an ordered sequence of text units, the method including the operations of:
(i) extracting one or more numbered item token patterns from the document, each numbered item token pattern including an incremental portion defining at least one index and a fixed portion, each numbered item token pattern matching at least one text unit of the document, wherein said matching includes matching the incremental portion of the numbered item token pattern with an index value for the text unit of the document defined by one or more tokens of the text unit of the document; and
(ii) identifying at least one numbered sequence in the document conforming with a matching numbered item token pattern of the extracted one or more numbered item token patterns, each identified numbered sequence comprising an ordered sub-sequence of text units of the document, wherein:
the identifying includes matching each text unit of the ordered sub-sequence of text units of the document with the matching numbered item token pattern, including matching the incremental portion of the matching numbered item token pattern with an index value of the text unit of the document defined by one or more tokens of the text unit of the document; and
the index values of the ordered sub-sequence of text units of the document that comprise the numbered sequence have an incremental relationship.
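The identifying step (ii) of claim 1 can be sketched as a single left-to-right pass over the document. The sketch assumes that each text unit has already been matched to a pattern and that its index value has been converted to an integer (or that the unit is marked `None` when no pattern matches). It also assumes that the incremental relationship is simply "previous index + 1". A unit extends the most recently started sequence of the same pattern whose last index is one less; otherwise it starts a new sequence. `identify_sequences` and its `min_length` cutoff are hypothetical.

```python
from collections import defaultdict


def identify_sequences(units, min_length=2):
    """Group matched text units into numbered sequences.

    units[k] is (pattern, index) for text unit k, with index already an int,
    or None if the unit matched no pattern. Returns lists of text-unit
    positions; each list shares one pattern and its indices rise by exactly 1.
    """
    open_seqs = defaultdict(list)  # pattern -> [[positions, last_index], ...]
    for pos, unit in enumerate(units):
        if unit is None:
            continue
        pattern, index = unit
        # Extend the most recently started compatible sequence, if any.
        for seq in reversed(open_seqs[pattern]):
            if seq[1] + 1 == index:
                seq[0].append(pos)
                seq[1] = index
                break
        else:
            open_seqs[pattern].append([[pos], index])  # start a new sequence
    found = [seq[0] for seqs in open_seqs.values() for seq in seqs
             if len(seq[0]) >= min_length]
    return sorted(found)  # order by first text unit
```

Preferring the most recently started sequence means that, for the indices 1, 2, 1, 2, 3, the final 3 continues the second run rather than the first. This mirrors how a restarted inner list usually reads.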
13. A method operating on a document comprising an ordered sequence of text units, the method comprising:
extracting one or more numbered item token patterns from the document, each numbered item token pattern including an incremental portion and a fixed portion, each numbered item token pattern matching text units of an ordered sub-sequence of text units of the document, wherein, for each text unit of the ordered sub-sequence of text units, said matching includes matching the incremental portion of the numbered item token pattern with an index value defined by one or more tokens of the text unit, and wherein the index values of the text units of the ordered sub-sequence of text units conform with an expected incremental relationship for the incremental portion of the numbered item token pattern defined by (1) an incremental relationship encoding table or (2) an alphabetical or numerical sequence; and
identifying at least one numbered sequence in the document conforming with a matching numbered item token pattern of the extracted one or more numbered item token patterns, each identified numbered sequence comprising an ordered sub-sequence of text units of the document that match the matching numbered item token pattern, with portions of the text units matching the incremental portion of the matching numbered item token pattern defining index values having the expected incremental relationship;
wherein the extracting and the identifying are performed by a computer.
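Claim 13 defines the expected incremental relationship either by an encoding table or by an alphabetical or numerical sequence. A minimal sketch of such a check follows. The index class names and the contents of `ENCODING_TABLES` (a short Roman-numeral table and a hypothetical table of ordinal words) are illustrative assumptions, not tables taken from the patent.

```python
import string

# Hypothetical encoding tables: index values listed in their expected order.
ENCODING_TABLES = {
    "ROMAN": ["i", "ii", "iii", "iv", "v", "vi", "vii", "viii", "ix", "x"],
    "ORDINAL": ["first", "second", "third", "fourth", "fifth"],
}
# Alphabetical sequences, handled like tables.
ALPHABETS = {
    "LOWER": list(string.ascii_lowercase),
    "UPPER": list(string.ascii_uppercase),
}


def has_incremental_relationship(prev, curr, index_class):
    """True if curr is the expected successor of prev for this index class."""
    if index_class == "NUM":  # numerical sequence
        return prev.isdecimal() and curr.isdecimal() and int(curr) == int(prev) + 1
    order = ENCODING_TABLES.get(index_class) or ALPHABETS.get(index_class)
    if order is None or prev not in order or curr not in order:
        return False
    return order.index(curr) == order.index(prev) + 1


def conforms(index_values, index_class):
    """True if every consecutive pair of index values is incremental."""
    return all(has_incremental_relationship(a, b, index_class)
               for a, b in zip(index_values, index_values[1:]))
```

Keeping the alphabets as ordered lists lets tables and alphabets share the same lookup. A new numbering style then only needs another entry in a table.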
17. A non-transitory storage medium storing instructions executable by a digital processor to perform a method including:
identifying numbered sequences of at least two different sequence types in a document comprising an ordered sequence of text units, wherein each numbered sequence comprises an ordered sub-sequence of text units of the document in which the text units of the numbered sequence have an incremental portion indicative of the sequence type;
from amongst the identified numbered sequences of at least two different types, identifying a numbered sequence of a first type having consecutive text units bounding a nested numbered sequence of a second type different from the first type; and
adjusting at least one numbered sequence of the second type based on the constraint that it must be bounded by consecutive text units of a numbered sequence of the first type.
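The nesting constraint of claim 17 can be sketched using text-unit positions only. The sketch assumes that each numbered sequence is given as a sorted list of positions. A second-type sequence counts as nested when all of its units fall in one gap between consecutive first-type units. One simple adjustment, chosen here for illustration, splits a second-type sequence wherever a first-type unit falls inside it and drops fragments shorter than `min_length`. `is_nested`, `adjust_nested`, and the treatment of the stretches before the first and after the last first-type unit are hypothetical.

```python
from bisect import bisect_right
from itertools import groupby


def gap_of(first_type_seq, pos):
    """Number of first-type units at or before pos, i.e. which gap pos falls in."""
    return bisect_right(first_type_seq, pos)


def is_nested(first_type_seq, second_seq):
    """True if second_seq lies strictly between two consecutive first-type units."""
    gaps = {gap_of(first_type_seq, pos) for pos in second_seq}
    return len(gaps) == 1 and 0 < gaps.pop() < len(first_type_seq)


def adjust_nested(first_type_seq, second_type_seqs, min_length=2):
    """Split second-type sequences so no piece straddles a first-type unit.

    Positions in every sequence are sorted text-unit indices. Stretches before
    the first and after the last first-type unit are treated as gaps too.
    Pieces shorter than min_length are dropped.
    """
    adjusted = []
    for seq in second_type_seqs:
        for _, piece in groupby(seq, key=lambda pos: gap_of(first_type_seq, pos)):
            piece = list(piece)
            if len(piece) >= min_length:
                adjusted.append(piece)
    return adjusted
```

For example, with chapter headings at positions 0, 10 and 20, a list detected at positions 3, 5, 12 and 14 crosses the heading at 10. The sketch therefore splits it into [3, 5] and [12, 14].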
Specification