LIST RECOGNIZING METHOD AND LIST RECOGNIZING SYSTEM
First Claim
1. A list recognizing method, comprising:
parsing and analyzing metadata information within an original fixed-layout document, and extracting basic elements within a page;
segmenting the basic elements, extracting segmented text lines within the page to obtain fragments;
building an undirected graph with respect to the fragments;
detecting indent features of a bullet according to features of the basic elements;
training a learning model according to the indent features, local features of the fragments and neighborhood relation features among the fragments, obtaining model parameters, and establishing a list recognizing model; and
invoking the list recognizing model to perform list recognizing on the required document, so as to get recognition results.
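The graph-building and indent-detection steps could be sketched as follows. This is a minimal illustration, not the claimed implementation: the `Fragment` fields, the bullet pattern `BULLET_RE`, the vertical-gap threshold `max_gap`, and the use of the most common left edge as the indent baseline are all assumptions introduced here.

```python
import re
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Fragment:
    """One segmented text line (hypothetical fields); y grows downward."""
    text: str
    left: float
    top: float
    bottom: float


# Assumed bullet pattern: a symbol bullet, or a short number/letter followed
# by '.' or ')', then whitespace. Real documents use many more variants.
BULLET_RE = re.compile(
    r"^\s*(?:[\u2022\u25aa\u25e6\-\*]|\(?[0-9]{1,3}[.)]|\(?[A-Za-z][.)])\s+"
)


def build_graph(fragments, max_gap=20.0):
    """Undirected graph: an edge joins vertically consecutive fragments
    whose gap is at most max_gap (an assumed threshold)."""
    order = sorted(range(len(fragments)), key=lambda i: fragments[i].top)
    edges = set()
    for a, b in zip(order, order[1:]):
        if fragments[b].top - fragments[a].bottom <= max_gap:
            edges.add(frozenset((a, b)))  # frozenset makes the edge unordered
    return edges


def indent_features(fragments):
    """Indent of each fragment relative to the page's most common left edge,
    plus a flag for a bullet-like prefix."""
    base = Counter(round(f.left) for f in fragments).most_common(1)[0][0]
    return [
        {"indent": f.left - base, "has_bullet": bool(BULLET_RE.match(f.text))}
        for f in fragments
    ]
```

Storing each edge as a `frozenset` of two indices keeps the graph undirected without recording every pair twice.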
Abstract
A list recognizing method and system, comprising: parsing and analyzing metadata information within an original fixed-layout document, and extracting basic elements within a page; segmenting the basic elements and extracting segmented text lines within the page to obtain fragments; building an undirected graph with respect to the fragments; detecting indent features of a bullet according to features of the basic elements; training a learning model according to the indent features, local features of the fragments, and neighborhood relation features among the fragments, obtaining model parameters, and establishing a list recognizing model; and invoking the list recognizing model to perform list recognition on the required document to obtain recognition results. This machine learning method can recognize not only a list but also the contextual relationship between the first line of a list and its subsequent lines, and thereby analyze and understand the layout of lists in the fixed-layout document. The accuracy of list recognition on a fixed-layout document can be improved even when the bullets on the first lines of lists vary.
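To illustrate what recognizing the relationship between a first line and its subsequent lines might yield, the sketch below groups labeled lines into list items. The label names `FIRST`, `CONT`, and `OTHER` are hypothetical and do not come from the patent text; they stand for whatever per-fragment output the recognition model produces.

```python
def group_list_items(lines, labels):
    """Group text lines into list items from per-line labels.

    Assumed label set (illustrative only):
      "FIRST" - first line of a list item
      "CONT"  - subsequent line of the same item
      "OTHER" - line that is not part of a list
    """
    items = []
    current = None
    for line, label in zip(lines, labels):
        if label == "FIRST":
            current = [line]          # start a new item
            items.append(current)
        elif label == "CONT" and current is not None:
            current.append(line)      # attach to the open item
        else:
            current = None            # OTHER (or an orphan CONT) closes the item
    return items
```

A `CONT` line with no open item is dropped here rather than attached; a real system might instead treat it as a labeling error to be resolved.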
20 Claims
1. A list recognizing method, comprising:
parsing and analyzing metadata information within an original fixed-layout document, and extracting basic elements within a page;
segmenting the basic elements, extracting segmented text lines within the page to obtain fragments;
building an undirected graph with respect to the fragments;
detecting indent features of a bullet according to features of the basic elements;
training a learning model according to the indent features, local features of the fragments and neighborhood relation features among the fragments, obtaining model parameters, and establishing a list recognizing model; and
invoking the list recognizing model to perform list recognizing on the required document, so as to get recognition results.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10
11. A list recognizing system, comprising:
an extracting unit, configured to parse and analyze metadata information within an original fixed-layout document, and extract basic elements within a page;
a segmenting unit, configured to segment the basic elements, extract segmented text lines within the page to obtain fragments;
a building unit, configured to build an undirected graph with respect to the fragments;
a detecting unit, configured to detect indent features of a bullet according to features of the basic elements;
a modeling unit, configured to train a learning model according to the indent features, local features of the fragments and neighborhood relation features among the fragments, obtain model parameters, and establish a list recognizing model; and
an invoking unit, configured to invoke the list recognizing model to perform list recognizing on the required document, so as to get recognition results.
Dependent claims: 12, 13, 14, 15, 16, 17, 18, 19, 20
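The modeling and invoking units could be sketched roughly as below. The claim does not name a particular learning model, so a plain binary perceptron is swapped in purely as a stand-in. The feature names, the max-over-neighbors rule for neighborhood features, and the +1/-1 labels are likewise assumptions made for illustration.

```python
def featurize(i, local, edges):
    """Combine fragment i's local features with features of its graph neighbors.

    local: list of dicts of numeric features, one per fragment (assumed shape)
    edges: set of frozenset({a, b}) undirected edges between fragment indices
    """
    feats = {"bias": 1.0}
    for k, v in local[i].items():
        feats["self:" + k] = float(v)
    for edge in edges:
        if i in edge:
            (j,) = edge - {i}  # the other endpoint of the edge
            for k, v in local[j].items():
                key = "nbr:" + k
                feats[key] = max(feats.get(key, 0.0), float(v))
    return feats


def train_perceptron(samples, labels, epochs=20):
    """Stand-in learner: labels are +1 (list line) or -1 (non-list line).
    Returns the learned weights, i.e. the model parameters."""
    weights = {}
    for _ in range(epochs):
        for feats, y in zip(samples, labels):
            score = sum(weights.get(k, 0.0) * v for k, v in feats.items())
            if y * score <= 0:  # misclassified (or undecided): update
                for k, v in feats.items():
                    weights[k] = weights.get(k, 0.0) + y * v
    return weights


def recognize(weights, feats):
    """Invoke the trained model on one fragment's feature dict."""
    return sum(weights.get(k, 0.0) * v for k, v in feats.items()) > 0
```

A perceptron scores each fragment independently, so neighborhood information enters only through the `nbr:` features. A model that scores the labels of neighboring fragments jointly over the graph would use the edges more directly.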
Specification