Document segmentation based on visual gaps
Abstract
A document may be segmented based on a visual model of the document. The visual model is determined according to an amount of visual white space or gaps that are in the document. In one implementation, the visual model is used to identify a hierarchical structure of the document, which may then be used to segment the document.
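As a rough illustration of the idea in the abstract, the sketch below builds a hierarchy by recursively splitting a run of page elements at its widest vertical gap. The `Block` and `Node` types, the coordinate fields, and the `min_gap` threshold are assumptions made for this example; the patent text here does not prescribe any particular data structure or splitting rule.

```python
from dataclasses import dataclass, field


@dataclass
class Block:
    text: str
    top: float     # hypothetical y-coordinate of the element's top edge
    bottom: float  # hypothetical y-coordinate of the element's bottom edge


@dataclass
class Node:
    blocks: list
    children: list = field(default_factory=list)


def build_hierarchy(blocks, min_gap=1.0):
    """Recursively split vertically ordered blocks at their widest gap.

    Wider gaps separate higher levels of the hierarchy; gaps narrower
    than min_gap (an assumed threshold) are not treated as separators.
    """
    node = Node(blocks=list(blocks))
    if len(blocks) < 2:
        return node
    # White space between each pair of consecutive blocks.
    gaps = [blocks[i + 1].top - blocks[i].bottom for i in range(len(blocks) - 1)]
    widest = max(gaps)
    if widest < min_gap:
        return node
    # Cut at every gap that ties for the widest, then recurse on each piece.
    start = 0
    for i, gap in enumerate(gaps):
        if gap == widest:
            node.children.append(build_hierarchy(blocks[start:i + 1], min_gap))
            start = i + 1
    node.children.append(build_hierarchy(blocks[start:], min_gap))
    return node


blocks = [
    Block("A", 0, 10),
    Block("B", 12, 22),
    Block("C", 50, 60),
    Block("D", 62, 72),
]
root = build_hierarchy(blocks, min_gap=5.0)
```

With `min_gap=5.0`, the 28-unit gap between B and C splits the page into two segments, while the 2-unit gaps inside each segment are ignored. A real visual model would presumably also account for horizontal gaps and rendered layout, which this sketch leaves out.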
25 Citations
20 Claims
1. A method, performed by a computer system, the method comprising:
generating, by a processor associated with the computer system, a visual model of a document that includes a geographic signal;
identifying, by a processor associated with the computer system, a hierarchical structure of the document based on the visual model;
segmenting, by a processor associated with the computer system, the document based on the hierarchical structure and the visual model of the document; and
associating, by a processor associated with the computer system, a portion of the document as corresponding to the geographic signal, when the portion includes the geographic signal and when the portion is within a lowest level of the hierarchical structure that includes the geographic signal.
Dependent claims: 2, 3, 4, 5, 6, 7
8. A computer-readable memory device containing programming instructions for execution by a processor, the computer-readable memory device comprising:
programming instructions to generate a visual model of a document that includes at least one geographic signal;
programming instructions to identify a hierarchical structure of the document based on the visual model; and
programming instructions for associating the at least one geographic signal with portions of the document based on the identified hierarchical structure of the document, the associating including associating text in the document as text corresponding to the at least one geographic signal when the text includes the geographic signal and when the text is within a lowest level of the hierarchical structure that includes the at least one geographic signal.
Dependent claims: 9, 10, 11
12. A device, comprising:
a processor; and
a memory coupled to the processor and containing instructions that when executed by the processor cause the processor to:
identify a document that includes geographic signals,
segment the document into a plurality of sections that correspond to different ones of the identified geographic signals based on a visual layout of the document, and
index text in the plurality of sections of the document as corresponding to the geographic signals, the indexing including associating text in the document as text corresponding to one of the geographic signals when the text includes the one of the geographic signals and when the text is within a lowest level of the hierarchical structure that includes the one of the geographic signals.
Dependent claims: 13, 14, 15, 16
17. A method, performed by a computer system, the method comprising:
identifying, by a processor of the computer system, a document that includes information associated with a location;
generating, by a processor of the computer system, a visual model of the document based on gaps between elements in the document;
assigning, by a processor of the computer system, weights to the elements based on a size of the gaps;
identifying, by a processor of the computer system, a hierarchical structure of the document, where levels of the hierarchical structure are based on the weights of the elements;
identifying, by a processor of the computer system, text surrounding the information associated with the location; and
associating, by a processor of the computer system, a lowest one of the levels, that includes the text, with the information associated with the geographic location.
Dependent claims: 18, 19, 20
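The association step recited in the independent claims (tying a geographic signal to the lowest level of the hierarchy that contains it) can be sketched as a tree search. This is an illustrative reading only: the `Section` type is hypothetical, and treating a geographic signal as a plain substring is a simplifying assumption, not something the claims specify.

```python
from dataclasses import dataclass, field


@dataclass
class Section:
    text: str = ""  # text held directly by this section (hypothetical field)
    children: list = field(default_factory=list)

    def full_text(self):
        """Text of this section together with all of its descendants."""
        parts = [self.text] + [child.full_text() for child in self.children]
        return " ".join(p for p in parts if p)


def lowest_sections_with_signal(section, signal):
    """Return the deepest sections whose text contains the signal.

    Assumption: the signal is matched as a plain substring.
    """
    if signal not in section.full_text():
        return []
    found = []
    for child in section.children:
        found.extend(lowest_sections_with_signal(child, signal))
    # If no child contains the signal, this section is the lowest level.
    return found or [section]


page = Section("City guide", [
    Section("Joe's Pizza, 123 Main St, Springfield"),
    Section("Best Tacos, 9 Elm Ave, Shelbyville"),
])
matches = lowest_sections_with_signal(page, "Springfield")
```

Here only the first listing is associated with "Springfield", so text in the second listing would not be indexed under that location. Returning every lowest-level match, rather than a single one, is a design choice of this sketch.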
Specification