Vision-based document segmentation
Abstract
Vision-based document segmentation identifies one or more portions of semantic content of a document. The one or more portions are identified by identifying a plurality of visual blocks in the document, and detecting one or more separators between the visual blocks of the plurality of visual blocks. A content structure for the document is constructed based at least in part on the plurality of visual blocks and the one or more separators, and the content structure identifies the one or more portions of semantic content of the document. The content structure obtained using the vision-based document segmentation can optionally be used during document retrieval.
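The abstract does not say how blocks and separators are combined into a content structure. The following is only a minimal sketch under a strong simplifying assumption: blocks are stacked vertically, separators are horizontal lines given by their y-coordinates, and the "content structure" is a flat list of groups. The names `VisualBlock` and `build_content_structure` are hypothetical and do not come from the patent.

```python
from dataclasses import dataclass


@dataclass
class VisualBlock:
    name: str
    top: int     # y-coordinate of the block's upper edge, in pixels
    bottom: int  # y-coordinate of the block's lower edge, in pixels


def build_content_structure(blocks, separator_ys):
    """Group blocks into regions delimited by horizontal separators.

    Illustrative assumption: a block belongs to the region determined by how
    many separators lie at or above its top edge.
    """
    bounds = sorted(separator_ys)
    regions = [[] for _ in range(len(bounds) + 1)]
    for block in sorted(blocks, key=lambda b: b.top):
        region_index = sum(1 for y in bounds if y <= block.top)
        regions[region_index].append(block.name)
    # Drop regions that contain no blocks.
    return [region for region in regions if region]


page = [
    VisualBlock("header", 0, 50),
    VisualBlock("nav", 60, 100),
    VisualBlock("body", 120, 400),
    VisualBlock("footer", 420, 450),
]
structure = build_content_structure(page, separator_ys=[110, 410])
```

With separators at y = 110 and y = 410, the header and navigation blocks fall into one group, while the body and the footer each form their own group.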
19 Claims
1. One or more computer readable media having stored thereon a plurality of instructions that, when executed by one or more processors of a device, causes the one or more processors to:
- receive a query including one or more search terms;
- rank a plurality of blocks based on how well the plurality of blocks matches the one or more search terms, wherein each of the plurality of blocks is part of one document of a plurality of documents, and wherein each of the plurality of blocks is obtained by visual segmentation of one of the plurality of documents;
- ranking the blocks according to the location of the one or more search terms in the block and how frequently the one or more search terms occur in the block;
- for each of the plurality of documents, rank the document based at least in part on the rankings of the blocks that are part of the document; and
- return, in response to the query, an indication of the rankings of one or more of the plurality of documents.
Dependent claims: 2, 3, 4, 5, 6, 7
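Claim 1 ranks blocks by where and how often the search terms occur, then ranks each document from its blocks' rankings. The claim gives no formula, so the sketch below makes two labeled assumptions: each occurrence adds to the score with a bonus for appearing earlier in the block, and a document inherits the score of its best block. The function names `block_score` and `rank_documents` are hypothetical.

```python
def block_score(block_text, terms):
    """Score one block by term frequency and term location (assumed weighting)."""
    words = block_text.lower().split()
    if not words:
        return 0.0
    wanted = {t.lower() for t in terms}
    score = 0.0
    for position, word in enumerate(words):
        if word in wanted:
            # Frequency: every occurrence contributes 1.0.
            # Location: earlier occurrences get a larger bonus (assumption).
            score += 1.0 + (len(words) - position) / len(words)
    return score


def rank_documents(docs, terms):
    """docs maps a document id to the list of its block texts.

    Assumption: a document's score is the score of its best-matching block.
    Returns document ids ordered from best to worst.
    """
    doc_scores = {
        doc_id: max((block_score(b, terms) for b in blocks), default=0.0)
        for doc_id, blocks in docs.items()
    }
    return sorted(doc_scores, key=doc_scores.get, reverse=True)


docs = {
    "a": ["cheap flights to paris", "contact us"],
    "b": ["weather", "book hotels and flights"],
}
ranking = rank_documents(docs, ["flights"])
```

Here document "a" ranks first because "flights" appears nearer the start of its best block. The whitespace tokenizer ignores punctuation and stemming, which a real system would need to handle.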
8. One or more computer readable media having stored thereon a plurality of instructions that, when executed by one or more processors of a device, causes the one or more processors to:
- generate first rankings for a plurality of documents based on how well the plurality of documents match search criteria;
- generate second rankings based on how well a plurality of blocks matches the one or more search terms, wherein each block is part of one of the plurality of documents, and wherein each of the plurality of blocks is obtained by visually segmenting each of the plurality of documents into blocks;
- rank the blocks according to the location of the one or more search terms in the block and how frequently the one or more search terms occur in the block; and
- generate final rankings for the plurality of documents based at least in part on the second rankings.
Dependent claims: 9, 10, 11, 12, 13
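Claim 8 only requires that the final rankings depend at least in part on the block-level (second) rankings. One possible way to combine the two is a linear interpolation of scores. This is an assumption for illustration, not a method stated in the claim; `final_rankings` and the weight `alpha` are hypothetical names.

```python
def final_rankings(doc_scores, block_scores, alpha=0.5):
    """Blend whole-document scores with block-level scores.

    doc_scores:   document id -> score of the whole document (first rankings)
    block_scores: document id -> list of scores of its blocks (second rankings)
    alpha:        assumed weight of the document score; 1 - alpha goes to the
                  best block score.
    """
    combined = {}
    for doc_id, doc_score in doc_scores.items():
        best_block = max(block_scores.get(doc_id, []), default=0.0)
        combined[doc_id] = alpha * doc_score + (1 - alpha) * best_block
    return sorted(combined, key=combined.get, reverse=True)


doc_scores = {"a": 0.9, "b": 0.8}
block_scores = {"a": [0.1, 0.2], "b": [0.7]}
blended = final_rankings(doc_scores, block_scores, alpha=0.5)
doc_only = final_rankings(doc_scores, block_scores, alpha=1.0)
```

With `alpha=0.5`, document "b" overtakes "a" because one of its blocks matches strongly. With `alpha=1.0` the block scores are ignored and the first rankings are returned unchanged.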
14. A method of searching a plurality of documents, the method comprising:
- receiving a request to search the plurality of documents stored on a device, wherein the request includes query criteria;
- identifying a subset of the plurality of documents based on the query criteria;
- identifying, for each of the subset of documents, a plurality of blocks by visually segmenting the document;
- expanding, based on the content of the plurality of blocks, the query criteria; and
- identifying a second subset of the plurality of documents based on the expanded query criteria; and
- ranking the blocks according to the location of query criteria in the block and how frequently the query criteria occur in the block.
Dependent claims: 15, 16, 17, 18, 19
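The query-expansion step of claim 14 can be sketched as follows. The claim does not specify how expansion terms are chosen. This sketch assumes that terms co-occurring with the query inside matching blocks are counted, and that the most frequent ones are appended to the query. `expand_query` and `extra_terms` are hypothetical names.

```python
from collections import Counter


def expand_query(query_terms, blocks, extra_terms=2):
    """Append the words that most often co-occur with the query in blocks.

    Assumption: only blocks containing at least one query term contribute,
    and candidate terms are ranked by raw co-occurrence count.
    """
    query = [t.lower() for t in query_terms]
    counts = Counter()
    for block in blocks:
        words = block.lower().split()
        if any(term in words for term in query):
            counts.update(w for w in words if w not in query)
    return query + [word for word, _ in counts.most_common(extra_terms)]


blocks = ["jaguar car dealer", "jaguar car price", "big cat habitat"]
expanded = expand_query(["jaguar"], blocks, extra_terms=1)
```

Restricting the counts to blocks rather than whole documents keeps unrelated page regions, such as the "big cat habitat" block here, from contributing expansion terms. A practical version would also filter stop words.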
Specification