Document retrieving method and apparatus
Abstract
In the proposed document retrieving apparatus, text feature data based on the text data contained in a document and image feature data based on the document image are stored in a memory. Image data of a search document is subjected to character recognition processing; text feature data is acquired from the resulting text data, and image feature data (layout data) is acquired from the image data of the search document. The memory is then searched using both the text feature data and the image feature data of the search document, and the document corresponding to the search document is retrieved from among the stored documents.
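The abstract does not specify the concrete features. As a minimal sketch, assuming a word-frequency vector over the OCR output as the text feature and ink density over a coarse grid of a binarized page as the layout feature (both choices are illustrative, not taken from the source), the registration side could look like this. The OCR engine itself is out of scope: `ocr_text` stands in for its output, and `FeatureStore`, `text_features` and `layout_features` are hypothetical names.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Dict, List

# Hypothetical feature representations (not specified by the source):
#   text feature   = word-frequency vector of the character-recognition output
#   layout feature = foreground density of each cell in a grid over a binary page image


def text_features(ocr_text: str) -> Counter:
    """Word-frequency vector built from the OCR output (case-insensitive)."""
    return Counter(ocr_text.lower().split())


def layout_features(image: List[List[int]], grid: int = 4) -> List[float]:
    """Fraction of foreground (1) pixels in each cell of a grid x grid partition."""
    h, w = len(image), len(image[0])
    cells = []
    for gy in range(grid):
        for gx in range(grid):
            y0, y1 = gy * h // grid, (gy + 1) * h // grid
            x0, x1 = gx * w // grid, (gx + 1) * w // grid
            pixels = [image[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            cells.append(sum(pixels) / len(pixels) if pixels else 0.0)
    return cells


@dataclass
class FeatureStore:
    """In-memory stand-in for the memory holding both feature types per registered document."""
    text: Dict[str, Counter] = field(default_factory=dict)
    layout: Dict[str, List[float]] = field(default_factory=dict)

    def register(self, doc_id: str, ocr_text: str, image: List[List[int]]) -> None:
        # Both feature types are stored in association with the same document id.
        self.text[doc_id] = text_features(ocr_text)
        self.layout[doc_id] = layout_features(image)
```

Keeping both feature types under one document id is what later allows a search to narrow on one type and rank on both.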
20 Claims
1. A document retrieving method for retrieving a document from a storage by using an information processing apparatus comprising:
a first acquisition step of acquiring text data by executing character-recognition processing for image data of a document and acquiring text feature data based on the text data acquired as a result of the character-recognition processing;
a second acquisition step of acquiring layout feature data based on the image data of the document;
a storing step of storing, in storage means, text feature data and layout feature data respectively acquired from a registered document in said first and second acquisition steps, in association with the registered document;
a determining step of determining, for a search document from which text feature data and layout feature data have been acquired in said first and second acquisition steps, whether the text feature data acquired from the search document or the layout feature data acquired from the search document is used for a narrowing-down process, based on the text feature data acquired from the search document in said first acquisition step;
a first narrow-down step of narrowing down a plurality of registered documents stored in the storage means based on the text feature data acquired from the search document in said first acquisition step if said determining step determined that the text feature data acquired from the search document is used;
a second narrow-down step of narrowing down the plurality of registered documents stored in the storage means based on the layout feature data acquired from the search document in said second acquisition step if said determining step determined that the layout feature data acquired from the search document is used; and
a retrieving step of retrieving a document, based on both the text feature data and the layout feature data acquired from the search document in said first and second acquisition steps, from the registered documents narrowed-down in said first narrow-down step or said second narrow-down step.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9.
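Claim 1 can be read as a two-stage search: a narrowing pass on one feature type, chosen per query, followed by a final ranking on both types. The sketch below assumes cosine similarity for word-frequency text features, an inverse Euclidean distance for grid-density layout features, a minimum word count as the determining rule, and a fixed top-`keep` cutoff for narrowing. None of these specifics come from the claim; `retrieve`, `min_words`, `keep` and `text_weight` are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence


def text_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two word-frequency vectors (0.0 if either is empty)."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def layout_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Maps the Euclidean distance between grid-density vectors into (0, 1]."""
    return 1.0 / (1.0 + math.dist(a, b))


def retrieve(query_text: Counter, query_layout: List[float],
             stored_text: Dict[str, Counter], stored_layout: Dict[str, List[float]],
             min_words: int = 5, keep: int = 10, text_weight: float = 0.5) -> str:
    # Determining step: decided from the text feature data alone. Hypothetical rule:
    # enough recognized words means the text features are trusted for narrowing.
    use_text = sum(query_text.values()) >= min_words

    def narrow_score(doc_id: str) -> float:
        # First narrow-down step (text) or second narrow-down step (layout).
        if use_text:
            return text_similarity(query_text, stored_text[doc_id])
        return layout_similarity(query_layout, stored_layout[doc_id])

    candidates = sorted(stored_text, key=narrow_score, reverse=True)[:keep]

    def combined_score(doc_id: str) -> float:
        # Retrieving step: both feature types contribute to the final ranking.
        return (text_weight * text_similarity(query_text, stored_text[doc_id])
                + (1 - text_weight) * layout_similarity(query_layout, stored_layout[doc_id]))

    return max(candidates, key=combined_score)
```

In this sketch the narrowing pass bounds how many registered documents need the combined score; the only per-query decision is which feature type is trusted for that pass.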
10. A document retrieving apparatus for retrieving a document from a storage comprising:
a first acquisition unit configured to acquire text data by executing character-recognition processing for image data of a document and to acquire text feature data based on the text data acquired as a result of the character-recognition processing;
a second acquisition unit configured to acquire layout feature data based on the image data of the document;
a storage unit configured to store the text feature data and the layout feature data respectively acquired from a registered document by said first and second acquisition units, in association with the registered document;
a determining unit for determining, for a search document from which text feature data and layout feature data have been acquired by said first and second acquisition units, whether the text feature data acquired from the search document or the layout feature data acquired from the search document is used for a narrowing-down process, based on the text feature data acquired from the search document by said first acquisition unit;
a first narrow-down unit for narrowing down a plurality of registered documents stored in the storage unit based on the text feature data acquired from the search document by said first acquisition unit if said determining unit determined that the text feature data acquired from the search document is used;
a second narrow-down unit for narrowing down the plurality of registered documents stored in the storage unit based on the layout feature data acquired from the search document by said second acquisition unit if said determining unit determined that the layout feature data acquired from the search document is used; and
a retrieving unit configured to retrieve a document, based on both the text feature data and the layout feature data acquired from the search document by said first and second acquisition units, from the registered documents narrowed-down by said first narrow-down unit or said second narrow-down unit.
Dependent claims: 11, 12, 13, 14, 15, 16.
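Claim 10 recites the same process as a set of units. One way to mirror that structure, sketched here under the assumption that each unit can be modelled as an injected function, is a class whose constructor receives the acquisition, determining and scoring components. `DocumentRetrievingApparatus` and its field names are hypothetical, and the combined score is assumed to be a plain sum.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, Tuple


@dataclass
class DocumentRetrievingApparatus:
    """Hypothetical wiring of the units recited in claim 10; each unit is an injected callable."""

    first_acquisition: Callable[[Any], Any]    # image -> text feature data (OCR + extraction)
    second_acquisition: Callable[[Any], Any]   # image -> layout feature data
    determining: Callable[[Any], bool]         # text feature data -> True if text is used to narrow
    text_score: Callable[[Any, Any], float]    # similarity between two text features
    layout_score: Callable[[Any, Any], float]  # similarity between two layout features
    keep: int = 10                             # candidates kept after narrowing
    storage: Dict[str, Tuple[Any, Any]] = field(default_factory=dict)  # storage unit

    def register(self, doc_id: str, image: Any) -> None:
        """Store both feature types in association with the registered document."""
        self.storage[doc_id] = (self.first_acquisition(image), self.second_acquisition(image))

    def retrieve(self, image: Any) -> str:
        q_text, q_layout = self.first_acquisition(image), self.second_acquisition(image)
        use_text = self.determining(q_text)

        def narrow_score(doc_id: str) -> float:
            text, layout = self.storage[doc_id]
            # First narrow-down unit if text is used, otherwise second narrow-down unit.
            return self.text_score(q_text, text) if use_text else self.layout_score(q_layout, layout)

        candidates = sorted(self.storage, key=narrow_score, reverse=True)[: self.keep]

        def combined_score(doc_id: str) -> float:
            text, layout = self.storage[doc_id]
            # Retrieving unit: both feature types contribute to the final ranking.
            return self.text_score(q_text, text) + self.layout_score(q_layout, layout)

        return max(candidates, key=combined_score)
```

Injecting the units keeps the narrowing and retrieval flow fixed while the feature extractors and the determining rule can be swapped, which loosely parallels how the apparatus claims separate each function into its own unit.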
17. A document retrieving method for retrieving a document from a storage by using an information processing apparatus comprising:
a first acquisition step of acquiring text data by executing character-recognition processing for image data of a search document and acquiring text feature data based on the text data acquired as a result of the character-recognition processing;
a second acquisition step of acquiring layout feature data based on the image data of the search document;
a determining step of determining, based on the text feature data acquired from the search document in said first acquisition step, whether the text feature data or the layout feature data is used for a narrowing-down process;
a first narrow-down step of narrowing down a plurality of registered documents stored in a storage means based on the text feature data acquired in said first acquisition step if said determining step determined that the text feature data is used for the narrowing-down process;
a second narrow-down step of narrowing down the plurality of registered documents stored in the storage means based on the layout feature data acquired in said second acquisition step if said determining step determined that the layout feature data is used for the narrowing-down process; and
a retrieving step of retrieving a document, based on both the text feature data and the layout feature data acquired in said first and second acquisition steps, from the registered documents narrowed-down in said first narrow-down step or said second narrow-down step.
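Claim 17 bases the determination only on the text feature data of the search document, without saying how. A hypothetical rule, assuming word-frequency text features, is to require a minimum amount of recognized text and a minimum share of distinct words, so that pages whose OCR output is sparse or dominated by a few repeated tokens fall back to layout-based narrowing. The function name and both thresholds are illustrative assumptions.

```python
from collections import Counter


def use_text_for_narrowing(text_features: Counter, min_words: int = 20,
                           min_distinct_ratio: float = 0.3) -> bool:
    """Hypothetical determining step: decides from the text feature data only.

    Returns True if the text features should drive the narrowing-down process,
    False if the layout features should be used instead.
    """
    total = sum(text_features.values())
    if total < min_words:
        # Too little recognized text to discriminate between registered documents.
        return False
    # Share of distinct words: low values suggest repeated or garbled OCR tokens.
    distinct_ratio = len(text_features) / total
    return distinct_ratio >= min_distinct_ratio
```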
18. A document retrieving method for retrieving a document from a storage by using an information processing apparatus comprising:
a first acquisition step of acquiring text data by executing character-recognition processing for image data of a search document and acquiring text feature data based on the text data acquired as a result of the character-recognition processing;
a second acquisition step of acquiring layout feature data based on the image data of the search document;
a determining step of determining, based on the layout feature data acquired from the search document in said second acquisition step, whether the text feature data or the layout feature data is used for a narrowing-down process;
a first narrow-down step of narrowing down a plurality of registered documents stored in a storage means based on the text feature data acquired in said first acquisition step if said determining step determined that the text feature data is used for the narrowing-down process;
a second narrow-down step of narrowing down the plurality of registered documents stored in the storage means based on the layout feature data acquired in said second acquisition step if said determining step determined that the layout feature data is used for the narrowing-down process; and
a retrieving step of retrieving a document, based on both the text feature data and the layout feature data acquired in said first and second acquisition steps, from the registered documents narrowed-down in said first narrow-down step or said second narrow-down step.
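Claim 18 instead makes the determination from the layout feature data. As a hypothetical example, assuming grid-density layout features: when the cell densities are nearly uniform, as on a page of running text, the layout says little about which registered document matches, so text features are used for narrowing; otherwise the layout is treated as distinctive enough to narrow with. The function name and the variance threshold are assumptions.

```python
from statistics import pvariance
from typing import Sequence


def use_text_for_narrowing(layout_features: Sequence[float], min_variance: float = 0.01) -> bool:
    """Hypothetical determining step: decides from the layout feature data only.

    Nearly uniform grid densities (low population variance) mean the layout is not
    distinctive, so the text features drive the narrowing-down process (True).
    Otherwise the layout features are used (False).
    """
    return pvariance(layout_features) < min_variance
```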
19. A document retrieving apparatus for retrieving a document from a storage comprising:
a first acquisition unit for acquiring text data by executing character-recognition processing for image data of a search document and acquiring text feature data based on the text data acquired as a result of the character-recognition processing;
a second acquisition unit for acquiring layout feature data based on the image data of the search document;
a determining unit for determining, based on the text feature data acquired from the search document by said first acquisition unit, whether the text feature data or the layout feature data is used for a narrowing-down process;
a first narrow-down unit for narrowing down a plurality of registered documents stored in a storage means based on the text feature data acquired by said first acquisition unit if said determining unit determined that the text feature data is used for the narrowing-down process;
a second narrow-down unit for narrowing down the plurality of registered documents stored in the storage means based on the layout feature data acquired by said second acquisition unit if said determining unit determined that the layout feature data is used for the narrowing-down process; and
a retrieving unit for retrieving a document, based on both the text feature data and the layout feature data acquired by said first and second acquisition units, from the registered documents narrowed-down by said first narrow-down unit or said second narrow-down unit.
20. A document retrieving apparatus for retrieving a document from a storage comprising:
a first acquisition unit for acquiring text data by executing character-recognition processing for image data of a search document and acquiring text feature data based on the text data acquired as a result of the character-recognition processing;
a second acquisition unit for acquiring layout feature data based on the image data of the search document;
a determining unit for determining, based on the layout feature data acquired from the search document by said second acquisition unit, whether the text feature data or the layout feature data is used for a narrowing-down process;
a first narrow-down unit for narrowing down a plurality of registered documents stored in a storage means based on the text feature data acquired by said first acquisition unit if said determining unit determined that the text feature data is used for the narrowing-down process;
a second narrow-down unit for narrowing down the plurality of registered documents stored in the storage means based on the layout feature data acquired by said second acquisition unit if said determining unit determined that the layout feature data is used for the narrowing-down process; and
a retrieving unit for retrieving a document, based on both the text feature data and the layout feature data acquired by said first and second acquisition units, from the registered documents narrowed-down by said first narrow-down unit or said second narrow-down unit.
Specification