Using an ID Domain to Improve Searching
Abstract
Methods which use an ID domain to improve searching are described. An embodiment describes an index phase in which an image of a document is converted into the ID domain. This is achieved by dividing the text in the image into elements and mapping each element to an identifier. Similar elements are mapped to the same identifier. Each element in the text is then replaced by the appropriate identifier to create a version of the document in the ID domain. This version may be indexed and searched. Another embodiment describes a query phase in which a query is converted into the ID domain and then used to search an index of identifiers which has been created from collections of documents which have been converted into the ID domain. The conversion of the query may use mappings which were created during the index phase or alternatively may use pre-existing mappings.
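The two phases in the abstract can be illustrated with a minimal Python sketch. It makes several assumptions that do not come from the source. Each text element is taken to have already been reduced to a numeric feature vector. "Similar" is treated as a Euclidean distance within a hypothetical `threshold`. The mapping is built by greedy leader clustering. The names `nearest_id`, `index_phase` and `query_phase` are illustrative only.

```python
import math
from collections import defaultdict

# Assumption: each text element (e.g. a word image) has already been reduced
# to a fixed-length numeric feature vector such as (0.1, 0.0).

def nearest_id(element, prototypes, threshold):
    """Return the ID of the closest prototype within `threshold`, or None."""
    best_id, best_d = None, math.inf
    for cid, proto in enumerate(prototypes):
        d = math.dist(element, proto)        # Euclidean distance (Python 3.8+)
        if d <= threshold and d < best_d:
            best_id, best_d = cid, d
    return best_id

def index_phase(documents, threshold=0.5):
    """Index phase: map similar elements to one ID (greedy leader clustering),
    rewrite each document as a list of IDs, and build an inverted index
    ID -> set of document numbers."""
    prototypes = []                          # ID i is represented by prototypes[i]
    id_docs = []
    index = defaultdict(set)
    for doc_no, elements in enumerate(documents):
        ids = []
        for e in elements:
            cid = nearest_id(e, prototypes, threshold)
            if cid is None:                  # nothing similar seen yet: new ID
                prototypes.append(e)
                cid = len(prototypes) - 1
            ids.append(cid)
            index[cid].add(doc_no)
        id_docs.append(ids)
    return prototypes, id_docs, dict(index)

def query_phase(query_elements, prototypes, index, threshold=0.5):
    """Query phase: convert the query with an existing mapping, then return the
    documents that contain every query ID."""
    ids = [nearest_id(e, prototypes, threshold) for e in query_elements]
    if not ids or None in ids:               # an element matched no known ID
        return set()
    return set.intersection(*(index.get(i, set()) for i in ids))

if __name__ == "__main__":
    docs = [[(0.0, 0.0), (1.0, 1.0)], [(0.1, 0.0), (5.0, 5.0)]]
    prototypes, id_docs, index = index_phase(docs)
    print(id_docs)                                        # [[0, 1], [0, 2]]
    print(query_phase([(0.05, 0.0)], prototypes, index))  # {0, 1}
```

Because `query_phase` only receives a list of prototypes, the same function works whether that mapping was created during the index phase or loaded from a pre-existing table. This mirrors the two options the abstract describes.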
20 Claims
1. A method comprising:
segmenting text in an image of a document into elements;
assigning an identifier to each element based on a comparison of elements;
replacing each element in the text with the corresponding identifier; and
creating an index of identifiers in the document.
12. A system comprising:
a processor;
an input for receiving a query; and
a memory arranged to store executable instructions arranged to cause the processor to:
convert the query into an image;
perform a comparison between elements in said image to a cluster table defining mappings between image elements and identifiers;
create a query defined in terms of identifiers based on the comparison; and
search an index of identifiers created from at least one document image using said query defined in terms of identifiers.
19. One or more tangible device-readable media with device-executable instructions for performing steps comprising:
receiving an image of a document;
segmenting text in said image into a plurality of elements;
grouping said elements based on similarity of elements;
allocating an identifier to each group of elements;
replacing each element in a group with the identifier of the group;
ordering the identifiers according to an order of the text in said image; and
creating an index of identifiers.
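Claims 1 and 19 describe the document-side steps, and a small Python sketch can trace them for a single line of text. The line is stored as a tiny binary image: equal-length strings with `#` for ink. That format is an assumption chosen only to keep the example self-contained. The sketch also assumes the following:

- Segmentation uses a vertical projection profile.
- "Similarity" is a hypothetical pixel-mismatch tolerance (`max_mismatch`).
- Grouping is single-linkage clustering with union-find.

Identifiers are numbered by first appearance in reading order, and the index records positions so that identifier sequences can be searched as phrases. All function names are hypothetical, and a real system would need more robust segmentation and features.

```python
from collections import defaultdict

def segment(image, min_gap=1):
    """Split a one-line binary image (equal-length strings, '#' = ink) into
    elements at runs of at least `min_gap` blank columns. Elements come out
    left to right, which is the reading order for a single line."""
    width = len(image[0])
    inked = [any(row[x] == "#" for row in image) for x in range(width)]
    runs, start = [], None
    for x, ink in enumerate(inked + [False]):    # sentinel closes the last run
        if ink and start is None:
            start = x
        elif not ink and start is not None:
            runs.append([start, x])              # half-open column range
            start = None
    merged = []
    for s, e in runs:
        if merged and s - merged[-1][1] < min_gap:
            merged[-1][1] = e                    # gap too narrow: same element
        else:
            merged.append([s, e])
    return [tuple(row[s:e] for row in image) for s, e in merged]

def mismatch(a, b):
    """Differing pixels between two bitmaps; None if their shapes differ."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        return None
    return sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))

def group(elements, max_mismatch=1):
    """Union-find (single-linkage) grouping: elements whose bitmaps differ by at
    most `max_mismatch` pixels, directly or through a chain, share a group."""
    parent = list(range(len(elements)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]        # path halving
            i = parent[i]
        return i

    for i in range(len(elements)):
        for j in range(i + 1, len(elements)):
            d = mismatch(elements[i], elements[j])
            if d is not None and d <= max_mismatch:
                parent[find(i)] = find(j)
    return [find(i) for i in range(len(elements))]

def to_ids(image, min_gap=1, max_mismatch=1):
    """Segment, group, allocate one identifier per group, and replace each
    element by its group's identifier. Returns the identifiers in reading order
    plus a cluster table mapping each identifier to a representative bitmap."""
    elements = segment(image, min_gap)
    roots = group(elements, max_mismatch)
    group_id, table, ids = {}, {}, []
    for element, root in zip(elements, roots):
        gid = group_id.setdefault(root, len(group_id))
        table.setdefault(gid, element)           # first member represents the group
        ids.append(gid)
    return ids, table

def positional_index(ids):
    """Identifier -> list of positions in the identifier sequence."""
    index = defaultdict(list)
    for pos, gid in enumerate(ids):
        index[gid].append(pos)
    return dict(index)

def phrase_positions(index, phrase):
    """Start positions where the identifier sequence `phrase` occurs contiguously."""
    if not phrase:
        return []
    return [p for p in index.get(phrase[0], [])
            if all(p + k in index.get(gid, []) for k, gid in enumerate(phrase))]

if __name__ == "__main__":
    # Four elements; the third differs from the first and last by one pixel.
    image = ["#.#..##..#.#..#.#",
             "###..##..##...###"]
    ids, table = to_ids(image)
    print(ids)                                           # [0, 1, 0, 0]
    print(phrase_positions(positional_index(ids), [0, 1]))  # [0]
```

Single-linkage grouping lets a chain of slightly different renderings of the same word collapse into one group. The cost is that two distinct words can occasionally be merged when intermediate bitmaps link them. Greedy leader clustering would avoid that chaining, but its result would then depend on input order.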
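Claim 12's query path can also be sketched in Python: render the query as an image, compare its elements with a cluster table, then search an index of identifiers. None of the following details come from the source. The `GLYPHS` table is a hypothetical stand-in for a font renderer. The cluster table maps each identifier to one representative bitmap. Pixel mismatch with a tolerance is an assumed similarity measure. The index maps identifiers to document names, and every query identifier must match.

```python
# Hypothetical 2-row glyph bitmaps ('#' = ink) standing in for a font renderer.
GLYPHS = {
    "a": ("#.#", "###"),
    "b": ("##.", "###"),
    "c": ("###", "#.."),
}

def render_word(word):
    """Convert a query word into an image: glyphs joined by one blank column.
    Characters missing from GLYPHS raise KeyError."""
    rows = len(GLYPHS["a"])
    return tuple(".".join(GLYPHS[ch][r] for ch in word) for r in range(rows))

def mismatch(a, b):
    """Differing pixels between two bitmaps; None if their shapes differ."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        return None
    return sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))

def lookup(bitmap, cluster_table, max_mismatch=1):
    """Compare a query element with the cluster table (ID -> representative
    bitmap) and return the closest ID within the tolerance, or None."""
    best_id, best_d = None, None
    for cid, rep in cluster_table.items():
        d = mismatch(bitmap, rep)
        if d is not None and d <= max_mismatch and (best_d is None or d < best_d):
            best_id, best_d = cid, d
    return best_id

def search(query, cluster_table, index, max_mismatch=1):
    """Render each query word, map it to an ID, and return the documents in
    `index` (ID -> set of document names) that contain every query ID."""
    ids = [lookup(render_word(w), cluster_table, max_mismatch) for w in query.split()]
    if not ids or None in ids:
        return set()
    return set.intersection(*(index.get(i, set()) for i in ids))

if __name__ == "__main__":
    # Cluster table as it might come out of indexing; ID 0's representative
    # differs from a clean rendering of "ab" by one pixel.
    table = {0: ("#.#.##.", "###.##."), 1: render_word("ca")}
    index = {0: {"doc1", "doc2"}, 1: {"doc2"}}
    print(search("ab ca", table, index))   # {'doc2'}
```

In this sketch, a single query word whose rendering matches no cluster within the tolerance makes the whole query return nothing. A softer design could drop or down-weight such terms instead.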
Specification