Efficiently and systematically searching stock, image, and other non-word-based documents
Abstract
One embodiment of a non-word-based information retrieval system searches stock or image documents in a large data source. A non-word-based document is first divided into a series of elements or an array of cells. Each element or cell is matched against a series of predefined token patterns, and each match generates a token that has a name. The collection of generated named tokens is a word-based representation of the non-word-based document. After the tokens from all documents are collected in a master collection of tokens, the non-word-based documents can be searched efficiently and systematically, in a manner analogous to document search in a word-based search system.
22 Claims
1. A method for searching stock documents containing non-word-based data, comprising:
(a) collecting a group of stock documents to form a collection of stock documents;
(b) dividing each document in said group of collected documents into a series of elements;
(c) defining a plurality of non-word-based token patterns;
(d) tokenizing said documents by matching said series of elements against said plurality of defined non-word-based token patterns to generate a collection of tokens for each of said documents, and providing a name for each of said tokens;
(e) combining the collections of tokens for said documents into a master collection of tokens;
(f) searching for stock documents in said collection of documents that have the same token names as a query or a combination of queries, by searching said query or queries in said master collection of tokens, to provide a plurality of matching documents with respective scores; and
(g) displaying matching documents in the order of their matching scores;
whereby said method will be able to search stock documents efficiently and systematically.
Dependent claims: 2, 3, 4, 5.
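The steps of claim 1 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: a stock document is assumed to be a list of daily closing prices, its elements are assumed to be day-over-day percentage changes, and the five token patterns (DROP, DIP, FLAT, GAIN, JUMP) with their thresholds are hypothetical. The claim does not define a scoring function, so the score here is simply the number of occurrences of the query token names in each document.

```python
from collections import Counter, defaultdict

# Hypothetical token patterns (not from the patent): each maps a band of
# daily percentage change, [low, high), to a token name.
TOKEN_PATTERNS = [
    ("DROP", float("-inf"), -2.0),
    ("DIP", -2.0, -0.5),
    ("FLAT", -0.5, 0.5),
    ("GAIN", 0.5, 2.0),
    ("JUMP", 2.0, float("inf")),
]


def divide_into_elements(closes):
    """Step (b): turn a closing-price series into daily percentage changes."""
    return [100.0 * (b - a) / a for a, b in zip(closes, closes[1:])]


def tokenize(elements):
    """Step (d): match each element against every pattern and count token names."""
    tokens = Counter()
    for change in elements:
        for name, low, high in TOKEN_PATTERNS:
            if low <= change < high:
                tokens[name] += 1
    return tokens


def build_master_collection(documents):
    """Step (e): master collection mapping token name -> {document id: count}."""
    master = defaultdict(dict)
    for doc_id, closes in documents.items():
        for name, count in tokenize(divide_into_elements(closes)).items():
            master[name][doc_id] = count
    return master


def search(master, query_names):
    """Steps (f)-(g): score documents by how often the query token names occur,
    highest score first (ties broken by document id)."""
    scores = Counter()
    for name in query_names:
        for doc_id, count in master.get(name, {}).items():
            scores[doc_id] += count
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))
```

For closing prices [100, 103, 104, 101, 106], the four daily changes fall into JUMP, GAIN, DROP and JUMP, so a query for JUMP gives that document a score of 2. Because the master collection is keyed by token name, a query only touches the entries for its own names, much as an inverted index does in text search.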
6. A method for searching image documents containing non-word-based data, comprising:
(a) collecting a group of image documents to form a collection of image documents;
(b) dividing each document in said group of collected documents into an array of cells;
(c) defining a plurality of non-word-based token patterns;
(d) tokenizing said documents by matching said array of cells against said plurality of defined non-word-based token patterns to generate a collection of tokens for each of said documents, and providing a name for each of said tokens;
(e) combining the collections of tokens for said documents into a master collection of tokens;
(f) providing a query image and dividing said query into an array of cells;
(g) tokenizing said query image by matching said array of cells of said query image against said plurality of defined non-word-based token patterns to generate a collection of tokens of said query image, and providing a name for each of said tokens;
(h) searching for image documents in said collection of documents that have the same tokens with the same position arrangement as said tokens of said query image by searching said query token names in said master collection of tokens, to provide a plurality of matching documents with respective scores; and
(i) displaying matching documents in the order of their matching scores;
whereby said method will be able to search image documents efficiently and systematically.
Dependent claims: 7, 8, 9, 10.
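Claim 6 adds a positional constraint: query and document tokens must match in the same arrangement. One way to sketch this, assuming grayscale images stored as lists of rows whose dimensions are multiples of the cell size, and reading "same position arrangement" as "same cell coordinates", is to key the master collection by (cell position, token name). The brightness bands DARK, MID and BRIGHT are hypothetical token patterns, and the score (number of cells whose token equals the query's token at the same position) is an assumption, since the claim does not specify one.

```python
from collections import defaultdict


def cell_token(cell):
    """Hypothetical token patterns: brightness bands of a cell's mean (0-255)."""
    mean = sum(sum(row) for row in cell) / (len(cell) * len(cell[0]))
    if mean < 85:
        return "DARK"
    if mean < 170:
        return "MID"
    return "BRIGHT"


def divide_into_cells(image, cell_size):
    """Split a 2-D image (list of rows) into square cells keyed by (row, col).
    Assumes both image dimensions are multiples of cell_size."""
    cells = {}
    for r in range(0, len(image), cell_size):
        for c in range(0, len(image[0]), cell_size):
            block = [row[c:c + cell_size] for row in image[r:r + cell_size]]
            cells[(r // cell_size, c // cell_size)] = block
    return cells


def tokenize_image(image, cell_size):
    """Map each cell position to the name of the token its cell matches."""
    return {pos: cell_token(cell)
            for pos, cell in divide_into_cells(image, cell_size).items()}


def build_master_collection(images, cell_size):
    """Master collection: (cell position, token name) -> set of image ids."""
    master = defaultdict(set)
    for image_id, image in images.items():
        for pos, name in tokenize_image(image, cell_size).items():
            master[(pos, name)].add(image_id)
    return master


def search_image(master, query_image, cell_size):
    """Score images by cells whose token matches the query token at the same
    position; return (image id, score) pairs, best first."""
    scores = defaultdict(int)
    for pos, name in tokenize_image(query_image, cell_size).items():
        for image_id in master.get((pos, name), ()):
            scores[image_id] += 1
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))
```

Keying on position as well as name means a dark patch in the top-left corner of the query only matches images that are dark in that same corner, which is the arrangement requirement of step (h) under the stated interpretation.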
11. A method for searching non-word-based documents, comprising:
(a) collecting a group of non-word-based documents to form a collection of documents;
(b) defining a plurality of non-word-based token patterns;
(c) tokenizing said collected non-word-based documents by matching their elements against said plurality of defined non-word-based token patterns to generate a master collection of tokens, and providing a name for each token;
(d) searching for non-word-based documents in said collection of documents that have the same tokens as a query token or a combination of query tokens by searching said query token or tokens in said generated master collection of tokens, to provide a plurality of matching documents with respective scores; and
(e) displaying matching documents in the order of their matching scores;
whereby said method will be able to search non-word-based documents efficiently and systematically.
Dependent claims: 12, 13, 14.
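Claim 11 allows the query to be a single token or a combination of tokens. A minimal sketch, assuming the master collection is a mapping from token name to the set of ids of documents containing that token, shows an OR-style and an AND-style combination. The function name search_combination, its mode parameter, and the scoring rule (number of distinct query tokens a document contains) are hypothetical.

```python
def search_combination(master, query_tokens, mode="any"):
    """Return (doc id, score) pairs, best first, where score is the number of
    distinct query tokens the document contains.
    mode="any": keep documents with at least one query token (OR).
    mode="all": keep only documents with every query token (AND)."""
    if mode not in ("any", "all"):
        raise ValueError("mode must be 'any' or 'all'")
    wanted = set(query_tokens)
    scores = {}
    for name in wanted:
        for doc_id in master.get(name, set()):
            scores[doc_id] = scores.get(doc_id, 0) + 1
    if mode == "all":
        scores = {d: s for d, s in scores.items() if s == len(wanted)}
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))
```

With a master collection {"JUMP": {"d1", "d2"}, "DROP": {"d2", "d3"}}, the OR query [JUMP, DROP] returns d2 first with score 2, followed by d1 and d3; the AND query returns only d2.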
15. A method for tokenizing a non-word-based document into a collection of tokens, comprising:
(a) dividing said non-word-based document into a plurality of elements;
(b) defining a plurality of non-word-based token patterns;
(c) matching each of said plurality of elements against each of said plurality of defined non-word-based token patterns so as to generate a token for each match between an element and a defined non-word-based token pattern, and providing a name for each token; and
(d) collecting all of said tokens in a collection of tokens;
whereby said collection of tokens will be a word-based representation of said non-word-based document that is efficiently and systematically generated.
Dependent claims: 16, 17, 18, 19, 20.
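In claim 15 every element is matched against every pattern, so a single element can yield several tokens. The sketch below assumes the document has already been divided into numeric elements and represents each token pattern as a named predicate; the four patterns (NEG, ZERO, POS, BIG) and the helper names are hypothetical. Joining the token names with spaces gives the word-based representation the claim describes.

```python
from typing import Callable, Dict, List

# Hypothetical token patterns: token name -> predicate over one element.
TokenPatterns = Dict[str, Callable[[float], bool]]

PATTERNS: TokenPatterns = {
    "NEG": lambda x: x < 0,
    "ZERO": lambda x: x == 0,
    "POS": lambda x: x > 0,
    "BIG": lambda x: abs(x) >= 10,
}


def tokenize_document(elements: List[float], patterns: TokenPatterns) -> List[str]:
    """Match every element against every pattern; each match yields one named token."""
    tokens: List[str] = []
    for element in elements:                     # step (c): each element ...
        for name, matches in patterns.items():   # ... against each pattern
            if matches(element):
                tokens.append(name)
    return tokens                                # step (d): the collection of tokens


def as_text(tokens: List[str]) -> str:
    """Word-based representation: token names joined like words."""
    return " ".join(tokens)
```

For the elements [3, -12, 0], the sketch yields POS, NEG, BIG, ZERO (the value -12 matches both NEG and BIG), and the text form "POS NEG BIG ZERO" could be indexed by an ordinary word-based search engine.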
21. A method for searching a plurality of non-word-based documents, comprising:
(a) representing each of said non-word-based documents with a collection of respective document tokens, and providing a name for each document token;
(b) providing a non-word-based query and representing said query with a query token or a combination of query tokens, and providing a name for each query token; and
(c) searching said documents by searching said names of said document tokens using said names of said query tokens;
whereby said method will be able to search non-word-based documents efficiently and systematically.
Dependent claims: 22.
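Because claim 21 searches document token names with query token names, a ranking scheme from word-based retrieval can in principle be reused. As an illustration only, the sketch below swaps in standard TF-IDF scoring with a smoothed IDF, log(1 + N/df), so that a token present in every document still contributes; the claims do not specify a ranking scheme, and the function name tfidf_search is hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple


def tfidf_search(doc_tokens: Dict[str, List[str]],
                 query_tokens: List[str]) -> List[Tuple[str, float]]:
    """Rank documents by the sum of TF-IDF weights of the query token names.
    TF = occurrences of the name / tokens in the document;
    IDF = log(1 + N / df), with df the number of documents containing the name."""
    n_docs = len(doc_tokens)
    counts = {doc_id: Counter(tokens) for doc_id, tokens in doc_tokens.items()}
    doc_freq: Counter = Counter()
    for c in counts.values():
        doc_freq.update(c.keys())          # each name counted once per document
    scores: Dict[str, float] = {}
    for doc_id, c in counts.items():
        total = sum(c.values())
        score = 0.0
        for name in query_tokens:
            if c[name]:                    # Counter returns 0 for missing names
                score += (c[name] / total) * math.log(1 + n_docs / doc_freq[name])
        if score > 0:
            scores[doc_id] = score
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))
```

Here a token name plays the role a word plays in text search: rare names (low df) weigh more than common ones, and documents where a name makes up a larger share of the tokens rank higher.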
Specification