Efficiently and systematically searching stock, image, and other non-word-based documents
First Claim
1. A computer method for searching stock documents containing non-word-based data, comprising the computer executed steps of:
(a) collecting a group of stock documents to form a collection of stock documents;
(b) dividing each document in said group of collected documents into a series of elements of same type, each element of said series of elements is a time scale in stock document data;
(c) defining a plurality of non-word-based token patterns;
(d) tokenizing said documents by matching said series of elements against said plurality of defined non-word-based token patterns to generate a collection of tokens for each of said documents, and providing a name for each of said tokens;
(e) combining the collections of tokens for said documents into a master collection of tokens;
(f) searching for stock documents in said collection of documents that have the same token names as a query or a combination of queries, by searching said query or queries in said master collection of tokens, to provide a plurality of matching documents with respective scores; and
(g) displaying matching documents in the order of their matching scores.
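As a rough illustration of steps (a) through (g), the sketch below treats each stock document as a list of daily closing prices and each element as one day's percentage change. The pattern names, the thresholds, the dictionary-based master collection, and the occurrence-count scoring are all assumptions made for this example; the claim itself does not prescribe them.

```python
from collections import defaultdict

# Hypothetical token patterns: a token name plus a predicate over one element.
# An element here is one trading day's percentage change in closing price.
TOKEN_PATTERNS = {
    "UP_BIG": lambda pct: pct >= 5.0,
    "UP": lambda pct: 0.0 < pct < 5.0,
    "FLAT": lambda pct: pct == 0.0,
    "DOWN": lambda pct: -5.0 < pct < 0.0,
    "DOWN_BIG": lambda pct: pct <= -5.0,
}


def divide(closes):
    """Step (b): split a price series into per-day elements (percent change)."""
    return [100.0 * (b - a) / a for a, b in zip(closes, closes[1:])]


def tokenize(elements):
    """Step (d): emit the name of every pattern that an element matches."""
    return [name for e in elements
            for name, matches in TOKEN_PATTERNS.items() if matches(e)]


def build_master(documents):
    """Step (e): merge per-document tokens into token -> {doc_id: count}."""
    master = defaultdict(dict)
    for doc_id, closes in documents.items():
        for token in tokenize(divide(closes)):
            master[token][doc_id] = master[token].get(doc_id, 0) + 1
    return master


def search(master, query_tokens):
    """Steps (f)-(g): score documents by token occurrences, best first."""
    scores = defaultdict(int)
    for token in query_tokens:
        for doc_id, count in master.get(token, {}).items():
            scores[doc_id] += count
    # Ties are broken by document id so the ordering is deterministic.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Step (a): a toy collection of two stock documents (made-up prices).
documents = {
    "AAA": [100.0, 110.0, 121.0, 121.0],
    "BBB": [50.0, 49.0, 55.0, 54.0],
}
master = build_master(documents)
results = search(master, ["UP_BIG"])
for doc_id, score in results:
    print(doc_id, score)
```

Because the master collection maps token names to documents, a query touches only the entries for its own tokens, which is the same lookup a word-based inverted index performs.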
Abstract
One embodiment of a non-word-based information retrieval system includes searching stock or image documents in a huge data source. A non-word-based document is first divided into a series of elements or an array of cells. Each element or cell is matched against a series of predefined token patterns, so that a match will generate a token having a name. The collection of the generated named tokens is a word-based representation of the non-word-based document. After tokens from all documents are collected in a master collection of tokens, the non-word-based documents can be efficiently and systematically searched in a manner analogous to a document search in a word-based search system.
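The "array of cells" case mentioned for image documents can be sketched in the same spirit. This is a minimal example that assumes a grayscale image stored as a list of pixel rows, square cells of a fixed size, and three made-up brightness patterns; none of these specifics come from the abstract.

```python
# Hypothetical brightness patterns: (token name, low inclusive, high exclusive).
CELL_PATTERNS = [("DARK", 0, 85), ("MID", 85, 170), ("BRIGHT", 170, 256)]


def cells(image, size):
    """Yield size x size blocks of a grayscale image given as a list of rows."""
    for r in range(0, len(image), size):
        for c in range(0, len(image[0]), size):
            yield [row[c:c + size] for row in image[r:r + size]]


def tokenize_image(image, size=2):
    """Match each cell's mean brightness against every pattern; name each match."""
    tokens = []
    for block in cells(image, size):
        pixels = [p for row in block for p in row]
        mean = sum(pixels) / len(pixels)
        for name, low, high in CELL_PATTERNS:
            if low <= mean < high:
                tokens.append(name)
    return tokens


# A made-up 4 x 4 image, tokenized as four 2 x 2 cells in row-major order.
image = [
    [0, 10, 200, 220],
    [5, 15, 210, 230],
    [100, 120, 90, 95],
    [110, 130, 250, 255],
]
image_tokens = tokenize_image(image)
print(image_tokens)
```

The resulting list of token names is the word-like representation of the image, and it can be merged into a master collection in the same way as tokens from any other document type.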
5 Claims
1. A computer method for searching stock documents containing non-word-based data, comprising the computer executed steps of:
(a) collecting a group of stock documents to form a collection of stock documents;
(b) dividing each document in said group of collected documents into a series of elements of same type, each element of said series of elements is a time scale in stock document data;
(c) defining a plurality of non-word-based token patterns;
(d) tokenizing said documents by matching said series of elements against said plurality of defined non-word-based token patterns to generate a collection of tokens for each of said documents, and providing a name for each of said tokens;
(e) combining the collections of tokens for said documents into a master collection of tokens;
(f) searching for stock documents in said collection of documents that have the same token names as a query or a combination of queries, by searching said query or queries in said master collection of tokens, to provide a plurality of matching documents with respective scores; and
(g) displaying matching documents in the order of their matching scores.
Dependent claims: 2, 3, 4
5. A computer method for tokenizing a non-word-based document into a collection of tokens, comprising the computer executed steps of:
(a) dividing said non-word-based document into a plurality of elements of same type;
(b) defining a plurality of non-word-based token patterns;
(c) matching each of said plurality of elements of same type against each of said plurality of defined non-word-based token patterns so as to generate a token for each match between an element and a defined non-word-based token pattern, and providing a name for each token;
(d) collecting all of said tokens in a collection of tokens; and
(e) wherein said plurality of elements is of time scale type.
Specification