METHODS FOR EFFICIENTLY AND SYSTEMATICALLY SEARCHING STOCK, IMAGE, AND OTHER NON-WORD-BASED DOCUMENTS

US 20100153402A1
Filed: 02/04/2010
Published: 06/17/2010
Est. Priority Date: 05/24/2006
Status: Active Grant

Abstract

One embodiment of a non-word-based information retrieval system searches stock or image documents in a large data source. A non-word-based document is first divided into a series of elements or an array of cells. Each element or cell is matched against a series of predefined token patterns, and each match generates a named token. The collection of generated named tokens is a word-based representation of the non-word-based document. After the tokens from all documents are collected in a master collection of tokens, the non-word-based documents can be searched efficiently and systematically, in a manner analogous to a document search in a word-based search system.

11 Claims

1. A computer method for searching image documents containing non-word-based data, comprising the computer executed steps of:
(a) collecting a group of image documents to form a collection of image documents;

(b) dividing each document in said group of collected documents into an array of cells of same type, each of said cells of said array comprises a plurality of pixels;

(c) defining a plurality of non-word-based token patterns;

(d) tokenizing said documents by matching said array of cells against said plurality of defined non-word-based token patterns to generate a collection of tokens for each of said documents, and providing a name for each of said tokens;

(e) combining the collections of tokens for said documents into a master collection of tokens;

(f) providing a query image, said query image is a part of an image document, and dividing said query into an array of cells, each of said cells of said array comprises a plurality of pixels;

(g) tokenizing said query image by matching said array of cells of said query image against said plurality of defined non-word-based token patterns to generate a collection of tokens of said query image, and providing a name for each of said tokens;

(h) searching for image documents in said collection of documents that have the same tokens with the same position arrangement as said tokens of said query image by searching said query token names in said master collection of tokens, to provide a plurality of matching documents with respective scores; and

(i) displaying matching documents in the order of their matching scores.

2. The computer method of claim 1 wherein said collection of documents is supplied in a data source.

3. The computer method of claim 1 wherein said collection of documents is on the Internet and said collecting is done from the Internet.

4. The computer method of claim 1 wherein said plurality of image documents also contains word-based data.
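
The steps of claim 1 can be illustrated with a minimal sketch. It is not the patented implementation: the 2x2 cell size, the three intensity-based token patterns, the inverted-index layout of the master collection, and the offset-voting score are all assumptions chosen for illustration. Images are plain nested lists of grayscale values.

```python
from collections import defaultdict

CELL = 2  # hypothetical cell size: each cell is 2x2 pixels

# Hypothetical token patterns: token name -> predicate on a cell's mean intensity.
PATTERNS = {
    "dark": lambda m: m < 85,
    "mid": lambda m: 85 <= m < 170,
    "bright": lambda m: m >= 170,
}

def cells(image, size=CELL):
    """Yield ((cell_row, cell_col), mean_intensity) for each full cell."""
    for r in range(0, len(image) - size + 1, size):
        for c in range(0, len(image[0]) - size + 1, size):
            block = [image[r + i][c + j] for i in range(size) for j in range(size)]
            yield (r // size, c // size), sum(block) / len(block)

def tokenize(image):
    """Map each cell position to the name of the first pattern it matches."""
    tokens = {}
    for pos, mean in cells(image):
        for name, matches in PATTERNS.items():
            if matches(mean):
                tokens[pos] = name
                break
    return tokens

def build_master(docs):
    """Master collection: token name -> list of (doc_id, cell position)."""
    master = defaultdict(list)
    for doc_id, image in docs.items():
        for pos, name in tokenize(image).items():
            master[name].append((doc_id, pos))
    return master

def search(query, master):
    """Rank documents by how many query tokens line up at one common offset."""
    votes = defaultdict(int)
    for (qr, qc), name in tokenize(query).items():
        for doc_id, (dr, dc) in master.get(name, []):
            # Tokens that keep the query's arrangement share the same offset.
            votes[(doc_id, (dr - qr, dc - qc))] += 1
    best = defaultdict(int)
    for (doc_id, _offset), count in votes.items():
        best[doc_id] = max(best[doc_id], count)
    return sorted(best.items(), key=lambda kv: (-kv[1], kv[0]))

doc_a = [[0, 0, 255, 255],
         [0, 0, 255, 255],
         [128, 128, 0, 0],
         [128, 128, 0, 0]]
doc_b = [[255] * 4 for _ in range(4)]
query = [row[:] for row in doc_a[:2]]  # the query is a part of doc_a

ranked = search(query, build_master({"a": doc_a, "b": doc_b}))
```

Here `ranked` lists document `a` first, because both query tokens occur in it with the same relative arrangement, while document `b` matches only one token at any single offset.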

5. A computer method for searching non-word-based documents containing data with m-measurements over n-dimensional space, comprising the computer executed steps of:
(a) collecting a group of non-word-based documents of same type containing data with m-measurements over n-dimensional space to form a collection of documents;

(b) dividing each document in said group of collected documents into a plurality of elements of same type;

(c) defining a plurality of non-word-based token patterns;

(d) tokenizing said collected non-word-based documents by matching their elements of same type against said plurality of defined non-word-based token patterns to generate a master collection of tokens, and providing a name for each token;

(e) searching for non-word-based documents in said collection of documents that have the same tokens as a query token or a combination of query tokens, said query is a part of a non-word-based document, by searching said query token or tokens in said generated master collection of tokens, to provide a plurality of matching documents with respective scores; and

(f) displaying matching documents in the order of their matching scores.

6. The computer method of claim 5 wherein said collection of documents is supplied in a data source.

7. The computer method of claim 5 wherein said collection of documents is on the Internet and said collecting is done from the Internet.
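
As a rough illustration of claim 5, a stock history can be treated as data with m = 2 measurements (closing price and volume) over an n = 1 dimensional space (time). The sketch below is only one possible reading: the element definition (a pair of consecutive days), the three token patterns, their thresholds, and the score (the number of query tokens a document shares) are hypothetical choices, not taken from the patent.

```python
from collections import defaultdict

# Hypothetical token patterns over one element: a (previous_day, current_day) pair,
# where each day is a (close_price, volume) tuple.
PATTERNS = {
    "price_up_5pct": lambda prev, cur: cur[0] >= prev[0] * 1.05,
    "price_down_5pct": lambda prev, cur: cur[0] <= prev[0] * 0.95,
    "volume_doubled": lambda prev, cur: cur[1] >= prev[1] * 2,
}

def tokenize(series):
    """Return the set of token names matched by any pair of consecutive days."""
    tokens = set()
    for prev, cur in zip(series, series[1:]):
        for name, matches in PATTERNS.items():
            if matches(prev, cur):
                tokens.add(name)
    return tokens

def build_master(docs):
    """Master collection: token name -> set of ids of documents containing it."""
    master = defaultdict(set)
    for doc_id, series in docs.items():
        for name in tokenize(series):
            master[name].add(doc_id)
    return master

def search(query_series, master):
    """Score each document by the number of query tokens it shares, best first."""
    scores = defaultdict(int)
    for name in tokenize(query_series):
        for doc_id in master.get(name, ()):
            scores[doc_id] += 1
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))

docs = {
    "AAA": [(100, 1000), (106, 2500), (107, 2400)],
    "BBB": [(50, 500), (47, 600), (47, 1300)],
    "CCC": [(10, 100), (10, 100)],
}
master = build_master(docs)

# The query is a fragment of a price history: a price jump on doubled volume.
ranked = search([(20, 300), (21.5, 700)], master)
```

With this data, `AAA` shares both query tokens and ranks above `BBB`, which shares only the volume token; `CCC` produces no tokens and is not returned.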

8. A computer method for tokenizing a non-word-based document into a collection of tokens, comprising the computer executed steps of:
(a) dividing said non-word-based document into a plurality of elements of same type;

(b) defining a plurality of non-word-based token patterns;

(c) matching each of said plurality of elements of same type against each of said plurality of defined non-word-based token patterns so as to generate a token for each match between an element and a defined non-word-based token pattern, and providing a word-based name for each token; and

(d) collecting all of said tokens in a collection of tokens, said collection of tokens is a word-based representation of said non-word-based document.

9. The method of claim 8 wherein said non-word-based document is supplied in a data source.

10. The method of claim 8 wherein said non-word-based document is on the Internet.

11. The method of claim 8 wherein said non-word-based document is one of: image document and document containing data with m-measurements over n-dimensional space.
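
The tokenizing method of claim 8 can be sketched in a few lines. The example assumes, purely for illustration, that the elements are daily percentage changes and that the patterns are the three hypothetical predicates named `UP`, `DOWN`, and `BIGMOVE`; every element is tested against every pattern, so one element may yield more than one token.

```python
def tokenize(elements, patterns):
    """Match every element against every pattern; emit one named token per match."""
    return [name
            for element in elements
            for name, matches in patterns.items()
            if matches(element)]

# Hypothetical token patterns over a daily percentage change.
patterns = {
    "UP": lambda pct: pct > 0,
    "DOWN": lambda pct: pct < 0,
    "BIGMOVE": lambda pct: abs(pct) >= 5,
}

daily_changes = [1.2, -6.0, 0.0, 7.5]

# Joining the token names gives a word-based representation of the document.
text = " ".join(tokenize(daily_changes, patterns))
print(text)
```

Because the result is ordinary text, a word-based search tool can be applied to it directly; for example, the phrase `DOWN BIGMOVE` locates a large one-day drop.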

Current Assignee: Sizhe Tan
Original Assignee: Sizhe Tan
Inventors: Tan, Sizhe
Granted Patent: US 8,898,171 B2
Field of Search
US Class Current: 707/741
CPC Class Codes
G06F 16/583   using metadata automaticall...
G06F 16/93   Document management systems
G06V 30/10   Character recognition
G06V 30/162   Quantising the image signal
G06V 30/18019   by matching or filtering
G06V 30/18105   related to colour
