Non-literal textual search using fuzzy finite non-deterministic automata
First Claim
1. A computer-implemented method for selectively retrieving information, including a plurality of stored text strings that do not have to be presorted, said stored text strings contained in a document set stored on a data storage medium and accessible by a computer processor, the method comprising the steps of:
- A. transmitting a user-defined text string query to the processor;
B. constructing a fuzzy finite non-deterministic fixed-size automaton corresponding to said query, wherein said automaton has at least two states, each state can have more than two values, and more than one state can be active simultaneously;
C. applying each distinct stored text string individually and sequentially to said automaton just once and generating thereby an accumulated dissimilarity metric associated with each distinct text string in the stored document set; and
D. displaying a subset of the distinct stored text strings, said subset based upon values of the accumulated dissimilarity metrics.
1 Assignment
0 Petitions
Accused Products
Abstract
Method and system for selectively retrieving information contained in a stored document set using a metric-based or "fuzzy" finite-state non-deterministic automaton. An automaton is constructed (501) corresponding to a text string query, text strings are read (502) from storage and corresponding dissimilarity values are generated (505). Those strings resulting in values less than a given threshold are recorded (508) and listed for the user. Dissimilarity values are determined based on penalties associated with missing characters, extra characters, incorrect characters, and other differences between the text string query and a text string read from storage.
-
Citations
26 Claims
-
1. A computer-implemented method for selectively retrieving information, including a plurality of stored text strings that do not have to be presorted, said stored text strings contained in a document set stored on a data storage medium and accessible by a computer processor, the method comprising the steps of:
-
A. transmitting a user-defined text string query to the processor; B. constructing a fuzzy finite non-deterministic fixed-size automaton corresponding to said query, wherein said automaton has at least two states, each state can have more than two values, and more than one state can be active simultaneously; C. applying each distinct stored text string individually and sequentially to said automaton just once and generating thereby an accumulated dissimilarity metric associated with each distinct text string in the stored document set; and D. displaying a subset of the distinct stored text strings, said subset based upon values of the accumulated dissimilarity metrics. - View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 25)
-
-
13. A computer system for selectively retrieving information, including a plurality of stored text strings that do not have to be presorted, said text strings contained in a document set stored on a data storage medium and accessible by a computer processor, the system comprising:
-
A. a data input device providing a user-defined text string query to the processor; B. a fuzzy non-deterministic fixed-size automaton corresponding to said query, adapted to accept as input each distinct stored text string and to produce in response a dissimilarity metric associated with each distinct stored text string, wherein said automaton has at least two states, each state can have more than two values, and more than one state can be active simultaneously; and C. a display device displaying a subset of the distinct stored text strings, said subset based upon values of the dissimilarity metrics. - View Dependent Claims (14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 26)
-
Specification