Method and system for formatting address strings into recognizable token sequences
Abstract
A method and system are disclosed for formatting address strings into a recognizable sequence of token types for database processing. The system includes a token rule processor and a token sequence processor. The method begins with the step of assigning token types to components of an address string to form a sequence of token types. The method next includes the step of determining whether or not the sequence of token types is contained in an adjustable predetermined rule table. The method further includes the step of processing the sequence of token types into a recognizable sequence format if the sequence of token types is contained in the rule table. Finally, the method concludes with the step of processing the sequence of token types into a recognizable sequence format in accordance with a predetermined interpretation procedure if the sequence of token types is not contained in the rule table.
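The flow described in the abstract can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration and is not taken from the patent: the token type names (`NUM`, `DIR`, `WORD`, `SUFFIX`, `UNIT`), the contents of `TOKEN_TABLE` and `RULE_TABLE`, the output field names, and the fallback behavior, which simply labels each component with its own token type.

```python
# Hypothetical token table: known address components mapped to token types.
TOKEN_TABLE = {
    "N": "DIR", "S": "DIR", "E": "DIR", "W": "DIR",
    "ST": "SUFFIX", "AVE": "SUFFIX", "RD": "SUFFIX",
    "APT": "UNIT", "STE": "UNIT",
}

# Hypothetical rule table: each selected sequence of token types maps to
# the output field assigned to each position in the sequence.
RULE_TABLE = {
    ("NUM", "WORD", "SUFFIX"):
        ("house_number", "street_name", "street_suffix"),
    ("NUM", "DIR", "WORD", "SUFFIX"):
        ("house_number", "pre_direction", "street_name", "street_suffix"),
}


def assign_token_types(address):
    """Split the address string and assign a token type to each component."""
    components = address.upper().replace(",", " ").split()
    types = []
    for component in components:
        if component in TOKEN_TABLE:
            types.append(TOKEN_TABLE[component])
        elif component.isdigit():
            types.append("NUM")
        else:
            types.append("WORD")
    return components, tuple(types)


def fallback_interpretation(components, types):
    """Assumed fallback: label each component with its own token type."""
    return [(t.lower(), c) for t, c in zip(types, components)]


def format_address(address):
    """Apply a matching rule if one exists, otherwise use the fallback."""
    components, types = assign_token_types(address)
    fields = RULE_TABLE.get(types)
    if fields is not None:
        return list(zip(fields, components))
    return fallback_interpretation(components, types)


print(format_address("123 N Main St"))  # sequence found in the rule table
print(format_address("Apt 4 Main"))     # sequence not found, fallback used
```

Keeping the rules in a plain dictionary keyed by the type sequence mirrors the "adjustable" rule table: a new address layout is supported by adding an entry rather than changing code.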
15 Claims
1. A computerized method for formatting the components of an address in the form of an address string into a recognizable token sequence for database processing, the method comprising:
providing a memory;
storing a token table into the memory, the token table having predetermined token types;
storing a rule table into the memory, the rule table having a first plurality of rules corresponding to a plurality of selected sequences of token types;
comparing the components of the address string to the token table and assigning corresponding token types thereto to form a corresponding first sequence of token types;
comparing the first sequence of token types to the rule table to determine whether the first sequence of token types correspond to one of the plurality of selected sequences of token types contained in the rule table;
processing the first sequence of token types in accordance with one of the first plurality of rules to convert the first sequence of token types into a recognizable sequence format if the first sequence of token types correspond to one of the plurality of selected sequences of token types contained in the rule table; and
processing the first sequence of token types in accordance with a predetermined interpretation procedure to convert the first sequence of token types into a recognizable sequence format if the first sequence of token types do not correspond to one of the plurality of selected sequences of token types contained in the rule table.
8. A system for formatting the components of an address in the form of an address string into a recognizable token sequence for database processing, the system comprising:
a first memory;
a token table having predetermined token types, the token table stored in the first memory;
a rule table having a first plurality of rules corresponding to a plurality of selected sequences of token types, the rule table stored in the first memory;
a first comparator for comparing the components of the address string to the token table and assigning corresponding token types thereto to form a corresponding first sequence of token types;
a second comparator for comparing the first sequence of token types to the rule table to determine whether the first sequence of token types correspond to one of the plurality of selected sequences of token types contained in the rule table;
a first processor for processing the first sequence of token types in accordance with one of the first plurality of rules to convert the first sequence of token types into a recognizable sequence format if the first sequence of token types correspond to one of the plurality of selected sequences of token types contained in the table; and
a second processor for processing the first sequence of token types in accordance with a predetermined interpretation procedure to convert the first sequence of token types into a recognizable sequence format if the first sequence of token types do not correspond to one of the plurality of selected sequences of token types contained in the rule table.
10. The system of claim 8 wherein the first comparator is a lexical analyzer adapted to decompose the components of the address string into tokens in accordance with the token table.
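As a rough illustration of what a lexical analyzer in the sense of claim 10 might do, the sketch below decomposes a raw address string into typed tokens with a regular expression. The token table contents, the type names, and the choice to split digit runs from letter runs are all assumptions made for this example; the claim itself does not prescribe them.

```python
import re

# Hypothetical token table (not from the patent): known words and their types.
TOKEN_TABLE = {
    "N": "DIR", "S": "DIR", "E": "DIR", "W": "DIR",
    "ST": "SUFFIX", "AVE": "SUFFIX", "RD": "SUFFIX",
}

# A lexeme is a run of digits or a run of letters; spaces and punctuation
# between lexemes are skipped.
LEXEME = re.compile(r"[0-9]+|[A-Za-z]+")


def lex(address):
    """Decompose an address string into (token_type, text) pairs."""
    tokens = []
    for match in LEXEME.finditer(address):
        text = match.group().upper()
        if text.isdigit():
            token_type = "NUM"
        else:
            # Words missing from the token table get the generic WORD type.
            token_type = TOKEN_TABLE.get(text, "WORD")
        tokens.append((token_type, text))
    return tokens


print(lex("12B Oak Ave."))
```

Unlike a plain whitespace split, this pattern separates a fused component such as `12B` into a number and a letter, and it drops the trailing period, so the resulting type sequence is easier to match against a rule table.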
Specification