NATURAL LANGUAGE PARSERS TO NORMALIZE ADDRESSES FOR GEOCODING

US 20090248605A1
Filed: 09/29/2008
Published: 10/01/2009
Est. Priority Date: 09/28/2007
Status: Active Grant


9 Assignments

0 Petitions

Abstract

The present invention provides a technique for building natural language parsers by implementing a country and/or jurisdiction specific set of training data that is automatically converted during a build phase to a respective predictive model, i.e., an automated country specific natural language parser. The predictive model can be used without the training data to quantify any input address. This model may be included as part of a larger Geographic Information System (GIS) data-set or as a stand alone quantifier. The build phase may also be run on demand and the resultant predictive model kept in temporary storage for immediate use.

89 Citations

19 Claims

1. A method for normalizing an input address comprising the steps of:
- receiving an input address,
  
  parsing the input address into components,
  
  classifying each component according to one or more predetermined regular expressions and a lexicon of known tokens, thereby generating classified components, and
  
  executing a predictive model to associate each classified component with a unique address field.
  - 2. The method of claim 1, further comprising the step of executing the predictive model to generate a probability associated with each unique address field.
  - 3. The method of claim 1, further comprising the step of generating the predictive model from a training file comprising the one or more predetermined regular expressions and exemplary tokens.
  - 4. The method of claim 3, wherein the training file is associated with a particular country or jurisdiction.
  - 5. The method of claim 1, wherein the step of classifying each component is performed by matching a component to the one or more predetermined regular expressions only when there is no match between that component and the lexicon of known tokens.
  - 6. The method of claim 1, wherein the predictive model is associated with a particular country or jurisdiction.
  - 7. The method of claim 6, wherein the predictive model comprises a table of probabilities associated with the unique address fields.
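
The steps of claims 1, 2, 5 and 7 can be illustrated with a minimal Python sketch. It is not the patented implementation: the lexicon entries, the regular expressions, the token class names, the field names and the probability values are all hypothetical and chosen only to make the example run. The sketch also scores each component on its own and ignores position and neighboring tokens, which a practical model would likely need.

```python
import re

# Hypothetical lexicon of known tokens, mapped to a token class.
LEXICON = {"st": "STREET_TYPE", "ave": "STREET_TYPE", "ca": "STATE"}

# Hypothetical regular expressions, consulted only when the lexicon
# has no match for the component (the ordering described in claim 5).
PATTERNS = [
    (re.compile(r"^\d{5}$"), "FIVE_DIGITS"),
    (re.compile(r"^\d+$"), "NUMBER"),
    (re.compile(r"^[a-z]+$"), "WORD"),
]

# Hypothetical predictive model: for each token class, a table of
# probabilities over address fields (claims 2 and 7). Values are made up.
MODEL = {
    "NUMBER": {"house_number": 0.9, "postal_code": 0.1},
    "FIVE_DIGITS": {"postal_code": 0.8, "house_number": 0.2},
    "WORD": {"street_name": 0.6, "city": 0.4},
    "STREET_TYPE": {"street_type": 1.0},
    "STATE": {"state": 1.0},
}


def parse(address):
    """Split the input address into lowercase components."""
    return [t for t in re.split(r"[\s,]+", address.strip().lower()) if t]


def classify(component):
    """Classify a component: lexicon first, regular expressions second."""
    if component in LEXICON:
        return LEXICON[component]
    for pattern, label in PATTERNS:
        if pattern.match(component):
            return label
    return "UNKNOWN"


def normalize(address):
    """Return (component, address field, probability) for each component."""
    result = []
    for component in parse(address):
        table = MODEL.get(classify(component), {"unknown": 1.0})
        field = max(table, key=table.get)  # most probable field
        result.append((component, field, table[field]))
    return result


if __name__ == "__main__":
    for row in normalize("123 Main St, CA 90210"):
        print(row)
```

Note that "ca" would also match the `WORD` pattern, but the lexicon lookup runs first, so it is classified as `STATE`. Because every `WORD` component receives the same most probable field, this simplified model cannot tell a street name from a city name.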

8. A method of constructing a natural language parser comprising the steps of:
- loading a training file defining an acceptable format for one or more regular expressions and comprising exemplary address field and token pairs;
  
  parsing the training file into a number of tokens;
  
  classifying the tokens according to a lexicon of known tokens and the regular expressions; and
  
  generating a predictive model that defines a probability for each of one or more address fields that may be associated with a given token.
  - 9. The method of claim 8, further comprising the step of identifying the most likely address field for each of the classified tokens.
  - 10. The method of claim 8, wherein the training file and predictive model are specific to a unique country or jurisdiction.
  - 11. The method of claim 8, further comprising the step of calculating the probability based on a number of times each classified token ends up in a given address field.
  - 12. The method of claim 8, wherein the training file indicates the relative positions of each exemplary token.
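
The build phase of claims 8, 9 and 11 can be sketched in Python as a counting exercise. This is an illustration under stated assumptions rather than the patented method: the training pairs, the lexicon, the regular expressions and all function names are hypothetical, the training data is held in memory instead of being loaded from a file, and the relative token positions mentioned in claim 12 are not modeled.

```python
import re
from collections import Counter, defaultdict

# Hypothetical training data: exemplary (address field, token) pairs.
TRAINING_PAIRS = [
    ("house_number", "12"), ("street_name", "oak"), ("street_type", "rd"),
    ("postal_code", "3204"),
    ("house_number", "7"), ("street_name", "king"), ("street_type", "st"),
    ("house_number", "450"), ("postal_code", "8011"),
]

# Hypothetical lexicon and regular expressions used to classify tokens.
LEXICON = {"rd": "STREET_TYPE", "st": "STREET_TYPE"}
PATTERNS = [
    (re.compile(r"^\d+$"), "NUMBER"),
    (re.compile(r"^[a-z]+$"), "WORD"),
]


def classify(token):
    """Classify a token by lexicon lookup, then by regular expression."""
    token = token.lower()
    if token in LEXICON:
        return LEXICON[token]
    for pattern, label in PATTERNS:
        if pattern.match(token):
            return label
    return "UNKNOWN"


def build_model(pairs):
    """Count how often each token class lands in each address field and
    turn the counts into probabilities (the idea in claim 11)."""
    counts = defaultdict(Counter)
    for field, token in pairs:
        counts[classify(token)][field] += 1
    model = {}
    for label, field_counts in counts.items():
        total = sum(field_counts.values())
        model[label] = {f: n / total for f, n in field_counts.items()}
    return model


def most_likely_field(model, label):
    """Pick the address field with the highest probability (claim 9)."""
    table = model[label]
    return max(table, key=table.get)


if __name__ == "__main__":
    model = build_model(TRAINING_PAIRS)
    print(model["NUMBER"])
    print(most_likely_field(model, "NUMBER"))
```

With this toy data, three of the five numeric tokens are house numbers and two are postal codes, so the `NUMBER` class maps to `house_number` with probability 3/5. Once built, the resulting table can be used on its own, without the training pairs.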

13. A computer readable medium encoded with computer readable program code, the program code comprising the instructions of:
- parsing an input address into components,
  
  classifying each component according to one or more predetermined regular expressions and a lexicon of known tokens, thereby generating classified components, and
  
  executing a predictive model to associate each classified component with a unique address field.
  - 14. The computer readable medium of claim 13, further comprising the instruction of executing the predictive model to generate a probability associated with each unique address field.
  - 15. The computer readable medium of claim 13, further comprising the instruction of generating the predictive model from a training file comprising the one or more predetermined regular expressions and exemplary tokens.
  - 16. The computer readable medium of claim 15, wherein the training file is associated with a particular country or jurisdiction.
  - 17. The computer readable medium of claim 13, wherein the instruction of classifying each component is performed by matching a component to the one or more predetermined regular expressions only when there is no match between that component and the lexicon of known tokens.
  - 18. The computer readable medium of claim 13, wherein the predictive model is associated with a particular country or jurisdiction.
  - 19. The computer readable medium of claim 18, wherein the predictive model comprises a table of probabilities associated with the unique address fields.

Current Assignee
Verizon Patent and Licensing Incorporated (Verizon Communications Inc.)
Original Assignee
Telogis Incorporated (Verizon Communications Inc.)
Inventors
Morris, Arthur Newth IV; Mason, Ralph James; Mitchell, David John

Granted Patent

US 8,868,479 B2
US Class Current

706/52
CPC Class Codes

G06F 40/205   Parsing

G06F 40/284   Lexical analysis, e.g. toke...

G06F 40/295   Named entity recognition
