Address Validation

Using new techniques to increase the success rates of address validation queries between two service providers.

Project Blue Rock

Some of MEF’s members have recently highlighted a significant challenge: automating the types of inter-provider business that rely on validating a geographic location.

 

Validation of a geographic location refers to the action of a Buyer supplying a description of a geographic location (i.e., address) to a Seller that in turn checks to see if it can recognize that address in its own database of geographic locations. Typically, there will be some mismatch between the way the Buyer represents the geographic location (Buyer-Supplied Address or Address X) and the way the Seller represents the geographic location (Seller-Recorded Address or Address Y).

In short, Address X is not equal to Address Y.

The causes for mismatch include:

IT development choices

There is no standardized approach for recording an address in a Buyer or Seller database. The choices made therefore often reflect the internal needs of the Buyer or Seller, which don’t match those of their partners or even of other parts of the same organization that have their own address databases.

Local customs and languages

A customer may specify an address according to local customs and language, in a form that the Buyer’s database setup cannot record.

Typos

Typographical errors in Address X, Address Y or both (e.g. Londn instead of London).
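A typo such as Londn can often be caught by approximate string matching instead of an exact lookup. The following is a minimal sketch using Python's standard difflib module. The city list, the function name closest_city and the 0.8 similarity cutoff are assumptions made for illustration and are not part of the project.

```python
import difflib

# Hypothetical list of city names held in a Seller database.
KNOWN_CITIES = ["London", "Londonderry", "Leeds", "Lincoln"]

def closest_city(name, candidates=KNOWN_CITIES, cutoff=0.8):
    """Return the best approximate match for name, or None if nothing is close enough."""
    matches = difflib.get_close_matches(name, candidates, n=1, cutoff=cutoff)
    return matches[0] if matches else None

print(closest_city("Londn"))  # matches "London"
print(closest_city("Paris"))  # no candidate is similar enough
```

The cutoff is a trade-off: a lower value tolerates more typos but raises the risk of false-positives between similarly named places.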

Abbreviations

Countries, provinces, counties, streets, roads etc. are often abbreviated in arbitrary ways (e.g. Rd. or Rd instead of Road; GBR for Great Britain).
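One common way to reduce abbreviation mismatches is to expand known abbreviations on both sides before comparing. The sketch below assumes a small, hypothetical abbreviation table and a hypothetical function name; real tables would need to be country-specific and would have to deal with ambiguous abbreviations.

```python
import re

# Hypothetical abbreviation table; a real deployment would need per-country rules.
ABBREVIATIONS = {
    "rd": "road",
    "ave": "avenue",
    "blvd": "boulevard",
    "gbr": "great britain",
}

def expand_abbreviations(address):
    """Lower-case the address, split it on whitespace and commas,
    drop trailing dots and expand any known abbreviation."""
    tokens = re.findall(r"[^\s,]+", address.lower())
    expanded = [ABBREVIATIONS.get(t.rstrip("."), t.rstrip(".")) for t in tokens]
    return " ".join(expanded)

buyer = expand_abbreviations("10 Station Rd., London, GBR")
seller = expand_abbreviations("10 Station Road, London, Great Britain")
print(buyer == seller)
```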

Junctions/Intersections

Some locations are referenced arbitrarily at junctions (e.g. https://www.geocod.io/guides/intersections/)

Conversion of diacritical marks

Use of letters without diacritical marks to represent letters with diacritical marks (e.g. Ahtarintie instead of Ähtärintie; Duesseldorf instead of Düsseldorf)
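Both examples above can be reproduced programmatically, which makes it possible to generate the likely ASCII spellings of a name before a lookup. The sketch below uses Python's standard unicodedata module. The transliteration table and the function names are assumptions for illustration; the two-letter forms follow German convention and do not apply to every language.

```python
import unicodedata

# Assumed German-style transliterations; these only suit some languages.
TRANSLITERATIONS = {"ä": "ae", "ö": "oe", "ü": "ue", "ß": "ss"}

def strip_diacritics(text):
    """Drop combining marks after decomposition, e.g. Ähtärintie -> Ahtarintie."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

def transliterate(text):
    """Replace umlauts with two-letter forms, e.g. Düsseldorf -> Duesseldorf."""
    out = []
    for ch in unicodedata.normalize("NFC", text):
        repl = TRANSLITERATIONS.get(ch.lower())
        if repl is None:
            out.append(ch)
        else:
            out.append(repl.capitalize() if ch.isupper() else repl)
    return "".join(out)

def ascii_variants(text):
    """Return the original spelling plus both converted spellings."""
    return {text, strip_diacritics(text), transliterate(text)}

print(sorted(ascii_variants("Düsseldorf")))
```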

Misunderstood municipal, regional and national boundaries

Provinces, counties, districts, colonies etc. are often included, excluded or referenced incorrectly (e.g. this full address in Mexico City includes district, colony and city: HEGEL 111 PISO 01 COL POLANCO CHAPULTEPEC DEL MIGUEL HIDALGO CIUDAD DE MEXICO MEX MEXICO 11560).

Road section

Some roads require East or West, North or South to be included in order to identify the correct section (e.g. 1250 Rene-Levesque Blvd. W., Montreal, Quebec is a different location from 1250 Rene-Levesque Blvd. E., Montreal, Quebec). Note that the French abbreviation O. (Ouest) denotes the same section as W.

Local names

Some areas may be identified by locals differently than the ‘official’ name (e.g. Andrew’s Knob is referenced in local maps but not in Google Maps)

Building Numbers

Addresses often include a range of numbers or even fractional numbers rather than a single whole number (e.g. 109-111, 1600 15th Ave, Prince George, BC, Canada; 75½ Bedford St. in New York). There may also be two different numbers for the same building.

Multiple names for streets

A street may be referenced by one or more different names (e.g. https://www.lastwordonnothing.com/2018/08/17/where-the-streets-have-two-names/)

Building Floor or Room/Suite

A location may be on a specific floor in a building and/or in a particular room or suite (e.g. in data centers)

Part of a large complex

A location may be in a data center, a shopping mall, a business park, a military base etc. and not referenced correctly.

Information is left out

Key parts of the address may have been excluded for a variety of reasons (e.g. Exeter Airport, Exeter, EX5 2BA, GBR)

There is no address

In many countries, there are villages where there is no recognized address (e.g. locals pointing to the “3rd blue door on the left past the alley”).

There have been numerous attempts to standardize address formats (e.g. GPS coordinates, national standardization, What3Words, Open Location Code etc.) but none have achieved universal adoption.

There are also numerous third-party commercial solutions (e.g. Melissa, ESRI etc.), but these are often effective in some countries and ineffective in others.

In MEF, there are more than 120 service providers buying and selling connectivity services from one another. The business ‘friction’ caused by the high rate of false-negatives (or false-positives) in address validation queries is an increasingly significant cause of slowdown in supply chain automation in the telecom industry.

Here is an example of a completely correct address in Mexico City. Even a small number of errors can cause it to be represented differently in different databases, so that address validation queries using standard database lookup approaches fail:

HEGEL 111 PISO 01 COL POLANCO CHAPULTEPEC DEL MIGUEL HIDALGO CIUDAD DE MEXICO MEX MEXICO 11560

Objectives

  • Create an open-source rules builder and application. The rules builder is used to create a rules list that, when run in the application, generates multiple near-versions of the original address validation query.
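
The idea of running a rules list to generate near-versions of a query can be illustrated with a short sketch. The project text does not define a rule format, so everything here is assumed: the rules are hypothetical (pattern, replacement) pairs, the function name near_versions is invented for this example, and the expansions of COL and DEL are a guess at how those abbreviations might be spelled out.

```python
import itertools
import re

# Hypothetical rules list: each rule is a (regex pattern, replacement) pair.
# The expansions below are assumptions, not rules defined by the project.
RULES = [
    (r"\bCOL\b", "COLONIA"),
    (r"\bDEL\b", "DELEGACION"),
    (r"\bPISO 01\b", "PISO 1"),
]

def near_versions(address, rules=RULES):
    """Return each distinct variant produced by applying any subset of the rules.
    The unmodified address (empty subset) comes first."""
    variants = []
    for size in range(len(rules) + 1):
        for subset in itertools.combinations(rules, size):
            candidate = address
            for pattern, replacement in subset:
                candidate = re.sub(pattern, replacement, candidate)
            if candidate not in variants:
                variants.append(candidate)
    return variants

query = ("HEGEL 111 PISO 01 COL POLANCO CHAPULTEPEC DEL MIGUEL HIDALGO "
         "CIUDAD DE MEXICO MEX MEXICO 11560")
for version in near_versions(query):
    print(version)
```

Each near-version could then be submitted as a separate validation query. Because the number of variants doubles with every rule in this sketch, a practical rules list would need to be kept short or applied selectively.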

Significance

Streamlining business automation depends heavily on the initial validation of the service delivery address.

Without This Project

If this problem is not solved, business automation between Buyers and Sellers of

Project Participants
