Providing an accurate location is something of a problem. Geographic coordinates are the obvious solution, but they mean very little to most people. Our mental concept of where things are is entirely referential and based on familiar routes and landmarks. These can be learned from a map just as they can be learned from actual experience. That's why maps are so important.
The advent of GPS in our daily lives has made navigation much easier in unfamiliar surroundings, but we still rely on translating those GPS coordinates into recognizable descriptors: cities, streets, buildings and addresses. Google Maps will produce an authentic-looking marker when searching for such things, too. Can we really trust this geographic magic? Are the coordinates of every named place in the world already known? Well, yes and no.
Geocoding is the process of translating a familiar descriptor, like an address, into geographic coordinates. This is commonly done to calculate distances, determine service parameters and predict environmental effects. Reverse geocoding is the opposite: converting a geographic coordinate into a recognizable descriptor. Both involve comparing an input with a spatial database of known locations. That sounds simple enough, but a host of assumptions and variables can get in the way.
It is a mistake to think that any geocoding process is reliably accurate all of the time. There will always be some significant errors because addresses are horribly arbitrary. This seems odd since street addresses are a rather recent invention in the greater scheme of things.
A Brief History of Addresses
Postal services have existed in some form or other since antiquity. Cyrus the Great of Persia (600-530 BC) established an organized system based on relays of horsemen not unlike the Pony Express. The Greek historian Herodotus (484-425 BC) famously described the Persian couriers as "...stayed neither by snow nor rain nor heat nor darkness from accomplishing their appointed course with all speed."
Similar services were available in Egypt, Rome and medieval Europe. There were no addresses, though. Such efforts relied entirely on place names (or toponyms) to identify everything: regions, communities, roads and even individual houses. This apparently worked well enough when travel was slow and transport was always provided by locals. Mail service was largely limited to the aristocracy anyway. They were the only members of society who had much need for long-distance communication or the education to conduct it.
This started changing in the middle of the 18th century. One of the first street numbering systems was introduced in London in 1746 and similar ideas quickly followed throughout Europe. The concept was invented to manage many administrative details like land ownership, access rights and taxation. Facilitating mail delivery was a secondary benefit. Geographic location was not a consideration at all.
In fact, using spatial logic was among the last of many numbering schemes that have been tried. These include:
- order of building construction
- sequential order around a physical block
- consecutive numbering up one side of a street and continuing back down the other
- color-coding of residential vs. business addresses (denoted in writing with a small letter suffix)
- numbering of building entrances rather than individual buildings
As urban areas evolved and underwent redevelopment, new construction posed new problems. Existing addresses were rarely re-numbered and inconsistencies were common. More meticulous city planners adopted fractional addressing to solve this problem and others started adding letter prefixes.
Eventually, a more-or-less standard "European" system evolved in which odd and even addresses are on opposite sides of a street and the number itself is the distance in meters (divided by 100 or 1,000) from the center of a city or important junction. Where metric measure was not used, address numbers tend to simply increase by 100 at each crossroad (as is common in North America). In towns laid out in grids rather than a medieval radial plan, distances might be relative to an X or Y origin with a directional prefix added to the street name.
The point to remember about addresses is that they are local and ALL of the variants discussed above are still in use! Geocoding an address is therefore a fairly challenging task. It involves two distinct steps: interpreting a written address and then comparing it to the local addressing scheme.
The first part is more complicated than the second. Addresses are meant for humans, but we increasingly rely on machines to read them. Any algorithm that parses addresses must be able to recognize the normal form for the locale in question. If there is a comprehensible pattern of numbers, directions, prefixes, names and suffixes, the address is standardized to resolve any variation in capitalization, abbreviation or punctuation. Then the address is separated into discrete fields of information. Spelling errors and non-standard abbreviations can produce results that are entirely believable but absolutely wrong.
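To make the standardize-then-split idea concrete, here is a minimal sketch in Python. It assumes a simple North American pattern of "number, optional direction, name, suffix", and the abbreviation tables are tiny, hypothetical stand-ins for the locale-specific lists a real parser would need.

```python
import re

# Hypothetical lookup tables; real parsers carry much larger, locale-specific lists.
DIRECTIONS = {"north": "N", "south": "S", "east": "E", "west": "W",
              "n": "N", "s": "S", "e": "E", "w": "W"}
SUFFIXES = {"street": "ST", "st": "ST", "avenue": "AVE", "ave": "AVE",
            "road": "RD", "rd": "RD", "boulevard": "BLVD", "blvd": "BLVD"}

def parse_address(raw):
    """Standardize a 'number [direction] name suffix' address into fields.

    Returns None when the text does not fit the assumed pattern.
    """
    # Replace periods and commas with spaces, lowercase, and split on whitespace.
    tokens = re.sub(r"[.,]", " ", raw).lower().split()
    if len(tokens) < 3 or not tokens[0].isdecimal():
        return None
    number = int(tokens[0])
    rest = tokens[1:]
    direction = None
    # Treat a leading direction word as a prefix only if a name still remains.
    if rest[0] in DIRECTIONS and len(rest) > 2:
        direction = DIRECTIONS[rest[0]]
        rest = rest[1:]
    suffix = SUFFIXES.get(rest[-1])
    if suffix is None:
        return None
    return {"number": number, "direction": direction,
            "name": " ".join(rest[:-1]).upper(), "suffix": suffix}
```

With these assumptions, "123 N. Main Street" and "123 north main st" standardize to the same fields, while "45 East Ave" is read as a street named East rather than a direction. That second guess is exactly the kind of decision that can be believable and still wrong.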
After an address is parsed, it is compared to a spatial database of known descriptors in order to deduce a reasonable geographic coordinate. There are three general methods by which this is accomplished:
1. Address Interpolation
This method assumes some spatial logic was used to allocate address numbers along each street. Such a database stores each segment of every street and road (from intersection to intersection) as a separate record. These records include the address at each end of the segment and indicate to which side of the segment odd and even addresses belong. Once the right segment is found, the location of the requested address is estimated by its linear position within the address range. It may also be offset from the road by some fixed distance to the correct side.
This process is related to voodoo. Interpolation is just guessing where to stick the pin. It can't be too far wrong in a dense urban area but it can be wildly inaccurate on longer, curved and rural segments. Addresses are typically not verified, either, so there is no guarantee that they actually exist. What's worse, any result for which a matching road segment is found is still considered to be "Address" accuracy. Virtually all free geocoding services rely on address interpolation.
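The guesswork is easy to see in code. The sketch below assumes a straight segment in a projected coordinate system (units of meters), and the record layout (start, end, left_parity, left_range, right_range, offset) is hypothetical rather than any particular provider's schema.

```python
def interpolate_address(number, segment):
    """Estimate (x, y) for a house number along a straight street segment.

    `segment` is a hypothetical record: start/end points in projected
    coordinates plus the address range assigned to each side of the street.
    """
    # Pick the side of the street from the odd/even parity of the number.
    side = "left" if number % 2 == segment["left_parity"] else "right"
    low, high = segment[side + "_range"]
    if not low <= number <= high:
        return None  # the number is outside this segment's address range
    # Fraction of the way along the segment, by position within the range.
    t = 0.5 if high == low else (number - low) / (high - low)
    (x1, y1), (x2, y2) = segment["start"], segment["end"]
    x, y = x1 + t * (x2 - x1), y1 + t * (y2 - y1)
    length = ((x2 - x1) ** 2 + (y2 - y1) ** 2) ** 0.5
    if length == 0:
        return (x, y)
    # Unit normal pointing to the left of the start-to-end direction.
    nx, ny = -(y2 - y1) / length, (x2 - x1) / length
    sign = 1 if side == "left" else -1
    d = segment.get("offset", 10.0)  # assumed fixed setback from the centerline
    return (x + sign * d * nx, y + sign * d * ny)
```

Notice that the function never asks whether the house exists. Any number inside the range gets a coordinate, which is why an interpolated result can look precise without being verified.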
2. Parcel Lookup
Many high-quality commercial geocoders use parcel data to supplement address interpolation. This requires access to legal property records on a jurisdiction-by-jurisdiction basis. Such records must be standardized and converted to spatial geometries, and the resulting dataset is always incomplete and subject to change. If a matching parcel can be found, the geocoding result is typically the centroid and not the location of a driveway or structure. This can improve accuracy in the suburbs but it is much less useful in rural areas.
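A short sketch shows why the centroid can be a poor stand-in for the building. It assumes the parcel is a simple polygon in planar coordinates and uses the standard area-weighted (shoelace) centroid formula; the function name is my own.

```python
def polygon_centroid(vertices):
    """Area-weighted centroid of a simple polygon given as [(x, y), ...]."""
    area2 = 0.0  # twice the signed area
    cx = cy = 0.0
    n = len(vertices)
    for i in range(n):
        x0, y0 = vertices[i]
        x1, y1 = vertices[(i + 1) % n]  # wrap around to close the ring
        cross = x0 * y1 - x1 * y0
        area2 += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    if area2 == 0:
        raise ValueError("degenerate polygon")
    return (cx / (3 * area2), cy / (3 * area2))
```

For an L-shaped parcel such as [(0, 0), (10, 0), (10, 1), (1, 1), (1, 10), (0, 10)], the centroid falls near (2.87, 2.87), which is not even inside the parcel. On a large or irregular rural lot, the same effect can put the pin a long way from the house.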
3. Rooftop Recognition
There are several methods for locating the actual structure to which an address refers. They might rely on extensive pre-processing of legal records and land use data for individual building footprints or on image recognition techniques using ortho-rectified aerial photos. This is the most expensive type of geocoding and is typically available within limited areas.
I worked on a wildfire response system a few years ago which needed to locate insured homes in the wildland/urban interface — exactly where geocoding can be sketchy. We discovered that regardless of the geocoding provider used, 4% to 6% of our results were suspect and 1% were simply wrong. Manual inspection was the only way to make reliable corrections. But which locations needed further attention?
We "addressed" this problem by using spatial analysis to compare the results of several different geocoding sources and methods. Tightly clustered results elevated the confidence score. Scattered results clearly indicated a problem. The project was an eye-opener for the client who had previously assumed that geocoding was an exact science.
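The details of that analysis are beyond the scope of this article, but the basic idea can be sketched in a few lines. The example below is an illustration and not the project's actual method: it measures the largest distance between any two results for the same address and flags the address when that spread exceeds a threshold. The function names and the 100-meter default are assumptions chosen for the example.

```python
import math

def haversine_m(a, b):
    """Great-circle distance in meters between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371000 * math.asin(math.sqrt(h))  # mean Earth radius in meters

def spread_m(points):
    """Largest pairwise distance among results from different geocoders."""
    return max((haversine_m(p, q)
                for i, p in enumerate(points)
                for q in points[i + 1:]), default=0.0)

def needs_review(points, threshold_m=100.0):
    """Flag an address whose geocoding results are too scattered to trust."""
    return spread_m(points) > threshold_m
```

Three results within a few tens of meters of each other would pass, while a fourth result a kilometer away would send the address to manual inspection. A tight cluster raises confidence but does not prove the location is right, since several interpolating geocoders can share the same bad street data.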
It is not. Geocoding is the difficult marriage of geographic measurement and human convenience. Whatever technical improvements we might make, the process will always be encumbered with inconsistency, legacy ideas and constant change. For something that the Persians started two and a half thousand years ago, it's pretty surprising that the Universal Postal Union was not established until 1874. This organization of 192 member countries sets the rules for international mail exchanges. Unfortunately, there is no equivalent for maintaining address standards. It's much too late for that anyway.