Why Simplification Is Complicated

Geography is fractal.  Zooming in on a seacoast, river or roadway simply reveals more complexity as the scale increases.  Close-up detail is often present in vector geometries so that it's there when you need it.  That detail becomes increasingly indiscernible at smaller scales, however, and the effort to store, transfer and render it all is largely wasted.

Why deal with city blocks when you are creating an image (on screen or in print) of a state, country or continent?  Simplification is the art of retaining what's necessary and eliminating what isn't.

"Art" may seem like a pretentious word for describing the work of an algorithm but there is room for judgement.  Choosing the appropriate algorithm, tweaking its tolerance parameters and making manual edits after the fact are all a part of the process.

Most GIS programs provide a tool or function to simplify (or generalize) lines and polygons.  The good ones provide several different simplification methods, or at least a means to add them as a scripted procedure.  The differences are rarely explained, however, and researching the algorithms doesn't usually turn up real-world examples.

There is clearly a limit to how much information can be removed or synthesized and still maintain a visually relevant shape.  Curve simplification has been of interest to mathematicians for many years.  At least a dozen recognized methods have been devised and there is still active interest in developing new variations and improvements.  This suggests that a perfect solution either hasn't yet been found or doesn't actually exist.  An examination of two popular methods with noticeably different results may help explain why the academic effort continues.

David Douglas and Thomas Peucker published an algorithm in 1973 which is still the most commonly used method for vector simplification.  Their iterative algorithm uses the perpendicular distance of a point from a trend line to determine whether a point should be eliminated or whether the trend line should be redrawn through that point.
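A minimal Python sketch of that idea (purely illustrative, not the implementation used by any particular GIS package, and ignoring edge cases) might look like this:

```python
import math

def perpendicular_distance(pt, start, end):
    """Distance from pt to the line through start and end."""
    (x, y), (x1, y1), (x2, y2) = pt, start, end
    dx, dy = x2 - x1, y2 - y1
    if dx == 0 and dy == 0:                      # degenerate segment
        return math.hypot(x - x1, y - y1)
    return abs(dy * x - dx * y + x2 * y1 - y2 * x1) / math.hypot(dx, dy)

def douglas_peucker(points, tolerance):
    """Keep the vertex farthest from the trend line if its offset exceeds
    the tolerance and recurse on both halves; otherwise collapse the run
    to its two endpoints."""
    if len(points) < 3:
        return list(points)
    start, end = points[0], points[-1]
    # Find the vertex with the greatest perpendicular offset.
    index, d_max = 0, 0.0
    for i in range(1, len(points) - 1):
        d = perpendicular_distance(points[i], start, end)
        if d > d_max:
            index, d_max = i, d
    if d_max > tolerance:
        # Split at the farthest vertex and simplify each half.
        left = douglas_peucker(points[:index + 1], tolerance)
        right = douglas_peucker(points[index:], tolerance)
        return left[:-1] + right
    return [start, end]
```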

M. Visvalingam and J. D. Whyatt published an algorithm in 1993 that moves iteratively along a line or polygon ring, evaluating every set of three adjacent points.  If the area of the triangle formed by the middle point and a straight line between the outer two points is greater than some tolerance, the middle point is retained.  Otherwise, it is eliminated.
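In practice the method eliminates the vertex with the smallest such triangle first and then re-evaluates, repeating until no remaining triangle falls below the tolerance.  A naive, O(n²) sketch for an open line, again purely illustrative, could look like this:

```python
def triangle_area(a, b, c):
    """Area of the triangle formed by three vertices (shoelace formula)."""
    (ax, ay), (bx, by), (cx, cy) = a, b, c
    return abs((bx - ax) * (cy - ay) - (cx - ax) * (by - ay)) / 2.0

def visvalingam_whyatt(points, tolerance):
    """Repeatedly drop the interior vertex whose triangle with its two
    neighbours has the smallest area, until every remaining triangle is
    at least as large as the tolerance (an area, not a distance)."""
    pts = list(points)
    while len(pts) > 2:
        # Effective area of every interior vertex.
        areas = [triangle_area(pts[i - 1], pts[i], pts[i + 1])
                 for i in range(1, len(pts) - 1)]
        smallest = min(range(len(areas)), key=areas.__getitem__)
        if areas[smallest] >= tolerance:
            break
        del pts[smallest + 1]        # remove the offending middle vertex
    return pts
```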

Those explanations are purposefully brief to highlight their essential difference.  Douglas-Peucker uses the linear offset of each vertex to assess its visual relevance while Visvalingam-Whyatt uses the area of displacement created by each vertex.

Both methods are effective and efficient.  Visvalingam-Whyatt, however, produces a "smoother" result.  Sharp spikes tend to be preserved by Douglas-Peucker but may be eliminated by Visvalingam-Whyatt if they are sufficiently narrow.  Conversely, Visvalingam-Whyatt may preserve gentle changes occurring over a long distance that Douglas-Peucker will summarily eliminate.

To see this in the real world, let's look at the outline of the state of Delaware.  This is an interesting example because Delaware is relatively small and its perimeter includes straight lines, a very convoluted inland waterway (Rehoboth and Indian River Bays) and a unique circular boundary with Pennsylvania.

A boundary shapefile (including islands) containing 51,346 vertices was obtained from the Delaware Geological Survey.  This geometry provides decent resolution down to a scale of about 1:1,000.  It was then simplified in PostGIS using both the ST_Simplify (Douglas-Peucker) and ST_SimplifyVW (Visvalingam-Whyatt) spatial functions.  The tolerance parameters, in meters and square meters respectively, were manually tweaked to achieve simplified versions with the same number of vertices.
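Matching vertex counts by hand is tedious, so the tweaking can be automated.  The sketch below shows one hypothetical way to do it with psycopg2, bisecting on the tolerance until ST_NPoints reports the target count.  The delaware table, geom column, connection string and search bounds are placeholder assumptions, not the actual setup used here:

```python
import psycopg2

TARGET_VERTICES = 215          # desired size of the simplified geometry

def vertex_count(cur, func, tolerance):
    """Vertex count after simplifying with ST_Simplify or ST_SimplifyVW."""
    cur.execute(
        f"SELECT ST_NPoints({func}(geom, %s)) FROM delaware",  # placeholder table/column
        (tolerance,),
    )
    return cur.fetchone()[0]

def find_tolerance(cur, func, lo, hi, target=TARGET_VERTICES):
    """Bisect on the tolerance until the vertex count matches the target.
    Larger tolerances remove more vertices for both functions."""
    for _ in range(60):
        mid = (lo + hi) / 2.0
        n = vertex_count(cur, func, mid)
        if n == target:
            return mid
        if n > target:
            lo = mid          # still too detailed, simplify harder
        else:
            hi = mid
    return mid                # close enough after 60 halvings

with psycopg2.connect("dbname=gis") as conn, conn.cursor() as cur:
    # Tolerance is in meters for ST_Simplify, square meters for ST_SimplifyVW;
    # the upper bounds below are guesses for a state-sized geometry.
    dp_tol = find_tolerance(cur, "ST_Simplify", 0.0, 10_000.0)
    vw_tol = find_tolerance(cur, "ST_SimplifyVW", 0.0, 10_000_000.0)
    print(dp_tol, vw_tol)
```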

The following images show a section of Indian River Bay at approximately 1:21,000.  Retaining at least 10% of the total vertices would be necessary to draw a reasonably detailed line at this scale.  Reducing the total number well beyond that clearly reveals how well the algorithms perform.  The fully detailed geometry is in light blue, the Douglas-Peucker simplification is red and the Visvalingam-Whyatt simplification is gold:


720 total vertices (1.4%)

375 total vertices (0.7%)

215 total vertices (0.4%)

Douglas-Peucker makes a jagged attempt to follow the coastline, but is it really a better approximation of the general shape?  At the extreme, it is so contorted that it has created a self-intersection...

The following lines show the circular boundary on the northern end of the state at approximately 1:900,000:


720 total vertices

215 total vertices

70 total vertices

The comparative effect on a smooth, consistent curve becomes obvious as the simplifications become more extreme: Douglas-Peucker collapses the curve into flat segments sooner and with greater irregularity.

Comparing the total area of the extremely simplified polygons to the total area of the original is another test of effectiveness.  Units and projection are irrelevant; it is the variance that matters.  The original polygon was 8,498.4 square kilometers in area.  The extreme simplifications used above varied from this total by the following amounts, in square kilometers:

| Total Vertices | Douglas-Peucker | Visvalingam-Whyatt |
|---------------:|----------------:|-------------------:|
| 750            | -3.86           | +0.04              |
| 375            | -4.45           | +0.88              |
| 215            | -9.80           | +3.00              |
| 125            | -37.59          | -2.08              |
| 70             | -40.51          | +6.77              |
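The variance figures come from a straightforward area comparison.  A hedged sketch of that query, reusing the placeholder schema from the earlier sketch, with purely illustrative tolerances and assuming a projection measured in meters (so ST_Area returns square meters):

```python
import psycopg2

DP_TOLERANCE = 500.0          # meters, for ST_Simplify (illustrative value)
VW_TOLERANCE = 250_000.0      # square meters, for ST_SimplifyVW (illustrative value)

# Assumes a single-row "delaware" table holding one (multi)polygon.
AREA_SQL = """
    SELECT ST_Area(geom) / 1e6,
           ST_Area(ST_Simplify(geom, %s)) / 1e6,
           ST_Area(ST_SimplifyVW(geom, %s)) / 1e6
    FROM delaware
"""

with psycopg2.connect("dbname=gis") as conn, conn.cursor() as cur:
    cur.execute(AREA_SQL, (DP_TOLERANCE, VW_TOLERANCE))
    original_km2, dp_km2, vw_km2 = cur.fetchone()
    print(f"Douglas-Peucker variance:    {dp_km2 - original_km2:+.2f} km²")
    print(f"Visvalingam-Whyatt variance: {vw_km2 - original_km2:+.2f} km²")
```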

The Douglas-Peucker algorithm is widely accepted as the "best" simplification method.  Although this investigation is admittedly unscientific, it clearly challenges that assumption.  Detail will always be lost but preserving area and avoiding visually disruptive contours are very desirable goals.  Visvalingam-Whyatt appears to be a nice solution and I plan to use it in the future.

Next time: How does polygon simplification affect topologies?  The complexity continues...

References:

Douglas, D. H. and Peucker, T. K. (1973). "Algorithms for the reduction of the number of points required to represent a digitized line or its caricature." The Canadian Cartographer 10(2): 112-122.

Visvalingam, M. and Whyatt, J. D. (1993). "Line generalisation by repeated elimination of points." The Cartographic Journal 30(1): 46-51.


Posted in Open Source GIS, Techniques on May 24, 2017