Graham's scan is an algorithm used to find the boundary on a set of points that form a convex hull. Invented in the early 70's by a person called Ron Graham, it is one of the earliest algorithms used in the field of computational geometry.

The algorithm has been proved to be the most efficient possible, with a time complexity of O(n log n).

The only calculation used other than an initial sort is a (simple) function to determine whether a given point lies on the left or the right of an existing line between two other points. This is covered in tes' writeup in computational geometry.

The ''scan'' bit of the titular algorithm comes from the manner in which the points are processed. With the points laid out on a standard Cartesian plane, you can visualise a vertical ''scan line'' moving left to right, with the algorithm dealing with points as they hit the scan line. Algorithms that ''scan'' a set of points in order to do something to them are common in computational geometry. You normally come across them doing things in close to O(n log n) time that a naive algorithm would do in O(nx).

In this case, the scan moves left to right and then back again to the left, for reasons explained shortly, just after the end of this paragraph in fact. The entire algorithm is as follows:

The input is a set of n points P (with the standard Cartesian coordinates), and the output is the set of points Q that form a convex polygon with all points in P either in or on its boundary1.

  1. Sort the points in P according to x-axis, smallest first. If two or more points have the same x-coordinate, sort those points by y-axis, smallest first.

    This is something that can be farmed out to a sorting algorithm such as quicksort. The end result is that P is a sorted list of points.

  2. Move the first and second points from P to Q.

  3. Do this:

    1. Move the next point from P to Q.

    2. If the angle of the arc formed by the last three points in Q turn anticlockwise then:

      while there are 3 or more points in Q, AND the last three points in Q turn anticlockwise:
      delete the point just before the one you added in step 3.1.

  4. While there are still points in P, repeat step 3.

What's being done here is quite simple. Step 2 simply starts us off. Steps 3 and 4 form the main loop, and tests if the previous point added is in or on the hull. For example, if we take these first three points:

   ..........
   ..........
   .......c..
   ..........
   ..........
   ..........
   .....b....
   a.........

Points a and b are added to Q (our final list) in step 2. Point c is then added, and step 3.2 finds that the line formed by the points (a, b, c) turns anticlockwise at point b. Point b is then deleted, like the anti-clockian scum it so plainly is. Step 4 makes us repeat step 3 again, and point d is added:

   ....................
   ....................
   ....................
   ...............d....
   .......c............
   ....................
   ....................
   ....................
   .....b<--deleted!...
   a...................

This time the last three points in Q, (a, c, d) form a clockwise turn, so we the loop in steps 3-4 marches onwards, until we are at the right-most point.

The points we are left with in Q form the upper half of the boundary. To complete the hull we compute the lower half of boundary. We start again with a sorted list P, just like in steps 2-42.

  1. Move the second-to-last point from P to Q.

    Technically speaking we should add the last point first as well, but step 3 above should have already added this the last time it was used.

  2. Do this:

    1. Move the last remaining point from P to Q.

    2. If the angle of the arc formed by the last three points in Q turn clockwise then:

      While there are 3 or more points in Q AND the last three points in Q turn clockwise:
      delete the point just before the one you added in step 6.1.

  3. While there are 2 or more points in P, go back to step 6.

    Instead of saying ''While there are still points in P, go back to step 5'', we make sure not to duplicate the first point in Q, which was the first point added to Q in step 2.

Steps 5-7 are almost identical to steps 2-4, with two differences:

  1. We add points from the end of the list instead of the beginning.
  2. We delete points that cause clockwise instead of anticlockwise turns.

This gives us the bottom half of the hull. The resulting list of points in Q begins with the left-most point, and contains every point on the boundary.

And that's it. What makes this algorithm so efficient?

The first step is the sort, which need only be done once2. Sorting is O(n log n). For each point in P, the loops at steps 3-4 and 6-7 are performed once each - a constant overhead.

The remaining question is how the time taken for the while loops in steps 3.2 and 6.2 to execute will change as the size of the input increases. Here, we can see that each point in P can be deleted from Q twice at most - also a constant overheard.

So the main steps of the algorithm are O(n), which we can then ignore when computing the order of the algorithm. Thus proving that the complexity of computing the convex hull of a set of points is equivalent to sorting.

Footnotes:
1: The reason I'm assigning letters to these sets (or lists, to be pedantic) is because referring to them as ''the first/second set'', or ''the input/output set'' will eventually confuse me.

2: When actually coding this, you'd use the list sorted in step 1 in both steps, with a for loop or an iterator. The reason I'm deleting from P and then magically filling it again instead is so I don't have to explain for loops or iterators.

Sources:
http://www.cs.wustl.edu/~pless/506/l2.html
Computational Geometry: algorithms and applications
  2nd Ed, de Berg et al, 3-540-65620-0
Any mistakes are mine, not due to above sources.

Published by Ron Graham in 1972, Graham's scan is an optimal algorithm for finding the convex hull of a set of points.

As a bit of historical curiosity, the original algorithm published by Graham was shown to be incorrect. It was:

Form a simple polygon from the input points going in counterclockwise order.
Find the point with the smallest y component.  
Add this point and the point following it to the stack S.
While there are points left on the polygon,
  while a right turn is formed by stack[top - 1], stack[top], and the polygon's next point,
    pop the top point from the stack.
  push the next point from the polygon onto the stack.

This doesn't work for all polygons though. In fact, it only works for those polygons which are weakly externally visible. So the obvious fix was to create the polygon in the first step that is guaranteed to be weakly externally visible. One easy method of doing this is to sort all the points according to the angle formed by the line between them and the point with the lowest y coordinate and a horizontal line coming from the point with the lowest y coordinate. Once this is taken care of, the stack always contains the convex hull of all the points seen so far at the end of the second while loop, so a loop-invariant argument for the correctness of the algorithm is straightforward.

Time Complexity

Assuming that the method just given for creating a simple polygon is used, that takes O(n log n). The nested while loops would appear to take O(n2), but if you notice that no point can be popped off the stack twice, then it should be clear that it can take only O(n). Thus the time complexity is dominated by the O(n log n) sort.

Optimality

The convex hull problem is Ω(n log n). For a proof, we reduce the sorting problem to it. For the sorting problem, we are given an array of numbers and we have to output them in sorted order. First we make all the numbers points of the form (x, x2) (an O(n) step). Then we find the convex hull of these points using any convex hull algorithm. Then we find the point in the output list with the smallest x coordinate and read off the x coordinates of the points in the order returned (another O(n) step). This gives us the input points in sorted order. Now if there was a convex hull algorithm faster than O(n log n), we would have a sorting algorithm that was faster than O(n log n). But we know that all sorting algorithms (in a comparison based model of course) are Ω(n log n), so that would not be possible.

Log in or register to write something here or to contact authors.