For participants only. Not for public distribution.

Note #3
Vision for automatic driving

John Nagle
Last revised December 10, 2002.

We can buy depth-from-stereo hardware from Point Grey Research or Tyzx. That gets us a depth map from the camera viewpoint. What to do next?

Most of the useful literature is from CMU, which has been struggling with this problem since the late 1980s.

Vision for off-road driving

The CMU off-road NavLab work gives us a clue. The best references are the CMU NavLab publications; there's substantial material there, and it's worth reading quite a bit of it.

As an overview of how they do it, here are some key extracts, with commentary:

From An Integrated System for Autonomous Off-Road Navigation, D. Langer, J. Rosenblatt, and M. Hebert, 1994.

Perception

The range image processing module takes a single (depth) image as input and outputs a list of regions which are untraversable. After filtering the input image, the module computes the (x,y,z) location of every pixel in the range image in a coordinate system relative to the vehicle’s current position. The transformation from sensor to vehicle takes into account the orientation of the vehicle read from an INS system. The points are then mapped into a discrete grid on the (x,y) plane. Each cell of the grid contains the list of the coordinates of the points which fall within the bounds of the cell. The size of a cell in the current system is 20 cm. This number depends on the angular resolution of the sensor, in this case 0.5 degrees, and on the size of terrain features which need to be detected. The terrain classification is first performed in every cell individually. The criteria used for the classification are the height variation of the terrain within the cell, the orientation of the vector normal to the patch of terrain contained in the cell, and the presence of a discontinuity of elevation in the cell. To avoid frequent erroneous classification, the first two criteria are evaluated only if the number of points in the cell is large enough. In practice, a minimum of five points per cell is used.

There's the first step they did after obtaining a depth image. It's almost like constructing a height field, but not quite. The points are mapped into 3D space, after adjusting for vehicle orientation (note that a gyro/accelerometer INS is needed for this), and tallied in a grid map of 20cm cells. They require at least five points per cell. From this we can calculate the system's range. At 0.5 degree angular resolution, one sample spacing subtends 20cm at a range of about 22 meters, which sounds good, but requiring 5 points per cell means the samples must land more than twice as densely, which cuts the usable range to less than half that at best, roughly 10 meters. This CMU project drove at slow speeds, 3-10 MPH. For higher-speed operation, we're going to need to do this at several scales, with lower-resolution maps for more distant terrain.
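
As a concrete sketch of that first step, here's roughly what the binning and the per-cell test might look like, in C++. This is a minimal sketch assuming vehicle-frame coordinates with x forward and y lateral; the names and the height-variation threshold are my own assumptions, and the surface-normal and discontinuity criteria from the paper are omitted:

    #include <cmath>
    #include <cstddef>
    #include <vector>

    const double CELL_SIZE = 0.20;              // meters per cell
    const int    GRID_DIM = 100;                // 20 m x 20 m active area
    const size_t MIN_POINTS = 5;                // CMU's minimum per cell
    const double MAX_HEIGHT_VARIATION = 0.30;   // meters; assumed threshold

    struct Cell {
        std::vector<double> z;                  // elevations of points in cell
    };

    // Drop one point, already transformed to vehicle coordinates using
    // the INS orientation, into the grid. x is forward, y is lateral.
    void accumulate(Cell grid[GRID_DIM][GRID_DIM],
                    double x, double y, double z) {
        int row = int(std::floor(x / CELL_SIZE));
        int col = int(std::floor(y / CELL_SIZE)) + GRID_DIM / 2;
        if (row < 0 || row >= GRID_DIM || col < 0 || col >= GRID_DIM)
            return;                             // outside the active map
        grid[row][col].z.push_back(z);
    }

    // A cell is declared untraversable if the height variation within
    // it is too large, but only when it holds enough points to judge.
    bool untraversable(const Cell& c) {
        if (c.z.size() < MIN_POINTS)
            return false;                       // too sparse; don't classify
        double lo = c.z[0], hi = c.z[0];
        for (double e : c.z) {
            if (e < lo) lo = e;
            if (e > hi) hi = e;
        }
        return (hi - lo) > MAX_HEIGHT_VARIATION;
    }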

Local Map Management

The purpose of the local map module is to maintain a list of the untraversable cells in a region around the vehicle. In the current system, the local map module is a general purpose module called Ganesha [7]. In this system, the active map extends from 0 to 20 meters in front of the vehicle and 10 meters on both sides. This module is general purpose in that it can take input from an arbitrary number of sensor modules and it does not have any knowledge of the algorithms used in the sensor processing modules. The core of Ganesha is a single loop in which the module first gets obstacle cells from the perception modules, and then places them in the local map using the position of the vehicle at the time the sensor data was processed (Figure 8). The sensing position has to be used in this last step because of the latency between the time a new image is taken, and the time the corresponding cells are received by the map module, typically on the order of 600ms. At the end of each loop, the current position of the vehicle is read and the coordinates of all the cells in the map with respect to the vehicle are recomputed. Cells that fall outside the bounds of the map are discarded. Finally, Ganesha sends the list of currently active cells in its map to the planning system whenever the information is requested.

The local map in that system is a map of untraversable cells. This is simple and straightforward, but relies on decisions about untraversability made very early. We probably need a more quantitative measure of traversability. Less-traversable regions (bumpy or tilted) may need to be traversed, but at lower speeds or from more favorable approach directions.
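
Here's a minimal sketch of that map-maintenance loop, assuming obstacle cells are kept in world coordinates and re-expressed relative to the vehicle when the map scrolls. The names are hypothetical and the geometry is simplified to 2D:

    #include <cmath>
    #include <list>

    struct Pose { double x, y, heading; };      // vehicle pose from INS
    struct Obstacle { double wx, wy; };         // obstacle cell, world frame

    // Place a newly reported obstacle cell using the pose at the time
    // the image was taken, not the current pose -- the sensing-to-map
    // latency is about 600ms, and the vehicle moves during that time.
    Obstacle place(double vx, double vy, const Pose& atSensing) {
        double c = std::cos(atSensing.heading);
        double s = std::sin(atSensing.heading);
        return { atSensing.x + c * vx - s * vy,
                 atSensing.y + s * vx + c * vy };
    }

    // Each loop: recompute every cell's position relative to the
    // current pose and discard cells outside the active window
    // (0 to 20 m ahead, 10 m to each side in the CMU system).
    void scroll(std::list<Obstacle>& map, const Pose& now) {
        double c = std::cos(now.heading), s = std::sin(now.heading);
        for (auto it = map.begin(); it != map.end(); ) {
            double dx = it->wx - now.x, dy = it->wy - now.y;
            double ahead = c * dx + s * dy;     // forward distance
            double side = -s * dx + c * dy;     // lateral offset
            if (ahead < 0.0 || ahead > 20.0 || std::fabs(side) > 10.0)
                it = map.erase(it);             // scrolled out of the map
            else
                ++it;
        }
    }

Note that insertion uses the pose at sensing time, while scrolling uses the current pose; that split is what absorbs the 600ms latency.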

Above that lies the level that actually makes driving decisions, which is a subject for a separate note.

On-road driving

CMU's most successful on-road driving system, ALVINN, used a completely different approach - one camera recognizing roads with neural nets. This is the system used in the "No Hands Across America" test, accomplishing over a thousand miles of on-road driving with a human standing by to take over at any moment. The neural nets steered more than 98% of the time.

Neural nets are somewhat out of fashion at the moment. Yet the ALVINN project, in the early 1990s, got surprisingly good results with a quite dumb algorithm. The basic idea is simple enough - the network's 30 output units act as recognizers for a range of situations, from "road curving sharply to the left" through "straight road" through "road curving sharply to the right". The set of outputs tends to have a peak near the correct result. The net is trained from a set of transformed images of actual roads.
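
The peak-picking step might look something like the sketch below. ALVINN actually fits a hill of activation around the strongest output unit; this version uses a simple center of mass over a small window, and the window width and curvature scaling are my own assumptions:

    #include <algorithm>
    #include <cmath>

    // Interpolate a steering curvature from the 30 output activations
    // (index 0 = sharp left, 29 = sharp right) by taking the center of
    // mass of activation around the peak unit.
    double steeringFromOutputs(const double out[30]) {
        int peak = int(std::max_element(out, out + 30) - out);
        double num = 0.0, den = 0.0;
        for (int i = std::max(0, peak - 2);
             i <= std::min(29, peak + 2); ++i) {
            num += out[i] * i;                  // weight index by activation
            den += out[i];
        }
        double unit = (den > 0.0) ? num / den : peak;  // fractional index
        // Map unit 0 (sharp left) .. 29 (sharp right) to a curvature.
        const double MAX_CURVATURE = 0.1;       // 1/meters; assumed scale
        return (unit - 14.5) / 14.5 * MAX_CURVATURE;
    }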

Cameras

This is a first cut at the problem.

Currently, I'm thinking in terms of two camera systems. The main system is three cameras behind the windshield, arranged in a triangle. They may be gyro-stabilized, and must be mounted to filter out enough vibration that there's little noticeable motion during a frame time.

Stabilization will be tough, because the cameras have to be stabilized as a unit. This needs to be looked into. One useful trick might be to have a hood ornament visible in the field of each camera, and use it as an alignment guide.

The auxiliary cameras are a pair, something like the Point Grey Bumblebee, mounted in the front brush guard and aimed forward and down, perhaps inside a transparent plastic cylinder. These cameras don't have to be gyro-stabilized, because they're for use only in slow-speed situations. Their main job is to deal with the blind spot when topping a rise, when you can't see the ground through the windshield. This prevents going over a cliff or, more likely, into a ditch.

It should be possible to drive the vehicle on either set of cameras alone, for redundancy, although at slower speeds and not in as many situations.