NWAPW Year 2

Technical Topics

(Back to Part 3)

Part 4: Hardware Considerations

Image Requirements

There are two kinds of patterns we might be trying to identify: those with features larger than twice the pixel size of the image (which can therefore be resolved), and those smaller (or slightly larger, but not aligned to the pixel grid), which will be detectable only as a lumpy blur. Applying a Gaussian blur reduces them all uniformly to the latter category. So basically, all patterns (except for pedestrians too close to do anything about) can be treated as solid near-gray colors.
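The blur step can be sketched as a small separable filter. This is a minimal illustration, not the project's actual code: the 1-4-6-4-1 binomial kernel is an assumed (common) approximation of a Gaussian, applied first along rows, then along columns.

```java
// Minimal sketch: separable Gaussian-like blur on a grayscale image
// stored as a row-major int array of 0-255 values.
public class GaussianBlur {
    // 1-D binomial kernel 1-4-6-4-1 (sum = 16) approximating a Gaussian
    private static final int[] K = {1, 4, 6, 4, 1};

    public static int[] blur(int[] img, int w, int h) {
        int[] tmp = new int[img.length];
        int[] out = new int[img.length];
        for (int y = 0; y < h; y++)            // horizontal pass
            for (int x = 0; x < w; x++) {
                int sum = 0;
                for (int k = -2; k <= 2; k++) {
                    int xx = Math.min(w - 1, Math.max(0, x + k)); // clamp at edges
                    sum += K[k + 2] * img[y * w + xx];
                }
                tmp[y * w + x] = sum / 16;
            }
        for (int y = 0; y < h; y++)            // vertical pass
            for (int x = 0; x < w; x++) {
                int sum = 0;
                for (int k = -2; k <= 2; k++) {
                    int yy = Math.min(h - 1, Math.max(0, y + k)); // clamp at edges
                    sum += K[k + 2] * tmp[yy * w + x];
                }
                out[y * w + x] = sum / 16;
            }
        return out;
    }
}
```

Running the two passes separately costs 10 multiplies per pixel instead of 25 for the equivalent 5x5 kernel, which matters at 15fps in Java.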

Much more difficult is recognizing unsaturated colors. In most scenes, everything -- except for a few red sports cars or lemon-yellow chick cars and the occasional fire engine or school bus (why do you think they chose those colors?) -- is off-gray or off-white, or dirty black (or shiny black, reflecting the scene around it, with the result that it looks the same as dirty black), which comes off as dark gray inside the camera.

So if we restrict our first attempt to saturated colors, a blob of color 4-5 pixels square is sufficient to distinguish it from digital noise and small things like flowers and birds.
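A cheap test for "saturated color" is the spread between the largest and smallest RGB components of a pixel: gray, white, and black pixels all have nearly equal components. This is a sketch of that idea only; the threshold of 80 (out of 255) is an illustrative guess, not a tuned value.

```java
// Minimal sketch: classify a pixel as saturated color by the spread
// between its largest and smallest RGB components (0-255 each).
public class Saturation {
    public static int spread(int r, int g, int b) {
        return Math.max(r, Math.max(g, b)) - Math.min(r, Math.min(g, b));
    }

    public static boolean isSaturated(int r, int g, int b) {
        // gray/white/dirty-black pixels have a small spread;
        // 80 is an assumed threshold for illustration
        return spread(r, g, b) > 80;
    }
}
```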

Assume a fixed-focus lens of normal field of view -- perhaps the equivalent of a 50mm lens on a 35mm camera -- so that pedestrian size in the image can give a reliable distance estimate, if we so choose. It is not hard to calculate that a six-foot (2m) pedestrian at 50 meters would be 2mm on the film of that 35mm camera, and correspondingly 0.7mm on the sensor chip of a 1/3" (8mm) standard C-mount video camera. His shirt is a little less than half that, in round numbers 0.3mm or 300 microns. If the 8mm-diagonal chip resolves 320x240 color pixels (so the pixel size is about 20 microns square), that shirt image on the sensor chip is about 15 pixels square at 150 feet (50m).

Why 50 meters? According to the Oregon State Driver's instruction booklet, a car going 20mph (=10m/s, the standard downtown speed limit in Oregon) needs 65 feet (20m) to stop, so detecting him at 50m gives a 2x margin of safety. That's 15 pixels square; at 150m he is only 5 pixels square, which we guessed is the minimum for detection, another 3x margin of safety.
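The 15-pixels-at-50m and 5-pixels-at-150m figures follow from similar triangles: apparent size scales inversely with distance. A one-line sketch, taking the text's 15-pixel shirt at 50m as the reference point:

```java
// Minimal sketch: apparent size in pixels scales as 1/distance
// (similar triangles through the lens).
public class ApparentSize {
    // pixelsAtRef: measured size at the reference distance refDistM (meters)
    public static double pixelsAt(double pixelsAtRef, double refDistM, double distM) {
        return pixelsAtRef * refDistM / distM;
    }
}
```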
 

Frame Rate

The nominal pedestrian walking speed is 3mph, or about 1.5m/s. If your camera produces (and your software processes) one frame every second, the pedestrian has walked five feet (1.5m) between frames -- more than his own width (other than very corpulent pedestrians: are there any? Such people get tired too quickly to do much walking) -- which makes tracking a single pedestrian very difficult. At 100fps he has moved only 1.5cm, which is less than the resolution of the camera. 10fps (15cm = 6" per frame) still offers plenty of overlap between frames.

A car driving the nominal Oregon downtown speed limit (10m/s), at a frame rate of 10fps, moves one meter every frame; detecting a pedestrian at 50m thus leaves 30 meters (30 frames, three seconds) of margin beyond the 20m we are told we need to get the car stopped.
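The per-frame motion arithmetic in the last two paragraphs is just speed divided by frame rate; a trivial sketch:

```java
// Minimal sketch: distance traveled between consecutive frames.
public class FrameRate {
    public static double metersPerFrame(double speedMps, double fps) {
        return speedMps / fps;
    }
}
```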

The numbers are credible and consistent.
 

The Camera

Commercial low-cost digital surveillance cameras typically come in resolutions of 320x240 and 640x480 color pixels -- be careful: the vendors inflate the numbers by telling you the monochrome sensor density, but you must divide by 4 (= half each way) to get the true color pixel density. There are some factors related to the electronics and the physics of the sensors for how fast you can get images off the sensors, but they all promise 15fps, or 30fps under careful management (according to one vendor, USB2 cannot transfer the data fast enough to do 640x480 = 300K color pixels at 30fps, but USB3 can).

The PointGrey (now a division of FLIR) Firefly (320x240) and Chameleon (640x480) cameras both work with their FlyCapture2 API and driver software on Windows 10 (and also on Linux). The API is defined for C/C++/C# but not Java. I wrote a Java wrapper class to encapsulate the API calls necessary to start the camera and capture frames at 15fps (or 30fps, if you can handle the data rate). This has been tested and works reasonably well at 15fps on a 2.4GHz Windows 10 computer, with ample time left over for processing 320x240 images in Java. You can download the zip file (wrapper class + DLL + test code) here. If you (meaning your browser) know the secret password, it's also available on GitHub.

Both cameras have encoding firmware for a variety of popular image compression formats, but my wrapper class delivers the data in the native raw (unprocessed) Bayer8 encoding, where each color pixel must be extracted from four (non-contiguous) sensor data bytes. There is example code included with the wrapper class code. You can also look at their API information directly, to better understand the wrapper class.
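A demosaicing sketch, assuming an RGGB cell layout (the actual pattern and byte order should be checked against the camera documentation and the example code in the zip): it collapses each 2x2 sensor cell into one color pixel, rather than doing full interpolation, which halves the resolution but keeps the extraction simple.

```java
// Minimal sketch, not the wrapper's code: collapse a raw Bayer8 frame
// into one RGB pixel per 2x2 sensor cell, assuming an RGGB arrangement.
public class Bayer8 {
    // raw: w*h sensor bytes; returns (w/2)*(h/2) packed 0xRRGGBB ints
    public static int[] toRgb(byte[] raw, int w, int h) {
        int cw = w / 2, ch = h / 2;
        int[] rgb = new int[cw * ch];
        for (int y = 0; y < ch; y++)
            for (int x = 0; x < cw; x++) {
                int r  = raw[(2 * y) * w + 2 * x] & 0xFF;         // top-left: red
                int g1 = raw[(2 * y) * w + 2 * x + 1] & 0xFF;     // top-right: green
                int g2 = raw[(2 * y + 1) * w + 2 * x] & 0xFF;     // bottom-left: green
                int b  = raw[(2 * y + 1) * w + 2 * x + 1] & 0xFF; // bottom-right: blue
                rgb[y * cw + x] = (r << 16) | (((g1 + g2) / 2) << 8) | b;
            }
        return rgb;
    }
}
```

Note the `& 0xFF` masks: Java bytes are signed, so sensor values above 127 would otherwise come out negative.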
 

Test Video

There is a brief segment of a pedestrian walking past my house included with the zip download; the whole take is available here, but it's kind of boring. I adapted a segment from the Tesla video, hand-painted the shirt of one of the pedestrians a bright blue and converted it to my By8 format, which you can download here. We really need some better test clips, with actors dressed in bright colors, walking across the street in front of a moving car with the camera attached to the windshield. Hopefully we can get some of you (or your families) involved [click here needed].

Any questions or comments? This is your project.

Tom Pittman

Next time: Software Components

Rev. 2017 April 20