Tuesday, June 16, 2009

libraw1394, libdc1394, OpenCV, and OpenGL

This summer, with a group of visiting undergrads interning in my eye tracking lab, I thought that as a side project we'd explore real-time face and eye detection. Ever since I visited UBC to attend a PhD candidate's defense (I was the external reviewer), I've been interested in cameras from Point Grey Research (PGR). They're well known as good "Digital Cameras", particularly for computer vision applications. I put the term in quotes because although your run-of-the-mill store-bought camera is also digital, these DC cameras differ from consumer-level electronics such as digital video recorders (DVRs) or webcams: the PGR cameras adhere to the IEEE 1394 bus specs, and are thus software-controllable to a considerable extent.

To do this in code, one has to learn a collection of Application Program Interfaces, or APIs: libraw1394, libdc1394, OpenCV, and OpenGL. libraw1394 is the low-level library used to actually talk to the hardware (the 1394 controller card that puts commands on the bus). The higher-level library, libdc1394, is what C/C++ code mainly uses to open a connection and start video streaming; it relies on libraw1394, acting as a wrapper that abstracts away the nitty-gritty details at the bus level. OpenCV is a higher-level abstraction still, allowing the programmer to apply various well-known computer vision algorithms to the video frames pulled off the camera. (Technically, one can just use OpenCV and let it act as a wrapper for libdc1394, but libdc1394 offers useful facilities for camera synchronization and hardware pixel culling that may or may not be in OpenCV, so it's probably worth learning anyway. A minimal capture sketch appears at the end of this post.)

If you google for "OpenCV face detection" or "OpenCV eye detection" you'll come across code examples of how to do what's pictured above. The only hitch is that those examples all draw the resulting face and eye rectangles directly onto the image frames that they process. Coming from a graphics background, however, I would of course rather use OpenGL to draw those rectangles.

Sounds easy enough, but the problem is keeping all the coordinate frames straight. Face detection, as given by the examples, does its work on a shrunken image (for speed), which implies a coordinate scaling operation. Eye detection, meanwhile, uses image segmentation (stipulation of a Region Of Interest, or ROI) within which to search for the eyes, which implies a coordinate translation, or offset. Two further transformations are then needed to normalize the frame coordinates and scale them to the display dimensions, remembering to flip the y-coordinate to re-orient the origin to the lower-left (computer vision types put the origin at the upper-left, whereas computer graphics types put it at the lower-left). The detection and coordinate-mapping sketches below spell out these steps.

It took me a couple of days to get all this squared away, but I was finally rewarded with code that could locate my face and eyes, at least while I was looking straight at the camera; I suspect the algorithms used are therefore not rotationally invariant. Anyway, it turns out that face detection is fairly popular these days. The latest version of Apple's iPhoto uses it and lets you label faces in your pics. Having done so, it will search your library for other pics in which it thinks that person appears. It works fairly well.
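For the curious, here's roughly what talking to the camera through libdc1394 looks like. This is a minimal sketch against the version 2 API, with error handling mostly omitted; the video mode, frame rate, and buffer count are assumptions that you'd match to whatever your particular camera supports.

    #include <stdio.h>
    #include <dc1394/dc1394.h>

    int main(void)
    {
        dc1394_t *d = dc1394_new();              /* library context */
        dc1394camera_list_t *list;
        if (dc1394_camera_enumerate(d, &list) != DC1394_SUCCESS || list->num == 0) {
            fprintf(stderr, "no 1394 camera found\n");
            return 1;
        }
        dc1394camera_t *cam = dc1394_camera_new(d, list->ids[0].guid);
        dc1394_camera_free_list(list);

        /* configure the video mode, then start isochronous streaming */
        dc1394_video_set_iso_speed(cam, DC1394_ISO_SPEED_400);
        dc1394_video_set_mode(cam, DC1394_VIDEO_MODE_640x480_MONO8);
        dc1394_video_set_framerate(cam, DC1394_FRAMERATE_30);
        dc1394_capture_setup(cam, 4, DC1394_CAPTURE_FLAGS_DEFAULT); /* 4 DMA buffers */
        dc1394_video_set_transmission(cam, DC1394_ON);

        /* grab one frame; frame->image points at the raw pixels */
        dc1394video_frame_t *frame;
        dc1394_capture_dequeue(cam, DC1394_CAPTURE_POLICY_WAIT, &frame);
        printf("got %ux%u frame\n", frame->size[0], frame->size[1]);
        dc1394_capture_enqueue(cam, frame);      /* return the buffer to the ring */

        dc1394_video_set_transmission(cam, DC1394_OFF);
        dc1394_capture_stop(cam);
        dc1394_camera_free(cam);
        dc1394_free(d);
        return 0;
    }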
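Face and eye detection follow the standard OpenCV Haar cascade examples you'll find with the searches above. The sketch below uses OpenCV's C API (current as of this writing) and is a paraphrase rather than my exact lab code: it assumes an 8-bit grayscale frame, a half-size shrink factor, and cascades already loaded with cvLoad(); restricting the eye search to the top half of the face rectangle is likewise just one plausible choice.

    #include <cv.h>

    void detect(IplImage *gray,                  /* 8-bit, single-channel frame */
                CvHaarClassifierCascade *face_cascade,
                CvHaarClassifierCascade *eye_cascade,
                CvMemStorage *storage)
    {
        const double shrink = 0.5;               /* detect faces at half size */
        IplImage *small = cvCreateImage(cvSize((int)(gray->width  * shrink),
                                               (int)(gray->height * shrink)),
                                        IPL_DEPTH_8U, 1);
        cvResize(gray, small, CV_INTER_LINEAR);

        cvClearMemStorage(storage);
        CvSeq *faces = cvHaarDetectObjects(small, face_cascade, storage,
                                           1.1, 3, CV_HAAR_DO_CANNY_PRUNING,
                                           cvSize(30, 30));
        int i;
        for (i = 0; i < (faces ? faces->total : 0); i++) {
            /* scale the face rect from the shrunken image back to full-frame
               coordinates (the scaling transformation described above) */
            CvRect *f = (CvRect *)cvGetSeqElem(faces, i);
            CvRect face = cvRect((int)(f->x / shrink), (int)(f->y / shrink),
                                 (int)(f->width / shrink), (int)(f->height / shrink));

            /* search for eyes only in the upper half of the face rectangle */
            CvRect roi = cvRect(face.x, face.y, face.width, face.height / 2);
            cvSetImageROI(gray, roi);
            CvSeq *eyes = cvHaarDetectObjects(gray, eye_cascade, storage,
                                              1.1, 3, 0, cvSize(15, 15));
            cvResetImageROI(gray);

            /* eye rects come back relative to the ROI origin: add roi.x and
               roi.y to place them in full-frame coordinates (the translation) */
            (void)eyes;
        }
        cvReleaseImage(&small);
    }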
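The whole chain from detector output to OpenGL window coordinates can be collected into a single helper. The function and parameter names here are mine, for illustration only: for a face rectangle, shrink is the detection shrink factor and the ROI offset is zero; for an eye rectangle, shrink is 1 and (roi_x, roi_y) is the ROI origin within the frame.

    typedef struct { float x, y, w, h; } RectF;

    /* Map a detection rectangle into OpenGL window coordinates:
       (1) scale by 1/shrink (undo the shrunken detection image),
       (2) translate by the ROI origin,
       (3) normalize by the frame dimensions,
       (4) scale to the display, flipping y so the origin sits at the
           lower-left as OpenGL expects. */
    RectF to_gl(CvRect r, double shrink, int roi_x, int roi_y,
                int frame_w, int frame_h, int disp_w, int disp_h)
    {
        float fx = roi_x + (float)(r.x / shrink);    /* steps 1 and 2 */
        float fy = roi_y + (float)(r.y / shrink);
        float fw = (float)(r.width  / shrink);
        float fh = (float)(r.height / shrink);

        float nx = fx / frame_w, ny = fy / frame_h;  /* step 3 */
        float nw = fw / frame_w, nh = fh / frame_h;

        RectF out;                                   /* step 4 */
        out.x = nx * disp_w;
        out.y = disp_h - (ny + nh) * disp_h;         /* flip: vision's top-left
                                                        becomes GL's bottom-left */
        out.w = nw * disp_w;
        out.h = nh * disp_h;
        return out;
    }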
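Drawing the mapped rectangle is then ordinary fixed-function OpenGL, something along these lines (RectF is the struct from the previous sketch):

    #include <GL/gl.h>
    #include <GL/glu.h>

    /* outline a mapped detection rectangle in a 2D orthographic projection
       that matches the display dimensions (origin at the lower-left) */
    void draw_rect(RectF r, int disp_w, int disp_h)
    {
        glMatrixMode(GL_PROJECTION);
        glLoadIdentity();
        gluOrtho2D(0.0, disp_w, 0.0, disp_h);
        glMatrixMode(GL_MODELVIEW);
        glLoadIdentity();

        glColor3f(0.0f, 1.0f, 0.0f);             /* green outline */
        glBegin(GL_LINE_LOOP);
        glVertex2f(r.x,       r.y);
        glVertex2f(r.x + r.w, r.y);
        glVertex2f(r.x + r.w, r.y + r.h);
        glVertex2f(r.x,       r.y + r.h);
        glEnd();
    }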
