Monday, February 21, 2011

Let there be light!

I had a couple of blog posts in my head that I wanted to put up here, one from just a couple of days ago, but I don't have the pics handy...instead I'm sitting here waiting for my photon mapper to finish, so I thought I'd write a bit about that...kind of a technical post. I'll try to add a brief pictorial progression so you can see what I'm talking about. At left is a basic ray-traced image that we started with this semester (in a class I'm teaching). It's fairly simple: two spheres, one (the blue one, well, it's sort of blue, maybe more like chrome) more reflective than the other, which is more diffuse. There are three point light sources in the scene, but they only show up when reflected. The ray tracer demonstrates several concepts: object-oriented hierarchy (spheres and planes are objects), recursion (the rays shot into the scene spawn reflected rays recursively), list processing (all objects are kept on a list), and basic file input/output. Fairly simple; an image like that takes about 3 seconds to render.
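For the curious, here's a bare-bones sketch of the kind of recursive tracer I'm talking about. It is not the actual class code: the names (Vec3, Object, Sphere, trace) are my own stand-ins and lighting is omitted, but it shows the object hierarchy, the list of objects, and the recursive reflection in one place.

```cpp
// Minimal sketch of a recursive ray tracer -- names and structure are my own.
#include <cmath>
#include <cstdio>
#include <memory>
#include <vector>

struct Vec3 {
    double x = 0, y = 0, z = 0;
    Vec3 operator+(const Vec3& v) const { return {x + v.x, y + v.y, z + v.z}; }
    Vec3 operator-(const Vec3& v) const { return {x - v.x, y - v.y, z - v.z}; }
    Vec3 operator*(double s) const { return {x * s, y * s, z * s}; }
    double dot(const Vec3& v) const { return x * v.x + y * v.y + z * v.z; }
    Vec3 normalized() const { double n = std::sqrt(dot(*this)); return {x / n, y / n, z / n}; }
};

struct Ray { Vec3 origin, dir; };

// Base of the object hierarchy: spheres, planes, etc. all derive from this.
struct Object {
    Vec3 color;                      // diffuse color
    double reflectivity = 0.0;       // 0 = fully diffuse, 1 = mirror
    virtual ~Object() = default;
    virtual bool intersect(const Ray& r, double& t, Vec3& normal) const = 0;
};

struct Sphere : Object {
    Vec3 center;
    double radius = 1.0;
    bool intersect(const Ray& r, double& t, Vec3& n) const override {
        Vec3 oc = r.origin - center;                       // assumes r.dir is normalized
        double b = oc.dot(r.dir), c = oc.dot(oc) - radius * radius;
        double disc = b * b - c;
        if (disc < 0) return false;
        t = -b - std::sqrt(disc);
        if (t < 1e-4) return false;
        n = ((r.origin + r.dir * t) - center).normalized();
        return true;
    }
};

// Recursion: shade the closest hit, then spawn a reflected ray and blend it in.
Vec3 trace(const Ray& ray, const std::vector<std::unique_ptr<Object>>& scene, int depth) {
    if (depth == 0) return {};
    double bestT = 1e30; const Object* hit = nullptr; Vec3 normal;
    for (const auto& obj : scene) {                        // list processing: test every object
        double t; Vec3 n;
        if (obj->intersect(ray, t, n) && t < bestT) { bestT = t; hit = obj.get(); normal = n; }
    }
    if (!hit) return {};                                   // background
    Vec3 p = ray.origin + ray.dir * bestT;
    Vec3 local = hit->color;                               // (light/shadow terms omitted for brevity)
    Ray reflected{p, (ray.dir - normal * 2 * ray.dir.dot(normal)).normalized()};
    Vec3 bounce = trace(reflected, scene, depth - 1);      // the recursive call
    return local * (1 - hit->reflectivity) + bounce * hit->reflectivity;
}

int main() {
    std::vector<std::unique_ptr<Object>> scene;
    auto s = std::make_unique<Sphere>();
    s->center = {0, 0, -5}; s->radius = 1; s->color = {0.3, 0.4, 0.9}; s->reflectivity = 0.6;
    scene.push_back(std::move(s));
    Vec3 c = trace({{0, 0, 0}, {0, 0, -1}}, scene, 4);     // one ray straight down -z
    std::printf("%.3f %.3f %.3f\n", c.x, c.y, c.z);
}
```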

The next image in the progression would have been the same as the first, since the next step was parallelism: same picture, just computed faster. With chip makers now producing multi-core chips, how do we take advantage of all those cores or CPUs? The ray tracer is very well suited to this because each pixel is processed the same way. If we had as many CPU cores as pixels, we could assign each pixel its own core in a one-to-one mapping. My desktop machine in my office has 8 cores, so a simple speedup is to let each of the 8 cores process h/8 of the image's rows of pixels. The trick is to avoid race conditions, that is, don't let more than one core write to the same piece of shared memory. The ray tracer, in its original conception, had this problem, so this turned out to be a nice exercise, complete with garbage images if done incorrectly. The solution called for each ray to maintain its own state info, which makes perfect sense when thinking in parallel. Once that's done, multi-core parallelism is pretty easy, requiring basically one #pragma compiler directive to use OpenMP and voila! An almost k-factor speedup for k available cores. Coincidentally, the solution also leads into the next step of the ray tracer's evolution, and that is getting transmission to work right, like you see at right: we not only have reflective objects but transmissive (transparent) ones now.
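Here's roughly what that one-line change looks like. This is just a sketch, not the class code; render_pixel stands in for the per-pixel trace, and the only requirement is that it keeps all of its state in locals so no two threads write to the same memory.

```cpp
// Sketch of the one-line OpenMP parallelization over image rows.
// Compile with -fopenmp (or your compiler's equivalent).
#include <cstdio>
#include <vector>

struct Color { double r = 0, g = 0, b = 0; };

// Stand-in for the real per-pixel trace; everything it needs lives in locals.
Color render_pixel(int x, int y) {
    return { x * 0.001, y * 0.001, 0.5 };
}

std::vector<Color> render(int width, int height) {
    std::vector<Color> image(width * height);
    // The "one line": rows are split across threads, and each thread writes
    // only to its own pixels, so there is no shared mutable ray state and no race.
    #pragma omp parallel for schedule(static)
    for (int y = 0; y < height; ++y)
        for (int x = 0; x < width; ++x)
            image[y * width + x] = render_pixel(x, y);
    return image;
}

int main() {
    auto img = render(640, 480);
    std::printf("center pixel r = %.3f\n", img[240 * 640 + 320].r);
}
```

Whether rows get handed out in static chunks or dynamically is a tuning detail; static chunks are just the simplest starting point.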

Once we have the notion of independent rays (in terms of memory access anyway), it's not a huge conceptual leap to think of photons instead of rays. These are shot from the light sources within the scene in a stochastic (random) sort of way. They reflect off or transmit through objects just like rays, except that there's a finite number of photons: each makes its own way through the scene, unlike rays, which recursively spawn new rays at each intersection point. Based on random conditions, photons eventually stick to surfaces, as shown at left. One of the goals of this type of photon mapping is to be able to render caustics, that is, focused concentrations of photons, more or less. When rendering, what's important is the number of photons per unit area (why not volume?), that is, their density is what we're after. (Note: for those of you observant enough, you'll see that the photon map I have here doesn't match the other images. You're correct; this photon map, with only one light source, is what I used for debugging. It was clearer with one light that the caustic was not showing up opposite the light source; turns out I was calculating distance incorrectly, d'oh!)
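Here's roughly what the photon-shooting pass looks like, again as a sketch under my own naming. The hit() function plays the role of the same intersection test the ray tracer already has (stubbed here with a single floor plane so the snippet stands on its own), and the Russian-roulette test is what decides when a photon sticks.

```cpp
// Sketch of a photon-shooting pass: a point light emits photons in random
// directions, each bounces through the scene until it sticks, and stuck
// photons go into the photon map.  Names and structure are my assumptions.
#include <cmath>
#include <cstdio>
#include <random>
#include <vector>

struct Vec3 { double x = 0, y = 0, z = 0; };
struct Photon { Vec3 pos, dir, flux; };                     // one entry in the photon map
struct Hit { bool found = false; Vec3 point, normal; double reflectivity = 0; };

// Stand-in scene: a single diffuse floor plane at y = 0.
Hit hit(const Vec3& o, const Vec3& d) {
    Hit h;
    if (d.y >= -1e-9) return h;                             // going up or sideways: misses the floor
    double t = -o.y / d.y;
    h.found = true;
    h.point = { o.x + t * d.x, 0.0, o.z + t * d.z };
    h.normal = { 0, 1, 0 };
    h.reflectivity = 0.3;
    return h;
}

Vec3 reflect(const Vec3& d, const Vec3& n) {                // mirror reflection about n
    double k = 2 * (d.x * n.x + d.y * n.y + d.z * n.z);
    return { d.x - k * n.x, d.y - k * n.y, d.z - k * n.z };
}

std::vector<Photon> shoot_photons(const Vec3& light, const Vec3& power, int count) {
    std::mt19937 rng(42);
    std::uniform_real_distribution<double> uni(0.0, 1.0);
    std::normal_distribution<double> gauss(0.0, 1.0);
    std::vector<Photon> map;
    Vec3 flux = { power.x / count, power.y / count, power.z / count };  // each photon carries its share
    for (int i = 0; i < count; ++i) {
        Vec3 d = { gauss(rng), gauss(rng), gauss(rng) };    // stochastic emission: uniform random direction
        double len = std::sqrt(d.x * d.x + d.y * d.y + d.z * d.z);
        Vec3 dir = { d.x / len, d.y / len, d.z / len };
        Vec3 pos = light;
        for (int bounce = 0; bounce < 8; ++bounce) {        // finite life: no branching into new photons
            Hit h = hit(pos, dir);
            if (!h.found) break;                            // flew off into the background
            if (uni(rng) > h.reflectivity) {                // Russian roulette: the photon sticks here
                map.push_back({ h.point, dir, flux });
                break;
            }
            pos = h.point;                                  // otherwise reflect and keep going
            dir = reflect(dir, h.normal);
        }
    }
    return map;
}

int main() {
    auto map = shoot_photons({0, 5, 0}, {100, 100, 100}, 2000);
    std::printf("stored %zu photons\n", map.size());
}
```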

At each ray intersection point, sum up the nearby photons' "flux", divide by the squared radius of the disc that encloses them, and presto! We have caustics. Photon mapping also demonstrates a key aspect of careful program design: at each intersection point one has to find a number of the closest photons. The image at right shows 20 photons sampled at each intersection point, out of 2,000 shot into the scene initially (fairly small numbers all told). With these numbers the image renders in about a minute. Increasing those numbers fivefold, to 100 samples from 10,000 photons initially shot, yields about a 9-minute render time. Meanwhile, increasing yet again to 500,000 photons and 500 samples takes...I don't yet know, still waiting...on the order of hours, I expect. Ding! Just done "baking": 188.6 minutes, yup, 3.14 (pi?) hours. The key program-design aspect is this search for the closest photons: the program uses a kd-tree to find the k closest photons in O(log n) time. It has to do this for every intersection point, of which there is a very large number. If no kd-tree were used, the search would take O(n) every time, and I suspect it would have taken a lot longer to complete, perhaps days. So was all that extra number-crunching time worth it? Below are two images (10,000 photons on the left, 500,000 photons on the right) that match the photon map above. See the difference? One could argue that the caustic boundaries and the caustic itself are a bit crisper in the image at right, but are they worth three hours?
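For completeness, here's a sketch of that per-intersection density estimate. I've written the k-closest search as a naive O(n) partial sort so the snippet stands on its own; in the real program that single step is the kd-tree query, and everything around it stays the same.

```cpp
// Sketch of the density estimate at one intersection point, reusing the
// Photon layout from the shooting sketch above.  The k-nearest search here is
// a naive O(n) scan; swapping in a kd-tree query gives the same answer much faster.
#include <algorithm>
#include <cmath>
#include <cstdio>
#include <vector>

struct Vec3 { double x = 0, y = 0, z = 0; };
struct Photon { Vec3 pos, dir, flux; };

double dist2(const Vec3& a, const Vec3& b) {
    double dx = a.x - b.x, dy = a.y - b.y, dz = a.z - b.z;
    return dx * dx + dy * dy + dz * dz;
}

// Sum the flux of the k nearest photons and divide by the area of the disc
// that just encloses them: photons per unit area, i.e. the density we're after.
Vec3 estimate_radiance(const Vec3& point, std::vector<Photon> photons, std::size_t k) {
    std::size_t n = std::min(k, photons.size());
    std::partial_sort(photons.begin(), photons.begin() + n, photons.end(),
                      [&](const Photon& a, const Photon& b) {
                          return dist2(a.pos, point) < dist2(b.pos, point);
                      });
    Vec3 sum;
    double r2 = 0;                                          // squared radius of the enclosing disc
    for (std::size_t i = 0; i < n; ++i) {
        sum.x += photons[i].flux.x;
        sum.y += photons[i].flux.y;
        sum.z += photons[i].flux.z;
        r2 = std::max(r2, dist2(photons[i].pos, point));
    }
    if (r2 <= 0) return {};
    const double pi = 3.14159265358979323846;
    double area = pi * r2;
    return { sum.x / area, sum.y / area, sum.z / area };
}

int main() {
    std::vector<Photon> map;
    for (int i = 0; i < 200; ++i)                           // fake photons scattered along a line
        map.push_back({ { i * 0.01, 0, 0 }, { 0, -1, 0 }, { 0.05, 0.05, 0.05 } });
    Vec3 e = estimate_radiance({ 0.5, 0, 0 }, map, 20);
    std::printf("%.3f %.3f %.3f\n", e.x, e.y, e.z);
}
```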