Thursday, January 31, 2008

Making the tough coding decisions

When making a library of C++ classes for vector and matrix math, style still puts the art in programming. Jack gives his implementation du jour.


Lately I've been showing you how to create a set of serviceable C++ functions suitable as the basis for a full-up library of C++ classes for vector and matrix arithmetic. I started with some very simple vector functions written in the classical C (some might argue Fortran) style.

One thing has become extremely apparent: a lot of people have a lot of opinions about how to implement the ideas. I can't recall many times when I've received such a flurry of e-mails, most of them telling me that I'm doing it all wrong. All of the opinions carry weight with me, but a few came from people with huge name recognition--people whose opinions I value highly.

Why so many disparate opinions? Partly it's because personal preferences carry weight. Which brand of truck will you choose for your next big, white pickup? What's that you say? You want a black Audi sports sedan?

Sigh.

I've been thinking long and hard about the disparate points of view, and here's my conclusion: opinions vary because the decisions aren't clear cut. As small and simple as the vector functions are, there are many possible ways to do them and many pros and cons to each. What's more, the pros and cons tend to balance each other, leaving no clear winner. Which is perhaps why, after more than four decades of doing this stuff, I still tend to change my mind on alternate Thursdays and full moons. I've mentioned that I've probably written these functions hundreds of times. Why so many? Because each time I write them, I do it a little bit differently.

Speaking of which, it seems that in my last column, I wrote them one time too many. I got them wrong.

In writing and publishing these columns, I and the editorial staff try hard to get them right. Because I use a lot of math equations in this column, it's particularly susceptible to typos. Lots of people look the column over very hard before it gets to you.

In the case of last month's code, however, I'm afraid I got a little too complacent. After all (I reasoned), I've written these functions hundreds of times. They're not exactly the source code to Microsoft Vista; each function has only one to three executable lines of code. How hard can it be? What can go wrong?

Apparently, a lot. Because I didn't exercise the usual due diligence, many errors crept into the code; almost more errors than there were lines of code. Sorry about that. Believe me, I've learned my lesson. The code I'll show you this month is going to be thoroughly tested, as it usually is.

Weighty subject
Back to style decisions. The style one uses to write code depends a lot on the weight one assigns to the different "-ilities." In the past I've given you my list of attributes that I value. They are, pretty much in order:


Correctness (always #1)
Ease of use
Readability
Maintainability
Efficiency
You'll note that I almost always place efficiency way down on the list. Yes, it's important, but I try to never obfuscate or complicate the code just to gain efficiency. I've also mentioned, though, that in the case of vector and matrix math, efficiency rises higher on my radar screen than usual. That's because these are low-level routines, which tend to get used way down inside nested loops. Perhaps it's this vague thought that tends to make the choices so difficult, at least for me.

Most regular readers know that it's never my purpose to write code for you, tip the top of your head back, and pour it in. My goal is to show you the idea, give you the pros and cons, and encourage you to make your own decisions and create your own implementations. In this case, as in many others, the best I can do is to show you my implementation (my implementation of the week, that is), and explain the decisions I've made. Your mileage may vary (YMMV).

Let me give you an example. Last time, I showed you the vector add routine, which now looks like this:


// Add two vectors (c = a + b)
// Vector c can coincide with a or b
void vAdd(const double a[ ],
const double b[ ], double c[ ], int n)
{
for(int i=0; i c[i] = a[i] + b[i];
}
(Note the change in spelling.) As I explained, the purpose of the fourth parameter, n, is to allow the vectors to have different lengths. But in most of my scientific work, vectors represent real, physical quantities like position and velocity, so they tend to have length three (or sometimes even two). If every vector in my software has length three, it gets to be a bother always writing:
vAdd(a, b, c, 3);
Aside from the bother of the extra parameter, every literal parameter in a calling list is an opportunity to get it wrong. Perhaps, from force of habit dating from First Grade, I write:
vAdd(a, b, c, d);
I could spend a lot of time scratching my head over that one. For that reason, I also showed you special functions for the case n = 3.
// Add two 3-vectors (c = a + b)
// Vector c can coincide with a or b
void vAdd(const double a[ ], const double b[ ],
double c[ ])
{
for(int i=0; i<3; ++i)
c[i] = a[i] + b[i];
}
Aside from ease of use, this function has one big advantage: it can be optimized out of existence altogether. You see, any decent compiler will optimize out loops with small, fixed loop counts. In this case, it will replace the loop with:
c[0] = a[0] + b[0];
c[1] = a[1] + b[1];
c[2] = a[2] + b[2];
Having done that, it will also recognize that the indices in the three lines are now all constants, so it will optimize these out, as well. Finally, with the keyword inline, you have a much better chance of persuading the compiler to inline the code, sparing you the overhead of the function call altogether (I haven't had much luck getting Microsoft Visual C++ to inline my code using the GUI-based IDE, but I also haven't tried very hard. As usual, YMMV). All of this seems to make a compelling case for providing special functions for n = 3. On the other hand, if I relegate efficiency to its usual place at the end of my list, I can get the same ease of use, plus a file of half the size, simply by using a default parameter, as in:
void vAdd(const double a[ ], const double b[ ], double c[ ], int n = 3)
With this approach, the compiler will generate the same code for any value of n, but I only have to specify n if it has a value other than three. I'm leaning in this direction this month, fully recognizing that all the optimization advantages I just cited will not be used. Also I'm fully aware that at least one of the gurus who contacted me thinks of this approach as a kludge ranking right up there with goto's. The solution based on this approach does seem to be best for the medium we're using here, where short files are preferred over long ones. I just need you to be fully aware of the tradeoffs, and you certainly won't hurt my feelings if you prefer to stick with the special-case functions like vadd3( ).

Finishing the file
Enough philosophy; let's get on with wrapping up this set of vector operations. Incredible as it may seem, we're almost done. The operators +, -, and the two multiplication operators dot and cross, pretty much exhaust the list of mathematical things one can do using two vectors. There are, however, a few things you can do with one vector. Also some minor little items like input and output. We'll discuss these next. One of the simplest of the missing operations lets us scale a vector, multiplying it by a scalar.
// Multiply a vector by a scalar
void vScale(double a[ ], double s, int n = 3)
{
for(int i=0; i a[i] *= s;
}
The cases where s = 0 and s = "1 occur often enough to justify separate functions for them:
// Clear a vector
void vClear(double a[ ], double s, int n = 3)
{
for(int i=0; i a[i] = 0;
}

// Negate a vector
void vNegate(double a[ ], double s, int n = 3)
{
for(int i=0; i a[i] = -a[i];
}
We will probably need to be able to copy a vector.
// Copy a vector (b = a)
void vCopy(const double a[], double b[],
int n = 3)
{
for(int i=0; i b[i] = a[i];
}
One last arithmetic operator remains. This is the norm, or absolute value, of a vector. Mathematically, it's defined for a 3-vector by:


(1) Extension to higher dimensions is obvious. If you look at the definition of the dot product, you'll see that the term under the radical is the dot product of a with itself. This is another one of those functions that we can write one of several ways, depending on our choice of tradeoffs. If you want a simple and straightforward implementation (as I usually do), use the dot product function:
// Return the norm of a vector
double vNorm(const double a[ ], int n = 3)
{
return sqrt(Dot(a, a, n));
}
Note carefully that if we nest the functions this way, we mustn't forget to pass the size parameter through. Leaving off the n in that last line will give us the 3-vector version of Dot, regardless of the size of a. If this bothers you, breathe easy. What goes on inside vNorm or other nested functions shouldn't be a concern for you. If you draw a vector in three (or more) dimensions, its norm is the same as the scalar length, as measured along the vector itself. It's tempting to name the function vLength( ), but that usage might cause confusion with the number of elements in the array. In the past, I've also used vAbs( ) or even vAbsoluteValue( ). But vNorm( ) works for me. I should note that, like other vector operators, the norm tends to get used a lot. Because of this, you may decide that calling another function from inside vNorm( ) is a bit much. If that's the case, it's easy enough to implement the dot product inside.
// A more efficient version of vNorm
// Return the norm of a vector
double vNorm(const double a[ ], int n = 3)
{
double sum = 0;
for(int i=0; i sum += a[i] * a[i];
return sqrt(sum);
}
When doing vector math, we often find the need to convert a vector to a unit vector. You unitize a vector by dividing each element by the norm of the vector. Mathematically:


(2) Whenever the topic of unitizing a vector comes up, the obvious question arises: what happens if the vector has a norm of zero? Dividing by zero is not usually such a good idea. I've seen many implementations of vNorm( ), with a lot of different approaches for handling the zero case. Some implementations ignore the issue completely, which means that they crash if given a zero vector. Others bail if the norm is zero, leaving the vector unchanged. Still others zero out the whole vector, operating upon the dubious theory that:


(3) Both of these latter two solutions violate the "contract" between the function and the user. The function promises to return a vector of unit length but doesn't keep its promise for this special case. The problem is not that we can't deliver a unit vector; it's that we don't know which direction to aim it. For non-zero norms, the direction is well defined: the unitized vector should be parallel with the original vector. But if the input vector has zero length, in which direction does it point? After over 40 years of dinking around with this issue, I finally came up with a simple solution--very obvious once you see it. Since the direction of the input vector is undefined, the direction of the unit vector doesn't matter. One direction is as good as another. So, in this special case, I simply return a vector with a 1 in the first element. Here's the code:
// Convert a vector to a unit vector
// (a = a/|a|)
// If a = 0, the vector returned will have
// a one in the first element.
void vUnitize(double a[ ], int n = 3)
{
double magnitude = vNorm(a, n);
if(magnitude != 0)
vScale(a, 1/magnitude, n);
else
{
vClear(a, n);
a[0] = 1;
}
}
Wrapping up
We're almost done with the file for vector operations in the Fortran/C tradition. I only have a few more functions to add. These include input/output routines and handy functions for comparison tests such as equality, inequality, and less than.

Tuesday, January 8, 2008

Understanding User and Kernel Mode

Most operating systems have some method of displaying CPU utilization. In Windows, this is Task Manager.



CPU usage is generally represented as a simple percentage of CPU time spent on non-idle tasks. But this is a bit of a simplification. In any modern operating system, the CPU is actually spending time in two very distinct modes:


Kernel Mode
In Kernel mode, the executing code has complete and unrestricted access to the underlying hardware. It can execute any CPU instruction and reference any memory address. Kernel mode is generally reserved for the lowest-level, most trusted functions of the operating system. Crashes in kernel mode are catastrophic; they will halt the entire PC.


User Mode
In User mode, the executing code has no ability to directly access hardware or reference memory. Code running in user mode must delegate to system APIs to access hardware or memory. Due to the protection afforded by this sort of isolation, crashes in user mode are always recoverable. Most of the code running on your computer will execute in user mode.

It's possible to enable display of Kernel time in Task Manager, as I have in the above screenshot. The green line is total CPU time; the red line is Kernel time. The gap between the two is User time.

These two modes aren't mere labels; they're enforced by the CPU hardware. If code executing in User mode attempts to do something outside its purview-- like, say, accessing a privileged CPU instruction or modifying memory that it has no access to -- a trappable exception is thrown. Instead of your entire system crashing, only that particular application crashes. That's the value of User mode.

x86 CPU hardware actually provides four protection rings: 0, 1, 2, and 3. Only rings 0 (Kernel) and 3 (User) are typically used.




If we're only using two isolation rings, it's a bit unclear where device drivers should go-- the code that allows us to use our video cards, keyboards, mice, printers, and so forth. Do these drivers run in Kernel mode, for maximum performance, or do they run in User mode, for maximum stability? In Windows, at least, the answer is it depends. Device drivers can run in either user or kernel mode. Most drivers are shunted to the User side of the fence these days, with the notable exception of video card drivers, which need bare-knuckle Kernel mode performance. But even that is changing; in Windows Vista, video drivers are segmented into User and Kernel sections. Perhaps that's why gamers complain that Vista performs about 10 percent slower in games.

The exact border between these modes is still somewhat unclear. What code should run in User mode? What code should run in Kernel mode? Or maybe we'll just redefine the floor as the basement-- the rise of virtualization drove the creation of a new ring below all the others, Ring -1, which we now know as x86 hardware virtualization.

User mode is clearly a net public good, but it comes at a cost. Transitioning between User and Kernel mode is expensive. Really expensive. It's why software that throws exceptions is slow, for example. Exceptions imply kernel mode transitions. Granted, we have so much performance now that we rarely have to care about transition performance, but when you need ultimate performance, you definitely start caring about this stuff.

Probably the most public example of redrawing the user / kernel line is in webservers. Microsoft's IIS 6 moved a sizable chunk of its core functionality into Kernel mode, most notably after a particular open-source webserver leveraged Kernel mode to create a huge industry benchmark victory. It was kind of a pointless war, if you ask me, since the kernel optimizations (in both camps) only apply to static HTML content. But such is the way of all wars, benchmark or otherwise.

The CPU's strict segregation of code between User and Kernel mode is completely transparent to most of us, but it is quite literally the difference between a computer that crashes all the time and a computer that crashes catastrophically all the time. This is what we extra-crashy-code-writing programmers like to call "progress". So on behalf of all programmers everywhere, I'd like to say thanks User mode. You rock!