Pointer Power in C and C++, Part 2

Christopher Skelly

Christopher Skelly has been a teacher of C and C++ for the past ten years, first for Plum Hall Inc., and then for his own company, Insight Resource Inc. Insight Resource also developed the best-selling help utility, "KO-PILOT for WordPerfect," which Brit Hume called "the best add-in ever written." Chris has served on both the C and C++ ANSI committees, and was the Technical Chairman for this year's "CPlusC++" and "C++ in Action" conferences, presented by Boston University. He writes regularly for the C User's Journal and the C++ Journal, and can be reached at Insight Resource Inc., 914-631-5032, or at 71005.771@compuserve.com.

This article extends and continues the techniques presented last month in Part 1 of "Pointer Power in C and C++." At the end of this article I will repeat and then solve the pointer puzzle presented in Part 1. Just in case you don't happen to have last month's issue immediately available, here are the eight Key Facts from Part 1:

1. A pointer is a variable whose contents is an address.

2. A pointer always "knows" the type of thing it addresses. It can be properly used only to access something of the correct type.

3. Pointer values are address/type pairs, just like pointer variables. However, pointer values are not storable lvalues.

4. Every pointer has three fundamental attributes. These attributes are the location, the contents, and the indirect value of the pointer.

5. The three attributes of a pointer represent three distinct address levels. These address levels can also be called levels of indirection.

6. Pointer space is organized into a series of planes or levels. Every pointer expression can be assigned to one of these planes. The plane of a pointer expression is a measure of how much potential for indirection there is in that pointer expression.

7. The name of an array usually behaves as if the array name were a pointer value.

8. The name of an array, in almost every context, evaluates to the address of the array's own "zeroth" element.

Armed with these Key Facts, you are ready to learn the next set of techniques, a game informally called Pointer Dominos.

Pointer Dominos

The key to mastering pointers is to learn to play Pointer Dominos. The game of pointer dominos is simply the game of using operators in expressions involving pointers. Each of the allowable operators does something very specific, and the operators are always to be played, or really evaluated, in a very precise order. If you know exactly what each operator does, and if you know how to determine the order of evaluation, you can play pointer dominos.

Key Fact #9 — Only a small number of operations are ever performed on pointers. If you know exactly what each operation does, and what the right order to apply the operations is, you can understand and create any pointer expression in C.

Each pointer has three attributes and each attribute is at a different level of indirection. You know that certain operators actually change the level of indirection of an expression using a pointer. Specifically, you have already seen that & takes you up one level and * takes you down one level when applied to a pointer in an expression. Note that in declarations the * builds in one level of indirection, as does the []. The rules of pointer dominos apply only to expressions, not to declarations.

int i = 0;   /* i is declared at level 0 */
int x;       /* x is also declared at level 0 */
int *p;      /* p is declared at level 1 */
p = &i;      /* &i is level 1, so is p */
x = *p;      /* * on p takes us down from level 1 to level 0 */
This leads to the first two rules of Pointer Dominos. When used in pointer expressions, & moves up one level of indirection and * moves down one level of indirection.

Several other operators affect the overall level of indirection of an expression. But all the operators do one of a small number of things. They take us up, down, or sometimes even sideways on the Ladder. Fortunately, each allowable operation has a precise, well-defined meaning. Only six moves appear in expressions involving pointers. These moves form our next key fact.

Key Fact #10 — The six moves of pointer dominos are:

1. Go up one level of indirection using &

2. Go down one level of indirection using *

3. Go down one level of indirection using []

4. Increase an address using + or ++

5. Decrease an address using - or --

6. Change the type of the pointer's window on memory with a cast.

Structures and their members are not included here, but are easy to add to the fundamentals of the model.

Each move corresponds to one or two C language operators. The first move to consider is the &, the unary address-of operator. & always lifts you up one level of indirection. Generating the address of something is the equivalent of moving up to the next plane in pointer space. As I've said, this process is known as referencing, and involves a relative move up to the next higher address plane.

Two operators move you down one level of indirection. First the *, or unary indirection operator, means go down to the plane immediately below the plane you start on. If x lives on plane 5, *x is an expression that lives on plane 4. Most typically, if p lives on plane 1, *p is a non-pointer living on plane 0. This move is called dereferencing, and it is a relative move. *x is one level below x. You can't say anything about what level this actually is, until you know what level x itself resides on.

The [] operator is also a dereferencing operator. a[n] lives on the plane below a, where a is any address expression. An important principle of pointer dominos is that both * and [] bring an expression down one level of indirection from what they are applied to. The two operators are related by this important formula:

a[n] == *(a + n)
This master formula in C and C++ shows that a subscript is, by definition, equivalent to dereferencing an offset from a pointer. The a in the formula represents any address. The address may come from a pointer, from an array name, or from a cast expression. It doesn't matter. The subscript can always be applied to an address, just like the *, and the relation between the two forms comes from this formula.

The fourth and fifth moves involve the operations of addition and subtraction, including +, ++, -, and --. These operators produce no change in level of indirection. They move you sideways on the same plane, either toward higher or lower memory and always by a scaled offset. p + 1 is the address of one object higher in memory than the object p points to. p - 2 is the address of an object two objects below the object addressed by p. p, p + 1, and p - 2, however, all exist on the same plane and have the same level of indirection.

The final pointer domino move is the cast. Casting a pointer usually produces no change in the level of indirection. Casting a pointer to int to a pointer to double, for example, does not change the level of the pointer. What does change is the size and format of the "window on memory" that this pointer accesses. A pointer to char accesses one byte of integer data. A pointer to double typically accesses eight bytes of floating-point formatted data. There are special cases, however, where a cast does affect the level of indirection of an expression. Consider:

char *p = malloc(1000);
char **p2;
p2 = (char **)p;
Here the cast, (char **), does indeed produce a level change from level one up to level two. The real rule is this: the level of an expression with a cast is the level of indirection of the cast. Casts are the wild-cards of pointer dominos. Any expression can be cast to have some new level and type. The declaration inside the cast determines the level of the newly cast expression.

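Summarizing the rules of pointer dominos in terms of operators, you have:

1. &x is one level above x

2. *x is one level below x

3. x[n] is one level below x

4. x + n and ++x stay on the level of x, at a higher address

5. x - n and --x stay on the level of x, at a lower address

6. (type)x takes on the level of the type in the cast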

This set of six rules forms the guts of the game. All you need now is one additional rule, which tells you in which order to apply the moves when more than one operator is present in the same expression.

For instance, in the expression:

*++p
two operations, * (indirection) and ++ (pre-increment) are being applied to the pointer p. Which operation should you do first, the increment or the indirection? The resulting value will be very different depending on what you decide. If you did the * first, you would take the indirect value of p, and then increment that indirect value. In fact, this is just the opposite of what you are really supposed to do.

The rules of precedence state that unary operators, like both pre-increment ++ and *, group right to left. This means that the ++ binds with p before the * is even considered. You must increment the pointer and then take the indirect value. The point here is that the rules of precedence always determine the proper order of evaluation. This is our last key fact.

Key Fact #11 — If more than one operator is applied in an expression, apply the operators in order of precedence.

A quick glance at the precedence table reveals that primary operators include both [] and (), while the * and & are weaker-binding unary operators. Furthermore, primary operators group left to right and unaries group right to left. In effect, this means that you will deal with the primary [] and () first in left to right order, and then handle the unary *s or &s in right to left order.

For example, the expression

*p[n]
contains two operators, one primary [] and one unary *. The [] binds first followed by the *, so the interpretation is "locate the array element p[n], then dereference this element."

A more complex expression may have lots of operators to consider:

*(char *)p2[n][m]
Here both subscripts bind first, in left to right order. Then comes the cast, (char *), which changes the type of the value in p2[n][m] to be a pointer to char. Finally, the * on the left dereferences this cast pointer.

Showing each step in order:

1. p2: starting with p2

2. p2[n]: access the nth element offset from p2

3. p2[n][m]: access the mth element offset from p2[n]

4. (char *)p2[n][m]: cast to the type pointer to char

5. *(char *)p2[n][m]: dereference the resulting char pointer

The rules are simple. Apply each pointer move in the proper order of precedence. Keep track of levels as you go. Now you are thinking just like the C compiler!

Solving the Puzzle

It's time to solve the puzzle presented in Part 1 and promised at the beginning of this article. Though the puzzle has inordinately complex expressions, the rules of pointer dominos will make short work of the task. Listing 1 contains the puzzle again.

What kind of data structures are you working with in this puzzle? Figure 1 contains a picture of the data.

As you can see in Figure 1, ap is an array of pointers to chars, each pointer aimed at one of five character strings. ap is a level two object. Why? First, because ap is an array, it has intrinsically one level of indirection. But the elements of ap are all pointers, each holding its own level one address. So ap evaluates to the address of a pointer, hence ap is a level two expression.

app is similarly a level three expression. An array of level two pointers evaluates to the address of a level two pointer, hence app lives on level three. ppp is a level three pointer, and pppp is a level four pointer. The relationships between the pointers are illustrated in Figure 1.

Here is the first expression to be printed:

printf("%.*s", 2, *--**pppp);
Do the easy part first. The %.*s format specifier means to fill in the * with the first argument in the argument list following the format string. So you are really asking for %.2s, that is, print the first two characters of the string *--**pppp. How do you unravel *--**pppp? With the rules of pointer dominos!

Start at the identifier, pppp, and apply the operators in order of precedence. Both * and -- are unary operators so they group right to left, as follows:

1. pppp: first the identifier pppp

2. *pppp: right-most * dereference pppp

3. **pppp: second right-most * dereference *pppp

4. --**pppp: unary -- pre-decrement the result

5. *--**pppp: left-most * dereference again

Reading off these steps in order gives a comprehensible recipe for solving this part of the puzzle.

To find the answer, start at the top of the diagram of the puzzle's data, at pppp, and move down two levels, following the arrows. You should be at app[0], the zeroth element in the app array of char ** pointers. app[0] holds the address of ap[4], the last element in the array of char * pointers. But now, the rules say you must apply the unary -- operator to app[0]. Instead of pointing at ap[4], app[0] will now hold the address of ap[3]! This change, by the way, persists, and changes the diagram slightly from that shown in Figure 1. After decrementing app[0], you apply the final dereference or *, and arrive at the contents of ap[3]. This is what you will print with the first expression. Actually, the program prints only the first two characters of the string PORTABLE. So PO appears on the output.

What does the second expression print?

printf("%.*s", 3, *(++*pppp[0] - 4));
This complex expression again uses the %.*s mechanism to pick up the 3 as the number of chars to be printed. In effect, you will print three chars from the address given by the complex expression *(++*pppp[0] - 4).

This one breaks down as follows:

1. pppp: start at pppp

2. pppp[0]: dereference to access the [0]th element

3. *pppp[0]: dereference pppp[0]

4. ++*pppp[0]: pre-increment the result

5. ++*pppp[0] - 4: subtract 4

6. *(++*pppp[0] - 4): dereference again

Handling each operator one step at a time gives you the solution.

The only new trick here is the translation between [] and *. Remember the all-important a[n] == *(a + n) formula and you'll zip through the steps.

Accessing the zeroth element of pppp is the same as dereferencing pppp. a[0] is always the same object as *a. So the subscript [0] and the right-most * bring you down two levels, just as before. Only now you are told to increment the result, namely app[0]. So app[0] now pops right back to where it started in the first place, namely to point to ap[4], the string TOWER!.

Now what? Now, you have to subtract 4 from this pointer. This is where it can get tough. But remember, app is an array of char ** pointers, so app[0] acts like a pointer to a pointer to a char. Subtracting 4 therefore means moving down by the space for four char * pointers. So app[0] - 4 points to ap[0], which addresses the very first string in the puzzle, INTEGER. You have to print three characters from this string. So INT appears on the display right after the PO. The screen says POINT.

The third expression prints the whole of the string addressed by the expression:

++*--*++pppp[0] + 5
This whopper breaks down according to the precedence rules as follows:

1. pppp

2. pppp[0]

3. ++pppp[0]

4. *++pppp[0]

5. --*++pppp[0]

6. *--*++pppp[0]

7. ++*--*++pppp[0]

8. ++*--*++pppp[0] + 5

Here's the explanation.

Again [0] means move down one level, to ppp. Now increment ppp, so ppp points to app[1], not app[0] anymore. Dereference with * means move down to app[1]. Now decrement app[1], so app[1] points to ap[2] from now on, rather than ap[3]. Dereferencing with * brings you down to ap[2]. Now increment ap[2], so the pointer to char, ap[2], points to the E rather than the D of DEBUGGER. Finally, add 5. Since you are now down at level one, the contents of ap[2], adding 5 means add the size of five chars to the pointer. Hence you are finally left pointing at the second E in DEBUGGER. Printing the resulting string, along with a space as the puzzle requires, gives you ER followed by a space.

The screen now says POINTER.

Only two more. By the time you're done you'll never forget how to do this! The fourth printf statement looks like this:

printf("%.*s", 2, *++pppp[0][3] + 3);
You are asked to print two characters from the expression:

*++pppp[0][3] + 3
Here's the breakdown of the steps.

1. pppp

2. pppp[0]

3. pppp[0][3]

4. ++pppp[0][3]

5. *++pppp[0][3]

6. *++pppp[0][3] + 3

Now just read through this breakdown, supplying the interpretation for each step.

[0] again moves down to ppp. Adding a [3] to ppp means two things: go down to the next level, and offset by three elements. So while pppp[0][0] is app[1] (remember, you incremented ppp in the last step), pppp[0][3] is app[4]! Now increment this result, app[4], so app[4] points at ap[1], rather than ap[0]. Dereferencing with the * brings you down to the contents of ap[1]. Adding 3 skips the first three characters of PROPORTION, so you print the PO substring from the middle of PROPORTION.

Now the screen reads POINTER PO.

Time for the last one! See if you can get this one without an explanation. If you get to WER!, you're right. Remember to consider the changes that -- and ++ created in the earlier expressions. Here's how.

printf("%s\n",  (*pppp + 2)[-2][2] + 2);
1. pppp

2. *pppp

3. *pppp + 2

4. (*pppp + 2)[-2]

5. (*pppp + 2)[-2][2]

6. (*pppp + 2)[-2][2] + 2

Dereference pppp to ppp, now still pointing at app[1]. Adding 2 creates a pointer value pointing at app[3]. What does the [-2] subscript do? First it moves you down to the level of app[3], but the -2 means two objects lower in memory, so you move down to app[1], not app[3]. app[1] is still pointing at ap[2], where the -- in the third expression left app[1]. Applying the [2] subscript to this value of app[1] moves you down and over to ap[4], two pointers offset from ap[2]. The last + 2 skips over the first two chars in TOWER!, so you see the last part of the puzzle, WER! on the display.

The screen says POINTER POWER! and you had better believe it!

You may want to work the steps of this puzzle over several times, perhaps drawing some intermediate diagrams as the pointers change. If you can work this puzzle correctly, you have indeed acquired pointer power as a long-term resource for your future C and C++ programs!

Remembering to carefully distinguish the Three Attributes, following the levels on the Ladder of Indirection, and applying the rules of Pointer Dominos will get you to the solution every time. Furthermore, understanding complex pointer behavior will give you the confidence to create your own sophisticated data-handling mechanisms, when and where appropriate. The examples in the puzzle, of course, are not designed to be "good code." They are designed to show that this most powerful part of C is indeed governed by straightforward rules which can always be applied to understand and work with complex pointer declarations and expressions in C.

Now it's on to C++, where void isn't quite so void anymore, where references are not pointers, but sometimes act like them, and where pointers to members are not even addresses in the standard C sense at all!