
Pointers to Pointers in C

Why and How

And some suggestions for some good software development practices along the way

This presentation makes extensive use of PowerPoint’s animation capabilities. Many of the slides are incomprehensible (impossibly cluttered) unless the dynamic elements appear and disappear as scripted in the animation. If you are reading these words, you’re not in PowerPoint’s slideshow mode, which is necessary to see the animation. Hit function key F5 to enter the slideshow (animation) mode.

Copyright 2007 by M.S. Jaffe. All rights reserved

The Dreaded Double Asterisk in C (Not So Dreadful, Actually ;-)

First, a review of how and why pointers are used in C to create functions that produce side-effects

Then a discussion and example of why that can lead to declarations with two asterisks in them, like

struct stack_item **ptrToTheTopPtr;


Writing Functions That Produce Side Effects

One of the features of C that requires a little getting used to is that changing the value stored in a local variable inside a function has absolutely no effect on anything outside the function

Even if the local variable or parameter has the same name as another variable defined outside the function (which is a baaad programming practice!), there is absolutely no relationship between them, since they have different scopes

But sometimes we may want one function to change the value stored in a variable defined in another function (e.g., scanf)

That’s called “creating a side effect” (I’ll explain why later) and it requires using pointer variables


Review of Passing Arguments to Functions in C

Remember: the only way C passes argument values into functions is by copying (formally known as pass-by-value)

int funct1(int param1)
{
    /* function body */
}

int main()
{
    int x;
    while (…)
        x = funct1(36*2) ...
}

When it compiles a function definition, the compiler generates code that causes each separate execution of the function to request new cells for the temporary storage of the function’s parameters before the statements in the body of the definition start executing

So each time this function call (or “invocation” or “activation”) of funct1 is executed ...

... this temporary integer cell (param1) gets created/allocated ...

... the expression for each argument in the function call (36*2) is then evaluated ...

... and the resultant value (72) is copied into the cell for the corresponding function parameter

After all arguments have been evaluated and the resultant values copied into the corresponding parameter cells (the value of the first argument, if any, into the cell for the first parameter, the value of the second argument into the cell for the second parameter, and so on), the statements inside the body of the function get executed

After the function’s execution has completed (an explicit return statement is executed or the execution flow hits the ‘}’ at the end of the function definition) …

… the returned value is substituted for the function call (the function’s name* followed by the argument list in parentheses) …

… and all storage local to the function is destroyed (unless the programmer has explicitly declared that it is to be static storage and hence not to be destroyed)

_____________________________________

* Which is why in C a function is said to “return its value through its name”


Passing Pointer Values (Addresses) as Arguments to a Function

The compiler’s mechanism for passing a pointer value into a function is exactly the same as for any other type of value:

• When the function call is executed, a parameter cell to hold a pointer value is created

• An argument gets evaluated to yield a value

• The resultant value (an address, which we illustrate by an arrow, but it’s really just a bit pattern like any other value) gets copied into the correct parameter cell

But what the programmer can do with a copied address/pointer value (i.e., de-reference it!) makes for some powerful new programming capabilities


Passing Pointer Values (Addresses) as Arguments to a Function (cont’d)

The compiler’s mechanism for passing a pointer value into a function is exactly the same as for any other type of value

int funct1(int *param1)
{
    *param1 = 17;
}

int main()
{
    int x = 5;
    ...
    funct1(&x) ...
}

When this statement (int x = 5;) gets executed ...

... this cell (x) gets created and initialized

Whenever this function call gets executed ...

... this (integer pointer type) cell (param1) gets created …

... then this argument (&x) gets evaluated ...

... and its value is an address, or pointer value, pointing to this cell (x) ...

This address (pointer) value is then copied ...

... into this cell (param1) ...

... so this cell now contains the address of (points to) the cell named ‘x’ back in the main function

Note that what’s stored here is in reality just a binary bit pattern that will be interpreted as an address, not as anything symbolic like ‘&x’, so even if there were a local variable named x here in funct1 (and that’s a very bad programming practice, since it makes the code potentially quite confusing) this cell would contain the address for the x back in the main function

Now when this statement (*param1 = 17;) gets executed ...

... the value in main’s cell named x is changed (from 5 to 17); so funct1 has produced a side-effect


Summary of Values and Side Effects

It’s called a side-effect because the original “purist” view of functions came from mathematics, where sin(x) is just a mathematical expression designating a value, not a programming entity that might actually do something

A “pure” function in C just computes and returns a value but doesn't affect anything outside of itself during its computation (i.e., it has no side effects)

Side effects in the calling function occur only if you specifically program for them using pointers — but the mechanism for passing pointer/address values as arguments to a function is the same as for passing any other kind of value


Summary of Programming for Side Effects

To make a function able to change the value stored in a variable declared in a function that calls it, the calling function must supply the address of the variable it wants changed as an argument to the function call

The called function must:

1. Have a pointer of the correct pointer type as the parameter receiving a copy of the address whose content it's supposed to change

2. Dereference that parameter to make the change back in the calling function

int funct1(int *param1){ *param1 = ... ; }

int main(){ int x; ... funct1(&x) ...}


Summary of Values and Side Effects (cont'd)

Don’t be confused by the jargon: sometimes a side-effect is the whole point of the function. Take scanf: we want it to read some keyboard input, convert the ASCII codes into some other value (e.g., a floating point value) according to the format descriptor, and store the resultant value not in a variable local to the scanf function but in a variable declared in the function that called it. That is, by definition, a side-effect, even though it’s the whole point of the scanf function in the first place

“Side effect” is software technical jargon here, not a medical or military damage assessment judgment

Remember: the purpose of a function can be to compute and return a value or produce side effects, or both


Pointers to Pointers

Suppose we want a function to change the value of a pointer variable declared in a calling function? For example, let’s examine the code for a function to push an integer onto a stack of integers implemented as a linked list

Let’s assume the following structure has been globally* defined:

typedef struct stack_element {
    int an_integer;
    struct stack_element *next_in_stack;
} STACK_ITEM;

* It has to be global so that different functions can use it to declare local variables of the same (user-defined) type. Global definitions are fine; it’s global variables whose usage is discouraged – in SE300 we’ll talk about why

• Let’s assume there’s a function that creates the stack and any new records (structures) for pushing onto the stack but calls separate functions to actually do the pushing or popping operations

Visually, we’ll portray a structure of this type like so

• Note: I’m putting the structure definition in a typedef to improve readability

• That’s the reason (and an important one) that we use typedef’s in the real world, too

• But anywhere we see STACK_ITEM, we could equally well write struct stack_element; they are completely interchangeable, although it would seem stylistically improper to use both in the same program

The integer component (that we’ll totally ignore in this example)

The pointer component

Note: The use of ALL CAPS for the alias provided by a user defined typedef is a widely observed stylistic custom, not a requirement of the language C itself


Pointers to Pointers (cont’d)

The “creating” function can create an initially empty stack just by declaring a NULL pointer to it: STACK_ITEM *top1 = NULL;

It will also have to have the right kind of pointer to hold the address returned by malloc: STACK_ITEM *new_item;

Now whenever it wants to create a new item for the stack, it makes a call to malloc:

new_item = malloc(sizeof(STACK_ITEM));

And to push the new item, we might want to simply call

push(top1, new_item);

but that won’t actually work, and we need to understand why

Note: By not casting the result from the malloc, I am assuming that the program has used #include <stdlib.h>. This is not quite a religious issue, but the reason is complicated and beyond our scope here; consult the web if you’re interested


Let’s look at some simple (but incorrect) code for the push:

void push(STACK_ITEM *the_top, STACK_ITEM *newguy)
{
    newguy->next_in_stack = the_top;
    the_top = newguy;
}

int main()
{
    STACK_ITEM *top1 = NULL;
    STACK_ITEM *new_item;
    new_item = malloc(sizeof(STACK_ITEM));
    push(top1, new_item);
}

Now let’s look at an animation similar to the previous ones we’ve seen

When this declaration (of top1) is processed …

… an empty stack is created

When this declaration (of new_item) is processed …

… another pointer is created

Note:

• From the standpoint of C, these two cells are the same type of cell (STACK_ITEM *)

• The reason I said top1 = NULL created an empty stack but new_item is just a pointer is that our code intends to treat top1 as pointing to a stack whereas new_item will not be manipulated that way

• But C doesn’t know anything about stacks; to it, top1 and new_item are the same sort of thing – pointers to the same type of object; it is our manipulating them differently which makes top1 a pointer to the top of our stack whereas new_item to us is just an ordinary pointer temporarily pointing to the thing we get from malloc until we can push it onto the stack

new_item is promptly set to point to the address returned by malloc

Note that although new_item itself is local to main, the cell obtained from malloc is not (take CS420 to learn more about memory management ;-)

And we’re now ready to try to push the new STACK_ITEM onto the stack

When push is called …

… the local cells for its parameters (the_top and newguy) are created

• Note that I did not name this parameter top1, the name of the corresponding argument on the previous slide

• It is a bad habit to use the same name for a parameter as will be used for an argument provided to a function by some calling function

• C doesn’t care at all, of course, the difference between an argument and a parameter being quite clear to the compiler; but it’s all too easy to confuse yourself and any other reader of your code as to what’s going to get changed where if you re-use names improperly

The first argument is then evaluated and the resultant value is copied into the cell for the first parameter

Then the second argument is evaluated and the resultant value is copied into the cell for the second parameter

• Remember: Although the arrow copied just now looks different from the original, that visual difference is just an artifact of the picture, since the copied arrow now originates from a different place (newguy vice new_item)

• The reality is that the address value depicted by the arrow is what was copied and its bit pattern was copied exactly (without distortion); it points to (the arrowhead touches) the same cell as the original

After the parameters have been created and initialized, processing starts with the first statement in the function body

The new STACK_ITEM is inserted in “front” of the old top by making it point to the old top …

… and the_top point to the new STACK_ITEM (pointed to by newguy)

Here’s the stack we wanted to wind up with; the problem is that the_top is going to disappear when we exit from the push function …

… and the real pointer to the top of the stack has not been updated to point to the newest STACK_ITEM






The Problem: We Needed a Side-Effect (and Didn’t Get One)

The problem of course is that inside push, we didn’t want to merely change our local copy of the value of the true top, top1, which is all that this statement did …

… we wanted to change top1 itself, which would mean we needed push to create a side effect






We Know How to Do That

If the calling function wants the called function (push, in this case) to change one of the calling function’s local variables (top1, in this case), the calling function must pass the called function the address of the local variable it wants changed (hence the & in the call)

The called function, of course, now can’t use a parameter of the same type as the one it’s supposed to change; it needs a pointer to that type (hence the extra * in the parameter declaration) …

… so it can de-reference its local cell which contains the address of the cell it is really supposed to change

Note that now even when we want merely to refer to the value of the top but not change it, we must still de-reference it as well. If we were passed a pointer to integer value, we would have to de-reference the pointer to refer to the actual integer value, would we not? Pointer/address values are no different.

Note that newguy doesn’t need to be a pointer to a pointer, since it’s not going to be used to change anything in the calling program …

… it just holds the local copy of the (address) value we want to set top1 to point to – the address of the new STACK_ITEM being pushed onto the stack

We don’t want to change the contents of new_item until the next time we call malloc to create a new cell


void push(STACK_ITEM **the_top, STACK_ITEM *newguy)
{
    newguy->next_in_stack = *the_top;
    *the_top = newguy;
}

int main()
{
    STACK_ITEM *top1 = NULL;
    STACK_ITEM *new_item;
    new_item = malloc(sizeof(STACK_ITEM));
    push(&top1, new_item);
}

Creating The Side Effect

Now when push is called …

… these local cells (the_top and newguy) are created

Note that this cell (the_top) is a completely different type from the cells or components we’ve created up until now

Then this expression (&top1) is evaluated …

… and the resultant address value is copied into this cell (the_top)

Then this expression (new_item) is evaluated …

… and the resultant address value …

… is copied into this cell (newguy)

Then this expression (*the_top) is evaluated (the_top is de-referenced) …

… so this value (the current value of top1) …

… is stored in newguy->next_in_stack

Then this expression (newguy) is evaluated …

… so this pointer …

… is stored in the cell pointed to by the_top (that is, in top1)

The push function is now complete …

… and its local cells are destroyed as part of the return processing

And we have successfully pushed a new STACK_ITEM onto the stack whose top element is pointed to by the pointer top1

The ** in the type name was necessary to allow push to change a pointer, top1, declared back in the calling function


A Note About the Push Logic (Nothing to Do With Pointers)

int main()
{
    STACK_ITEM *top1 = NULL;
    STACK_ITEM *new_item;
    new_item = malloc(sizeof(STACK_ITEM));
    push(&top1, new_item);
}

• Although it is not obvious, we got lucky this time: This push logic will work correctly for subsequent pushes as well as the first one illustrated here

• The best way to see that is to step through the logic just as we did here, drawing and updating the diagrams as you go

• If we’d not been lucky, we’d have to modify the push logic to provide ‘if’ statements to handle as many special cases as were needed


Another Push

The second call to the push function has successfully pushed the newest stack_item onto the stack whose top is pointed to by top1


A Note on Names




• The choice of informative variable names is particularly important in this sort of program

• The names in the example here are not in fact very good; they were chosen as much or more for brevity than accuracy, since I needed to minimize the number of characters per line or the font would have been unreadably small.

top1 should really be named something like “ptr_to_top_of_stack”, since it isn’t actually the top element in the stack; it just points to the top element

Similarly, new_item might better be named “ptr_to_new_item” …

… and newguy could be “ptr_to_newguy”

But the_top needs a more complicated name, perhaps something like “ptr_to_top_ptr”, to better highlight the importance of the fact that it is not itself a pointer to a STACK_ITEM but a pointer to a pointer that will be changed


A Good Software Development Practice

“Complex Systems That Succeed Are Invariably Found to Have Evolved From Simpler Systems That Succeeded”

Unless you get better with pointers, recursive structures, and pointers to pointers than I am, I wouldn’t recommend setting out to write complex programs from scratch with pointers to pointers

Instead, modularize your code properly – e.g., push and pop as separate functions – but initially code them without all the parameters they’ll use eventually and just declare your key variables, such as the pointer to the top of a stack, to be global (not passed as arguments) until you get the basic logic for the data structure working

Once you know your basic logic for manipulating the data structure(s) works, then modify your function definitions to accept the necessary parameters and eliminate your use of globals (and I’d do that one key parameter at a time, myself)


A Good Software Development Practice (cont’d)

Trying to debug both the basic logic of a complex data structure with complex operations and the dereferencing of pointers to pointers to structures containing pointers gives you too much to worry about – you can never be sure if your program is malfunctioning because your understanding of the data structure or its algorithms is wrong or because of mechanical difficulties with the complex syntax of C

“Simplification of concerns” or “development in small steps” or “build a little, test a little” is another good software development practice – the fewer the number of things that could go wrong in each step, the fewer will and the quicker you’ll be able to pinpoint what you were doing wrong
