Thursday, July 1, 2010

Arrays and pointers Questions

1. I had the definition char x[6] in one source file, and in another I declared extern char *x. Why didn't it work?

A: The declaration extern char *x simply does not match the actual definition. The type "pointer-to-type-T" is not the same as "array-of-type-T." Use extern char x[].

References: CT&P Sec. 3.3 pp. 33-4, Sec. 4.5 pp. 64-5.

2. But I heard that char x[] was identical to char *x.

A: Not at all. (What you heard has to do with formal parameters to functions; see question 19.) Arrays are not pointers. The array declaration "char a[6];" requests that space for six characters be set aside, to be known by the name "a." That is, there is a location named "a" at which six characters can sit. The pointer declaration "char *p;" on the other hand, requests a place which
holds a pointer. The pointer is to be known by the name "p," and can point to any char (or contiguous array of chars) anywhere.

As usual, a picture is worth a thousand words. The statements
char a[] = "hello";
char *p = "world";

would result in data structures which could be represented like this:
+---+---+---+---+---+---+
a: | h | e | l | l | o |\0 |
+---+---+---+---+---+---+

+-----+ +---+---+---+---+---+---+
p: | *======> | w | o | r | l | d |\0 |
+-----+ +---+---+---+---+---+---+

3. You mean that a reference like x[3] generates different code depending on whether x is an array or a pointer?

A: Precisely. Referring back to the sample declarations in the previous question, when the compiler sees the expression a[3], it emits code to start at the location "a," move three past it, and fetch the character there. When it sees the expression p[3], it emits code to start at the location "p," fetch the pointer value there, add three to the pointer, and finally fetch the character pointed to. In the example above, both a[3] and p[3] happen to be the character 'l', but the compiler gets there differently. (See also question 98.)

4. So what is meant by the "equivalence of pointers and arrays" in C?

A: Much of the confusion surrounding pointers in C can be traced to a misunderstanding of this statement. Saying that arrays and pointers are "equivalent" does not by any means imply that they are interchangeable.

"Equivalence" refers to the following key definition:
An lvalue of type array-of-T which appears in an expression decays (with three exceptions) into a pointer to its first element; the type of the resultant pointer is pointer-to-T.

(The exceptions are when the array is the operand of the sizeof() operator or of the & operator, or is a literal string initializer for a character array.)

As a consequence of this definition, there is not really any difference in the behavior of the "array subscripting" operator [] as it applies to arrays and pointers. In an expression of the form a[i], the array reference "a" decays into a pointer, following the rule above, and is then subscripted exactly as would be a pointer variable in the expression p[i]. In either case, the expression
x[i] (where x is an array or a pointer) is, by definition, exactly equivalent to *((x)+(i)).
References: K&R I Sec. 5.3 pp. 93-6; K&R II Sec. 5.3 p. 99; H&S Sec. 5.4.1 p. 93; ANSI Sec. 3.3.2.1, Sec. 3.3.6 .

5. Then why are array and pointer declarations interchangeable as function formal parameters?

A: Since arrays decay immediately into pointers, an array is never actually passed to a function. Therefore, any parameter declarations which "look like" arrays, e.g.
f(a)
char a[];

are treated by the compiler as if they were pointers, since that is what the function will receive if an array is passed:
f(a)
char *a;

Th1is conversion holds only within function formal parameter declarations, nowhere else. If this conversion bothers you, avoid it; many people have concluded that the confusion it causes outweighs the small advantage of having the declaration "look like" the call and/or the uses within the function.

References: K&R I Sec. 5.3 p. 95, Sec. A10.1 p. 205; K&R II Sec. 5.3 p. 100, Sec. A8.6.3 p. 218, Sec. A10.1 p. 226; H&S Sec. 5.4.3 p. 96; ANSI Sec. 3.5.4.3, Sec. 3.7.1, CT&P Sec. 3.3 pp. 33-4.

6. Someone explained to me that arrays were really just constant pointers.

A: That person did you a disservice. An array name is "constant" in that it cannot be assigned to, but an array is _not_ a pointer, as the discussion and pictures in question 16 should make clear.

7. I came across some "joke" code containing the "expression" 5["abcdef"] . How can this be legal C?

A: Yes, Virginia, array subscripting is commutative in C. This curious fact follows from the pointer definition of array subscripting, namely that a[e] is exactly equivalent to *((a)+(e)), for _any_ expression e and primary expression a, as long as one of them is a pointer expression and one is integral. This unsuspected commutativity is often mentioned in C texts as if it were something
to be proud of, but it finds no useful application outside of the Obfuscated C Contest (see question 95).

8. My compiler complained when I passed a two-dimensional array to a routine expecting a pointer to a pointer.

A: The rule by which arrays decay into pointers is not applied recursively. An array of arrays (i.e. a two-dimensional array in C) decays into a pointer to an array, not a pointer to a pointer. Pointers to arrays can be confusing, and must be treated carefully. (The confusion is heightened by the existence of incorrect compilers, including some versions of pcc and pcc-derived lint's, which improperly accept assignments of multi-dimensional arrays to multi-level pointers.) If you are passing a two-dimensional array to a function:
int array[YSIZE][XSIZE];
f(array);

the function's declaration should match:
f(int a[][XSIZE]) {...}

or
f(int (*ap)[XSIZE]) {...} /* ap is a pointer to an array */

In the first declaration, the compiler performs the usual implicit parameter rewriting of "array of array" to "pointer to array;" in the second form the pointer declaration is explicit. Since the called function does not allocate space for the array, it does not need to know the overall size, so the number of "rows," YSIZE, can be omitted. The "shape" of the array is still important, so the "column" dimension XSIZE (and, for 3- or more dimensional arrays, the intervening ones) must be included.

If a function is already declared as accepting a pointer to a pointer, it is probably incorrect to pass a two-dimensional array directly to it.

9. How do I declare a pointer to an array?

A: Usually, you don't want to. Consider using a pointer to one of the array's elements instead. Arrays of type T decay into pointers to type T, which is convenient; subscripting or incrementing the resultant pointer access the individual members of the array. True pointers to arrays, when subscripted or incremented, step over entire arrays, and are generally only useful when operating on multidimensional arrays, if at all. (See question 22 above.) When people speak casually of a pointer to an array, they usually mean a pointer to its first element.

If you really need to declare a pointer to an entire array, use something like "int (*ap)[N];" where N is the size of the array. (See also question 66.) If the size of the array is unknown, N can be omitted, but the resulting type, "pointer to array of unknown size," is useless.

10. How can I dynamically allocate a multidimensional array?

A: It is usually best to allocate an array of pointers, and then initialize each pointer to a dynamically allocated "row." The resulting "ragged" array can save space, although it is not necessarily contiguous in memory as a real array would be. Here is a two-dimensional example:
int **array = (int **)malloc(nrows * sizeof(int *));
for(i = 0; i < nrows; i++)
array[i] = (int *)malloc(ncolumns * sizeof(int));

(In "real" code, of course, malloc should be declared correctly, and each return value checked.)

You can keep the array's contents contiguous, while making later reallocation of individual rows difficult, with a bit of explicit pointer arithmetic:
int **array = (int **)malloc(nrows * sizeof(int *));
array[0] = (int *)malloc(nrows * ncolumns * sizeof(int));
for(i = 1; i < nrows; i++)
array[i] = array[0] + i * ncolumns;

In either case, the elements of the dynamic array can be accessed with normal-looking array subscripts: array[i][j].

If the double indirection implied by the above schemes is for some reason unacceptable, you can simulate a two-dimensional array with a single, dynamically allocated one-dimensional array:
int *array = (int *)malloc(nrows * ncolumns * sizeof(int));

However, you must now perform subscript calculations manually, accessing the i,jth element with array[i * ncolumns + j]. (A macro can hide the explicit calculation, but invoking it then requires parentheses and commas, which don't look exactly like multidimensional array subscripts.)

11. I have a char * pointer that happens to point to some ints, and I want to step it over them. Why doesn't
((int *)p)++;
work?

A: In C, a cast operator does not mean "pretend these bits have a different type, and treat them accordingly;" it is a conversion operator, and by definition it yields an rvalue, which cannot be assigned to, or incremented with ++. (It was an anomaly in certain versions of pcc that expressions such as the above were ever accepted.) Say what you mean: use
p = (char *)((int *)p + 1);

No comments:

Post a Comment