Arrays of Structures
Consider writing a program to count the occurrences of each C keyword. We need an array of character strings to hold the names, and an array of integers for the counts. One possibility is to use two parallel arrays, keyword and keycount, as in
char *keyword[NKEYS];
int keycount[NKEYS];
But the very fact that the arrays are parallel suggests a different organization, an array of structures. Each keyword is a pair:
char *word;
int count;
and there is an array of pairs. The structure declaration:
struct key {
char *word;
int count;
} keytab[NKEYS];
declares a structure type key
, defines an array keytab
of structures of this type, and sets aside storage for them. Each element of the array is a structure. This could also be written:
struct key {
char *word;
int count;
};
struct key keytab[NKEYS];
Since the structure keytab
contains a constant set of names, it is easiest to make it an external variable and initialize it once and for all when it is defined. The structure initialization is analogous to earlier ones - the definition is followed by a list of initializers enclosed in braces:
struct key {
char *word;
int count;
} keytab[] = {
"auto", 0,
"break", 0,
"case", 0,
"char", 0,
"const", 0,
"continue", 0,
"default", 0,
/* ... */
"unsigned", 0,
"void", 0,
"volatile", 0,
"while", 0
};
The initializers are listed in pairs corresponding to the structure members. It would be more precise to enclose the initializers for each "row" or structure in braces, as in:
{ "auto", 0 },
{ "break", 0 },
{ "case", 0 },
...
but inner braces are not necessary when the initializers are simple variables or character strings, and when all are present. As usual, the number of entries in the array keytab
will be computed if the initializers are present and the []
is left empty.
The keyword counting program begins with the definition of keytab
. The main
routine reads the input by repeatedly calling a function getword
that fetches one word at a time. Each word is looked up in keytab
with a version of the binary search function that we wrote in Chapter 3. The list of keywords must be sorted in increasing order in the table.
#include <stdio.h>
#include <ctype.h>
#include <string.h>
#define MAXWORD 100
int getword(char *, int);
int binsearch(char *, struct key *, int);
/* count C keywords */
main() {
int n;
char word[MAXWORD];
while (getword(word, MAXWORD) != EOF)
if (isalpha(word[0]))
if ((n = binsearch(word, keytab, NKEYS)) >= 0)
keytab[n].count++;
for (n = 0; n < NKEYS; n++)
if (keytab[n].count > 0)
printf("%4d %s\n", keytab[n].count, keytab[n].word);
return 0;
}
/* binsearch: find word in tab[0]...tab[n-1] */
int binsearch(char *word, struct key tab[], int n) {
int cond;
int low, high, mid;
low = 0;
high = n - 1;
while (low <= high) {
mid = (low + high) / 2;
if ((cond = strcmp(word, tab[mid].word)) < 0)
high = mid - 1;
else if (cond > 0)
low = mid + 1;
else
return mid;
}
return -1;
}
We will show the function getword
in a moment; for now it suffices to say that each call to getword
finds a word, which is copied into the array named as its first argument.
The quantity NKEYS
is the number of keywords in keytab
. Although we could count this by hand, it's a lot easier and safer to do it by machine, especially if the list is subject to change. One possibility would be to terminate the list of initializers with a null pointer, then loop along keytab
until the end is found.
But this is more than is needed, since the size of the array is completely determined at compile time. The size of the array is the size of one entry times the number of entries, so the number of entries is just:
*size of* keytab / *size of* struct key
C provides a compile-time unary operator called sizeof
that can be used to compute the size of any object. The expressions:
sizeof object
and
sizeof(type name)
yield an integer equal to the size of the specified object or type in bytes. (Strictly, sizeof
produces an unsigned integer value whose type, size_t
, is defined in the header <stddef.h>
.) An object can be a variable or array or structure. A type name can be the name of a basic type like int or double, or a derived type like a structure or a pointer.
In our case, the number of keywords is the size of the array divided by the size of one element. This computation is used in a #define
statement to set the value of NKEYS
:
#define NKEYS (sizeof keytab / sizeof(struct key))
Another way to write this is to divide the array size by the size of a specific element:
#define NKEYS (sizeof keytab / sizeof(keytab[0]))
This has the advantage that it does not need to be changed if the type changes.
A sizeof
can not be used in a #if
line, because the preprocessor does not parse type names. But the expression in the #define
is not evaluated by the preprocessor, so the code here is legal.
Now for the function getword
. We have written a more general getword
than is necessary for this program, but it is not complicated. getword
fetches the next "word" from the input, where a word is either a string of letters and digits beginning with a letter, or a single nonwhite space character. The function value is the first character of the word, or EOF for end of file, or the character itself if it is not alphabetic.
/* getword: get next word or character from input */
int getword(char *word, int lim)
{
int c, getch(void);
void ungetch(int);
char *w = word;
while (isspace(c = getch()))
;
if (c != EOF)
*w++ = c;
if (!isalpha(c)) {
*w = '\0';
return c;
}
for ( ; --lim > 0; w++)
if (!isalnum(*w = getch())) {
ungetch(*w);
break;
}
*w = '\0';
return word[0];
}
getword
uses the getch
and ungetch
that we wrote in Chapter 4. When the collection of an alphanumeric token stops, getword
has gone one character too far. The call to ungetch
pushes that character back on the input for the next call. getword
also uses isspace
to skip whitespace, isalpha
to identify letters, and isalnum
to identify letters and digits; all are from the standard header <ctype.h>
.
Exercise 6-1. Our version of getword
does not properly handle underscores, string constants, comments, or preprocessor control lines. Write a better version.