Part F - Refinements
Character Strings
Design data collections using arrays to manage information efficiently
Stream data using standard library functions to interact with users
"The most common type of array in C is the array of characters"
(Kernighan and Ritchie, 1988)
Although some early programming languages focused on processing
numerical information, most languages include extensive features for
processing textual data. Textual data consists of sequences of characters,
which are often referred to as character strings. The C
language libraries provide facilities for processing character strings,
which are treated as arrays of characters with a special delimiter.
This chapter introduces these C-style strings, highlights their distinguishing feature
and notes the advantage of using strings to pass textual data from
one function to another.
It also covers the conversion specifiers for input and
output of character strings.
Definition (Review)
A string is a char array with a special property:
a terminator element follows the last meaningful character in the string.
We refer to this terminator as the null terminator
and identify it by the escape sequence '\0'.
The null terminator has the value 0 on any host platform:
all of its bits are 0. It occupies the first position in the
ASCII and EBCDIC collating sequences.
The index of the null terminator element equals the number of meaningful characters in
the string.
char name
|  0 |  1 |  2 |  3 |  4 |  5 |  6 |  7 |  8 |  9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 |
|  M |  y |    |  n |  a |  m |  e |    |  i |  s |    |  A |  r |  n |  o |  l |  d | \0 |
The number of memory locations occupied by a string is one more
than the number of meaningful characters in the string.
Allocating Memory
We allocate memory for a string in the same way that we allocate memory
for an array. Since the null terminator is one of the elements in
the array, we allocate memory for one character more than the
number of meaningful characters.
For example, to allocate memory for a string with up to 30 meaningful characters,
we write
    char name[31]; // 30 chars plus 1 char for the terminator
Initializing Memory
To initialize a string at the time of memory allocation, we follow the
definition with the assignment operator and the set of initial characters
enclosed in braces:

    const char name[31] = {'M','y',' ','n','a','m','e',' ','i','s',' ',
                           'A','r','n','o','l','d','\0'};

For a more compact form, we enclose the meaningful characters in double quotes:

    const char name[31] = "My name is Arnold";

The C compiler copies the characters in the string literal into the
character string and
appends the null terminator after the last character copied.
char name
|  0 |  1 |  2 |  3 |  4 |  5 |  6 |  7 |  8 |  9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 | 21 | 22 | 23 | 24 | 25 | 26 | 27 | 28 | 29 | 30 |
|  M |  y |    |  n |  a |  m |  e |    |  i |  s |    |  A |  r |  n |  o |  l |  d | \0 |    |    |    |    |    |    |    |    |    |    |    |    |    |
Since the number of initializers (18) is less than the number of
elements (31) available, the compiler fills the
uninitialized elements with 0's.
String Handling
Arrays of numbers require a separate variable to hold the number of elements
that are filled. Unlike arrays of numbers, character strings do not require
a separate variable for sizing. In iterations on the characters in a string,
we check for the presence of the null terminator in our test conditions.
Iterations
The following program displays the string stored in name[31]
character by character:

    // Iterations on Strings
    // string_iterations.c
    #include <stdio.h>

    int main(void)
    {
        int i;
        const char name[31] = "My name is Arnold";

        for (i = 0; name[i] != '\0'; i++)
            printf("%c", name[i]);
        putchar('\n');
        return 0;
    }

Output:

    My name is Arnold
Functions
Using a character string instead of an array of characters
with a separate sizing variable achieves a more compact
argument list for function calls.
For example,

    // Strings To Functions
    // string_to_function.c
    #include <stdio.h>

    void print(const char name[]);

    int main(void)
    {
        const char name[31] = "My name is Arnold";

        print(name);
        return 0;
    }

    // displays the string character by character, then a newline
    void print(const char name[])
    {
        int i;

        for (i = 0; name[i] != '\0'; i++)
            printf("%c", name[i]);
        putchar('\n');
    }

Output:

    My name is Arnold
Formatted String Input
The scanf() and fscanf()
library functions support conversion specifiers particularly designed
for character string input.
These specifiers are:
- %s - whitespace delimited set
- %[] - rule delimited set
The corresponding argument for these specifiers is the
address of the string to be populated from the input stream.
%s
The %s conversion specifier
- reads all characters until the first whitespace character
- stores the characters read in the char
array identified by the corresponding argument
- stores the null terminator in the char
array after accepting the last character
- leaves the delimiting whitespace character and any subsequent
characters in the input buffer
For example, the call

    char name[31];
    scanf("%s", name);

with the input

    My name is Arnold

stops accepting input after the character y
and stores
char name[31] |
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 |
16 | 17 | 18 | 19 | 20 | 21 | 22 | 23 |
24 | 25 | 26 | 27 | 28 | 29 | 30 |
M | y | \0 | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | |
The characters ' name is Arnold' remain
in the input buffer.
A qualifier on the conversion specifier limits the number of
characters accepted. For instance, %10s
reads no more than 10 characters. The call

    char name[31];
    scanf("%10s", name);

with the input

    Schwartzenegger

stops after the character n and stores
char name[31]
|  0 |  1 |  2 |  3 |  4 |  5 |  6 |  7 |  8 |  9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 | 21 | 22 | 23 | 24 | 25 | 26 | 27 | 28 | 29 | 30 |
|  S |  c |  h |  w |  a |  r |  t |  z |  e |  n | \0 |    |    |    |    |    |    |    |    |    |    |    |    |    |    |    |    |    |    |    |    |
By specifying the maximum number of characters to be read at less than 31,
we ensure that scanf() does not exceed
the memory allocated for the string.
%s discards all leading
whitespace characters.
For example, the call

    char name[31];
    scanf("%10s", name);

with input that is preceded by several spaces

       Schwartzenegger

skips the spaces, stops at n and stores
char name[31]
|  0 |  1 |  2 |  3 |  4 |  5 |  6 |  7 |  8 |  9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 | 21 | 22 | 23 | 24 | 25 | 26 | 27 | 28 | 29 | 30 |
|  S |  c |  h |  w |  a |  r |  t |  z |  e |  n | \0 |    |    |    |    |    |    |    |    |    |    |    |    |    |    |    |    |    |    |    |    |
Because %s discards leading whitespace,
it cannot accept an empty string; that is, %s
does not treat a '\n' in an otherwise empty
input buffer as an empty string. If the buffer only contains
'\n', scanf("%10s", name)
discards the '\n' and waits for
non-whitespace input followed by another
'\n'.
%[ ]
The %[] conversion specifier accepts input
consisting only of a set of pre-selected characters. The brackets contain the admissible
and/or inadmissible characters. The symbol ^ prefaces
the list of inadmissible characters. The symbol - identifies
a range of characters in an inclusive set.
For example, the %[^\n] conversion specifier
- reads all characters until the newline ('\n')
- stores the characters read in the char array
identified by the corresponding argument
- stores the null terminator in the char array after
accepting the last character
- leaves the delimiting character (here, '\n')
in the input buffer
For example, the call

    char name[31];
    scanf("%[^\n]", name);

with the input

    My name is Arnold

accepts the full line and stores
char name[31]
|  0 |  1 |  2 |  3 |  4 |  5 |  6 |  7 |  8 |  9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 | 21 | 22 | 23 | 24 | 25 | 26 | 27 | 28 | 29 | 30 |
|  M |  y |    |  n |  a |  m |  e |    |  i |  s |    |  A |  r |  n |  o |  l |  d | \0 |    |    |    |    |    |    |    |    |    |    |    |    |    |
A qualifier on this conversion specifier, placed before the opening bracket, limits the number of characters accepted.
For instance, %10[^\n] reads no more than 10 characters. The call

    char name[31];
    scanf("%10[^\n]", name);

with the input

    My name is Arnold

stores
char name[31]
|  0 |  1 |  2 |  3 |  4 |  5 |  6 |  7 |  8 |  9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 | 21 | 22 | 23 | 24 | 25 | 26 | 27 | 28 | 29 | 30 |
|  M |  y |    |  n |  a |  m |  e |    |  i |  s | \0 |    |    |    |    |    |    |    |    |    |    |    |    |    |    |    |    |    |    |    |    |
We specify the maximum number of characters as the qualifier to ensure that
scanf() does not store more
characters than room allows.
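%[ ], unlike %s, does not skip leading
whitespace characters: any whitespace character that the set admits is
stored like any other character. To discard leading whitespace, we place
a space before the specifier in the format string.
For example, the call

    char name[31];
    scanf(" %10[^\n]", name);

with input that is preceded by several spaces

       My name is Arnold

stores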
char name[31]
|  0 |  1 |  2 |  3 |  4 |  5 |  6 |  7 |  8 |  9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 | 21 | 22 | 23 | 24 | 25 | 26 | 27 | 28 | 29 | 30 |
|  M |  y |    |  n |  a |  m |  e |    |  i |  s | \0 |    |    |    |    |    |    |    |    |    |    |    |    |    |    |    |    |    |    |    |    |
Caution
Because %[ ] must match at least one character,
it cannot accept an empty string; that is, %[^\n]
does not treat a '\n' in an otherwise empty input buffer
as an empty string. If the input buffer only contains
'\n', scanf("%[^\n]", name),
unlike %s, returns 0, leaves
name unchanged and leaves the '\n' in the input buffer.
Example
Consider a text file named spring.dat that
contains

    Light Jacket
    Long-Sleeved Shirts
    Large Skateboards

The following program reads and displays this data:

    // Reading from a file
    // readFromFile.c
    #include <stdio.h>

    int main(void)
    {
        FILE *fp = NULL;
        char phrase[61];

        fp = fopen("spring.dat", "r");
        if (fp != NULL) {
            // read up to 60 characters of a line, then discard the newline;
            // stop at end of file or at an empty line (fscanf() returns 0),
            // which a test against EOF alone would never get past
            while (fscanf(fp, "%60[^\n]%*c", phrase) == 1)
                printf("%s\n", phrase);
            fclose(fp);
        } else {
            printf("Failed to open file\n");
        }
        return 0;
    }

Output:

    Light Jacket
    Long-Sleeved Shirts
    Large Skateboards
String Output
Formatted Output
The printf() and fprintf()
library functions support the %s conversion
specifier for character string output.
The corresponding argument is the address of the character string or
string literal. Under this specifier printf()
displays all of the characters from the address provided up to but excluding
the null terminator. For example,

    // Displaying Strings
    // displayStrings.c
    #include <stdio.h>

    int main(void)
    {
        const char name[31] = "My name is Arnold";

        printf("%s\n", name);
        return 0;
    }

Output:

    My name is Arnold

The following program writes the same string to a file named alpha.txt:

    // Writing to a File
    // writeToFile.c
    #include <stdio.h>

    int main(void)
    {
        FILE *fp = NULL;
        const char phrase[] = "My name is Arnold";

        fp = fopen("alpha.txt", "w");
        if (fp != NULL) {
            fprintf(fp, "%s\n", phrase);
            fclose(fp);
        } else {
            printf("Failed to open file\n");
        }
        return 0;
    }
Qualifiers
Qualifiers on the %s specifier control the field width,
the justification and the number of characters displayed:
- %20s displays a string right-justified in a field of 20
- %-20s displays a string left-justified in a field of 20
- %20.10s displays the first 10 characters of a string right-justified in a field of 20
- %-20.10s displays the first 10 characters of a string left-justified in a field of 20
Unformatted Output
The puts() and fputs()
library functions output a character string to the standard or
specified output device respectively.
puts
The prototype for puts() is

    int puts(const char *);

The parameter receives the address of the character string to be
displayed.
For example,

    // Displaying Lines
    // puts.c
    #include <stdio.h>

    int main(void)
    {
        const char name[31] = "My name is Arnold";

        puts(name);
        return 0;
    }

Output:

    My name is Arnold
fputs
fputs() writes a null-terminated string to a file.
The prototype for fputs() is:
int fputs(const char *str, FILE *fp);
str receives the address of the string
to be written and fp receives the address of
the FILE object. fputs() returns
a non-negative value if successful; EOF in the event of an error.
Exercises
- Complete the following practice problems