Part F - Refinements

Character Strings

Design data collections using arrays to manage information efficiently
Stream data using standard library functions to interact with users

"The most common type of array in C is the array of characters"
(Kernighan and Ritchie, 1988)

Definition | String Handling | Formatted String Input | String Output | Exercises



Although some early programming languages focused on processing numerical information, most languages include extensive features for processing textual data.  Textual data consists of sequences of characters, which are often referred to as character strings.  The C language libraries provide facilities for processing character strings, which are treated as arrays of characters with a special delimiter.

This chapter introduces these C-style strings, highlights their distinguishing feature and notes the advantage of using strings to pass textual data from one function to another.  This chapter also describes the conversion specifiers for input and output of character strings.


Definition (Review)

A string is a char array with a special property: a terminator element follows the last meaningful character in the string.  We refer to this terminator as the null terminator and identify it by the escape sequence '\0'.

[Figure: a char array in which the element after the last meaningful character holds \0]

The null terminator has the value 0 on any host platform (in its collating sequence).  All of its bits are 0's.  The null terminator occupies the first position in the ASCII and EBCDIC collating sequences.

The index identifying the null terminator element is the same as the number of meaningful characters in the string. 

 char name
 index:  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17
 value:  M  y     n  a  m  e     i  s     A  r  n  o  l  d \0

The number of memory locations occupied by a string is one more than the number of meaningful characters in the string. 

Allocating Memory

We allocate memory for a string in the same way that we allocate memory for an array.  Since the null terminator is one of the elements in the array, we allocate memory for one character more than the number of meaningful characters. 

For example, to allocate memory for a string with up to 30 meaningful characters, we write

 char name[31]; // 30 chars plus 1 char for the terminator 

Initializing Memory

To initialize a string at the time of memory allocation, we follow the definition with the assignment operator and the set of initial characters enclosed in braces. 

 const char name[31] = {'M','y',' ','n','a','m','e',' ','i','s',' ', 
                      'A','r','n','o','l','d','\0'};

For a more compact form, we enclose the list of meaningful characters in double quotes.

 const char name[31] = "My name is Arnold";

The C compiler copies the characters in the string literal into the character string and appends the null-byte terminator after the last character copied. 

 char name[31]
 index:  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30
 value:  M  y     n  a  m  e     i  s     A  r  n  o  l  d \0

Since the number of initializers (18) is less than the number of elements (31) available, the compiler fills the uninitialized elements with 0's. 


String Handling

Arrays of numbers require a separate variable to hold the number of elements that are filled.  Unlike arrays of numbers, character strings do not require a separate variable for sizing.  In iterations on the characters in a string, we check for the presence of the null terminator in our test conditions. 

Iterations

The following program displays the string stored in name[31] character by character:

 // Iterations on Strings
 // string_iterations.c

 #include <stdio.h>

 int main(void)
 {
         int i;
         const char name[31] = "My name is Arnold"; 

         for (i = 0; name[i] != '\0'; i++)
                 printf("%c", name[i]);
         putchar('\n');

         return 0;
 }

The above program produces the following output:

 My name is Arnold

Functions

Using a character string instead of an array of characters with a separate sizing variable achieves a more compact argument list for function calls.  For example,

 // Strings To Functions
 // string_to_function.c

 #include <stdio.h>
 void print(const char name[]);

 int main(void)
 {
         const char name[31] = "My name is Arnold"; 

         print(name);
         return 0;
 }

 void print(const char name[])
 {
         int i;

         for (i = 0; name[i] != '\0'; i++)
                 printf("%c", name[i]);
         putchar('\n');
 }

The above program produces the following output:

 My name is Arnold

Formatted String Input

The scanf() and fscanf() library functions support conversion specifiers particularly designed for character string input.  These specifiers are:

  • %s - whitespace delimited set
  • %[] - rule delimited set

The corresponding argument for these specifiers is the address of the string to be populated from the input stream. 

%s

The %s conversion specifier

  • reads all characters until the first whitespace character
  • stores the characters read in the char array identified by the corresponding argument
  • stores the null terminator in the char array after accepting the last character
  • leaves the delimiting whitespace character and any subsequent characters in the input buffer

For example,

         char name[31];
         scanf("%s", name);

 My name is Arnold

stops accepting input after the character y and stores

 char name[31]
 index:  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30
 value:  M  y \0

The characters ' name is Arnold' remain in the input buffer.

A width qualifier on the conversion specifier limits the number of characters accepted.  For instance, %10s reads no more than 10 characters:

         char name[31];
         scanf("%10s", name);

 Schwartzenegger

stopping after the character n and storing

 char name[31]
 index:  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30
 value:  S  c  h  w  a  r  t  z  e  n \0

By specifying the maximum number of characters to be read at less than 31, we ensure that scanf() does not exceed the memory allocated for the string.

%s discards all leading whitespace characters.

For example,

         char name[31];
         scanf("%10s", name);

             Schwartzenegger

stops at n and stores

 char name[31]
 index:  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30
 value:  S  c  h  w  a  r  t  z  e  n \0

Because %s discards leading whitespace, it cannot accept an empty string; that is, %s does not treat a '\n' in an otherwise empty input buffer as an empty string.  If the buffer only contains '\n', scanf("%10s", name) discards the '\n' and waits for non-whitespace input followed by another '\n'.

%[ ]

The %[] conversion specifier accepts input consisting only of a set of pre-selected characters.  The brackets contain the admissible and/or inadmissible characters.  The symbol ^ prefaces the list of inadmissible characters.  The symbol - identifies a range of characters in an inclusive set. 

For example, the %[^\n] conversion specifier

  • reads all characters until the newline ('\n')
  • stores the characters read in the char array identified by the corresponding argument
  • stores the null terminator in the char array after accepting the last character
  • leaves the delimiting character (here, '\n') in the input buffer

For example,

         char name[31];
         scanf("%[^\n]", name);

 My name is Arnold

accepts the full line and stores

 char name[31]
 index:  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30
 value:  M  y     n  a  m  e     i  s     A  r  n  o  l  d \0

A qualifier on this conversion specifier before the opening bracket limits the number of characters accepted.  For instance, %10[^\n] reads no more than 10 characters. 

         char name[31];
         scanf("%10[^\n]", name);

 My name is Arnold

stores

 char name[31]
 index:  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30
 value:  M  y     n  a  m  e     i  s \0

We specify the maximum number of characters as the qualifier to ensure that scanf() does not store more characters than room allows.

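%[ ], unlike %s, does not discard leading whitespace characters; it stores whitespace like any other admissible character.  To discard leading whitespace, we insert a space before the specifier in the format string.

For example,

         char name[31];
         scanf(" %10[^\n]", name);

             My name is Arnold

discards the leading spaces and stores

 char name[31]
 index:  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30
 value:  M  y     n  a  m  e     i  s \0

Caution

Because %[ ] must match at least one character, it cannot accept an empty string; that is, %[^\n] does not treat a '\n' in an otherwise empty input buffer as an empty string.  If the input buffer only contains '\n', scanf("%[^\n]", name), unlike scanf("%s", name), does not wait for more input: it returns 0 and leaves name unchanged.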

Example

Consider a text file named spring.dat that contains

 Light Jacket
 Long-Sleeved Shirts
 Large Skateboards

The following program reads and displays this data

 // Reading from a file
 // readFromFile.c

 #include <stdio.h>

 int main(void)
 {
         FILE *fp = NULL;
         char phrase[61];

         fp = fopen("spring.dat","r");
         if (fp != NULL) {
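                 // == 1 ends the loop at end of file and at an empty
                 // line, where fscanf() returns 0 rather than EOF
                 while (fscanf(fp, "%60[^\n]%*c", 
                  phrase) == 1)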
                         printf("%s\n", phrase);
                 fclose(fp);
         } else {
                 printf("Failed to open file\n"); 
         }
         return 0;
 }

The above program produces the following output:

 Light Jacket
 Long-Sleeved Shirts
 Large Skateboards

String Output

Formatted Output

The printf() and fprintf() library functions support the %s conversion specifier for character string output.  The corresponding argument is the address of the character string or string literal.  Under this specifier printf() displays all of the characters from the address provided up to but excluding the null terminator.  For example,

 // Displaying Strings
 // displayStrings.c

 #include <stdio.h>

 int main(void)
 {
         const char name[31] = "My name is Arnold"; 

         printf("%s\n", name);

         return 0;
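 }

The above program produces the following output:

 My name is Arnold

The following program writes the same string to a file named alpha.txt:
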
 // Writing to a File
 // writeToFile.c

 #include <stdio.h>

 int main(void)
 {
         FILE *fp = NULL;
         const char phrase[] = "My name is Arnold";

         fp = fopen("alpha.txt","w");
         if (fp != NULL) {
                 fprintf(fp, "%s\n", phrase);
                 fclose(fp);
         } else {
                 printf("Failed to open file\n"); 
         }
         return 0;
 }

Qualifiers

Qualifiers on the %s specifier control the field width, the justification and the maximum number of characters displayed:

  • %20s displays a string right-justified in a field of 20
  • %-20s displays a string left-justified in a field of 20
  • %20.10s displays the first 10 characters of a string right-justified in a field of 20
  • %-20.10s displays the first 10 characters of a string left-justified in a field of 20

Unformatted Output

The puts() and fputs() library functions output a character string to the standard or specified output device respectively. 

puts

The prototype for puts() is

 int puts(const char *);

The parameter receives the address of the character string to be displayed.  puts() appends a newline after the last character of the string.  For example,

 // Displaying Lines
 // puts.c

 #include <stdio.h>

 int main(void)
 {
         const char name[31] = "My name is Arnold"; 

         puts(name);

         return 0;
 }

The above program produces the following output:

 My name is Arnold

fputs

fputs() writes a null-terminated string to a file.  The prototype for fputs() is:

 int fputs(const char *str, FILE *fp);

str receives the address of the string to be written and fp receives the address of the FILE object.  fputs() returns a non-negative value if successful; EOF in the event of an error.


Exercises






  Designed by Chris Szalwinski

Creative Commons License