Part E - Secondary Storage
Text Files
Stream data using standard library functions to access persistent text
"The next step is to write a program that accesses a file that is
not already connected to the program" (Kernighan and Ritchie, 1988)
Secondary storage retains its information when a computer is turned off
and provides a mechanism for holding information beyond the execution of
a program. Information in secondary storage can be accessed later
by the same or a different program. This information resides in
secondary memory in the form of files.
This chapter describes how to connect a program to a file, how to store
information in that file and how to retrieve that information.
Files
A file is a named area of secondary storage. The file
may be fragmented; that is, it may consist of several parts stored
at different non-contiguous locations on the storage device.

The byte is the fundamental storage unit of a file. The
distinguishing feature of a file is its end-of-file mark.
We refer to this mark as EOF.
In C, EOF is a negative integer constant defined in <stdio.h>;
it typically has the value -1.
Text Format
A file holds information in either of two formats:
- text - readable and editable data
- binary - executable program code (beyond the scope of these notes)
Data stored in text format is suitable for displaying and modifying
through a text editor. Files stored in text format are portable
across platforms that share the same character set. A common
standard is the ISO/IEC 646-1983 Invariant Code Set, which
consists of
- 52 upper and lower case alphabetic characters: A, B, ..., Z, a, b, ..., z
- 10 digits: 0, 1, ..., 9
- space
- null, line feed, carriage return, horizontal tab, vertical tab and form feed: \0, \n, \r, \t, \v, \f
- 29 graphic characters: ! # % ^ & * ( _ ) - + = ~ [ ] ' | \ ; : " { } , . < > / ?
Note that this set excludes the $ and ` characters.
The encoding for characters like $ and ` does not produce the
same characters on all platforms (for more details see National Variants).
Sequential Access
The most common way to access data in a text file is sequentially,
byte by byte. We process the file as a stream of bytes
without skipping any byte until we reach the file's end-of-file mark.
Connection
A C program connects to a file through an object of FILE type. The object holds information about the file
and keeps track of the next position to be accessed. We use a library
function to retrieve the address of the file object, store that address in
a pointer and subsequently access the file through that pointer.

Defining a pointer to a FILE object takes the form
FILE *identifier;
where FILE is the type of the FILE
object and identifier is the name of the
pointer to the FILE object.
We call this pointer a handle to the object.
The structure type FILE is declared in the <stdio.h> header file.
To allocate memory for a FILE pointer,
we write
#include <stdio.h>
FILE *fp = NULL;
We initialize the pointer fp to
NULL as a precaution against
premature dereferencing. If our program accesses data
at fp before the connection to
the file is open, our program may generate a segmentation
fault.
(NULL is defined in the
<stdio.h> header file.)
Opening a File
fopen() opens the named file and
returns the address of the FILE object
that connects to that file. The prototype for
fopen() is
FILE *fopen(const char file_name[], const char mode[]);
The first parameter holds the address of the file's name, which
could be a string literal: a set of characters enclosed in a pair of
double quotes. The second parameter holds the address of the
connection mode, which could also be a string literal.
The most common connection modes are
- "r" - read from the
file
- "w" - write to the
file: if the file exists, truncate its contents and then write;
if the file does not exist, create a new file and then write to
that file
- "a" - write to the end
of the file: if the file exists, append to the end of the file;
if the file does not exist, create it and then write to it
The less common connection modes for text files are
- "r+" - opens the file
for reading and possibly writing
- "w+" - opens the file
for writing and possibly reading; if the file exists, truncates
its contents and then writes to the file; if the file does not
exist, creates a new file and then writes to that file
- "a+" - opens the file
for writing to the end of the file and possibly reading; if the
file exists, appends to the end of the file; if the file does
not exist, creates it and then writes to the file
The mode parameter is enclosed in a pair of double quotes, not single quotes.
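The difference between the "w" and "a" modes can be checked with a small experiment. The sketch below is illustrative only: the helper names write_line() and count_lines() are hypothetical and not part of the C library. It uses fprintf() and fgetc(), which are described later in this chapter.

```c
#include <stdio.h>

/* Hypothetical helper: writes one line of text to the named file
   using the given connection mode.
   Returns 0 on success, -1 if the file could not be opened. */
int write_line(const char *file_name, const char *mode, const char *text)
{
    FILE *fp = fopen(file_name, mode);
    if (fp == NULL)
        return -1;
    fprintf(fp, "%s\n", text);
    fclose(fp);
    return 0;
}

/* Hypothetical helper: counts the newline characters in the named file.
   Returns -1 if the file could not be opened. */
int count_lines(const char *file_name)
{
    FILE *fp = fopen(file_name, "r");
    int c, lines = 0;
    if (fp == NULL)
        return -1;
    while ((c = fgetc(fp)) != EOF)
        if (c == '\n')
            lines++;
    fclose(fp);
    return lines;
}
```

Calling write_line() with "w" and then with "a" on the same file should leave two lines in it. Calling it again with "w" truncates the file, leaving only the newest line.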
To open a file named alpha.txt for writing,
we write
// Open a file
// openFile.c
#include <stdio.h>
int main(void)
{
    FILE *fp = NULL;
    fp = fopen("alpha.txt","w");
    if (fp != NULL) {
        // statements to be added later
    } else {
        printf("Failed to open file\n");
    }
    return 0;
}
fopen() returns NULL if it fails to connect to the file.
fopen() can fail due to lack of
permission, premature removal of the secondary storage
medium or a full device.
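When fopen() fails, the program can report a reason as well as detect the failure. The sketch below uses the standard perror() function. The helper name can_open_for_reading() is hypothetical. ISO C does not require fopen() to set errno, although POSIX systems do, so the text of the message is an assumption about the platform.

```c
#include <stdio.h>

/* Hypothetical helper: tries to open the named file for reading.
   Returns 1 if the file opened, 0 if fopen() returned NULL. */
int can_open_for_reading(const char *file_name)
{
    FILE *fp = fopen(file_name, "r");
    if (fp == NULL) {
        /* prints file_name, a colon and a system-supplied message */
        perror(file_name);
        return 0;
    }
    fclose(fp);
    return 1;
}
```

Opening a file that does not exist in "r" mode is the simplest way to see the failure branch run.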
Closing
fclose() disconnects the file
from the host program. This library function takes as its only parameter
the file pointer. The prototype for fclose() is
int fclose(FILE *);
If the file is open for writing or appending, fclose() writes any data remaining in the file's
buffer to the file and appends the end of file mark after the last
character written. If the file is open for reading,
fclose() ignores any data left
in the file's buffer and closes the connection.
To close a file named alpha.txt that is open for writing,
we write
// Close an Opened file
// closeFile.c
#include <stdio.h>
int main(void)
{
    FILE *fp = NULL;
    fp = fopen("alpha.txt","w");
    if (fp != NULL) {
        // statements to be added later
        fclose(fp);
    } else {
        printf("Failed to open file\n");
    }
    return 0;
}
fclose() returns 0 if successful,
EOF if unsuccessful.
fclose() fails if the
storage device is full, an I/O error occurs or the
storage medium is prematurely removed.
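Because fclose() writes out any buffered data, a write error may only become visible when the file is closed. The sketch below checks the value that fclose() returns. The helper name save_text() is hypothetical, and it uses fprintf(), which is described later in this chapter.

```c
#include <stdio.h>

/* Hypothetical helper: writes text to the named file and reports
   whether every step succeeded.
   Returns 0 on success, -1 on any failure. */
int save_text(const char *file_name, const char *text)
{
    FILE *fp = fopen(file_name, "w");
    int ok;
    if (fp == NULL)
        return -1;
    ok = fprintf(fp, "%s", text) >= 0;
    /* fclose() flushes the buffer, so a late write error shows up here */
    if (fclose(fp) == EOF)
        ok = 0;
    return ok ? 0 : -1;
}
```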
Communication
The C library functions for communicating with an open file include:
- fprintf() - formatted write to file
- fputc() - write single character to file
- fscanf() - formatted read from file
- fgetc() - read single character from file
Writing
Formatted Writing
fprintf() writes data
to an open file under format control. The
prototype for this library function is
int fprintf(FILE *, const char [], ...);
The first parameter receives the address of the FILE object.
The second parameter receives the address of the string literal that specifies
the format. This literal may contain text to be written directly to the
file as well as conversion specifiers, if any, to be applied to the data values
supplied as arguments.
For example:
// Writing to a File
// writeToFile.c
#include <stdio.h>
int main(void)
{
    FILE *fp = NULL;
    int sku = 4664;
    double price = 1.49;
    fp = fopen("alpha.txt","w");
    if (fp != NULL) {
        fprintf(fp, "sku = %d price = %10.2lf\n", sku, price);
        fclose(fp);
    } else {
        printf("Failed to open file\n");
    }
    return 0;
}
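fprintf() returns the number of characters written, or a negative value if the write fails. The sketch below uses that return value while writing several records, one per line. The helper name write_records() and its parameters are hypothetical.

```c
#include <stdio.h>

/* Hypothetical helper: writes n (sku, price) records to the named file,
   one record per line.
   Returns the number of records written, or -1 if the file did not open. */
int write_records(const char *file_name, const int sku[],
                  const double price[], int n)
{
    FILE *fp = fopen(file_name, "w");
    int i, written = 0;
    if (fp == NULL)
        return -1;
    for (i = 0; i < n; i++) {
        /* a negative return value signals a failed write */
        if (fprintf(fp, "%d %.2lf\n", sku[i], price[i]) < 0)
            break;
        written++;
    }
    fclose(fp);
    return written;
}
```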
Unformatted Writing
fputc() writes a single character to an open file.
The prototype for this library function is:
int fputc(int ch, FILE *fp);
ch receives a copy of the character to be written
and fp receives the address of the
FILE object. fputc() returns the
character written, or EOF in the event of
an error.
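As an illustration, the sketch below writes a whole string to an open file by calling fputc() once per character and stops as soon as fputc() reports an error. The helper name put_string() is hypothetical.

```c
#include <stdio.h>

/* Hypothetical helper: writes the string s to the open file fp
   one character at a time.
   Returns the number of characters written; stops early on error. */
int put_string(const char *s, FILE *fp)
{
    int count = 0;
    while (s[count] != '\0') {
        if (fputc(s[count], fp) == EOF)
            break;
        count++;
    }
    return count;
}
```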
Reading
Formatted Reading
fscanf() reads
a sequence of bytes from an open file
under format control. The prototype for this
library function is
int fscanf(FILE *, const char [], ...);
The first parameter receives the address of the FILE
object. The second parameter receives the address of the string literal that
specifies the format. This literal contains the conversion specifiers to be
used in translating the file data to data stored in memory.
For example:
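// Reading from a File
// readFromFile.c
#include <stdio.h>
int main(void)
{
    FILE *fp = NULL;
    int sku;
    double price;
    fp = fopen("alpha.txt","r");
    if (fp != NULL) {
        // the format mirrors the line written by writeToFile.c;
        // fscanf() returns the number of values that it stored
        if (fscanf(fp, "sku = %d price = %lf", &sku, &price) == 2)
            printf("sku = %d price = %10.2lf\n", sku, price);
        else
            printf("Failed to read record\n");
        fclose(fp);
    } else {
        printf("Failed to open file\n");
    }
    return 0;
}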
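fscanf() returns the number of values that it successfully converted and stored, or EOF if the input ends before any conversion. Comparing the return value with the number of conversion specifiers is therefore a stricter check than comparing it with EOF. The sketch below reads records of the form "sku price" into arrays and stops at the first incomplete record. The helper name read_records() and its parameters are hypothetical.

```c
#include <stdio.h>

/* Hypothetical helper: reads up to max (sku, price) records
   from the named file.
   Returns the number of complete records read,
   or -1 if the file did not open. */
int read_records(const char *file_name, int sku[], double price[], int max)
{
    FILE *fp = fopen(file_name, "r");
    int n = 0;
    if (fp == NULL)
        return -1;
    /* a return value of 2 means that both fields were converted */
    while (n < max && fscanf(fp, "%d %lf", &sku[n], &price[n]) == 2)
        n++;
    fclose(fp);
    return n;
}
```

A loop that only tests for EOF would never end on a malformed line, because fscanf() would keep returning 0 without consuming the offending characters.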
Unformatted Reading
fgetc() reads a single character
from an open file. The prototype for
this library function is
int fgetc(FILE *fp);
fp receives the address of the FILE
object. fgetc() returns the
character read, or EOF at the end of the file or in the event of an error.
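A typical use of fgetc() is a loop that processes a file character by character until the function returns EOF. The sketch below counts the characters in a file. The helper name count_chars() is hypothetical. The result is stored in an int rather than a char so that EOF remains distinguishable from every valid character.

```c
#include <stdio.h>

/* Hypothetical helper: counts the characters in the named file.
   Returns -1 if the file could not be opened. */
long count_chars(const char *file_name)
{
    FILE *fp = fopen(file_name, "r");
    int c;            /* int, not char, so EOF stays distinguishable */
    long count = 0;
    if (fp == NULL)
        return -1;
    while ((c = fgetc(fp)) != EOF)
        count++;
    fclose(fp);
    return count;
}
```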
State of a File Object
The C library functions for managing the state of a FILE
object include:
- rewind() - rewind the file
- feof() - identify the end of the file
Rewind
rewind() resets the file position indicator in the
FILE object to the first byte in the file.
The next byte to be accessed by the object will be the first byte in
the file.
In other words, to jump to the beginning of a file, instead of disconnecting
and re-connecting it, we simply rewind the file.
The prototype for this library function is
void rewind(FILE *fp);
fp receives the address of the FILE
object.
Consider a text file named produce.txt that contains
4664 1.49
4419 1.29
4011 0.59
The following program reads and displays this data, rewinds
the file, and then reads and displays the data again:
// Reading from a file
// readFromFile.c
#include <stdio.h>
int main(void)
{
    FILE *fp = NULL;
    int sku;
    double price;
    fp = fopen("produce.txt","r");
    if (fp != NULL) {
        // fscanf() returns 2 when both fields were read
        while (fscanf(fp, "%d%lf", &sku, &price) == 2)
            printf("%5d %6.2lf\n", sku, price);
        rewind(fp);
        while (fscanf(fp, "%d%lf", &sku, &price) == 2)
            printf("%5d %6.2lf\n", sku, price);
        fclose(fp);
    } else {
        printf("Failed to open file\n");
    }
    return 0;
}
 4664   1.49
 4419   1.29
 4011   0.59
 4664   1.49
 4419   1.29
 4011   0.59
End of File
feof() indicates whether or not
the caller attempted to read the end-of-file mark;
that is, read beyond the last character in the file.
The prototype for this library function is
int feof(FILE *fp);
feof() returns false (0) if the
caller has not attempted to read the end-of-file mark;
true (a non-zero value) if the caller attempted to read the end-of-file mark.
If the next byte to be read is the end-of-file mark, but the caller
has not yet read the mark (that is, has only read the last character
in the file), feof() returns false.
In other words, to receive true, the caller must have attempted
to read the end-of-file mark at least once.
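This behaviour can be observed directly. The sketch below reads an open file to its end with fgetc() and records what feof() reported just before the read that failed and just after it. The helper name probe_feof() and its parameters are hypothetical.

```c
#include <stdio.h>

/* Hypothetical helper: reads every character of the open file fp.
   *before receives what feof() reported immediately before the read
   that returned EOF; *after receives what it reported immediately
   after that read. */
void probe_feof(FILE *fp, int *before, int *after)
{
    int c;
    do {
        *before = feof(fp) != 0;   /* state before this read attempt */
        c = fgetc(fp);
    } while (c != EOF);
    *after = feof(fp) != 0;        /* state after the failed read */
}
```

For a file that is read without error, *before ends up false even though the last character has already been consumed, and *after ends up true. This is why a loop written as while (!feof(fp)) tends to process one item too many.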
Comparison
The library functions for communicating with files share many
common properties with the functions for communicating with users
directly. The functions belong to the same library, follow
the same rules for format control and share a common syntax.
Return Type | Standard I/O | File I/O | Notes
int | scanf(...) | fscanf(fp,...) | check to see if the return value is EOF
int | printf(...) | fprintf(fp,...) | returns the number of characters written
int | getchar() | fgetc(fp) | check to see if the return value is EOF before converting it to a char type
int | putchar(ch) | fputc(ch, fp) | check to see if the return value is EOF
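The parallel between the two families goes further: the standard output is itself available as a FILE pointer named stdout, declared in <stdio.h>, and printf(...) behaves like fprintf(stdout, ...). The sketch below uses this to write one record to either destination with the same code. The helper name print_record() is hypothetical.

```c
#include <stdio.h>

/* Hypothetical helper: writes one record to any open stream,
   whether a file or the standard output.
   Returns whatever fprintf() returns: the number of characters
   written, or a negative value on error. */
int print_record(FILE *fp, int sku, double price)
{
    return fprintf(fp, "%5d %6.2lf\n", sku, price);
}
```

Calling print_record(stdout, 4664, 1.49) displays the record on the screen. Passing a pointer returned by fopen() sends the same text to a file.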
Exercises
- Complete the Workshop on Files
- Complete the exercise on Files
- Complete the following practice problems