FILES
A file is a collection of data stored on some device, such as a floppy disk or
a hard disk. Typically the operating system manages files, keeping track of
their locations, their sizes, when they were created and so on. Disk I/O
operations are performed on entities called files. A large number of standard
library functions are available for performing disk or file I/O.
These functions can be broadly divided into two categories:
- High-level file I/O functions (also called standard I/O or stream I/O
functions)
- Low-level file I/O functions (also called system I/O functions)
High-level disk I/O functions are more commonly used in C programs, since they
are easier to use than low-level disk I/O functions. The low-level disk I/O
functions are more closely related to the computer's operating system than the
high-level disk I/O functions. However, low-level disk I/O can be more
efficient, both in terms of operation and the amount of memory used by the
program.
The high-level file I/O functions are further categorized into text and binary.
This classification arises out of the mode in which a file is opened for input
or output. Which of these two modes is used to open the file determines:
- How newlines ('\n') are stored.
- How end of file is indicated.
- How numbers are stored in the file.
The following program reads a file and counts how many characters, spaces, tabs
and newlines are present in it. We will first list the program and show what it
does, and then dissect it line by line. Here is the listing.
/* Count chars, spaces, tabs and newlines in a file */
#include <stdio.h>

int main(void)
{
    FILE *fp;
    int ch;    /* int rather than char, so that EOF can be detected reliably */
    int nol = 0, not = 0, nob = 0, noc = 0;

    /* for brevity this assumes the file opens successfully;
       see "Trouble Opening The File" */
    fp = fopen("pr1.c", "r");
    while (1)
    {
        ch = fgetc(fp);
        if (ch == EOF)
            break;
        noc++;
        if (ch == ' ')
            nob++;
        if (ch == '\n')
            nol++;
        if (ch == '\t')
            not++;
    }
    fclose(fp);
    printf("Number of characters = %d\n", noc);
    printf("Number of blanks = %d\n", nob);
    printf("Number of tabs = %d\n", not);
    printf("Number of lines = %d\n", nol);
    return 0;
}
This assumes that a file named "PR1.C" already exists in the current directory.
The counts printed depend on the contents of that file. You may give any other
filename and obtain different results.
Opening a File:
Before we can write information to a file on a disk or read it, we must open the
file. Opening a file establishes a link between the program and the operating
system about which file we are going to access and how. We provide the
operating system with the name of the file and whether we plan to read or write
to it. The link between our program and the operating system is a structure
called FILE, which has been defined in the header file "stdio.h" (standard
input/output header file). Therefore, it is necessary to always include this
file when we are doing high-level disk I/O. When we request the operating system
to open a file, what we get back (if the request is indeed granted) is a pointer
to the structure FILE. That is why we make the following declaration before
opening the file,
FILE *fp;
Each file we open will have its own FILE structure.
The FILE structure contains information about the file being used, such as its
current size, its location in memory etc.
Now let us understand the following statements,
FILE *fp;
fp = fopen ("PR1.c", "r" ) ;
fp is a pointer variable, which contains the address of the structure FILE
which has been defined in the header file "stdio.h". fopen( ) will open a file
"PR1.C" in 'read' mode, which tells the operating system that we would be
reading the contents of the file. Note that "r" is a string and not a character;
hence the double quotes and not single quotes. In fact, fopen( ) performs three
important tasks when you open the file in "r" mode:
a. Searches for the file to be opened on the disk.
b. If the file is present, it loads the file from the disk into memory. Of
course, if the file is very big, then it loads the file part by part. If the
file is absent, fopen( ) returns NULL. NULL is a macro defined in "stdio.h";
getting it back indicates that you failed to open the file.
c. It sets up a pointer, which points to the first character of the chunk of
memory where the file has been loaded.
Reading From a File:
Once the file has been opened for reading using fopen( ), as we have seen, the
file's contents are brought into memory (partly or wholly) and a pointer points
to the very first character. To read the file's contents from memory there
exists a standard library function called getc( ). Our sample program called
fgetc( ), which does the same job; with getc( ) the statement would read,
ch = getc ( fp ) ;
The getc( ) function reads the character from the current pointer position,
advances the pointer position so that it now points to the next character, and
returns the character that is read, which we collected in the variable ch. Note
that once the file has been opened, we no longer refer to the file by its name,
but through the file pointer fp.
In the program above the read is placed in an indefinite loop. The moment the
end of file is reached, the function returns EOF, a macro defined in "stdio.h",
and we break out of the loop. On MS-DOS, the end of a text file may additionally
be marked by a special character whose ASCII value is 26, placed after the last
character in the file; the library reports EOF when it meets this character.
Since EOF is a value that lies outside the range of valid characters, the result
of getc( ) is best collected in an int rather than a char.
Trouble Opening The File
There is a possibility that when we try to open a file using the function
fopen( ), the file may not be opened. While opening the file in "r" mode, this
may happen because the file being opened may not be present on the disk at all.
And you obviously cannot read a file which doesn't exist.
Similarly, while opening the file for writing, fopen( ) may fail for a number of
reasons: disk space may be insufficient to open a new file, the disk may be
write protected, and so on. Here is how this can be handled in a program...
#include <stdio.h>
#include <stdlib.h>    /* for exit( ) */

int main(void)
{
    FILE *fp;

    fp = fopen("pr1.c", "r");
    if (fp == NULL)
    {
        puts("cannot open file");
        exit(1);    /* a non-zero status signals failure */
    }
    else
        puts("file is opened");
    fclose(fp);
    return 0;
}
Closing The File:
When we have finished reading from the file, we need to close it. This is done
using the function fclose( ) through the statement,
fclose ( fp ) ;
This deactivates the file and hence it can no longer be accessed using getc( ).
Once again we don't use the filename but the file pointer fp.
File Opening Modes:
In our first program on disk I/O we opened the file in read ("r") mode.
However, "r" is but one of the several modes in which we can open a file.
Following is a list of the modes in which a file can be opened. The tasks
performed by fopen( ) when a file is opened in each of these modes are also
mentioned.
"r"
Searches file. If the file exists, loads it into memory and sets up a pointer
which points to the first character in it. If the file doesn't exist, it returns
NULL.
Operations possible - reading from the file.
"w"
Searches file. If the file exists, its contents are overwritten. If the file
doesn't exist, a new file is created. Returns NULL if unable to open the file.
Operations possible - writing to the file.
"a"
Searches file. If the file exists, loads it into memory and sets up a pointer
which points just past the last character in it. If the file doesn't exist, a
new file is created. Returns NULL if unable to open the file.
Operations possible - appending new contents at the end of the file.
"r+"
Searches file. If it exists, loads it into memory and sets up a pointer which
points to the first character in it. If the file doesn't exist, it returns NULL.
Operations possible - reading existing contents, writing new contents,
modifying existing contents of the file.
"w+"
Searches file. If the file exists, its contents are destroyed. If the file
doesn't exist, a new file is created. Returns NULL if unable to open the file.
Operations possible - writing new contents, reading them back and modifying
existing contents of the file.
"a+"
Searches file. If it exists, loads it into memory; whatever is written always
goes to the end of the file. If the file doesn't exist, a new file is created.
Returns NULL if unable to open the file.
Operations possible - reading existing contents, appending new contents to the
end of the file. Cannot modify existing contents.
Writing To a File:
The putc( ) function is similar to the putch( ) function, in the sense that both
output characters. However, the putch( ) function always writes to the VDU (the
screen), whereas putc( ) writes to a file. Which file? The file signified by the
FILE pointer passed as its second argument, for example ft in the statement
putc ( ch, ft ). In a program that copies one file to another, the writing
process continues till all characters from the source file have been written to
the target file, following which the loop terminates.
A Closer Look at fclose( ):
Closing the file has several effects. First any characters remaining in the
buffer (an area in memory) are written to the disk.
Consider, for example, how inefficient it would be
to actually access the disk every time we want to write a character to it. Every
time we write something to a disk, it takes some time for the disk drive to
position the read/write head correctly. On a floppy disk system, the drive motor
has to actually start rotating the disk from a standstill every time the disk is
accessed. If this is to be done for every character we write to the disk, it
would take a long time to perform disk I/O. This is where a buffer comes in.
When you send a character off to a file by using
putc( ), the character is actually stored in a buffer
(an area in memory), rather than being immediately written to the disk. When the
buffer is full, its contents are written to the disk at once. Or if the program
knows that the last character to be written to the disk has been received in the
buffer, but it is still not full, it forces the buffer to be written to the disk
by 'closing' the file.
A major advantage of using the high-level disk I/O functions is that these
activities take place automatically; the programmer doesn't need to worry about
them.
Another purpose that fclose( ) serves is that it frees the link used by the
particular file, and the associated buffers, so that these are available for
other files.
For formatted reading and writing of characters, strings, integers, floats,
there exist two functions, fscanf( ) and fprintf( ). Here is a program, which
illustrates the use of these functions...
#include <stdio.h>
#include <stdlib.h>    /* for exit( ) */
#include <conio.h>     /* for getche( ); DOS/Windows compilers only */

int main(void)
{
    FILE *fp;
    char another = 'Y';
    char name[40];
    int age;
    float bs;

    fp = fopen("EMPLOYEE.DAT", "w");
    if (fp == NULL)
    {
        puts("Cannot open file");
        exit(1);
    }
    while (another == 'Y')
    {
        printf("\nEnter name, age and basic salary\n");
        /* name is already an address, so no & is needed */
        scanf("%39s %d %f", name, &age, &bs);
        fprintf(fp, "%s %d %f\n", name, age, bs);
        printf("\nAnother employee (Y/N) ");
        fflush(stdin);    /* discards pending input; not defined by standard C */
        another = getche();
    }
    fclose(fp);
    return 0;
}
And here is the output of the program...
Enter name, age and basic salary
Amar 34 1550
Another employee (Y/N) Y
Enter name, age and basic salary
Sanju 24 1200
Another employee (Y/N) Y
Enter name, age and basic salary
Ryan 26 2000
Another employee (Y/N) N
The key to this program is the function fprintf( ), which writes the values of
three variables to the file. This function is similar to printf( ), except that
a FILE pointer is included as the first argument. As in printf( ), we can format
the data in a variety of ways by using fprintf( ). In fact, all the format
conventions of the printf( ) function work with fprintf( ) as well. The function
fflush( ) is used to get rid of a peculiarity of scanf( ). After supplying data
for one employee, we would hit the enter key. What scanf( ) does is assign name,
age and salary to the appropriate variables and keep the enter key unread in the
keyboard buffer. fflush ( stdin ) discards it, so that the getche( ) that
follows reads the Y or N typed by the user rather than the leftover enter key.
Note that the C standard defines fflush( ) only for output files; fflush ( stdin )
works with some DOS and Windows compilers but is not portable.
Binary Mode Versus Text Mode:
As we have seen earlier, the high-level disk I/O functions can be categorised as
text and binary. This classification arises out of the mode in which a file is
opened. There are three main areas where text and binary mode files are
different:
- The handling of newlines
- The representation of end of file
- The storage of numbers
Text Versus Binary Mode - Newlines:
On systems such as MS-DOS and Windows, in text mode a newline character is
converted into the carriage return - linefeed combination before being written
to the disk. Likewise, the carriage return - linefeed combination on the disk is
converted back into a newline when the file is read by a C program. However, if
a file is opened in binary mode, as opposed to text mode, these conversions will
not take place.
Program to open a file in binary mode and see what effect it
has on the count of characters present in the file.
#include <stdio.h>
#include <stdlib.h>    /* for exit( ) */

int main(void)
{
    FILE *fp;
    int ch;    /* int rather than char, so that EOF can be detected reliably */
    int noc = 0;

    fp = fopen("Employee.txt", "rb");
    if (fp == NULL)
    {
        puts("Cannot open file");
        exit(1);
    }
    while (1)
    {
        ch = getc(fp);
        if (ch == EOF)
            break;
        noc++;
    }
    fclose(fp);
    printf("Number of characters = %d\n", noc);
    return 0;
}
And here is the output.
Number of characters = 105
Text Versus Binary Mode - End Of File:
The second difference between text and binary modes is in the way the
end-of-file is detected. On MS-DOS, in text mode a special character, whose
ASCII value is 26, is inserted after the last character in the file to mark the
end of file. If this character is detected at any point in the file, the read
function would return the EOF signal to the program.
As against this, there is no such special character present in the binary mode
files to mark the end of file. The binary mode files keep track of the end of
file from the number of characters present in the directory entry of the file.
There is a moral to be derived from the end of file marker of text mode files.
If a file stores numbers in binary mode, it is important that binary mode only
be used for reading the numbers back, since one of the numbers we store might
well be the number 26 (hexadecimal 1A). If this number is detected while we are
reading the file by opening it in text mode, reading would be terminated
prematurely at that point.
Text Versus Binary Mode - Storage Of Numbers:
In text mode, the only function that is available for storing numbers in a disk
file is the fprintf( ) function. It is important to understand how numerical
data is stored on the disk by fprintf( ). Text and characters are stored one
character per byte, as we would expect. Numbers are stored as strings of
characters. Thus 1234, even though it occupies two bytes in memory (on a
compiler with 16-bit integers), would occupy four bytes when transferred to the
disk using fprintf( ), one byte per character. Similarly, the floating-point
number 1234.56 would occupy 7 bytes on disk. Thus, numbers with more digits
would require more disk space. Hence if a large amount of numerical data is to
be stored in a disk file, using text mode may turn out to be inefficient. The
solution is to open the file in binary mode and use those functions which store
the numbers in binary format. It means each number would occupy the same number
of bytes on disk as it occupies in memory.
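The following program writes employee records to a file in binary mode using
fwrite( ).
#include <stdio.h>
#include <stdlib.h>    /* for exit( ) */
#include <conio.h>     /* for getche( ); DOS/Windows compilers only */

int main(void)
{
    FILE *fp;
    char another = 'Y';
    struct emp
    {
        char name[40];
        int age;
        float bs;
    };
    struct emp e;

    fp = fopen("EMP.DAT", "wb");
    if (fp == NULL)
    {
        puts("Cannot open file");
        exit(1);
    }
    while (another == 'Y')
    {
        printf("\nEnter name, age and basic salary\n");
        /* e.name is already an address, so no & is needed */
        scanf("%39s %d %f", e.name, &e.age, &e.bs);
        fwrite(&e, sizeof(e), 1, fp);
        printf("\nAnother employee (Y/N) ");
        fflush(stdin);    /* discards pending input; not defined by standard C */
        another = getche();
    }
    fclose(fp);
    return 0;
}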
The information obtained about the employee from the keyboard is placed in the
structure variable e. Then, the following statement writes the structure to the
file:
fwrite ( &e, sizeof ( e ), 1, fp ) ;
Here, the first argument is the address of the structure to be written to the
disk.
The second argument is the size of the structure in bytes. Instead of counting
the bytes occupied by the structure ourselves, we let the program do it for us
by using the sizeof( ) operator. The sizeof( ) operator gives the size of a
variable in bytes. This keeps the program unchanged in the event of a change in
the elements of the structure.
The third argument is the number of such structures that we want to write at one
time. In this case, we want to write only one structure at a time. Had we had an
array of structures, for example, we might have wanted to write the entire array
at once.
The last argument is the pointer to the file we want to write to.
Detecting Errors in Reading/ Writing:
A read or write operation on a file does not always succeed. Naturally there
must be a provision to test whether our attempt to read or write was successful
or not.
The standard library function ferror( ) reports any error that might have
occurred during a read/write operation on a file. It returns zero if the
read/write is successful and a non-zero value in case of a failure. The
following program illustrates the usage of ferror( ).
#include <stdio.h>

int main(void)
{
    FILE *fp;
    int ch;

    fp = fopen("TRIAL", "w");    /* opened for writing on purpose */
    if (fp == NULL)
        return 1;
    while (!feof(fp))
    {
        ch = getc(fp);           /* reading from a file opened for writing */
        if (ferror(fp))
        {
            printf("Error in reading file");
            break;
        }
        else
            printf("%c", ch);
    }
    fclose(fp);
    return 0;
}
In this program the getc( ) function would obviously fail first time around,
since the file has been opened for writing whereas getc( ) is attempting to read
from the file. The moment the error occurs, ferror( ) returns a non-zero value
and the if block gets executed. Instead of printing the error message using
printf( ), we can use the standard library function perror( ), which prints the
error message specified by the system. Thus in the above program the perror( )
function can be used as shown below.
if ( ferror ( fp ) )
{
    perror ( "TRIAL" ) ;
    break ;
}
Note that when the error occurs the error message that is displayed is:
TRIAL: Permission denied
This means we can precede the system error message with any message of our
choice. In our program we have just displayed the filename in front of the
system error message. The exact wording of the system message varies from one
compiler and operating system to another.
C Preprocessor:
The C preprocessor is a program that processes our source program before it is
passed to the compiler. Preprocessor commands (often known as directives) form
what can almost be considered a language within the C language. The preprocessor
offers several features called preprocessor directives. Each of these
preprocessor directives begins with a # symbol. The directives can be placed
anywhere in a program but are most often placed at the beginning of a program,
before main( ), or before the beginning of a particular function. We would learn
the following preprocessor directives here:
- #define directive
- #include directive
- #undef directive
- Conditional Compilation directives
#define Directive:
Consider the following program
#include <stdio.h>
#define UPPER 25

int main(void)
{
    int i;

    for (i = 1; i <= UPPER; i++)
        printf("%d\n", i);
    return 0;
}
In this program instead of writing 25 in the for
loop we are writing it in the form of UPPER, which has already been defined
before main( ) through the statement,
#define UPPER 25
This statement is called 'macro definition' or more commonly, just a 'macro'.
What purpose does it serve? During preprocessing, the preprocessor replaces
every occurrence of UPPER in the program with 25.
When we compile the program, before the source code passes to the compiler it is
examined by the C preprocessor for any macro definitions. When it sees the
#define directive, it goes through the entire program in search of the macro
templates; wherever it finds one, it replaces the macro template with the
appropriate macro expansion. Only after this procedure has been completed is the
program handed over to the compiler. In C programming it is customary to use
capital letters for macro template. This makes it easy for programmers to pick
out all the macro templates when reading through the program.
Note that blanks or tabs separate macro template and
its macro expansion. A space between # and define is optional. Remember that a
macro definition is never to be terminated by a semicolon.
Macros With Arguments:
The macros that we have used so far are called simple macros. Macros can have
arguments, just as functions can. Here is an example, which illustrates this
fact.
#include <stdio.h>
#define AREA(x) ( 3.14 * x * x )

int main(void)
{
    float r1 = 6.25, r2 = 2.5, a;

    a = AREA ( r1 ) ;
    printf("Area of circle = %f\n", a);
    a = AREA ( r2 ) ;
    printf("Area of circle = %f", a);
    return 0;
}
Here's the output of the program...
Area of circle = 122.656250
Area of circle = 19.625000
In this program, wherever the preprocessor finds the phrase AREA(x) it expands
it into the expression ( 3.14 * x * x ). However, that's not all that it does.
The x in the macro template AREA(x) is an argument that matches the x in the
macro expansion ( 3.14 * x * x ). The statement AREA(r1) in the program causes
the variable r1 to be substituted for x.
#include Directive:
The second preprocessor directive we'll explore in this chapter is file
inclusion. This directive causes one file to be included in another. The
preprocessor command for file inclusion looks like this:
#include "filename"
and it simply causes the
entire contents of filename to be inserted into the source code at that point in
the program. Of course this presumes that the file being included exists. This
feature can be used in two cases:
1. If we have a very large program, the code is best divided into several
different files, each containing a set of related functions. It is a good
programming practice to keep different sections of a large program separate.
These files are #included at the beginning of the main program file.
2. Many times we need some functions or some macro definitions in almost all
the programs that we write. In such a case these commonly needed functions and
macro definitions can be stored in a file, and that file can be included in
every program we write, which would add all the statements in this file to our
program as if we had typed them in. For example, we included a file called
stdio.h in many programs.
It is common for the files which are to be included to have a .h extension. The
'.h' extension stands for 'header file', possibly because it contains statements
which, when included, go to the head of your program.
Actually there exist two ways to write #include
statements. These are:
#include "filename"
#include <filename>
The meaning of each of these forms is given below:
#include "goto.c"
This command would look for the file goto.c in the
current directory as well as the specified list of directories as mentioned in
the include search path that might have been set up.
#include <goto.c>
This command would look for the file goto.c in the
specified list of directories only.