C PROGRAMMING

FILES

 
A file is a collection of data stored on some device, such as a floppy disk or a hard disk. Typically the operating system manages files, keeping track of their locations, their sizes, when they were created and so on. Disk I/O operations are performed on entities called files. A large number of standard library functions are available for performing disk (or file) I/O.

These functions can be broadly divided into two categories:

  1. High level file I/O functions (also called standard I/O or stream I/O functions)
  2. Low level file I/O functions (also called system I/O functions)

High-level disk I/O functions are more commonly used in C programs, since they are easier to use than low-level disk I/O functions. The low-level disk I/O functions are more closely tied to the computer's operating system than the high-level ones. However, low-level disk I/O can be more efficient, both in terms of operation and the amount of memory used by the program.

The high-level file I/O functions are further categorized into text and binary. This classification arises out of the mode in which a file is opened for input or output. Which of these two modes is used to open the file determines:

  1. How newlines ('\n') are stored.
  2. How end of file is indicated.
  3. How numbers are stored in the file.

The following program reads a file and counts how many characters, spaces, tabs and newlines are present in it. We will first list the program and show what it does, and then dissect it line by line. Here is the listing.

/* Count chars, spaces, tabs and newlines in a file */
#include <stdio.h>

int main( )
{
    FILE *fp ;
    int ch ;   /* int, not char, so that EOF can be told apart from data */
    int nol = 0, not = 0, nob = 0, noc = 0 ;

    fp = fopen ( "pr1.c", "r" ) ;
    if ( fp == NULL )
    {
        puts ( "cannot open file" ) ;
        return 1 ;
    }

    while ( 1 )
    {
        ch = getc ( fp ) ;
        if ( ch == EOF )
            break ;
        noc++ ;

        if ( ch == ' ' )
            nob++ ;
        if ( ch == '\n' )
            nol++ ;
        if ( ch == '\t' )
            not++ ;
    }

    fclose ( fp ) ;

    printf ( " Number of characters = %d \n", noc ) ;
    printf ( " Number of blanks = %d \n", nob ) ;
    printf ( " Number of tabs = %d \n", not ) ;
    printf ( " Number of lines = %d \n", nol ) ;
    return 0 ;
}

This assumes that a file named "PR1.C" already exists in the current directory (on case-sensitive systems the name must match the one passed to fopen( ) exactly). The counts printed depend on the contents of that file. You may give any other filename and obtain different results.

 Opening a File:

Before we can write information to a file on a disk or read it, we must open the file. Opening a file establishes a link between the program and the operating system, telling it which file we are going to access and how. We provide the operating system with the name of the file and whether we plan to read or write to it. The link between our program and the operating system is a structure called FILE, which has been defined in the header file "stdio.h" (standard input/output header file). Therefore, it is necessary to always include this file when we are doing high-level disk I/O. When we request the operating system to open a file, what we get back (if the request is indeed granted) is a pointer to the structure FILE. That is why we make the following declaration before opening the file,

FILE *fp;

Each file we open will have its own FILE structure. The FILE structure contains information about the file being used, such as its current size, its location in memory etc.

Now let us understand the following statements,

FILE *fp;
fp = fopen ("PR1.c", "r" ) ;

fp is a pointer variable, which contains the address of a structure of type FILE, which has been defined in the header file "stdio.h". fopen( ) will open the file "PR1.C" in 'read' mode, which tells the library (and through it the operating system) that we would be reading the contents of the file. Note that "r" is a string and not a character; hence the double quotes and not single quotes. In fact, fopen( ) performs three important tasks when you open the file in "r" mode:

  1. Searches for the file to be opened on the disk.
  2. If the file is present, it loads the file from the disk into memory. Of course, if the file is very big, then it loads the file part by part. If the file is absent, fopen( ) returns NULL. NULL is a macro defined in "stdio.h"; getting it back from fopen( ) indicates that you failed to open the file.
  3. It sets up a pointer, which points to the first character of the chunk of memory where the file has been loaded.

 

 Reading From a File:

Once the file has been opened for reading using fopen( ), as we have seen, the file's contents are brought into memory (partly or wholly) and a pointer points to the very first character. To read the file's contents from memory there exists a standard library function called getc( ). This has been used in our sample program through,

ch = getc( fp ) ;

The getc( ) function reads the character at the current pointer position, advances the pointer position so that it now points to the next character, and returns the character that is read, which we collected in the variable ch. Note that once the file has been opened, we no longer refer to the file by its name, but through the file pointer fp.

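In the program above getc( ) is called in an indefinite loop. The loop is broken the moment the end of file is reached, when getc( ) returns EOF, a macro defined in "stdio.h". On DOS-based systems, end of file in a text file may be signified by a special character whose ASCII value is 26 (Ctrl-Z), inserted beyond the last character in the file when the file is created. On most other systems no such character is stored, and the end of file is determined from the file's size. In either case EOF itself is a negative int value rather than a character, so the value returned by getc( ) is best collected in an int.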
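The reason for collecting the return value of getc( ) in an int can be seen in a minimal sketch. The function name count_bytes is hypothetical and not part of the program above; it simply counts every byte in a file, including bytes whose value is 255.

```c
#include <stdio.h>

/* Hypothetical helper: returns the number of bytes in the named file,
   or -1 if the file cannot be opened. */
long count_bytes(const char *path)
{
    FILE *fp = fopen(path, "rb");
    long n = 0;
    int ch;                 /* int, so that EOF differs from every byte value */

    if (fp == NULL)
        return -1;

    while ((ch = getc(fp)) != EOF)
        n++;

    fclose(fp);
    return n;
}
```

Had ch been declared as char, a byte with value 255 could compare equal to EOF on systems where char is signed, and the loop would then stop before the real end of the file.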

 Trouble Opening The File

There is a possibility that when we try to open a file using the function fopen( ), the file may not be opened. While opening the file in "r" mode, this may happen because the file being opened may not be present on the disk at all. And you obviously cannot read a file that doesn't exist.

Similarly, while opening the file for writing, fopen( ) may fail due to a number of reasons, like, disk space may be insufficient to open a new file, or the disk may be write protected and so on. Here is how this can be handled in a program...

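#include <stdio.h>
#include <stdlib.h>   /* for exit( ) */

int main( )
{
    FILE *fp ;

    fp = fopen ( "pr1.c", "r" ) ;
    if ( fp == NULL )
    {
        puts ( "cannot open file" ) ;
        exit ( 1 ) ;   /* a non-zero status reports failure */
    }
    else
        puts ( "file is opened" ) ;

    fclose ( fp ) ;
    return 0 ;
}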
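When fopen( ) fails it is often useful to know why. The sketch below is one possible way to report the reason; the function name open_or_report is hypothetical, and it assumes that fopen( ) sets errno on failure, which most systems do although the C standard does not require it.

```c
#include <stdio.h>
#include <string.h>
#include <errno.h>

/* Hypothetical helper: opens a file and, on failure, prints the reason
   on stderr. Returns the FILE pointer, or NULL if the open failed. */
FILE *open_or_report(const char *path, const char *mode)
{
    FILE *fp = fopen(path, mode);

    if (fp == NULL)
        fprintf(stderr, "cannot open %s: %s\n", path, strerror(errno));

    return fp;
}
```

The caller still has to test the returned pointer against NULL before using it, exactly as in the program above.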

 

 Closing The File:

When we have finished reading from the file, we need to close it. This is done using the function fclose( ) through the statement,

fclose ( fp );

This deactivates the file and hence it can no longer be accessed using getc( ). Once again we don't use the filename but the file pointer fp.

 

File Opening Modes:

In our first program on disk I/O we have opened the file in read ("r") mode. However, "r" is but one of the several modes in which we can open a file. Following is a list of the basic modes in which a file can be opened. The tasks performed by fopen( ) when a file is opened in each of these modes are also mentioned.

"r"

Searches file. If the file exists, loads it into memory and sets up a pointer which points to the first character in it. If file doesn't exist it returns NULL.

Operations possible - reading from the file.

"w"

Searches file. If the file exists, its contents are overwritten. If the file doesn't exist, a new file is created. Returns NULL, if unable to open file.

Operations possible - writing to the file

"a"

Searches file. If the file exists, loads it into memory and sets up a pointer which points to the end of the file, so that new contents are added after the existing ones. If the file doesn't exist, a new file is created. Returns NULL, if unable to open file.

Operations possible - appending new contents at end of file.

"r+"

Searches file. If it exists, loads it into memory and sets up a pointer which points to the first character in it. If file doesn't exist it returns NULL.

Operations possible - reading existing contents, writing new contents, modifying existing contents of the file.

"w+"

Searches file. If the file exists, its contents are destroyed. If the file doesn't exist a new file is created. Returns NULL, if unable to open file.

Operations possible - writing new contents, reading them back and modifying existing contents of the file.

"a+"

Searches file. If it exists, loads it into memory and sets things up so that everything written is added at the end of the file. If the file doesn't exist, a new file is created. Returns NULL, if unable to open file.

Operations possible - reading existing contents, appending new contents to end of file. Cannot modify existing contents.
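The difference between the "w" and "a" modes listed above can be checked with a small sketch. The helper names write_text and file_size and the file name in the usage note are assumptions made for this illustration.

```c
#include <stdio.h>

/* Hypothetical helper: writes text to the file using the given mode.
   Returns 0 on success, -1 on failure. */
int write_text(const char *path, const char *mode, const char *text)
{
    FILE *fp = fopen(path, mode);

    if (fp == NULL)
        return -1;

    fputs(text, fp);
    return fclose(fp) == 0 ? 0 : -1;
}

/* Hypothetical helper: returns the size of the file in bytes,
   or -1 if it cannot be opened for reading. */
long file_size(const char *path)
{
    FILE *fp = fopen(path, "rb");
    long n = 0;

    if (fp == NULL)
        return -1;

    while (getc(fp) != EOF)
        n++;

    fclose(fp);
    return n;
}
```

Calling write_text ( "demo.txt", "w", "abc" ) followed by write_text ( "demo.txt", "a", "de" ) leaves a file of 5 bytes, since "a" adds to the end. A further write_text ( "demo.txt", "w", "x" ) leaves only 1 byte, since "w" discards the old contents.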

 Writing To a File:

The putc( ) function is similar to the putch( ) function, in the sense that both output characters. However, the putch( ) function always writes to the VDU (the screen), whereas putc( ) writes to a file. Which file? The file signified by the FILE pointer passed as its second argument, here called ft. In a program that copies one file to another, the writing process continues till all characters from the source file have been written to the target file, following which the loop terminates.
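A file-copy loop of the kind described above can be sketched as follows. The original listing is not reproduced in this text, so the function name copy_file and the pointer name fs for the source file are assumptions; ft is the target pointer named in the text.

```c
#include <stdio.h>

/* Sketch of a file-copy routine (names are assumptions).
   Returns the number of characters copied, or -1 on failure. */
long copy_file(const char *source, const char *target)
{
    FILE *fs, *ft;
    long copied = 0;
    int ch;

    fs = fopen(source, "r");
    if (fs == NULL)
        return -1;

    ft = fopen(target, "w");
    if (ft == NULL) {
        fclose(fs);
        return -1;
    }

    while ((ch = getc(fs)) != EOF) {
        putc(ch, ft);       /* write the character to the target file */
        copied++;
    }

    fclose(fs);
    fclose(ft);
    return copied;
}
```

Note that putc( ) takes the character first and the file pointer second, and that both files are closed once the loop terminates.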

A Closer Look at fclose( ):

Closing the file has several effects. First any characters remaining in the buffer (an area in memory) are written to the disk.

Consider, for example, how inefficient it would be to actually access the disk every time we want to write a character to it. Every time we write something to a disk, it takes some time for the disk drive to position the read/write head correctly. On a floppy disk system, the drive motor has to actually start rotating the disk from a standstill every time the disk is accessed. If this is to be done for every character we write to the disk, it would take a long time to perform disk I/O. This is where a buffer comes in.

When you send a character off to a file by using putc( ), the character is actually stored in a buffer (an area in memory), rather than being immediately written to the disk. When the buffer is full, its contents are written to the disk at once. Or if the program knows that the last character to be written to the disk has been received in the buffer, but it is still not full, it forces the buffer to be written to the disk by 'closing' the file.

A major advantage of using the high-level disk I/O functions is that these activities take place automatically; the programmer doesn't need to worry about them.

Another purpose that fclose( ) serves is to free the link used by the particular file, and the associated buffers, so that these are available for other files.
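The buffering described above can be observed with a short sketch. It uses fflush( ), a standard library function not covered in the text so far, which forces the buffer to be written without closing the file. The function names bytes_on_disk and size_after_flush are hypothetical.

```c
#include <stdio.h>

/* Hypothetical helper: counts the bytes currently stored in the file. */
static long bytes_on_disk(const char *path)
{
    FILE *fp = fopen(path, "rb");
    long n = 0;

    if (fp == NULL)
        return -1;

    while (getc(fp) != EOF)
        n++;

    fclose(fp);
    return n;
}

/* Writes three characters, flushes the buffer, and reports how many bytes
   have reached the file while it is still open. Returns -1 on failure. */
long size_after_flush(const char *path)
{
    FILE *fp = fopen(path, "wb");
    long n;

    if (fp == NULL)
        return -1;

    putc('a', fp);              /* these characters sit in the buffer ... */
    putc('b', fp);
    putc('c', fp);

    if (fflush(fp) == EOF) {    /* ... until the buffer is written out */
        fclose(fp);
        return -1;
    }
    n = bytes_on_disk(path);

    if (fclose(fp) == EOF)      /* fclose( ) reports failure by returning EOF */
        return -1;

    return n;
}
```

Since fclose( ) may have to write the last buffer to the disk, its return value is worth checking when the data matters.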

For formatted reading and writing of characters, strings, integers, floats, there exist two functions, fscanf( ) and fprintf( ). Here is a program, which illustrates the use of these functions...

#include <stdio.h>
#include <stdlib.h>   /* for exit( ) */
#include <conio.h>    /* for getche( ); non-standard, DOS/Windows compilers only */

int main( )
{
    FILE *fp ;
    char another = 'Y' ;
    char name[40] ;
    int age ;
    float bs ;

    fp = fopen ( "EMPLOYEE.DAT", "w" ) ;
    if ( fp == NULL )
    {
        puts ( "Cannot open file" ) ;
        exit ( 1 ) ;
    }

    while ( another == 'Y' )
    {
        printf ( "\nEnter name, age and basic salary\n" ) ;
        scanf ( "%39s %d %f", name, &age, &bs ) ;   /* name is already an address */
        fprintf ( fp, "%s %d %f\n", name, age, bs ) ;

        printf ( "\nAnother employee (Y/N) " ) ;
        fflush ( stdin ) ;   /* discards the pending enter key; not standard C */
        another = getche( ) ;
    }

    fclose ( fp ) ;
    return 0 ;
}

And here is the output of the program...

Enter name, age and basic salary Amar 34 1550
Another employee (Y/N) Y
Enter name, age and basic salary Sanju 24 1200
Another employee (Y/N) Y
Enter name, age and basic salary Ryan 26 2000
Another employee (Y/N) N

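The key to this program is the function fprintf( ), which writes the values of three variables to the file. This function is similar to printf( ), except that a FILE pointer is included as the first argument. As in printf( ), we can format the data in a variety of ways by using fprintf( ). In fact all the format conventions of the printf( ) function work with fprintf( ) as well. The function fflush( ) is used to get rid of a peculiarity of scanf( ). After supplying data for one employee, we would hit the enter key. What scanf( ) does is assign name, age and salary to the appropriate variables and keep the enter key unread in the keyboard buffer. If it were left there, the next read from the keyboard would pick up this enter key instead of the answer to the 'Another employee' question; fflush( stdin ) discards it. Note that calling fflush( ) on an input stream is not defined by the C standard. It is an extension offered by some compilers, typically DOS and Windows ones; elsewhere the leftover characters have to be read and discarded explicitly.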
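The counterpart of fprintf( ) is fscanf( ), which reads formatted data back from a file. The sketch below shows one way the records written above might be read; the function name read_employees and the idea of totalling the salaries are assumptions made for illustration.

```c
#include <stdio.h>

/* Hypothetical helper: reads records stored as "name age salary" and
   prints them. Returns the number of complete records read, or -1 if
   the file cannot be opened. The salaries are totalled in *total. */
int read_employees(const char *path, float *total)
{
    FILE *fp = fopen(path, "r");
    char name[40];
    int age;
    float bs;
    int count = 0;

    if (fp == NULL)
        return -1;

    *total = 0.0f;

    /* fscanf( ) returns the number of items it managed to assign */
    while (fscanf(fp, "%39s %d %f", name, &age, &bs) == 3) {
        printf("%s %d %f\n", name, age, bs);
        *total += bs;
        count++;
    }

    fclose(fp);
    return count;
}
```

Like fprintf( ), fscanf( ) takes the FILE pointer as its first argument. Comparing its return value with the number of items expected ends the loop both at end of file and on malformed data.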

Binary Mode Versus Text Mode:

As we have seen earlier, the high-level disk I/O functions can be categorised as text and binary. This classification arises out of the mode in which a file is opened. There are three main areas where text and binary mode files are different:

  1. The handling of newlines
  2. The representation of end of file
  3. The storage of numbers

 Text Versus Binary Mode - Newlines:

We have already seen that, in text mode, a newline character is converted into the carriage return - linefeed combination before being written to the disk. Likewise, the carriage return - linefeed combination on the disk is converted back into a newline when the file is read by a C program. However, if a file is opened in binary mode, as opposed to text mode, these conversions will not take place.

The following program opens a file in binary mode so that we can see what effect this has on the count of characters present in the file.

#include <stdio.h>
#include <stdlib.h>   /* for exit( ) */

int main( )
{
    FILE *fp ;
    int ch ;   /* int, so that EOF can be told apart from data bytes */
    int noc = 0 ;

    fp = fopen ( "Employee.txt", "rb" ) ;
    if ( fp == NULL )
    {
        puts ( "Cannot open file" ) ;
        exit ( 1 ) ;
    }

    while ( 1 )
    {
        ch = getc ( fp ) ;
        if ( ch == EOF )
            break ;
        noc++ ;
    }

    fclose ( fp ) ;
    printf ( "Number of characters = %d\n", noc ) ;
    return 0 ;
}

And here is the output.

Number of characters = 105  

 Text Versus Binary Mode - End Of File:

The second difference between text and binary modes is in the way the end-of-file is detected. In text mode on DOS-based systems, a special character, whose ASCII value is 26, is inserted after the last character in the file to mark the end of file. If this character is detected at any point in the file, the read function would return the EOF signal to the program.

As against this, there is no such special character present in the binary mode files to mark the end of file. The binary mode files keep track of the end of file from the number of characters present in the directory entry of the file.

There is a moral to be derived from the end of file marker of text mode files. If a file stores numbers in binary mode, it is important that binary mode only be used for reading the numbers back, since one of the numbers we store might well be the number 26 (hexadecimal 1A). If this number is detected while we are reading the file by opening it in text mode, reading would be terminated prematurely at that point.
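The moral above can be tried out with a minimal sketch that stores a few bytes, including the value 26, and reads them back in binary mode. The function name round_trip_count is hypothetical.

```c
#include <stdio.h>

/* Hypothetical helper: writes n bytes in binary mode, reads the file back
   in binary mode, and returns how many bytes were read (-1 on failure). */
long round_trip_count(const char *path, const unsigned char *data, size_t n)
{
    FILE *fp = fopen(path, "wb");
    long count = 0;
    size_t i;

    if (fp == NULL)
        return -1;

    for (i = 0; i < n; i++)
        putc(data[i], fp);

    if (fclose(fp) == EOF)
        return -1;

    fp = fopen(path, "rb");     /* binary mode: byte 26 is ordinary data */
    if (fp == NULL)
        return -1;

    while (getc(fp) != EOF)
        count++;

    fclose(fp);
    return count;
}
```

In binary mode every byte written is counted on the way back, whatever its value. Replacing "rb" with "r" is the experiment that, on a DOS-based system, would stop the count at the first byte with value 26.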

 Text Versus Binary Mode - Storage Of Numbers:

So far, the only function we have used for storing numbers in a disk file is the fprintf( ) function. It is important to understand how numerical data is stored on the disk by fprintf( ). Text and characters are stored one character per byte, as we would expect. Numbers are stored as strings of characters. Thus 1234, even though it occupies only two bytes in memory as a 16-bit int (four bytes with the 32-bit int of most current compilers), would occupy four bytes when transferred to the disk using fprintf( ), one byte per character. Similarly, the floating-point number 1234.56 would occupy 7 bytes on disk. Thus, numbers with more digits would require more disk space. Hence, if a large amount of numerical data is to be stored in a disk file, using text mode may turn out to be inefficient. The solution is to open the file in binary mode and use functions such as fwrite( ), which store the numbers in binary format. Each number then occupies the same number of bytes on disk as it occupies in memory.
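The difference in disk space can be measured directly. The following sketch stores the same int once as text and once in binary form and reports the size of each file; the function names bytes_in_file, size_as_text and size_as_binary are assumptions made for this illustration.

```c
#include <stdio.h>

/* Hypothetical helper: returns the size of the file in bytes, -1 on failure. */
static long bytes_in_file(const char *path)
{
    FILE *fp = fopen(path, "rb");
    long n = 0;

    if (fp == NULL)
        return -1;

    while (getc(fp) != EOF)
        n++;

    fclose(fp);
    return n;
}

/* Stores n as text with fprintf( ) and returns the bytes used on disk. */
long size_as_text(const char *path, int n)
{
    FILE *fp = fopen(path, "w");

    if (fp == NULL)
        return -1;

    fprintf(fp, "%d", n);       /* one byte per digit */
    fclose(fp);
    return bytes_in_file(path);
}

/* Stores n in binary form with fwrite( ) and returns the bytes used on disk. */
long size_as_binary(const char *path, int n)
{
    FILE *fp = fopen(path, "wb");

    if (fp == NULL)
        return -1;

    fwrite(&n, sizeof(n), 1, fp);   /* same bytes as in memory */
    fclose(fp);
    return bytes_in_file(path);
}
```

For 1234 the text file holds 4 bytes and for 1234567 it holds 7, whereas the binary file always holds sizeof ( int ) bytes, whatever the value.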

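/* Writes employee records to a file in binary form using fwrite( ) */
#include <stdio.h>
#include <stdlib.h>   /* for exit( ) */
#include <conio.h>    /* for getche( ); non-standard, DOS/Windows compilers only */

int main( )
{
    FILE *fp ;
    char another = 'Y' ;
    struct emp
    {
        char name[40] ;
        int age ;
        float bs ;
    } ;
    struct emp e ;

    fp = fopen ( "EMP.DAT", "wb" ) ;
    if ( fp == NULL )
    {
        puts ( "Cannot open file" ) ;
        exit ( 1 ) ;
    }

    while ( another == 'Y' )
    {
        printf ( "\nEnter name, age and basic salary\n" ) ;
        scanf ( "%39s %d %f", e.name, &e.age, &e.bs ) ;
        fwrite ( &e, sizeof ( e ), 1, fp ) ;

        printf ( "\nAnother employee (Y/N) " ) ;
        fflush ( stdin ) ;   /* discards the pending enter key; not standard C */
        another = getche( ) ;
    }

    fclose ( fp ) ;
    return 0 ;
}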

The information obtained about the employee from the keyboard is placed in the structure variable e. Then, the following statement writes the structure to the file:

fwrite ( &e, sizeof ( e ), 1, fp ) ;

Here, the first argument is the address of the structure to be written to the disk.

The second argument is the size of the structure in bytes. Instead of counting the bytes occupied by the structure ourselves, we let the program do it for us by using the sizeof( ) operator, which gives the size of a variable in bytes. This means the fwrite( ) call need not be changed if the elements of the structure change.

The third argument is the number of such structures that we want to write at one time. In this case, we want to write only one structure at a time. Had we had an array of structures, for example, we might have wanted to write the entire array at once.

The last argument is the pointer to the file we want to write to.
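The four arguments can be seen at work in a sketch that writes a whole array of structures with one fwrite( ) call and reads the records back with fread( ), the standard counterpart of fwrite( ) that takes the same four arguments. The function names write_emps and read_emps are hypothetical; struct emp is the structure from the program above.

```c
#include <stdio.h>

struct emp
{
    char name[40];
    int age;
    float bs;
};

/* Hypothetical helper: writes count records with a single fwrite( ) call.
   Returns the number of records actually written. */
size_t write_emps(const char *path, const struct emp *list, size_t count)
{
    FILE *fp = fopen(path, "wb");
    size_t written;

    if (fp == NULL)
        return 0;

    written = fwrite(list, sizeof(struct emp), count, fp);
    fclose(fp);
    return written;
}

/* Hypothetical helper: reads up to max records, one at a time.
   Returns the number of records read. */
size_t read_emps(const char *path, struct emp *list, size_t max)
{
    FILE *fp = fopen(path, "rb");
    size_t n = 0;

    if (fp == NULL)
        return 0;

    /* fread( ) returns the number of complete items it read */
    while (n < max && fread(&list[n], sizeof(struct emp), 1, fp) == 1)
        n++;

    fclose(fp);
    return n;
}
```

Both functions return the number of items transferred, so comparing that value with the number requested shows whether the operation succeeded in full.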

 Detecting Errors in Reading/ Writing:

A read or write operation on a file does not always succeed. Naturally there must be a provision to test whether our attempt to read/write was successful or not.

The standard library function ferror( ) reports any error that might have occurred during a read/write operation on a file. It returns a zero if the read/write is successful and a non-zero value in case of a failure. The following program illustrates the usage of ferror( ).

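#include <stdio.h>

int main( )
{
    FILE *fp ;
    int ch ;

    fp = fopen ( "TRIAL", "w" ) ;   /* deliberately opened for writing only */
    if ( fp == NULL )
        return 1 ;

    while ( !feof ( fp ) )
    {
        ch = getc ( fp ) ;   /* reading from a write-only file fails */
        if ( ferror ( fp ) )
        {
            printf ( "Error in reading file" ) ;
            break ;
        }
        else
            printf ( "%c", ch ) ;
    }

    fclose ( fp ) ;
    return 0 ;
}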

In this program the getc( ) function would obviously fail the first time around, since the file has been opened for writing whereas getc( ) is attempting to read from it. The moment the error occurs, ferror( ) returns a non-zero value and the if block gets executed. Instead of printing the error message using printf( ), we can use the standard library function perror( ), which prints the error message supplied by the system. Thus in the above program the perror( ) function can be used as shown below.

if ( ferror ( fp ) )
{
    perror ( "TRIAL" ) ;
    break ;
}

Note that when the error occurs, the error message that is displayed is:

TRIAL: Permission denied

This means we can precede the system error message with any message of our choice. In our program we have just displayed the filename in place of the error message. The wording of the system message after the colon may differ from one system to another.
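Since getc( ) returns EOF both at the end of the file and on an error, a program can call ferror( ) once the reading loop is over to find out which of the two happened. The following is a minimal sketch of that idea; the function name read_status and its return codes are assumptions.

```c
#include <stdio.h>

/* Hypothetical helper: reads the whole file and reports how reading ended.
   Returns 0 for a normal end of file, 1 for a read error,
   and -1 if the file could not be opened. */
int read_status(const char *path)
{
    FILE *fp = fopen(path, "r");
    int status = 0;

    if (fp == NULL) {
        perror(path);           /* e.g. the name followed by the system message */
        return -1;
    }

    while (getc(fp) != EOF)
        ;                       /* the contents are ignored in this sketch */

    if (ferror(fp)) {           /* EOF was returned because of an error */
        perror(path);
        status = 1;
    }
    /* otherwise feof( fp ) is non-zero: the end of the file was reached */

    fclose(fp);
    return status;
}
```

Testing ferror( ) after the loop, rather than relying on feof( ) as the loop condition, avoids looping forever when a read error prevents the end of file from ever being reached.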

 C Preprocessor:

The C preprocessor is a program that processes our source program before it is passed to the compiler. Preprocessor commands (often known as directives) form what can almost be considered a language within the C language. Each of these preprocessor directives begins with a # symbol. The directives can be placed anywhere in a program but are most often placed at the beginning of a program, before main( ), or before the beginning of a particular function. We will learn the following preprocessor directives here:

  1. #define directive
  2. #include directive
  3. #undef directive
  4. Conditional Compilation directives

#define Directive:

Consider the following program

#include <stdio.h>
#define UPPER 25

int main( )
{
    int i ;

    for ( i = 1 ; i <= UPPER ; i++ )
        printf ( "%d \n", i ) ;
    return 0 ;
}

In this program instead of writing 25 in the for loop we are writing it in the form of UPPER, which has already been defined before main( ) through the statement,

#define UPPER 25

This statement is called 'macro definition' or more commonly, just a 'macro'. What purpose does it serve? During preprocessing, the preprocessor replaces every occurrence of UPPER in the program with 25.

When we compile the program, before the source code passes to the compiler it is examined by the C preprocessor for any macro definitions. When it sees the #define directive, it goes through the entire program in search of the macro templates; wherever it finds one, it replaces the macro template with the appropriate macro expansion. Only after this procedure has been completed is the program handed over to the compiler. In C programming it is customary to use capital letters for macro templates. This makes it easy for programmers to pick out all the macro templates when reading through the program.

Note that blanks or tabs separate macro template and its macro expansion. A space between # and define is optional. Remember that a macro definition is never to be terminated by a semicolon.

Macros With Arguments:

The macros that we have used so far are called simple macros. Macros can have arguments, just as functions can. Here is an example, which illustrates this fact.

#include <stdio.h>
#define AREA(x) ( 3.14 * x * x )

int main( )
{
    float r1 = 6.25, r2 = 2.5, a ;

    a = AREA ( r1 ) ;
    printf ( "Area of circle = %f\n", a ) ;
    a = AREA ( r2 ) ;
    printf ( "Area of circle = %f\n", a ) ;
    return 0 ;
}

Here's the output of the program...

Area of circle = 122.656250
Area of circle = 19.625000

In this program wherever the preprocessor finds the phrase AREA(x) it expands it into the statement ( 3.14 * x * x ). However, that's not all that it does. The x in the macro template AREA(x) is an argument that matches the x in the macro expansion ( 3.14 * x * x ). The statement AREA(r1) in the program causes the variable r1 to be substituted for x .
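Because the substitution is purely textual, an argument that is itself an expression can give a surprising result. The sketch below compares the AREA macro from the program above with a variant, here called SAFE_AREA (a hypothetical name), in which each use of the argument is wrapped in parentheses.

```c
#define AREA(x)      ( 3.14 * x * x )        /* as in the program above */
#define SAFE_AREA(x) ( 3.14 * (x) * (x) )    /* hypothetical parenthesised variant */

/* Both functions are meant to compute the area for a radius of r + 1. */

double area_plain(double r)
{
    return AREA(r + 1);         /* expands to ( 3.14 * r + 1 * r + 1 ) */
}

double area_safe(double r)
{
    return SAFE_AREA(r + 1);    /* expands to ( 3.14 * (r + 1) * (r + 1) ) */
}
```

With r equal to 1, area_plain( ) evaluates 3.14 * 1 + 1 * 1 + 1, which is about 5.14, while area_safe( ) evaluates 3.14 * 2 * 2, which is about 12.56. For this reason it is common practice to parenthesise every argument in a macro expansion.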

#include Directive:

The second preprocessor directive we'll explore in this chapter is file inclusion. This directive causes one file to be included in another. The preprocessor command for file inclusion looks like this:

#include "filename"

and it simply causes the entire contents of filename to be inserted into the source code at that point in the program. Of course this presumes that the file being included exists. This feature can be used in two cases:

1.      If we have a very large program, the code is best divided into several different files, each containing a set of related functions. It is a good programming practice to keep different sections of a large program separate. These files are #included at the beginning of main program file.

2.      Many times we need some functions or some macro definitions in almost all the programs that we write. In such a case these commonly needed functions and macro definitions can be stored in a file, and that file can be included in every program we write, which would add all the statements in this file to our program as if we had typed them in. For example, we included a file called stdio.h in many programs.

It is common for the files, which are to be included to have a .h extension. The '.h' extension stands for 'header file', possibly because it contains statements which when included go to the head of your program.

Actually there exist two ways to write #include statements. These are:

#include "filename"
#include <filename>

The meaning of each of these forms is given below:

#include "goto.c"

This command would look for the file goto.c in the current directory as well as the specified list of directories as mentioned in the include search path that might have been set up.

#include <goto.c>

This command would look for the file goto.c in the specified list of directories only.
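As an illustration of the second use of file inclusion, a small header file of our own might look like the sketch below. The file name myutils.h, the macro SQUARE and the function largest( ) are all hypothetical. The #ifndef and #endif lines are conditional compilation directives, used here so that the contents are processed only once even if the file gets included more than once.

```c
/* myutils.h - a hypothetical header file used for illustration */
#ifndef MYUTILS_H
#define MYUTILS_H

/* a commonly needed macro */
#define SQUARE(x) ( (x) * (x) )

/* prototype of a function assumed to be defined in another file */
int largest ( int a, int b ) ;

#endif
```

A program kept in the same directory would then bring in these definitions with #include "myutils.h" near its top.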

 




© poombatta.com