Working with Files in C: File I/O

Working with Files in C: A Beginner's Guide to Input and Output Operations

May 08, 2023

Processes are an important concept in modern operating systems, and they are used extensively to provide a secure and efficient environment for running computer programs.

When a program is executed, the operating system creates a new process for it. The process consists of the executable code, data, and stack that are loaded into memory.

Each process has its own table of file descriptors, which maps the file descriptor number to an open file or resource in the file system. The file descriptor table is maintained by the kernel and is unique to each process. The file descriptor table indexes into the system-wide file table, which keeps track of all the files that are currently open by all processes. The file table, in turn, indexes into the inode table, which describes the actual underlying files and directories in the file system.

When a process wants to perform input/output operations on a file, it passes the file descriptor to the kernel through a system call, and the kernel performs the operation on behalf of the process. The kernel uses the file descriptor to access the appropriate file or resource in the file system.
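To make the relationship between the per-process descriptor table and the system-wide file table a little more concrete, here is a minimal sketch that is not part of the original walkthrough. It assumes a POSIX system; the helper name shares_offset_demo and its path parameter are hypothetical. The idea: a descriptor duplicated with dup() points at the same file-table entry (so it shares the read offset), while a second open() of the same file gets its own entry with its own offset.

```c
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper (illustration only).
 * Returns 1 if the duplicated descriptor shares the offset and the
 * separately opened descriptor does not, 0 otherwise, -1 on any error.
 * Assumes `path` names a readable file of at least one byte.
 */
int shares_offset_demo(const char *path)
{
    char byte;
    int result = -1;
    int fd_a = open(path, O_RDONLY);          /* new file-table entry */
    int fd_b = (fd_a >= 0) ? dup(fd_a) : -1;  /* same entry as fd_a */
    int fd_c = open(path, O_RDONLY);          /* second, independent entry */

    if (fd_a >= 0 && fd_b >= 0 && fd_c >= 0 && read(fd_a, &byte, 1) == 1)
    {
        /* Ask each descriptor where its current offset is */
        off_t off_b = lseek(fd_b, 0, SEEK_CUR);
        off_t off_c = lseek(fd_c, 0, SEEK_CUR);

        result = (off_b == 1 && off_c == 0);
    }

    if (fd_a >= 0)
        close(fd_a);
    if (fd_b >= 0)
        close(fd_b);
    if (fd_c >= 0)
        close(fd_c);
    return (result);
}
```

Reading one byte through fd_a moves the offset seen by fd_b, but not the one seen by fd_c, even though all three descriptors refer to the same underlying file.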

File Handling in C:

First, we need to distinguish between two terms:
Stream Data Files: Stream data files are data files that are stored in the same manner as they appear on the screen or are input/output as a stream of bytes. These files are typically used for text-based data that is easily read and written by programs. Examples of stream data files include text files, CSV files, and log files. Stream data files are often processed using functions such as fread(), fwrite(), and fprintf().
System Data Files: System data files are data files that are more closely associated with the operating system. These files are used for storing system configuration information, device drivers, and other system-related data. Examples of system data files include device files (/dev), configuration files (/etc), and system logs (/var/log). System data files are often accessed using system calls such as open(), read(), and write().
It's worth noting that the distinction between stream data files and system data files is not always clear-cut, and there can be overlap between the two types. For example, a configuration file may contain text-based data that is read and written using stream I/O functions.

We need certain steps for performing file operations in C:

Declare a file pointer variable: Before working with a file, you need to declare a file pointer variable of type FILE*. This variable will be used to refer to the file throughout the program.
Open the file using fopen(): To open a file, you can use the fopen() function. This function takes two arguments: the filename (as a string) and the mode in which to open the file (e.g. "r" for read mode, "w" for write mode, "a" for append mode).
Perform file operations using suitable functions: Once the file is open, you can perform various operations on it using suitable functions. For example, you can read data from the file using fread(), write data to the file using fwrite(), or read a line of text from the file using fgets().
Close the file using fclose(): After you are done working with the file, you should close it using the fclose() function. This function takes the file pointer variable as an argument and flushes any remaining data to the file before closing it.

Let’s see the following example:

#include <stdio.h>

int main() {
    // Declare a file pointer variable
    FILE *fp;
    int ch; // int, not char: fgetc() returns an int so that EOF stays distinct from every valid byte

    // Open a file in read mode
    fp = fopen("example.txt", "r");

    // Check if the file was opened successfully
    if (fp == NULL) {
        printf("Error opening file\n");
        return 1;
    }

    // Read and print the contents of the file
    while ((ch = fgetc(fp)) != EOF) {
        printf("%c", ch);
    }

    // Close the file
    fclose(fp);

    return 0;
}

In this example, the program declares a file pointer variable fp, opens a file named "example.txt" in read mode using fopen(), reads and prints the contents of the file using fgetc(), and then closes the file using fclose().

Note that the code while ((ch = fgetc(fp)) != EOF) is a loop that reads characters from the file associated with the file pointer fp until the end of file (EOF) is reached. Here's how it works:

The fgetc(fp) function reads a character from the file associated with the file pointer fp and returns it as an int. If the end of file is reached, the special value EOF is returned instead.
The ch = fgetc(fp) expression assigns the value returned by fgetc(fp) to the variable ch. If EOF is returned, the loop exits because the condition (ch != EOF) evaluates to false.
The loop body consists of a single statement: printf("%c", ch). This statement prints the character stored in ch to the console using the %c format specifier for characters.
The loop continues to execute as long as the condition (ch != EOF) is true, which means that there are still characters left to be read from the file.

In summary, this loop reads characters from a file one at a time and prints them to the console until the end of file is reached. It's a common pattern for processing text data stored in a file.
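One detail of this pattern is easy to miss: the variable that receives the result of fgetc() should be an int. If it is a char, a byte with the value 0xFF can be mistaken for EOF (when char is signed), or EOF may never compare equal (when char is unsigned). The sketch below reuses the same loop to count the bytes in a file; count_chars is a hypothetical helper name used only for illustration.

```c
#include <stdio.h>

/*
 * Hypothetical helper (illustration only): counts the bytes in a file
 * using the fgetc() loop. Returns -1 if the file cannot be opened.
 */
long count_chars(const char *path)
{
    FILE *fp = fopen(path, "r");
    int c;           /* int, so EOF cannot collide with a real byte value */
    long count = 0;

    if (fp == NULL)
        return (-1);

    while ((c = fgetc(fp)) != EOF)
        count++;

    fclose(fp);
    return (count);
}
```

A call such as count_chars("example.txt") returns the number of bytes read before end of file, including any bytes whose value is 0xFF.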

Another code example:

#include <stdio.h>

int main()
{
    FILE *fp;

    /* Open (or create) the file for writing; existing contents are discarded */
    fp = fopen("test.txt", "w");
    if (fp == NULL)
    {
        perror("fopen");
        return (1);
    }

    /* Processing happens between opening the file and closing it */
    fprintf(fp, "%s", "Hello Alx Geek");

    fclose(fp);

    return (0);
}

Note that fprintf(fp, "%s", "Hello Alx Geek"); writes the string "Hello Alx Geek" to the file. The fp argument specifies the file to write to, and the %s format specifier indicates that the next argument ("Hello Alx Geek") is a string.
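The "w" mode discards whatever the file already contained. When you want to keep the existing contents, the "a" (append) mode mentioned earlier is the usual choice. Here is a small sketch under that assumption; append_line is a hypothetical helper name, not something from the C library.

```c
#include <stdio.h>

/*
 * Hypothetical helper (illustration only): appends `text` plus a newline
 * to the file at `path`, creating the file if it does not exist.
 * Returns 0 on success, -1 on failure.
 */
int append_line(const char *path, const char *text)
{
    FILE *fp = fopen(path, "a");

    if (fp == NULL)
        return (-1);

    /* fprintf() returns a negative value if the write fails */
    if (fprintf(fp, "%s\n", text) < 0)
    {
        fclose(fp);
        return (-1);
    }

    /* fclose() flushes the buffer; a failure here means data may be lost */
    return ((fclose(fp) == 0) ? 0 : -1);
}
```

Calling append_line("test.txt", "one") and then append_line("test.txt", "two") leaves both lines in the file, in that order, after anything that was already there.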

What is the difference between open() and fopen(), close() and fclose(), fprintf() and dprintf(), read() and fread(), …?

The main differences between open(), fopen(), close(), fclose(), fprintf(), dprintf(), read(), and fread() are:

open() and close() are low-level file I/O functions provided by the operating system, while fopen() and fclose() are higher-level functions provided by the C standard library. open() and close() typically operate on file descriptors, which are low-level identifiers for open files, while fopen() and fclose() operate on FILE pointers, which are higher-level structures that provide additional functionality for stream-based I/O.
Similarly:
read() and fread() are low-level and high-level functions, respectively, for reading data from a file. read() operates on file descriptors and reads up to a specified number of bytes into a buffer, while fread() operates on FILE pointers and reads a specified number of elements of a specified size into a buffer.
fprintf() and dprintf() both write formatted data using format specifiers, similar to printf(). fprintf() writes to a FILE pointer, while dprintf() writes directly to a file descriptor, which is useful when you are already working with low-level file I/O.
close() and fclose() are used to close a file that was previously opened with open() or fopen(), respectively. close() operates on file descriptors, while fclose() operates on FILE pointers. fclose() also flushes any data still held in the stream's buffer before closing, so skipping it can lose output. Both calls release resources that would otherwise stay allocated until the process exits.
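To see the two families side by side, here is a rough sketch that writes the same greeting once with the stdio functions and once with the descriptor-based ones. It assumes a POSIX system (dprintf() is a POSIX function, not part of ISO C); the function names greet_with_stdio and greet_with_fd are hypothetical.

```c
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* High-level version: FILE pointer, buffered by the C library */
int greet_with_stdio(const char *path, const char *name)
{
    FILE *fp = fopen(path, "w");

    if (fp == NULL)
        return (-1);
    fprintf(fp, "Hello %s\n", name);
    return ((fclose(fp) == 0) ? 0 : -1);
}

/* Low-level version: file descriptor, no stdio buffer involved */
int greet_with_fd(const char *path, const char *name)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);

    if (fd < 0)
        return (-1);
    dprintf(fd, "Hello %s\n", name);
    return ((close(fd) == 0) ? 0 : -1);
}
```

Both functions should leave the same bytes in the file. The difference is in what you hold while the file is open: a FILE pointer in the first case, a plain int in the second.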


Let’s explore an example that demonstrates the use of open(), close(), read(), write(), and dprintf(). This program reads the contents of a file named "input.txt", converts all lowercase letters to uppercase, and writes the result to a file named "output.txt". It also writes the result to the console using dprintf().

Here's a brief explanation of what we will try to code:

1- Declarations:

Declare file descriptor variables for the input and output files.
int input_fd, output_fd
Declare string variables for the input and output filenames.
Declare variables for use in the loop that reads and processes data from the input file: ssize_t n; and int i
buffer:
buffer[1024] declares an array named buffer that can hold up to 1024 characters. In this specific program, buffer is used as a temporary storage area for the data read from the input file using read(). The read() function reads a specified number of bytes from a file into a buffer, which is a contiguous block of memory.
- In general, the size of the buffer should be large enough to hold the largest amount of data that the program is likely to process at any given time, but not so large that it wastes memory.
- In practice, the size of the buffer can have a significant impact on the performance of the program. A larger buffer can reduce the number of calls to read() and write(), which are relatively slow operations, but can also consume more memory. A smaller buffer can reduce memory usage but may result in more calls to read() and write(), which can be less efficient.
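The trade-off between buffer size and the number of read() calls can be observed directly. The following sketch is an illustration under simple assumptions (a regular file on a local file system, POSIX I/O); count_reads is a hypothetical helper that reads a whole file with a buffer of the given size and reports how many read() calls returned data.

```c
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Hypothetical helper (illustration only): returns the number of read()
 * calls that returned at least one byte, or -1 on error.
 */
long count_reads(const char *path, size_t buf_size)
{
    long calls = 0;
    ssize_t n;
    char *buf;
    int fd;

    if (buf_size == 0)
        return (-1);

    fd = open(path, O_RDONLY);
    if (fd < 0)
        return (-1);

    buf = malloc(buf_size);
    if (buf == NULL)
    {
        close(fd);
        return (-1);
    }

    /* Each iteration is one trip into the kernel */
    while ((n = read(fd, buf, buf_size)) > 0)
        calls++;

    free(buf);
    close(fd);
    return ((n < 0) ? -1 : calls);
}
```

For a 100-byte regular file, a 10-byte buffer typically needs ten data-returning reads, while a 1024-byte buffer needs only one.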

2- Open the input file in read-only mode using open() and assign the resulting file descriptor to input_fd.

3- Open the output file in write-only mode using open() and assign the resulting file descriptor to output_fd. The O_CREAT and O_TRUNC flags ensure that the file is created if it doesn't exist and truncated if it does, and the 0644 argument specifies the requested file permissions (read/write for the owner, read-only for group and others, subject to the process umask).

4- Read data from the input file into the buffer array using read() and assign the number of bytes read to the variable n.

5- Convert all lowercase letters in the buffer array to uppercase using the toupper() function.

6- Write the contents of the buffer array to the output file using write().

7- Write the contents of the buffer array to the console using dprintf().

8- Close the input and output files using close().

#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <ctype.h>
#include <unistd.h>

int main()
{
    int i;
    ssize_t n;
    int input_fd;
    int output_fd;
    char *input_filename = "input.txt";
    char *output_filename = "output.txt";
    char buffer[1024];

    /* Open the input file for reading */
    input_fd = open(input_filename, O_RDONLY);
    if (input_fd < 0)
    {
        perror("open");
        exit(1);
    }

    /* Open the output file for writing: create it if missing, truncate it otherwise */
    output_fd = open(output_filename, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (output_fd < 0)
    {
        perror("open");
        exit(1);
    }

    /* Read a chunk, uppercase it, then write it to the file and to the console */
    while ((n = read(input_fd, buffer, sizeof(buffer))) > 0)
    {
        for (i = 0; i < n; i++)
        {
            /* Cast to unsigned char: toupper() is undefined for negative values other than EOF */
            buffer[i] = toupper((unsigned char)buffer[i]);
        }
        write(output_fd, buffer, n);
        dprintf(STDOUT_FILENO, "%.*s", (int)n, buffer);
    }

    close(input_fd);
    close(output_fd);

    return (0);
}

Note that ssize_t is a signed type: read() and write() return the number of bytes transferred on success and -1 on error (setting errno). An unsigned size_t could not represent -1, and an int may be too narrow for large byte counts, which is why these functions return ssize_t. Always check for a negative return value before using n as a length.

What are the three standard file descriptors, what is their purpose, and what are their POSIX names?

POSIX stands for Portable Operating System Interface, and it is a set of standards that define a common interface between Unix-like operating systems and applications.

In Unix-like operating systems, there are three standard file descriptors that are associated with each process. These file descriptors are:

Standard input (stdin): This file descriptor is used for input from the user or from another program. By default, it is connected to the keyboard, but it can be redirected to read input from a file or from another program's output. The POSIX name for this file descriptor is STDIN_FILENO (value 0).
Standard output (stdout): This file descriptor is used for output to the user or to another program. By default, it is connected to the console, but it can be redirected to write output to a file or to another program's input. The POSIX name for this file descriptor is STDOUT_FILENO (value 1).
Standard error (stderr): This file descriptor is used for error messages and diagnostic output. By default, it is also connected to the console, but it can be redirected to write error messages to a file or to another program's input. The POSIX name for this file descriptor is STDERR_FILENO (value 2).
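Here is a short sketch of the three descriptors in use, assuming a POSIX system. The function names streams_match_descriptors and report are hypothetical. The first checks that the stdio streams stdin, stdout, and stderr sit on top of descriptors 0, 1, and 2. The second sends a normal result to standard output and a diagnostic to standard error.

```c
#include <stdio.h>
#include <unistd.h>

/* Returns 1 if the stdio streams use the conventional descriptors 0, 1 and 2 */
int streams_match_descriptors(void)
{
    return (fileno(stdin) == STDIN_FILENO
         && fileno(stdout) == STDOUT_FILENO
         && fileno(stderr) == STDERR_FILENO);
}

/*
 * Hypothetical helper (illustration only): writes `result` to standard
 * output and `warning` to standard error. Returns 0 on success, -1 on error.
 */
int report(const char *result, const char *warning)
{
    if (dprintf(STDOUT_FILENO, "%s\n", result) < 0)
        return (-1);
    if (dprintf(STDERR_FILENO, "warning: %s\n", warning) < 0)
        return (-1);
    return (0);
}
```

Because the two messages travel on different descriptors, a shell can separate them, for example by redirecting standard output to one file and standard error to another.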
Now let me try to explain the difference between a function and a system call in a funny way:

Imagine you're a kid at a birthday party, and you want to get some cake. You have two options:

Ask the host nicely to give you a slice of cake. This is like calling a function - you make a request to someone who is already there (the host) to do something for you (give you cake). The host has the cake and the knife, and they can give you a slice of cake without you having to leave your spot at the party.
Go to the kitchen and get the cake yourself. This is like making a system call - you need something (cake), but you don't have it, so you have to leave your current context (the party) and go to a different context (the kitchen) to get it. In the kitchen, you have to interact with the system (open the fridge, find the cake, cut a slice) before you can bring the cake back to the party.

In computer terms, a function is like a request to someone who is already in the program to do something for you. For example, if you call the strlen() function in C, you're asking the program to calculate the length of a string for you. The function is already part of the program, and it can do what you need without you having to interact with the system directly.

On the other hand, a system call is like a request to the operating system to do something for you. For example, if you call the open() function in C, you're asking the operating system to open a file for you. The file is not part of the program, and you have to interact with the system (the operating system) to get it.

So, in summary: a function is like asking someone who is already in the program to do something for you, while a system call is like interacting with the operating system to get something you need that's not already part of the program.

Remember that in computer terms, a system call is like making a request to the kernel to access system resources. For example, if your program needs to read a file, it makes a system call to the kernel to access the file system. The kernel has access to all the resources (like the file system) that your program needs, and it can perform privileged operations on your behalf.
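The contrast can be sketched in a few lines of C. In the hypothetical helper below (print_message is a made-up name), strlen() is an ordinary library function that runs entirely inside the program, while write() is a thin wrapper that hands the request to the kernel through a system call.

```c
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper (illustration only): prints `msg` to standard
 * output. Returns the number of bytes written, or -1 on error.
 */
ssize_t print_message(const char *msg)
{
    /* Ordinary function: computed in user space, the kernel is not involved */
    size_t len = strlen(msg);

    /* System call wrapper: the kernel performs the actual output */
    return (write(STDOUT_FILENO, msg, len));
}
```

On Linux, a tracing tool such as strace shows the write call made by this function, but nothing for strlen(), because that work never leaves the program.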

The kernel is the heart of the operating system, and it manages all the resources that your programs need to run. It's like a store that stocks everything you need, but you have to go through the kernel to get access to those resources.

So, in summary: a system call is like asking someone who has the resources you need to do something for you, and the kernel is like the store where all the resources are kept, but you have to go through the kernel to get access to them.

With this knowledge, you can better understand how your programs interact with the operating system and how to leverage system calls to perform privileged operations on your behalf. Whether you're working on a simple file manipulation program or a complex system-level application, understanding system calls and the kernel is an important step towards becoming a proficient systems programmer.

Shaza’s Substack
