"Creating Your Own Command Interpreter: Main Concepts Guide: Simple Shell - Part 2"

Deeper into UNIX: Exploring the Inner Workings of UNIX with a Custom Interpreter

May 07, 2023

Welcome to Part 2 of our blog series on shells and processes! In the first part, we explored some of the fundamental concepts behind shells and processes, including how shells work, what PIDs and PPIDs are, how to manipulate the environment of the current process, and the difference between functions and system calls.

In this part of our blog series on shells and processes, we'll explore some key concepts related to processes and how they interact with the operating system.

By the end of this part, you'll have a solid understanding of how to create processes with fork, how the different prototypes of main give a program access to its arguments and environment, and how the shell uses the PATH variable to locate programs. So let's dive in and continue our exploration of the fascinating world of shells and processes!

How to create processes?

Why? Creating new processes is an essential part of modern operating systems, and it is necessary for a wide range of tasks. One example is running multiple programs concurrently: by creating a separate process for each program, the operating system can run several programs at the same time, even on a single-core processor, by switching rapidly between them. This allows users to work on multiple tasks simultaneously without having to wait for one program to finish before starting another.

In C, you can create new processes using the fork system call, which creates a new process by duplicating the existing process. Here's an example of how to create a new process in a simple shell.

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main() {
    pid_t pid;

    // Fork a new process
    pid = fork();

    if (pid < 0) {
        // Error
        fprintf(stderr, "Fork failed\n");
        exit(1);
    } else if (pid == 0) {
        // Child process
        printf("This is the child process\n");
        exit(0);
    } else {
        // Parent process
        printf("This is the parent process\n");
        exit(0);
    }

    return 0;
}

In this example, the fork() system call is used to create a new process. On success, fork() returns the process ID of the new child to the parent process and returns 0 to the child process; on failure, it returns -1 and no child is created. In this case, the parent process prints "This is the parent process", and the child process prints "This is the child process". Both processes run independently after the fork, so the order in which the two lines appear is not guaranteed.

Note that pid_t is a data type used to represent process IDs (PIDs). It is a signed integer type and is defined in the sys/types.h header file.

FUNCTIONS TO KNOW :

getpid() is a function that returns the process ID (PID) of the calling process. It is declared in the unistd.h header file and takes no arguments. The PID returned by getpid() is a unique identifier assigned by the operating system to each process running on the system.

Here's an example of how to use getpid():

wait() is a system call that suspends the calling process until one of its child processes terminates. It is declared in the sys/wait.h header file and takes a pointer to an integer variable that receives status information about the terminated child. That integer encodes more than the exit code, so it is usually decoded with macros such as WIFEXITED and WEXITSTATUS.

What are the three prototypes of `main`?

In the C programming language, the main() function is the entry point of a program. It is the first function that is executed when a program starts running. There are three prototypes for main() that you will commonly encounter. The C standard specifies the first two; the third, with envp, is a widely supported extension on Unix-like systems. They are:

int main(): This is the most common prototype for the main() function in C. It takes no arguments and returns an integer value to the operating system that indicates the success or failure of the program.
int main(int argc, char *argv[]): This prototype takes two arguments. The first argument, argc, is an integer that represents the number of command-line arguments passed to the program. The second argument, argv, is an array of strings that contains the command-line arguments themselves.
int main(int argc, char *argv[], char *envp[]): This prototype takes three arguments. The first two arguments are the same as in the previous prototype, but the third argument, envp, is an array of strings that contains the program's environment variables.

Here's an example of how to use the second prototype of the main() function:

#include <stdio.h>

int main(int argc, char *argv[]) {
    printf("Number of arguments: %d\n", argc);
    for (int i = 0; i < argc; i++) {
        printf("Argument %d: %s\n", i, argv[i]);
    }
    return 0;
}

Here's an example of how to use the third prototype of the main() function:

#include <stdio.h>

int main(int argc, char *argv[], char *envp[]) {
    for (int i = 0; envp[i] != NULL; i++) {
        printf("Environment variable %d: %s\n", i, envp[i]);
    }
    return 0;
}

In this example, the main() function takes three arguments: argc and argv, as in the previous example, and envp, which is an array of strings that contains the program's environment variables. The program prints out each environment variable, one by one.

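Note that gcc accepts the three-argument form of main() on Unix-like systems without any extra macro such as _GNU_SOURCE. If the linker reports undefined reference to `main', it means that no main() function was found in the files being linked (for example, because of a misspelled function name or a missing source file), not that a macro is missing. A plain compile command is enough:

gcc example.c -o example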

`argv` versus `envp`:

Command-line arguments, an example: a program that reads data from a file can take the name of the file as a command-line argument. For example, myprogram input.txt would run the program myprogram and pass the name of the input file input.txt as an argument.
Environment variables that can be read through envp in C, some examples:
1. PATH: The PATH environment variable specifies which directories the system should search for executable files. For example, PATH=/usr/local/bin:/usr/bin:/bin would tell the system to search for executable files in the directories /usr/local/bin, /usr/bin, and /bin.
2. HOME: The HOME environment variable specifies the home directory of the current user. For example, HOME=/home/username would specify that the home directory of the current user is /home/username.

How does the shell use `PATH` to find programs?

The PATH environment variable is a string that contains a list of directories separated by a delimiter (usually a colon : on Unix-like systems and a semicolon ; on Windows). The PATH variable is used by the shell or the operating system to search for executable files when a command is entered.

This process allows the user to run a command from anywhere on the system, without having to specify the full path to the executable file. It also allows the user to customize the system by adding directories to the PATH variable, which can be useful for installing custom software or modifying the behaviour of existing programs.

Here's how the shell uses the PATH environment variable to find a program:

1. The user types a command into the shell, such as ls.
2. The shell looks up the value of the PATH environment variable.
3. The shell splits the PATH variable into individual directory names based on the : delimiter. For example, if PATH is set to /usr/local/bin:/usr/bin:/bin, the shell will split it into the following directories: /usr/local/bin, /usr/bin, and /bin.
4. The shell looks for the executable file in each directory in the PATH variable, in the order listed. For example, the shell would look for ls in the following locations: /usr/local/bin/ls, /usr/bin/ls, and /bin/ls.
5. If the shell finds the executable file, it executes it. If the shell does not find the executable file in any of the directories listed in PATH, it displays an error message.

For example, suppose you have a program called myprogram installed in the directory /usr/local/bin. If the PATH variable is set to /usr/local/bin:/usr/bin:/bin, and you enter the command myprogram in a shell, the shell will search for an executable file called myprogram in the directories /usr/local/bin, /usr/bin, and /bin, in that order. If it finds an executable file called myprogram in the directory /usr/local/bin, it will execute that file.

You can modify the PATH variable to include additional directories where executable files may be located. For example, if you install a program in the directory /opt/myprogram, you can add that directory to the PATH variable by appending the following line to your shell configuration file (such as .bashrc or .zshrc):

export PATH=$PATH:/opt/myprogram

This adds the directory /opt/myprogram to the end of the PATH variable, so that the shell will search for executable files in that directory as well.

In this part, we have explored the basics of creating processes in the C programming language. We have discussed the three prototypes of the main function, which is the entry point of any C program. Additionally, we have looked at how the shell uses the PATH environment variable to find programs when commands are entered.

In the next part of this blog series, we will delve deeper into the topic of process creation and management by looking at how to execute another program using the execve system call. We will also explore how to suspend the execution of a process until one of its children terminates.

Go to Part 1 : PART 1 Simple Shell

Go to Part 3 : PART 3 Simple Shell

Shaza’s Substack
