What Are Subreapers In Linux?

May 17, 2023


This article touches upon zombie and orphan processes, and the corresponding re-parenting, from the perspective of a Linux-based operating system. It then discusses a relatively newer concept related to re-parenting: subreapers.

Zombie and orphan processes are popular concepts that are commonly discussed among system programmers and operating system developers. So what are zombie processes and why does the kernel maintain them? To answer this question, we have to first understand the basics of process cleanup/termination in the Linux kernel.

In Linux, a process is a running instance of a program or application. It can be seen as an individual task that is being executed on the system. Every process in Linux has a unique process ID (PID), which is a number assigned to it by the operating system. The PID can be used to identify and manage the process. If we close or terminate a program or application, the process exits/terminates.

When a process exits/terminates in Linux, the kernel doesn’t immediately perform the complete clean-up of the process; instead, its process descriptor stays in memory. The kernel defers the clean-up of the child process until its parent has reaped it via one of the wait(2) family of system calls. To do this, the kernel sets the child process state to zombie (EXIT_ZOMBIE) and notifies the process’s parent, via the SIGCHLD signal, that its child has terminated (refer to the box that follows to know more about SIGCHLD). The parent process is then supposed to call wait(2) or a related system call to read the terminated child’s exit status and other information. After the wait(2) call, the zombie process is marked as dead (EXIT_DEAD), and the kernel performs the remaining clean-up and removes the process from memory.

But what if the parent process didn’t reap its child processes on termination? Well, if the parent process isn’t programmed properly and never calls wait(2) series system calls, the system may be filled with zombie processes. The kernel maintains a minimal set of information about the zombie process (PID, termination status, resource usage information) in order to allow the parent to later perform a wait(2) to obtain information about the child. As long as a zombie is not removed from the system via a wait(2) series system call, it will consume a slot in the kernel process table, and if this table fills up, it will not be possible to create further processes.

In a multithreaded application, if the main thread exits while other threads in the thread group are still running, the kernel does not clean up the main thread. Instead, it marks the main thread as a zombie (EXIT_ZOMBIE) and defers its clean-up until the last thread in the thread group has exited.
The kernel sends a SIGCHLD signal to the parent process when a child process changes state, that is, when the child stops, resumes or terminates.
The default disposition of SIGCHLD is to ignore the signal, but terminated children still become zombies in that case. Only when the disposition is explicitly set to SIG_IGN using the sigaction(2) system call do terminated children not become zombies.
Setting the SA_NOCLDWAIT flag for SIGCHLD through the sigaction(2) system call likewise prevents terminated children from becoming zombies.

Here is what the man page explains: “POSIX.1-2001 specifies that if the disposition of SIGCHLD is set to SIG_IGN or the SA_NOCLDWAIT flag is set for SIGCHLD (see sigaction(2)), then children that terminate do not become zombies and a call to wait() or waitpid() will block until all children have terminated, and then fail with errno set to ECHILD.”

Orphan processes

As the name suggests, an orphan process is a child process whose parent has exited or been terminated while the child itself is still running. In Linux, orphan processes are by default re-parented to the init(1) process, which then becomes responsible for reaping them. The init(1) process periodically invokes a wait(2) system call to clean up all its zombie child processes. However, there are a few cases in which orphan processes are not re-parented to init(1). One such case is a multithreaded application (see the box below) and another is a relatively new feature called subreapers, which we will discuss in detail in the sections that follow.

In a multithreaded application, if a parent terminates/exits before the child, the kernel first tries to re-parent the orphan child process to one of the threads still executing in the parent’s thread group. If it doesn’t find a suitable thread there, it tries to re-parent the orphan process to the nearest subreaper among the parent’s ancestors and, failing that, to the init process.
Basically, the kernel tries the following steps for re-parenting, in this order:

Re-parent the orphan child process to a suitable thread (a thread that is not marked as exiting) in the parent’s thread group
Re-parent it to the nearest ancestor process that has marked itself as a child subreaper via prctl(2)
Re-parent it to the init process (PID 1)
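The re-parenting steps above can be watched from inside the orphan itself. The following is a minimal sketch with a hypothetical helper, orphan_demo(): it forks a child and a grandchild, lets the child exit, and has the grandchild report through a pipe the parent PID it sees afterwards. Which process adopts the grandchild depends on the system: it is PID 1 unless some ancestor (for example a per-user service manager) is a subreaper.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork child -> grandchild, let the child exit, and report who adopts
 * the grandchild. On success returns 0 and stores the child's PID and
 * the grandchild's parent PID as seen after the child has exited. */
int orphan_demo(pid_t *child_pid, pid_t *new_parent)
{
    int fds[2];
    pid_t child;
    ssize_t n;

    if (pipe(fds) == -1)
        return -1;

    child = fork();
    if (child == -1)
        return -1;
    if (child == 0) {
        pid_t me = getpid();
        pid_t gc = fork();

        close(fds[0]);
        if (gc == 0) {
            pid_t adopted_by;

            /* Poll until the kernel has re-parented this process. */
            while (getppid() == me)
                usleep(1000);
            adopted_by = getppid();
            if (write(fds[1], &adopted_by, sizeof(adopted_by)) !=
                (ssize_t)sizeof(adopted_by))
                _exit(1);
            _exit(0);
        }
        _exit(gc == -1 ? 1 : 0);    /* child exits, orphaning the grandchild */
    }

    close(fds[1]);
    *child_pid = child;
    waitpid(child, NULL, 0);        /* reap the intermediate child */

    /* Blocks until the orphan reports; returns 0 bytes if it never existed. */
    n = read(fds[0], new_parent, sizeof(*new_parent));
    close(fds[0]);
    if (n != (ssize_t)sizeof(*new_parent))
        return -1;

    /* If the caller happens to be a subreaper (or PID 1), the orphan is
     * now its own child and must be reaped here as well. */
    if (*new_parent == getpid())
        waitpid(-1, NULL, 0);
    return 0;
}
```

After a successful call, the reported parent PID differs from the PID of the exited child, which is the observable effect of re-parenting.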

Subreapers

So now we understand that upon termination of a child process, its parent’s responsibility is to reap its child process (by invoking the wait(2) series system call). If the parent terminates/exits before the child process, the kernel re-parents the orphan child process to the init(1) process (in most cases). Now, the init(1) process will further reap the zombie child processes.

Suppose we have an application/daemon process that forks a multilevel hierarchy of processes, for example, (parent)-(child)-(grandchild). In this case, if the child terminates first, the kernel will re-parent the grandchild to the init(1) process. When the grandchild later exits, it is init(1) that reaps it. Hence, the original parent never gets to see the grandchild’s exit status or other termination information; it is discarded the moment init(1) cleans up the re-parented process.

To remove this limitation in the re-parenting approach, Linux kernel 3.4 extended the prctl(2) system call with a new option, ‘PR_SET_CHILD_SUBREAPER’.

With this addition to the prctl(2) system call, a process can mark itself as a subreaper with prctl(PR_SET_CHILD_SUBREAPER). If it does so, it’s not init(1) that becomes the parent of its orphaned descendant processes; instead, the nearest living ancestor that is marked as a subreaper becomes the new parent. If there is no living ancestor subreaper, init(1) becomes the parent.
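As a small sketch of the call itself, the two wrappers below set and query the attribute. The names set_subreaper() and is_subreaper() are invented for this illustration; PR_SET_CHILD_SUBREAPER and PR_GET_CHILD_SUBREAPER are the prctl(2) options available since Linux 3.4.

```c
#include <sys/prctl.h>

/* Mark the calling process as a child subreaper (enable != 0) or clear
 * the attribute (enable == 0). Returns 0 on success, -1 on error. */
int set_subreaper(int enable)
{
    return prctl(PR_SET_CHILD_SUBREAPER, enable ? 1UL : 0UL, 0UL, 0UL, 0UL);
}

/* Query the attribute of the calling process.
 * Returns 1 if set, 0 if not set, -1 on error. */
int is_subreaper(void)
{
    int flag = 0;

    /* The kernel writes the current setting to the int that arg2 points to. */
    if (prctl(PR_GET_CHILD_SUBREAPER, (unsigned long)&flag, 0UL, 0UL, 0UL) == -1)
        return -1;
    return flag;
}
```

A process would typically call set_subreaper(1) once, before forking the hierarchy it wants to supervise.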

Please check the prctl(2) man page for details about the system call and how to set/unset the PR_SET_CHILD_SUBREAPER flag (see the box below).

Here is what the man page of prctl(2) explains:

“A subreaper fulfils the role of init(1) for its descendant processes. When a process becomes orphaned (i.e., its immediate parent terminates), then that process will be re-parented to the nearest still living ancestor subreaper. Subsequently, calls to getppid(2) in the orphaned process will now return the PID of the subreaper process, and when the orphan terminates, it is the subreaper process that will receive a SIGCHLD signal and will be able to wait(2) on the process to discover its termination status. The setting of the ‘child subreaper’ attribute is not inherited by children created by fork(2) and clone(2). The setting is preserved across execve(2).

“Establishing a subreaper process is useful in session management frameworks where a hierarchical group of processes is managed by a subreaper process that needs to be informed when one of the processes—for example, a double-forked daemon—terminates (perhaps so that it can restart that process). Some init(1) frameworks (e.g., systemd(1)) employ a subreaper process for similar reasons.”
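To illustrate the point about discovering the termination status, here is a compact sketch in which a subreaper collects the exit status of a double-forked grandchild. The function name collect_grandchild_status() is hypothetical, and the sketch assumes the caller has no other children. Note that it leaves the subreaper attribute cleared when it returns.

```c
#include <sys/prctl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Become a subreaper, double-fork a grandchild that exits with `code`,
 * and return the status collected for that grandchild, or -1 on error. */
int collect_grandchild_status(int code)
{
    int status, result = -1;
    pid_t child, pid;

    if (prctl(PR_SET_CHILD_SUBREAPER, 1UL, 0UL, 0UL, 0UL) == -1)
        return -1;

    child = fork();
    if (child == -1)
        return -1;
    if (child == 0) {
        pid_t gc = fork();
        if (gc == 0) {
            usleep(100000);         /* outlive the intermediate child */
            _exit(code);
        }
        _exit(0);                   /* intermediate child exits at once */
    }

    /* Reap every descendant. Any PID other than `child` is the
     * grandchild that was re-parented to this process. */
    while ((pid = wait(&status)) > 0) {
        if (pid != child && WIFEXITED(status))
            result = WEXITSTATUS(status);
    }

    /* Clear the attribute again so the caller is left unchanged. */
    prctl(PR_SET_CHILD_SUBREAPER, 0UL, 0UL, 0UL, 0UL);
    return result;
}
```

Without the first prctl(2) call, the loop would only ever see the intermediate child, and the function would return -1 because the grandchild’s status goes to whichever process adopted it.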

Now let’s summarise the subreaper concept with a sample program, which will perform the sequence of activities given below:

fork() is used to create a Parent-Child-Grandchild hierarchy
Parent will wait(2) for its children
Child process will terminate first and grandchild will become an orphan
Parent will return from wait(2) once the child has exited
Kernel will re-parent the grandchild to init(1)
After that, grandchild will terminate
Parent process will invoke prctl(2) with the PR_SET_CHILD_SUBREAPER flag
Parent process will become a subreaper
Parent will again invoke fork() to create a child and grandchild in a similar manner: Parent-Child-Grandchild
Parent will wait(2) for its children
Child process will terminate first and grandchild will become an orphan
Parent will return from wait(2) once the child has exited
Since the parent process now acts as a subreaper, the kernel will re-parent the grandchild process to the parent process
After that, grandchild will terminate
Parent will return from wait(2) once the grandchild (its new child) has exited
After that, parent process will also exit

We create a file subreaper.c and type the following code in it:

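#include <stdio.h>
#include <stdlib.h>
#include <sys/prctl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Runs in the child: fork a grandchild that outlives the child and
 * reports who its parent is once the child has exited. */
pid_t fork_grandchild(void)
{
    pid_t pid, ppid;

    printf("FORKING GRANDCHILD\n");
    pid = fork();
    switch (pid) {
    case -1:
        printf("FORK FAILED\n");
        exit(EXIT_FAILURE);
    case 0:
        ppid = getppid();   /* original parent, i.e. the child process */
        sleep(2);           /* the child exits after 1 second */
        printf("HIERARCHY AFTER CHILD(%d) EXITED\n", (int)ppid);
        printf("PARENT(%d)===>GRANDCHILD(%d)\n", (int)getppid(), (int)getpid());
        printf("GRANDCHILD(%d): EXITED\n", (int)getpid());
        exit(0);
    default:
        return pid;
    }
}

/* Runs in the parent: fork a child that creates a grandchild and then
 * exits before the grandchild does. */
pid_t fork_child(void)
{
    pid_t pid;

    printf("FORKING CHILD \n");
    pid = fork();
    switch (pid) {
    case -1:
        printf("FORK FAILED\n");
        exit(EXIT_FAILURE);
    case 0:
        sleep(1);
        pid = fork_grandchild();
        printf("INITIAL HIERARCHY\n");
        printf(" PARENT(%d)===>CHILD(%d)===>GRANDCHILD(%d)\n",
               (int)getppid(), (int)getpid(), (int)pid);
        sleep(1);
        printf(" CHILD(%d): EXITED\n", (int)getpid());
        exit(0);
    default:
        return pid;
    }
}

/* Reap children until wait(2) reports that none are left. */
void wait_for_descendents(void)
{
    while (1) {
        pid_t pid = wait(NULL);
        if (pid == -1) {
            printf("PARENT(%d): NO MORE CHILD \n", (int)getpid());
            break;
        }
        printf("PARENT(%d)===>CHILD(%d) EXITED\n", (int)getpid(), (int)pid);
    }
}

int main(void)
{
    pid_t pid;

    /* Unbuffered stdout, so that pending output is not duplicated
     * in forked processes when the output is redirected. */
    setvbuf(stdout, NULL, _IONBF, 0);

    printf("PARENT(%d)\n", (int)getpid());

    /* First round: no subreaper, the orphan is not re-parented to us. */
    pid = fork_child();
    printf("PARENT(%d)====>CHILD(%d)\n", (int)getpid(), (int)pid);
    wait_for_descendents();
    sleep(1);

    /* Set the subreaper property. */
    if (prctl(PR_SET_CHILD_SUBREAPER, 1, 0, 0, 0) == -1) {
        perror("prctl");
        exit(EXIT_FAILURE);
    }
    printf("PARENT(%d): I AM SUBREAPER DONT RE PARENT TO INIT\n", (int)getpid());

    /* Second round: the orphaned grandchild is re-parented to us. */
    pid = fork_child();
    printf("PARENT(%d)====>CHILD(%d)\n", (int)getpid(), (int)pid);
    wait_for_descendents();
    printf("PARENT(%d): exiting \n", (int)getpid());
    return 0;
}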

Next, we will compile and execute the above code:

[shwetabh@localhost reaper]$ gcc subreaper.c
[shwetabh@localhost reaper]$ ./a.out
PARENT(31156)
FORKING CHILD
PARENT(31156)====>CHILD(31157)
FORKING GRANDCHILD
INITIAL HIERARCHY
PARENT(31156)===>CHILD(31157)===>GRANDCHILD(31179)
CHILD(31157): EXITED
PARENT(31156)===>CHILD(31157) EXITED
PARENT(31156): NO MORE CHILD
HIERARCHY AFTER CHILD(31157) EXITED
PARENT(1)===>GRANDCHILD(31179) /* init(1) becomes the parent */
GRANDCHILD(31179): EXITED
PARENT(31156): I AM SUBREAPER DONT RE PARENT TO INIT
FORKING CHILD
PARENT(31156)====>CHILD(31222)
FORKING GRANDCHILD
INITIAL HIERARCHY
PARENT(31156)===>CHILD(31222)===>GRANDCHILD(31243)
CHILD(31222): EXITED
PARENT(31156)===>CHILD(31222) EXITED
HIERARCHY AFTER CHILD(31222) EXITED
PARENT(31156)===>GRANDCHILD(31243) /* subreaper becomes the parent */
GRANDCHILD(31243): EXITED
PARENT(31156)===>CHILD(31243) EXITED
PARENT(31156): NO MORE CHILD
PARENT(31156): exiting

I do hope these insights will be helpful in handling zombie and re-parenting related challenges while developing or designing system software.
