Support for chdir(2) in posix_spawn(3)


June 10, 2021 posted by Martin Husemann

This post was written by Piyush Sachdeva:

What really happens when you double click an icon on your desktop?

Support for chdir(2) in posix_spawn(3)

Processes are the bread and butter of your operating system. The moment you double click an icon, that particular program gets loaded in your Random Access Memory (RAM) and your operating system starts to run it. At this moment the program becomes a process. Though you can only see the execution of your process, the operating system (the Kernel) is always running a lot of processes in the background to facilitate you.

From the moment you hit that power button, everything that happens on the screen is the result of some or the other process. In this post we are going to talk about one such interface which helps in creation of your programs.

The fork() & exec() shenanigans

The moment a computer system comes alive, it launches a bunch of processes. For the purpose of this blog let’s call them, ‘the master processes’. These processes run in perpetuity, provided the computer is switched on. One such process is init/systemd/launchd (depending on your OS). This ‘init’ master process owns all the other processes in the computer, either directly or indirectly.

Operating systems are elegant, majestic software that work seamlessly under the hood. They do so much without even breaking a sweat (unless it’s Windows). Let's consider a scenario where you have decided to take a trip down memory lane and burst open those old photos. The ‘init master process’ just can’t terminate itself and let you look at your photos. What if you unknowingly open a malicious file, which corrupts all your data? So init doesn’t just exit, rather it employs fork() and exec() to start a new process. The fork() function is used to create child processes which are an exact copy of their parents. Whichever process calls fork, gets duplicated. The newly created process becomes the child of the original running process and the original running process is called the parent. Just how parents look after their kids, the parent process makes sure that the child process doesn't do any mischief. So now you have two exactly similar processes running in your computer.

Flowchart of fork() + exec()

One might think that the newly created child process doesn’t really help us. But actually, it does. Now exec() comes into the picture. What exec() does is, it replaces any process which calls it. So what if we replace the child process, the one we just thought to be useless, with our photos? That's exactly what we are going to do indeed. This will result in replacement of the fork() created child process with your photos. Therefore, the master init process is still running and you can also enjoy your photos with no threat to your data.

“Neither abstraction nor simplicity is a substitute for getting it right. Sometimes, you just have to do the right thing, and when you do, it is way better than the alternatives. There are lots of ways to design APIs for process creation; however, the combination of fork() and exec() is simple and immensely powerful. Here, the UNIX designers simply got it right.” Lampson’s Law - Getting it Right

Now you could ask me, `But what about the title, some ‘posix_spawn()’ thing?´ Don’t worry, that’s next.

posix_spawn()

posix_spawn() is an alternative to the fork() + exec() routine. It implements fork() and exec(), but not directly (as that would make it slow, and we all need everything to be lightning fast). What actually happens is that posix_spawn() only implements the functionality of the fork() + exec() routines, but in one single call. However, because fork() + exec() is a combination of two different calls, there is a lot of room for customization. Whatever software you are running on your computer, calls these routines on its own and does the necessary. Meanwhile a lot is cooking in the background. Between the call to fork() and exec() there is plenty of leeway for tweaking different aspects of the exec-ing process. But posix_spawn doesn’t bear this flexibility and therefore has a lot of limitations. It does take a lot of parameters to give the caller some flexibility, but it is not enough.

Now the question before us is, “If fork() + exec() is so much more powerful, then why have, or use the posix_spawn() routine?” The answer to that is, that fork() and exec() are UNIX system routines. They are not present in operating systems that are not a derivative of UNIX. Eg- Windows implements a family of spawn functions.
There is another issue with fork() (not exec() ), which in reality is one of the biggest reasons behind the growth of posix_spawn(). The outline of the issue is that, creating child processes in multi-threaded programs is a whole another ball game altogether.

Concurrency is one of those disciplines in operating systems where the order in which the cards are going to unravel is not always how you expect them to. Multi-threading in a program is a way to do different and independent tasks of a program simultaneously, to save time. No matter how jazzy or intelligent the above statement looks, multi-threaded programs require an eagle’s eye as they can often have a lot of holes. Though the “tasks” are different and independent, they often share a few common attributes. When these different tasks due to concurrency start running in parallel, a data race begins to access those shared attributes. To not wreak havoc, there are mechanisms through which, when modifying/accessing these common attributes (Critical Section) we can provide a sort of mutual exclusion (locks/conditional variables) - only letting one of the processes modify the shared attribute at a time. Here when things are already so intricate due to multithreading, and to top it off, we start creating child processes. Complications are bound to arise. When one of the threads from the multi-threaded program calls fork() to create a child process, fork() does clone everything (variables, their states, functions, etc) but it fails to clone other threads (though this is not required at all times).

The child process now knows only about that one thread which called fork(). But all the other attributes of the child that were inherited from the parent (locks, mutexes) are set from the parent’s address space (considering multiple threads). So there is no way for the child process to know which attributes conform to which parts of the parent. Also, those mechanisms that we used to provide mutual exclusion, like locks and conditional variables, need to be reset. This reset step is essential in letting the parent access it’s attributes. Failing this reset can cause deadlocks. To put it simply, you can see how difficult things have become all of a sudden. The posix_spawn() call is free from these limitations of fork() encountered in multi-threaded programs. However, as mentioned by me earlier, there needs to be enough rope to meet all the requirements before posix_spawn() does the implicit exec().

About my Project

Hi, I am Piyush Sachdeva and I am going to start a project which will focus on relaxing one limitation of posix_spawn - changing the current directory of the child process, before the said call to exec() is made. This is not going to restrict it to the parent’s current working directory. Just passing the new directory as one of the parameters will do the trick. Resolving all the impediments would definitely be marvelous. Alas! That is not possible. Every attempt to resolve even a single hindrance can create plenty of new challenges.

As already mentioned by me, posix_spawn() is a POSIX standard. Hence the effect of my project will probably be reflected in the next POSIX release. I came across this project through Google Summer of Code 2021. It was being offered by The NetBSD Foundation Inc. However, as the slots for Google Summer of Code were numbered, my project didn’t make the selection. Nevertheless, the Core Team at The NetBSD Foundation offered me to work on the project and even extended a handsome stipend. I will forever be grateful to The NetBSD Foundation for this opportunity.

Notes

  • init, systemd & launchd are system daemon processes. init is the historical process present since System III UNIX. systemd was the replacement for the authentic init which was written for the Linux kernel. launchd is the MacOS alternative of init/systemd.
  • This is taken from Operating Systems: The Three Easy Pieces book by Andrea C. Arpaci-Dusseau and Remzi H. Arpaci-Dusseau.
  • UNIX is the original AT&T UNIX operating system developed at the Bell Labs research center, headed by Ken Thompson and Dennis Ritchie.

References

  1. Operating Systems: Three Easy Pieces by Andrea C. Arpaci-Dusseau and Remzi H. Arpaci-Dusseau.
  2. Advanced Programming in the UNIX Environment by W. Richard Stevens and Stephen A. Rago.
  3. UNIX and Linux System Administration Handbook by Evi Nemeth, Garth Synder, Trent R. Hein, Ben Whaley and Dan Mackin.
[1 comment]

 



Comments:

I hope you're aware of this: https://www.austingroupbugs.net/view.php?id=1208 which is the (accepted, I believe) proposal for adding chdir and fchdir facilities to posix_spawn in the next posix release.

Posted by RhodiumToad on June 12, 2021 at 11:26 AM UTC #

Post a Comment:
Comments are closed for this entry.