GSoC Reports: Make system(3) and popen(3) use posix_spawn(3) internally, Part 1


July 13, 2020 posted by Kamil Rytarowski

This report was prepared by Nikita Gillmann as a part of Google Summer of Code 2020

This is my first report for the Google Summer of Code project. I am working on for NetBSD.

Prior work: In GSoC 2012 Charles Zhang added the posix_spawn syscall which according to its SF repository at the time (maybe even now, I have not looked very much into comparing all other systems and libcs + kernels) is an in-kernel implementation of posix_spawn which provides performance benefits compared to FreeBSD and other systems which had a userspace implementation (in 2012).

After 1 week of reading POSIX and writing code, 2 weeks of coding and another 1.5 weeks of bugfixes I have successfully implemented posix_spawn in usage in system(3) and popen(3) internally.

The biggest challenge for me was to understand POSIX, to read the standard. I am used to reading more formal books, but I can't remember working with the posix standard directly before.

The next part of my Google Summer of Code project will focus on similar rewrites of NetBSD's sh(1).

system(3)

The prototype

int system(const char *command);

remains the same. Below I'm just commenting on the differences, not the whole function, and therefore only include code block where the versions differ the most. The full work can be found at gsoc2020 as well as src and will be submitted for inclusion later in the project.

Previously we'd use vfork, sigaction, and execve in this stdlib function.

The biggest difference to the 2015 version of our system version is the usage of posix_spawnattr_ where we'd use sigaction before, and posix_spawn where execve executes the command in a vfork'd child:

   posix_spawnattr_init(&attr);
   posix_spawnattr_setsigmask(&attr, &omask);
   posix_spawnattr_setflags(&attr, POSIX_SPAWN_SETSIGDEF|POSIX_SPAWN_SETSIGMASK);
   (void)__readlockenv();
   status = posix_spawn(&pid, _PATH_BSHELL, NULL, &attr, __UNCONST(argp), environ);
   (void)__unlockenv();
   posix_spawnattr_destroy(&attr);

The full version can be found here.

The prototype of posix_spawn is:

int posix_spawn(pid_t *restrict pid, const char *restrict path, const posix_spawn_file_actions_t *file_actions, const posix_spawnattr_t *restrict attrp, char *const argv[restrict], char *const envp[restrict]);

We first initialize a spawn attributes object with the default value for all of its attributes set. A spawn attributes object is used to modify the behavior of posix_spawn().

The previous fork-exec switch included calls to sigaction to set the behavior associated with SIGINT and SIGQUIT as defined by POSIX:

The system() function shall ignore the SIGINT and SIGQUIT signals, and shall block the SIGCHLD signal, while waiting for the command to terminate. If this might cause the application to miss a signal that would have killed it, then the application should examine the return value from system() and take whatever action is appropriate to the application if the command terminated due to receipt of a signal. source: https://pubs.opengroup.org/onlinepubs/9699919799/functions/system.html

This has been achieved with a combination of posix_spawnattr_setsigmask() and posix_spawnattr_setflags() in the initialized attributes object referenced by attr.

As before we call __readlockenv() and then call posix_spawn() which returns the process ID of the child process in the variable pointed to by 'pid', and returns zero as the function return value.

The old code:

   (void)__readlockenv();
   switch(pid = vfork()) {
   case -1:                        /* error */
           (void)__unlockenv();
           sigaction(SIGINT, &intsa, NULL);
           sigaction(SIGQUIT, &quitsa, NULL);
           (void)sigprocmask(SIG_SETMASK, &omask, NULL);
           return -1;
   case 0:                         /* child */
           sigaction(SIGINT, &intsa, NULL);
           sigaction(SIGQUIT, &quitsa, NULL);
           (void)sigprocmask(SIG_SETMASK, &omask, NULL);
           execve(_PATH_BSHELL, __UNCONST(argp), environ);
           _exit(127);
   }
   (void)__unlockenv();

popen(3), popenve(3)

As with system, the prototype of both functions remains the same:

FILE * popenve(const char *cmd, char *const *argv, char *const *envp, const char *type); FILE * popen(const char *cmd, const char *type);

Since the popen and popenve implementation in NetBSD's libc use a couple of shared helper functions, I was able to change both functions while keeping the majority of the changes focused on (some of ) the helper functions (pdes_child).

pdes_child, an internal function in popen.c, now takes one more argument (const char *cmd) for the command to pass to posix_spawn which is called in pdes_child.

pdes_child previously looked like this:

static void
pdes_child(int *pdes, const char *type)
{
        struct pid *old;

        /* POSIX.2 B.3.2.2 "popen() shall ensure that any streams
           from previous popen() calls that remain open in the 
           parent process are closed in the new child process. */
        for (old = pidlist; old; old = old->next)
#ifdef _REENTRANT
                (void)close(old->fd); /* don't allow a flush */
#else
                (void)close(fileno(old->fp)); /* don't allow a flush */
#endif

        if (type[0] == 'r') {
                (void)close(pdes[0]);
                if (pdes[1] != STDOUT_FILENO) {
                        (void)dup2(pdes[1], STDOUT_FILENO);
                        (void)close(pdes[1]);
                }
                if (type[1] == '+')
                        (void)dup2(STDOUT_FILENO, STDIN_FILENO);
        } else {
                (void)close(pdes[1]);
                if (pdes[0] != STDIN_FILENO) {
                        (void)dup2(pdes[0], STDIN_FILENO);
                        (void)close(pdes[0]);
                }
        }
}

This is the new version (the whole file is here):

static int
pdes_child(int *pdes, const char *type, const char *cmd)
{
        struct pid *old;
        posix_spawn_file_actions_t file_action_obj;
        pid_t pid;
        const char *argp[] = {"sh", "-c", NULL, NULL};
        argp[2] = cmd;
        int error;

        error = posix_spawn_file_actions_init(&file_action_obj);
        if (error) {
                goto fail;
        }
        /* POSIX.2 B.3.2.2 "popen() shall ensure that any streams
           from previous popen() calls that remain open in the
           parent process are closed in the new child process. */
        for (old = pidlist; old; old = old->next)
#ifdef _REENTRANT
        error = posix_spawn_file_actions_addclose(&file_action_obj, old->fd); /* don't allow a flush */
        if (error) {
                goto fail;
        }
#else
        error = posix_spawn_file_actions_addclose(&file_action_obj, fileno(old->fp)); /* don't allow a flush */
        if (error) {
                goto fail;
        }
#endif
        if (type[0] == 'r') {
                error = posix_spawn_file_actions_addclose(&file_action_obj, pdes[0]);
                if (error) {
                        goto fail;
                }
                if (pdes[1] != STDOUT_FILENO) {
                        error = posix_spawn_file_actions_adddup2(&file_action_obj, pdes[1], STDOUT_FILENO);
                        if (error) {
                                goto fail;
                        }
                        error = posix_spawn_file_actions_addclose(&file_action_obj, pdes[1]);
                        if (error) {
                                goto fail;
                        }
                }
                if (type[1] == '+') {
                        error = posix_spawn_file_actions_adddup2(&file_action_obj, STDOUT_FILENO, STDIN_FILENO);
                        if (error) {
                                goto fail;
                        }
                }
        } else {
                error = posix_spawn_file_actions_addclose(&file_action_obj, pdes[1]);
                if (error) {
                        goto fail;
                }
                if (pdes[0] != STDIN_FILENO) {
                        error = posix_spawn_file_actions_adddup2(&file_action_obj, pdes[0], STDIN_FILENO);
                        if (error) {
                                goto fail;
                        }
                        error = posix_spawn_file_actions_addclose(&file_action_obj, pdes[0]);
                        if (error) {
                                goto fail;
                        }
                }
        }
        (void)__readlockenv();
        error = posix_spawn(&pid, _PATH_BSHELL, &file_action_obj, 0, __UNCONST(argp), environ);
        if (error) {
                (void)__unlockenv();
                goto fail;
        }
        (void)__unlockenv();
        error = posix_spawn_file_actions_destroy(&file_action_obj);
        /*
         * TODO: if _destroy() fails we have to go on, otherwise we
         * leak the pid.
         */
        if (error) {
                errno = error;
                return -1;
        }
        return pid;

fail:
        errno = error;
        posix_spawn_file_actions_destroy(&file_action_obj);
        return -1;
}

On a high level what happens in pdes_child() and popen is that we first lock the pidlist_mutex. Then we create a file file action list for all concurrent popen() / popenve() instances and the side of the pipe not necessary, and the move to stdin/stdout. We unlock the pidlist_mutex. Finally we return the list and destroy.

In the new version of this helper function which now handles the majority of what popen/popenve did, we have to initialize a file_actions object which by default contains no file actions for posix_spawn() to perform. Since we have to have error handling and a common return value for the functions calling pdes_child() and deconstruction, we make use of goto in some parts of this function.

The close() and dup2() actions now get replaced by corresponding file_actions syscalls, they are used to specify a series of actions to be performed by a posix_spawn operation.

After this series of actions, we call _readlockenv(), and call posix_spawn with the file_action object and the other arguments to be executed. If it succeeds, we return the pid of the child to popen, otherwise we return -1, in both cases we destroy the file_action object before we proceed.

In popen and popenve our code has been reduced to just the 'pid == -1' branch, everything else happens in pdes_child() now.

After readlockenv we call pdes_child and pass it the command to execute in the posix_spawn'd child process; if pdes_child returns -1 we run the old error handling code. Likewise for popenve.

popen, old:

FILE *
popen(const char *cmd, const char *type)
{
        struct pid *cur;
        int pdes[2], serrno;
        pid_t pid;

        _DIAGASSERT(cmd != NULL);
        _DIAGASSERT(type != NULL);

        if ((cur = pdes_get(pdes, &type)) == NULL)
                return NULL;

        MUTEX_LOCK();
        (void)__readlockenv();
        switch (pid = vfork()) {
        case -1:                        /* Error. */
                serrno = errno;
                (void)__unlockenv();
                MUTEX_UNLOCK();
                pdes_error(pdes, cur);
                errno = serrno;
                return NULL;
                /* NOTREACHED */
        case 0:                         /* Child. */
                pdes_child(pdes, type);
                execl(_PATH_BSHELL, "sh", "-c", cmd, NULL);
                _exit(127);
                /* NOTREACHED */
        }
        (void)__unlockenv();

        pdes_parent(pdes, cur, pid, type);

        MUTEX_UNLOCK();

        return cur->fp;
}

popen, new:

FILE *
popen(const char *cmd, const char *type)
{
        struct pid *cur;
        int pdes[2], serrno;
        pid_t pid;

        _DIAGASSERT(cmd != NULL);
        _DIAGASSERT(type != NULL);

        if ((cur = pdes_get(pdes, &type)) == NULL)
                return NULL;

        MUTEX_LOCK();
        (void)__readlockenv();
        pid = pdes_child(pdes, type, cmd);
        if (pid == -1) {
                /* Error. */
                serrno = errno;
                (void)__unlockenv();
                MUTEX_UNLOCK();
                pdes_error(pdes, cur);
                errno = serrno;
                return NULL;
                /* NOTREACHED */
        }
        (void)__unlockenv();

        pdes_parent(pdes, cur, pid, type);

        MUTEX_UNLOCK();

        return cur->fp;
}

popenve, old:

FILE *
popenve(const char *cmd, char *const *argv, char *const *envp, const char *type)
{
        struct pid *cur;
        int pdes[2], serrno;
        pid_t pid;

        _DIAGASSERT(cmd != NULL);
        _DIAGASSERT(type != NULL);

        if ((cur = pdes_get(pdes, &type)) == NULL)
                return NULL;

        MUTEX_LOCK();
        switch (pid = vfork()) {
        case -1:                        /* Error. */
                serrno = errno;
                MUTEX_UNLOCK();
                pdes_error(pdes, cur);
                errno = serrno;
                return NULL;
                /* NOTREACHED */
        case 0:                         /* Child. */
                pdes_child(pdes, type);
                execve(cmd, argv, envp);
                _exit(127);
                /* NOTREACHED */
        }

        pdes_parent(pdes, cur, pid, type);

        MUTEX_UNLOCK();

        return cur->fp;
}

popenve, new:

FILE *
popenve(const char *cmd, char *const *argv, char *const *envp, const char *type)
{
        struct pid *cur;
        int pdes[2], serrno;
        pid_t pid;

        _DIAGASSERT(cmd != NULL);
        _DIAGASSERT(type != NULL);

        if ((cur = pdes_get(pdes, &type)) == NULL)
                return NULL;

        MUTEX_LOCK();
        pid = pdes_child(pdes, type, cmd);
        if (pid == -1) {
                /* Error. */
                serrno = errno;
                MUTEX_UNLOCK();
                pdes_error(pdes, cur);
                errno = serrno;
                return NULL;
                /* NOTREACHED */
        }

        pdes_parent(pdes, cur, pid, type);

        MUTEX_UNLOCK();

        return cur->fp;
}

Originally published on https://n0.is/gsoc-2020-report-1.html.

[0 comments]

 



Post a Comment:
  • HTML Syntax: NOT allowed