Date: Mon, 31 Mar 2003 11:32:29 -0500
From: Joseph Formoso
Subject: Possible patch for stunnel 4.04 running on IRIX

Folks,

First, much praise for providing stunnel. It's way cool.

The "patch" I've got is extremely experimental (though it also has the benefit of being very small), and likely applies only to people running stunnel on an IRIX system.

We compiled stunnel 4.04 against a patched OpenSSL 0.9.7a, and it exhibited odd behavior -- it keeled over dead after the first connection (we're using it to wrap IMAP and POP3 connections). After poking around a little, we traced the process and found that, on the exit of the first child process, there were many hundreds of "SIGCHLD received" messages in the trace log, and then a SIGKILL.

The cause for this, it seems, is that IRIX (for reasons unknown but likely dubious) chose, as its means of compatibility with other systems, to implement SYSV SIGCLD signal semantics and map SIGCHLD to it. In short, at the time that a signal handler for SIGCHLD (or SIGCLD; under IRIX, they're synonyms) is established, the kernel checks to see if a SIGCHLD is pending for the process.

The sigchld_handler in sselect.c of stunnel is thus:

--------------------------------------------------------------
static void sigchld_handler(int sig) { /* SIGCHLD detected */
    int save_errno=errno;

    write(signal_pipe[1], signal_buffer, 1);
    signal(SIGCHLD, sigchld_handler);
    errno=save_errno;
}
--------------------------------------------------------------

Since there's no wait() before the signal() call reestablishes the handler, reestablishing the handler causes the kernel to check immediately to see if a SIGCHLD is pending (which it is), and fire *another* SIGCHLD signal. Which is similarly caught and causes another one, until the stack blows and the kernel kills the process.

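
To make the "which it is" part concrete, here is a minimal standalone sketch (not stunnel code) showing that a handler which only writes to a pipe leaves the exited child unreaped. The names demo_pipe, notify_only_handler and child_unreaped_after_handler are made up for illustration, and the handler is installed with sigaction() so the sketch behaves the same on systems without the SYSV re-arm behavior; it does not reproduce the IRIX loop itself.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-in for stunnel's signal_pipe. */
static int demo_pipe[2];

/* Like the original handler: notify through the pipe, never wait(). */
static void notify_only_handler(int sig)
{
    int save_errno = errno;
    char byte = 0;

    (void)sig;
    if (write(demo_pipe[1], &byte, 1) == -1) {
        /* nothing useful can be done about it inside a handler */
    }
    errno = save_errno;
}

/*
 * Returns 1 if the exited child was still unreaped after the handler
 * ran, 0 if it had already been collected, -1 on setup failure.
 */
int child_unreaped_after_handler(void)
{
    struct sigaction sa;
    pid_t child, reaped;
    ssize_t n;
    char byte;
    int status;

    if (pipe(demo_pipe) == -1)
        return -1;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = notify_only_handler;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGCHLD, &sa, NULL) == -1)
        return -1;

    child = fork();
    if (child == -1)
        return -1;
    if (child == 0)
        _exit(0);               /* child exits at once */

    /* Wait for the handler's byte; the signal may interrupt read(). */
    do {
        n = read(demo_pipe[0], &byte, 1);
    } while (n == -1 && errno == EINTR);
    if (n != 1)
        return -1;

    /* The handler has run, but nobody has waited for the child yet. */
    reaped = waitpid(child, &status, WNOHANG);

    signal(SIGCHLD, SIG_DFL);
    close(demo_pipe[0]);
    close(demo_pipe[1]);
    return reaped == child ? 1 : 0;
}
```

Calling child_unreaped_after_handler() from a small main() should return 1: the child's exit status is still there to be collected after the handler has finished, which is the condition under which re-establishing the handler on a SYSV-semantics system would raise the signal again.
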
The simplest way to fix this is to make sure there's some style of wait() before the handler is reestablished, like this:

--------------------------------------------------------------
static void sigchld_handler(int sig) { /* SIGCHLD detected */
    int save_errno=errno;
    pid_t pid;
    int status;

    write(signal_pipe[1], signal_buffer, 1);
    pid = waitpid(-1, &status, WNOHANG);
    while (pid > 0) {
        pid = waitpid(-1, &status, WNOHANG);
    }
    signal(SIGCHLD, sigchld_handler);
    errno=save_errno;
}
--------------------------------------------------------------

This, it seems, allows stunnel to run normally on our system (at least, so far as we've tested it). I'm not sure how you get away without having *any* wait()-type call, so I'm obviously not entirely clear on how everything is working.

Still, the (trivial) patch to make the change is offered (into the public domain) in case it happens to be useful to you or anyone else: