[BusyBox] Init does not always reap child processes

Jason Schoon floydpink at gmail.com
Fri Jul 29 10:02:35 MDT 2005


I agree, it's a weird scenario.

We actually want init to wait, because the one action following the
wait entry is a call to reboot.  Essentially, if anything goes wrong or
causes the main app to exit, we want the system to reboot.  This is an
embedded system, and in production mode we don't want users
unceremoniously ending up at a shell prompt.
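A minimal sketch of that policy in C, assuming a Linux target. `wait_for_app()` and `supervise_then_reboot()` are hypothetical names, not BusyBox functions; the reboot itself uses the Linux `reboot(RB_AUTOBOOT)` call from `<sys/reboot.h>`. The point is that the reboot runs no matter how the application ended.

```c
#define _DEFAULT_SOURCE
#include <errno.h>
#include <sys/reboot.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: start the main application and block until it
 * exits, the way init blocks on a "wait" entry.
 * Returns the raw wait status, or -1 if fork/waitpid fails. */
static int wait_for_app(const char *path, char *const argv[])
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        execv(path, argv);
        _exit(127);                 /* only reached if exec failed */
    }

    int status;
    while (waitpid(pid, &status, 0) < 0) {
        if (errno != EINTR)         /* retry only if a signal interrupted us */
            return -1;
    }
    return status;
}

/* Hypothetical production policy: however the app stopped (clean exit,
 * crash, or failed exec), flush filesystems and reboot instead of
 * falling through to a shell. Needs CAP_SYS_BOOT; never call it in a test. */
void supervise_then_reboot(const char *path, char *const argv[])
{
    (void)wait_for_app(path, argv);
    sync();
    reboot(RB_AUTOBOOT);
}
```

The `sync()` matters because the reboot system call does not flush dirty buffers on its own.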



On 7/29/05, Chris Kottaridis <chriskot at quietwind.net> wrote:
> I'm a bit confused. If you tell init to wait, then it seems like it ought
> to wait. If your application is going to run forever, couldn't it fork
> itself and have the parent exit? That would free up init to
> continue.
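A sketch of the fork-and-exit idea, assuming the application itself can be modified. The hypothetical `detach_from_launcher()` would be called early in the app's `main()`, and the hypothetical `launcher_returns_immediately()` stands in for init blocking on a wait entry. Neither name comes from BusyBox.

```c
#define _DEFAULT_SOURCE
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: the original process exits at once, which is what
 * a "wait" entry in inittab is blocked on; the forked copy keeps running
 * in a new session. Returns 0 in the surviving child, -1 on error; the
 * original process never returns. */
static int detach_from_launcher(void)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid > 0)
        _exit(0);                   /* original process: let init move on */
    if (setsid() < 0)               /* child: leave the controlling tty */
        return -1;
    return 0;
}

/* Hypothetical stand-in for init running a "wait" action: start the app,
 * block on that one pid, and report its exit code. Because the app
 * detaches, this returns right away even though the work continues. */
static int launcher_returns_immediately(void)
{
    pid_t app = fork();
    if (app < 0)
        return -1;
    if (app == 0) {
        if (detach_from_launcher() < 0)
            _exit(1);
        sleep(2);                   /* placeholder for "runs forever" */
        _exit(0);
    }
    int status;
    if (waitpid(app, &status, 0) != app || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

The detached copy is reparented to init, so init still ends up as its parent; only the blocking wait is avoided.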
> 
> Standard systems use wait to run startup scripts that launch
> applications in the background and then exit. Seems like you
> could add a small script to fire up your application from a startup
> script and have init wait on that. Of course, then you might need to
> compile in the ash shell, which you may not otherwise want.
> 
> Also, it seems like you could use "respawn". Respawn appears to let init
> go on to the next step while the application continues running. If
> your application runs forever, it will only be started once.
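To make the respawn suggestion concrete, here is a simplified model (an assumption, not BusyBox's actual code) of an init main loop with a single respawn entry. The entry is started without blocking, and the loop reaps whichever child exits, restarting only the respawned one; that reap-everything loop is also what keeps unrelated children from lingering as zombies. `respawn_and_reap()` and its `max_starts` bound are hypothetical and exist only so the sketch terminates.

```c
#define _DEFAULT_SOURCE
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns how many times the entry was started, or -1 if fork fails. */
static int respawn_and_reap(const char *path, char *const argv[], int max_starts)
{
    int starts = 0;
    pid_t respawn_pid = 0;          /* 0 means "not currently running" */

    for (;;) {
        /* Respawn: (re)start the entry whenever it is not running. */
        if (respawn_pid == 0 && starts < max_starts) {
            pid_t pid = fork();
            if (pid < 0)
                return -1;
            if (pid == 0) {
                execv(path, argv);
                _exit(127);         /* only reached if exec failed */
            }
            respawn_pid = pid;
            starts++;
        }

        /* Reap *any* child, not just the respawned one. */
        pid_t w = wait(NULL);
        if (w < 0) {
            if (errno == EINTR)
                continue;
            break;                  /* ECHILD: no children left */
        }
        if (w == respawn_pid)
            respawn_pid = 0;        /* it exited: restart on the next pass */
        /* Any other pid was an unrelated child; it is now reaped. */
    }
    return starts;
}
```

In a real init the loop never ends; here `wait()` failing with `ECHILD` stops it once the bounded restarts are used up.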
> 
> It seems wrong to change the way wait works, because there may be people
> that use wait that actually want init to wait and not reap children.
> 
> Just my opinion.
> 
>     Chris Kottaridis    (chris.kottaridis at windriver.com)
> =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
> 
> 
> On Fri, 2005-07-29 at 10:02 -0500, Jason Schoon wrote:
> > We had some issues on our embedded system where init did not always
> > reap child processes, resulting in zombies.
> >
> > This happened because in production mode we launch our main
> > application from a wait line in inittab, and it runs forever.  As a
> > result, init never reaches the loop at the bottom that reaps child
> > processes, because it is stuck processing the wait action.
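The failure mode can be reproduced in a few lines, assuming a POSIX system: the parent blocks in `waitpid()` on one specific child, as init does inside a wait action, while another child exits and stays a zombie because nothing reaps it. `zombie_survives_blocking_wait()` is a hypothetical demo; the one-second `sleep()` stands in for the long-running application.

```c
#define _DEFAULT_SOURCE
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns 1 if a child that exited while the parent was blocked on a
 * different child is still an unreaped zombie afterwards, 0 if not,
 * and -1 on error. */
static int zombie_survives_blocking_wait(void)
{
    pid_t quick = fork();
    if (quick < 0)
        return -1;
    if (quick == 0)
        _exit(0);                       /* e.g. a short-lived helper */

    pid_t slow = fork();
    if (slow < 0)
        return -1;
    if (slow == 0) {
        sleep(1);                       /* stands in for the main application */
        _exit(0);
    }

    /* Like init inside a wait action: block on this one pid only. */
    int status;
    if (waitpid(slow, &status, 0) != slow)
        return -1;

    /* Peek without reaping: WNOWAIT leaves 'quick' in its waitable state.
     * si_pid is zeroed first so "nothing to report" reads as 0. */
    siginfo_t info;
    memset(&info, 0, sizeof info);
    if (waitid(P_PID, (id_t)quick, &info, WEXITED | WNOHANG | WNOWAIT) < 0)
        return -1;
    int was_zombie = (info.si_pid == quick);

    waitpid(quick, NULL, 0);            /* finally reap it */
    return was_zombie;
}
```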
> >
> > This is almost exactly the scenario described in this post from a year ago:
> > http://www.busybox.net/lists/busybox/2004-August/012420.html
> >
> > My fix was to enable SIGCHLD handling while in a wait action, and then
> > disable it again upon leaving the wait action.  The child handler
> > simply calls waitpid() with WNOHANG, so it never blocks.
> >
> > diff -ruN --strip-trailing-cr init-trunk/init.c init/init.c
> > --- init-trunk/init.c 2005-07-28 16:07:06.000000000 -0500
> > +++ init/init.c       2005-07-28 16:13:52.000000000 -0500
> > @@ -185,6 +185,7 @@
> >
> >  /* Function prototypes */
> >  static void delete_init_action(struct init_action *a);
> > +static void child_handler(int sig);
> >  static int waitfor(const struct init_action *a);
> >  static void halt_signal(int sig);
> >
> > @@ -625,6 +626,12 @@
> >       return pid;
> >  }
> >
> > +static void child_handler(int sig)
> > +{
> > +    /* Reap any already available children */
> > +    (void)waitpid (-1, NULL, WNOHANG);
> > +}
> > +
> >  static int waitfor(const struct init_action *a)
> >  {
> >       int pid;
> > @@ -654,7 +661,11 @@
> >               tmp = a->next;
> >               if (a->action == action) {
> >                       if (a->action & (SYSINIT | WAIT | CTRLALTDEL | SHUTDOWN | RESTART)) {
> > +                if (a->action & WAIT)
> > +                    signal(SIGCHLD, child_handler);
> >                               waitfor(a);
> > +                if (a->action & WAIT)
> > +                    signal(SIGCHLD, SIG_DFL);
> >                               delete_init_action(a);
> >                       } else if (a->action & ONCE) {
> >                               run(a);
> > _______________________________________________
> > busybox mailing list
> > busybox at mail.busybox.net
> > http://busybox.net/mailman/listinfo/busybox
>
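The handler in the patch calls waitpid() once per signal. Because standard signals do not queue, several children exiting close together can produce a single SIGCHLD, so a handler that loops until waitpid() reports nothing left is more robust. A sketch under that assumption, using sigaction() and restoring the previous disposition instead of forcing SIG_DFL; `reap_children()`, `install_reaper()`, and `restore_sigchld()` are hypothetical names, not part of init.c.

```c
#define _DEFAULT_SOURCE
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Counts children reaped by the handler (only the handler writes it). */
static volatile sig_atomic_t reaped_count = 0;

/* SIGCHLD handler: one signal may stand for several exited children, so
 * loop until waitpid() finds nothing more, and preserve errno for the
 * code the signal interrupted. */
static void reap_children(int sig)
{
    int saved_errno = errno;
    (void)sig;
    while (waitpid(-1, NULL, WNOHANG) > 0)
        reaped_count++;
    errno = saved_errno;
}

/* Install the reaper (e.g. on entering a wait action), saving the
 * previous disposition in *old so it can be put back afterwards. */
static int install_reaper(struct sigaction *old)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = reap_children;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART | SA_NOCLDSTOP;  /* skip stop/continue events */
    return sigaction(SIGCHLD, &sa, old);
}

/* Restore whatever SIGCHLD disposition was in effect before install_reaper(). */
static int restore_sigchld(const struct sigaction *old)
{
    return sigaction(SIGCHLD, old, NULL);
}
```

One caveat worth checking against init.c: while such a handler is installed, it can reap the very child the wait action is blocked on, so the blocking wait in waitfor() may then fail with ECHILD and should treat that as "already exited".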

