
Troubleshooting • Re: watchdog process restart

Watchdog isn't really the right tool for the job here, I think, at least not on its own. The watchdog package and daemon are really intended to deal with hangs and crashes at a system level, not a process/service level. It also won't detect all hangs of your daemon/service process: the pidfile option will only detect crashes (the process no longer exists), not a service process that has stopped responding but is still running from an OS point of view.

Systemd has more useful functionality for this. You can set "Restart=on-failure", for example, to automatically restart the process if it exits uncleanly (a non-zero exit code, an abnormal signal, a timeout, or a watchdog timeout). It also has its own watchdog functionality ("WatchdogSec="), which can actually detect a hung service, but that requires the service process to be systemd-aware: the service daemon needs systemd-specific watchdog code that sends a periodic ping to systemd to let it know it's still alive. See systemd.service(5) for details.
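To give a rough idea of what that "systemd-specific watchdog code" looks like, here is a minimal Python sketch of the ping side. It assumes the unit file sets WatchdogSec= (and usually Restart=on-failure), in which case systemd passes the NOTIFY_SOCKET and WATCHDOG_USEC environment variables to the service. The function names sd_notify and watchdog_interval are just names I've picked for this sketch, not something the daemon in question provides.

```python
import os
import socket


def sd_notify(message):
    """Send one notification datagram to systemd.

    Returns False when NOTIFY_SOCKET is unset, i.e. the process was not
    started by systemd with notifications enabled.
    """
    path = os.environ.get("NOTIFY_SOCKET")
    if not path:
        return False
    addr = path.encode()
    if addr.startswith(b"@"):
        # A leading "@" denotes a Linux abstract-namespace socket,
        # which is addressed with a leading NUL byte instead.
        addr = b"\0" + addr[1:]
    with socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM) as sock:
        sock.sendto(message.encode(), addr)
    return True


def watchdog_interval():
    """Return half of the configured watchdog timeout in seconds.

    Returns None when WATCHDOG_USEC is unset (no WatchdogSec= in the unit).
    Pinging at half the timeout is the usual recommendation.
    """
    usec = os.environ.get("WATCHDOG_USEC")
    return int(usec) / 2_000_000 if usec else None


# Inside the daemon's main loop (hypothetical), after it has confirmed
# that it is actually still doing useful work:
#     sd_notify("WATCHDOG=1")
```

The important bit is that the ping has to come from the part of the daemon that does the real work. If it's sent from an independent timer thread, systemd will keep seeing pings even when the service itself has hung.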

If the server daemon does not have support for systemd's service watchdog, you need a second daemon process which checks that the main process is alive and accepting/handling connections (and optionally tells systemd to restart the service if it detects a problem). The standalone watchdog package does not do this; it's not really designed or intended to properly monitor services. You might want to take a look at the check_mumble plugin for Nagios, for example. I'm not sure if check_mumble is any good, as Nagios plugins vary in functionality and quality; it's just to point you in the general direction of solutions which do real service monitoring. At a minimum, for an external monitoring process, you need something that actually connects to the service daemon like a client and checks that it accepts the connection and gives a normal response.
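As a very rough sketch of that kind of external check (not a replacement for a proper monitoring plugin), something like the following Python could be run periodically from a timer or a small loop. The probe bytes, host, port and unit name are all placeholders I've made up; a real check has to speak enough of the service's actual protocol (including TLS, if the service uses it) to get a genuine response back.

```python
import socket
import subprocess


def service_responds(host, port, probe=b"", timeout=5.0):
    """Return True only if the service accepts a connection AND sends data.

    A hung process can still appear to accept connections, because the
    kernel completes the TCP handshake for a listening socket. Waiting
    for real bytes from the daemon is what catches that case.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout) as conn:
            if probe:
                # Protocol-specific request (hypothetical placeholder).
                conn.sendall(probe)
            return len(conn.recv(1)) > 0
    except OSError:
        # Covers connection refused, timeouts and similar failures.
        return False


def restart_unit(unit):
    """Ask systemd to restart the unit (needs sufficient privileges)."""
    subprocess.run(["systemctl", "restart", unit], check=True)


# Hypothetical usage, with made-up port and unit name:
#     if not service_responds("127.0.0.1", 12345, probe=b"PING\n"):
#         restart_unit("example.service")
```

In practice you'd probably want to require a couple of consecutive failures before restarting, so a single slow response doesn't bounce the service.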

Statistics: Posted by Murph9000 — Tue Sep 03, 2024 8:45 pm


