Rexdf

The devil is in the Details.

Notes on an ssh service startup failure on OpenVZ


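After the most recent server upgrade this week, the OpenVZ (ovz) containers at some Chinese NAT VPS providers could no longer be logged into over ssh.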

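The other services running in the container were all still working, so had I not been in the habit of logging in to run apt update && apt -y dist-upgrade, I would not have noticed the problem at all.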

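Environment details: none of the OpenVZ containers at several overseas providers show this behavior, so the host node is probably a factor as well. The affected system is Ubuntu 16.04, with the following openssh-server version: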

openssh-server/xenial-updates,xenial-security,now 1:7.2p2-4ubuntu2.6

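One of the providers happens to support a serial console, so I looked into how to work around the problem. I logged in through the serial console, checked the log of the failed ssh start, and tried starting the ssh service by hand: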

$ systemctl status ssh
● ssh.service - OpenBSD Secure Shell server
Loaded: loaded (/lib/systemd/system/ssh.service; enabled; vendor preset: enabled)
Active: failed (Result: start-limit-hit) since Thu 2018-11-22 01:47:12 EST; 4min 0s ago
Process: 304 ExecStartPre=/usr/sbin/sshd -t (code=exited, status=255)
Nov 22 01:47:12 systemd[1]: Failed to start OpenBSD Secure Shell server.
Nov 22 01:47:12 systemd[1]: ssh.service: Unit entered failed state.
Nov 22 01:47:12 systemd[1]: ssh.service: Failed with result 'exit-code'.
Nov 22 01:47:12 systemd[1]: ssh.service: Service hold-off time over, scheduling restart.
Nov 22 01:47:12 systemd[1]: Stopped OpenBSD Secure Shell server.
Nov 22 01:47:12 systemd[1]: ssh.service: Start request repeated too quickly.
Nov 22 01:47:12 systemd[1]: Failed to start OpenBSD Secure Shell server.
Nov 22 01:47:12 systemd[1]: ssh.service: Unit entered failed state.
Nov 22 01:47:12 systemd[1]: ssh.service: Failed with result 'start-limit-hit'.
$ systemctl start ssh
Job for ssh.service failed because the control process exited with error code. See "systemctl status ssh.service" and "journalctl -xe" for details.
$ journalctl -xe
--
-- Unit ssh.service has finished shutting down.
Nov 22 01:52:13 systemd[1]: Starting OpenBSD Secure Shell server...
-- Subject: Unit ssh.service has begun start-up
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit ssh.service has begun starting up.
Nov 22 01:52:13 sshd[509]: Missing privilege separation directory: /var/run/sshd
Nov 22 01:52:13 systemd[1]: ssh.service: Control process exited, code=exited status=255
Nov 22 01:52:13 systemd[1]: Failed to start OpenBSD Secure Shell server.
-- Subject: Unit ssh.service has failed
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit ssh.service has failed.
--
-- The result is failed.
Nov 22 01:52:13 systemd[1]: ssh.service: Unit entered failed state.
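Most of the output above is systemd bookkeeping; the only line that carries the actual cause is the one logged by sshd itself. As a minimal sketch, the filter below keeps just those lines. The function name sshd_errors is hypothetical, and it assumes the journal text uses the same "sshd[PID]:" prefix as the log shown here.

```shell
#!/bin/sh
# sshd_errors: read journal text on stdin and print only the lines that
# sshd itself logged (prefix "sshd[PID]:"), dropping systemd's messages.
# The function name is hypothetical, chosen for this example.
sshd_errors() {
    grep -E 'sshd\[[0-9]+\]:'
}

# Typical use on the affected machine (not run here):
#   journalctl -u ssh -b --no-pager | sshd_errors
```

Running /usr/sbin/sshd -t by hand, the same check that ExecStartPre performs in the unit above, should print the same complaint directly on the console.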

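The key line is Missing privilege separation directory: /var/run/sshd. The directory really did not exist; I created it, and systemctl start ssh then actually succeeded! After a reboot, though, sshd still did not start, and when I looked, the directory was gone again!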
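One likely reason the directory vanishes on reboot, given here as an assumption because it was not verified on the affected container: on a stock Ubuntu 16.04 system /var/run is a symlink to /run, and /run is a tmpfs that starts out empty on every boot. The sketch below shows one way to check both points. The function names resolve_path and fs_type_of are hypothetical, and fs_type_of assumes findmnt from util-linux is installed.

```shell
#!/bin/sh
# resolve_path PATH: print PATH with all symlinks resolved.
# (Hypothetical helper name.)
resolve_path() {
    readlink -f -- "$1"
}

# fs_type_of PATH: print the type of the filesystem that holds PATH.
# Assumes findmnt (util-linux) is available. (Hypothetical helper name.)
fs_type_of() {
    findmnt -n -o FSTYPE -T "$1"
}

# Typical use on the affected machine (not run here):
#   resolve_path /var/run
#   fs_type_of "$(resolve_path /var/run)"
```

If the first command prints /run and the second prints tmpfs, anything created under /var/run by hand is lost at the next boot, which would match what happened here.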

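I was not in the mood to work out whether the problem lies in the openssh-server package itself or in some other library such as libc (the upgrade complained that the libc version was too old). The next step was to automate the workaround. My solution was to add the following to /etc/rc.local: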

sleep 15
if [ ! -d /var/run/sshd ]; then mkdir /var/run/sshd; fi
/bin/systemctl start ssh
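A slightly more defensive variant of the same workaround, as a sketch rather than a tested fix: it creates any missing parent directories, sets the mode explicitly, and starts ssh only if the directory is actually in place. The function name ensure_rundir is hypothetical, and the 0755 mode is an assumption about what sshd expects for this directory.

```shell
#!/bin/sh
# ensure_rundir DIR: create DIR (and any missing parents) if it does not
# exist, then set its mode to 0755. Returns non-zero on failure.
# The function name is hypothetical; the 0755 mode is an assumption.
ensure_rundir() {
    dir=$1
    [ -d "$dir" ] || mkdir -p -- "$dir" || return 1
    chmod 0755 -- "$dir"
}

# In /etc/rc.local this would replace the mkdir line (not run here):
#   sleep 15
#   ensure_rundir /var/run/sshd && /bin/systemctl start ssh
```

Calling it twice is harmless, so it does not matter whether something else has already created the directory by the time rc.local runs.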

And that fixed it. Oh well, as long as it works!
