zookeeper各类报错、缘由及解决方法汇总(持续更新)

[root@iZ23nn1p4mjZ zookeeper-3.4.10]# bin/zkCli.sh -server localhost:2181,localhost:2182,localhost:2183
Connecting to localhost:2181,localhost:2182,localhost:2183
log4j:WARN No appenders could be found for logger (org.apache.zookeeper.ZooKeeper).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
Welcome to ZooKeeper!
JLine support is enabled
[zk: localhost:2181,localhost:2182,localhost:2183(CONNECTING) 0] create -s /[sPacktPub]
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 3
    at org.apache.zookeeper.ZooKeeperMain.processZKCmd(ZooKeeperMain.java:706)
    at org.apache.zookeeper.ZooKeeperMain.processCmd(ZooKeeperMain.java:599)
    at org.apache.zookeeper.ZooKeeperMain.executeLine(ZooKeeperMain.java:371)
    at org.apache.zookeeper.ZooKeeperMain.run(ZooKeeperMain.java:331)
    at org.apache.zookeeper.ZooKeeperMain.main(ZooKeeperMain.java:290)
[root@iZ23nn1p4mjZ zookeeper-3.4.10]# ls /

  上述问题近发生在单机三节点模式时。html

ZooKeeper自定义数据日志目录

  ZooKeeper在启动时会将zookeeper.out输出到当前目录,不只不友好,有时候可能会由于目录权限问题引起一些没必要要的麻烦。java

  解决方案:设置ZOO_LOG_DIR环境变量,也能够改在shell中写,以下:shell

if [ $ZOO_LOG_DIR ] ; then
export ZOO_LOG_DIR=/usr/local/zookeeper/logs
fi

if [ -z $ZOO_LOG_DIR ];then
export ZOO_LOG_DIR=/usr/local/zookeeper/logs
fi

  conf/log4j.properties中定义:apache

zookeeper.log.dir=/usr/local/zookeeper/logs
zookeeper.tracelog.dir=/usr/local/zookeeper/logs

  有时候在网络变化、抖动或者服务器负载高的状况下会出现下列异常:服务器

14:18:06.812 [main] ERROR org.apache.curator.ConnectionState - Connection timed out for connection string (127.0.0.1:2181) and timeout (5000) / elapsed (59220)
org.apache.curator.CuratorConnectionLossException: KeeperErrorCode = ConnectionLoss

  重启应用后仍然如此,重启zk后便自动恢复。网络

  最近有个奇怪的现象{20200420更新},只要链接非本地,客户端IDEA中老是第一次链接的时候超时,第二次就成功了,每一个客户端均如此。以下:session

19:51:53.245 [main] INFO  org.apache.zookeeper.ZooKeeper - Initiating client connection, connectString=xx.xx.xx.xx:2181 sessionTimeout=60000 watcher=org.apache.curator.ConnectionState@310d05c5
19:51:58.334 [main] ERROR org.apache.curator.ConnectionState - Connection timed out for connection string (xx.xx.xx.xx:2181) and timeout (5000) / elapsed (5104)
org.apache.curator.CuratorConnectionLossException: KeeperErrorCode = ConnectionLoss
	at org.apache.curator.ConnectionState.checkTimeouts(ConnectionState.java:225) [curator-client-2.13.0.jar:?]
	at org.apache.curator.ConnectionState.getZooKeeper(ConnectionState.java:94) [curator-client-2.13.0.jar:?]
	at org.apache.curator.CuratorZookeeperClient.getZooKeeper(CuratorZookeeperClient.java:117) [curator-client-2.13.0.jar:?]
	at org.apache.curator.framework.imps.CuratorFrameworkImpl.getZooKeeper(CuratorFrameworkImpl.java:489) [curator-framework-2.13.0.jar:?]
	at org.apache.curator.framework.imps.CreateBuilderImpl$11.call(CreateBuilderImpl.java:740) [curator-framework-2.13.0.jar:?]
	at org.apache.curator.framework.imps.CreateBuilderImpl$11.call(CreateBuilderImpl.java:723) [curator-framework-2.13.0.jar:?]
	at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:109) [curator-client-2.13.0.jar:?]
	at org.apache.curator.framework.imps.CreateBuilderImpl.pathInForeground(CreateBuilderImpl.java:720) [curator-framework-2.13.0.jar:?]
	at org.apache.curator.framework.imps.CreateBuilderImpl.protectedPathInForeground(CreateBuilderImpl.java:484) [curator-framework-2.13.0.jar:?]
	at org.apache.curator.framework.imps.CreateBuilderImpl.forPath(CreateBuilderImpl.java:474) [curator-framework-2.13.0.jar:?]
	at org.apache.curator.framework.imps.CreateBuilderImpl.forPath(CreateBuilderImpl.java:454) [curator-framework-2.13.0.jar:?]
	at org.apache.curator.framework.imps.CreateBuilderImpl.forPath(CreateBuilderImpl.java:44) [curator-framework-2.13.0.jar:?]
	at com.alibaba.dubbo.remoting.zookeeper.curator.CuratorZookeeperClient.createPersistent(CuratorZookeeperClient.java:96) [classes/:1.0.17]

  链接超时时间过短,加大便可,有需求可有偿解决。app

 

curator KeeperErrorCode = Unimplemented异常socket

ZooKeeper 3.5.x

  • Curator 4.0 has a hard dependency on ZooKeeper 3.5.x
  • If you are using ZooKeeper 3.5.x there's nothing additional to do - just use Curator 4.0

ZooKeeper 3.4.x

Curator 4.0 supports ZooKeeper 3.4.x ensembles in a soft-compatibility mode. To use this mode you must exclude ZooKeeper when adding Curator to your dependency management tool.oop

Curator 4.0默认不支持zk 3.4,必须兼容性处理,以下:

        <dependency>
            <groupId>org.apache.zookeeper</groupId>
            <artifactId>zookeeper</artifactId>
            <version>3.4.12</version>
        </dependency>
        <dependency>
            <groupId>org.apache.curator</groupId>
            <artifactId>curator-recipes</artifactId>
            <version>4.0.0</version>
            <!-- 4.0.0原生不兼容zk 3.4, 必须进行兼容性处理 -->
            <exclusions>
                <exclusion>
                    <groupId>org.apache.zookeeper</groupId>
                    <artifactId>zookeeper</artifactId>
                </exclusion>
            </exclusions>
        </dependency>

  见http://curator.apache.org/zk-compatibility.html

  其余zk自身及zk与dubbo相关的异常可参见:http://www.noobyard.com/article/p-ktmuvyjz-dn.html

2019-11-04 12:05:17,165 [myid:] - WARN  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@376] - Unable to read additional data from client sessionid 0x1001e6b86a30003, likely client has closed socket
2019-11-04 12:05:17,166 [myid:] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1040] - Closed socket connection for client /10.2.104.6:22804 which had sessionid 0x1001e6b86a30003
2019-11-04 12:05:17,240 [myid:] - WARN  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@376] - Unable to read additional data from client sessionid 0x1001e6b86a30000, likely client has closed socket
2019-11-04 12:05:17,240 [myid:] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1040] - Closed socket connection for client /10.2.104.6:22665 which had sessionid 0x1001e6b86a30000
2019-11-04 12:05:17,241 [myid:] - WARN  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@376] - Unable to read additional data from client sessionid 0x1001e6b86a30001, likely client has closed socket
2019-11-04 12:05:17,241 [myid:] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1040] - Closed socket connection for client /10.2.104.6:22672 which had sessionid 0x1001e6b86a30001
2019-11-04 12:05:17,241 [myid:] - WARN  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@376] - Unable to read additional data from client sessionid 0x1001e6b86a30002, likely client has closed socket
2019-11-04 12:05:17,242 [myid:] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1040] - Closed socket connection for client /10.2.104.6:22673 which had sessionid 0x1001e6b86a30002
2019-11-04 12:05:47,455 [myid:] - INFO  [SessionTracker:ZooKeeperServer@354] - Expiring session 0x1001e6b86a30002, timeout of 40000ms exceeded
2019-11-04 12:05:47,456 [myid:] - INFO  [ProcessThread(sid:0 cport:2181)::PrepRequestProcessor@487] - Processed session termination for sessionid: 0x1001e6b86a30002
2019-11-04 12:05:51,456 [myid:] - INFO  [SessionTracker:ZooKeeperServer@354] - Expiring session 0x1001e6b86a30003, timeout of 40000ms exceeded
2019-11-04 12:05:51,456 [myid:] - INFO  [SessionTracker:ZooKeeperServer@354] - Expiring session 0x1001e6b86a30000, timeout of 40000ms exceeded
2019-11-04 12:05:51,456 [myid:] - INFO  [ProcessThread(sid:0 cport:2181)::PrepRequestProcessor@487] - Processed session termination for sessionid: 0x1001e6b86a30003
2019-11-04 12:05:51,457 [myid:] - INFO  [ProcessThread(sid:0 cport:2181)::PrepRequestProcessor@487] - Processed session termination for sessionid: 0x1001e6b86a30000
2019-11-04 12:05:53,456 [myid:] - INFO  [SessionTracker:ZooKeeperServer@354] - Expiring session 0x1001e6b86a30001, timeout of 40000ms exceeded
2019-11-04 12:05:53,457 [myid:] - INFO  [ProcessThread(sid:0 cport:2181)::PrepRequestProcessor@487] - Processed session termination for sessionid: 0x1001e6b86a30001

  该异常看起来和https://www.cnblogs.com/zhyg/p/11089736.html所述很像。

  zookeeper服务器日志报链接数超出:

2020-04-16 17:16:11,135 [myid:] - WARN  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@211] - Too many connections from /10.20.25.225 - max is 60
2020-04-16 17:16:11,241 [myid:] - WARN  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@211] - Too many connections from /10.20.25.225 - max is 60
2020-04-16 17:16:11,371 [myid:] - WARN  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@211] - Too many connections from /10.20.25.225 - max is 60
2020-04-16 17:16:11,381 [myid:] - WARN  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@211] - Too many connections from /10.20.25.225 - max is 60
2020-04-16 17:16:11,463 [myid:] - WARN  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@211] - Too many connections from /10.20.25.225 - max is 60
2020-04-16 17:16:11,527 [myid:] - WARN  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@211] - Too many connections from /10.20.25.225 - max is 60
2020-04-16 17:16:11,553 [myid:] - WARN  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@211] - Too many connections from /10.20.25.225 - max is 60
2020-04-16 17:16:11,580 [myid:] - WARN  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@211] - Too many connections from /10.20.25.225 - max is 60
2020-04-16 17:16:11,609 [myid:] - WARN  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@211] - Too many connections from /10.20.25.225 - max is 60
2020-04-16 17:16:11,686 [myid:] - WARN  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@211] - Too many connections from /10.20.25.225 - max is 60
2020-04-16 17:16:11,796 [myid:] - WARN  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@211] - Too many connections from /10.20.25.225 - max is 60
2020-04-16 17:16:11,818 [myid:] - WARN  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@211] - Too many connections from /10.20.25.225 - max is 60
2020-04-16 17:16:11,853 [myid:] - WARN  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@211] - Too many connections from /10.20.25.225 - max is 60

  该缘由大几率是客户端有链接泄露,可经过netstat -ano | grep 2181分析,以下:

[root@hs-10-20-37-72 logs]# lsof -i:2181
COMMAND   PID USER   FD   TYPE  DEVICE SIZE/OFF NODE NAME
java     2706 root   24u  IPv6   18738      0t0  TCP *:eforward (LISTEN)
java     2706 root   25u  IPv6 3786862      0t0  TCP 10.20.25.225:eforward->10.20.25.225:63464 (ESTABLISHED)
java     2706 root   26u  IPv6 4748980      0t0  TCP 10.20.25.225:eforward->10.20.25.225:59063 (ESTABLISHED)
java     2706 root   27u  IPv6 6781917      0t0  TCP 10.20.25.225:eforward->10.20.25.225:51577 (ESTABLISHED)
java     2706 root   28u  IPv6 6781920      0t0  TCP 10.20.25.225:eforward->10.20.25.225:51578 (ESTABLISHED)
java     2706 root   30u  IPv6 6784242      0t0  TCP 10.20.25.225:eforward->10.20.25.225:51880 (ESTABLISHED)
java     2706 root   31u  IPv6 6784604      0t0  TCP 10.20.25.225:eforward->10.20.25.225:51989 (ESTABLISHED)
java     2706 root   32u  IPv6 6784610      0t0  TCP 10.20.25.225:eforward->10.20.25.225:51990 (ESTABLISHED)
java     2706 root   33u  IPv6 6788885      0t0  TCP 10.20.25.225:eforward->10.20.25.225:52069 (ESTABLISHED)
java     2706 root   34u  IPv6 4412727      0t0  TCP 10.20.25.225:eforward->10.20.25.225:12441 (ESTABLISHED)
java     2706 root   35u  IPv6 6789241      0t0  TCP 10.20.25.225:eforward->10.20.25.225:52127 (ESTABLISHED)
java     2706 root   36u  IPv6 6789263      0t0  TCP 10.20.25.225:eforward->10.20.25.225:52130 (ESTABLISHED)
java     2706 root   37u  IPv6 6789329      0t0  TCP 10.20.25.225:eforward->10.20.25.225:52140 (ESTABLISHED)
java     2706 root   38u  IPv6 6789422      0t0  TCP 10.20.25.225:eforward->10.20.25.225:52149 (ESTABLISHED)
java     2706 root   39u  IPv6 6789572      0t0  TCP 10.20.25.225:eforward->10.20.25.225:52166 (ESTABLISHED)
java     2706 root   40u  IPv6 6789693      0t0  TCP 10.20.25.225:eforward->10.20.25.225:52183 (ESTABLISHED)
java     2706 root   41u  IPv6 6789807      0t0  TCP 10.20.25.225:eforward->10.20.25.225:52200 (ESTABLISHED)
java     2706 root   42u  IPv6 6789981      0t0  TCP 10.20.25.225:eforward->10.20.25.225:52220 (ESTABLISHED)
java     2706 root   43u  IPv6 6790066      0t0  TCP 10.20.25.225:eforward->10.20.25.225:52239 (ESTABLISHED)
java     2706 root   44u  IPv6 6790173      0t0  TCP 10.20.25.225:eforward->10.20.25.225:52250 (ESTABLISHED)
java     2706 root   45u  IPv6 6790238      0t0  TCP 10.20.25.225:eforward->10.20.25.225:52259 (ESTABLISHED)
java     2706 root   46u  IPv6 6790287      0t0  TCP 10.20.25.225:eforward->10.20.25.225:52272 (ESTABLISHED)
java     2706 root   47u  IPv6 6790356      0t0  TCP 10.20.25.225:eforward->10.20.25.225:52285 (ESTABLISHED)
java     2706 root   48u  IPv6 6790447      0t0  TCP 10.20.25.225:eforward->10.20.25.225:52297 (ESTABLISHED)
java     2706 root   49u  IPv6 6790485      0t0  TCP 10.20.25.225:eforward->10.20.25.225:52307 (ESTABLISHED)
java     2706 root   50u  IPv6 6790597      0t0  TCP 10.20.25.225:eforward->10.20.25.225:52337 (ESTABLISHED)
java     2706 root   51u  IPv6 6790638      0t0  TCP 10.20.25.225:eforward->10.20.25.225:52343 (ESTABLISHED)
java     2706 root   52u  IPv6 6790677      0t0  TCP 10.20.25.225:eforward->10.20.25.225:52356 (ESTABLISHED)
java     2706 root   53u  IPv6 6790726      0t0  TCP 10.20.25.225:eforward->10.20.25.225:52368 (ESTABLISHED)
java     2706 root   54u  IPv6 6790842      0t0  TCP 10.20.25.225:eforward->10.20.25.225:52387 (ESTABLISHED)
java     2706 root   55u  IPv6 6790871      0t0  TCP 10.20.25.225:eforward->10.20.25.225:52395 (ESTABLISHED)
java     2706 root   56u  IPv6 6790883      0t0  TCP 10.20.25.225:eforward->10.20.25.225:52402 (ESTABLISHED)
java     2706 root   57u  IPv6 6790992      0t0  TCP 10.20.25.225:eforward->10.20.25.225:52415 (ESTABLISHED)
java     2706 root   58u  IPv6 6791084      0t0  TCP 10.20.25.225:eforward->10.20.25.225:52433 (ESTABLISHED)
java     2706 root   59u  IPv6 6791118      0t0  TCP 10.20.25.225:eforward->10.20.25.225:52449 (ESTABLISHED)
java     2706 root   60u  IPv6 6791187      0t0  TCP 10.20.25.225:eforward->10.20.25.225:52460 (ESTABLISHED)
java     2706 root   61u  IPv6 6791270      0t0  TCP 10.20.25.225:eforward->10.20.25.225:52479 (ESTABLISHED)
java     2706 root   62u  IPv6 6791334      0t0  TCP 10.20.25.225:eforward->10.20.25.225:52488 (ESTABLISHED)
java     2706 root   63u  IPv6 6791427      0t0  TCP 10.20.25.225:eforward->10.20.25.225:52501 (ESTABLISHED)
java     2706 root   64u  IPv6 6791470      0t0  TCP 10.20.25.225:eforward->10.20.25.225:52518 (ESTABLISHED)
java     2706 root   65u  IPv6 6791542      0t0  TCP 10.20.25.225:eforward->10.20.25.225:52531 (ESTABLISHED)
java     2706 root   66u  IPv6 6791631      0t0  TCP 10.20.25.225:eforward->10.20.25.225:52545 (ESTABLISHED)
java     2706 root   67u  IPv6 6791669      0t0  TCP 10.20.25.225:eforward->10.20.25.225:52555 (ESTABLISHED)
java     2706 root   68u  IPv6 6791719      0t0  TCP 10.20.25.225:eforward->10.20.25.225:52566 (ESTABLISHED)
java     2706 root   69u  IPv6 6791797      0t0  TCP 10.20.25.225:eforward->10.20.25.225:52581 (ESTABLISHED)
java     2706 root   70u  IPv6 6791847      0t0  TCP 10.20.25.225:eforward->10.20.25.225:52592 (ESTABLISHED)
java     2706 root   71u  IPv6 6791883      0t0  TCP 10.20.25.225:eforward->10.20.25.225:52602 (ESTABLISHED)
java     2706 root   72u  IPv6 6791931      0t0  TCP 10.20.25.225:eforward->10.20.25.225:52615 (ESTABLISHED)
java     2706 root   73u  IPv6 6792017      0t0  TCP 10.20.25.225:eforward->10.20.25.225:52627 (ESTABLISHED)
java     2706 root   74u  IPv6 6792056      0t0  TCP 10.20.25.225:eforward->10.20.25.225:52641 (ESTABLISHED)
java     2706 root   75u  IPv6 6792090      0t0  TCP 10.20.25.225:eforward->10.20.25.225:52648 (ESTABLISHED)
java     2706 root   76u  IPv6 6792199      0t0  TCP 10.20.25.225:eforward->10.20.25.225:52661 (ESTABLISHED)
java     2706 root   77u  IPv6 6792283      0t0  TCP 10.20.25.225:eforward->10.20.25.225:52681 (ESTABLISHED)
java     2706 root   78u  IPv6 6792299      0t0  TCP 10.20.25.225:eforward->10.20.25.225:52688 (ESTABLISHED)
java     2706 root   79u  IPv6 6792345      0t0  TCP 10.20.25.225:eforward->10.20.25.225:52698 (ESTABLISHED)
java     2706 root   80u  IPv6 6792413      0t0  TCP 10.20.25.225:eforward->10.20.25.225:52709 (ESTABLISHED)
java     2706 root   81u  IPv6 6792495      0t0  TCP 10.20.25.225:eforward->10.20.25.225:52727 (ESTABLISHED)
java     2706 root   82u  IPv6 6792563      0t0  TCP 10.20.25.225:eforward->10.20.25.225:52736 (ESTABLISHED)
java     2706 root   83u  IPv6 6792625      0t0  TCP 10.20.25.225:eforward->10.20.25.225:52745 (ESTABLISHED)
java     2706 root   84u  IPv6 6792674      0t0  TCP 10.20.25.225:eforward->10.20.25.225:52758 (ESTABLISHED)
java     2706 root   85u  IPv6 6792755      0t0  TCP 10.20.25.225:eforward->10.20.25.225:52775 (ESTABLISHED)
java     2706 root   86u  IPv6 6290356      0t0  TCP 10.20.25.225:eforward->192.168.166.145:34608 (ESTABLISHED)
java     2871 root  176u  IPv6 4748979      0t0  TCP 10.20.25.225:59063->10.20.25.225:eforward (ESTABLISHED)

  找到对应的进程和服务器。而后分析泄露,一般大几率是由于本身手写zk相关链接和监控、而不是使用客户端库如curator所致。

21:12:03.568 [pool-taskSchedule-thread-54-SendThread(10.20.25.225:2181)] INFO  org.apache.zookeeper.ClientCnxn - Opening socket connection to server 10.20.25.225/10.20.25.225:2181. Will not attempt to authenticate using SASL (unknown error)
21:12:03.568 [pool-taskSchedule-thread-54-SendThread(10.20.25.225:2181)] INFO  org.apache.zookeeper.ClientCnxn - Socket connection established to 10.20.25.225/10.20.25.225:2181, initiating session
21:12:03.576 [pool-taskSchedule-thread-54-SendThread(10.20.25.225:2181)] WARN  org.apache.zookeeper.ClientCnxn - Session 0x0 for server 10.20.25.225/10.20.25.225:2181, unexpected error, closing socket connection and attempting reconnect
java.io.IOException: Connection reset by peer
	at sun.nio.ch.FileDispatcherImpl.read0(Native Method) ~[?:1.8.0_191]
	at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39) ~[?:1.8.0_191]
	at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223) ~[?:1.8.0_191]
	at sun.nio.ch.IOUtil.read(IOUtil.java:192) ~[?:1.8.0_191]
	at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380) ~[?:1.8.0_191]
	at org.apache.zookeeper.ClientCnxnSocketNIO.doIO(ClientCnxnSocketNIO.java:68) ~[zookeeper-3.4.8.jar!/:3.4.8--1]
	at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:366) ~[zookeeper-3.4.8.jar!/:3.4.8--1]
	at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1141) [zookeeper-3.4.8.jar!/:3.4.8--1]
21:12:03.729 [pool-taskSchedule-thread-52-SendThread(10.20.25.225:2181)] INFO  org.apache.zookeeper.ClientCnxn - Opening socket connection to server 10.20.25.225/10.20.25.225:2181. Will not attempt to authenticate using SASL (unknown error)
21:12:03.730 [pool-taskSchedule-thread-52-SendThread(10.20.25.225:2181)] INFO  org.apache.zookeeper.ClientCnxn - Socket connection established to 10.20.25.225/10.20.25.225:2181, initiating session
21:12:03.730 [pool-taskSchedule-thread-52-SendThread(10.20.25.225:2181)] WARN  org.apache.zookeeper.ClientCnxn - Session 0x0 for server 10.20.25.225/10.20.25.225:2181, unexpected error, closing socket connection and attempting reconnect
java.io.IOException: Connection reset by peer
	at sun.nio.ch.FileDispatcherImpl.read0(Native Method) ~[?:1.8.0_191]
	at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39) ~[?:1.8.0_191]
	at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223) ~[?:1.8.0_191]
	at sun.nio.ch.IOUtil.read(IOUtil.java:192) ~[?:1.8.0_191]
	at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380) ~[?:1.8.0_191]
	at org.apache.zookeeper.ClientCnxnSocketNIO.doIO(ClientCnxnSocketNIO.java:68) ~[zookeeper-3.4.8.jar!/:3.4.8--1]
	at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:366) ~[zookeeper-3.4.8.jar!/:3.4.8--1]
	at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1141) [zookeeper-3.4.8.jar!/:3.4.8--1]

Caused by: org.apache.zookeeper.KeeperException$OperationTimeoutException: KeeperErrorCode = OperationTimeout

at com.dangdang.ddframe.job.reg.zookeeper.ZookeeperRegistryCenter.init(ZookeeperRegistryCenter.java:102) ~[elastic-job-common-core-2.1.5.jar:?]

检查网络是否正常,ping/telnet看下,基本上就这一个缘由。

执行[root@192.168.1.1 bin]# echo conf | nc localhost 2181报下列输出

   This ZooKeeper instance is not currently serving requests

说明zookeeper还活着,但不提供服务了。此时要查看zk的日志,dataLogDir指定的目录,通常里面能够看到为何zk不提供服务了。以下:

2020-08-17 00:13:20,050 [myid:0] - INFO  [WorkerReceiver[myid=0]:FastLeaderElection@595] - Notification: 1 (message format version), 2 (n.leader), 0x103c00000494 (n.zxid), 0x17af2 (n.round), 
LOOKING (n.state), 2 (n.sid), 0x103c (n.peerEpoch) LEADING (my state)
2020-08-17 00:13:20,252 [myid:0] - INFO  [WorkerReceiver[myid=0]:FastLeaderElection@595] - Notification: 1 (message format version), 2 (n.leader), 0x103c00000494 (n.zxid), 0x17af2 (n.round), 
LOOKING (n.state), 2 (n.sid), 0x103c (n.peerEpoch) LEADING (my state)
2020-08-17 00:13:20,654 [myid:0] - INFO  [WorkerReceiver[myid=0]:FastLeaderElection@595] - Notification: 1 (message format version), 2 (n.leader), 0x103c00000494 (n.zxid), 0x17af2 (n.round), 
LOOKING (n.state), 2 (n.sid), 0x103c (n.peerEpoch) LEADING (my state)
2020-08-17 00:13:21,463 [myid:0] - INFO  [WorkerReceiver[myid=0]:FastLeaderElection@595] - Notification: 1 (message format version), 2 (n.leader), 0x103c00000494 (n.zxid), 0x17af2 (n.round), 
LOOKING (n.state), 2 (n.sid), 0x103c (n.peerEpoch) LEADING (my state)
2020-08-17 00:13:23,065 [myid:0] - INFO  [WorkerReceiver[myid=0]:FastLeaderElection@595] - Notification: 1 (message format version), 2 (n.leader), 0x103c00000494 (n.zxid), 0x17af2 (n.round), 
LOOKING (n.state), 2 (n.sid), 0x103c (n.peerEpoch) LEADING (my state)
2020-08-17 00:13:26,271 [myid:0] - INFO  [WorkerReceiver[myid=0]:FastLeaderElection@595] - Notification: 1 (message format version), 2 (n.leader), 0x103c00000494 (n.zxid), 0x17af2 (n.round), 
LOOKING (n.state), 2 (n.sid), 0x103c (n.peerEpoch) LEADING (my state)
2020-08-17 00:13:32,679 [myid:0] - INFO  [WorkerReceiver[myid=0]:FastLeaderElection@595] - Notification: 1 (message format version), 2 (n.leader), 0x103c00000494 (n.zxid), 0x17af2 (n.round), 
LOOKING (n.state), 2 (n.sid), 0x103c (n.peerEpoch) LEADING (my state)
2020-08-17 00:13:40,040 [myid:0] - WARN  [QuorumPeer[myid=0]/0:0:0:0:0:0:0:0:2181:QuorumPeer@998] - Unexpected exception
java.lang.InterruptedException: Timeout while waiting for epoch to be acked by quorum
    at org.apache.zookeeper.server.quorum.Leader.waitForEpochAck(Leader.java:936)
    at org.apache.zookeeper.server.quorum.Leader.lead(Leader.java:415)
    at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:995)
2020-08-17 00:13:40,040 [myid:0] - INFO  [QuorumPeer[myid=0]/0:0:0:0:0:0:0:0:2181:Leader@520] - Shutting down
2020-08-17 00:13:40,040 [myid:0] - INFO  [QuorumPeer[myid=0]/0:0:0:0:0:0:0:0:2181:Leader@526] - Shutdown called
java.lang.Exception: shutdown Leader! reason: Forcing shutdown
    at org.apache.zookeeper.server.quorum.Leader.shutdown(Leader.java:526)
    at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:1001)
2020-08-17 00:13:40,041 [myid:0] - INFO  [LearnerCnxAcceptor-/10.20.30.17:2182:Leader$LearnerCnxAcceptor@340] - exception while shutting down acceptor: java.net.SocketException: Socket closed
2020-08-17 00:13:40,041 [myid:0] - INFO  [QuorumPeer[myid=0]/0:0:0:0:0:0:0:0:2181:QuorumPeer@910] - LOOKING
2020-08-17 00:13:40,042 [myid:0] - INFO  [QuorumPeer[myid=0]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@813] - New election. My id =  0, proposed zxid=0x103c00000496
2020-08-17 00:13:40,042 [myid:0] - INFO  [WorkerReceiver[myid=0]:FastLeaderElection@595] - Notification: 1 (message format version), 0 (n.leader), 0x103c00000496 (n.zxid), 0x17af2 (n.round), 
LOOKING (n.state), 0 (n.sid), 0x103c (n.peerEpoch) LOOKING (my state)
2020-08-17 00:13:40,043 [myid:0] - WARN  [WorkerSender[myid=0]:QuorumCnxManager@584] - Cannot open channel to 1 at election address/*缘由和解决方法可见http://www.noobyard.com/article/p-ckeababk-hn.html*/ /10.20.30.13:2183
java.net.ConnectException: Connection refused (Connection refused)
    at java.net.PlainSocketImpl.socketConnect(Native Method)
    at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
    at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
    at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
    at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
    at java.net.Socket.connect(Socket.java:589)
    at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:558)
    at org.apache.zookeeper.server.quorum.QuorumCnxManager.toSend(QuorumCnxManager.java:534)
    at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.process(FastLeaderElection.java:454)
    at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.run(FastLeaderElection.java:435)
    at java.lang.Thread.run(Thread.java:748)
2020-08-17 00:13:40,043 [myid:0] - INFO  [WorkerSender[myid=0]:QuorumPeer$QuorumServer@185] - Resolved hostname: 10.20.30.13 to address: /10.20.30.13
2020-08-17 00:13:40,045 [myid:0] - INFO  [WorkerReceiver[myid=0]:FastLeaderElection@595] - Notification: 1 (message format version), 0 (n.leader), 0x103c00000496 (n.zxid), 0x17af2 (n.round), 
LOOKING (n.state), 2 (n.sid), 0x103c (n.peerEpoch) LOOKING (my state)
2020-08-17 00:13:40,245 [myid:0] - INFO  [QuorumPeer[myid=0]/0:0:0:0:0:0:0:0:2181:QuorumPeer@992] - LEADING
2020-08-17 00:13:40,263 [myid:0] - INFO  [QuorumPeer[myid=0]/0:0:0:0:0:0:0:0:2181:ZooKeeperServer@174] - Created server with tickTime 2000 minSessionTimeout 4000 maxSessionTimeout 40000 datad
ir /home/component/zookeeper/log/version-2 snapdir /home/component/zookeeper/data/version-2
2020-08-17 00:13:40,264 [myid:0] - INFO  [QuorumPeer[myid=0]/0:0:0:0:0:0:0:0:2181:Leader@380] - LEADING - LEADER ELECTION TOOK - 222
2020-08-17 00:13:41,271 [myid:0] - INFO  [LearnerHandler-/10.20.30.18:11030:LearnerHandler@346] - Follower sid: 2 : info : org.apache.zookeeper.server.quorum.QuorumPeer$QuorumServer@1efeac
2020-08-17 00:13:41,307 [myid:0] - ERROR [LearnerHandler-/10.20.30.18:11030:LearnerHandler@648] - Unexpected exception causing shutdown while sock still open
java.io.EOFException
    at java.io.DataInputStream.readInt(DataInputStream.java:392)
    at org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
    at org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:85)
    at org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:99)
    at org.apache.zookeeper.server.quorum.LearnerHandler.run(LearnerHandler.java:373)
2020-08-17 00:13:41,308 [myid:0] - WARN  [LearnerHandler-/10.20.30.18:11030:LearnerHandler@661] - ******* GOODBYE /10.20.30.18:11030 ********

因此,此处根本缘由是集群的其余节点连不上,致使节点异常退出。可是socket没有关闭,这样是为了在其余节点恢复后,集群总体自动恢复。