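Redis CLUSTERDOWN incident: what the logs showed
At 05:28 on 2016-10-30 the Redis cluster hit a CLUSTERDOWN error. It appears to have been caused by network jitter; the logs from that moment are recorded here for future reference.
Environment: Redis 3.0.7, 64-bit, cluster mode
Three servers (285/286/287), each running two instances (30001/30002)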
Log of instance 30001 on server 285 at the time (node ID 9d21b96013bbee9319a2387a243271c255b411dd):
[code]
31672:S 30 Oct 05:28:11.809 # Cluster state changed: fail
31672:S 30 Oct 05:28:30.551 * FAIL message received from d4a1b5802d51faa245e1f7e2723f05521faa0c2c about 94bd2201144028727f5560b3e088b9224f08d5b3
31672:S 30 Oct 05:28:30.551 * FAIL message received from d4a1b5802d51faa245e1f7e2723f05521faa0c2c about 4a0d258e31dd4220fbe6d08b06ff2bb63e4cb3ed
31672:S 30 Oct 05:28:30.939 # Cluster state changed: ok
31672:S 30 Oct 05:28:31.941 * Clear FAIL state for node 94bd2201144028727f5560b3e088b9224f08d5b3: slave is reachable again.
31672:S 30 Oct 05:28:31.941 * Clear FAIL state for node 4a0d258e31dd4220fbe6d08b06ff2bb63e4cb3ed: slave is reachable again.
[/code]
Log of instance 30002 on server 285 (node ID d4a1b5802d51faa245e1f7e2723f05521faa0c2c):
[code]
31722:M 30 Oct 05:28:13.388 # Cluster state changed: fail
31722:M 30 Oct 05:28:30.531 * Marking node 94bd2201144028727f5560b3e088b9224f08d5b3 as failing (quorum reached).
31722:M 30 Oct 05:28:30.531 * Marking node 4a0d258e31dd4220fbe6d08b06ff2bb63e4cb3ed as failing (quorum reached).
31722:M 30 Oct 05:28:30.551 * Clear FAIL state for node 4a0d258e31dd4220fbe6d08b06ff2bb63e4cb3ed: slave is reachable again.
31722:M 30 Oct 05:28:31.553 * Clear FAIL state for node 94bd2201144028727f5560b3e088b9224f08d5b3: slave is reachable again.
31722:M 30 Oct 05:28:35.459 # Cluster state changed: ok
[/code]
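To put a number on how long each instance considered the cluster down, the "Cluster state changed" lines from the two logs above can be paired up. The following is only a minimal sketch: the function names `parse_line` and `fail_windows` are made up for illustration, the regular expression is inferred from the log lines shown here rather than from any Redis documentation, and the year 2016 is supplied by hand because the log timestamps omit it.

```python
import re
from datetime import datetime

# Assumed layout, inferred from the lines above:
# "<pid>:<role> <day> <month> <HH:MM:SS.mmm> <level> <message>"
LINE_RE = re.compile(
    r"^(?P<pid>\d+):(?P<role>[A-Z]) "
    r"(?P<ts>\d{1,2} \w{3} \d{2}:\d{2}:\d{2}\.\d{3}) "
    r"(?P<level>[#*.\-]) (?P<msg>.*)$"
)


def parse_line(line, year=2016):
    """Return (timestamp, role, message), or None if the line does not match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    # The log has no year, so it is passed in as an assumption.
    ts = datetime.strptime(f"{year} {m.group('ts')}", "%Y %d %b %H:%M:%S.%f")
    return ts, m.group("role"), m.group("msg")


def fail_windows(lines):
    """Return a list of (start, end, seconds) for each fail -> ok transition."""
    windows = []
    start = None
    for line in lines:
        parsed = parse_line(line)
        if parsed is None:
            continue
        ts, _role, msg = parsed
        if msg == "Cluster state changed: fail":
            start = ts
        elif msg == "Cluster state changed: ok" and start is not None:
            windows.append((start, ts, (ts - start).total_seconds()))
            start = None
    return windows


# State-change lines copied from the two logs above.
LOG_30001 = [
    "31672:S 30 Oct 05:28:11.809 # Cluster state changed: fail",
    "31672:S 30 Oct 05:28:30.939 # Cluster state changed: ok",
]
LOG_30002 = [
    "31722:M 30 Oct 05:28:13.388 # Cluster state changed: fail",
    "31722:M 30 Oct 05:28:35.459 # Cluster state changed: ok",
]

if __name__ == "__main__":
    for name, log in (("30001", LOG_30001), ("30002", LOG_30002)):
        for start, end, secs in fail_windows(log):
            print(f"instance {name}: fail from {start.time()} to {end.time()} ({secs:.3f} s)")
```

Run against the lines above, this reports a fail window of 19.130 s for instance 30001 and 22.071 s for instance 30002, so the two instances on the same server saw the outage begin and end at slightly different times.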