A few colleagues ran into this series of misunderstandings over the past few days, so here is a note for future reference.

1. The last lines of the INFO output
# Keyspace
db0:keys=645145,expires=585678,avg_ttl=15585373
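keys is the number of keys that currently exist, which is straightforward. expires is the number of currently existing keys that have an expiration time set; it is easily mistaken for the number of keys that have already expired.
The number of keys that have already expired is expired_keys in the Stats section: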
expired_keys:243043954
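
To make the difference concrete, here is a minimal sketch that parses the Keyspace line shown above. The function name `parse_keyspace_line` is made up for this note; it is not part of any Redis client.

```python
def parse_keyspace_line(line):
    """Parse one '# Keyspace' line of INFO output into (db_name, stats dict)."""
    db_name, _, fields = line.strip().partition(":")
    stats = {}
    for pair in fields.split(","):
        name, _, value = pair.partition("=")
        stats[name] = int(value)
    return db_name, stats


db, stats = parse_keyspace_line("db0:keys=645145,expires=585678,avg_ttl=15585373")

# 'expires' counts live keys that carry a TTL, not keys that were already removed.
keys_without_ttl = stats["keys"] - stats["expires"]
print(db, "live keys:", stats["keys"],
      "with TTL:", stats["expires"],
      "without TTL:", keys_without_ttl)
```

In other words, all 585678 keys counted by expires are still present in db0; the keys that are already gone are only reflected in the cumulative expired_keys counter.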

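2. slowlog, official documentation: https://redis.io/commands/slowlog
The first time you look at it, it is easy to be alarmed. Take the example from the official documentation: a get request appears to have taken 30 milliseconds, which would be far too slow.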
[code]
redis 127.0.0.1:6379> slowlog get 2
1) 1) (integer) 14
   2) (integer) 1309448221
   3) (integer) 15
   4) 1) "ping"
2) 1) (integer) 13
   2) (integer) 1309448128
   3) (integer) 30
   4) 1) "slowlog"
      2) "get"
      3) "100"
[/code]
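
According to the SLOWLOG documentation, each entry is made up of an id, a Unix timestamp, the execution time in microseconds, and the command arguments. The sketch below decodes the second entry of the example under that reading; `describe_slowlog_entry` is a hypothetical helper, and the entry is typed in by hand rather than fetched from a server.

```python
from datetime import datetime, timezone


def describe_slowlog_entry(entry):
    """Turn one raw SLOWLOG GET entry into a readable dict."""
    entry_id, unix_ts, duration_us, args = entry[:4]
    return {
        "id": entry_id,
        "logged_at": datetime.fromtimestamp(unix_ts, tz=timezone.utc),
        "duration_us": duration_us,          # the third field is in microseconds
        "duration_ms": duration_us / 1000,   # converted for comparison
        "command": " ".join(args),           # the fourth field is the argument list
    }


# Second entry from the example output above.
entry = [13, 1309448128, 30, ["slowlog", "get", "100"]]
info = describe_slowlog_entry(entry)
print(info["command"], "took", info["duration_us"], "microseconds,",
      "i.e.", info["duration_ms"], "ms")
```

Read this way, the entry is not a slow GET at all: it is the command `slowlog get 100`, and it took 30 microseconds.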


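At 05:28 on 2016-10-30 the Redis cluster hit a CLUSTERDOWN problem. It appears to have been caused by network jitter; the on-site information is recorded here for future reference.
Redis 3.0.7, 64-bit, cluster mode
Three servers (285/286/287), each running two instances (30001/30002)

Log of instance 30001 on 285 at the time (node ID 9d21b96013bbee9319a2387a243271c255b411dd)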
[code]
31672:S 30 Oct 05:28:11.809 # Cluster state changed: fail
31672:S 30 Oct 05:28:30.551 * FAIL message received from d4a1b5802d51faa245e1f7e2723f05521faa0c2c about 94bd2201144028727f5560b3e088b9224f08d5b3
31672:S 30 Oct 05:28:30.551 * FAIL message received from d4a1b5802d51faa245e1f7e2723f05521faa0c2c about 4a0d258e31dd4220fbe6d08b06ff2bb63e4cb3ed
31672:S 30 Oct 05:28:30.939 # Cluster state changed: ok
31672:S 30 Oct 05:28:31.941 * Clear FAIL state for node 94bd2201144028727f5560b3e088b9224f08d5b3: slave is reachable again.
31672:S 30 Oct 05:28:31.941 * Clear FAIL state for node 4a0d258e31dd4220fbe6d08b06ff2bb63e4cb3ed: slave is reachable again.
[/code]

Log of instance 30002 on 285 (node ID d4a1b5802d51faa245e1f7e2723f05521faa0c2c)
[code]
31722:M 30 Oct 05:28:13.388 # Cluster state changed: fail
31722:M 30 Oct 05:28:30.531 * Marking node 94bd2201144028727f5560b3e088b9224f08d5b3 as failing (quorum reached).
31722:M 30 Oct 05:28:30.531 * Marking node 4a0d258e31dd4220fbe6d08b06ff2bb63e4cb3ed as failing (quorum reached).
31722:M 30 Oct 05:28:30.551 * Clear FAIL state for node 4a0d258e31dd4220fbe6d08b06ff2bb63e4cb3ed: slave is reachable again.
31722:M 30 Oct 05:28:31.553 * Clear FAIL state for node 94bd2201144028727f5560b3e088b9224f08d5b3: slave is reachable again.
31722:M 30 Oct 05:28:35.459 # Cluster state changed: ok
[/code]
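
To see how long each instance considered the cluster down, the "Cluster state changed" lines can be paired up. This is a rough sketch that assumes the log format shown above; the helper names are invented here, and the year 2016 is taken from the date of the incident because the log lines themselves carry no year.

```python
from datetime import datetime

MARKER = "# Cluster state changed: "


def parse_state_change(line, year=2016):
    """Return (timestamp, new_state) for a 'Cluster state changed' line, else None."""
    head, found, new_state = line.partition(MARKER)
    if not found:
        return None
    # head looks like '31722:M 30 Oct 05:28:13.388 '; drop the 'pid:role' token.
    stamp = head.split(" ", 1)[1].strip()
    when = datetime.strptime(f"{year} {stamp}", "%Y %d %b %H:%M:%S.%f")
    return when, new_state.strip()


def fail_duration_seconds(lines):
    """Seconds between the first 'fail' and the following 'ok', or None."""
    failed_at = None
    for line in lines:
        parsed = parse_state_change(line)
        if parsed is None:
            continue
        when, state = parsed
        if state == "fail" and failed_at is None:
            failed_at = when
        elif state == "ok" and failed_at is not None:
            return (when - failed_at).total_seconds()
    return None


log_30002 = [
    "31722:M 30 Oct 05:28:13.388 # Cluster state changed: fail",
    "31722:M 30 Oct 05:28:30.531 * Marking node 94bd2201144028727f5560b3e088b9224f08d5b3 as failing (quorum reached).",
    "31722:M 30 Oct 05:28:35.459 # Cluster state changed: ok",
]
print("30002 saw the cluster down for", fail_duration_seconds(log_30002), "seconds")
```

For the 30002 log this gives roughly 22 seconds between the fail and ok transitions; the 30001 log can be fed through the same function.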


For a while we were plagued by SYN flooding warnings:
[code]
kernel: possible SYN flooding on port 80. Sending cookies.
[/code]

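I happened to come across Tengine's reuse_port directive and decided to give it a try. It seems to have solved the problem.
Postscript: after going through one evening traffic peak, I can confirm that the problem is solved.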

[code]
events {
    use epoll;
    reuse_port on;
    worker_connections 655350;
}
[/code]

Before enabling reuse_port on:
[code]
# ss -lnt
State Recv-Q Send-Q Local Address:Port Peer Address:Port
LISTEN 0 65535 *:80 *:*
[/code]

After enabling reuse_port on (the number of listening sockets on port 80 matches the number of workers):
[code]
# ss -lnt
State Recv-Q Send-Q Local Address:Port Peer Address:Port
LISTEN 0 65535 *:80 *:*
LISTEN 0 65535 *:80 *:*
LISTEN 0 65535 *:80 *:*
LISTEN 0 65535 *:80 *:*
LISTEN 0 65535 *:80 *:*
LISTEN 0 65535 *:80 *:*
LISTEN 0 65535 *:80 *:*
LISTEN 0 65535 *:80 *:*
[/code]
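
As far as I understand, reuse_port builds on the SO_REUSEPORT socket option, which lets several sockets listen on the same address and port, each with its own accept queue. The sketch below only illustrates that mechanism; it is not Tengine's code. The function name, the loopback address and the ephemeral port are all choices made for the example, and it needs a platform where Python exposes `socket.SO_REUSEPORT` (for example Linux 3.9 or later).

```python
import socket


def open_reuseport_listeners(count, host="127.0.0.1", port=0, backlog=128):
    """Open `count` listening sockets on the same address using SO_REUSEPORT."""
    listeners = []
    for _ in range(count):
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        # Every socket sharing the port has to set the option before bind().
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEPORT, 1)
        sock.bind((host, port))
        sock.listen(backlog)
        # With port=0 the kernel picks a free port for the first socket;
        # the remaining sockets then bind to that same port.
        port = sock.getsockname()[1]
        listeners.append(sock)
    return listeners


listeners = open_reuseport_listeners(4)
print(len(listeners), "listening sockets share port", listeners[0].getsockname()[1])
for sock in listeners:
    sock.close()
```

While those sockets are open, `ss -lnt` shows one LISTEN row per socket for the shared port, which is the same picture as the eight rows for port 80 above.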


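When analyzing a SYN flooding problem, you need to know how many connections are currently in the SYN_RECV state. Here is a summary of several ways to find out, for future reference.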

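1. ss -s: this command is the fastest and returns almost immediately, but synrecv always shows 0, so it cannot be used for this purpose. Apart from that, the rest of the information is complete.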
[code]
# ss -s
Total: 30234 (kernel 30462)
TCP: 115175 (estab 30148, closed 77237, orphaned 7771, synrecv 0, timewait 77237/0), ports 1139

Transport Total IP IPv6
* 30462 - -
RAW 0 0 0
UDP 1 1 0
TCP 37938 37938 0
INET 37939 37939 0
FRAG 0 0 0
[/code]
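
One possible way around the synrecv 0 limitation is to count states straight from /proc/net/tcp, where the fourth column holds the state as a hex code (03 is SYN_RECV, 0A is LISTEN). This is a sketch under that assumption; the sample rows are made up for illustration, and on a host with many sockets reading the real file will be much slower than ss -s.

```python
from collections import Counter

# State codes as used in the 'st' column of /proc/net/tcp.
TCP_STATES = {
    "01": "ESTABLISHED", "02": "SYN_SENT", "03": "SYN_RECV",
    "04": "FIN_WAIT1", "05": "FIN_WAIT2", "06": "TIME_WAIT",
    "07": "CLOSE", "08": "CLOSE_WAIT", "09": "LAST_ACK",
    "0A": "LISTEN", "0B": "CLOSING",
}


def count_tcp_states(proc_net_tcp_text):
    """Count sockets per TCP state from text in /proc/net/tcp format."""
    counts = Counter()
    for line in proc_net_tcp_text.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 4:
            continue
        counts[TCP_STATES.get(fields[3].upper(), fields[3])] += 1
    return counts


# Made-up sample rows, not taken from a real server.
SAMPLE = """\
  sl  local_address rem_address   st tx_queue rx_queue tr tm->when retrnsmt   uid  timeout inode
   0: 00000000:0050 00000000:0000 0A 00000000:00000000 00:00000000 00000000     0        0 1001 1
   1: 0100007F:0050 0100007F:C001 03 00000000:00000000 01:00000010 00000000     0        0 0 1
   2: 0100007F:0050 0100007F:C002 03 00000000:00000000 01:00000010 00000000     0        0 0 1
   3: 0100007F:0050 0100007F:C003 01 00000000:00000000 00:00000000 00000000     0        0 1002 1
"""
print("SYN_RECV:", count_tcp_states(SAMPLE)["SYN_RECV"])
```

On a real machine the same function can be fed the contents of /proc/net/tcp (and /proc/net/tcp6 for IPv6 sockets).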


This message showed up in /var/log/messages on the nginx server:
[code]
kernel: possible SYN flooding on port 80. Sending cookies.
[/code]

Searching the ip-sysctl documentation (https://www.kernel.org/doc/Documentation/networking/ip-sysctl.txt) for SYN cookies
turned up this entry:
[code]
tcp_syncookies - BOOLEAN
Only valid when the kernel was compiled with CONFIG_SYN_COOKIES
Send out syncookies when the syn backlog queue of a socket
overflows. This is to prevent against the common 'SYN flood attack'
Default: 1

Note, that syncookies is fallback facility.
It MUST NOT be used to help highly loaded servers to stand
against legal connection rate. If you see SYN flood warnings
in your logs, but investigation shows that they occur
because of overload with legal connections, you should tune
another parameters until this warning disappear.
See: tcp_max_syn_backlog, tcp_synack_retries, tcp_abort_on_overflow.

syncookies seriously violate TCP protocol, do not allow
to use TCP extensions, can result in serious degradation
of some services (f.e. SMTP relaying), visible not by you,
but your clients and relays, contacting you. While you see
SYN flood warnings in logs not being really flooded, your server
is seriously misconfigured.

If you want to test which effects syncookies have to your
network connections you can set this knob to 2 to enable
unconditionally generation of syncookies.
[/code]
The text mentions three other settings: tcp_max_syn_backlog, tcp_synack_retries, tcp_abort_on_overflow
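
Before tuning anything it helps to see what these settings are currently set to. A minimal sketch, assuming they are exposed under /proc/sys/net/ipv4 as usual on Linux; the function name and the `base` parameter are made up for this example, and a setting that cannot be read is reported as None.

```python
from pathlib import Path

PARAMS = (
    "tcp_syncookies",
    "tcp_max_syn_backlog",
    "tcp_synack_retries",
    "tcp_abort_on_overflow",
)


def read_ipv4_sysctls(names=PARAMS, base="/proc/sys/net/ipv4"):
    """Read integer sysctl values from files under `base`."""
    values = {}
    for name in names:
        try:
            values[name] = int((Path(base) / name).read_text().split()[0])
        except (OSError, ValueError, IndexError):
            values[name] = None  # missing, unreadable or not a plain integer
    return values


for name, value in read_ipv4_sysctls().items():
    print(f"net.ipv4.{name} = {value}")
```

This only reads the values; changing them is still a job for sysctl or /etc/sysctl.conf.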
