It started with an exception-log alert from one of our production Java services:
[code]
java.lang.OutOfMemoryError: unable to create new native thread
at java.lang.Thread.start0(Native Method) [na:1.7.0_79]
at java.lang.Thread.start(Thread.java:714) [na:1.7.0_79]
at java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:949) [na:1.7.0_79]
at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1369) [na:1.7.0_79]
at com.squareup.okhttp.ConnectionPool.put(ConnectionPool.java:189) ~[okhttp-2.7.1.jar!/:na]
at com.squareup.okhttp.OkHttpClient$1.put(OkHttpClient.java:89) ~[okhttp-2.7.1.jar!/:na]
at com.squareup.okhttp.internal.http.StreamAllocation.findConnection(StreamAllocation.java:179) ~[okhttp-2.7.1.jar!/:na]
at com.squareup.okhttp.internal.http.StreamAllocation.findHealthyConnection(StreamAllocation.java:126) ~[okhttp-2.7.1.jar!/:na]
at com.squareup.okhttp.internal.http.StreamAllocation.newStream(StreamAllocation.java:95) ~[okhttp-2.7.1.jar!/:na]
at com.squareup.okhttp.internal.http.HttpEngine.connect(HttpEngine.java:283) ~[okhttp-2.7.1.jar!/:na]
at com.squareup.okhttp.internal.http.HttpEngine.sendRequest(HttpEngine.java:224) ~[okhttp-2.7.1.jar!/:na]
at com.squareup.okhttp.Call.getResponse(Call.java:286) ~[okhttp-2.7.1.jar!/:na]
at com.squareup.okhttp.Call$ApplicationInterceptorChain.proceed(Call.java:243) ~[okhttp-2.7.1.jar!/:na]
at com.squareup.okhttp.Call.getResponseWithInterceptorChain(Call.java:205) ~[okhttp-2.7.1.jar!/:na]
at com.squareup.okhttp.Call.execute(Call.java:80) ~[okhttp-2.7.1.jar!/:na]
at com.qiniu.http.Client.send(Client.java:195) ~[qiniu-java-sdk-7.0.10.jar!/:na]
at com.qiniu.http.Client.post(Client.java:132) ~[qiniu-java-sdk-7.0.10.jar!/:na]
at com.qiniu.http.Client.post(Client.java:115) ~[qiniu-java-sdk-7.0.10.jar!/:na]
at com.qiniu.storage.BucketManager.post(BucketManager.java:319) ~[qiniu-java-sdk-7.0.10.jar!/:na]
at com.qiniu.storage.BucketManager.ioPost(BucketManager.java:309) ~[qiniu-java-sdk-7.0.10.jar!/:na]
at com.qiniu.storage.BucketManager.fetch(BucketManager.java:263) ~[qiniu-java-sdk-7.0.10.jar!/:na]
[/code]
After logging in to the server, the shell itself immediately reported an error: cannot allocate memory.
[code]
[root@*****85 logs]# less stdout.log
-bash: fork: Cannot allocate memory
[/code]
It really looked like the machine had run out of memory. But free -m showed more than 1 GB still free.
[code]
[root@*****85 logs]# free -m
             total       used       free     shared    buffers     cached
Mem:         31996      30343       1652          3         47      16751
-/+ buffers/cache:      13544      18451
Swap:        20479         10      20469
[/code]
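
Because the error complains about creating a native thread rather than about heap space, one quick thing to look at is how many threads the JVM already holds. The following is only a minimal sketch using the standard `ThreadMXBean`; the class name `ThreadCountProbe` is made up for illustration, and it would have to run inside the affected process (or be exposed over JMX) to be useful.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

class ThreadCountProbe {
    /** Returns {live, peak, totalStarted} thread counts for the current JVM. */
    static long[] snapshot() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        return new long[] {
            mx.getThreadCount(),             // threads alive right now
            mx.getPeakThreadCount(),         // highest live count since JVM start (or last reset)
            mx.getTotalStartedThreadCount()  // every thread ever started
        };
    }

    public static void main(String[] args) {
        long[] s = snapshot();
        System.out.println("live=" + s[0] + " peak=" + s[1] + " started=" + s[2]);
    }
}
```

A live count that keeps climbing between samples would point toward a thread leak rather than a real shortage of memory.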


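Today we ran into an HBase overload problem. It has been dealt with; these are my notes on the process.

Symptoms:
nginx: CPU load per worker was between 10% and 20%. Normally it is around 5%.
Main service: CPU fluctuated between 5% and 200%. Normally it is around 200%. dmesg and messages contained "possible SYN flooding on port xxx. Sending cookies."
HBase: one regionserver was saturated at 2400% CPU (24 cores), with wa at 0% and sys at 3%; the other nodes showed normal load. Restarting that node brought it back to normal, but another node would then end up in the same state.

Locating the problem:
My first suspicion was GC. We had previously seen HBase run short of memory and GC continuously, with very high CPU and no I/O load. I found the GC log (the same thing can be watched in real time with jstat):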
[code]
2016-08-01T09:31:44.167+0800: 23811.563: [GC2016-08-01T09:31:44.167+0800: 23811.563: [ParNew: 442182K->22966K(471872K), 0.0379880 secs] 7170427K->6752954K(25113408K), 0.0381800 secs] [Times: user=0.65 sys=0.00, real=0.03 secs]
2016-08-01T09:31:55.191+0800: 23822.587: [GC2016-08-01T09:31:55.191+0800: 23822.587: [ParNew: 442422K->24807K(471872K), 0.0374700 secs] 7172410K->6757782K(25113408K), 0.0376640 secs] [Times: user=0.65 sys=0.00, real=0.03 secs]
2016-08-01T09:32:06.618+0800: 23834.014: [GC2016-08-01T09:32:06.618+0800: 23834.014: [ParNew: 444263K->25807K(471872K), 0.0333280 secs] 7177238K->6759917K(25113408K), 0.0335430 secs] [Times: user=0.57 sys=0.00, real=0.04 secs]
Heap
par new generation total 471872K, used 381454K [0x00000001f8000000, 0x0000000218000000, 0x0000000218000000)
eden space 419456K, 84% used [0x00000001f8000000, 0x000000020db4fe68, 0x00000002119a0000)
from space 52416K, 49% used [0x0000000214cd0000, 0x0000000216603d68, 0x0000000218000000)
to space 52416K, 0% used [0x00000002119a0000, 0x00000002119a0000, 0x0000000214cd0000)
concurrent mark-sweep generation total 24641536K, used 6734110K [0x0000000218000000, 0x00000007f8000000, 0x00000007f8000000)
concurrent-mark-sweep perm gen total 131072K, used 46044K [0x00000007f8000000, 0x0000000800000000, 0x0000000800000000)
[/code]
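The GC log showed that this was not the problem, and what happened after restarting the regionserver node further confirmed that GC was not the cause.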
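
To make that reading of the log concrete, here is a small sketch that estimates what fraction of wall-clock time the JVM spent paused in the ParNew collections quoted above. The class name `GcOverhead` and the regular expression are my own illustration and only target this particular log format.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GcOverhead {
    // Captures the JVM uptime before "[GC" and the total pause before "[Times".
    private static final Pattern PAUSE = Pattern.compile(
        "(\\d+\\.\\d+): \\[GC.*, (\\d+\\.\\d+) secs\\] \\[Times");

    // The three ParNew lines from the log excerpt above.
    static final String[] SAMPLE = {
        "2016-08-01T09:31:44.167+0800: 23811.563: [GC2016-08-01T09:31:44.167+0800: 23811.563: [ParNew: 442182K->22966K(471872K), 0.0379880 secs] 7170427K->6752954K(25113408K), 0.0381800 secs] [Times: user=0.65 sys=0.00, real=0.03 secs]",
        "2016-08-01T09:31:55.191+0800: 23822.587: [GC2016-08-01T09:31:55.191+0800: 23822.587: [ParNew: 442422K->24807K(471872K), 0.0374700 secs] 7172410K->6757782K(25113408K), 0.0376640 secs] [Times: user=0.65 sys=0.00, real=0.03 secs]",
        "2016-08-01T09:32:06.618+0800: 23834.014: [GC2016-08-01T09:32:06.618+0800: 23834.014: [ParNew: 444263K->25807K(471872K), 0.0333280 secs] 7177238K->6759917K(25113408K), 0.0335430 secs] [Times: user=0.57 sys=0.00, real=0.04 secs]"
    };

    /** Pause time divided by elapsed uptime between the first and last matched collection. */
    static double pauseFraction(String[] lines) {
        double first = Double.NaN, last = Double.NaN, paused = 0.0;
        for (String line : lines) {
            Matcher m = PAUSE.matcher(line);
            if (!m.find()) continue;              // skip heap summary and other lines
            double uptime = Double.parseDouble(m.group(1));
            if (Double.isNaN(first)) {
                first = uptime;                   // window starts here; this pause is not counted
            } else {
                paused += Double.parseDouble(m.group(2));
            }
            last = uptime;
        }
        if (Double.isNaN(first) || last <= first) return Double.NaN;
        return paused / (last - first);
    }

    public static void main(String[] args) {
        System.out.printf("GC pause fraction: %.4f%n", pauseFraction(SAMPLE));
    }
}
```

For these three lines the pauses add up to well under 1% of the elapsed time, which fits the conclusion that a regionserver pinned at 2400% CPU was not busy collecting garbage.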


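Today a colleague reported that Spring's @Async annotation had no effect: the call still ran synchronously.
The calling code looked roughly like this: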
[code]
public void TestAsync(){
    ...
    this.testPrivate();
    ...
}

@Async
private void testPrivate(){
    ... // long-running code
}
[/code]
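
In its default mode Spring applies @Async through a proxy that wraps the bean, so a call made with `this.` from inside the same object never passes through that proxy (and a private method cannot be proxied in any case). The sketch below illustrates the mechanism with a plain JDK dynamic proxy. It is not Spring's code, and `Service`, `ServiceImpl` and `intercepted` are hypothetical names chosen for the demo.

```java
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.List;

class SelfInvocationDemo {
    interface Service {
        void outer();
        void inner();
    }

    static class ServiceImpl implements Service {
        public void outer() {
            this.inner();   // direct call on the target object; the proxy never sees it
        }
        public void inner() {
            // imagine long-running work here
        }
    }

    /** Returns the method names the proxy intercepted for one call from outside. */
    static List<String> intercepted(boolean callInnerThroughProxy) {
        final List<String> seen = new ArrayList<>();
        final Service target = new ServiceImpl();
        Service proxy = (Service) Proxy.newProxyInstance(
            Service.class.getClassLoader(),
            new Class<?>[] { Service.class },
            (p, method, args) -> {
                seen.add(method.getName());   // stand-in for "hand the call to an executor"
                return method.invoke(target, args);
            });
        if (callInnerThroughProxy) {
            proxy.inner();   // intercepted
        } else {
            proxy.outer();   // only outer() is intercepted; the nested this.inner() is not
        }
        return seen;
    }

    public static void main(String[] args) {
        System.out.println("via outer(): " + intercepted(false));
        System.out.println("via proxy.inner(): " + intercepted(true));
    }
}
```

Under that assumption, the usual way out is to make the asynchronous method public and call it on a bean reference obtained from the container, typically by moving it to a separate bean.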


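I came across this by chance in an article. The answer was the opposite of my first instinct, so I am writing it down.

On an American game show there are three doors. Behind one is a car; behind each of the other two is a goat. After the contestant picks a door, the host opens one of the remaining two, and the door he opens always has a goat behind it. The host then asks the contestant whether they want to switch doors.

My first reaction: one answer has been revealed, so of the two doors left one hides a goat and one hides the car, and the odds are fifty-fifty.

The actual answer: the chance of getting the car without switching is 1/3, and after switching it is 2/3.

Thinking about it afterwards, it is fairly easy to understand: the two doors you did not pick together hold a 2/3 chance, and by revealing a wrong door among them the host concentrates that whole 2/3 on the one that remains, doubling its odds.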
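
For anyone who still doubts the 1/3 versus 2/3 split, a quick simulation is an easy check. This is a minimal sketch; the class name `MontyHall`, the trial count and the seed are arbitrary choices of mine.

```java
import java.util.Random;

class MontyHall {
    /** Fraction of games won when the contestant always switches or always stays. */
    static double winRate(boolean switchDoor, int trials, long seed) {
        Random rnd = new Random(seed);
        int wins = 0;
        for (int i = 0; i < trials; i++) {
            int car = rnd.nextInt(3);    // door hiding the car
            int pick = rnd.nextInt(3);   // contestant's first choice
            // The host opens a door that is neither the pick nor the car.
            int opened;
            do {
                opened = rnd.nextInt(3);
            } while (opened == pick || opened == car);
            // Doors are numbered 0, 1, 2 (sum 3), so the third door is 3 - pick - opened.
            int finalPick = switchDoor ? 3 - pick - opened : pick;
            if (finalPick == car) wins++;
        }
        return (double) wins / trials;
    }

    public static void main(String[] args) {
        System.out.printf("stay:   %.3f%n", winRate(false, 200_000, 1L));
        System.out.printf("switch: %.3f%n", winRate(true, 200_000, 1L));
    }
}
```

With a few hundred thousand trials the two rates should settle close to 1/3 and 2/3.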

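A high-concurrency server suddenly started performing badly. After several detours, one of our experts found that some kernel parameters had been reset automatically when iptables was adjusted.
For example, /proc/sys/net/netfilter/nf_conntrack_max had been reset to 65535.

Solution:
Set IPTABLES_MODULES_UNLOAD to no in /etc/sysconfig/iptables-config: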
[code]
# Unload modules on restart and stop
# Value: yes|no, default: yes
# This option has to be 'yes' to get to a sane state for a firewall
# restart or stop. Only set to 'no' if there are problems unloading netfilter
# modules.
IPTABLES_MODULES_UNLOAD="no"
[/code]

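Adjusting iptables is routine, and it had never occurred to me that doing so could reset netfilter module parameters. I suspect plenty of newcomers are in the same boat.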

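I had previously sorted out the host-binding problem for HBase. Today, when using it with Kafka, I found there was still a problem, so I switched to a more thorough implementation that can directly replace the earlier approach.
How the problem showed up this time: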
[code]
Caused by: java.net.UnknownHostException: hostxxx: hostxxx
at java.net.InetAddress.getLocalHost(InetAddress.java:1496) ~[na:1.7.0_101]
at kafka.consumer.ZookeeperConsumerConnector.<init>(ZookeeperConsumerConnector.scala:119) ~[kafka_2.10-0.8.2.2.jar!/:na]
at kafka.javaapi.consumer.ZookeeperConsumerConnector.<init>(ZookeeperConsumerConnector.scala:66) ~[kafka_2.10-0.8.2.2.jar!/:na]
at kafka.javaapi.consumer.ZookeeperConsumerConnector.<init>(ZookeeperConsumerConnector.scala:69) ~[kafka_2.10-0.8.2.2.jar!/:na]
at kafka.consumer.Consumer$.createJavaConsumerConnector(ConsumerConnector.scala:105) ~[kafka_2.10-0.8.2.2.jar!/:na]
at kafka.consumer.Consumer.createJavaConsumerConnector(ConsumerConnector.scala) ~[kafka_2.10-0.8.2.2.jar!/:na]
... 53 common frames omitted
Caused by: java.net.UnknownHostException: hostxxx
at java.net.Inet4AddressImpl.lookupAllHostAddr(Native Method) ~[na:1.7.0_101]
at java.net.InetAddress$1.lookupAllHostAddr(InetAddress.java:922) ~[na:1.7.0_101]
at java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1316) ~[na:1.7.0_101]
at java.net.InetAddress.getLocalHost(InetAddress.java:1492) ~[na:1.7.0_101]
... 69 common frames omitted
[/code]
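
The trace shows the failure coming from `InetAddress.getLocalHost()`, that is, the machine's own hostname cannot be resolved. A quick way to check whether a given box is affected is a sketch like the one below; the class name `LocalHostCheck` is hypothetical, and this only reproduces the symptom, it is not the fix referred to above.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

class LocalHostCheck {
    /** Returns "hostname -> address", or null if the local hostname cannot be resolved. */
    static String resolveLocalHost() {
        try {
            InetAddress addr = InetAddress.getLocalHost();
            return addr.getHostName() + " -> " + addr.getHostAddress();
        } catch (UnknownHostException e) {
            return null;   // the same exception type Kafka's consumer ran into
        }
    }

    public static void main(String[] args) {
        String result = resolveLocalHost();
        System.out.println(result != null ? result : "local hostname does not resolve");
    }
}
```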
