February 2016

We noticed cookie-parsing exceptions on the production servers. Since we don't use cookies at all (we manage sessions ourselves and expose a stateless HTTP API), we decided to disable them. The exception:
[code]
2016-02-21 13:37:21.248 ERROR 7114 [http-nio-xxxx-exec-121] --- o.a.coyote.http11.Http11NioProcessor : Error processing request
java.lang.IllegalArgumentException: Control character in cookie value or attribute.
at org.apache.tomcat.util.http.CookieSupport.isHttpSeparator(CookieSupport.java:193)
at org.apache.tomcat.util.http.Cookies.getTokenEndPosition(Cookies.java:502)
at org.apache.tomcat.util.http.Cookies.processCookieHeader(Cookies.java:349)
at org.apache.tomcat.util.http.Cookies.processCookies(Cookies.java:168)
at org.apache.tomcat.util.http.Cookies.getCookieCount(Cookies.java:106)
at org.apache.catalina.connector.CoyoteAdapter.parseSessionCookiesId(CoyoteAdapter.java:1010)
at org.apache.catalina.connector.CoyoteAdapter.postParseRequest(CoyoteAdapter.java:764)
at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:416)
at org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1070)
at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:611)
at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.doRun(NioEndpoint.java:1736)
at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.run(NioEndpoint.java:1695)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
at java.lang.Thread.run(Thread.java:745)
[/code]

Disabling cookies in a standalone Tomcat is simple: edit the Context element in conf/context.xml:
[code]
<Context cookies="false">...</Context>
[/code]

We run Tomcat embedded in Spring Boot, though, which exposes only a limited set of Tomcat options, and of course disabling cookies is not among them:
[code]
server.tomcat.accesslog.directory=logs # Directory in which log files are created. Can be relative to the tomcat base dir or absolute.
server.tomcat.accesslog.enabled=false # Enable access log.
server.tomcat.accesslog.pattern=common # Format pattern for access logs.
server.tomcat.accesslog.prefix=access_log # Log file name prefix.
server.tomcat.accesslog.suffix=.log # Log file name suffix.
server.tomcat.background-processor-delay=30 # Delay in seconds between the invocation of backgroundProcess methods.
server.tomcat.basedir= # Tomcat base directory. If not specified a temporary directory will be used.
server.tomcat.internal-proxies=10\\.\\d{1,3}\\.\\d{1,3}\\.\\d{1,3}|\\
192\\.168\\.\\d{1,3}\\.\\d{1,3}|\\
169\\.254\\.\\d{1,3}\\.\\d{1,3}|\\
127\\.\\d{1,3}\\.\\d{1,3}\\.\\d{1,3}|\\
172\\.1[6-9]{1}\\.\\d{1,3}\\.\\d{1,3}|\\
172\\.2[0-9]{1}\\.\\d{1,3}\\.\\d{1,3}|\\
172\\.3[0-1]{1}\\.\\d{1,3}\\.\\d{1,3} # regular expression matching trusted IP addresses.
server.tomcat.max-http-header-size=0 # Maximum size in bytes of the HTTP message header.
server.tomcat.max-threads=0 # Maximum amount of worker threads.
server.tomcat.port-header=X-Forwarded-Port # Name of the HTTP header used to override the original port value.
server.tomcat.protocol-header= # Header that holds the incoming protocol, usually named "X-Forwarded-Proto".
server.tomcat.protocol-header-https-value=https # Value of the protocol header that indicates that the incoming request uses SSL.
server.tomcat.remote-ip-header= # Name of the http header from which the remote ip is extracted. For instance `X-FORWARDED-FOR`
server.tomcat.uri-encoding=UTF-8 # Character encoding to use to decode the URI.
[/code]
The full list of configurable properties: http://docs.spring.io/spring-boot/docs/current/reference/htmlsingle/#appendix
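Since the properties don't expose a cookie switch, one workaround (a sketch, not from the original setup) is to customize the embedded Tomcat context programmatically. This assumes the Spring Boot 1.x embedded-container API (`TomcatEmbeddedServletContainerFactory` / `TomcatContextCustomizer`) and Tomcat's `Context.setCookies`:

```java
import org.apache.catalina.Context;
import org.springframework.boot.context.embedded.EmbeddedServletContainerFactory;
import org.springframework.boot.context.embedded.tomcat.TomcatContextCustomizer;
import org.springframework.boot.context.embedded.tomcat.TomcatEmbeddedServletContainerFactory;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class TomcatCookieConfig {

    // Programmatic equivalent of <Context cookies="false"> in conf/context.xml:
    // tells the embedded Tomcat not to use cookies for session handling.
    @Bean
    public EmbeddedServletContainerFactory servletContainer() {
        TomcatEmbeddedServletContainerFactory factory =
                new TomcatEmbeddedServletContainerFactory();
        factory.addContextCustomizers(new TomcatContextCustomizer() {
            @Override
            public void customize(Context context) {
                context.setCookies(false);
            }
        });
        return factory;
    }
}
```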

- Read more -

[code]
[... hbase-1.1.3]# bin/hbase org.apache.hadoop.hbase.PerformanceEvaluation sequentialWrite 1
...
File System Counters
FILE: Number of bytes read=321913437
FILE: Number of bytes written=327363226
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
Map-Reduce Framework
Map input records=10
Map output records=10
Map output bytes=160
Map output materialized bytes=240
Input split bytes=1470
Combine input records=0
Combine output records=0
Reduce input groups=10
Reduce shuffle bytes=240
Reduce input records=10
Reduce output records=10
Spilled Records=20
Shuffled Maps =10
Failed Shuffles=0
Merged Map outputs=10
GC time elapsed (ms)=1326
CPU time spent (ms)=0
Physical memory (bytes) snapshot=0
Virtual memory (bytes) snapshot=0
Total committed heap usage (bytes)=3442950144
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
HBase Performance Evaluation
Elapsed time in milliseconds=13909
Row count=1048570
File Input Format Counters
Bytes Read=34515
File Output Format Counters
Bytes Written=126
[/code]

- Read more -

The short-video crawling service used to filter duplicates by exact match on the file hash.
A new situation has come up: the same video, stamped with different watermarks by different sites, gets treated as several distinct videos.
What needs doing: recognize these near-duplicate videos and dedupe them.

Overall approach:
Take a key frame of the video (e.g. the first frame), pick a key region (e.g. the middle row of a 3×3 grid), sample some key points from it (e.g. 1024, evenly spaced), and digitize them (e.g. map each point's 256-level gray value down to one of 64 levels, one base-64 character per point), storing the result as a comparable fingerprint.
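The digitization step above can be sketched as follows. A minimal illustration only: class and method names are mine, and extracting the actual pixel samples from a video frame is out of scope here.

```java
// Sketch of the fingerprint scheme: quantize sampled gray values
// (0-255) into 64 levels and encode each level as one character.
public class FrameFingerprint {
    // 64-symbol alphabet: one character per quantized level.
    static final String ALPHABET =
        "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz-_";

    // Map a 0-255 gray value to one of 64 levels (256 levels / 4).
    static char quantize(int gray) {
        return ALPHABET.charAt(gray / 4);
    }

    // Build a fingerprint string from the sampled gray values
    // (e.g. 1024 evenly spaced points from the frame's key region).
    static String fingerprint(int[] grays) {
        StringBuilder sb = new StringBuilder(grays.length);
        for (int g : grays) sb.append(quantize(g));
        return sb.toString();
    }

    public static void main(String[] args) {
        int[] sample = {0, 3, 4, 255};           // quantizes to levels 0, 0, 1, 63
        System.out.println(fingerprint(sample)); // prints "001_"
    }
}
```

Equal fingerprints can then be matched with an exact database lookup, which is precisely the approach the next paragraphs show to be fragile.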

One caveat:
A film's first frame is usually the studio's fixed title card; for short videos this is rare.

The first approach was exact matching, so fingerprints could be looked up directly in the database.
Practice proved it unreliable. Transcoding shifts the gray values, and however you cut the buckets there are always boundary-crossing problems; even reducing the frame to pure black and white isn't consistent. Dead end.

A brief description of the boundary-crossing problem:
Watermarking a video involves lossy re-encoding, so gray values have a chance of drifting by a small amount. Comparing two watermarked videos means comparing two independently drifted signals, so the drifts stack; some videos have been watermarked several times, stacking several rounds of drift.

Borrowing the grid-boundary workaround from GPS geohash, at 64 levels (256 grays / 4) we generated three fingerprints per video (truncate, carry, and round) and cross-compared them to absorb boundary crossings.
Practice proved this unreliable too: the drift occasionally exceeds one bucket, and the extreme black-and-white reduction doesn't work either.
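A minimal sketch of that three-fingerprint cross-check, shown per sample point; the truncate/carry/round formulas are my reading of the scheme, not the original code.

```java
// Sketch of the geohash-style three-quantization cross-check:
// two points match if any of the three quantizations of one
// equals any of the three quantizations of the other.
public class TripleFingerprint {
    static final int BUCKET = 4; // 256 gray levels / 4 = 64 buckets

    static int floorLevel(int g) { return g / BUCKET; }                              // truncate remainder
    static int carryLevel(int g) { return Math.min(63, (g + BUCKET - 1) / BUCKET); } // carry up
    static int roundLevel(int g) { return Math.min(63, (g + BUCKET / 2) / BUCKET); } // round half up

    static boolean pointMatches(int a, int b) {
        int[] qa = {floorLevel(a), carryLevel(a), roundLevel(a)};
        int[] qb = {floorLevel(b), carryLevel(b), roundLevel(b)};
        for (int x : qa) for (int y : qb) if (x == y) return true;
        return false;
    }

    public static void main(String[] args) {
        // 131 and 132 straddle a bucket boundary (floor levels 32 vs 33),
        // but the cross-check still matches them via carry/round.
        System.out.println(pointMatches(131, 132)); // true
        // A drift larger than one bucket still fails, which is why
        // the scheme proved unreliable in practice.
        System.out.println(pointMatches(100, 140)); // false
    }
}
```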

The conclusion, basically: any form of exact matching is not reliable.

- Read more -