2016-01-09 2 views
0

Following up on my earlier question "All thread are blocked,like STW,and no responding", I added some JAVA_OPTS. The GC log now confuses me.

Here are my new options.

 
-server -Xms32g -Xmx32g \ 
-XX:NewSize=10g -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly \ 
-XX:+CMSParallelRemarkEnabled -XX:+PrintSafepointStatistics -XX:PrintSafepointStatisticsCount=1 \ 
-XX:+PrintGCDateStamps -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCApplicationStoppedTime -XX:+PrintGCApplicationConcurrentTime \ 
-XX:+PrintGCCause -Xloggc:/var/log/flume-ng/gc.log -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=10 -XX:GCLogFileSize=10M 
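
As a rough check on the sizing flags above, the sketch below works out how -XX:NewSize=10g would split into eden and two survivor spaces. It assumes the default -XX:SurvivorRatio=8, which is not set explicitly in these options; the class and method names are made up for illustration.

```java
final class YoungGenLayout {
    /** Sizes in KB: {eden, one survivor space, usable young capacity (eden + one survivor)}. */
    static long[] layoutKb(long newSizeKb, int survivorRatio) {
        // SurvivorRatio = eden / one survivor space, and young = eden + 2 survivor spaces.
        long survivorKb = newSizeKb / (survivorRatio + 2);
        long edenKb = newSizeKb - 2 * survivorKb;
        return new long[] {edenKb, survivorKb, edenKb + survivorKb};
    }

    public static void main(String[] args) {
        long newSizeKb = 10L * 1024 * 1024;      // -XX:NewSize=10g
        long[] layout = layoutKb(newSizeKb, 8);  // assumption: default SurvivorRatio
        System.out.println("eden=" + layout[0] + "K survivor=" + layout[1]
                + "K usable=" + layout[2] + "K");
    }
}
```

Under that assumption the usable capacity comes out at 9437184K, which matches the ParNew capacity printed in the GC log lines quoted in this question, and one survivor space is 1048576K.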

The application stopped working a short while ago. Here are the last lines of the GC log (the last word in the log was "stopped:").

 
2016-01-09T18:37:14.193+0800: 18879.031: Application time: 0.0071232 seconds 
2016-01-09T18:37:14.193+0800: 18879.032: Total time for which application threads were stopped: 0.0003769 seconds, Stopping threads took: 0.0000800 seconds 
2016-01-09T18:37:21.704+0800: 18886.542: Application time: 7.5105681 seconds 
2016-01-09T18:37:21.705+0800: 18886.543: [GC (Allocation Failure) 2016-01-09T18:37:21.705+0800: 18886.543: [ParNew: 8412583K->36040K(9437184K), 0.0159936 secs] 21350038K->12973495K(32505856K), 0.0162151 secs] [Times: user=0.34 sys=0.00, real=0.02 secs] 
2016-01-09T18:37:21.721+0800: 18886.559: Total time for which application threads were stopped: 0.0172971 seconds, Stopping threads took: 0.0001286 seconds 
2016-01-09T18:37:37.830+0800: 18902.668: Application time: 16.1087180 seconds 
2016-01-09T18:42:43.301+0800: 19208.140: Total time for which application threads were stopped: 

Then I ran the jstack -m command, and the application returned to normal.

Here is the next part of the GC log (the 305.4698351 seconds in the first line is the time at which I ran the jstack command minus the time at which the application stopped working). After that, the application stopped working again.

 
2016-01-09T18:42:43.301+0800: 19208.140: Total time for which application threads were stopped: 305.4715152 seconds, Stopping threads took: 305.4698351 seconds 

(some Application time lines skipped)


2016-01-09T18:42:44.340+0800: 19209.179: [GC (Allocation Failure) 2016-01-09T18:42:44.340+0800: 19209.179: [ParNew: 8424650K->127192K(9437184K), 0.0331325 secs] 21362105K->13064648K(32505856K), 0.0333109 secs] [Times: user=0.72 sys=0.00, real=0.03 secs] 
2016-01-09T18:42:44.374+0800: 19209.212: Total time for which application threads were stopped: 0.0338915 seconds, Stopping threads took: 0.0000998 seconds 
2016-01-09T18:42:44.374+0800: 19209.212: Application time: 0.0001254 seconds 
2016-01-09T18:42:44.375+0800: 19209.213: [GC (GCLocker Initiated GC) 2016-01-09T18:42:44.375+0800: 19209.213: [ParNew: 133268K->152503K(9437184K), 0.0335528 secs] 13070724K->13089959K(32505856K), 0.0336755 secs] [Times: user=0.69 sys=0.00, real=0.04 secs] 
2016-01-09T18:42:44.408+0800: 19209.247: Total time for which application threads were stopped: 0.0345724 seconds, Stopping threads took: 0.0001147 seconds 
2016-01-09T18:42:46.920+0800: 19211.759: Application time: 0.8197406 seconds 
2016-01-09T18:42:46.921+0800: 19211.760: [GC (Allocation Failure) 2016-01-09T18:42:46.921+0800: 19211.760: [ParNew: 8541111K->441557K(9437184K), 0.0993869 secs] 21478567K->13379018K(32505856K), 0.0995702 secs] [Times: user=2.25 sys=0.00, real=0.10 secs] 
2016-01-09T18:42:49.571+0800: 19214.409: Application time: 0.6178457 seconds 
2016-01-09T18:42:49.572+0800: 19214.410: [GC (Allocation Failure) 2016-01-09T18:42:49.572+0800: 19214.410: [ParNew: 8830165K->812963K(9437184K), 0.1568946 secs] 21767626K->13750433K(32505856K), 0.1571136 secs] [Times: user=3.58 sys=0.00, real=0.16 secs] 
2016-01-09T18:42:49.729+0800: 19214.568: Total time for which application threads were stopped: 0.1581075 seconds, Stopping threads took: 0.00seconds 

2016-01-09T18:42:52.286+0800: 19217.125: [GC (GCLocker Initiated GC) 2016-01-09T18:42:52.286+0800: 19217.125: [ParNew: 9201571K->973385K(9437184K), 0.3264006 secs] 22139041K->14095607K(32505856K), 0.3265784 secs] [Times: user=4.46 sys=0.00, real=0.33 secs] 

2016-01-09T18:42:55.279+0800: 19220.117: [GC (Allocation Failure) 2016-01-09T18:42:55.279+0800: 19220.117: [ParNew: 9361993K->1048576K(9437184K), 0.3140569 secs] 22484215K->14506810K(32505856K), 0.3142791 secs] [Times: user=5.36 sys=0.00, real=0.31 secs] 

2016-01-09T18:42:58.222+0800: 19223.061: Application time: 0.0000301 seconds 
2016-01-09T18:42:58.223+0800: 19223.061: [GC (GCLocker Initiated GC) 2016-01-09T18:42:58.223+0800: 19223.061: [ParNew: 9437184K->1048576K(9437184K), 0.3384320 secs] 22895418K->14892045K(32505856K), 0.3386214 secs] [Times: user=5.71 sys=0.00, real=0.34 secs] 

2016-01-09T18:43:01.166+0800: 19226.005: [GC (GCLocker Initiated GC) 2016-01-09T18:43:01.166+0800: 19226.005: [ParNew: 9437184K->1048576K(9437184K), 0.3425942 secs] 23280717K->15266436K(32505856K), 0.3427797 secs] [Times: user=5.89 sys=0.00, real=0.35 secs] 

2016-01-09T18:43:04.112+0800: 19228.950: Application time: 0.6938848 seconds 

This second stop happened at 2016-01-09 18:43.
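
The 305-second entry in the log above is easier to interpret once the two figures on the line are separated: "Total time for which application threads were stopped" is the whole pause, and "Stopping threads took" is the part spent waiting for every thread to reach the safepoint. The sketch below parses such lines, as printed with -XX:+PrintGCApplicationStoppedTime, and flags pauses dominated by that wait. The class name, the method names, and the 50% cut-off are illustrative assumptions.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class SafepointPauses {
    private static final Pattern STOPPED = Pattern.compile(
            "Total time for which application threads were stopped: ([0-9.]+) seconds, "
            + "Stopping threads took: ([0-9.]+) seconds");

    /** Returns {total stopped seconds, seconds spent reaching the safepoint}, or null if no match. */
    static double[] parse(String line) {
        Matcher m = STOPPED.matcher(line);
        if (!m.find()) {
            return null;
        }
        return new double[] {Double.parseDouble(m.group(1)), Double.parseDouble(m.group(2))};
    }

    /** True when the pause exceeds thresholdSeconds and more than half of it was time-to-safepoint. */
    static boolean isSlowTimeToSafepoint(String line, double thresholdSeconds) {
        double[] p = parse(line);
        return p != null && p[0] > thresholdSeconds && p[1] / p[0] > 0.5;
    }

    public static void main(String[] args) {
        String line = "2016-01-09T18:42:43.301+0800: 19208.140: Total time for which application "
                + "threads were stopped: 305.4715152 seconds, Stopping threads took: 305.4698351 seconds";
        double[] p = parse(line);
        System.out.println("stopped=" + p[0] + " s, time-to-safepoint=" + p[1] + " s");
        System.out.println("slow time-to-safepoint: " + isSlowTimeToSafepoint(line, 1.0));
    }
}
```

For that line, all but roughly 0.0017 seconds of the pause was spent getting threads to the safepoint, so the stall is not GC work itself.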

Then I ran the jstack -m command again, and the application recovered.

Here is the GC log.

 
2016-01-09T18:46:18.871+0800: 19423.709: [GC (Allocation Failure) 2016-01-09T18:46:18.871+0800: 19423.710: [ParNew: 9437184K->1048576K(9437184K), 0.3188298 secs] 23655044K->15632857K(32505856K), 0.3191140 secs] [Times: user=5.55 sys=0.00, real=0.32 secs] 
2016-01-09T18:46:19.190+0800: 19424.029: Total time for which application threads were stopped: 195.0782503 seconds, Stopping threads took: 194.7573545 seconds 


2016-01-09T18:46:22.197+0800: 19427.035: [GC (Allocation Failure) 2016-01-09T18:46:22.197+0800: 19427.036: [ParNew: 9437184K->1048576K(9437184K), 0.3175688 secs] 24021465K->16017865K(32505856K), 0.3177865 secs] [Times: user=5.85 sys=0.00, real=0.32 secs] 

2016-01-09T18:46:25.111+0800: 19429.950: [GC (Allocation Failure) 2016-01-09T18:46:25.112+0800: 19429.950: [ParNew: 9437184K->1048576K(9437184K), 0.2808750 secs] 24406473K->16384589K(32505856K), 0.2810767 secs] [Times: user=5.04 sys=0.00, real=0.28 secs] 

2016-01-09T18:46:27.974+0800: 19432.813: [GC (Allocation Failure) 2016-01-09T18:46:27.975+0800: 19432.813: [ParNew: 9437184K->1048576K(9437184K), 0.32secs] 24773197K->16833663K(32505856K), 0.3203207 secs] [Times: user=5.53 sys=0.00, real=0.32 secs] 
......... 
2016-01-09T18:55:25.249+0800: 19970.088: [GC (Allocation Failure) 2016-01-09T18:55:25.249+0800: 19970.088: [ParNew: 8419546K->31852K(9437184K), 0.0155374 secs] 13769434K->5382550K(32505856K), 0.0157411 secs] [Times: user=0.33 sys=0.00, real=0.02 secs] 

I noticed that after I ran the jstack -m command, the ParNew entries in the GC log ("ParNew: from size->to size(9437184K)") were abnormal: the "to size" kept growing, as if some memory could not be released. After I ran jstack -m the second time, the "to size" went back to normal after a short while.
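
One way to quantify this observation is to compute, for each ParNew line, how much the old generation grew across the collection, that is, how much data was promoted out of the young generation. The sketch below does that from the two "before->after" pairs on the line; the class and method names are illustrative, and the regular expression assumes the -XX:+PrintGCDetails format shown in the logs above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ParNewLine {
    // Matches "[ParNew: A->B(C), t secs] D->E(F)" with all sizes in K.
    private static final Pattern P = Pattern.compile(
            "\\[ParNew: (\\d+)K->(\\d+)K\\((\\d+)K\\), [0-9.]+ ?secs\\] (\\d+)K->(\\d+)K\\((\\d+)K\\)");

    final long youngBeforeKb;
    final long youngAfterKb;
    final long heapBeforeKb;
    final long heapAfterKb;

    private ParNewLine(long youngBeforeKb, long youngAfterKb, long heapBeforeKb, long heapAfterKb) {
        this.youngBeforeKb = youngBeforeKb;
        this.youngAfterKb = youngAfterKb;
        this.heapBeforeKb = heapBeforeKb;
        this.heapAfterKb = heapAfterKb;
    }

    /** Parses one GC log line; returns null if it is not a ParNew entry. */
    static ParNewLine parse(String line) {
        Matcher m = P.matcher(line);
        if (!m.find()) {
            return null;
        }
        return new ParNewLine(Long.parseLong(m.group(1)), Long.parseLong(m.group(2)),
                Long.parseLong(m.group(4)), Long.parseLong(m.group(5)));
    }

    /** Old generation occupancy after the collection minus before it (old = heap - young). */
    long promotedKb() {
        return (heapAfterKb - youngAfterKb) - (heapBeforeKb - youngBeforeKb);
    }

    public static void main(String[] args) {
        String healthy = "[ParNew: 8412583K->36040K(9437184K), 0.0159936 secs] "
                + "21350038K->12973495K(32505856K), 0.0162151 secs]";
        String afterStall = "[ParNew: 9437184K->1048576K(9437184K), 0.3384320 secs] "
                + "22895418K->14892045K(32505856K), 0.3386214 secs]";
        System.out.println(parse(healthy).promotedKb() + "K");     // 18:37:21 collection
        System.out.println(parse(afterStall).promotedKb() + "K");  // 18:42:58 collection
    }
}
```

For the 18:37:21 collection this gives 0K promoted, while for the 18:42:58 collection it gives 385235K, with the young generation still holding 1048576K afterwards. That would be consistent with a full survivor space spilling into the old generation (1048576K is one survivor space if SurvivorRatio is the default 8), although the log alone does not say why more data stayed live after the stall.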

While the system was in the abnormal state, jstack only worked with the -F option, and jmap took several hours to run.

Here is one notable thread from the jstack -F output. All threads are blocked except this one, which is IN_VM.

 
Attaching to process ID 23694, please wait... 
Debugger attached successfully. 
Server compiler detected. 
JVM version is 25.66-b17 
Deadlock Detection: 

No deadlocks found. 

Thread 143283: (state = IN_VM) 
- sun.misc.Unsafe.freeMemory(long) @bci=0 (Compiled frame; information may be imprecise) 
- java.nio.DirectByteBuffer$Deallocator.run() @bci=17, line=94 (Compiled frame) 
- sun.misc.Cleaner.clean() @bci=12, line=143 (Compiled frame) 
- io.netty.util.internal.Cleaner0.freeDirectBuffer(java.nio.ByteBuffer) @bci=34, line=66 (Compiled frame) 
- io.netty.util.internal.PlatformDependent0.freeDirectBuffer(java.nio.ByteBuffer) @bci=1, line=147 (Compiled frame) 
- io.netty.util.internal.PlatformDependent.freeDirectBuffer(java.nio.ByteBuffer) @bci=13, line=281 (Compiled frame) 
- io.netty.buffer.UnpooledUnsafeDirectByteBuf.freeDirect(java.nio.ByteBuffer) @bci=1, line=115 (Compiled frame) 
- io.netty.buffer.UnpooledUnsafeDirectByteBuf.deallocate() @bci=24, line=508 (Compiled frame) 
- io.netty.buffer.AbstractReferenceCountedByteBuf.release() @bci=39, line=106 (Compiled frame) 
- io.netty.util.ReferenceCountUtil.release(java.lang.Object) @bci=11, line=59 (Compiled frame) 
- io.netty.util.ReferenceCountUtil.safeRelease(java.lang.Object) @bci=1, line=84 (Compiled frame) 
- io.netty.channel.ChannelOutboundBuffer.remove() @bci=40, line=258 (Compiled frame) 
- io.netty.channel.ChannelOutboundBuffer.removeBytes(long) @bci=83, line=334 (Compiled frame) 
- io.netty.channel.socket.nio.NioSocketChannel.doWrite(io.netty.channel.ChannelOutboundBuffer) @bci=238, line=317 (Compiled frame) 
- io.netty.channel.AbstractChannel$AbstractUnsafe.flush0() @bci=89, line=750 (Compiled frame) 
- io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.flush0() @bci=9, line=303 (Compiled frame) 
- io.netty.channel.AbstractChannel$AbstractUnsafe.flush() @bci=15, line=719 (Compiled frame) 
- io.netty.channel.DefaultChannelPipeline$HeadContext.flush(io.netty.channel.ChannelHandlerContext) @bci=4, line=1119 (Compiled frame) 
- io.netty.channel.AbstractChannelHandlerContext.invokeFlush() @bci=8, line=735 (Compiled frame) 
- io.netty.channel.AbstractChannelHandlerContext.access$1500(io.netty.channel.AbstractChannelHandlerContext) @bci=1, line=32 (Compiled frame) 
- io.netty.channel.AbstractChannelHandlerContext$16.run() @bci=4, line=723 (Compiled frame) 
- io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(long) @bci=26, line=357 (Compiled frame) 
- io.netty.channel.nio.NioEventLoop.run() @bci=106, line=357 (Compiled frame) 
- io.netty.util.concurrent.SingleThreadEventExecutor$2.run() @bci=13, line=111 (Interpreted frame) 
- java.lang.Thread.run() @bci=11, line=745 (Interpreted frame) 
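
To back up a statement such as "all threads are blocked except one", the thread headers in a saved jstack -F dump can be tallied by state. This is a minimal sketch that assumes the header format shown above (`Thread <id>: (state = <STATE>)`); the class name and the file argument are hypothetical.

```java
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class JstackStates {
    private static final Pattern HEADER = Pattern.compile("^Thread (\\d+): \\(state = (\\w+)\\)");

    /** Counts thread header lines per state; stack frame lines are ignored. */
    static Map<String, Integer> countStates(Iterable<String> lines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            Matcher m = HEADER.matcher(line.trim());
            if (m.find()) {
                counts.merge(m.group(2), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) throws Exception {
        if (args.length == 0) {
            System.err.println("usage: java JstackStates <saved jstack -F output file>");
            return;
        }
        // Prints a map of state name to thread count for the saved dump.
        System.out.println(countStates(Files.readAllLines(Paths.get(args[0]))));
    }
}
```

A dump in which a single thread is IN_VM and the rest are blocked fits the picture of threads waiting at a safepoint for one straggler.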

Here is my environment.

 
java version "1.8.0_66" 
Linux version 2.6.32-504.el6.x86_64 

Answers

1

Linux version 2.6.32-504.el6.x86_64

It looks like you are running a kernel affected by the futex_wait bug.

The solution is to upgrade to a newer kernel.
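
A quick way to see whether a host still runs an older kernel is to compare its release string with the first release that carries the fix. The sketch below compares only the leading numeric components; the fixed release used in `main` (2.6.32-504.16.2.el6) is an assumption that should be confirmed against the distribution's own advisory, and the class and method names are illustrative. The comparison is only meaningful within the same distribution's kernel series.

```java
import java.util.ArrayList;
import java.util.List;

final class KernelRelease {
    /** Leading numeric components, e.g. "2.6.32-504.el6.x86_64" -> [2, 6, 32, 504]. */
    static List<Integer> numericParts(String release) {
        List<Integer> parts = new ArrayList<>();
        for (String token : release.split("[.-]")) {
            if (!token.matches("\\d+")) {
                break; // stop at the first non-numeric token such as "el6" or "x86_64"
            }
            parts.add(Integer.parseInt(token));
        }
        return parts;
    }

    /** Component-wise comparison; missing components count as 0. */
    static boolean isOlderThan(String release, String fixedRelease) {
        List<Integer> a = numericParts(release);
        List<Integer> b = numericParts(fixedRelease);
        for (int i = 0; i < Math.max(a.size(), b.size()); i++) {
            int x = i < a.size() ? a.get(i) : 0;
            int y = i < b.size() ? b.get(i) : 0;
            if (x != y) {
                return x < y;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // On Linux, the os.version system property holds the kernel release string.
        String running = args.length > 0 ? args[0] : System.getProperty("os.version");
        String fixed = "2.6.32-504.16.2.el6"; // assumption: verify against the vendor advisory
        System.out.println(running + " older than " + fixed + ": " + isOlderThan(running, fixed));
    }
}
```

With the kernel from the question, 2.6.32-504.el6.x86_64, the comparison reports that it is older than the assumed fixed release.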

+0

Why was the ParNew log abnormal? – famoss

-2

This looks like a memory problem.

Possible causes:

  • You may have variables declared inside loops. Declare those variables outside the loop.
  • You need to increase your heap size.
+0

I had already increased the heap size to 50G, but the application stopped in the same way. Once it stopped working, it could not recover until I ran the jstack command. So I want to know which code is causing the problem. – famoss

+0

@famoss Profile your application. – Kayaman

+0

@Kayaman, my application is a Flume application, but the sink is not the official one; I wrote it myself. – famoss