一个 JVM 参数引发的频繁 CMS GC

前言了解 CMS GC 的同学 , 一定知道 -XX:CMSScavengeBeforeRemark 参数 , 它是用来开启或关闭在 CMS-remark 阶段之前的清除(Young GC)尝试 。
大家都知道CMS GC 只会回收 OldGen 的对象 , 那为什么需要这个参数?由于 YoungGen 存在引用 OldGen 对象的情况 , 因此 CMS-remark 阶段会将 YoungGen 作为 OldGen 的 “GC ROOTS” 进行扫描 , 防止回收了不该回收的对象 。 而配置 -XX:+CMSScavengeBeforeRemark 参数 , 在 CMS GC 的 CMS-remark 阶段开始前先进行一次 Young GC , 有利于减少 Young Gen 对 Old Gen 的无效引用 , 降低 CMS-remark 阶段的时间开销 。
这篇文章的内容是业务开发同学遇到的奇怪的频繁 CMS GC 问题 , 我们一起定位排查 , 最终发现跟 -XX:CMSScavengeBeforeRemark 参数相关 。
问题频繁 Full GC业务开发同学通过监控发现线上一台机器频繁 CMS GC , 下图是 CMS GC 监控图 , 大约从 20 点 5-15 分 , 每分钟 8-11 次的持续 CMS GC 。
一个 JVM 参数引发的频繁 CMS GC文章插图
说明:公司监控对 Old GC 与 Full GC 是不区分的 , 案例中讲的其实是 CMS GC 。
OldGen 使用空间占比从下图 OldGen 的使用监控图来看 , 刚开始 OldGen 对象占用 OldGen 约 80% 的空间 , 经过 CMS GC 后 , 几乎立马空间使用的占用比例约在 30% 以下 。
一个 JVM 参数引发的频繁 CMS GC文章插图
//JDK 版本 "1.8.0_45"-Xms5324m -Xmx5324m -Xss512k -XX:PermSize=384m -XX:MaxPermSize=384m -XX:MetaspaceSize=256m -XX:MaxMetaspaceSize=256m -XX:NewSize=2048m -XX:MaxNewSize=2048m -XX:SurvivorRatio=8 -XX:MaxTenuringThreshold=9 -XX:+UseConcMarkSweepGC -XX:+UseCMSInitiatingOccupancyOnly -XX:+CMSScavengeBeforeRemark -XX:+ScavengeBeforeFullGC -XX:+UseCMSCompactAtFullCollection -XX:+CMSParallelRemarkEnabled -XX:CMSFullGCsBeforeCompaction=9 -XX:CMSInitiatingOccupancyFraction=80 -XX:+CMSClassUnloadingEnabled -XX:SoftRefLRUPolicyMSPerMB=0 -XX:-ReduceInitialCardMarks -XX:+CMSPermGenSweepingEnabled -XX:CMSInitiatingPermOccupancyFraction=80 -XX:+ExplicitGCInvokesConcurrent -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintGCApplicationConcurrentTime -XX:+PrintGCApplicationStoppedTime -XX:+PrintHeapAtGC -Xloggc:/data/applogs/heap_trace.txt -XX:-HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/data/applogs/HeapDumpOnOutOfMemoryError -XX:+IgnoreUnrecognizedVMOptions 结合 OldGen 的使用空间占比与 JVM 参数(-XX:+UseCMSInitiatingOccupancyOnly -XX:CMSInitiatingOccupancyFraction=80) , 几乎可以断定第一次 CMS GC 是因为 OldGen 的使用占比到达了 OldGen 总量的 80% 。
疑惑第一次触发 CMS GC 可以通过 OldGen 的使用占比到达了 OldGen 总量的 80% 来解释 , 但是通过监控可以看到后来 OldGen 使用占比降低到 30% 以下 , 为什么还一直频繁进行 CMS GC?
分析GC 监控图展示的还不够全面 , 具体问题还是要通过 GC 日志进行定位 , 因为 GC 日志中的信息更丰富 。
GC 日志为了分析问题 , 这里选取了第一次、第二次、第三次-第 N 次的 CMS GC 日志 。
第一次 CMS GC日志
2019-03-28T20:05:06.906+0800: 3644459.373: [GC (CMS Initial Mark) [1 CMS-initial-mark: 2935428K(3354624K)] 3160044K(5242112K), 0.0586708 secs] [Times: user=0.22 sys=0.00, real=0.06 secs]2019-03-28T20:05:06.965+0800: 3644459.432: Total time for which application threads were stopped: 0.0616049 seconds, Stopping threads took: 0.0001381 seconds2019-03-28T20:05:06.965+0800: 3644459.432: [CMS-concurrent-mark-start]2019-03-28T20:05:08.066+0800: 3644460.533: [CMS-concurrent-mark: 1.101/1.101 secs] [Times: user=1.57 sys=0.05, real=1.10 secs]2019-03-28T20:05:08.066+0800: 3644460.533: [CMS-concurrent-preclean-start]2019-03-28T20:05:08.076+0800: 3644460.543: [CMS-concurrent-preclean: 0.010/0.010 secs] [Times: user=0.01 sys=0.01, real=0.01 secs]2019-03-28T20:05:08.076+0800: 3644460.543: [CMS-concurrent-abortable-preclean-start]2019-03-28T20:05:10.177+0800: 3644462.645: Application time: 3.2124140 seconds{Heap before GC invocations=18476 (full 731): par new generationtotal 1887488K, used 1887488K [0x0000000673400000, 0x00000006f3400000, 0x00000006f3400000)eden space 1677824K, 100% used [0x0000000673400000, 0x00000006d9a80000, 0x00000006d9a80000)from space 209664K, 100% used [0x00000006e6740000, 0x00000006f3400000, 0x00000006f3400000)tospace 209664K,0% used [0x00000006d9a80000, 0x00000006d9a80000, 0x00000006e6740000) concurrent mark-sweep generation total 3354624K, used 2935428K [0x00000006f3400000, 0x00000007c0000000, 0x00000007c0000000) Metaspaceused 90225K, capacity 91504K, committed 91776K, reserved 1130496Kclass spaceused 9517K, capacity 9806K, committed 9856K, reserved 1048576K2019-03-28T20:05:10.179+0800: 3644462.647: [GC (Allocation Failure) 3644462.647: [ParNew: 1887488K->201195K(1887488K), 0.4228807 secs] 4822916K->3279195K(5242112K), 0.4231546 secs] [Times: user=1.54 sys=0.00, real=0.42 secs] Heap after GC invocations=18477 (full 731): par new generationtotal 1887488K, used 201195K [0x0000000673400000, 0x00000006f3400000, 0x00000006f3400000)eden space 1677824K,0% used [0x0000000673400000, 0x0000000673400000, 0x00000006d9a80000)from space 209664K,95% used [0x00000006d9a80000, 0x00000006e5efae68, 0x00000006e6740000)tospace 209664K,0% used [0x00000006e6740000, 0x00000006e6740000, 0x00000006f3400000) concurrent mark-sweep generation total 3354624K, used 3078000K [0x00000006f3400000, 0x00000007c0000000, 0x00000007c0000000) Metaspaceused 90225K, capacity 91504K, committed 91776K, reserved 1130496Kclass spaceused 9517K, capacity 9806K, committed 9856K, reserved 1048576K}2019-03-28T20:05:10.603+0800: 3644463.070: Total time for which application threads were stopped: 0.4258929 seconds, Stopping threads took: 0.0001722 seconds2019-03-28T20:05:10.904+0800: 3644463.372: [CMS-concurrent-abortable-preclean: 2.397/2.828 secs] [Times: user=6.22 sys=0.10, real=2.83 secs]2019-03-28T20:05:10.904+0800: 3644463.372: Application time: 0.3012271 seconds2019-03-28T20:05:10.907+0800: 3644463.374: [GC (CMS Final Remark) [YG occupancy: 434406 K (1887488 K)]{Heap before GC invocations=18477 (full 731): par new generationtotal 1887488K, used 434406K [0x0000000673400000, 0x00000006f3400000, 0x00000006f3400000)eden space 1677824K,13% used [0x0000000673400000, 0x00000006817bed10, 0x00000006d9a80000)from space 209664K,95% used [0x00000006d9a80000, 0x00000006e5efae68, 0x00000006e6740000)tospace 209664K,0% used [0x00000006e6740000, 0x00000006e6740000, 0x00000006f3400000) concurrent mark-sweep generation total 3354624K, used 3078000K [0x00000006f3400000, 0x00000007c0000000, 0x00000007c0000000) Metaspaceused 90225K, capacity 91504K, committed 91776K, reserved 1130496Kclass spaceused 9517K, capacity 9806K, committed 9856K, reserved 1048576K2019-03-28T20:05:10.907+0800: 3644463.375: [GC (CMS Final Remark) 3644463.375: [ParNew (promotion failed): 434406K->315478K(1887488K), 5.8407801 secs] 3512406K->3486710K(5242112K), 5.8410096 secs] [Times: user=6.84 sys=1.31, real=5.84 secs]Heap after GC invocations=18478 (full 731): par new generationtotal 1887488K, used 315478K [0x0000000673400000, 0x00000006f3400000, 0x00000006f3400000)eden space 1677824K,13% used [0x0000000673400000, 0x00000006817bed10, 0x00000006d9a80000)from space 209664K,39% used [0x00000006e6740000, 0x00000006eb796e60, 0x00000006f3400000)tospace 209664K,95% used [0x00000006d9a80000, 0x00000006e5efae68, 0x00000006e6740000) concurrent mark-sweep generation total 3354624K, used 3171231K [0x00000006f3400000, 0x00000007c0000000, 0x00000007c0000000) Metaspaceused 90225K, capacity 91504K, committed 91776K, reserved 1130496Kclass spaceused 9517K, capacity 9806K, committed 9856K, reserved 1048576K}3644469.216: [Rescan (parallel) , 0.3096135 secs]3644469.525: [weak refs processing, 0.0009228 secs]3644469.526: [class unloading, 0.0797710 secs]3644469.606: [scrub symbol table, 0.0229535 secs]3644469.629: [scrub string table, 0.0020416 secs][1 CMS-remark: 3171231K(3354624K)] 3486710K(5242112K), 6.2593934 secs] [Times: user=8.10 sys=1.36, real=6.26 secs]2019-03-28T20:05:17.166+0800: 3644469.634: Total time for which application threads were stopped: 6.2622888 seconds, Stopping threads took: 0.0002099 seconds2019-03-28T20:05:17.167+0800: 3644469.634: [CMS-concurrent-sweep-start]2019-03-28T20:05:17.176+0800: 3644469.644: Application time: 0.0100218 seconds2019-03-28T20:05:17.179+0800: 3644469.647: Total time for which application threads were stopped: 0.0025500 seconds, Stopping threads took: 0.0001934 seconds2019-03-28T20:05:18.179+0800: 3644470.647: Application time: 1.0001731 seconds2019-03-28T20:05:18.182+0800: 3644470.649: Total time for which application threads were stopped: 0.0026811 seconds, Stopping threads took: 0.0001358 seconds2019-03-28T20:05:21.000+0800: 3644473.468: Application time: 2.8185985 seconds2019-03-28T20:05:21.003+0800: 3644473.471: Total time for which application threads were stopped: 0.0029238 seconds, Stopping threads took: 0.0001172 seconds2019-03-28T20:05:21.013+0800: 3644473.481: Application time: 0.0097451 seconds2019-03-28T20:05:21.019+0800: 3644473.487: Total time for which application threads were stopped: 0.0060990 seconds, Stopping threads took: 0.0002775 seconds2019-03-28T20:05:21.734+0800: 3644474.201: Application time: 0.7144315 seconds2019-03-28T20:05:21.736+0800: 3644474.204: Total time for which application threads were stopped: 0.0026804 seconds, Stopping threads took: 0.0001238 seconds2019-03-28T20:05:22.203+0800: 3644474.671: [CMS-concurrent-sweep: 5.019/5.037 secs] [Times: user=5.28 sys=0.27, real=5.03 secs]2019-03-28T20:05:22.204+0800: 3644474.671: [CMS-concurrent-reset-start]2019-03-28T20:05:22.211+0800: 3644474.678: [CMS-concurrent-reset: 0.007/0.007 secs] [Times: user=0.01 sys=0.00, real=0.01 secs]2019-03-28T20:05:22.238+0800: 3644474.706: Application time: 0.5016696 seconds2019-03-28T20:05:22.241+0800: 3644474.708: Total time for which application threads were stopped: 0.0026876 seconds, Stopping threads took: 0.0001305 seconds2019-03-28T20:05:22.438+0800: 3644474.905: Application time: 0.1970764 seconds2019-03-28T20:05:22.440+0800: 3644474.908: Total time for which application threads were stopped: 0.0027034 seconds, Stopping threads took: 0.0001344 seconds2019-03-28T20:05:23.441+0800: 3644475.908: Application time: 1.0001304 seconds2019-03-28T20:05:23.443+0800: 3644475.911: Total time for which application threads were stopped: 0.0024875 seconds, Stopping threads took: 0.0001316 seconds2019-03-28T20:05:24.210+0800: 3644476.678: Application time: 0.7671567 seconds


推荐阅读