轉 JVM jstack命令用法及相關知識小結
https://www.jianshu.com/p/99562e9391b5
前言
很久沒有寫過與JVM相關的文章了。今天搬磚有點累,不太想啃原始碼,寫一篇實用性比較強的吧。
在日常工作中,我們如果遇到JVM方面的問題,一般是採用各種現成的工具輔助定位解決,如VisualVM、JProfiler、Eclipse MAT、Arthas等。但是,我們也有必要了解JVM原生提供的那些命令列工具,本文就針對常用的jstack命令做個簡單的總結,順便聊一些與Java執行緒與併發相關的周邊知識。今後有時間的話,也會將jmap、jhat、jstat等命令的用法總結出來。
jstack簡介
jstack命令用來生成JVM中的執行緒快照(thread dump),其中包含有每個執行緒的方法呼叫棧以及其狀態、鎖資訊等。其用法說明如下所示。
~ jstack -h Usage: jstack [-l] <pid> (to connect to running process) jstack -F [-m] [-l] <pid> (to connect to a hung process) jstack [-m] [-l] <executable> <core> (to connect to a core file) jstack [-m] [-l] [server_id@]<remote server IP or hostname> (to connect to a remote debug server) Options: -F to force a thread dump. Use when jstack <pid> does not respond (process is hung) -m to print both java and native frames (mixed mode) -l long listing. Prints additional information about locks -h or -help to print this help message
說明一下三個引數的含義:
- -F:如果正常執行jstack命令沒有響應(比如程序hung住了),可以加上此引數強制執行thread dump。
- -m:除了列印Java的方法呼叫棧之外,還會輸出native方法的棧幀。
- -l:列印與鎖有關的附加資訊。使用此引數會導致JVM停止時間變長,在生產環境需慎用。
jstack是線上程級別定位JVM問題的利器,但前提是得讀懂thread dump,我們舉例說明。
執行緒快照
下面是對一個正常執行的Spark Streaming作業執行jstack命令產生的執行緒快照的一部分。
~ jstack 18747 Full thread dump Java HotSpot(TM) 64-Bit Server VM (25.172-b11 mixed mode): "SparkUI-1510250" #1510250 daemon prio=5 os_prio=0 tid=0x00007f9a6c00c800 nid=0x45cc waiting on condition [0x00007f9ce86e7000] java.lang.Thread.State: TIMED_WAITING (parking) at sun.misc.Unsafe.park(Native Method) - parking to wait for <0x00000000c0420db8> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject) at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215) at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078) at org.spark_project.jetty.util.BlockingArrayQueue.poll(BlockingArrayQueue.java:392) at org.spark_project.jetty.util.thread.QueuedThreadPool.idleJobPoll(QueuedThreadPool.java:563) at org.spark_project.jetty.util.thread.QueuedThreadPool.access$800(QueuedThreadPool.java:48) at org.spark_project.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:626) at java.lang.Thread.run(Thread.java:748) "shuffle-server-6-7" #190 daemon prio=5 os_prio=0 tid=0x00007f9b44009000 nid=0x4d80 runnable [0x00007f9ce99f8000] java.lang.Thread.State: RUNNABLE at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method) at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269) at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:93) at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:86) - locked <0x00000000c58e6498> (a io.netty.channel.nio.SelectedSelectionKeySet) - locked <0x00000000c59c1528> (a java.util.Collections$UnmodifiableSet) - locked <0x00000000c59c1450> (a sun.nio.ch.EPollSelectorImpl) at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:97) at io.netty.channel.nio.SelectedSelectionKeySetSelector.select(SelectedSelectionKeySetSelector.java:62) at io.netty.channel.nio.NioEventLoop.select(NioEventLoop.java:753) at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:409) at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858) at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:138) at java.lang.Thread.run(Thread.java:748) "SparkUI-84-acceptor-3@3b331d23-ServerConnector@9826a7d{HTTP/1.1,[http/1.1]}{0.0.0.0:4040}" #84 daemon prio=3 os_prio=0 tid=0x00007f9d7decc800 nid=0x4500 waiting for monitor entry [0x00007f9d112c8000] java.lang.Thread.State: BLOCKED (on object monitor) at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:234) - waiting to lock <0x00000000c045e868> (a java.lang.Object) at org.spark_project.jetty.server.ServerConnector.accept(ServerConnector.java:373) at org.spark_project.jetty.server.AbstractConnector$Acceptor.run(AbstractConnector.java:593) at org.spark_project.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:671) at org.spark_project.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:589) at java.lang.Thread.run(Thread.java:748) "org.apache.hadoop.fs.FileSystem$Statistics$StatisticsDataReferenceCleaner" #25 daemon prio=5 os_prio=0 tid=0x00007f9d7d8da000 nid=0x44c2 in Object.wait() [0x00007f9d19dc0000] java.lang.Thread.State: WAITING (on object monitor) at java.lang.Object.wait(Native Method) at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:144) - locked <0x00000000c0031c98> (a java.lang.ref.ReferenceQueue$Lock) at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:165) at org.apache.hadoop.fs.FileSystem$Statistics$StatisticsDataReferenceCleaner.run(FileSystem.java:3213) at java.lang.Thread.run(Thread.java:748) "RecurringTimer - JobGenerator" #120 daemon prio=5 os_prio=0 tid=0x00007f9b04045000 nid=0x4cc3 waiting on condition [0x00007f9d10cd4000] java.lang.Thread.State: TIMED_WAITING (sleeping) at java.lang.Thread.sleep(Native Method) at org.apache.spark.util.SystemClock.waitTillTime(Clock.scala:63) at org.apache.spark.streaming.util.RecurringTimer.triggerActionForNextInterval(RecurringTimer.scala:93) at org.apache.spark.streaming.util.RecurringTimer.org$apache$spark$streaming$util$RecurringTimer$$loop(RecurringTimer.scala:106) at org.apache.spark.streaming.util.RecurringTimer$$anon$1.run(RecurringTimer.scala:29)
在每個執行緒的快照的第一行,包含有執行緒名、是否為守護執行緒、優先順序、執行緒ID等資訊,第二行則是執行緒狀態,下面就是方法呼叫棧了。下圖是Java執行緒狀態轉換的示意,老生常談。
jstack執行緒快照中的狀態與圖示相同,只是沒有NEW狀態而已。我們逐一進行分析,在分析之前,先放出Java管程物件ObjectMonitor的簡圖。看官也可以通過我之前寫的這篇文章來了解管程。
RUNNABLE
執行緒正在執行。如果在其呼叫棧中看到locked <地址>
的提示,表示該方法正持有鎖,即該執行緒位於Owner區內。
BLOCKED
執行緒處於阻塞狀態,即正在等待鎖被其他執行緒釋放。在其呼叫棧的棧頂方法會看到waiting to lock <地址>
的提示,表示該方法試圖持有鎖,執行緒正在Entry Set區等待。
WAITING
執行緒處於無限等待的狀態。又分為兩種情況:
- on object monitor:執行緒已經獲得鎖,呼叫了不帶超時引數的Object.wait()/Thread.join()方法,執行緒進入管程的Wait Set區。在其呼叫棧中會看到
locked <地址>
的提示。 - parking:呼叫了LockSupport.park()方法,執行緒直接進入掛起狀態(park是Unsafe提供的低階原語)。在其呼叫棧的棧頂方法會看到
parking to wait for <地址>
的提示。
TIMED_WAITING
執行緒處於有限等待的狀態。它分為三種情況,除了與WAITING相同的on object monitor(獲得鎖並呼叫帶超時的Object.wait()/Thread.join()方法)和parking(呼叫帶超時的LockSupport.parkNanos()/parkUntil()方法)之外,還有一種sleep,即通過Thread.sleep()使執行緒睡眠。
通過分析執行緒快照的狀態和呼叫棧,可以讓我們快速地定位造成Java程式表現異常的癥結,如死鎖、熱鎖(很多執行緒競爭同一塊臨界區造成大量BLOCKED)、高CPU佔用、I/O長時間阻塞(注意此時執行緒狀態可能是RUNNABLE)等。下面舉兩個具體的例子。
用jstack診斷死鎖
死鎖(deadlock)是作業系統理論中的基礎概念,即在併發環境下,一個或多個執行緒在等待資源,但該資源又被其他程序所佔用的困局。死鎖的四個必要條件是:
- 互斥(mutual exclusion)
- 不可搶佔(no preemption)
- 持有並等待(hold and wait)
- 迴圈等待(circular wait)
下面我們用Java造一個死鎖,並用jstack來診斷它。
public class DeadlockExample {
private static final Object lock1 = new Object();
private static final Object lock2 = new Object();
public static void main(String[] args) throws Exception {
new Thread(() -> {
for (int i = 0; i < 100; i++) {
synchronized (lock1) {
System.out.println("thread1 synchronized lock1");
synchronized (lock2) {
System.out.println("thread1 synchronized lock2");
}
}
}
}, "thread1").start();
new Thread(() -> {
for (int i = 0; i < 100; i++) {
synchronized (lock2) {
System.out.println("thread2 synchronized lock2");
synchronized (lock1) {
System.out.println("thread2 synchronized lock1");
}
}
}
}, "thread2").start();
}
}
執行之,發現只輸出了幾句就停止了。
thread1 synchronized lock1
thread1 synchronized lock2
thread1 synchronized lock1
thread1 synchronized lock2
thread1 synchronized lock1
thread2 synchronized lock2
用jstack列印執行緒快照,節選部分如下。
"thread2" #20 prio=5 os_prio=31 tid=0x00007fad74020800 nid=0x6203 waiting for monitor entry [0x0000700006364000]
java.lang.Thread.State: BLOCKED (on object monitor)
at me.lmagics.DeadlockExample.lambda$main$1(DeadlockExample.java:28)
- waiting to lock <0x00000007157d2a58> (a java.lang.Object)
- locked <0x00000007157d2a68> (a java.lang.Object)
at me.lmagics.DeadlockExample$$Lambda$2/501263526.run(Unknown Source)
at java.lang.Thread.run(Thread.java:748)
"thread1" #19 prio=5 os_prio=31 tid=0x00007fad7401c000 nid=0x9903 waiting for monitor entry [0x0000700006261000]
java.lang.Thread.State: BLOCKED (on object monitor)
at me.lmagics.DeadlockExample.lambda$main$0(DeadlockExample.java:17)
- waiting to lock <0x00000007157d2a68> (a java.lang.Object)
- locked <0x00000007157d2a58> (a java.lang.Object)
at me.lmagics.DeadlockExample$$Lambda$1/1394438858.run(Unknown Source)
at java.lang.Thread.run(Thread.java:748)
Found one Java-level deadlock:
=============================
"thread2":
waiting to lock monitor 0x00007fad65004168 (object 0x00000007157d2a58, a java.lang.Object),
which is held by "thread1"
"thread1":
waiting to lock monitor 0x00007fad650056b8 (object 0x00000007157d2a68, a java.lang.Object),
which is held by "thread2"
Java stack information for the threads listed above:
===================================================
"thread2":
at me.lmagics.DeadlockExample.lambda$main$1(DeadlockExample.java:28)
- waiting to lock <0x00000007157d2a58> (a java.lang.Object)
- locked <0x00000007157d2a68> (a java.lang.Object)
at me.lmagics.DeadlockExample$$Lambda$2/501263526.run(Unknown Source)
at java.lang.Thread.run(Thread.java:748)
"thread1":
at me.lmagics.DeadlockExample.lambda$main$0(DeadlockExample.java:17)
- waiting to lock <0x00000007157d2a68> (a java.lang.Object)
- locked <0x00000007157d2a58> (a java.lang.Object)
at me.lmagics.DeadlockExample$$Lambda$1/1394438858.run(Unknown Source)
at java.lang.Thread.run(Thread.java:748)
Found 1 deadlock.
可見,我們不僅能夠發現兩個執行緒都處於BLOCKED狀態,並且jstack還直接給出了死鎖的詳細資訊,方便我們修改程式碼消除死鎖。
用jstack診斷高CPU佔用
接下來造一個有死迴圈的程式,模擬異常的CPU佔用。
public class InfiniteLoopExample {
private static final Object lock = new Object();
static class InfiniteLoopRunnable implements Runnable {
@Override
public void run() {
synchronized (lock) {
long l = 0;
while (true) {
l++;
}
}
}
}
public static void main(String[] args) throws Exception {
new Thread(new InfiniteLoopRunnable(), "thread1").start();
new Thread(new InfiniteLoopRunnable(), "thread2").start();
}
}
執行該程式,用jps命令找出該Java程序的PID,然後利用top -Hp <PID>
命令找出吃CPU最多的那個執行緒。
使用jstack匯出執行緒快照到檔案中。由於執行緒ID是十六進位制表示的,所以我們要將執行緒ID轉換成十六進位制再grep。
這下我們就可以定位到異常程式碼的位置,並進行修改了。
The End
晚安晚安。
作者:LittleMagic
連結:https://www.jianshu.com/p/99562e9391b5
來源:簡書
著作權歸作者所有。商業轉載請聯絡作者獲得授權,非商業轉載請註明出處。