1. 程式人生 > 實用技巧 >轉 JVM jstack命令用法及相關知識小結

轉 JVM jstack命令用法及相關知識小結

https://www.jianshu.com/p/99562e9391b5

前言

很久沒有寫過與JVM相關的文章了。今天搬磚有點累,不太想啃原始碼,寫一篇實用性比較強的吧。

在日常工作中,我們如果遇到JVM方面的問題,一般是採用各種現成的工具輔助定位解決,如VisualVM、JProfiler、Eclipse MAT、Arthas等。但是,我們也有必要了解JVM原生提供的那些命令列工具,本文就針對常用的jstack命令做個簡單的總結,順便聊一些與Java執行緒與併發相關的周邊知識。今後有時間的話,也會將jmap、jhat、jstat等命令的用法總結出來。

jstack簡介

jstack命令用來生成JVM中的執行緒快照(thread dump),其中包含有每個執行緒的方法呼叫棧以及其狀態、鎖資訊等。其用法說明如下所示。

~ jstack -h
Usage:
    jstack [-l] <pid>
        (to connect to running process)
    jstack -F [-m] [-l] <pid>
        (to connect to a hung process)
    jstack [-m] [-l] <executable> <core>
        (to connect to a core file)
    jstack [-m] [-l] [server_id@]<remote server IP or hostname>
        (to connect to a remote debug server)

Options:
    -F  to force a thread dump. Use when jstack <pid> does not respond (process is hung)
    -m  to print both java and native frames (mixed mode)
    -l  long listing. Prints additional information about locks
    -h or -help to print this help message

說明一下三個引數的含義:

  • -F:如果正常執行jstack命令沒有響應(比如程序hung住了),可以加上此引數強制執行thread dump。
  • -m:除了列印Java的方法呼叫棧之外,還會輸出native方法的棧幀。
  • -l:列印與鎖有關的附加資訊。使用此引數會導致JVM停止時間變長,在生產環境需慎用。

jstack是線上程級別定位JVM問題的利器,但前提是得讀懂thread dump,我們舉例說明。

執行緒快照

下面是對一個正常執行的Spark Streaming作業執行jstack命令產生的執行緒快照的一部分。

~ jstack 18747
Full thread dump Java HotSpot(TM) 64-Bit Server VM (25.172-b11 mixed mode):

"SparkUI-1510250" #1510250 daemon prio=5 os_prio=0 tid=0x00007f9a6c00c800 nid=0x45cc waiting on condition [0x00007f9ce86e7000]
   java.lang.Thread.State: TIMED_WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for  <0x00000000c0420db8> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
        at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078)
        at org.spark_project.jetty.util.BlockingArrayQueue.poll(BlockingArrayQueue.java:392)
        at org.spark_project.jetty.util.thread.QueuedThreadPool.idleJobPoll(QueuedThreadPool.java:563)
        at org.spark_project.jetty.util.thread.QueuedThreadPool.access$800(QueuedThreadPool.java:48)
        at org.spark_project.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:626)
        at java.lang.Thread.run(Thread.java:748)

"shuffle-server-6-7" #190 daemon prio=5 os_prio=0 tid=0x00007f9b44009000 nid=0x4d80 runnable [0x00007f9ce99f8000]
   java.lang.Thread.State: RUNNABLE
        at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
        at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269)
        at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:93)
        at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:86)
        - locked <0x00000000c58e6498> (a io.netty.channel.nio.SelectedSelectionKeySet)
        - locked <0x00000000c59c1528> (a java.util.Collections$UnmodifiableSet)
        - locked <0x00000000c59c1450> (a sun.nio.ch.EPollSelectorImpl)
        at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:97)
        at io.netty.channel.nio.SelectedSelectionKeySetSelector.select(SelectedSelectionKeySetSelector.java:62)
        at io.netty.channel.nio.NioEventLoop.select(NioEventLoop.java:753)
        at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:409)
        at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858)
        at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:138)
        at java.lang.Thread.run(Thread.java:748)

"SparkUI-84-acceptor-3@3b331d23-ServerConnector@9826a7d{HTTP/1.1,[http/1.1]}{0.0.0.0:4040}" #84 daemon prio=3 os_prio=0 tid=0x00007f9d7decc800 nid=0x4500 waiting for monitor entry [0x00007f9d112c8000]
   java.lang.Thread.State: BLOCKED (on object monitor)
        at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:234)
        - waiting to lock <0x00000000c045e868> (a java.lang.Object)
        at org.spark_project.jetty.server.ServerConnector.accept(ServerConnector.java:373)
        at org.spark_project.jetty.server.AbstractConnector$Acceptor.run(AbstractConnector.java:593)
        at org.spark_project.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:671)
        at org.spark_project.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:589)
        at java.lang.Thread.run(Thread.java:748)

"org.apache.hadoop.fs.FileSystem$Statistics$StatisticsDataReferenceCleaner" #25 daemon prio=5 os_prio=0 tid=0x00007f9d7d8da000 nid=0x44c2 in Object.wait() [0x00007f9d19dc0000]
   java.lang.Thread.State: WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:144)
        - locked <0x00000000c0031c98> (a java.lang.ref.ReferenceQueue$Lock)
        at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:165)
        at org.apache.hadoop.fs.FileSystem$Statistics$StatisticsDataReferenceCleaner.run(FileSystem.java:3213)
        at java.lang.Thread.run(Thread.java:748)

"RecurringTimer - JobGenerator" #120 daemon prio=5 os_prio=0 tid=0x00007f9b04045000 nid=0x4cc3 waiting on condition [0x00007f9d10cd4000]
   java.lang.Thread.State: TIMED_WAITING (sleeping)
        at java.lang.Thread.sleep(Native Method)
        at org.apache.spark.util.SystemClock.waitTillTime(Clock.scala:63)
        at org.apache.spark.streaming.util.RecurringTimer.triggerActionForNextInterval(RecurringTimer.scala:93)
        at org.apache.spark.streaming.util.RecurringTimer.org$apache$spark$streaming$util$RecurringTimer$$loop(RecurringTimer.scala:106)
        at org.apache.spark.streaming.util.RecurringTimer$$anon$1.run(RecurringTimer.scala:29)

在每個執行緒的快照的第一行,包含有執行緒名、是否為守護執行緒、優先順序、執行緒ID等資訊,第二行則是執行緒狀態,下面就是方法呼叫棧了。下圖是Java執行緒狀態轉換的示意,老生常談。

jstack執行緒快照中的狀態與圖示相同,只是沒有NEW狀態而已。我們逐一進行分析,在分析之前,先放出Java管程物件ObjectMonitor的簡圖。看官也可以通過我之前寫的這篇文章來了解管程。

RUNNABLE

執行緒正在執行。如果在其呼叫棧中看到locked <地址>的提示,表示該方法正持有鎖,即該執行緒位於Owner區內。

BLOCKED

執行緒處於阻塞狀態,即正在等待鎖被其他執行緒釋放。在其呼叫棧的棧頂方法會看到waiting to lock <地址>的提示,表示該方法試圖持有鎖,執行緒正在Entry Set區等待。

WAITING

執行緒處於無限等待的狀態。又分為兩種情況:

  • on object monitor:執行緒已經獲得鎖,呼叫了不帶超時引數的Object.wait()/Thread.join()方法,執行緒進入管程的Wait Set區。在其呼叫棧中會看到locked <地址>的提示。
  • parking:呼叫了LockSupport.park()方法,執行緒直接進入掛起狀態(park是Unsafe提供的低階原語)。在其呼叫棧的棧頂方法會看到parking to wait for <地址>的提示。
TIMED_WAITING

執行緒處於有限等待的狀態。它分為三種情況,除了與WAITING相同的on object monitor(獲得鎖並呼叫帶超時的Object.wait()/Thread.join()方法)和parking(呼叫帶超時的LockSupport.parkNanos()/parkUntil()方法)之外,還有一種sleep,即通過Thread.sleep()使執行緒睡眠。

通過分析執行緒快照的狀態和呼叫棧,可以讓我們快速地定位造成Java程式表現異常的癥結,如死鎖、熱鎖(很多執行緒競爭同一塊臨界區造成大量BLOCKED)、高CPU佔用、I/O長時間阻塞(注意此時執行緒狀態可能是RUNNABLE)等。下面舉兩個具體的例子。

用jstack診斷死鎖

死鎖(deadlock)是作業系統理論中的基礎概念,即在併發環境下,一個或多個執行緒在等待資源,但該資源又被其他程序所佔用的困局。死鎖的四個必要條件是:

  • 互斥(mutual exclusion)
  • 不可搶佔(no preemption)
  • 持有並等待(hold and wait)
  • 迴圈等待(circular wait)

下面我們用Java造一個死鎖,並用jstack來診斷它。

public class DeadlockExample {
  private static final Object lock1 = new Object();
  private static final Object lock2 = new Object();

  public static void main(String[] args) throws Exception {
    new Thread(() -> {
      for (int i = 0; i < 100; i++) {
        synchronized (lock1) {
          System.out.println("thread1 synchronized lock1");
          synchronized (lock2) {
            System.out.println("thread1 synchronized lock2");
          }
        }
      }
    }, "thread1").start();

    new Thread(() -> {
      for (int i = 0; i < 100; i++) {
        synchronized (lock2) {
          System.out.println("thread2 synchronized lock2");
          synchronized (lock1) {
            System.out.println("thread2 synchronized lock1");
          }
        }
      }
    }, "thread2").start();
  }
}

執行之,發現只輸出了幾句就停止了。

thread1 synchronized lock1
thread1 synchronized lock2
thread1 synchronized lock1
thread1 synchronized lock2
thread1 synchronized lock1
thread2 synchronized lock2

用jstack列印執行緒快照,節選部分如下。

"thread2" #20 prio=5 os_prio=31 tid=0x00007fad74020800 nid=0x6203 waiting for monitor entry [0x0000700006364000]
   java.lang.Thread.State: BLOCKED (on object monitor)
    at me.lmagics.DeadlockExample.lambda$main$1(DeadlockExample.java:28)
    - waiting to lock <0x00000007157d2a58> (a java.lang.Object)
    - locked <0x00000007157d2a68> (a java.lang.Object)
    at me.lmagics.DeadlockExample$$Lambda$2/501263526.run(Unknown Source)
    at java.lang.Thread.run(Thread.java:748)

"thread1" #19 prio=5 os_prio=31 tid=0x00007fad7401c000 nid=0x9903 waiting for monitor entry [0x0000700006261000]
   java.lang.Thread.State: BLOCKED (on object monitor)
    at me.lmagics.DeadlockExample.lambda$main$0(DeadlockExample.java:17)
    - waiting to lock <0x00000007157d2a68> (a java.lang.Object)
    - locked <0x00000007157d2a58> (a java.lang.Object)
    at me.lmagics.DeadlockExample$$Lambda$1/1394438858.run(Unknown Source)
    at java.lang.Thread.run(Thread.java:748)

Found one Java-level deadlock:
=============================
"thread2":
  waiting to lock monitor 0x00007fad65004168 (object 0x00000007157d2a58, a java.lang.Object),
  which is held by "thread1"
"thread1":
  waiting to lock monitor 0x00007fad650056b8 (object 0x00000007157d2a68, a java.lang.Object),
  which is held by "thread2"

Java stack information for the threads listed above:
===================================================
"thread2":
    at me.lmagics.DeadlockExample.lambda$main$1(DeadlockExample.java:28)
    - waiting to lock <0x00000007157d2a58> (a java.lang.Object)
    - locked <0x00000007157d2a68> (a java.lang.Object)
    at me.lmagics.DeadlockExample$$Lambda$2/501263526.run(Unknown Source)
    at java.lang.Thread.run(Thread.java:748)
"thread1":
    at me.lmagics.DeadlockExample.lambda$main$0(DeadlockExample.java:17)
    - waiting to lock <0x00000007157d2a68> (a java.lang.Object)
    - locked <0x00000007157d2a58> (a java.lang.Object)
    at me.lmagics.DeadlockExample$$Lambda$1/1394438858.run(Unknown Source)
    at java.lang.Thread.run(Thread.java:748)

Found 1 deadlock.

可見,我們不僅能夠發現兩個執行緒都處於BLOCKED狀態,並且jstack還直接給出了死鎖的詳細資訊,方便我們修改程式碼消除死鎖。

用jstack診斷高CPU佔用

接下來造一個有死迴圈的程式,模擬異常的CPU佔用。

public class InfiniteLoopExample {
  private static final Object lock = new Object();

  static class InfiniteLoopRunnable implements Runnable {
    @Override
    public void run() {
      synchronized (lock) {
        long l = 0;
        while (true) {
          l++;
        }
      }
    }
  }

  public static void main(String[] args) throws Exception {
    new Thread(new InfiniteLoopRunnable(), "thread1").start();
    new Thread(new InfiniteLoopRunnable(), "thread2").start();
  }
}

執行該程式,用jps命令找出該Java程序的PID,然後利用top -Hp <PID>命令找出吃CPU最多的那個執行緒。

使用jstack匯出執行緒快照到檔案中。由於執行緒ID是十六進位制表示的,所以我們要將執行緒ID轉換成十六進位制再grep。

這下我們就可以定位到異常程式碼的位置,並進行修改了。

The End

晚安晚安。



作者:LittleMagic
連結:https://www.jianshu.com/p/99562e9391b5
來源:簡書
著作權歸作者所有。商業轉載請聯絡作者獲得授權,非商業轉載請註明出處。