
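A Summary of Linux Time-Related Topics

These are only my notes on the time-management code in the kernel (3.10 kernel). They are messy, so please go easy on them.

Time comes first, timers second.

Which time structures are commonly used? Besides the well-known jiffies and jiffies_64, the following are also common:

ktime_t, which is used a lot in timers: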
union ktime {
    s64    tv64;
#if BITS_PER_LONG != 64 && !defined(CONFIG_KTIME_SCALAR)
    struct {
# ifdef __BIG_ENDIAN
    s32    sec, nsec;
# else
    s32    nsec, sec;
# endif
    } tv;
#endif
};

typedef union ktime ktime_t; /* Kill this */

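Then there are timespec, which is used a lot in fs code, the lower-precision timeval, and the time zone structure timezone. They are mainly used for timestamps and the like.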

struct timespec {
    __kernel_time_t    tv_sec;            /* seconds */
    long        tv_nsec;        /* nanoseconds */
};
struct timeval {
    __kernel_time_t         tv_sec;     /* seconds */
    __kernel_suseconds_t    tv_usec;    /* microseconds */
};
struct timezone {
    int tz_minuteswest; /* minutes west of Greenwich */
    int tz_dsttime;     /* type of dst correction */
};
 

Commonly used conversion functions between these structures:

/* convert a timespec to ktime_t format: */
static inline ktime_t timespec_to_ktime(struct timespec ts)
{
    return ktime_set(ts.tv_sec, ts.tv_nsec);
}

/* convert a timespec64 to ktime_t format: */
static inline ktime_t timespec64_to_ktime(struct timespec64 ts)
{
    return ktime_set(ts.tv_sec, ts.tv_nsec);
}

/* convert a timeval to ktime_t format: */
static inline ktime_t timeval_to_ktime(struct timeval tv)
{
    return ktime_set(tv.tv_sec, tv.tv_usec * NSEC_PER_USEC);
}

/* Map the ktime_t to timespec conversion to ns_to_timespec function */
#define ktime_to_timespec(kt) ns_to_timespec((kt).tv64)

/* Map the ktime_t to timespec conversion to ns_to_timespec function */
#define ktime_to_timespec64(kt) ns_to_timespec64((kt).tv64)

/* Map the ktime_t to timeval conversion to ns_to_timeval function */
#define ktime_to_timeval(kt) ns_to_timeval((kt).tv64)

/* Convert ktime_t to nanoseconds - NOP in the scalar storage format: */
#define ktime_to_ns(kt) ((kt).tv64)
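To make the conversions concrete, here is a minimal userspace sketch of the scalar (tv64) representation. The demo_* types and functions are hypothetical stand-ins for illustration, not the kernel's definitions, and the reverse conversion only handles non-negative values.

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_SEC  1000000000LL
#define NSEC_PER_USEC 1000LL

/* Userspace stand-ins for the kernel types (hypothetical names). */
struct demo_timespec { int64_t tv_sec; long tv_nsec; };
struct demo_timeval  { int64_t tv_sec; long tv_usec; };
typedef struct { int64_t tv64; } demo_ktime_t;

/* Same idea as ktime_set(): fold seconds and nanoseconds into one s64. */
static demo_ktime_t demo_ktime_set(int64_t secs, long nsecs)
{
    demo_ktime_t kt = { secs * NSEC_PER_SEC + nsecs };
    return kt;
}

/* Microseconds are scaled to nanoseconds before being folded in. */
static demo_ktime_t demo_timeval_to_ktime(struct demo_timeval tv)
{
    return demo_ktime_set(tv.tv_sec, (long)(tv.tv_usec * NSEC_PER_USEC));
}

/* Reverse direction: split the nanosecond count again.
 * This sketch assumes a non-negative value. */
static struct demo_timespec demo_ktime_to_timespec(demo_ktime_t kt)
{
    struct demo_timespec ts;

    assert(kt.tv64 >= 0);
    ts.tv_sec = kt.tv64 / NSEC_PER_SEC;
    ts.tv_nsec = (long)(kt.tv64 % NSEC_PER_SEC);
    return ts;
}
```

For example, a timeval of 2 s and 500000 us becomes a tv64 of 2500000000, which converts back to a timespec of 2 s and 500000000 ns.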

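What if you don't want such a high-precision timestamp? The kernel also provides the function below, which returns the time in whole seconds. Best of all, it is exported, so it is easy to use.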

unsigned long get_seconds(void)
{
    struct timekeeper *tk = &timekeeper;

    return tk->xtime_sec;
}
EXPORT_SYMBOL(get_seconds);

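Another interesting point: to read the maintained time at a finer granularity, the timekeeper variable has to be read under a sequence counter (seqcount), and the read is retried if a writer updated it in the meantime.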

struct timespec current_kernel_time(void)
{
    struct timekeeper *tk = &timekeeper;
    struct timespec64 now;
    unsigned long seq;

    do {
        seq = read_seqcount_begin(&timekeeper_seq);

        now = tk_xtime(tk);
    } while (read_seqcount_retry(&timekeeper_seq, seq));

    return timespec64_to_timespec(now);
}
EXPORT_SYMBOL(current_kernel_time);
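The retry loop above can be modelled in userspace. The following is a simplified, single-threaded sketch with hypothetical demo_* names. It leaves out the memory barriers a real sequence counter needs, and it masks the low bit of the counter instead of spinning while a write is in progress.

```c
#include <assert.h>

/* Hypothetical model of a value protected by a sequence counter.
 * Not thread-safe as written: a real one needs barriers or atomics. */
struct demo_clock {
    unsigned seq;   /* odd while a writer is in the middle of an update */
    long sec;
    long nsec;
};

static void demo_write(struct demo_clock *c, long sec, long nsec)
{
    c->seq++;       /* now odd: update in progress */
    c->sec = sec;
    c->nsec = nsec;
    c->seq++;       /* even again: update published */
}

/* Snapshot the counter with the low bit cleared, so that a read which
 * starts in the middle of a write can never pass the retry check. */
static unsigned demo_read_begin(const struct demo_clock *c)
{
    return c->seq & ~1u;
}

/* Non-zero if the counter moved since demo_read_begin(). */
static int demo_read_retry(const struct demo_clock *c, unsigned start)
{
    return c->seq != start;
}

/* Reader: copy both fields, retry until no write interfered. */
static void demo_read(const struct demo_clock *c, long *sec, long *nsec)
{
    unsigned seq;

    assert(c && sec && nsec);
    do {
        seq = demo_read_begin(c);
        *sec = c->sec;
        *nsec = c->nsec;
    } while (demo_read_retry(c, seq));
}
```

The point of the design is that readers never block the writer: they just notice afterwards that their snapshot may be torn and read again.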

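OK. Besides timestamps, the other major use of time is the expiry time of timers. Before describing how timers run, it is worth going through the main data structures of Linux time management.

The per-CPU base of the low-resolution timers: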

crash> tvec_base
struct tvec_base {
    spinlock_t lock;
    struct timer_list *running_timer;
    unsigned long timer_jiffies;
    unsigned long next_timer;
    unsigned long active_timers;
    struct tvec_root tv1;
    struct tvec tv2;
    struct tvec tv3;
    struct tvec tv4;
    struct tvec tv5;
    unsigned long all_timers;
}

The low-resolution timer structure:

struct timer_list {                     /* low-resolution timer */
    struct list_head entry;             /* links the timer into a list of the timer wheel; compare with the rb_node of the high-resolution timer */
    unsigned long expires;              /* expiry time, in jiffies */
    struct tvec_base *base;             /* points to one CPU's tvec_base */
    void (*function)(unsigned long);    /* callback */
    unsigned long data;
    int slack;
    int start_pid;
    void *start_site;
    char start_comm[16];
}

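The usual companion functions are add_timer, mod_timer, add_timer_on (add a timer on a specific CPU), del_timer, DEFINE_TIMER, setup_timer and so on. They are very common in the network stack code, where they are generally used to wait for a timeout. Since this is about timeouts, the precision requirement is not that high, so the implementation uses the well-known timer wheel.

The flow of add_timer is much the same as that of mod_timer. The first step is to check whether the timer is pending. Pending means the timer is currently queued on the timer wheel and waiting to expire; a timer that has already been taken off the wheel (for example because its callback is running) is not pending. The test is whether the next pointer of the timer's entry is non-NULL: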

static inline int timer_pending(const struct timer_list * timer)
{
    return timer->entry.next != NULL;
}

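In one sentence: a timer object that is queued and waiting to expire is pending. A pending timer is always on the timer wheel, and a timer that is not pending is not on it.

Next, naturally, a pending timer first has to be removed from its old position: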

static inline void detach_timer(struct timer_list *timer, bool clear_pending)
{
    struct list_head *entry = &timer->entry;

    debug_deactivate(timer);

    __list_del(entry->prev, entry->next);   /* unlink from the wheel; callers check timer_pending() first, so the entry is on a list here */
    if (clear_pending)
        entry->next = NULL;
    entry->prev = LIST_POISON2;
}
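The relationship between timer_pending and detach_timer can be tried out with a small userspace model. The demo_* names are hypothetical, and the model uses NULL where the kernel writes LIST_POISON2.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal doubly linked list, modelled on the kernel's list_head. */
struct demo_list { struct demo_list *next, *prev; };
struct demo_timer { struct demo_list entry; unsigned long expires; };

static void demo_list_init(struct demo_list *head)
{
    head->next = head->prev = head;
}

static void demo_list_add_tail(struct demo_list *node, struct demo_list *head)
{
    node->prev = head->prev;
    node->next = head;
    head->prev->next = node;
    head->prev = node;
}

/* Pending means: linked into some list of the wheel. */
static int demo_timer_pending(const struct demo_timer *t)
{
    return t->entry.next != NULL;
}

/* Unlink a pending timer and mark it as no longer pending. */
static void demo_detach_timer(struct demo_timer *t)
{
    assert(demo_timer_pending(t));   /* only valid for queued timers */
    t->entry.prev->next = t->entry.next;
    t->entry.next->prev = t->entry.prev;
    t->entry.next = NULL;            /* this is what timer_pending tests */
    t->entry.prev = NULL;            /* the kernel stores LIST_POISON2 here */
}
```

A freshly initialized timer with entry.next == NULL is not pending. It becomes pending once it is linked into a slot and stops being pending again after it is detached.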

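Then, according to the timer's expiry time, the timer is added to the matching vec of the timer wheel. Two things change: the timer's base, and the position of the timer's entry.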
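How the expiry time selects a vec can be sketched as follows. This is a userspace approximation of the slot selection as I understand it for this kernel generation, assuming the common configuration of 8 bits for tv1 and 6 bits for each of tv2 to tv5. The demo_* names are hypothetical.

```c
#include <assert.h>

#define TVN_BITS 6
#define TVR_BITS 8
#define TVN_SIZE (1 << TVN_BITS)
#define TVR_SIZE (1 << TVR_BITS)
#define TVN_MASK (TVN_SIZE - 1)
#define TVR_MASK (TVR_SIZE - 1)
#define MAX_TVAL ((unsigned long)((1ULL << (TVR_BITS + 4 * TVN_BITS)) - 1))

/* level 1..5 stands for tv1..tv5, index is the slot inside that level */
struct demo_slot { int level; unsigned int index; };

static struct demo_slot demo_pick_slot(unsigned long expires,
                                       unsigned long timer_jiffies)
{
    unsigned long idx = expires - timer_jiffies;   /* distance in jiffies */
    struct demo_slot s;

    if (idx < TVR_SIZE) {
        s.level = 1;
        s.index = expires & TVR_MASK;
    } else if (idx < 1UL << (TVR_BITS + TVN_BITS)) {
        s.level = 2;
        s.index = (expires >> TVR_BITS) & TVN_MASK;
    } else if (idx < 1UL << (TVR_BITS + 2 * TVN_BITS)) {
        s.level = 3;
        s.index = (expires >> (TVR_BITS + TVN_BITS)) & TVN_MASK;
    } else if (idx < 1UL << (TVR_BITS + 3 * TVN_BITS)) {
        s.level = 4;
        s.index = (expires >> (TVR_BITS + 2 * TVN_BITS)) & TVN_MASK;
    } else if ((long)idx < 0) {
        /* already in the past: use the tv1 slot that runs next */
        s.level = 1;
        s.index = timer_jiffies & TVR_MASK;
    } else {
        /* very far away: clamp to the largest distance tv5 can hold */
        if (idx > MAX_TVAL)
            expires = MAX_TVAL + timer_jiffies;
        s.level = 5;
        s.index = (expires >> (TVR_BITS + 3 * TVN_BITS)) & TVN_MASK;
    }
    assert(s.index < (unsigned int)TVR_SIZE);
    return s;
}
```

With timer_jiffies at 1000, a timer 10 jiffies away lands in tv1, one 300 jiffies away lands in tv2, and one 20000 jiffies away lands in tv3.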

crash> p tvec_bases:0
per_cpu(tvec_bases, 0) = $30 = (struct tvec_base *) 0xffffffff81ea71c0 <boot_tvec_bases>
crash> tvec_bases
PER-CPU DATA TYPE:
  struct tvec_base *tvec_bases;
PER-CPU ADDRESSES:
  [0]: ffff8827dca13948
  [1]: ffff8827dca53948
  [2]: ffff8827dca93948
  [3]: ffff8827dcad3948
  [4]: ffff8827dcb13948
  [5]: ffff8827dcb53948
  [6]: ffff8827dcb93948
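....

One more detail concerns the timer's base. Because it is a pointer and is at least 4-byte aligned, its low two bits are always 0, so they are used as flag bits. When the base pointer is read out of a timer, these two bits must be masked off first. Dereferencing the raw value directly would cause an invalid access.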
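The low-bit trick can be shown with a short userspace sketch. The DEMO_* constants mirror what I believe the 3.10 flag values to be (a deferrable bit, an irqsafe bit and a two-bit mask), but the names and helpers here are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_TIMER_DEFERRABLE 0x1UL
#define DEMO_TIMER_IRQSAFE    0x2UL
#define DEMO_TIMER_FLAG_MASK  0x3UL

/* Stand-in for tvec_base; any type with at least 4-byte alignment works. */
struct demo_base { unsigned long timer_jiffies; };

/* Store flags in the two low bits of an aligned pointer. */
static uintptr_t demo_pack(struct demo_base *base, unsigned long flags)
{
    assert(((uintptr_t)base & DEMO_TIMER_FLAG_MASK) == 0);
    return (uintptr_t)base | (flags & DEMO_TIMER_FLAG_MASK);
}

/* Mask the flag bits off before the pointer is dereferenced. */
static struct demo_base *demo_get_base(uintptr_t tagged)
{
    return (struct demo_base *)(tagged & ~(uintptr_t)DEMO_TIMER_FLAG_MASK);
}

static unsigned long demo_get_flags(uintptr_t tagged)
{
    return (unsigned long)(tagged & DEMO_TIMER_FLAG_MASK);
}
```

The BUILD_BUG_ON in init_timers exists for the same reason as the assert in demo_pack: the trick only works when the alignment of the base leaves the flag bits free.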

 

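Because low-resolution timers use jiffies as their finest granularity, their precision is limited. As hardware and multimedia brought stricter real-time requirements, high-resolution timers were introduced later, with nanosecond granularity. The high-resolution timer structure: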

crash> hrtimer
struct hrtimer {
    struct timerqueue_node node;                          /* used to insert the timer into the red-black tree */
    ktime_t _softexpires;                                 /* (soft) expiry time */
    enum hrtimer_restart (*function)(struct hrtimer *);   /* callback, always present; it has only two possible return values */
    struct hrtimer_clock_base *base;                      /* as with low-resolution timers, points to a per-CPU base, but the base structure differs from the tvec_base used by timer_list */
    unsigned long state;
    int start_pid;
    void *start_site;
    char start_comm[16];
}
SIZE: 96

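The base it points to belongs to the per-CPU hrtimer_bases. Do not confuse it with the base of the low-resolution timers, which is the per-CPU tvec_base.

High-resolution timers are also not indexed by the vecs used for low-resolution timers. They are managed by a red-black tree.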

crash> timerqueue_head
struct timerqueue_head {
    struct rb_root head;
    struct timerqueue_node *next;
}
SIZE: 16
crash> hrtimer_clock_base
struct hrtimer_clock_base {
    struct hrtimer_cpu_base *cpu_base;
    int index;
    clockid_t clockid;
    struct timerqueue_head active;   /* wrapper around the red-black tree that holds the hrtimers of this clock type */
    ktime_t resolution;
    ktime_t (*get_time)(void);
    ktime_t rh_reserved_softirq_time;
    ktime_t offset;
}

crash> hrtimer_cpu_base
struct hrtimer_cpu_base {
    raw_spinlock_t lock;
    unsigned int active_bases;
    unsigned int clock_was_set;
    ktime_t expires_next;
    int hres_active;
    int hang_detected;
    unsigned long nr_events;
    unsigned long nr_retries;
    unsigned long nr_hangs;
    ktime_t max_hang_time;
    struct hrtimer_clock_base clock_base[4];   /* plays the role that the vecs play in the timer wheel: it manages the timers, grouped by clockid */
    int cpu;
    int in_hrtirq;
}

The corresponding per-CPU management structure, for comparison with the low-resolution tvec_base. All of hrtimer_interrupt is built around this variable:

crash> hrtimer_bases
PER-CPU DATA TYPE:
  struct hrtimer_cpu_base hrtimer_bases;
PER-CPU ADDRESSES:
  [0]: ffff8827dca13960
  [1]: ffff8827dca53960
  [2]: ffff8827dca93960
  [3]: ffff8827dcad3960
  [4]: ffff8827dcb13960
  [5]: ffff8827dcb53960
  [6]: ffff8827dcb93960
  [7]: ffff8827dcbd3960
  [8]: ffff8827dcc13960
  [9]: ffff8827dcc53960
  [10]: ffff8827dcc93960
....
 

crash> p hrtimer_bases:0
per_cpu(hrtimer_bases, 0) = $16 = {
  lock = {
    raw_lock = {
      val = {
        counter = 0
      }
    }
  },
  active_bases = 3,
  clock_was_set = 6,
  expires_next = {
    tv64 = 558945095814132
  },
  hres_active = 1,
  hang_detected = 0,
  nr_events = 2303159495,
  nr_retries = 5938805,
  nr_hangs = 5,
  max_hang_time = {
    tv64 = 21681
  },
  clock_base = {{
      cpu_base = 0xffff8827dca13960,
      index = 0,
      clockid = 1,
      active = {
        head = {
          rb_node = 0xffff881677e57e88
        },
        next = 0xffffe8d01d20f220
      },
      resolution = {
        tv64 = 1
      },
      get_time = 0xffffffff810f0670 <ktime_get>,
      rh_reserved_softirq_time = {
        tv64 = 0
      },
      offset = {
        tv64 = 0
      }
    }, {
      cpu_base = 0xffff8827dca13960,
      index = 1,
      clockid = 0,
      active = {
        head = {
          rb_node = 0xffff881c433fbd38
        },
        next = 0xffff884a744a7d38
      },
      resolution = {
        tv64 = 1
      },
      get_time = 0xffffffff810f0ad0 <ktime_get_real>,
      rh_reserved_softirq_time = {
        tv64 = 0
      },
      offset = {
        tv64 = 1540819482621868102
      }
    }, {
      cpu_base = 0xffff8827dca13960,
      index = 2,
      clockid = 7,
      active = {
        head = {
          rb_node = 0x0
        },
        next = 0x0
      },
      resolution = {
        tv64 = 1
      },
      get_time = 0xffffffff810f0c40 <ktime_get_boottime>,
      rh_reserved_softirq_time = {
        tv64 = 0
      },
      offset = {
        tv64 = 0
      }
    }, {
      cpu_base = 0xffff8827dca13960,
      index = 3,
      clockid = 11,
      active = {
        head = {
          rb_node = 0x0
        },
        next = 0x0
      },
      resolution = {
        tv64 = 1
      },
      get_time = 0xffffffff810f08f0 <ktime_get_clocktai>,
      rh_reserved_softirq_time = {
        tv64 = 0
      },
      offset = {
        tv64 = 1540819482621868102
      }
    }},
  cpu = 0,
  in_hrtirq = 0
}
 

 

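Both timer subsystems are initialized in start_kernel:

asmlinkage void __init start_kernel(void)
{
    ....
    init_timers();      /* initialize the (low-resolution) timer subsystem */
    hrtimers_init();    /* initialize the high-resolution timer subsystem */
    ....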
}

Having compared the definitions of the two kinds of timers, comparing how they are run helps the picture sink in.

For low-resolution timers:

void __init init_timers(void)
{
    int err;

    /* ensure there are enough low bits for flags in timer->base pointer */
    BUILD_BUG_ON(__alignof__(struct tvec_base) & TIMER_FLAG_MASK);

    err = timer_cpu_notify(&timers_nb, (unsigned long)CPU_UP_PREPARE,
                   (void *)(long)smp_processor_id());
    init_timer_stats();

    BUG_ON(err != NOTIFY_OK);
    register_cpu_notifier(&timers_nb);
    open_softirq(TIMER_SOFTIRQ, run_timer_softirq);
}
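
After TIMER_SOFTIRQ is raised, run_timer_softirq-->__run_timers runs the callbacks of all expired timers. The base->lock spinlock is held while expired timers are taken off the wheel, but it is dropped around each callback. The callbacks still run in softirq context, so they must not sleep and should not take long.

High-resolution timers have two modes, so the call flow has to be described for each of them.

In low-resolution mode, the periodic update_process_times-->run_local_timers-->hrtimer_run_queues-->__hrtimer_run_queues path runs the callbacks of the high-resolution timers.

update_process_times calls run_local_timers to raise the TIMER_SOFTIRQ softirq, and run_timer_softirq calls __run_timers to handle it.

Besides raising the softirq, run_local_timers also calls hrtimer_run_queues(), which checks whether the system can switch from low resolution to high resolution:

run_local_timers
    |-->hrtimer_run_queues (handles the resolution switch)-->hrtimer_switch_to_hres-->tick_setup_sched_timer
    |-->raise_softirq(TIMER_SOFTIRQ)

In high-resolution mode, the periodic update_process_times-->run_local_timers-->hrtimer_run_queues path still runs, but it does not call __hrtimer_run_queues. Instead, hrtimer_interrupt calls __hrtimer_run_queues-->__run_hrtimer to run the timers.

The call chain is:

hrtimer_interrupt-->__hrtimer_run_queues-->__run_hrtimer-->callback.

This differs from most descriptions found online, because those mostly describe 2.6 kernels. Where exactly the processing happens is not that important. What matters is understanding the data structures and the call flow.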

 

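To summarize:

  • Each CPU has one tvec_base structure;

  • tvec_base manages 5 arrays (tv1 to tv5) that cover different timeout ranges; its time base is jiffies.

  • When a timer is added to the timer wheel, the expiry time in its timer_list decides which vec it goes into;

  • The timer wheel is processed in order of expiry: once the first-level vec has been fully processed, one array element of the second level is taken and used to refill the 256 slots of the first level;

  • __run_timers runs all expired low-resolution timers.

  • Each CPU has one hrtimer_cpu_base structure;
  • hrtimer_cpu_base manages the hrtimers of 4 time bases (clock_base[4] in the dump above): monotonic time, real time, boot time and TAI; its time unit is the nanosecond.
  • Each time base reaches its own red-black tree through its active field (an embedded timerqueue_head structure);
  • In the red-black tree, timers are sorted by expiry time; the hrtimer that expires first is the leftmost node and is also recorded in the active.next field;
  • The earliest expiry may differ between the 4 time bases, so the earliest of them all is recorded in the expires_next field of hrtimer_cpu_base.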
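The last point of the summary can be sketched as a small function. This is a simplified userspace model with hypothetical demo_* names: each base contributes the expiry of its leftmost timer, converted to a common time line by subtracting the base's offset, and the minimum becomes expires_next.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_MAX_BASES 4

struct demo_clock_base {
    int has_timer;          /* 0 when the tree is empty (active.next is NULL) */
    int64_t next_expires;   /* expiry of the leftmost timer, in this base's clock */
    int64_t offset;         /* offset of this clock from the monotonic clock */
};

/* Returns the earliest expiry over all bases on the monotonic time line,
 * or INT64_MAX when no timer is queued at all. */
static int64_t demo_expires_next(const struct demo_clock_base *bases, int n)
{
    int64_t best = INT64_MAX;
    int i;

    assert(bases && n >= 0 && n <= DEMO_MAX_BASES);
    for (i = 0; i < n; i++) {
        int64_t e;

        if (!bases[i].has_timer)
            continue;
        e = bases[i].next_expires - bases[i].offset;
        if (e < best)
            best = e;
    }
    return best;
}
```

Keeping the leftmost node cached per base means this scan touches at most one timer per base, no matter how many timers are queued.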
 

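One thing to note: for high-resolution timers to be effective, a high-precision clock source is needed. When no such clock source is available, high-resolution timers still work, but with reduced precision.

Speaking of clock sources: on my machine, with the 3.10 kernel, they are wrapped in a structure called clocksource: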

crash> list clocksource.list -H clocksource_list
ffffffff81a273c0
ffffffff81a2bb40
ffffffff81aebb80
ffffffff81eb5980
ffffffff81a52c40
crash> clocksource ffffffff81a273c0
struct clocksource {
  read = 0xffffffff81032e20 <read_tsc>,   /* placed first because it is the most frequently used member, unlike the 2.6.18 series; putting the hottest members first in a structure helps cache hits */
  cycle_last = 2592996216546832,
  mask = 18446744073709551615,
  mult = 4194304,
  shift = 23,
  max_idle_ns = 428122390528,
  maxadj = 461373,
  archdata = {
    vclock_mode = 1
  },
  name = 0xffffffff819217b4 "tsc",
  list = {
    next = 0xffffffff81a2bb78 <clocksource_hpet+56>,
    prev = 0xffffffff81a52c30 <clocksource_list>
  },
  rating = 300,   /* the highest rating of the clocksources listed here */
  enable = 0x0,
  disable = 0x0,
  flags = 35,
  suspend = 0x0,
  resume = 0x0,
  owner = 0x0
}
crash> clocksource ffffffff81a2bb40
struct clocksource {
  read = 0xffffffff81062430 <read_hpet>,
  cycle_last = 103666886,
  mask = 4294967295,
  mult = 2796202783,
  shift = 26,
  max_idle_ns = 69681373356,
  maxadj = 307582306,
  archdata = {
    vclock_mode = 2
  },
  name = 0xffffffff818ff927 "hpet",
  list = {
    next = 0xffffffff81aebbb8 <clocksource_acpi_pm+56>,
    prev = 0xffffffff81a273f8 <clocksource_tsc+56>
  },
  rating = 250,
  enable = 0x0,
  disable = 0x0,
  flags = 33,
  suspend = 0x0,
  resume = 0xffffffff810619e0 <hpet_resume_counter>,
  owner = 0x0
}
crash> clocksource ffffffff81aebb80
struct clocksource {
  read = 0xffffffff8153cb10 <acpi_pm_read>,
  cycle_last = 0,
  mask = 16777215,
  mult = 2343484437,
  shift = 23,
  max_idle_ns = 3649976793,
  maxadj = 257783288,
  archdata = {
    vclock_mode = 0
  },
  name = 0xffffffff8191e0b6 "acpi_pm",
  list = {
    next = 0xffffffff81eb59b8 <refined_jiffies+56>,
    prev = 0xffffffff81a2bb78 <clocksource_hpet+56>
  },
  rating = 200,
  enable = 0x0,
  disable = 0x0,
  flags = 33,
  suspend = 0x0,
  resume = 0x0,
  owner = 0x0
}
crash> clocksource ffffffff81eb5980
struct clocksource {
  read = 0xffffffff810f3290 <jiffies_read>,
  cycle_last = 0,
  mask = 4294967295,
  mult = 255961088,
  shift = 8,
  max_idle_ns = 3344197395684985,
  maxadj = 28155719,
  archdata = {
    vclock_mode = 0
  },
  name = 0xffffffff8191e0fb "refined-jiffies",
  list = {
    next = 0xffffffff81a52c78 <clocksource_jiffies+56>,
    prev = 0xffffffff81aebbb8 <clocksource_acpi_pm+56>
  },
  rating = 2,   /* very low rating; only plain jiffies (rating 1) is lower */
  enable = 0x0,
  disable = 0x0,
  flags = 0,
  suspend = 0x0,
  resume = 0x0,
  owner = 0x0
}
crash> clocksource ffffffff81a52c40
struct clocksource {
  read = 0xffffffff810f3290 <jiffies_read>,
  cycle_last = 4294669298,
  mask = 4294967295,
  mult = 256000000,
  shift = 8,
  max_idle_ns = 3344705780981250,
  maxadj = 28160000,
  archdata = {
    vclock_mode = 0
  },
  name = 0xffffffff8191e103 "jiffies",
  list = {
    next = 0xffffffff81a52c30 <clocksource_list>,
    prev = 0xffffffff81eb59b8 <refined_jiffies+56>
  },
  rating = 1,
  enable = 0x0,
  disable = 0x0,
  flags = 0,
  suspend = 0x0,
  resume = 0x0,
  owner = 0x0
}
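The mult and shift members in these dumps describe how raw counter cycles are scaled to nanoseconds: ns = (cycles * mult) >> shift. A minimal sketch of that arithmetic, using a hypothetical demo_cyc2ns helper and the values from the dumps above:

```c
#include <assert.h>
#include <stdint.h>

/* Scale a cycle count to nanoseconds with a fixed-point factor:
 * mult / 2^shift is the number of nanoseconds per cycle.
 * The product can overflow 64 bits for very large cycle counts. */
static uint64_t demo_cyc2ns(uint64_t cycles, uint32_t mult, uint32_t shift)
{
    assert(shift < 64);
    return (cycles * mult) >> shift;
}
```

For the tsc entry, 4194304 / 2^23 is 0.5 ns per cycle, which corresponds to a TSC of about 2 GHz. For the jiffies entry, 256000000 / 2^8 is 1000000 ns per jiffy, which corresponds to HZ=1000. For the hpet entry, 2796202783 / 2^26 is about 41.67 ns per cycle.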

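Users can switch the clocksource by hand. For example, my environment has three usable clocksources, tsc, hpet and acpi_pm (fewer than the list seen in crash, which also contains refined-jiffies and jiffies):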

# cat /sys/devices/system/clocksource/clocksource0/available_clocksource
tsc hpet acpi_pm
# cat /sys/devices/system/clocksource/clocksource0/current_clocksource
tsc
# cat /sys/devices/system/clocksource/clocksource0/unbind_clocksource
cat: /sys/devices/system/clocksource/clocksource0/unbind_clocksource: Permission denied
# cat /sys/devices/system/clocksource/clocksource0/current_clocksource
tsc
# ls -alrt /sys/devices/system/clocksource/clocksource0/current_clocksource
-rw-r--r-- 1 root root 4096 Oct 30 09:52 /sys/devices/system/clocksource/clocksource0/current_clocksource
# echo hpet > /sys/devices/system/clocksource/clocksource0/current_clocksource
# cat /sys/devices/system/clocksource/clocksource0/current_clocksource
hpet
# echo tsc > /sys/devices/system/clocksource/clocksource0/current_clocksource

A message is printed after each switch. Sometimes the same kind of message also shows up in the kernel log when the kernel switches the clocksource on its own.

[44890.290544] Switched to clocksource hpet
[44902.121090] Switched to clocksource tsc

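Having covered the definition and use of clock sources, the next important concept is the clock event device.

A clock event device allows one event to be registered that will fire at a given point in the future. Compared with a timer implementation, however, it can only hold a single event.

An example of a clock_event_device:

crash> clock_event_device ffff8827dcf11140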
struct clock_event_device {
  event_handler = 0xffffffff810b9890 <hrtimer_interrupt>,
  set_next_event = 0xffffffff81053df0 <lapic_next_deadline>,
  set_next_ktime = 0x0,
  next_event = {
    tv64 = 62900678796701
  },
  max_delta_ns = 2199023255551,
  min_delta_ns = 1000,
  mult = 8388608,
  shift = 27,
  mode = CLOCK_EVT_MODE_ONESHOT,
  features = 2,   /* feature flags; 2 is CLOCK_EVT_FEAT_ONESHOT, i.e. one-shot mode */
  retries = 19117,
  broadcast = 0xffffffff81053e30 <lapic_timer_broadcast>,
  set_mode = 0xffffffff81054620 <lapic_timer_setup>,
  suspend = 0x0,
  resume = 0x0,
  min_delta_ticks = 15,
  max_delta_ticks = 18446744073709551615,
  name = 0xffffffff818fefdd "lapic",   /* name of the event device */
  rating = 150,
  irq = -1,
  bound_on = 0,
  cpumask = 0xffffffff816e7c60 <cpu_bit_bitmap+26240>,
  list = {
    next = 0xffff8857bc2d11d8,
    prev = 0xffff8827dcf511d8
  },
  owner = 0x0
}

 

/*
 * Clock event features
 */
#define CLOCK_EVT_FEAT_PERIODIC        0x000001
#define CLOCK_EVT_FEAT_ONESHOT        0x000002
#define CLOCK_EVT_FEAT_KTIME        0x000004
/*
 * x86(64) specific misfeatures:
 *
 * - Clockevent source stops in C3 State and needs broadcast support.
 * - Local APIC timer is used as a dummy device.
 */
#define CLOCK_EVT_FEAT_C3STOP        0x000008
#define CLOCK_EVT_FEAT_DUMMY        0x000010

/*
 * Core shall set the interrupt affinity dynamically in broadcast mode
 */
#define CLOCK_EVT_FEAT_DYNIRQ        0x000020

/*
 * Clockevent device is based on a hrtimer for broadcast
 */
#define CLOCK_EVT_FEAT_HRTIMER        0x000080
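
The features value in the dump is a bitmask of these flags. A small userspace sketch of decoding it follows. The flag values are copied from the definitions above, while the demo_* names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define CLOCK_EVT_FEAT_PERIODIC 0x000001
#define CLOCK_EVT_FEAT_ONESHOT  0x000002
#define CLOCK_EVT_FEAT_KTIME    0x000004
#define CLOCK_EVT_FEAT_C3STOP   0x000008
#define CLOCK_EVT_FEAT_DUMMY    0x000010
#define CLOCK_EVT_FEAT_DYNIRQ   0x000020
#define CLOCK_EVT_FEAT_HRTIMER  0x000080

static const struct { unsigned int bit; const char *name; } demo_feat_names[] = {
    { CLOCK_EVT_FEAT_PERIODIC, "PERIODIC" },
    { CLOCK_EVT_FEAT_ONESHOT,  "ONESHOT"  },
    { CLOCK_EVT_FEAT_KTIME,    "KTIME"    },
    { CLOCK_EVT_FEAT_C3STOP,   "C3STOP"   },
    { CLOCK_EVT_FEAT_DUMMY,    "DUMMY"    },
    { CLOCK_EVT_FEAT_DYNIRQ,   "DYNIRQ"   },
    { CLOCK_EVT_FEAT_HRTIMER,  "HRTIMER"  },
};

/* Writes the names of the set flags into buf, separated by spaces,
 * and returns how many known flags were set. */
static int demo_decode_features(unsigned int features, char *buf, size_t len)
{
    int count = 0;
    size_t i;

    assert(buf && len > 0);
    buf[0] = '\0';
    for (i = 0; i < sizeof(demo_feat_names) / sizeof(demo_feat_names[0]); i++) {
        if (!(features & demo_feat_names[i].bit))
            continue;
        if (count)
            strncat(buf, " ", len - strlen(buf) - 1);
        strncat(buf, demo_feat_names[i].name, len - strlen(buf) - 1);
        count++;
    }
    return count;
}
```

Decoding the value 2 from the lapic dump yields the single flag ONESHOT.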

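Each piece of clock hardware registers a clock event device, which is paired with a tick device (tick_device).

struct tick_device {
    struct clock_event_device *evtdev;
    enum tick_device_mode mode;
};

As can be seen, a tick device is just a thin wrapper around a clock event device.

For the sake of precision, the system carries two sets of timers: the low-resolution timers on the timer wheel and the high-resolution hrtimers. The periodic tick path (run_local_timers) calls hrtimer_run_queues to process the high-resolution timer queues, even when the underlying clock event device only offers low resolution. This makes it possible to use the existing framework without caring about the resolution of the clock.

To save power, the system also introduced the tickless model, i.e. the nohz model. In essence it turns the formerly periodic tick into one that fires on demand. Periodic work that still needs a tick is handled by one designated CPU, and the other CPUs can rest when they have nothing to do.

nohz_mode currently has three values: nohz not enabled, dynamic ticks with the system in low-resolution mode, and dynamic ticks with the system in high-resolution mode.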

struct tick_sched {
    struct hrtimer            sched_timer;    /* in high-resolution mode, the timer that emulates the periodic tick */
    unsigned long            check_clocks;
    enum tick_nohz_mode        nohz_mode;    /* one of the three modes */
    ktime_t                last_tick;
    ktime_t                next_tick;
    int                inidle;
    int                tick_stopped;
    unsigned long            idle_jiffies;
    unsigned long            idle_calls;
    unsigned long            idle_sleeps;
    int                idle_active;
    ktime_t                idle_entrytime;
    ktime_t                idle_waketime;
    ktime_t                idle_exittime;
    ktime_t                idle_sleeptime;
    ktime_t                iowait_sleeptime;
    ktime_t                sleep_length;
    unsigned long            last_jiffies;
    u64                next_timer;
    ktime_t                idle_expires;
    int                do_timer_last;
};
 
crash> tick_sched
struct tick_sched {
    struct hrtimer sched_timer;
    unsigned long check_clocks;
    enum tick_nohz_mode nohz_mode;
    ktime_t last_tick;
    ktime_t next_tick;
    int inidle;
    int tick_stopped;
    unsigned long idle_jiffies;
    unsigned long idle_calls;
    unsigned long idle_sleeps;
    int idle_active;
    ktime_t idle_entrytime;
    ktime_t idle_waketime;
    ktime_t idle_exittime;
    ktime_t idle_sleeptime;
    ktime_t iowait_sleeptime;
    ktime_t sleep_length;
    unsigned long last_jiffies;
    u64 next_timer;
    ktime_t idle_expires;
    int do_timer_last;
}

The statistics collected in tick_sched are exported to user space through /proc/timer_list.

 

crash> tick_cpu_sched
PER-CPU DATA TYPE:
  struct tick_sched tick_cpu_sched;
PER-CPU ADDRESSES:
  [0]: ffff8827dca13f20
  [1]: ffff8827dca53f20
  [2]: ffff8827dca93f20
  [3]: ffff8827dcad3f20
....

crash> p tick_cpu_sched:0
per_cpu(tick_cpu_sched, 0) = $18 = {
  sched_timer = {
    node = {
      node = {
        __rb_parent_color = 18446612303169076872,
        rb_right = 0x0,
        rb_left = 0x0
      },
      expires = {
        tv64 = 579956705000000
      }
    },
    _softexpires = {
      tv64 = 579956705000000
    },
    function = 0xffffffff810f9170 <tick_sched_timer>,
    base = 0xffff8827dca139a0,
    state = 1,
    start_pid = 0,
    start_site = 0xffffffff810f95c2 <tick_nohz_stop_sched_tick+690>,
    start_comm = "swapper/0\000\000\000\000\000\000"
  },
  check_clocks = 1,
  nohz_mode = NOHZ_MODE_HIGHRES,
  last_tick = {
    tv64 = 579956549000000
  },
  next_tick = {
    tv64 = 579956705000000
  },
  inidle = 1,
  tick_stopped = 1,
  idle_jiffies = 4874623845,
  idle_calls = 2616662184,
  idle_sleeps = 2409217702,
  idle_active = 1,
  idle_entrytime = {
    tv64 = 579956548218826
  },
  idle_waketime = {
    tv64 = 579956548210360
  },
  idle_exittime = {
    tv64 = 579956548213511
  },
  idle_sleeptime = {
    tv64 = 548323643001010
  },
  iowait_sleeptime = {
    tv64 = 53588522970
  },
  sleep_length = {
    tv64 = 515114
  },
  last_jiffies = 4874623845,
  next_timer = 579956705000000,
  idle_expires = {
    tv64 = 579956705000000
  },
  do_timer_last = 0
}

The tick is disabled per CPU. Generally speaking, the probability that all CPUs are idle at once is fairly low.

 

crash> p tick_next_period
tick_next_period = $19 = {
  tv64 = 580792669000000
}
crash> p tick_next_period
tick_next_period = $20 = {
  tv64 = 580794124000000
}
crash> p tick_next_period
tick_next_period = $21 = {
  tv64 = 580795263000000
}
crash> p tick_next_period
tick_next_period = $22 = {
  tv64 = 580796247000000
}
crash> p last_jiffies_update
last_jiffies_update = $23 = {
  tv64 = 580801981000000
}
crash> p last_jiffies_update
last_jiffies_update = $24 = {
  tv64 = 580802792000000
}
crash> p last_jiffies_update
last_jiffies_update = $25 = {
  tv64 = 580803530000000
}

 

Time-related system calls and external settings:

   the adjtimex system call and NTP settings.

 

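Kernel working modes:

  1. A low-resolution system without dynamic ticks always uses the periodic tick. One-shot mode is not used in this case.

  2. A low-resolution system with dynamic ticks enabled uses the clock device in one-shot mode.

  3. A high-resolution system always uses one-shot mode, whether or not dynamic ticks are enabled.

  4. In a high-resolution system, each CPU uses one hrtimer to emulate the periodic tick. Emulating a low precision with a high one is easy, and this way the tick also fits into the high-resolution framework. The emulation function is tick_sched_timer.

Final handler functions without broadcast:

High-resolution dynamic ticks: hrtimer_interrupt

High-resolution periodic ticks: hrtimer_interrupt

Low-resolution dynamic ticks: tick_nohz_handler

Low-resolution periodic ticks: tick_handle_periodic

Final handler functions with broadcast:

High-resolution dynamic ticks: tick_handle_oneshot_broadcast

High-resolution periodic ticks: tick_handle_oneshot_broadcast

Low-resolution dynamic ticks: tick_handle_oneshot_broadcast

Low-resolution periodic ticks: tick_handle_periodic_broadcast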
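
The two handler lists boil down to a small decision table, sketched here as a lookup function. The function demo_tick_handler and its three boolean parameters are hypothetical. It only returns the handler names listed above and does not reproduce how the kernel actually installs them.

```c
#include <assert.h>
#include <string.h>

/* Returns the name of the final event handler for a given combination
 * of resolution mode, dynamic ticks and broadcast, per the lists above. */
static const char *demo_tick_handler(int highres, int dynticks, int broadcast)
{
    assert((highres == 0 || highres == 1) &&
           (dynticks == 0 || dynticks == 1) &&
           (broadcast == 0 || broadcast == 1));

    if (broadcast) {
        /* only the low-resolution periodic case stays periodic */
        if (!highres && !dynticks)
            return "tick_handle_periodic_broadcast";
        return "tick_handle_oneshot_broadcast";
    }
    if (highres)
        return "hrtimer_interrupt";   /* with or without dynamic ticks */
    return dynticks ? "tick_nohz_handler" : "tick_handle_periodic";
}
```

The table makes the pattern visible: anything that runs in one-shot mode shares a handler, and only the low-resolution periodic configuration uses the periodic handlers.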

 

References:

Linux 3.10 kernel source
Original: https://blog.csdn.net/goodluckwhh/article/details/9048565