Linux Kernel Source Code Scenario Analysis (Linux 內核源碼情景分析), chap 2: Memory Management (4)

阿新 • Published: 2017-08-10


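Use and turnover of physical pages

1. Some terms

1.1 Virtual page

A fixed-size region of the virtual address space, together with its contents, whose boundaries are aligned to the page size (4 KB).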
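
To make "aligned to the page size" concrete, here is a small sketch that splits an address into its page start, page number, and in-page offset. The DEMO_ names are made up for this example (the kernel has similarly named PAGE_SHIFT, PAGE_SIZE and PAGE_MASK macros), and a 4 KB page is assumed.

```c
/* Hypothetical constants for a 4 KB page; not taken from the kernel headers. */
#define DEMO_PAGE_SHIFT 12
#define DEMO_PAGE_SIZE  (1UL << DEMO_PAGE_SHIFT)
#define DEMO_PAGE_MASK  (~(DEMO_PAGE_SIZE - 1))

/* Address of the first byte of the page that contains addr. */
static unsigned long demo_page_start(unsigned long addr)
{
    return addr & DEMO_PAGE_MASK;
}

/* Offset of addr inside its page (0 .. DEMO_PAGE_SIZE - 1). */
static unsigned long demo_page_offset(unsigned long addr)
{
    return addr & ~DEMO_PAGE_MASK;
}

/* Index of the page that contains addr. */
static unsigned long demo_page_number(unsigned long addr)
{
    return addr >> DEMO_PAGE_SHIFT;
}
```

For example, the address 0x08049abc lies in the page that starts at 0x08049000, at offset 0xabc.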

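1.2 Physical page

The counterpart of a virtual page: a page that has to be mapped onto some physical storage medium. Depending on whether it currently resides in memory, a physical page is either a memory page or an on-disk page.
Note also that allocating and freeing a physical memory page usually refers to the physical medium itself, whereas swapping a page in or out refers to its contents.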

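1.3 Swapping

When the system runs short of memory, information that is not needed for the moment can be written out to disk to make room for information that is needed urgently, and read back from disk when it is needed again.
(Linux mainly uses a swap partition for this; Windows calls the equivalent feature virtual memory and backs it with a page file.)
Early systems swapped whole segments, which was too inefficient, so the technique evolved into demand paging.

This is a classic trade of time for space.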

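2. Abstract description of physical pages

2.1 Physical pages in memory

During system initialization, the kernel creates one page structure for every physical page, based on the amount of physical memory it detects, forming an array of page structures. A global variable, mem_map, points to this array. (My own impression is that this holds for UMA, i.e. uniform memory; on NUMA the page array should belong to a particular node.)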
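At the same time, these pages are combined as needed into many blocks of physically contiguous pages. A number of management zones are then set up according to block size, and each zone keeps a free queue from which physical memory pages are allocated.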
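
The "one page structure per physical page, indexed through mem_map" idea can be sketched in user space as follows. Everything here is hypothetical: demo_page has a single member where the real struct page has many, and the array is a fixed 16 entries rather than being sized from detected memory.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_PAGE_SHIFT 12
#define DEMO_NR_PAGES   16            /* pretend the machine has 64 KB of RAM */

/* Stand-in for struct page. */
struct demo_page {
    int count;                        /* reference count */
};

static struct demo_page demo_page_array[DEMO_NR_PAGES];
static struct demo_page *demo_mem_map = demo_page_array;

/* Physical address -> descriptor: the page frame number is the array index. */
static struct demo_page *demo_phys_to_page(unsigned long paddr)
{
    unsigned long pfn = paddr >> DEMO_PAGE_SHIFT;

    assert(pfn < DEMO_NR_PAGES);
    return demo_mem_map + pfn;
}

/* Descriptor -> physical address of the first byte of its page. */
static unsigned long demo_page_to_phys(const struct demo_page *page)
{
    return (unsigned long)(page - demo_mem_map) << DEMO_PAGE_SHIFT;
}
```

Because the descriptors sit in one array, converting between a page and its physical address is just pointer arithmetic plus a shift.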

2.2 Physical pages on a swap device

2.2.1 swap_info_struct

The kernel defines a data structure, swap_info_struct, to describe and manage the files and devices used for page swapping.

==================== include/linux/swap.h 49 64 ====================
    struct swap_info_struct {
        unsigned int flags;
        kdev_t swap_device;
        spinlock_t sdev_lock;
        struct dentry * swap_file;
        struct vfsmount *swap_vfsmnt;
        unsigned short * swap_map;
        unsigned int lowest_bit;
        unsigned int highest_bit;
        unsigned int cluster_next;
        unsigned int cluster_nr;
        int prio; /* swap priority */
        int pages;
        unsigned long max;
        int next; /* next entry on swap list */
    };

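Here, swap_map points to an array in which each element represents one physical page on disk (it holds that page's usage count, as __swap_free in 3.1 shows). The array index determines the page's position on the disk or in the file, and the size of the array is related to pages.
This swap_map feels a lot like mem_map, the pointer to the page array.

Note in particular that the first page on the device, i.e. the page represented by swap_map[0], is not used for swapping. It holds information about the device or file itself, along with a bitmap indicating which pages are usable.

The lowest_bit and highest_bit fields mark where the usable pages of the file begin and end.

The max field records the physical size of the device.

Because disks are usually rotating media, disk space is allocated in clusters wherever possible; cluster_next and cluster_nr exist for this purpose.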
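
To show how swap_map, lowest_bit and highest_bit might work together, here is a simplified slot allocator. It is only a sketch under my own assumptions: the struct and function names are invented, a count of 0 is taken to mean "free", and the cluster logic (cluster_next, cluster_nr) and all locking are left out, so this is not the kernel's actual allocation routine.

```c
#include <assert.h>

/* Hypothetical, cut-down view of one swap area. */
struct demo_swap_area {
    unsigned short *swap_map;   /* one usage count per on-disk page */
    unsigned int lowest_bit;    /* lowest offset that may be free   */
    unsigned int highest_bit;   /* highest offset that may be free  */
};

/*
 * Find a free slot between lowest_bit and highest_bit and claim it.
 * Returns the slot offset, or 0 if nothing is free (offset 0 is never a
 * valid result because the first page holds the area's own header).
 */
static unsigned int demo_alloc_slot(struct demo_swap_area *p)
{
    unsigned int offset;

    assert(p->lowest_bit >= 1);         /* slot 0 is reserved */
    for (offset = p->lowest_bit; offset <= p->highest_bit; offset++) {
        if (p->swap_map[offset] != 0)
            continue;                   /* slot is in use */
        p->swap_map[offset] = 1;        /* first reference */
        /* Shrink the search window when an end slot was taken. */
        if (offset == p->lowest_bit)
            p->lowest_bit++;
        if (offset == p->highest_bit)
            p->highest_bit--;
        return offset;
    }
    return 0;
}
```

The point of the two bounds is that the scan never has to look outside the window where free slots can exist.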

Because Linux allows several swap devices (or files) to be used, the kernel defines an array of swap_info_struct:

    struct swap_info_struct swap_info[MAX_SWAPFILES];

The kernel also maintains a queue, swap_list, which links the swap_info_struct structures of all disk devices or files that can supply pages, in order of priority.

==================== mm/swapfile.c 23 23 ====================
    struct swap_list_t swap_list = {-1, -1};

==================== include/linux/swap.h 153 156 ====================
    struct swap_list_t {
        int head; /* head of priority-ordered swapfile list */
        int next; /* swapfile to be used next */
    };
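
Since head and next are ints rather than pointers, the queue is a list of array indices, with -1 as the terminator. The sketch below shows one way to keep such an index-linked list sorted by descending priority. The demo_ names and the cut-down structures are my own; I am assuming each entry's next field plays the same role as the next member of swap_info_struct above.

```c
#include <assert.h>

#define DEMO_MAX_SWAPFILES 8

/* Cut-down stand-ins for swap_info_struct and swap_list_t. */
struct demo_swap_info {
    int prio;   /* swap priority                                 */
    int next;   /* index of the next entry in the list, or -1    */
};

struct demo_swap_list {
    int head;   /* index of the highest-priority entry, or -1    */
    int next;   /* index of the entry to allocate from next      */
};

static struct demo_swap_info demo_swap_areas[DEMO_MAX_SWAPFILES];
static struct demo_swap_list demo_swap_list = { -1, -1 };

/* Link entry `type` into the list, keeping it sorted by descending prio. */
static void demo_swap_list_insert(int type, int prio)
{
    int prev = -1;
    int i;

    assert(type >= 0 && type < DEMO_MAX_SWAPFILES);
    demo_swap_areas[type].prio = prio;
    /* Walk until we meet an entry whose priority is not higher than ours. */
    for (i = demo_swap_list.head; i >= 0; i = demo_swap_areas[i].next) {
        if (prio >= demo_swap_areas[i].prio)
            break;
        prev = i;
    }
    demo_swap_areas[type].next = i;
    if (prev < 0)
        demo_swap_list.head = demo_swap_list.next = type;  /* new head */
    else
        demo_swap_areas[prev].next = type;
}
```

Inserting areas 0, 1, 2 with priorities 1, 5, 3 leaves the list as 1, then 2, then 0.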

2.2.2 swp_entry_t: the swap entry

Just as pte_t ties a physical memory page to a virtual page, on-disk pages have a data structure, swp_entry_t, that does a similar job.

==================== include/linux/shmem_fs.h 8 18 ====================
    /*
     * A swap entry has to fit into a "unsigned long", as
     * the entry is hidden in the "index" field of the
     * swapper address space.
     *
     * We have to move it here, since not every user of fs.h is including
     * mm.h, but m.h is including fs.h via sched .h :-/
     */
    typedef struct {
        unsigned long val;
    } swp_entry_t;


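The val field packs two pieces of information, offset and type (extracted with SWP_OFFSET() and SWP_TYPE(), as __swap_free in 3.1 does). offset is the page's position within a particular disk device or file, i.e. its logical page number in the file. Put plainly, it is the index into the array that swap_map points to.
type says which file the page is in; it is a sequence number. Put plainly, it is the index into swap_info, the array that describes the multiple swap devices.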
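
Packing and unpacking the two fields could look like the sketch below. The bit layout is an assumption on my part (bit 0 left clear, a 6-bit type in bits 1 to 6, the offset from bit 8 upward), and the DEMO_ macros are my own stand-ins rather than the kernel's SWP_TYPE/SWP_OFFSET definitions.

```c
#include <assert.h>

typedef struct {
    unsigned long val;
} demo_swp_entry_t;

/* Assumed layout: bit 0 = 0, bits 1..6 = type, bits 8 and up = offset. */
#define DEMO_SWP_TYPE(x)    (((x).val >> 1) & 0x3f)
#define DEMO_SWP_OFFSET(x)  ((x).val >> 8)

/* Build an entry from a swap area index and a page offset in that area. */
static demo_swp_entry_t demo_swp_entry(unsigned long type, unsigned long offset)
{
    demo_swp_entry_t e;

    assert(type <= 0x3f);               /* must fit in the 6-bit field */
    e.val = (type << 1) | (offset << 8);
    return e;
}
```

With this layout, type 2 and offset 0x1234 give val = 0x123404, and both macros recover the original numbers.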

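swp_entry_t is also closely related to pte_t.

The two data structures have the same size.
When a page is in memory, the lowest bit, P, is 1, and the remaining bits describe the address and attributes of the physical memory page.
When the page is on disk, P is 0, and the remaining bits record where the page went.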
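
The way the P bit decides how the rest of the word is read can be sketched like this. The names are hypothetical, and treating an all-zero word as "no mapping at all" is an extra assumption I am adding for the example.

```c
/* Bit 0 of the page table word: 1 means the page is in memory. */
#define DEMO_PAGE_PRESENT 0x1UL

enum demo_pte_kind {
    DEMO_PTE_NONE,      /* all zero: nothing mapped (assumption)         */
    DEMO_PTE_PRESENT,   /* P = 1: other bits are frame address and flags */
    DEMO_PTE_SWAPPED    /* P = 0, non-zero: other bits are a swap entry  */
};

/* Decide how the remaining bits of a page table word should be read. */
static enum demo_pte_kind demo_classify_pte(unsigned long pte_val)
{
    if (pte_val == 0)
        return DEMO_PTE_NONE;
    if (pte_val & DEMO_PAGE_PRESENT)
        return DEMO_PTE_PRESENT;
    return DEMO_PTE_SWAPPED;
}
```

Because a swap entry always keeps bit 0 clear, the same word can hold either a mapping or a swap location without any extra tag.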

3. Disk turnover

3.1 Managing physical space: __swap_free

==================== mm/swapfile.c 141 182 ====================
    /*
     * Caller has made sure that the swapdevice corresponding to entry
     * is still around or has not been recycled.
     */
    void __swap_free(swp_entry_t entry, unsigned short count)
    {
        struct swap_info_struct * p;
        unsigned long offset, type;

        if (!entry.val)
            goto out;

        type = SWP_TYPE(entry);
        if (type >= nr_swapfiles)
            goto bad_nofile;
        p = & swap_info[type];
        if (!(p->flags & SWP_USED))
            goto bad_device;
        offset = SWP_OFFSET(entry);
        if (offset >= p->max)
            goto bad_offset;
        if (!p->swap_map[offset])
            goto bad_free;
        swap_list_lock();
        if (p->prio > swap_info[swap_list.next].prio)
            swap_list.next = type;
        swap_device_lock(p);
        if (p->swap_map[offset] < SWAP_MAP_MAX) {
            if (p->swap_map[offset] < count)
                goto bad_count;
            if (!(p->swap_map[offset] -= count)) {
                if (offset < p->lowest_bit)
                    p->lowest_bit = offset;
                if (offset > p->highest_bit)
                    p->highest_bit = offset;
                nr_swap_pages++;
            }
        }
        swap_device_unlock(p);
        swap_list_unlock();
    out:
        return;

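Note that freeing the contents of a disk page involves no disk operation at all. It is purely bookkeeping in memory, recording that the contents of that page on disk are no longer valid.

The cost is therefore very small.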
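
The "bookkeeping only" nature of the free path is easy to see in a user-space imitation. The sketch below keeps just the counting logic from the listing above; the locking, the SWAP_MAP_MAX check, the priority update and the error labels are dropped, and every demo_ name is hypothetical.

```c
#include <assert.h>

/* Cut-down stand-in for swap_info_struct. */
struct demo_swap_dev {
    unsigned short *swap_map;   /* usage count per on-disk page   */
    unsigned int lowest_bit;
    unsigned int highest_bit;
    unsigned long max;          /* number of slots in swap_map    */
};

static int demo_nr_swap_pages;  /* free swap pages, system-wide   */

/*
 * Drop `count` references to slot `offset`.
 * Returns 0 on success and -1 for a bad offset, an already free slot,
 * or a count larger than the slot's current usage count.
 */
static int demo_swap_free(struct demo_swap_dev *p, unsigned long offset,
                          unsigned short count)
{
    if (offset >= p->max || p->swap_map[offset] == 0)
        return -1;
    if (p->swap_map[offset] < count)
        return -1;

    p->swap_map[offset] -= count;
    if (p->swap_map[offset] == 0) {
        /* Slot became free: widen the search window if necessary. */
        if (offset < p->lowest_bit)
            p->lowest_bit = offset;
        if (offset > p->highest_bit)
            p->highest_bit = offset;
        demo_nr_swap_pages++;
    }
    return 0;
}
```

Nothing here touches the disk: a slot is "freed" simply by its counter reaching zero.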

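3.2 What memory page turnover means

It means two things:
1. Allocating, using, and reclaiming pages does not necessarily involve swapping to disk.
2. The ultimate purpose of swapping to disk is to reclaim pages.

Pages in user space involve not only allocation, use, and reclamation, but also being swapped in and out. Even a process's code segment is, from the system's point of view, dynamically allocated.

Pages mapped into kernel space are never swapped out; the only question is releasing them once they are no longer in use. Some of these pages are expensive to obtain, so an LRU queue may still be used for them.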

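3.2.1 Page swapping policies

The simplest policy is to make room only at the moment a page is needed, which is obviously very inefficient.
Use LRU, i.e. swap out the least recently used page; this, however, can cause page thrashing.
To reduce thrashing, introduce a staging queue.
Add page states such as dirty and clean for further optimization.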
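
As a rough illustration of plain LRU (the second policy in the list, before any staging queue is added), here is a toy with three page frames that evicts whichever frame was touched longest ago. It uses a simple access counter as the clock; the names and the fixed frame count are made up, and this is not how the kernel tracks page age.

```c
#include <assert.h>

#define DEMO_NR_FRAMES 3

struct demo_frame {
    int page;                   /* page held by this frame, -1 if empty */
    unsigned long last_used;    /* value of demo_clock at last access   */
};

static struct demo_frame demo_frames[DEMO_NR_FRAMES] = {
    { -1, 0 }, { -1, 0 }, { -1, 0 }
};
static unsigned long demo_clock;

/*
 * Access `page`. On a hit, or when an empty frame is used, returns -1.
 * Otherwise returns the number of the page that had to be evicted.
 */
static int demo_access(int page)
{
    int i, victim = 0, evicted;

    assert(page >= 0);
    demo_clock++;
    for (i = 0; i < DEMO_NR_FRAMES; i++) {
        if (demo_frames[i].page == page) {
            demo_frames[i].last_used = demo_clock;  /* hit */
            return -1;
        }
    }
    /* Miss: the least recently used frame has the smallest timestamp. */
    for (i = 1; i < DEMO_NR_FRAMES; i++)
        if (demo_frames[i].last_used < demo_frames[victim].last_used)
            victim = i;
    evicted = demo_frames[victim].page;
    demo_frames[victim].page = page;
    demo_frames[victim].last_used = demo_clock;
    return evicted;
}
```

With a working set just one page larger than the number of frames, nearly every access evicts a page that is about to be needed again, which is the thrashing the staging queue is meant to soften.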

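3.2.2 Key points in the turnover of physical memory pages

1. Free: the page is on the free_area queue of some zone, and its reference count is 0.
2. Allocated: the page is allocated, its reference count becomes 1, and it is no longer on a free_area queue.
3. Active: the page is linked into active_list through its lru member, and its reference count is incremented.
4. Inactive (dirty): the page is linked into inactive_dirty_list through lru, and its reference count is decremented.
5. The contents of the inactive dirty page are written to the swap device, and the page is moved to inactive_clean_list.
6. Inactive (clean).
7. If the page is accessed within some time after becoming inactive, it returns to the active state and its mapping is restored.
8. When needed, a page can be reclaimed from the clean queue, either going back to the free queue or being allocated directly for another use.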

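In my own words:
We first allocate a page, and it is in the active state. When we stop accessing it for a while, it begins to age and enters the inactive (dirty) state, but at this point it is not written to the swap device right away. After some more time, if nobody has really touched it, we write it to the swap device. The page still has not been freed, though: it is marked inactive (clean), and from now on it is managed by its zone, whereas before it was managed by the global queues.

If the page is accessed again before it is put to another use, the mapping can simply be re-established. This reduces page thrashing.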
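
The life cycle described above reads naturally as a small state machine. The sketch below is my own summary of it; the state and event names are invented for the example and do not correspond to kernel identifiers.

```c
enum demo_page_state {
    DEMO_FREE,              /* on a zone's free_area queue            */
    DEMO_ALLOCATED,         /* handed out, not yet on an LRU queue    */
    DEMO_ACTIVE,            /* on active_list                         */
    DEMO_INACTIVE_DIRTY,    /* on inactive_dirty_list                 */
    DEMO_INACTIVE_CLEAN     /* on the zone's inactive_clean_list      */
};

enum demo_page_event {
    DEMO_EV_ALLOC,          /* page is allocated                      */
    DEMO_EV_MAP,            /* page is mapped and put on the LRU      */
    DEMO_EV_AGE_OUT,        /* page has not been accessed for a while */
    DEMO_EV_WRITEBACK,      /* contents written to the swap device    */
    DEMO_EV_TOUCH,          /* page is accessed again                 */
    DEMO_EV_RECLAIM         /* page is taken back for reuse           */
};

/* Return the next state; an event that does not apply leaves it unchanged. */
static enum demo_page_state demo_next_state(enum demo_page_state s,
                                            enum demo_page_event e)
{
    switch (s) {
    case DEMO_FREE:
        if (e == DEMO_EV_ALLOC) return DEMO_ALLOCATED;
        break;
    case DEMO_ALLOCATED:
        if (e == DEMO_EV_MAP) return DEMO_ACTIVE;
        break;
    case DEMO_ACTIVE:
        if (e == DEMO_EV_AGE_OUT) return DEMO_INACTIVE_DIRTY;
        break;
    case DEMO_INACTIVE_DIRTY:
        if (e == DEMO_EV_WRITEBACK) return DEMO_INACTIVE_CLEAN;
        if (e == DEMO_EV_TOUCH) return DEMO_ACTIVE;
        break;
    case DEMO_INACTIVE_CLEAN:
        if (e == DEMO_EV_TOUCH) return DEMO_ACTIVE;
        if (e == DEMO_EV_RECLAIM) return DEMO_FREE;
        break;
    }
    return s;
}
```

The two DEMO_EV_TOUCH transitions are the ones that avoid thrashing: a page that is touched while inactive goes straight back to active without a round trip through the disk.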

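3.2.3 Implementing the policy

Global LRU queues: active_list and inactive_dirty_list
An inactive_clean_list in every zone
A global address_space data structure, swapper_space
page_hash_table, introduced to speed up lookups

Now let's look at the swapping code in the kernel.

3.2.3.1 Code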

==================== mm/swap_state.c 54 70 ====================
    void add_to_swap_cache(struct page *page, swp_entry_t entry)
    {
        unsigned long flags;

    #ifdef SWAP_CACHE_INFO
        swap_cache_add_total++;
    #endif
        if (!PageLocked(page))
            BUG();
        if (PageTestandSetSwapCache(page))
            BUG();
        if (page->mapping)
            BUG();
        flags = page->flags & ~((1 << PG_error) | (1 << PG_arch_1));
        page->flags = flags | (1 << PG_uptodate);
        add_to_page_cache_locked(page, &swapper_space, entry.val);
    }

==================== mm/filemap.c 476 494 ====================
    /*
     * Add a page to the inode page cache.
     *
     * The caller must have locked the page and
     * set all the page flags correctly..
     */
    void add_to_page_cache_locked(struct page * page, struct address_space *mapping, unsigned long index)
    {
        if (!PageLocked(page))
            BUG();

        page_cache_get(page);
        spin_lock(&pagecache_lock);
        page->index = index;
        add_page_to_inode_queue(mapping, page);
        add_page_to_hash_queue(page, page_hash(mapping, index));
        lru_cache_add(page);
        spin_unlock(&pagecache_lock);
    }

==================== include/linux/fs.h 365 375 ====================
    struct address_space {
        struct list_head  clean_pages;  /* list of clean pages */
        struct list_head  dirty_pages;  /* list of dirty pages */
        struct list_head  locked_pages; /* list of locked pages */
        unsigned long nrpages;  /* number of total pages */
        struct address_space_operations *a_ops;  /* methods */
        struct inode *host; /* owner: inode, block_device */
        struct vm_area_struct  *i_mmap;  /* list of private mappings */
        struct vm_area_struct  *i_mmap_shared; /* list of shared mappings */
        spinlock_t i_shared_lock;  /* and spinlock protecting it */
    };

==================== mm/swap_state.c 31 37 ====================
    struct address_space swapper_space = {
        LIST_HEAD_INIT(swapper_space.clean_pages),
        LIST_HEAD_INIT(swapper_space.dirty_pages),
        LIST_HEAD_INIT(swapper_space.locked_pages),
        0, /* nrpages */
        &swap_aops,
    };

==================== include/linux/mm.h 150 150 ====================
    #define get_page(p) atomic_inc(&(p)->count)

==================== include/linux/pagemap.h 31 31 ====================
    #define page_cache_get(x)  get_page(x)

==================== mm/filemap.c 72 79 ====================
    static inline void add_page_to_inode_queue(struct address_space *mapping, struct page * page)
    {
        struct list_head *head = &mapping->clean_pages;

        mapping->nrpages++;
        list_add(&page->list, head);
        page->mapping = mapping;
    }

==================== mm/filemap.c 58 70 ====================
    static void add_page_to_hash_queue(struct page * page, struct page **p)
    {
        struct page *next = *p;

        *p = page;
        page->next_hash = next;
        page->pprev_hash = p;
        if (next)
            next->pprev_hash = &page->next_hash;
        if (page->buffers)
            PAGE_BUG(page);
        atomic_inc(&page_cache_size);
    }

==================== include/linux/pagemap.h 68 68 ====================
    #define page_hash(mapping,index) (page_hash_table+_page_hashfn(mapping,index))

==================== mm/swap.c 226 241 ====================
    /**
     * lru_cache_add: add a page to the page lists
     * @page: the page to add
     */
    void lru_cache_add(struct page * page)
    {
        spin_lock(&pagemap_lru_lock);
        if (!PageLocked(page))
            BUG();
        DEBUG_ADD_PAGE
        add_page_to_active_list(page);
        /* This should be relatively rare */
        if (!page->age)
            deactivate_page_nolock(page);
        spin_unlock(&pagemap_lru_lock);
    }

==================== include/linux/swap.h 209 215 ====================
    #define add_page_to_active_list(page) { \
        DEBUG_ADD_PAGE \
        ZERO_PAGE_BUG \
        SetPageActive(page); \
        list_add(&(page)->lru, &active_list); \
        nr_active_pages++; \
    }

From add_to_page_cache_locked we can see that the page is added to three queues:
1. Through list, to the staging queue of swapper_space (its clean_pages list)
2. Through next_hash and pprev_hash, to a hash queue
3. Through lru, to the LRU queue active_list
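
The next_hash/pprev_hash pair in the second queue is worth a closer look: pprev_hash points at whatever pointer currently points to the page (the bucket slot or the previous page's next_hash), so a page can be unlinked without walking the chain. Below is a user-space sketch of the idea. The insert mirrors add_page_to_hash_queue above; the removal function and all demo_ names are my own additions for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the hash-related members of struct page. */
struct demo_page {
    int index;
    struct demo_page *next_hash;     /* next page in the same bucket   */
    struct demo_page **pprev_hash;   /* the pointer that points at us  */
};

/* Push a page onto the front of a bucket. */
static void demo_hash_add(struct demo_page *page, struct demo_page **bucket)
{
    struct demo_page *next = *bucket;

    *bucket = page;
    page->next_hash = next;
    page->pprev_hash = bucket;
    if (next)
        next->pprev_hash = &page->next_hash;
}

/* Unlink a page in O(1), wherever it sits in its chain. */
static void demo_hash_remove(struct demo_page *page)
{
    struct demo_page *next = page->next_hash;

    assert(page->pprev_hash != NULL);   /* must currently be hashed */
    *page->pprev_hash = next;
    if (next)
        next->pprev_hash = page->pprev_hash;
    page->next_hash = NULL;
    page->pprev_hash = NULL;
}
```

Compared with a plain singly linked chain, the extra back-pointer costs one word per page but makes removal independent of chain length.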

3.3 User participation in memory management

Privileged users can take part in memory management through swapon, swapoff, and the like.
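
From a program, the same thing can be done with the swapon(2) and swapoff(2) system calls that glibc declares in <sys/swap.h>. The wrappers below are a minimal Linux-only sketch; the demo_ function names are hypothetical, the calls fail without sufficient privilege, and the target has to be a device or file already prepared as swap space.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/swap.h>

/*
 * Enable swapping on `path`. A non-negative `prio` asks for that priority;
 * a negative value lets the kernel pick one. Returns 0 on success, -1 on error.
 */
static int demo_enable_swap(const char *path, int prio)
{
    int flags = 0;

    if (prio >= 0)
        flags = SWAP_FLAG_PREFER |
                ((prio << SWAP_FLAG_PRIO_SHIFT) & SWAP_FLAG_PRIO_MASK);

    if (swapon(path, flags) != 0) {
        fprintf(stderr, "swapon(%s): %s\n", path, strerror(errno));
        return -1;
    }
    return 0;
}

/* Stop swapping on `path`. Returns 0 on success, -1 on error. */
static int demo_disable_swap(const char *path)
{
    if (swapoff(path) != 0) {
        fprintf(stderr, "swapoff(%s): %s\n", path, strerror(errno));
        return -1;
    }
    return 0;
}
```

The priority passed here is what ends up ordering the areas in the swap_list queue described in 2.2.1.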
