
MIT6.S081-Lab5 COW

Start date: 22.4.19

Operating system: Ubuntu 20.04

Link: Lab COW


Lab COW

Preface

Virtual address

  • I was never quite clear on what a virtual address is. book-riscv-rev2 defines it as the address that a RISC-V instruction manipulates:

    p. 26

    The RISC-V page table translates (or “maps”) a virtual address (the address that an RISC-V instruction manipulates) to a physical address (an address that the CPU chip sends to main memory).

  • This definition is correct, but it does not explain where virtual addresses come from. So I looked up the virtual memory chapters of Modern Operating Systems and Computer Systems: A Programmer's Perspective (3rd Edition), which say:

    Modern Operating Systems, p. 195

    ...Addresses can be generated using indexing, base registers, segment registers, and other ways.

    These program-generated addresses are called virtual addresses and form the virtual address space.

    On computers without virtual memory, the virtual address is put directly onto the memory bus and causes the physical memory word with the same address to be read or written. When virtual memory is used, the virtual addresses do not go directly to the memory bus. Instead, they go to an MMU (Memory Management Unit) that maps the virtual addresses onto the physical memory addresses, as illustrated in Fig. 3-8.

    Computer Systems: A Programmer's Perspective, 3rd Edition

    With virtual addressing, the CPU accesses main memory by generating a virtual address (VA), which is converted to the appropriate physical address before being sent to main memory.

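    • Taken together: the addresses a program generates while running on the CPU, by indexing, base registers, segment registers, or other means, are called virtual addresses, and together they form a virtual address space.
      With virtual addressing, the CPU issues virtual addresses as the program runs; each one is translated to a physical address and sent to physical memory, which returns the data stored at that physical address.
  • In xv6, the MMU's translation works out as follows: a user process's virtual address space is mapped through its own three-level page table, while the kernel's page table maps most of the kernel's virtual address space directly (virtual address equals physical address)

  • Conceptually, xv6 switches page tables by writing the satp register. The same virtual address then points to different physical memory depending on which page table is active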
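
To make the three-level lookup a little more concrete, here is a minimal sketch of how an Sv39 virtual address splits into three 9-bit page-table indices plus a 12-bit page offset. The PX macros follow the ones in xv6's kernel/riscv.h; page_offset and sv39_split_demo are hypothetical helpers that exist only for this illustration.

```c
#include <assert.h>
#include <stdint.h>

#define PGSHIFT 12                      /* 12 bits of offset inside a 4096-byte page */
#define PXMASK  0x1FF                   /* each level uses a 9-bit index */
#define PXSHIFT(level) (PGSHIFT + 9 * (level))
#define PX(level, va)  ((((uint64_t)(va)) >> PXSHIFT(level)) & PXMASK)

/* Hypothetical helper: byte offset of va inside its page. */
static uint64_t page_offset(uint64_t va)
{
  return va & ((1ULL << PGSHIFT) - 1);
}

/* Hypothetical self-check: build an address from known parts and split it again. */
static void sv39_split_demo(void)
{
  uint64_t va = (3ULL << 30) | (5ULL << 21) | (7ULL << 12) | 0x123;

  assert(PX(2, va) == 3);          /* index into the top-level page table */
  assert(PX(1, va) == 5);          /* index into the middle-level table */
  assert(PX(0, va) == 7);          /* index into the leaf table */
  assert(page_offset(va) == 0x123);
}
```

Switching satp changes which top-level table PX(2, va) indexes into, which is why the same va can land on different physical pages.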


Copy-on-Write

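  • Read the hints and the plan carefully. They supply most of the approach; the rest takes some experimenting or looking at other write-ups

  • Add a reference count, ref_count

    • Pages are only allocated from physical RAM, i.e. the range KERNBASE ~ PHYSTOP, so that is the only range that needs counters

    • Why a lock: ref_count is shared between CPUs, so unsynchronized updates can overwrite each other. Consider this example:

      Lab6: Copy-on-Write Fork for xv6

      A spinlock is used here with the following case in mind: processes P1 and P2 share memory page M, whose reference count is 2. CPU1 is about to run fork to create a child of P1, and CPU2 is about to terminate P2. Suppose both CPUs read the count as 2 at the same time. After their updates, CPU1 holds 3 and CPU2 holds 1, so whichever store lands later overwrites the earlier one. The count ends up as 3 or 1 instead of the correct 2.

    • Note: ref_count lives in the global kmem struct, so it is automatically initialized to 0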

     /* kernel/kalloc.c */
     struct {
       struct spinlock lock;
       struct run *freelist;
       uint8 ref_count[(PHYSTOP - KERNBASE) / PGSIZE]; 
       // only KERNBASE ~ PHYSTOP (128 MB of RAM) is ever allocated
       // 128*1024*1024 / 4096 = 32768 pages
     } kmem;
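
The lost update described in the quoted example can be replayed step by step. The sketch below is a deterministic, single-threaded simulation and not real concurrency: the local variables cpu1 and cpu2 stand in for the registers of the two CPUs, and both function names are hypothetical.

```c
#include <assert.h>

/* Without a lock, read / modify / write are three separate steps,
 * so the two CPUs can interleave them. */
static int lost_update_demo(void)
{
  int ref = 2;        /* page M is shared by P1 and P2 */
  int cpu1 = ref;     /* CPU1 (fork) reads 2 */
  int cpu2 = ref;     /* CPU2 (exit) also reads 2, before CPU1 writes back */

  cpu1 = cpu1 + 1;    /* CPU1 computes 3 */
  cpu2 = cpu2 - 1;    /* CPU2 computes 1 */
  ref = cpu1;         /* CPU1 stores 3 */
  ref = cpu2;         /* CPU2 stores 1 and wipes out CPU1's update */
  return ref;
}

/* With the lock held around the whole read-modify-write,
 * CPU2 cannot read until CPU1 has stored its result. */
static int serialized_update_demo(void)
{
  int ref = 2;
  ref = ref + 1;      /* CPU1: acquire, increment, release */
  ref = ref - 1;      /* CPU2: acquire, decrement, release */
  return ref;
}

/* Hypothetical self-check for the two scenarios. */
static void refcount_race_demo(void)
{
  assert(lost_update_demo() == 1);        /* wrong: one update was lost */
  assert(serialized_update_demo() == 2);  /* correct: +1 and -1 cancel */
}
```

A count of 1 here would let the page be freed while two processes still map it, which is the error the spinlock in kmem prevents.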
    

  • Increment ref_count by 1

    • Remember to add the declaration to defs.h
     /* kernel/kalloc.c */
     // Increment a page's reference count when fork causes a child to share the page.
     void
     increment_refcount(uint64 pa){
       acquire(&kmem.lock);
       kmem.ref_count[(pa - KERNBASE) / PGSIZE]++;
       release(&kmem.lock);
     }
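
The index expression (pa - KERNBASE) / PGSIZE appears in every function that touches ref_count, so it is worth checking the arithmetic once. This is a small user-space sketch using the constants from xv6's memlayout.h and riscv.h; pa2index and pa2index_demo are hypothetical names, not functions in the lab code.

```c
#include <assert.h>
#include <stdint.h>

#define KERNBASE 0x80000000ULL
#define PHYSTOP  (KERNBASE + 128ULL * 1024 * 1024)
#define PGSIZE   4096ULL
#define NPAGES   ((PHYSTOP - KERNBASE) / PGSIZE)   /* size of ref_count[] */

/* Hypothetical helper: which ref_count slot belongs to physical address pa. */
static uint64_t pa2index(uint64_t pa)
{
  return (pa - KERNBASE) / PGSIZE;
}

/* Hypothetical self-check of the boundaries. */
static void pa2index_demo(void)
{
  assert(NPAGES == 32768);                          /* 128 MB / 4 KB */
  assert(pa2index(KERNBASE) == 0);                  /* first page */
  assert(pa2index(KERNBASE + PGSIZE + 100) == 1);   /* any byte of page 1 */
  assert(pa2index(PHYSTOP - PGSIZE) == NPAGES - 1); /* last page */
}
```

Because the division truncates, any address inside a page maps to that page's slot, so the index is safe even for a pa that is not page-aligned.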
    
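  • Decrement ref_count by 1

    • Every time a reference to a physical page is dropped, the call chain ends in kfree(), so kfree() is the natural place to decrement ref_count. When ref_count reaches 0, the physical page can be freed.
    • freerange() must first set every ref_count to 1, because kfree() always decrements by 1 and that decrement has to be cancelled out. Once the free list has been built and before anything is allocated, every physical page's ref_count should be 0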
    /* kernel/kalloc.c */
    void
    freerange(void *pa_start, void *pa_end)
    {
      char *p;
      p = (char*)PGROUNDUP((uint64)pa_start);
      for(; p + PGSIZE <= (char*)pa_end; p += PGSIZE){
        acquire(&kmem.lock);
        kmem.ref_count[((uint64)p - KERNBASE) / PGSIZE] = 1;
        release(&kmem.lock);
        kfree(p);    
      }
    }
    
    void
    kfree(void *pa)
    {
      struct run *r;
    
      if(((uint64)pa % PGSIZE) != 0 || (char*)pa < end || (uint64)pa >= PHYSTOP)
        panic("kfree");
    
      // kfree() should only place a page back on the free list
      // if its reference count is zero.
      // Decrement the page's count each time a process drops the page from its page table.
      // NOTE: every path that drops a page ends up calling kfree().
      acquire(&kmem.lock);
      if(--kmem.ref_count[((uint64)pa - KERNBASE) / PGSIZE] == 0){
        release(&kmem.lock);
        // Fill with junk to catch dangling refs.
        memset(pa, 1, PGSIZE);
    
        r = (struct run*)pa;
        
        acquire(&kmem.lock);
        r->next = kmem.freelist;
        kmem.freelist = r;
        release(&kmem.lock);
      }
      else
        release(&kmem.lock);
    }
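
The life cycle of one counter (set to 1 in freerange, decremented to 0 by kfree, set to 1 again by kalloc, incremented on fork) can be traced in a toy user-space model. This is a sketch under simplifying assumptions: pages are plain indices, the pool has only 4 pages, there is no locking, and every toy_ name is hypothetical.

```c
#include <assert.h>

#define NPAGES 4                       /* toy pool size, chosen for the sketch */

static unsigned char ref_count[NPAGES];
static int free_list[NPAGES];          /* stack of free page indices */
static int nfree;

/* Like kfree(): drop one reference, free the page only at zero. */
static void toy_kfree(int page)
{
  if(--ref_count[page] == 0)
    free_list[nfree++] = page;
}

/* Like freerange(): preset each count to 1 so toy_kfree() lands on 0. */
static void toy_init(void)
{
  nfree = 0;
  for(int p = 0; p < NPAGES; p++){
    ref_count[p] = 1;
    toy_kfree(p);
  }
}

/* Like kalloc(): a freshly allocated page has exactly one owner. */
static int toy_kalloc(void)
{
  if(nfree == 0)
    return -1;
  int page = free_list[--nfree];
  ref_count[page] = 1;
  return page;
}

/* Like increment_refcount(): fork shares the page with a child. */
static void toy_share(int page)
{
  ref_count[page]++;
}

/* Hypothetical walk through the life cycle of one page. */
static void refcount_lifecycle_demo(void)
{
  toy_init();
  assert(nfree == NPAGES && ref_count[0] == 0);

  int p = toy_kalloc();
  assert(p >= 0 && ref_count[p] == 1 && nfree == NPAGES - 1);

  toy_share(p);                        /* parent and child both map p */
  assert(ref_count[p] == 2);

  toy_kfree(p);                        /* one of them exits: page survives */
  assert(ref_count[p] == 1 && nfree == NPAGES - 1);

  toy_kfree(p);                        /* last owner exits: page is freed */
  assert(ref_count[p] == 0 && nfree == NPAGES);
}
```

If toy_init skipped the preset to 1, the unsigned counter would wrap to 255 on the first toy_kfree and no page would ever reach the free list, which is the reason for the same preset in freerange().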
    
  • Each time kalloc() allocates a physical page, set that page's ref_count to 1

    /* kernel/kalloc.c */
    void *
    kalloc(void)
    {
      struct run *r;
      acquire(&kmem.lock);
      r = kmem.freelist;
      if(r){
        // Set a page's reference count to one when kalloc() allocates it.  
        kmem.ref_count[((uint64)r - KERNBASE) / PGSIZE] = 1;
        kmem.freelist = r->next;
      }
      release(&kmem.lock);
    
      if(r)
        memset((char*)r, 5, PGSIZE); // fill with junk
      return (void*)r;
    }
    
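  • Read a physical page's ref_count

    • ref_count is only visible inside kernel/kalloc.c, so other files need an accessor function to read it
    • Remember to add the declaration to defs.h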
    int
    get_refcount(uint64 pa)
    {
      return kmem.ref_count[(pa - KERNBASE) / PGSIZE];
    }
    
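  • Add PTE_COW, which marks a PTE as mapping a copy-on-write (COW) physical page

    • See riscv-privileged, p. 77. Bits 8 and 9 of a PTE are reserved for software (RSW); we use bit 8 as the flag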
    #define PTE_COW (1L << 8) // 1 -> COW page
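
A quick way to convince yourself that bit 8 is a safe choice is to check that it survives PTE_FLAGS and does not disturb the physical address stored in the PTE. The sketch below redefines the relevant macros in user space, following the definitions in xv6's kernel/riscv.h but with unsigned constants; pte_cow_bit_demo is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

#define PTE_V   (1ULL << 0)
#define PTE_R   (1ULL << 1)
#define PTE_W   (1ULL << 2)
#define PTE_X   (1ULL << 3)
#define PTE_U   (1ULL << 4)
#define PTE_COW (1ULL << 8)            /* first of the two RSW bits */

#define PA2PTE(pa)     ((((uint64_t)(pa)) >> 12) << 10)
#define PTE2PA(pte)    (((pte) >> 10) << 12)
#define PTE_FLAGS(pte) ((pte) & 0x3FF) /* low 10 bits, so bit 8 is included */

/* Hypothetical self-check of the bit layout. */
static void pte_cow_bit_demo(void)
{
  uint64_t pa  = 0x80001000ULL;
  uint64_t pte = PA2PTE(pa) | PTE_V | PTE_R | PTE_U | PTE_COW;

  assert(PTE_FLAGS(pte) & PTE_COW);    /* the flag is copied along with the others */
  assert(PTE2PA(pte) == pa);           /* the address bits are untouched */
  assert((PTE_COW & (PTE_V | PTE_R | PTE_W | PTE_X | PTE_U)) == 0);
}
```

Since PTE_FLAGS keeps bit 8, any code that copies flags from one PTE to another carries the COW mark along without extra work.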
    
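  • Modify uvmcopy

    • When uvmcopy is called, any page that is currently writable is made read-only and marked as a cowpage
    • We no longer allocate a new physical page; the child's page table simply maps the old physical page
    • A write to a read-only cowpage raises a page fault, and only then do we allocate a new physical page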
    /* kernel/vm.c */
    int
    uvmcopy(pagetable_t old, pagetable_t new, uint64 sz)
    {
      pte_t *pte;
      uint64 pa, i;
      uint flags;
    
      for(i = 0; i < sz; i += PGSIZE){
        if((pte = walk(old, i, 0)) == 0)
          panic("uvmcopy: pte should exist");
        if((*pte & PTE_V) == 0)
          panic("uvmcopy: page not present");
        pa = PTE2PA(*pte);
    
        /* just clear PTE_W for page with PTE_W */
        if (*pte & PTE_W){
          /* clear PTE_W */
          *pte &= (~PTE_W);
          /* set PTE_COW */
          *pte |= PTE_COW;
        }
        
        flags = PTE_FLAGS(*pte);
        if(mappages(new, i, PGSIZE, pa, flags) != 0){
          goto err;
        }
    
        // Increment a page's reference count when fork causes a child to share the page.
        // Do it only after mappages() succeeded: on failure, uvmunmap() below never
        // reaches page i, so an earlier increment would never be undone.
        increment_refcount(PGROUNDDOWN(pa));
      }
      return 0;
    
     err:
      uvmunmap(new, 0, i / PGSIZE, 1);
      return -1;
    }
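
The flag rewrite that uvmcopy applies to each parent PTE can be isolated into a pure function and checked on its own. This is a user-space sketch: cow_share_pte and cow_share_demo are hypothetical names, and the PTE values carry only flag bits, with no physical address.

```c
#include <assert.h>
#include <stdint.h>

#define PTE_V   (1ULL << 0)
#define PTE_R   (1ULL << 1)
#define PTE_W   (1ULL << 2)
#define PTE_X   (1ULL << 3)
#define PTE_U   (1ULL << 4)
#define PTE_COW (1ULL << 8)

/* Hypothetical helper: the PTE value that parent and child share after fork. */
static uint64_t cow_share_pte(uint64_t pte)
{
  if(pte & PTE_W){
    pte &= ~PTE_W;     /* a write must now fault */
    pte |= PTE_COW;    /* and the fault handler must know why */
  }
  return pte;
}

/* Hypothetical self-check on a data page and a text page. */
static void cow_share_demo(void)
{
  uint64_t data = PTE_V | PTE_R | PTE_W | PTE_U;  /* writable data page */
  uint64_t text = PTE_V | PTE_R | PTE_X | PTE_U;  /* read-only text page */

  assert(cow_share_pte(data) == (PTE_V | PTE_R | PTE_U | PTE_COW));
  assert(cow_share_pte(text) == text);            /* never writable, never COW */
  assert(cow_share_pte(cow_share_pte(data)) == cow_share_pte(data));
}
```

Pages that were read-only before fork keep no COW mark, so a later store to them is still treated as a genuine error and not as a request for a private copy.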
    
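  • Modify usertrap; we need two helper functions

    • r_scause() == 15 is the store page fault we have to handle

      • A store writes data from a register to memory; here the write targets a page of physical memory
    • At that point r_stval() holds the virtual address that faulted, i.e. fault_va

      riscv-privileged, p. 67

      When a hardware breakpoint is triggered, or an instruction, load, or store address-misaligned, access-fault, or page-fault exception occurs, stval is written with the faulting virtual address.

    • If fault_va lies outside the process's address space (valid addresses are below p->sz), the address is invalid and the fault cannot be handled

      • usertests checks this
    • is_cowpage checks whether the page is a cowpage

    • cow_alloc allocates a new physical page for the cowpage. It has to consider two cases

      • The cowpage has only one reference left: just fix up the flags in the pte
      • The cowpage has several references: call kalloc() and copy the contents
      • Before calling mappages, clear PTE_V in the pte to avoid panic: remap
      • Finally, remember to decrement the old page's ref_count by 1 (by calling kfree())
    • If the page is not a cowpage, or allocation fails, the process is killed and then becomes a zombie process

    • After handling a page fault caused by a cowpage, return to the original program counter (pc) so that the faulting instruction is executed again.
      Never add 4 to pc

    • Remember to add the declarations to defs.h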

    /* kernel/trap.c */
    ...
        syscall();
      } 
      else if(r_scause() == 15){  
        // Store page fault: a write hit a page whose PTE lacks PTE_W.
        uint64 fault_va = r_stval();
        // Valid user addresses are [0, p->sz), so fault_va == p->sz is out of range too.
        if(fault_va >= p->sz ||
           is_cowpage(p->pagetable, fault_va) < 0 ||
           cow_alloc(p->pagetable, PGROUNDDOWN(fault_va)) == 0)
          p->killed = 1;
      }
      else if((which_dev = devintr()) != 0){
    ...
      
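    /* Is it a cowpage? */
    /* if YES return 0; else return -1 */
    int 
    is_cowpage(pagetable_t pagetable, uint64 va) 
    {
      pte_t* pte = walk(pagetable, va, 0);
      // No PTE, an invalid PTE, or a non-user page (e.g. the stack guard page)
      // is never a cowpage.
      if(pte == 0 || (*pte & PTE_V) == 0 || (*pte & PTE_U) == 0)
        return -1;
      return (*pte & PTE_COW ? 0 : -1);
    }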
    
    /* allocate a physical memory page for a cow page */
    /* if OK return a void* to the page's memory; else return 0 */
    void*
    cow_alloc(pagetable_t pagetable, uint64 va)
    {
      pte_t *pte = walk(pagetable, va, 0);
      uint64 pa = PTE2PA(*pte);
    
      // refcount == 1, only a process use the cowpage
      // so we set the PTE_W of cowpage and clear PTE_COW of the cowpage
      if(get_refcount(pa) == 1){
        *pte |= PTE_W;
        *pte &= ~PTE_COW;
        return (void*)pa;
      }
    
      // refcount >= 2, some processes use the cowpage
      uint flags;
      char *new_mem;
      /* flags for the new mapping: writable and no longer COW. */
      /* The live pte is left untouched, so a failure below leaves the shared page read-only. */
      flags = (PTE_FLAGS(*pte) | PTE_W) & ~PTE_COW;
      
      /* alloc and copy, then map */
      new_mem = kalloc();
    
      // If a COW page fault occurs and there's no free memory, the process should be killed.
      if(new_mem == 0)
        return 0;
    
      memmove(new_mem, (char*)pa, PGSIZE);
      /* clear PTE_V before mapping the page to avoid the 'remap' panic */
      *pte &= ~PTE_V;
      /* note: new_mem is the address of the new physical page */
      if(mappages(pagetable, va, PGSIZE, (uint64)new_mem, flags) != 0){
        /* mapping failed: restore PTE_V, then kfree new_mem */
        *pte |= PTE_V;
        kfree(new_mem);
        return 0;
      }
    
      /* decrement a ref_count */
      kfree((char*)PGROUNDDOWN(pa));
    
      return new_mem;
    }
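
The two cases of cow_alloc, and the rule that the faulting store is simply retried, are easier to see in a toy user-space model. This is a sketch under loud assumptions: pages are malloc'ed structs, a "PTE" is a small struct with two boolean flags, there is one page per process, and every toy_ name is hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define PGSIZE 4096

struct toy_page { int ref; unsigned char data[PGSIZE]; };
struct toy_pte  { struct toy_page *page; int writable; int cow; };

/* Models uvmcopy: the child maps the parent's page, both read-only + COW. */
static void toy_fork(struct toy_pte *parent, struct toy_pte *child)
{
  if(parent->writable){
    parent->writable = 0;
    parent->cow = 1;
  }
  *child = *parent;
  parent->page->ref++;
}

/* Models cow_alloc: returns 0 on success, -1 if the fault cannot be handled. */
static int toy_cow_fault(struct toy_pte *pte)
{
  if(!pte->cow)
    return -1;                         /* a genuine write to a read-only page */
  if(pte->page->ref == 1){             /* case 1: sole owner, just fix the flags */
    pte->writable = 1;
    pte->cow = 0;
    return 0;
  }
  struct toy_page *copy = malloc(sizeof *copy);   /* case 2: private copy */
  if(copy == 0)
    return -1;
  memcpy(copy->data, pte->page->data, PGSIZE);
  copy->ref = 1;
  pte->page->ref--;                    /* drop this mapping's reference to the old page */
  pte->page = copy;
  pte->writable = 1;
  pte->cow = 0;
  return 0;
}

/* Models a user store: on a fault, handle it, then run the same store again. */
static int toy_store(struct toy_pte *pte, int off, unsigned char v)
{
  if(!pte->writable && toy_cow_fault(pte) != 0)
    return -1;                         /* the process would be killed here */
  pte->page->data[off] = v;
  return 0;
}

/* Hypothetical walk through fork, child write, parent write. */
static void toy_cow_demo(void)
{
  struct toy_page *orig = calloc(1, sizeof *orig);
  assert(orig != 0);
  orig->ref = 1;
  orig->data[0] = 'A';

  struct toy_pte parent = { orig, 1, 0 }, child;
  toy_fork(&parent, &child);
  assert(orig->ref == 2 && child.page == orig);

  int rc = toy_store(&child, 0, 'B');  /* case 2: child gets its own copy */
  assert(rc == 0 && child.page != orig && child.page->data[0] == 'B');
  assert(orig->data[0] == 'A' && orig->ref == 1);

  rc = toy_store(&parent, 0, 'C');     /* case 1: parent is now the sole owner */
  assert(rc == 0 && parent.page == orig && orig->data[0] == 'C');

  free(child.page);
  free(orig);
}
```

toy_store performs the write only after the fault has been handled, which mirrors why the kernel must return to the same pc: advancing past the store would silently drop it.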
    
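  • In copyout, handle a cowpage the same way usertrap handles the page fault; copyout writes through the physical address it looks up itself, so the hardware never raises a page fault for it

    • Note: if handling fails, return the error value -1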
    int
    copyout(pagetable_t pagetable, uint64 dstva, char *src, uint64 len)
    {
    ...
          n = len;
        
        if(is_cowpage(pagetable, va0) == 0)
          // if it is a cowpage, we need a new pa0 pointer to a new memory
          // and if it is a null pointer, we need return error of -1
          if ((pa0 = (uint64)cow_alloc(pagetable, va0)) == 0)
            return -1;
        
        memmove((void *)(pa0 + (dstva - va0)), src, n);
    ...
    }
    

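Summary

  • Completion date: 22.4.20
  • The hardest part to come up with was where to decrement ref_count: do it inside kfree, because every unmapping of a physical page ends in a kfree call to release it.
  • Do not debug for more than about 2 hours at a stretch. Go and rest right away, otherwise it is easy to get muddled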
  • Lately I have been listening to 《萱草花》 (乃琳/珈樂) and 《北方》 (任素汐)