MySQL 8.0 source code analysis: InnoDB redo log writes, and how write ahead makes clever use of the page cache for efficient writes

阿新 • Published: 2020-12-05

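Page cache background

When a file is written to disk and the target content has not been cached yet, or has already been evicted, no page cache entry exists for it in memory. The affected page then has to be read from disk into memory first, the bytes being written are modified there, and the whole page is later written back to disk. This costs one extra read I/O, so I/O performance suffers.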
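The condition described above can be sketched as a small predicate. This is only a simplified model, not kernel or InnoDB code: the function name `needs_read_before_write`, the single `page_cached` flag for all touched pages, and the page size passed in are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model: returns true when a write of `size` bytes at `offset`
// covers some uncached page only partially, so that page would have to be
// read from disk before it can be modified.
bool needs_read_before_write(std::uint64_t offset, std::uint64_t size,
                             std::uint64_t page_size, bool page_cached) {
  assert(page_size > 0);
  if (page_cached || size == 0) return false;
  // A write that starts and ends on page boundaries replaces whole pages,
  // so nothing has to be read first.
  const bool starts_on_boundary = offset % page_size == 0;
  const bool ends_on_boundary = (offset + size) % page_size == 0;
  return !(starts_on_boundary && ends_on_boundary);
}
```

In this model a 512-byte write into an uncached 4096-byte page needs the extra read, while a write that covers whole pages does not, which is the property write ahead relies on.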

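MySQL's overall performance depends heavily on the I/O performance of redo log writes, so InnoDB optimizes them. Redo log writes are append-only (append write), and InnoDB introduces a write ahead method: it uses an 8192-byte memory area (log.write_ahead_buf; 8192 is the default innodb_log_write_ahead_size) to populate the page cache efficiently.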

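How write ahead works

An 8192-byte memory area is aligned with the page cache of ib_logfile, so that the page cache for that range is created in a single write and no read-before-write occurs.
On the first write into a new range, the full 8192 bytes of cache are created up front.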

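Cases by redo log write size

1 Less than 512 bytes
2 Several consecutive writes that add up to less than 512 bytes
3 More than 512 bytes, for example 600 bytes
4 More than 8192 bytes; because of write ahead, a single write is capped at 8192 bytes
5 Less than 8192 bytes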

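Cases by remaining write-ahead space

1 Remaining space is 0, so a new round of cache is started
2 Remaining space is smaller than the redo log to be written
3 Remaining space is enough for the write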

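Diagrams for the different cases

Remaining space is 0 and the write is larger than 8192 bytes

[Figure: redo log write, remaining space 0, write larger than 8192 bytes]

Start from the state where log.write_ahead_end_offset equals the current write position (real_offset, which is derived from log.current_file_real_offset), i.e. the write-ahead region is used up.
More than 8192 bytes of redo log are waiting. Because the data covers a whole write-ahead region, 8192 bytes are written straight to the page cache without going through write_ahead_buf. log.write_ahead_end_offset is not changed at this step; it is updated in a later step.
At this point log.write_ahead_end_offset lags the write position by 8192 bytes and has to catch up.

update_current_write_ahead, called at the end of log_files_write_buffer, brings log.write_ahead_end_offset back in line with the write position.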
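To make the large-write case concrete, here is a minimal sketch of how a request bigger than one write-ahead region is consumed in region-sized pieces across successive calls. It is not InnoDB code: `split_into_region_writes` is a hypothetical helper, it assumes the write starts with the current region used up, it ignores log file boundaries, and it assumes the next region end is the write position rounded down to the region size plus one region.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper: how many bytes of redo each successive write consumes
// when `size` bytes are pending at `real_offset` and every write stops at
// the end of the next write-ahead region.
std::vector<std::uint64_t> split_into_region_writes(std::uint64_t real_offset,
                                                    std::uint64_t size,
                                                    std::uint64_t wa_size) {
  assert(wa_size > 0);
  std::vector<std::uint64_t> chunks;
  while (size > 0) {
    // Assumed rule: end of the region that contains real_offset.
    const std::uint64_t next_end = real_offset - real_offset % wa_size + wa_size;
    const std::uint64_t chunk = std::min(size, next_end - real_offset);
    chunks.push_back(chunk);
    real_offset += chunk;
    size -= chunk;
  }
  return chunks;
}
```

With 20000 pending bytes at an aligned offset and 8192-byte regions, the first two pieces are full regions that can go straight from the log buffer, and the last piece is smaller than a region, so in the real code it would take the copy-and-pad path described in the next case.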

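Remaining space is 0, followed by two consecutive 512-byte writes

[Figure: redo log write, remaining space 0, two consecutive 512-byte writes]
At this point log.write_ahead_end_offset equals the current write position.

A 512-byte redo log is written (when the direct-from-buffer path is handed more than 512 but fewer than 1024 bytes, it writes only the first 512; the code below explains this).
write_ahead_buf now has 0 bytes of space left. This is not the physical 8192-byte size of write_ahead_buf but a logical, virtual space.
The logical space has to grow by 8192: the 512 bytes are copied from the log buffer into write_ahead_buf, and this first write is issued as a full 8192 bytes.
It lands in the page cache and creates an 8192-byte cached region.

The next 512-byte write goes straight to the page cache, bypassing write_ahead_buf, as in the figure above.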
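The two-write sequence can be replayed with a tiny stateful model. This is a sketch under stated assumptions, not InnoDB code: `WriteAheadSim` and its members are hypothetical names, `region_end` plays the role of log.write_ahead_end_offset, every write is assumed to fit into one region, and log file boundaries are ignored.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the write-ahead bookkeeping.
struct WriteAheadSim {
  std::uint64_t region_size;  // e.g. 8192
  std::uint64_t region_end;   // end offset of the region already written ahead

  // Returns how many bytes would be handed to the file write for `size`
  // bytes of redo at `offset`. Assumes the write fits into one region.
  std::uint64_t issue(std::uint64_t offset, std::uint64_t size) {
    std::uint64_t issued = size;
    if (offset >= region_end) {
      // Region used up: pad this write out to the end of the next region.
      const std::uint64_t next_end =
          offset - offset % region_size + region_size;
      assert(offset + size <= next_end);
      issued = next_end - offset;
    }
    const std::uint64_t end = offset + issued;
    if (end > region_end) {
      region_end = end - end % region_size;  // advance the region end
    }
    return issued;
  }
};
```

Starting with the region used up at offset 16384, the first 512-byte write is issued as 8192 bytes and moves the region end to 24576; the second 512-byte write at 16896 is issued as exactly 512 bytes.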

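Several consecutive writes that add up to less than 512 bytes

[Figure: redo log write, consecutive writes adding up to less than 512 bytes]
While write_ahead_buf still has virtual space, a write of fewer than 512 bytes arrives (say 25 bytes).
The redo log is copied into write_ahead_buf, the rest of the block is zero-filled, and the 12-byte block header and 4-byte block trailer are updated. Because the block is incomplete, its 12-byte header has to be modified, and that cannot be done in place in the log buffer: the log buffer is written concurrently without a lock, so the modification would conflict. This is another benefit of write_ahead_buf.
The full 512-byte block is then written to the page cache.

Another 45 bytes arrive (the two writes together are still under 512). The copy starts from the same position as last time, so 25+45 bytes are copied into write_ahead_buf,
the rest is zero-filled, the 12-byte block header and 4-byte block trailer are updated, and the 512-byte block is written to the page cache again, overwriting the 512 bytes from the previous write.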
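The repeated rebuild of the same incomplete block can be sketched as follows. It is a simplified illustration, not InnoDB code: `build_padded_block` and `kBlockSize` are hypothetical names, and the header and checksum updates are left out so that only the copy-and-zero-fill step is shown.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

constexpr std::size_t kBlockSize = 512;

// Hypothetical helper: copies the first `used` bytes of the current block
// from the log buffer into a scratch block and leaves the rest zero-filled.
// Header and trailer handling is intentionally omitted.
std::array<unsigned char, kBlockSize> build_padded_block(
    const std::vector<unsigned char> &log_buffer, std::size_t used) {
  assert(used <= kBlockSize && used <= log_buffer.size());
  std::array<unsigned char, kBlockSize> block{};  // value-initialized to 0
  std::copy_n(log_buffer.begin(), used, block.begin());
  return block;
}
```

Calling it first with 25 used bytes and then with 70 yields two full 512-byte blocks for the same file position; the second simply overwrites the first in the page cache, and the log buffer itself is never modified.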

Code walkthrough - log_files_write_buffer

Note: the source is MySQL 8.0.20
log0write.cc

/* This function takes the redo log in buffer and, using write ahead,
writes it to the page cache of ib_logfile. */
static void log_files_write_buffer(log_t &log, byte *buffer, size_t buffer_size,
                                   lsn_t start_lsn) {
  ut_ad(log_writer_mutex_own(log));

  using namespace Log_files_write_impl;

  validate_buffer(log, buffer, buffer_size);

  validate_start_lsn(log, start_lsn, buffer_size);

  checkpoint_no_t checkpoint_no = log.next_checkpoint_no.load();
  /* Map start_lsn to an offset in the files. The ib_logfile files are used
  in a circle, so start_lsn may be several times their total size. */
  const auto real_offset = compute_real_offset(log, start_lsn);

  bool write_from_log_buffer;
  /* Compute how much redo log to write this time.
  Anything above 8192 is cut down to 8192; one write is at most 8192.
  This also decides whether the redo log has to be copied to write_ahead_buf
  and sets write_from_log_buffer accordingly: true means no copy is needed,
  false means the redo log in buffer must be copied to write_ahead_buf. */
  auto write_size = compute_how_much_to_write(log, real_offset, buffer_size,
                                              write_from_log_buffer);
  /* About start_next_file:
  when the previous write reached the end of the file, write_size is 0 here
  and the file is switched.
  It updates log.current_file_lsn, log.current_file_real_offset and
  log.current_file_end_offset to perform the switch.
  These three variables change only at startup and on a file switch.
  log.current_file_lsn grows as the lsn grows.
  log.current_file_real_offset is an offset across all files combined,
  >= 2048 and < the total size of all ib_logfile files.
  log.current_file_end_offset is the end of the ib_logfile being written. */
  if (write_size == 0) {
    start_next_file(log, start_lsn);
    return;
  }
  /* If write_size is larger than 512, fill in the 12-byte header and the
  4-byte checksum trailer of each block.
  Only full 512-byte blocks are handled; a tail shorter than 512 is outside
  the scope of this function. */
  prepare_full_blocks(log, buffer, write_size, start_lsn, checkpoint_no);

  byte *write_buf;
  uint64_t written_ahead = 0;
  lsn_t lsn_advance = write_size;

  if (write_from_log_buffer) {
    /* We have at least one completed log block to write.
    We write completed blocks from the log buffer. Note,
    that possibly we do not write all completed blocks,
    because of write-ahead strategy (described earlier). */
    DBUG_PRINT("ib_log",
               ("write from log buffer start_lsn=" LSN_PF " write_lsn=" LSN_PF
                " -> " LSN_PF,
                start_lsn, log.write_lsn.load(), start_lsn + lsn_advance));
    /* Point write_buf at the log buffer. */
    write_buf = buffer;

    LOG_SYNC_POINT("log_writer_before_write_from_log_buffer");

  } else {
    DBUG_PRINT("ib_log",
               ("incomplete write start_lsn=" LSN_PF " write_lsn=" LSN_PF
                " -> " LSN_PF,
                start_lsn, log.write_lsn.load(), start_lsn + lsn_advance));

#ifdef UNIV_DEBUG
    if (start_lsn == log.write_lsn.load()) {
      LOG_SYNC_POINT("log_writer_before_write_new_incomplete_block");
    }
    /* Else: we are doing yet another incomplete block write within the
    same block as the one in which we did the previous write. */
#endif /* UNIV_DEBUG */
    /* write_from_log_buffer is false:
    point write_buf at the write-ahead buffer. */
    write_buf = log.write_ahead_buf;

    /* We write all the data directly from the write-ahead buffer,
    where we first need to copy the data. */
    /* Copy write_size bytes of redo from buffer to log.write_ahead_buf,
    zero-fill the last block in log.write_ahead_buf if it is shorter than 512,
    and round write_size up to a multiple of 512: if it is not divisible by
    512, the zero padding of the last block is added to write_size. */
    copy_to_write_ahead_buffer(log, buffer, write_size, start_lsn,
                               checkpoint_no);
    /* Check whether the write-ahead virtual space is completely used up.
    One byte is used as the probe: if not even 1 byte fits, the virtual
    space is exhausted. */
    if (!current_write_ahead_enough(log, real_offset, 1)) {
      /* The virtual space is exhausted, so also check whether the rest of
      the current ib_logfile (from the current offset to the file end) can
      hold what is about to be written.
      The return value is the number of zero bytes written ahead. */
      written_ahead = prepare_for_write_ahead(log, real_offset, write_size);
    }
  }

  srv_stats.os_log_pending_writes.inc();

  /* Now, we know, that we are going to write completed
  blocks only (originally or copied and completed). */
  /* Write write_size bytes of redo log to the page cache of ib_logfile. */
  write_blocks(log, write_buf, write_size, real_offset);

  LOG_SYNC_POINT("log_writer_before_lsn_update");

  const lsn_t old_write_lsn = log.write_lsn.load();
  /* lsn_advance is not the number of bytes written to the page cache: zero
  padding is not included. It is the amount of redo log written this time.
  For several consecutive small writes it is the running total of those
  writes. start_lsn is 512-aligned and is not where the previous write
  ended, so start_lsn <= old_write_lsn. */
  const lsn_t new_write_lsn = start_lsn + lsn_advance;
  ut_a(new_write_lsn > log.write_lsn.load());
  /* Update log.write_lsn to the latest value. */
  log.write_lsn.store(new_write_lsn);
  /* Notify the log write_notifier thread. */
  notify_about_advanced_write_lsn(log, old_write_lsn, new_write_lsn);

  LOG_SYNC_POINT("log_writer_before_buf_limit_update");

  log_update_buf_limit(log, new_write_lsn);

  srv_stats.os_log_pending_writes.dec();
  srv_stats.log_writes.inc();

  /* Write ahead is included in write_size. */
  ut_a(write_size >= written_ahead);
  srv_stats.os_log_written.add(write_size - written_ahead);
  MONITOR_INC_VALUE(MONITOR_LOG_PADDED, written_ahead);

  int64_t free_space = log.lsn_capacity_for_writer - log.extra_margin;

  /* The free space may be negative (up to -log.extra_margin), in which
  case we are in the emergency mode, eating the extra margin and asking
  to increase concurrency_margin. */
  free_space -= new_write_lsn - log.last_checkpoint_lsn.load();

  MONITOR_SET(MONITOR_LOG_FREE_SPACE, free_space);

  log.n_log_ios++;
  /* Decide whether to update log.write_ahead_end_offset.
  A write that just uses up the current virtual space does not update it.
  The first write after the space is used up does: it advances
  log.write_ahead_end_offset, i.e. adds 8192. */
  update_current_write_ahead(log, real_offset, write_size);
}

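Code walkthrough - compute_how_much_to_write

First a quick look at compute_real_offset. It is short but a classic: it computes the absolute offset of the current lsn across the log files (real_offset), i.e. it maps an lsn to the position in the ib_logfile files where it should be written.

[Figure: size of the redo log to be written]
Note that start_lsn - log.current_file_lsn is not the size drawn in the figure; it is a relative distance from the reference point.

log0write.cc

static inline uint64_t compute_real_offset(const log_t &log, lsn_t start_lsn) {
  /* start_lsn: the lsn at which this write starts.
  log.current_file_lsn: set at startup or on a file switch; a fixed
  reference point in lsn space.
  log.current_file_real_offset: set at startup or on a file switch; the
  matching offset in file space. */
  const auto real_offset =
      log.current_file_real_offset + (start_lsn - log.current_file_lsn);


  return (real_offset);
}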
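A worked example of this mapping, as a minimal sketch: `LogFilePosition` is a hypothetical subset of log_t holding only the two reference fields, `real_offset_for` is a hypothetical name, and the numbers in the usage note are made up for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical subset of log_t: one reference point in lsn space and the
// matching reference point in file-offset space.
struct LogFilePosition {
  std::uint64_t current_file_lsn;
  std::uint64_t current_file_real_offset;
};

// Same arithmetic as compute_real_offset: the distance travelled in lsn
// space since the reference point is added to the reference file offset.
std::uint64_t real_offset_for(const LogFilePosition &pos,
                              std::uint64_t start_lsn) {
  assert(start_lsn >= pos.current_file_lsn);
  return pos.current_file_real_offset + (start_lsn - pos.current_file_lsn);
}
```

If the reference point is lsn 1048576 at file offset 2048 (just past a 2048-byte file header), then an lsn that lies 4096 bytes further on maps to file offset 6144, no matter how large the lsn itself has grown.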

The compute_how_much_to_write function

log0write.cc

/*
  Computes how much redo log can be written this time.
  real_offset: absolute offset in the files
  buffer_size: size of the redo log waiting to be written
  write_from_log_buffer: output parameter
     true:  the redo log in buffer does not need to be copied to write_ahead_buf
     false: the redo log in buffer must be copied to write_ahead_buf
*/
static inline size_t compute_how_much_to_write(const log_t &log,
                                               uint64_t real_offset,
                                               size_t buffer_size,
                                               bool &write_from_log_buffer) {
  size_t write_size;

  /* First we ensure, that we will write within single log file.
  If we had more to write and cannot fit the current log file,
  we first write what fits, then stops and returns to the main
  loop of the log writer thread. Then, the log writer will update
  maximum lsn up to which, it has data ready in the log buffer,
  and request next write operation according to its strategy. */
  /* Does the space from the current offset (real_offset) to the end of the
  file cover buffer_size? */
  if (!current_file_has_space(log, real_offset, buffer_size)) {
    /* The end of write would not fit the current log file. */

    /* But the beginning is guaranteed to fit or to be placed
    at the first byte of the next file. */
    ut_a(current_file_has_space(log, real_offset, 0));
    /* Is the space from real_offset to the end of the file 0? This checks
    whether the previous write filled the file completely; if so, return 0
    and get ready to switch files. */
    if (!current_file_has_space(log, real_offset, 1)) {
      /* The beginning of write is at the first byte
      of the next log file. Flush header of the next
      log file, advance current log file to the next,
      stop and return to the main loop of log writer. */
      write_from_log_buffer = false;
      return (0);

    } else {
      /* We write across at least two consecutive log files.
      Limit current write to the first one and then retry for
      next_file. */

      /* If the condition for real_offset + buffer_size holds,
      then the expression below is < buffer_size, which is
      size_t, so the typecast is ok. */
      /* buffer_size exceeds the remaining space, so only the remaining
      space can be written now; the next call takes the branch above and
      switches files. write_size is the remaining space. */
      write_size =
          static_cast<size_t>(log.current_file_end_offset - real_offset);

      ut_a(write_size <= buffer_size);
      ut_a(write_size % OS_FILE_LOG_BLOCK_SIZE == 0);
    }

  } else {
    /* Enough space: use buffer_size as is. */
    write_size = buffer_size;

    ut_a(write_size % OS_FILE_LOG_BLOCK_SIZE >= LOG_BLOCK_HDR_SIZE ||
         write_size % OS_FILE_LOG_BLOCK_SIZE == 0);

    ut_a(write_size % OS_FILE_LOG_BLOCK_SIZE <
         OS_FILE_LOG_BLOCK_SIZE - LOG_BLOCK_TRL_SIZE);
  }

  /* Now, we know we can write write_size bytes from the buffer,
  and we will do the write within single log file - current one. */

  ut_a(write_size > 0);
  ut_a(real_offset >= log.current_file_real_offset);
  ut_a(real_offset + write_size <= log.current_file_end_offset);
  ut_a(log.current_file_real_offset / log.file_size + 1 ==
       log.current_file_end_offset / log.file_size);

  /* We are interested in writing from log buffer only,
  if we had at least one completed block for write.
  Still we might decide not to write from the log buffer,
  because write-ahead is needed. In such case we could write
  together with the last incomplete block after copying. */
  write_from_log_buffer = write_size >= OS_FILE_LOG_BLOCK_SIZE;

  if (write_from_log_buffer) {
    MONITOR_INC(MONITOR_LOG_FULL_BLOCK_WRITES);
  } else {
    MONITOR_INC(MONITOR_LOG_PARTIAL_BLOCK_WRITES);
  }

  /* Check how much we have written ahead to avoid read-on-write. */
  /* Does the space from real_offset to the end of the write-ahead region
  cover write_size? */
  if (!current_write_ahead_enough(log, real_offset, write_size)) {
    /* Is the space from real_offset to the end of the write-ahead region 0?
    This checks whether the previous write filled the region completely; if
    so, write_ahead_end_offset is updated afterwards. */
    if (!current_write_ahead_enough(log, real_offset, 1)) {
      /* Current write-ahead region has no space at all. */
      /* The previous write reached the end of the write-ahead region, so
      compute next_wa, the end of a new 8192-byte write-ahead region,
      starting from real_offset. */
      const auto next_wa = compute_next_write_ahead_end(real_offset);
      /* Does the newly computed write-ahead end cover write_size? */
      if (!write_ahead_enough(next_wa, real_offset, write_size)) {
        /* ... and also the next write-ahead is too small.
        Therefore we have more data to write than size of
        the write-ahead. We write from the log buffer,
        skipping last fragment for which the write ahead
        is required. */

        ut_a(write_from_log_buffer);
        /* The previous region is used up and the new one is not large
        enough either, so write_size exceeds a whole write-ahead region
        (normally 8192). Only the part up to the region end is written
        (normally 8192 bytes), and the redo log is not copied to the
        write-ahead buffer but written from buffer straight to the page
        cache. */
        write_size = next_wa - real_offset;

        ut_a((real_offset + write_size) % srv_log_write_ahead_size == 0);

        ut_a(write_size % OS_FILE_LOG_BLOCK_SIZE == 0);

      } else {
        /* We copy data to write_ahead buffer,
        and write from there doing write-ahead
        of the bigger region in the same time. */
        /* The previous write-ahead region is used up, so a new 8192-byte
        page cache range has to be created. When this first write is smaller
        than 8192, the redo log is copied to the write-ahead buffer, the
        rest is zero-filled, and all 8192 bytes are written in one go,
        creating an 8192-byte page cache range.
        This is the heart of the whole page cache trick. */
        write_from_log_buffer = false;
      }

    } else {
      /* We limit write up to the end of region
      we have written ahead already. */
      /* The remaining page cache range cannot hold write_size, so
      write_size is cut down to what is left of the write-ahead region. This
      write uses up the range; the next one opens a new region. */
      write_size =
          static_cast<size_t>(log.write_ahead_end_offset - real_offset);

      ut_a(write_size >= OS_FILE_LOG_BLOCK_SIZE);
      ut_a(write_size % OS_FILE_LOG_BLOCK_SIZE == 0);
    }

  } else {
    if (write_from_log_buffer) {
      /* This branch runs when:
      1 the page cache range has enough space
      2 this is not the first write into the range
      3 the size is at least 512
      The result is a multiple of 512: the tail shorter than 512 is left for
      a later write, and the size actually written is obtained. */
      write_size = ut_uint64_align_down(write_size, OS_FILE_LOG_BLOCK_SIZE);
    }
  }
  /* After all these checks, return how much to write to the page cache. */
  return (write_size);
}
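The write-ahead part of this function boils down to a four-way decision, which can be sketched in isolation. This is a simplified model, not the InnoDB implementation: `Decision`, `decide` and `kBlock` are hypothetical names, the log file boundary handling at the top of the function is left out, and the next region end is assumed to be the offset rounded down to the region size plus one region.

```cpp
#include <cassert>
#include <cstdint>

constexpr std::uint64_t kBlock = 512;  // log block size

struct Decision {
  std::uint64_t write_size;  // bytes of redo consumed by this write
  bool from_log_buffer;      // true: write directly, false: copy and pad
};

// Hypothetical reduction of the write-ahead branches.
Decision decide(std::uint64_t real_offset, std::uint64_t size,
                std::uint64_t wa_end, std::uint64_t wa_size) {
  Decision d{size, size >= kBlock};
  if (real_offset + size <= wa_end) {
    // Fits in the current region: write only whole blocks directly.
    if (d.from_log_buffer) d.write_size = size - size % kBlock;
    return d;
  }
  if (real_offset < wa_end) {
    // Some room is left: fill the current region exactly.
    d.write_size = wa_end - real_offset;
    return d;
  }
  // The current region is used up.
  const std::uint64_t next_end =
      real_offset - real_offset % wa_size + wa_size;
  if (real_offset + size > next_end) {
    d.write_size = next_end - real_offset;  // more than a region: direct
  } else {
    d.from_log_buffer = false;  // first write of a new region: copy and pad
  }
  return d;
}
```

For example, 600 bytes arriving when the region is used up are copied and padded, while the same 600 bytes arriving in the middle of a region are written directly as one 512-byte block, with the remaining 88 bytes left for a later write.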

Code walkthrough - copy_to_write_ahead_buffer

/*
  This function does two things:
  it copies the redo log from buffer to the write-ahead buffer, and
  it zero-fills a last block shorter than 512 bytes and fills in that
  block's header and trailer.
*/
static inline void copy_to_write_ahead_buffer(log_t &log, const byte *buffer,
                                              size_t &size, lsn_t start_lsn,
                                              checkpoint_no_t checkpoint_no) {
  ut_a(size <= srv_log_write_ahead_size);

  ut_a(buffer >= log.buf);
  ut_a(buffer + size <= log.buf + log.buf_size);

  byte *write_buf = log.write_ahead_buf;

  LOG_SYNC_POINT("log_writer_before_copy_to_write_ahead_buffer");
  /* The copy itself. */
  std::memcpy(write_buf, buffer, size);

  size_t completed_blocks_size;
  byte *incomplete_block;
  size_t incomplete_size;
  /* Round the size down to a multiple of 512. */
  completed_blocks_size = ut_uint64_align_down(size, OS_FILE_LOG_BLOCK_SIZE);
  /* Position in log.write_ahead_buf right after the full 512-byte blocks;
  what follows is the part shorter than 512, which needs special handling. */
  incomplete_block = write_buf + completed_blocks_size;
  /* Size of the part shorter than 512. */
  incomplete_size = size % OS_FILE_LOG_BLOCK_SIZE;

  ut_a(incomplete_block + incomplete_size <=
       write_buf + srv_log_write_ahead_size);
  /* There is a part shorter than 512. */
  if (incomplete_size != 0) {
    /* Prepare the incomplete (last) block. */
    ut_a(incomplete_size >= LOG_BLOCK_HDR_SIZE);
    /* Set the block number derived from the current lsn. */
    log_block_set_hdr_no(
        incomplete_block,
        log_block_convert_lsn_to_no(start_lsn + completed_blocks_size));
    /* Set the flush bit when this write contains no completed block, i.e.
    the whole write is shorter than 512 bytes. */
    log_block_set_flush_bit(incomplete_block, completed_blocks_size == 0);
    /* Record how much redo log the block actually holds. */
    log_block_set_data_len(incomplete_block, incomplete_size);

    if (log_block_get_first_rec_group(incomplete_block) > incomplete_size) {
      log_block_set_first_rec_group(incomplete_block, 0);
    }
    /* Record the current checkpoint_no. */
    log_block_set_checkpoint_no(incomplete_block, checkpoint_no);
    /* Zero-fill the rest of the block. */
    std::memset(incomplete_block + incomplete_size, 0x00,
                OS_FILE_LOG_BLOCK_SIZE - incomplete_size);
    /* Compute the checksum in the trailer. */
    log_block_store_checksum(incomplete_block);
    /* Return a multiple of 512 so that every write is a multiple of 512;
    a part shorter than 512 is still written as 512 bytes. */
    size = completed_blocks_size + OS_FILE_LOG_BLOCK_SIZE;
  }

  /* Since now, size is about completed blocks always. */
  ut_a(size % OS_FILE_LOG_BLOCK_SIZE == 0);
}
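The size arithmetic in this function can be isolated into a few lines. The following is a minimal sketch, not InnoDB code: `BlockLayout`, `layout_for` and `kLogBlockSize` are hypothetical names, and only the sizes are computed, not the block contents.

```cpp
#include <cassert>
#include <cstddef>

constexpr std::size_t kLogBlockSize = 512;

struct BlockLayout {
  std::size_t completed_bytes;   // bytes in full 512-byte blocks
  std::size_t incomplete_bytes;  // bytes in the last, partial block
  std::size_t zero_fill;         // zero bytes appended to the last block
  std::size_t padded_size;       // size handed to the file write
};

// Splits a copy of `size` bytes into full blocks plus one padded block.
BlockLayout layout_for(std::size_t size) {
  const std::size_t incomplete = size % kLogBlockSize;
  const std::size_t completed = size - incomplete;
  const std::size_t zero_fill =
      incomplete != 0 ? kLogBlockSize - incomplete : 0;
  return BlockLayout{completed, incomplete, zero_fill, size + zero_fill};
}
```

A 600-byte copy therefore consists of one full block, 88 bytes of a second block and 424 zero bytes, and is written as 1024 bytes; a 25-byte copy is written as a single 512-byte block.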

Code walkthrough -

/*
  The check and its implementation here are also a classic.
  It is the follow-up handling after the previous write used up the
  write-ahead virtual space, i.e. the page cache range.
  It does two things:
  1 When fewer than 8192 bytes are left at the tail of ib_logfile (this does
    happen: the file size minus 2048 need not be a multiple of 8192, and
    since the file size is configurable it is quite likely), only the
    remaining space can be used for the page cache range.
  2 This write is issued as 8192 bytes; the unused space is zero-filled.
  This corresponds to the branch
  if (!current_write_ahead_enough(log, real_offset, 1)){
    ...
  }
  in compute_how_much_to_write and completes it.
*/
if (!current_write_ahead_enough(log, real_offset, 1)) {
  written_ahead = prepare_for_write_ahead(log, real_offset, write_size);
}

static inline size_t prepare_for_write_ahead(log_t &log, uint64_t real_offset,
                                             size_t &write_size) {
  /* We need to perform write ahead during this write. */
  /* Get the end offset of the next write-ahead region. */
  const auto next_wa = compute_next_write_ahead_end(real_offset);

  ut_a(real_offset + write_size <= next_wa);
  /* The part of this 8192-byte write-ahead region that is not used yet. */
  size_t write_ahead =
      static_cast<size_t>(next_wa - (real_offset + write_size));
  /* Check whether the space from real_offset to the end of ib_logfile can
  hold a complete 8192-byte region. The last page cache range is not
  necessarily 8192 bytes; this happens when the current ib_logfile is about
  to fill up. */
  if (!current_file_has_space(log, real_offset, write_size + write_ahead)) {
    /* We must not write further than to the end
    of the current log file.

    Note, that: log.file_size - LOG_FILE_HDR_SIZE
    does not have to be divisible by size of write
    ahead. Example given:
            innodb_log_file_size = 1024M,
            innodb_log_write_ahead_size = 4KiB,
            LOG_FILE_HDR_SIZE is 2KiB. */
    /* For the last page cache range of ib_logfile, compute the remaining
    space. A full 8192-byte range does not fit, but the redo log to be
    written certainly does: compute_how_much_to_write has already taken
    care of that. */
    write_ahead = static_cast<size_t>(log.current_file_end_offset -
                                      real_offset - write_size);
  }

  ut_a(current_file_has_space(log, real_offset, write_size + write_ahead));

  LOG_SYNC_POINT("log_writer_before_write_ahead");
  /* Zero-fill the remaining space. */
  std::memset(log.write_ahead_buf + write_size, 0x00, write_ahead);
  /* The size of the page cache range, 8192 in most cases. */
  write_size += write_ahead;

  return (write_ahead);
}
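The padding computation, including the clamp at the end of the log file, can be sketched on its own. This is not InnoDB code: `write_ahead_padding` is a hypothetical helper, the next region end is assumed to be the offset rounded down to the region size plus one region, and the offsets in the usage note are made-up values chosen to trigger both branches.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: number of zero bytes appended so that the write
// reaches the end of the next write-ahead region, but never goes past
// the end of the current log file.
std::uint64_t write_ahead_padding(std::uint64_t real_offset,
                                  std::uint64_t write_size,
                                  std::uint64_t wa_size,
                                  std::uint64_t file_end) {
  const std::uint64_t data_end = real_offset + write_size;
  const std::uint64_t next_end =
      real_offset - real_offset % wa_size + wa_size;
  assert(data_end <= next_end);  // the data fits into one region
  assert(data_end <= file_end);  // and into the current file
  std::uint64_t padding = next_end - data_end;
  if (data_end + padding > file_end) {
    padding = file_end - data_end;  // clamp at the end of the file
  }
  return padding;
}
```

With 8192-byte regions, a 512-byte write at offset 16384 is padded with 7680 zero bytes when the file ends far away, but with only 3584 bytes when the file ends at offset 20480.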

Code walkthrough - update_current_write_ahead

/*
  The final step of the write-ahead page cache handling.
  On the first write into a new write-ahead region, write_ahead_end_offset
  (the end offset of the region) is advanced to the new end. Think of it as
  adding 8192, apart from the special handling at the end of ib_logfile.
*/
static inline void update_current_write_ahead(log_t &log, uint64_t real_offset,
                                              size_t write_size) {
  const auto end = real_offset + write_size;

  if (end > log.write_ahead_end_offset) {
    log.write_ahead_end_offset =
        ut_uint64_align_down(end, srv_log_write_ahead_size);
  }
}

}  // namespace Log_files_write_impl
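The boundary cases of this update are easy to check with a pure-function version. This is a sketch only: `advance_region_end` is a hypothetical name, and it returns the new value instead of modifying log.write_ahead_end_offset in place.

```cpp
#include <cassert>
#include <cstdint>

// Same rule as update_current_write_ahead, written as a pure function:
// the region end moves only when the write ends beyond it, and it always
// lands on a multiple of the write-ahead size.
std::uint64_t advance_region_end(std::uint64_t region_end,
                                 std::uint64_t real_offset,
                                 std::uint64_t write_size,
                                 std::uint64_t wa_size) {
  assert(wa_size > 0);
  const std::uint64_t end = real_offset + write_size;
  return end > region_end ? end - end % wa_size : region_end;
}
```

A padded 8192-byte write at 16384 moves the end from 16384 to 24576. A later write that stays inside the region, or one that ends exactly at 24576, leaves it unchanged, so the following write finds the region used up and starts a new one. A write cut short at the end of a file (for instance 4096 bytes at 16384) ends off the region boundary and, after rounding down, also leaves the end where it was.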
