Linux Kernel Data Structures: kfifo

阿新 • Published: 2018-02-25


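1. Introduction

A recent project of mine used a ring buffer whose code was adapted from the Linux kernel's kfifo. Buffers are used throughout file systems to smooth out the speed gap between the CPU accessing memory and reading or writing the disk. For example, suppose process A produces data and sends it to process B, which must process the data and write it to a file. If B has not finished processing, A has to delay sending. To reduce the time A spends waiting, a buffer can be placed between A and B: A stores data in the buffer, and B takes data out of it. This is the classic producer-consumer model. The data in the buffer is consumed in FIFO order, so a queue is a natural implementation. The Linux kernel's kfifo is exactly such a circular queue and can serve as a ring buffer. The figure below shows how the producer and consumer use the buffer: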

[Figure: a producer and a consumer sharing a ring buffer]

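For a detailed introduction to ring buffers and how to implement them, see http://en.wikipedia.org/wiki/Circular_buffer, which is thorough and lists several ways to implement a circular queue. The awkward part of a circular queue is telling whether it is empty or full; Wikipedia gives three approaches.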
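One way to tell empty from full, and the one kfifo relies on, is to keep two counters that only ever increase and compare their difference with the capacity. The following is a minimal sketch under that assumption; the names `ring_idx`, `ring_is_empty` and `ring_is_full` are made up for illustration and are not kernel APIs.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical illustration type: two counters that only ever increase. */
struct ring_idx {
    uint32_t size; /* capacity in bytes, assumed to be a power of two */
    uint32_t in;   /* total bytes ever written (wraps modulo 2^32) */
    uint32_t out;  /* total bytes ever read (wraps modulo 2^32) */
};

/* Empty: everything that was written has been read. */
static bool ring_is_empty(const struct ring_idx *r)
{
    return r->in == r->out;
}

/* Full: the number of unread bytes equals the capacity. */
static bool ring_is_full(const struct ring_idx *r)
{
    return (uint32_t)(r->in - r->out) == r->size;
}
```

Because the counters are not reduced modulo the capacity, the two states never look alike, and no slot has to be left unused to tell them apart.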

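2. The Linux kernel kfifo

kfifo is cleverly designed and its code is very compact; the way it handles enqueue and dequeue is surprising. First, look at the kfifo data structure: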

struct kfifo {
    unsigned char *buffer;     /* the buffer holding the data */
    unsigned int size;         /* the size of the allocated buffer */
    unsigned int in;           /* data is added at offset (in % size) */
    unsigned int out;          /* data is extracted from off. (out % size) */
    spinlock_t *lock;          /* protects concurrent modifications */
};

kfifo provides the following functions:

// Create a kfifo on top of a caller-supplied buffer
struct kfifo *kfifo_init(unsigned char *buffer, unsigned int size,
                gfp_t gfp_mask, spinlock_t *lock);
// Allocate a buffer of the given size and a kfifo to manage it
struct kfifo *kfifo_alloc(unsigned int size, gfp_t gfp_mask,
                 spinlock_t *lock);
// Free the kfifo
void kfifo_free(struct kfifo *fifo);
// Add data to the kfifo
unsigned int kfifo_put(struct kfifo *fifo,
                const unsigned char *buffer, unsigned int len);
// Take data out of the kfifo
unsigned int kfifo_get(struct kfifo *fifo,
                unsigned char *buffer, unsigned int len);
// Number of bytes of data currently held in the kfifo
unsigned int kfifo_len(struct kfifo *fifo);

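The spinlock protects the kfifo when several processes or threads use it concurrently, because in and out change on every put and get. The source for initializing and creating a kfifo is as follows: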

struct kfifo *kfifo_init(unsigned char *buffer, unsigned int size,
             gfp_t gfp_mask, spinlock_t *lock)
{
    struct kfifo *fifo;

    /* size must be a power of 2 */
    BUG_ON(!is_power_of_2(size));

    fifo = kmalloc(sizeof(struct kfifo), gfp_mask);
    if (!fifo)
        return ERR_PTR(-ENOMEM);

    fifo->buffer = buffer;
    fifo->size = size;
    fifo->in = fifo->out = 0;
    fifo->lock = lock;

    return fifo;
}

struct kfifo *kfifo_alloc(unsigned int size, gfp_t gfp_mask, spinlock_t *lock)
{
    unsigned char *buffer;
    struct kfifo *ret;

    if (!is_power_of_2(size)) {
        BUG_ON(size > 0x80000000);
        size = roundup_pow_of_two(size);
    }

    buffer = kmalloc(size, gfp_mask);
    if (!buffer)
        return ERR_PTR(-ENOMEM);

    ret = kfifo_init(buffer, size, gfp_mask, lock);

    if (IS_ERR(ret))
        kfree(buffer);

    return ret;
}

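kfifo_init requires size to be a power of 2 (it checks this with BUG_ON), while kfifo_alloc rounds the size passed in by the caller up to the next power of 2, so kfifo->size is always a power of 2. This is common practice in the kernel. The benefit is that a modulo operation on kfifo->size can be replaced by a bitwise AND: kfifo->in % kfifo->size becomes kfifo->in & (kfifo->size - 1).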
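The equivalence between the modulo and the mask can be checked directly. This is a small sketch with hypothetical helper names (`offset_mod`, `offset_mask`); it also shows that the shortcut gives a different answer when the size is not a power of 2.

```c
#include <stdint.h>

/* Offset into the buffer computed with the modulo operator. */
static uint32_t offset_mod(uint32_t index, uint32_t size)
{
    return index % size;
}

/* Same offset computed with a bit mask; only valid when size is a power of two. */
static uint32_t offset_mask(uint32_t index, uint32_t size)
{
    return index & (size - 1);
}
```

With size 1024 both functions agree for every index. With size 6 and index 7, the modulo gives 1 while the mask gives 5, which is why the size has to be a power of 2 before the mask can be used.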

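The clever part of kfifo is that in and out are unsigned. put and get only ever increase them; when a counter reaches its maximum value it overflows and starts again from 0, so the indices are reused cyclically. The put and get code is shown below: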
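Unsigned subtraction is performed modulo 2^32, so the difference between the two counters stays correct even after in has overflowed and out has not. A minimal sketch, using hypothetical helpers `fifo_used` and `fifo_free` that mirror the expressions in the kernel code:

```c
#include <stdint.h>

/* Bytes currently stored, computed from the free-running counters. */
static uint32_t fifo_used(uint32_t in, uint32_t out)
{
    return in - out; /* unsigned subtraction wraps modulo 2^32 */
}

/* Bytes that can still be written without overwriting unread data. */
static uint32_t fifo_free(uint32_t size, uint32_t in, uint32_t out)
{
    return size - (in - out);
}
```

For example, with out = 0xFFFFFFF0 and 0x20 bytes written since then, in has wrapped to 0x10, yet fifo_used(0x10, 0xFFFFFFF0) still yields 0x20.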

static inline unsigned int kfifo_put(struct kfifo *fifo,
                const unsigned char *buffer, unsigned int len)
{
    unsigned long flags;
    unsigned int ret;
    spin_lock_irqsave(fifo->lock, flags);
    ret = __kfifo_put(fifo, buffer, len);
    spin_unlock_irqrestore(fifo->lock, flags);
    return ret;
}

static inline unsigned int kfifo_get(struct kfifo *fifo,
                     unsigned char *buffer, unsigned int len)
{
    unsigned long flags;
    unsigned int ret;
    spin_lock_irqsave(fifo->lock, flags);
    ret = __kfifo_get(fifo, buffer, len);
    /* when fifo->in == fifo->out the buffer is empty, so reset both */
    if (fifo->in == fifo->out)
        fifo->in = fifo->out = 0;
    spin_unlock_irqrestore(fifo->lock, flags);
    return ret;
}


unsigned int __kfifo_put(struct kfifo *fifo,
            const unsigned char *buffer, unsigned int len)
{
    unsigned int l;

    /* clamp len to the free space in the buffer */
    len = min(len, fifo->size - fifo->in + fifo->out);

    /*
     * Ensure that we sample the fifo->out index -before- we
     * start putting bytes into the kfifo.
     */

    smp_mb();

    /* first put the data starting from fifo->in to buffer end */
    l = min(len, fifo->size - (fifo->in & (fifo->size - 1)));
    memcpy(fifo->buffer + (fifo->in & (fifo->size - 1)), buffer, l);

    /* then put the rest (if any) at the beginning of the buffer */
    memcpy(fifo->buffer, buffer + l, len - l);

    /*
     * Ensure that we add the bytes to the kfifo -before-
     * we update the fifo->in index.
     */

    smp_wmb();

    fifo->in += len;  /* only ever incremented; wraps to 0 on overflow */

    return len;
}

unsigned int __kfifo_get(struct kfifo *fifo,
             unsigned char *buffer, unsigned int len)
{
    unsigned int l;

    /* clamp len to the number of bytes stored in the buffer */
    len = min(len, fifo->in - fifo->out);

    /*
     * Ensure that we sample the fifo->in index -before- we
     * start removing bytes from the kfifo.
     */

    smp_rmb();

    /* first get the data from fifo->out until the end of the buffer */
    l = min(len, fifo->size - (fifo->out & (fifo->size - 1)));
    memcpy(buffer, fifo->buffer + (fifo->out & (fifo->size - 1)), l);

    /* then get the rest (if any) from the beginning of the buffer */
    memcpy(buffer + l, fifo->buffer, len - l);

    /*
     * Ensure that we remove the bytes from the kfifo -before-
     * we update the fifo->out index.
     */

    smp_mb();

    fifo->out += len; /* only ever incremented; wraps to 0 on overflow */

    return len;
}

Both kfifo_put and kfifo_get take the lock around the calls to __kfifo_put and __kfifo_get to prevent concurrent access. The code also shows that put and get each call memcpy twice; the second call handles the boundary case where the data wraps past the end of the buffer. The figures below illustrate this, with blue for free space and red for occupied space.

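(1) An empty kfifo

[Figure: an empty kfifo]

(2) After putting one buffer

[Figure: the kfifo after a put]

(3) After getting one buffer

[Figure: the kfifo after a get]

(4) If the buffer being put is now longer than the space from in to the end, the remainder is placed at the start of the buffer

[Figure: the kfifo after a put that wraps around]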
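The four states above can be reproduced with a tiny user-space model. This is only a sketch: `mini_fifo`, `mini_put` and `mini_get` are hypothetical names, the capacity is fixed at 8 bytes for readability, and the locking and memory barriers of the real kfifo are left out.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical miniature FIFO used only for this illustration. */
struct mini_fifo {
    unsigned char buf[8]; /* capacity must be a power of two */
    uint32_t in;
    uint32_t out;
};

static uint32_t min_u32(uint32_t a, uint32_t b) { return a < b ? a : b; }

/* Copy up to len bytes in; returns how many were actually stored. */
static uint32_t mini_put(struct mini_fifo *f, const unsigned char *src, uint32_t len)
{
    const uint32_t size = sizeof(f->buf);
    uint32_t off = f->in & (size - 1);
    uint32_t l;

    len = min_u32(len, size - (f->in - f->out)); /* clamp to free space */
    l = min_u32(len, size - off);                /* part that fits before the end */
    memcpy(f->buf + off, src, l);
    memcpy(f->buf, src + l, len - l);            /* remainder wraps to the start */
    f->in += len;
    return len;
}

/* Copy up to len bytes out; returns how many were actually read. */
static uint32_t mini_get(struct mini_fifo *f, unsigned char *dst, uint32_t len)
{
    const uint32_t size = sizeof(f->buf);
    uint32_t off = f->out & (size - 1);
    uint32_t l;

    len = min_u32(len, f->in - f->out);          /* clamp to stored bytes */
    l = min_u32(len, size - off);                /* part that lies before the end */
    memcpy(dst, f->buf + off, l);
    memcpy(dst + l, f->buf, len - l);            /* remainder comes from the start */
    f->out += len;
    return len;
}
```

Putting "abcdef", getting 4 bytes, then putting "ghijkl" leaves "gh" in the last two slots and "ijkl" at the start of the buffer, which is case (4). A following get of 8 bytes returns "efghijkl" in order.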

3. Test program

Modeled on kfifo, I wrote a ring_buffer that uses a thread mutex for concurrency control. The ring_buffer is designed as follows:

/**@brief A ring buffer modeled on the Linux kfifo
 *@author Anker  date:2013-12-18
 * ring_buffer.h
 * */

#ifndef KFIFO_HEADER_H
#define KFIFO_HEADER_H

#include <inttypes.h>
#include <string.h>
#include <stdlib.h>
#include <stdio.h>
#include <errno.h>
#include <assert.h>
#include <pthread.h>    /* pthread_mutex_t and the lock/unlock calls */

// true if x is a power of 2
#define is_power_of_2(x) ((x) != 0 && (((x) & ((x) - 1)) == 0))
// smaller of a and b
#define min(a, b) (((a) < (b)) ? (a) : (b))

struct ring_buffer
{
    void         *buffer;     // storage
    uint32_t     size;        // capacity in bytes
    uint32_t     in;          // write position
    uint32_t     out;         // read position
    pthread_mutex_t *f_lock;  // mutex
};
// initialize the ring buffer on top of a caller-supplied buffer
struct ring_buffer* ring_buffer_init(void *buffer, uint32_t size, pthread_mutex_t *f_lock)
{
    assert(buffer);
    struct ring_buffer *ring_buf = NULL;
    if (!is_power_of_2(size))
    {
        fprintf(stderr,"size must be power of 2.\n");
        return ring_buf;
    }
    ring_buf = (struct ring_buffer *)malloc(sizeof(struct ring_buffer));
    if (!ring_buf)
    {
        fprintf(stderr,"Failed to malloc memory,errno:%d,reason:%s",
            errno, strerror(errno));
        return ring_buf;
    }
    memset(ring_buf, 0, sizeof(struct ring_buffer));
    ring_buf->buffer = buffer;
    ring_buf->size = size;
    ring_buf->in = 0;
    ring_buf->out = 0;
    ring_buf->f_lock = f_lock;
    return ring_buf;
}
// free the ring buffer, including the storage passed to ring_buffer_init
void ring_buffer_free(struct ring_buffer *ring_buf)
{
    if (ring_buf)
    {
        if (ring_buf->buffer)
        {
            free(ring_buf->buffer);
            ring_buf->buffer = NULL;
        }
        free(ring_buf);
        ring_buf = NULL;
    }
}

// number of bytes currently stored
uint32_t __ring_buffer_len(const struct ring_buffer *ring_buf)
{
    return (ring_buf->in - ring_buf->out);
}

// take data out of the ring buffer
uint32_t __ring_buffer_get(struct ring_buffer *ring_buf, void * buffer, uint32_t size)
{
    assert(ring_buf && buffer);
    // byte pointers: arithmetic on void * is not standard C
    unsigned char *src = (unsigned char *)ring_buf->buffer;
    unsigned char *dst = (unsigned char *)buffer;
    uint32_t len = 0;
    size  = min(size, ring_buf->in - ring_buf->out);
    /* first get the data from fifo->out until the end of the buffer */
    len = min(size, ring_buf->size - (ring_buf->out & (ring_buf->size - 1)));
    memcpy(dst, src + (ring_buf->out & (ring_buf->size - 1)), len);
    /* then get the rest (if any) from the beginning of the buffer */
    memcpy(dst + len, src, size - len);
    ring_buf->out += size;
    return size;
}
// put data into the ring buffer
uint32_t __ring_buffer_put(struct ring_buffer *ring_buf, void *buffer, uint32_t size)
{
    assert(ring_buf && buffer);
    unsigned char *dst = (unsigned char *)ring_buf->buffer;
    unsigned char *src = (unsigned char *)buffer;
    uint32_t len = 0;
    size = min(size, ring_buf->size - ring_buf->in + ring_buf->out);
    /* first put the data starting from fifo->in to buffer end */
    len  = min(size, ring_buf->size - (ring_buf->in & (ring_buf->size - 1)));
    memcpy(dst + (ring_buf->in & (ring_buf->size - 1)), src, len);
    /* then put the rest (if any) at the beginning of the buffer */
    memcpy(dst, src + len, size - len);
    ring_buf->in += size;
    return size;
}

uint32_t ring_buffer_len(const struct ring_buffer *ring_buf)
{
    uint32_t len = 0;
    pthread_mutex_lock(ring_buf->f_lock);
    len = __ring_buffer_len(ring_buf);
    pthread_mutex_unlock(ring_buf->f_lock);
    return len;
}

uint32_t ring_buffer_get(struct ring_buffer *ring_buf, void *buffer, uint32_t size)
{
    uint32_t ret;
    pthread_mutex_lock(ring_buf->f_lock);
    ret = __ring_buffer_get(ring_buf, buffer, size);
    // the buffer is empty, so reset both positions
    if (ring_buf->in == ring_buf->out)
        ring_buf->in = ring_buf->out = 0;
    pthread_mutex_unlock(ring_buf->f_lock);
    return ret;
}

uint32_t ring_buffer_put(struct ring_buffer *ring_buf, void *buffer, uint32_t size)
{
    uint32_t ret;
    pthread_mutex_lock(ring_buf->f_lock);
    ret = __ring_buffer_put(ring_buf, buffer, size);
    pthread_mutex_unlock(ring_buf->f_lock);
    return ret;
}
#endif

The test program uses two threads to simulate a producer and a consumer:

/**@brief Ring buffer test program. Creates two threads, one producer and one consumer.
 * The producer puts data into the buffer every second; the consumer takes data out every 2 seconds.
 *@author Anker  date:2013-12-18
 * */
#include "ring_buffer.h"
#include <pthread.h>
#include <time.h>
#include <unistd.h>     /* sleep */

#define BUFFER_SIZE  (1024 * 1024)

typedef struct student_info
{
    uint64_t stu_id;
    uint32_t age;
    uint32_t score;
}student_info;


void print_student_info(const student_info *stu_info)
{
    assert(stu_info);
    printf("id:%" PRIu64 "\t",stu_info->stu_id);
    printf("age:%u\t",stu_info->age);
    printf("score:%u\n",stu_info->score);
}

student_info * get_student_info(time_t timer)
{
    student_info *stu_info = (student_info *)malloc(sizeof(student_info));
    if (!stu_info)
    {
        fprintf(stderr, "Failed to malloc memory.\n");
        return NULL;
    }
    srand(timer);
    stu_info->stu_id = 10000 + rand() % 9999;
    stu_info->age = rand() % 30;
    stu_info->score = rand() % 101;
    print_student_info(stu_info);
    return stu_info;
}

void * consumer_proc(void *arg)
{
    struct ring_buffer *ring_buf = (struct ring_buffer *)arg;
    student_info stu_info;
    while(1)
    {
        sleep(2);
        printf("------------------------------------------\n");
        printf("get a student info from ring buffer.\n");
        ring_buffer_get(ring_buf, (void *)&stu_info, sizeof(student_info));
        printf("ring buffer length: %u\n", ring_buffer_len(ring_buf));
        print_student_info(&stu_info);
        printf("------------------------------------------\n");
    }
    return (void *)ring_buf;
}

void * producer_proc(void *arg)
{
    time_t cur_time;
    struct ring_buffer *ring_buf = (struct ring_buffer *)arg;
    while(1)
    {
        time(&cur_time);
        srand(cur_time);
        int seed = rand() % 11111;
        printf("******************************************\n");
        student_info *stu_info = get_student_info(cur_time + seed);
        printf("put a student info to ring buffer.\n");
        ring_buffer_put(ring_buf, (void *)stu_info, sizeof(student_info));
        free(stu_info);  /* the data was copied into the ring buffer */
        printf("ring buffer length: %u\n", ring_buffer_len(ring_buf));
        printf("******************************************\n");
        sleep(1);
    }
    return (void *)ring_buf;
}

/* Start the consumer; the thread id is returned through tid because
 * pthread_t does not necessarily fit in an int. */
int consumer_thread(void *arg, pthread_t *tid)
{
    int err;
    err = pthread_create(tid, NULL, consumer_proc, arg);
    if (err != 0)
    {
        /* pthread_create returns the error code instead of setting errno */
        fprintf(stderr, "Failed to create consumer thread.errno:%d, reason:%s\n",
            err, strerror(err));
        return -1;
    }
    return 0;
}
int producer_thread(void *arg, pthread_t *tid)
{
    int err;
    err = pthread_create(tid, NULL, producer_proc, arg);
    if (err != 0)
    {
        fprintf(stderr, "Failed to create producer thread.errno:%d, reason:%s\n",
            err, strerror(err));
        return -1;
    }
    return 0;
}


int main()
{
    void * buffer = NULL;
    uint32_t size = 0;
    struct ring_buffer *ring_buf = NULL;
    pthread_t consume_pid, produce_pid;

    pthread_mutex_t *f_lock = (pthread_mutex_t *)malloc(sizeof(pthread_mutex_t));
    if (!f_lock || pthread_mutex_init(f_lock, NULL) != 0)
    {
        fprintf(stderr, "Failed to init mutex.\n");
        return -1;
    }
    buffer = (void *)malloc(BUFFER_SIZE);
    if (!buffer)
    {
        fprintf(stderr, "Failed to malloc memory.\n");
        return -1;
    }
    size = BUFFER_SIZE;
    ring_buf = ring_buffer_init(buffer, size, f_lock);
    if (!ring_buf)
    {
        fprintf(stderr, "Failed to init ring buffer.\n");
        return -1;
    }
#if 0
    student_info *stu_info = get_student_info(638946124);
    ring_buffer_put(ring_buf, (void *)stu_info, sizeof(student_info));
    stu_info = get_student_info(976686464);
    ring_buffer_put(ring_buf, (void *)stu_info, sizeof(student_info));
    ring_buffer_get(ring_buf, (void *)stu_info, sizeof(student_info));
    print_student_info(stu_info);
#endif
    printf("multi thread test.......\n");
    if (producer_thread((void*)ring_buf, &produce_pid) != 0)
        return -1;
    if (consumer_thread((void*)ring_buf, &consume_pid) != 0)
        return -1;
    pthread_join(produce_pid, NULL);
    pthread_join(consume_pid, NULL);
    ring_buffer_free(ring_buf);
    free(f_lock);
    return 0;
}

The test output is shown below:

[Figure: output of the test program]

4. References

http://blog.csdn.net/linyt/article/details/5764312

http://en.wikipedia.org/wiki/Circular_buffer

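The Ingenious kfifo

The Linux kernel has never lacked concise, elegant and efficient code; what we lack is the eye to find and appreciate it. In the kernel, concise does not mean relying on obscure tricks. On the contrary, it uses basic data structures that everyone knows well, and the kernel developers manage to draw elegant properties out of them.
kfifo is one such piece of elegant code: it is very concise, without a single superfluous line, yet very efficient.
Details about kfifo:

Source version analyzed in this article: 2.6.24.4

kfifo implementation file: kernel/kfifo.c

kfifo header file: include/linux/kfifo.h

kfifo overview

kfifo is a First In First Out data structure in the kernel, implemented as a circular queue. It provides a byte-stream service with no record boundaries. Most importantly, it uses a lock-free technique: when there is only one enqueuing thread and one dequeuing thread, the two can operate concurrently without any locking and kfifo remains thread-safe.
Since the kfifo code carries all these properties, let us first take a look at it: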

struct kfifo {
    unsigned char *buffer;    /* the buffer holding the data */
    unsigned int size;    /* the size of the allocated buffer */
    unsigned int in;    /* data is added at offset (in % size) */
    unsigned int out;    /* data is extracted from off. (out % size) */
    spinlock_t *lock;    /* protects concurrent modifications */
};

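This is the kfifo data structure. kfifo provides two main operations, __kfifo_put (enqueue) and __kfifo_get (dequeue). Its data members are:

buffer: the storage that holds the data

size: the size of the buffer; it is rounded up to a power of 2 at initialization

lock: if the user cannot guarantee that there is at most one reader thread and one writer thread at any time, this lock must be used for synchronization.

in, out: together with buffer they form a circular queue. in marks where data is added (the tail of the queue) and out marks where data is removed (the head), as the diagram shows: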

+--------------------------------------------------------------+
|            |<----------data---------->|                      |
+--------------------------------------------------------------+
             ^                          ^                      ^
             |                          |                      |
            out                        in                     size

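Of course, the kernel developers handle the relationship between in, out and buffer with a better technique, which is analyzed in detail below.

kfifo functional description

kfifo offers the following external behavior:

Only one reader and one writer may operate concurrently without locking
Reads and writes never block; if there is not enough space or data, the call returns the number of bytes actually transferred

kfifo_alloc: allocating and initializing a kfifo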

struct kfifo *kfifo_alloc(unsigned int size, gfp_t gfp_mask, spinlock_t *lock)
{
    unsigned char *buffer;
    struct kfifo *ret;

    /*
     * round up to the next power of 2, since our 'let the indices
     * wrap' technique works only in this case.
     */
    if (size & (size - 1)) {
        BUG_ON(size > 0x80000000);
        size = roundup_pow_of_two(size);
    }

    buffer = kmalloc(size, gfp_mask);
    if (!buffer)
        return ERR_PTR(-ENOMEM);

    ret = kfifo_init(buffer, size, gfp_mask, lock);

    if (IS_ERR(ret))
        kfree(buffer);

    return ret;
}

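It is worth noting here that kfifo->size is always the caller's size argument rounded up to a power of 2, which is common practice in the kernel. The benefit is obvious: a modulo operation on kfifo->size can be turned into a bitwise AND, as follows:

kfifo->in % kfifo->size can be rewritten as kfifo->in & (kfifo->size - 1)

kfifo_alloc uses size & (size - 1) to test whether size is a power of 2. If the expression is non-zero, size is not a power of 2, and roundup_pow_of_two is called to round it up to one.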
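The check and the rounding can be sketched in user space as follows. The helper names `not_power_of_two` and `round_up_pow2` are hypothetical, and the loop is a simple stand-in for the kernel's roundup_pow_of_two, not its actual implementation. The sketch assumes 1 <= x <= 0x80000000; a larger value has no 32-bit power of 2 above it, which is the case the BUG_ON in kfifo_alloc guards against.

```c
#include <stdint.h>

/* Non-zero when x has more than one bit set, i.e. x is not a power of two. */
static int not_power_of_two(uint32_t x)
{
    return (x & (x - 1)) != 0;
}

/* Smallest power of two that is >= x. Assumes 1 <= x <= 0x80000000;
 * for larger x the shift would overflow and the loop would not terminate. */
static uint32_t round_up_pow2(uint32_t x)
{
    uint32_t p = 1;

    while (p < x)
        p <<= 1;
    return p;
}
```

A request for 1000 bytes is therefore turned into a 1024-byte buffer, while a request for 1024 bytes is left unchanged.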

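These are all common tricks; people just rarely combine them. __kfifo_put and __kfifo_get, analyzed next, exploit this property of kfifo->size to the fullest.

__kfifo_put and __kfifo_get: clever enqueue and dequeue

__kfifo_put is the enqueue operation: it first copies the data into the buffer and only then updates in. __kfifo_get is the dequeue operation: it first copies the data out of the buffer and only then updates out. Notice that in and out each have a single owner.

The code for __kfifo_put is shown first: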

unsigned int __kfifo_put(struct kfifo *fifo,
             unsigned char *buffer, unsigned int len)
{
    unsigned int l;

    len = min(len, fifo->size - fifo->in + fifo->out);

    /*
     * Ensure that we sample the fifo->out index -before- we
     * start putting bytes into the kfifo.
     */

    smp_mb();

    /* first put the data starting from fifo->in to buffer end */
    l = min(len, fifo->size - (fifo->in & (fifo->size - 1)));
    memcpy(fifo->buffer + (fifo->in & (fifo->size - 1)), buffer, l);

    /* then put the rest (if any) at the beginning of the buffer */
    memcpy(fifo->buffer, buffer + l, len - l);

    /*
     * Ensure that we add the bytes to the kfifo -before-
     * we update the fifo->in index.
     */

    smp_wmb();

    fifo->in += len;

    return len;
}

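Surprised? The code is completely linear, with no if-else branch to check whether there is enough room for the data. The kernel code here is very compact, without a single superfluous line.

l = min(len, fifo->size - (fifo->in & (fifo->size - 1)));

This expression computes how many bytes the first copy writes. In plain language:

l = the smaller of the number of bytes to write and the space between in and the end of the buffer

(len has already been clamped to the free space by the preceding min.) The min macro takes the place of an if-else branch.

__kfifo_get uses the same technique. Its code is: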

unsigned int __kfifo_get(struct kfifo *fifo,
             unsigned char *buffer, unsigned int len)
{
    unsigned int l;

    len = min(len, fifo->in - fifo->out);

    /*
     * Ensure that we sample the fifo->in index -before- we
     * start removing bytes from the kfifo.
     */

    smp_rmb();

    /* first get the data from fifo->out until the end of the buffer */
    l = min(len, fifo->size - (fifo->out & (fifo->size - 1)));
    memcpy(buffer, fifo->buffer + (fifo->out & (fifo->size - 1)), l);

    /* then get the rest (if any) from the beginning of the buffer */
    memcpy(buffer + l, fifo->buffer, len - l);

    /*
     * Ensure that we remove the bytes from the kfifo -before-
     * we update the fifo->out index.
     */

    smp_mb();

    fifo->out += len;

    return len;
}

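Read it carefully a couple of times. I have read it many times and found something new each time, because the relationship between in, out and size is so clever: it even exploits unsigned int wraparound.

It turns out that on every enqueue or dequeue, kfifo simply does kfifo->in += len or kfifo->out += len, without taking the result modulo kfifo->size. So kfifo->in and kfifo->out keep growing until they reach the maximum unsigned int value and wrap around to 0. Yet the following always holds:

kfifo->in - kfifo->out <= kfifo->size

This property still holds even after kfifo->in has wrapped around to 0.

For a given kfifo:

the length of the stored data is: kfifo->in - kfifo->out

and the remaining (writable) space is: kfifo->size - (kfifo->in - kfifo->out)

Although kfifo->in and kfifo->out keep growing past kfifo->size, the corresponding offsets into kfifo->buffer are:

kfifo->in % kfifo->size (i.e. kfifo->in & (kfifo->size - 1))

kfifo->out % kfifo->size (i.e. kfifo->out & (kfifo->size - 1))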
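The comment in kfifo_alloc says the index-wrapping technique only works when size is a power of 2. One way to see why: a 32-bit counter wraps at 2^32, and the buffer offset only stays continuous across that wrap if size divides 2^32. The sketch below checks this for a single step across the wrap; `offset_continuous_at_wrap` is a hypothetical helper written for this illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Does the buffer offset advance by exactly one slot when a 32-bit
 * counter steps from UINT32_MAX to 0? */
static bool offset_continuous_at_wrap(uint32_t size)
{
    uint32_t before = UINT32_MAX % size;                 /* offset just before the wrap */
    uint32_t after = (uint32_t)(UINT32_MAX + 1u) % size; /* counter is now 0 */

    return after == (before + 1) % size;
}
```

For size 8 the offset goes from 7 to 0, as it should. For size 6 the offset before the wrap is 3, so the next slot ought to be 4, but the wrapped counter maps to 0, and the reader and writer would lose track of the data.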

When a block of data is written into the kfifo, if the write position and the length satisfy:

kfifo->in % size + len > size

then the write has to be split, as shown below:

                                                    start of the kfifo_put (write) region
                                                    |
                                                   \_/
XXXXXXXX|                                           |XXXXXXXXXX
+--------------------------------------------------------------+
|                        |<----------data---------->|          |
+--------------------------------------------------------------+
                         ^                          ^          ^
                         |                          |          |
                       out%size                   in%size     size
        ^
        |
      end of the write region

The first chunk is: [kfifo->in % kfifo->size, kfifo->size)
The second chunk is: [0, len - (kfifo->size - kfifo->in % kfifo->size))
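The lengths of the two chunks follow directly from these intervals. A minimal sketch, with a hypothetical `split_write` helper and `struct split` result type, assuming size is a power of 2 and len has already been clamped to the free space:

```c
#include <stdint.h>

/* Lengths of the two copies that make up one write. */
struct split {
    uint32_t first;  /* bytes copied from in % size up to the end of the buffer */
    uint32_t second; /* bytes copied to the start of the buffer */
};

static struct split split_write(uint32_t in, uint32_t size, uint32_t len)
{
    uint32_t off = in & (size - 1); /* in % size */
    uint32_t to_end = size - off;   /* room before the end of the buffer */
    struct split s;

    s.first = len < to_end ? len : to_end;
    s.second = len - s.first;
    return s;
}
```

With size 16, in at 12 and 7 bytes to write, the first chunk is 4 bytes and the second is 3. With in at 2, everything fits in the first chunk and the second is empty, so the second memcpy copies nothing.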

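Here is the code; it is worth reading slowly:

/* first put the data starting from fifo->in to buffer end */
l = min(len, fifo->size - (fifo->in & (fifo->size - 1)));
memcpy(fifo->buffer + (fifo->in & (fifo->size - 1)), buffer, l);

/* then put the rest (if any) at the beginning of the buffer */
memcpy(fifo->buffer, buffer + l, len - l);

The __kfifo_get path is analogous; the reader is invited to work through it.

Lock-free concurrent kfifo_get and kfifo_put

Computer scientists have shown that when exactly one reader thread and one writer thread operate concurrently, thread safety can be guaranteed without any additional lock. kfifo uses this lock-free technique to improve concurrency in the kernel.

kfifo uses two indices, in and out, as the write and read cursors. A write updates only in, and a read updates only out, so the two never interfere with each other, as the diagram shows:

                                               |<--write->|
+--------------------------------------------------------------+
|                        |<----------data----->|               |
+--------------------------------------------------------------+
                         |<--read-->|
                         ^                     ^               ^
                         |                     |               |
                        out                   in              size

To keep the reader from seeing space that the writer intends to fill but has not yet actually filled, the writer must perform its writes in this order:

1. Write the data into the region [kfifo->in, kfifo->in + len)

2. Update kfifo->in to kfifo->in + len

When step 1 completes, the reader still cannot see the new data: kfifo->in has not changed, so the reader believes the writer has not started writing. Only after kfifo->in is updated does the data become visible to the reader.

How is step 1 guaranteed to complete before step 2? The secret is memory barriers: smp_mb(), smp_rmb() and smp_wmb() guarantee the order in which the other side observes the memory operations.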
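Outside the kernel, the same ordering can be expressed with C11 atomics instead of smp_wmb() and friends. The sketch below is an analogy under stated assumptions, not the kernel's code: `spsc_fifo`, `spsc_init`, `spsc_put`, `spsc_get` and the 16-byte `SPSC_SIZE` are hypothetical, and it is only safe with exactly one producer thread and one consumer thread.

```c
#include <stdatomic.h>
#include <stdint.h>
#include <string.h>

#define SPSC_SIZE 16u /* hypothetical capacity; must be a power of two */

struct spsc_fifo {
    unsigned char buf[SPSC_SIZE];
    _Atomic uint32_t in;  /* advanced only by the producer */
    _Atomic uint32_t out; /* advanced only by the consumer */
};

static void spsc_init(struct spsc_fifo *f)
{
    atomic_init(&f->in, 0);
    atomic_init(&f->out, 0);
}

/* Producer side: copy the data first, then publish it by advancing in. */
static uint32_t spsc_put(struct spsc_fifo *f, const unsigned char *src, uint32_t len)
{
    uint32_t in = atomic_load_explicit(&f->in, memory_order_relaxed);
    /* acquire: the consumer is done reading any space it has released */
    uint32_t out = atomic_load_explicit(&f->out, memory_order_acquire);
    uint32_t room = SPSC_SIZE - (in - out);
    uint32_t off = in & (SPSC_SIZE - 1);
    uint32_t l;

    if (len > room)
        len = room;
    l = len < SPSC_SIZE - off ? len : SPSC_SIZE - off;
    memcpy(f->buf + off, src, l);
    memcpy(f->buf, src + l, len - l);
    /* release: the bytes above become visible before the new in value */
    atomic_store_explicit(&f->in, in + len, memory_order_release);
    return len;
}

/* Consumer side: copy the data out first, then free the space by advancing out. */
static uint32_t spsc_get(struct spsc_fifo *f, unsigned char *dst, uint32_t len)
{
    uint32_t out = atomic_load_explicit(&f->out, memory_order_relaxed);
    /* acquire: pairs with the producer's release store to in */
    uint32_t in = atomic_load_explicit(&f->in, memory_order_acquire);
    uint32_t avail = in - out;
    uint32_t off = out & (SPSC_SIZE - 1);
    uint32_t l;

    if (len > avail)
        len = avail;
    l = len < SPSC_SIZE - off ? len : SPSC_SIZE - off;
    memcpy(dst, f->buf + off, l);
    memcpy(dst + l, f->buf, len - l);
    /* release: the reads above complete before the space is handed back */
    atomic_store_explicit(&f->out, out + len, memory_order_release);
    return len;
}
```

The release store to in plays the role of step 2 above: a consumer that observes the new in value through its acquire load is guaranteed to also observe the bytes written in step 1.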

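Summary

After reading the kfifo code, I was reminded of the line of verse: "I searched for her in the crowd a thousand times; suddenly turning my head, there she was, where the lantern light was dim." Perhaps, like me, you are always chasing code that is concise, high-quality and readable. Only after trying every approach and running out of ideas do we discover that the code in the Linux kernel is exactly what we should be seeking out and learning from.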
