Redis 6.0 Multi-Threaded I/O
Background
Redis has always run in single-threaded mode, where "single-threaded" refers to the network I/O and command-execution parts. Version 6.0, released this year, adds multiple threads to handle network I/O (read, write) and command parsing.
Pros and cons of the single-threaded model
These are probably familiar to most readers, so here is just a brief overview.
Pros:
- Operations are pure in-memory work, so the CPU is not the bottleneck; running multiple processes is also an easy way to use multiple CPUs
- No multi-thread synchronization to worry about, which keeps development simple
- Command execution is naturally atomic
- I/O multiplexing handles large numbers of connections while avoiding the cost of thread context switches
Cons:
- A time-consuming operation blocks everything behind it
- A single instance cannot fully utilize a multi-core CPU (read/write still needs the CPU to copy data between kernel space and user space)
The Redis network I/O model in brief
Redis uses I/O multiplexing to manage its many network connections, and the code is structured in the Reactor pattern.
The main thread is an event loop.
A quick look at the source:
/* State of an event based program */
typedef struct aeEventLoop {
    int maxfd;   /* highest file descriptor currently registered */
    int setsize; /* max number of file descriptors tracked */
    long long timeEventNextId;
    time_t lastTime;     /* Used to detect system clock skew */
    aeFileEvent *events; /* Registered events */
    aeFiredEvent *fired; /* Fired events */
    aeTimeEvent *timeEventHead;
    int stop;
    void *apidata; /* This is used for polling API specific data */
    aeBeforeSleepProc *beforesleep;
    aeBeforeSleepProc *aftersleep;
    int flags;
} aeEventLoop;

struct redisServer {
    aeEventLoop *el;
    /* ... many other fields omitted ... */
};
The `el` field holds the event-loop state. Inside it, `void *apidata;` stores data specific to the I/O multiplexing API: Redis wraps several different multiplexing facilities (`select`, `epoll`, `kqueue`, and so on) and picks one at compile time according to the platform.
ae.c:
/* Include the best multiplexing layer supported by this system.
 * The following should be ordered by performances, descending. */
#ifdef HAVE_EVPORT
#include "ae_evport.c"
#else
    #ifdef HAVE_EPOLL
    #include "ae_epoll.c"
    #else
        #ifdef HAVE_KQUEUE
        #include "ae_kqueue.c"
        #else
        #include "ae_select.c"
        #endif
    #endif
#endif
FileEvent
A `FileEvent` is simply a network I/O event: read and write handler functions are bound to an fd, and when I/O multiplexing reports the fd as ready, the bound handler is invoked.
/* File event structure */
typedef struct aeFileEvent {
int mask; /* one of AE_(READABLE|WRITABLE|BARRIER) */
    aeFileProc *rfileProc; /* handler invoked when the fd is readable */
    aeFileProc *wfileProc; /* handler invoked when the fd is writable */
    void *clientData;      /* opaque pointer passed through to the handlers */
} aeFileEvent;
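To make the binding concrete, here is a minimal sketch against the real ae.h API (aeCreateEventLoop, aeCreateFileEvent, aeMain); the handler body and the createListeningSocket helper are made up for illustration:

#include "ae.h"
#include <unistd.h>

/* Hypothetical read handler: the event loop invokes it when fd is readable. */
void myReadHandler(aeEventLoop *el, int fd, void *clientData, int mask) {
    char buf[1024];
    ssize_t n = read(fd, buf, sizeof(buf));
    if (n <= 0) {                               /* error or peer closed */
        aeDeleteFileEvent(el, fd, AE_READABLE); /* stop watching the fd */
        close(fd);
        return;
    }
    /* ... parse the request and queue a reply ... */
}

int main(void) {
    aeEventLoop *el = aeCreateEventLoop(1024); /* setsize: max fds tracked */
    int fd = createListeningSocket();          /* hypothetical helper */
    /* Stores myReadHandler in events[fd].rfileProc and registers fd
     * with the selected multiplexing API. */
    aeCreateFileEvent(el, fd, AE_READABLE, myReadHandler, NULL);
    aeMain(el);                                /* the main-thread event loop */
    return 0;
}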
TimeEvent
A `TimeEvent` is a timer-task event. Each timer task is bound to a handler function, and the timing itself is implemented by cleverly reusing the blocking-timeout argument of the I/O multiplexing call: if the soonest pending timer is due in 100ms, the `select` timeout is set to 100ms, which yields a not-especially-precise timer. (The soonest timer is found with a linear scan, O(n); a structure such as a skip list would bring this down to O(log n), but presumably the author judged that there would never be many timer events, so it was left unoptimized.)
/* Time event structure */
typedef struct aeTimeEvent {
long long id; /* time event identifier. */
long when_sec; /* seconds */
long when_ms; /* milliseconds */
aeTimeProc *timeProc;
aeEventFinalizerProc *finalizerProc;
void *clientData;
struct aeTimeEvent *prev;
struct aeTimeEvent *next;
int refcount; /* refcount to prevent timer events from being
* freed in recursive time event calls. */
} aeTimeEvent;
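A simplified sketch of that mechanism, modeled on aeSearchNearestTimer and aeProcessEvents in ae.c (flags and edge cases omitted):

/* Linear scan for the soonest timer: the O(n) trade-off discussed above. */
static aeTimeEvent *searchNearestTimer(aeEventLoop *el) {
    aeTimeEvent *te = el->timeEventHead, *nearest = NULL;
    while (te) {
        if (!nearest || te->when_sec < nearest->when_sec ||
            (te->when_sec == nearest->when_sec &&
             te->when_ms < nearest->when_ms))
            nearest = te;
        te = te->next;
    }
    return nearest;
}

/* The delay until the soonest timer becomes the poll timeout, so the
 * multiplexing call wakes up roughly when the timer is due. */
static void pollWithTimerTimeout(aeEventLoop *el) {
    struct timeval tv, *tvp = NULL;
    aeTimeEvent *te = searchNearestTimer(el);
    if (te) {
        long now_sec, now_ms;
        aeGetTime(&now_sec, &now_ms);   /* helper from ae.c */
        long long ms = (te->when_sec - now_sec) * 1000
                     + (te->when_ms - now_ms);
        if (ms < 0) ms = 0;             /* already due: don't block */
        tv.tv_sec = ms / 1000;
        tv.tv_usec = (ms % 1000) * 1000;
        tvp = &tv;
    }
    aeApiPoll(el, tvp); /* NULL timeout = block until an fd is ready */
}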
Multi-threading in version 6.0
I/O threading configuration
First, the multi-threading parameters and their documentation in the 6.0 configuration file:
################################ THREADED I/O #################################
# Redis is mostly single threaded, however there are certain threaded
# operations such as UNLINK, slow I/O accesses and other things that are
# performed on side threads.
#
# Now it is also possible to handle Redis clients socket reads and writes
# in different I/O threads. Since especially writing is so slow, normally
# Redis users use pipelining in order to speed up the Redis performances per
# core, and spawn multiple instances in order to scale more. Using I/O
# threads it is possible to easily speedup two times Redis without resorting
# to pipelining nor sharding of the instance.
#
# By default threading is disabled, we suggest enabling it only in machines
# that have at least 4 or more cores, leaving at least one spare core.
# Using more than 8 threads is unlikely to help much. We also recommend using
# threaded I/O only if you actually have performance problems, with Redis
# instances being able to use a quite big percentage of CPU time, otherwise
# there is no point in using this feature.
#
# So for instance if you have a four cores boxes, try to use 2 or 3 I/O
# threads, if you have a 8 cores, try to use 6 threads. In order to
# enable I/O threads use the following configuration directive:
#
# io-threads 4
#
# Setting io-threads to 1 will just use the main thread as usual.
# When I/O threads are enabled, we only use threads for writes, that is
# to thread the write(2) syscall and transfer the client buffers to the
# socket. However it is also possible to enable threading of reads and
# protocol parsing using the following configuration directive, by setting
# it to yes:
#
# io-threads-do-reads no
#
# Usually threading reads doesn't help much.
#
# NOTE 1: This configuration directive cannot be changed at runtime via
# CONFIG SET. Also, this feature currently does not work when SSL is
# enabled.
#
# NOTE 2: If you want to test the Redis speedup using redis-benchmark, make
# sure you also run the benchmark itself in threaded mode, using the
# --threads option to match the number of Redis threads, otherwise you'll not
# be able to notice the improvements.
The points to take away from this:
- Threading is disabled by default
- The I/O threads are used for the read and write calls
- Redis can be sped up roughly 2x without running multiple instances
- The io-threads directive sets how many I/O threads there are
- If io-threads is 1 there is only the main thread; if it is 2 one extra I/O thread is started, and so on
- By default only write uses the I/O threads; io-threads-do-reads controls whether read is threaded as well (a minimal redis.conf sketch follows this list)
- Threading reads doesn't help all that much
- This configuration is not yet supported in SSL mode
- When benchmarking a multi-threaded Redis, run redis-benchmark with its own --threads option enabled too
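Putting those points together, enabling threaded reads and writes is just two directives in redis.conf; here is a minimal sketch for a machine with at least 4 cores, per the guidance above:

io-threads 4
io-threads-do-reads yes

Note that, per the config documentation quoted above, neither directive can be changed at runtime via CONFIG SET, so a restart is needed to adjust them.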
A look at the source
The core I/O-thread code lives in networking.c:
Key global variables
pthread_t io_threads[IO_THREADS_MAX_NUM];
pthread_mutex_t io_threads_mutex[IO_THREADS_MAX_NUM];
_Atomic unsigned long io_threads_pending[IO_THREADS_MAX_NUM];
int io_threads_op; /* IO_THREADS_OP_WRITE or IO_THREADS_OP_READ. */
/* This is the list of clients each thread will serve when threaded I/O is
* used. We spawn io_threads_num-1 threads, since one is the main thread
* itself. */
list *io_threads_list[IO_THREADS_MAX_NUM];
- io_threads: the pthread handles of the I/O threads
- io_threads_mutex: mutexes used by the main thread to stop and resume the I/O threads
- io_threads_pending: atomic counters for synchronizing with the main thread; a non-zero io_threads_pending[i] means thread i has that many clients pending for read/write
- io_threads_op: whether the current batch of operations is a read or a write
- io_threads_list: one client queue per thread; each thread walks its own list and serves the clients on it in turn (see the initialization sketch after this list)
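These pieces are wired together at startup; here is a simplified sketch of initThreadedIO() from networking.c, with assertions and error handling dropped:

/* Spawn io_threads_num - 1 threads; slot 0 belongs to the main thread. */
void initThreadedIO(void) {
    server.io_threads_active = 0;           /* threads start out parked */
    if (server.io_threads_num == 1) return; /* pure single-threaded mode */
    for (int i = 0; i < server.io_threads_num; i++) {
        io_threads_list[i] = listCreate();  /* every slot gets a client queue */
        if (i == 0) continue;               /* slot 0: the main thread itself */
        pthread_t tid;
        pthread_mutex_init(&io_threads_mutex[i], NULL);
        io_threads_pending[i] = 0;
        /* Hold the mutex so the new thread blocks inside IOThreadMain
         * until the main thread decides to start it. */
        pthread_mutex_lock(&io_threads_mutex[i]);
        pthread_create(&tid, NULL, IOThreadMain, (void *)(long)i);
        io_threads[i] = tid;
    }
}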
Key code logic
Before each event-loop iteration (in beforeSleep), handleClientsWithPendingReadsUsingThreads and handleClientsWithPendingWritesUsingThreads are called to wake the I/O threads for reads and writes respectively.
In handleClientsWithPendingReadsUsingThreads you can see that the ready clients are distributed evenly across the N I/O threads:
/* Distribute the clients across N different lists. */
listIter li;
listNode *ln;
listRewind(server.clients_pending_read,&li);
int item_id = 0;
while((ln = listNext(&li))) {
client *c = listNodeValue(ln);
int target_id = item_id % server.io_threads_num;
listAddNodeTail(io_threads_list[target_id],c);
item_id++;
}
The I/O threads are then woken by setting the io_threads_pending counters. With io-threads=4, io-threads - 1 = 3 extra threads are started, because the main thread itself also acts as an I/O thread: it serves the clients in io_threads_list[0].
/* Give the start condition to the waiting threads, by setting the
* start condition atomic var. */
io_threads_op = IO_THREADS_OP_READ;
for (int j = 1; j < server.io_threads_num; j++) {
int count = listLength(io_threads_list[j]);
io_threads_pending[j] = count;
}
/* Also use the main thread to process a slice of clients. */
listRewind(io_threads_list[0],&li);
while((ln = listNext(&li))) {
client *c = listNodeValue(ln);
readQueryFromClient(c->conn);
}
listEmpty(io_threads_list[0]);
Once the main thread has finished its own slice of I/O, it spins in a loop waiting for the other I/O threads to finish their reads, and only then moves on to executing commands. At that point the data has been read and the commands parsed inside the I/O threads; execution itself still happens only on the main thread, which preserves command atomicity.
/* Wait for all the other threads to end their work. */
while(1) {
unsigned long pending = 0;
for (int j = 1; j < server.io_threads_num; j++)
pending += io_threads_pending[j];
if (pending == 0) break;
}
if (tio_debug) printf("I/O READ All threads finshed\n");
/* Run the list of clients again to process the new buffers. */
while(listLength(server.clients_pending_read)) {
ln = listFirst(server.clients_pending_read);
client *c = listNodeValue(ln);
c->flags &= ~CLIENT_PENDING_READ;
listDelNode(server.clients_pending_read,ln);
if (c->flags & CLIENT_PENDING_COMMAND) {
c->flags &= ~CLIENT_PENDING_COMMAND;
if (processCommandAndResetClient(c) == C_ERR) {
/* If the client is no longer valid, we avoid
* processing the client later. So we just go
* to the next. */
continue;
}
}
processInputBuffer(c);
}
The I/O threads' run loop is in IOThreadMain:
Each thread spins waiting for its io_threads_pending slot to become non-zero. Spinning indefinitely would peg a CPU core, so there is also the io_threads_mutex mutex, which the main thread can use to park an idle I/O thread by leaving it blocked in pthread_mutex_lock.
/* Wait for start */
for (int j = 0; j < 1000000; j++) {
if (io_threads_pending[id] != 0) break;
}
/* Give the main thread a chance to stop this thread. */
if (io_threads_pending[id] == 0) {
pthread_mutex_lock(&io_threads_mutex[id]);
pthread_mutex_unlock(&io_threads_mutex[id]);
continue;
}
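The main-thread side of this handshake is startThreadedIO()/stopThreadedIO(), simplified here from networking.c: a parked thread sits blocked on its mutex, and the main thread releases or re-takes the mutexes to start or stop the pool:

/* Unlock each thread's mutex: parked threads fall through the
 * pthread_mutex_lock above and resume their loop. */
void startThreadedIO(void) {
    for (int j = 1; j < server.io_threads_num; j++)
        pthread_mutex_unlock(&io_threads_mutex[j]);
    server.io_threads_active = 1;
}

/* Re-take each mutex: the next time a thread sees no pending work it
 * blocks in pthread_mutex_lock until startThreadedIO() runs again. */
void stopThreadedIO(void) {
    for (int j = 1; j < server.io_threads_num; j++)
        pthread_mutex_lock(&io_threads_mutex[j]);
    server.io_threads_active = 0;
}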
Next, io_threads_op tells the thread whether the batch is a read or a write, and it calls read or write accordingly:
/* Process: note that the main thread will never touch our list
* before we drop the pending count to 0. */
listIter li;
listNode *ln;
listRewind(io_threads_list[id],&li);
while((ln = listNext(&li))) {
client *c = listNodeValue(ln);
if (io_threads_op == IO_THREADS_OP_WRITE) {
writeToClient(c,0);
} else if (io_threads_op == IO_THREADS_OP_READ) {
readQueryFromClient(c->conn);
} else {
serverPanic("io_threads_op value is unknown");
}
}
listEmpty(io_threads_list[id]);
io_threads_pending[id] = 0;
Finally, let's look at the server's threads. With io-threads=4, there are 3 extra I/O threads:
top -Hp 17339
Flow of the multi-threaded I/O mode
Performance testing
The tests use redis-benchmark, the benchmarking tool that ships with Redis.
Comparison chart
Detailed data
Redis 6.0.9, with 4 I/O threads.
Multi-threaded: ./redis-benchmark -c 1000 -n 1000000 --threads 4 --csv
Single-threaded: ./redis-benchmark -c 1000 -n 1000000 --csv
Table:
cmd | 4 threads, io-threads-do-reads yes (req/s) | 4 threads, io-threads-do-reads no (req/s) | 1 thread (req/s) |
---|---|---|---|
PING_INLINE | 472589.81 | 363240.09 | 215610.17 |
PING_BULK | 515198.34 | 423908.44 | 213766.56 |
SET | 442673.75 | 372162.25 | 213401.62 |
GET | 476644.41 | 400320.28 | 212901.84 |
INCR | 460829.47 | 389559.81 | 214408.23 |
LPUSH | 399520.56 | 346500.34 | 220896.84 |
RPUSH | 430292.62 | 358680.03 | 217391.31 |
LPOP | 404203.72 | 344946.53 | 222024.86 |
RPOP | 399680.25 | 333111.25 | 215517.25 |
SADD | 450856.66 | 363372.09 | 216590.86 |
HSET | 399680.25 | 333111.25 | 217344.06 |
SPOP | 486854.94 | 405350.62 | 213401.62 |
ZADD | 415627.62 | 333222.28 | 217912.39 |
ZPOPMIN | 444049.72 | 402900.88 | 216122.77 |
LPUSH (needed to benchmark LRANGE) | 410677.62 | 342114.25 | 218914.19 |
LRANGE_100 (first 100 elements) | 113869.28 | 110168.56 | 75483.09 |
LRANGE_300 (first 300 elements) | 45687.13 | 44081.99 | 27139.99 |
LRANGE_500 (first 450 elements) | 31991.81 | 31406.05 | 20085.56 |
LRANGE_600 (first 600 elements) | 24688.31 | 23973.34 | 15635.75 |
MSET (10 keys) | 226244.34 | 200240.30 | 175500.17 |
Summary
Since 6.0, Redis adds multi-threading for network I/O. The I/O threads are responsible only for read, command parsing, and write; command execution stays on the main thread and remains atomic.
With four I/O threads, GET and SET throughput relative to single-threaded mode is about 2x with both write and read threading enabled, and about 1.68x with only write threading enabled.
To close, something to ponder: why does multi-threading the read path bring such a comparatively modest improvement?