[Erlang 0119] A Guide to Reading the Erlang OTP Source Code
阿新 • Published: 2018-12-29
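Last week the Erlang discussion group got onto how ++ on lists is implemented. Most of the arguments were guesswork, when opening the code would have settled everything. After I posted a screenshot of the code, someone asked where I had found it.
"Where do I find the code?" As road maps for reading the Erlang source go, only one tattered fragment is still in circulation. I think questions like "where is the code?" come from information asymmetry and are not hard in themselves. It is like the scene in Slumdog Millionaire: Jamal knows every odd detail of street life but cannot say what is written on the national emblem. Let us start this exploration from that scene.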
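INT. STUDIO, NIGHT
Prem: This question is worth four thousand rupees... India's national emblem shows three lions. What is written beneath them? Is it...
A. Truth alone triumphs B. Lies alone triumph C. Fashion alone triumphs D. Money alone triumphs [Prem looks at the audience with feigned puzzlement, drawing laughs.] Prem: Which do you think it is, Jamal? It is the most famous phrase in our country's history. Perhaps you would like to phone a friend?
[The audience roars with laughter. A bead of sweat runs down Jamal's forehead. Prem is enjoying Jamal's discomfort.] Prem: Or ask the audience? My gut tells me they may know the answer. What do you want to do?
Jamal: Yes.
Prem (surprised): Yes what?
Jamal: Ask the audience.
[Prem whistles and looks up at the audience.]
Prem: Well then, ladies and gentlemen, please help him out. Press your keypads now.
[The lights dim. Tense music starts.]
INT. INSPECTOR'S OFFICE, DAY
[The Inspector presses pause and sighs.]
Inspector: Jamal, my five-year-old daughter knows the answer, and you do not. Isn't that odd for a genius millionaire? What happened? Did your accomplice step out for a piss? Or did he not cough loudly enough?
[Silence. Constable Srinivas kicks Jamal's chair.]
Constable Srinivas: The Inspector asked you a question.
Jamal: How much is a bhel puri at Jeevan's stall on Chowpatty Beach?
Inspector: What?
Jamal: One bhel puri. How much?
Constable Srinivas (unable to help himself): Ten rupees.
Jamal: Wrong. Fifteen since Diwali. Last Thursday, who stole Constable Varma's bicycle outside Dadar station?
Inspector (amused): You know who stole it?
Jamal: Everyone in Juhu knows. Even a five-year-old knows.

Back to business: we start with downloading the code...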
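Downloading the source
A special note for anyone who chose the Windows installer: the lib directory contains the source and ebin of each library, such as kernel and stdlib, but the ERTS directory does not contain the corresponding source. Download a copy yourself, or browse it online at https://github.com/erlang/otp/tree/maint/erts

Tools for reading the source
The Erlang OTP source is sizeable, and good tools save a lot of work: search within a folder or project helps, and being able to jump between definitions removes even more friction. On Windows, a tool like Everything is excellent for locating files, and Visual Studio is a really pleasant way to read C code. Of course, if you prefer regular expressions in a plain text editor, that works too. Below is a screenshot of the code in VS:
[Screenshot: the source open in Visual Studio]

Overview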
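Broadly, the otp_src code is organized as in the figure below (you can see it just by opening the folder, so this is hardly a thirty-thousand-foot view).
[Figure: top-level layout of otp_src]
The parts most relevant to the code we write every day are ERTS and lib. ERTS (Erlang Run-Time System) contains the code of the Erlang runtime system and is Erlang's infrastructure. lib contains the implementations of all the surrounding libraries. Some of the placement is counterintuitive, though you get used to it. For example, file.erl is in kernel, not stdlib. Surely gen_server and gen_fsm are implemented in kernel? Wrong, their code is under stdlib. And yet application.erl is in kernel.

Kernel
Look at the kernel directory and it may be hard to make sense of. A running Erlang system always has a kernel application running; start appmon and you can see, live, the modules that kernel involves. From that we can roughly infer the designers' rule: kernel covers application management, code lifecycle management, IO (file IO, network IO, io_request), HIPE, the distribution infrastructure and so on. See the mind map below:
[Mind map: kernel modules grouped by responsibility]
This grouping is only my personal view. For easier reference I have converted the mind map to text:

Kernel
    Kernel APP
        kernel.erl kernel_config.erl kernel.appup.src kernel.app.src
    Application management
        application_controller.erl application_starter.erl application_master.hrl application_master.erl application.erl heart.erl
    HIPE
        hipe_ext_format.hrl hipe_unified_loader.erl
    Debugging & logging
        Logging
            disk_log.erl disk_log_sup.erl disk_log_server.erl disk_log_1.erl disk_log.hrl error_logger.erl wrap_log_reader.erl
        Debugging
            error_handler.erl erts_debug.erl standard_error.erl seq_trace.erl
    IO
        File IO
            file.erl file_server.erl file_io_server.erl ram_file.erl
        Network IO
            gen_sctp.erl gen_udp.erl gen_tcp.erl inet.erl inet_config.hrl inet_config.erl inet_boot.hrl inet6_udp.erl inet6_tcp_dist.erl inet6_tcp.erl inet6_sctp.erl inet_db.erl inet_dns.hrl inet_dns.erl inet_gethost_native.erl inet_udp.erl inet_tcp_dist.erl inet_tcp.erl inet_sctp.erl inet_res.hrl inet_res.erl inet_parse.erl inet_int.hrl inet_hosts.erl inet_dns_record_adts.pl erl_reply.erl net_kernel.erl net_adm.erl net.erl
        IO Request
            user_drv.erl user.erl user_sup.erl group.erl
    Code lifecycle management
        code.erl code_server.erl erl_boot_server.erl erl_ddll.erl
    Distribution management
        dist_util.erl dist_ac.erl (Distributed Applications Controller) erl_distribution.erl erl_epmd.erl rpc.erl pg2.erl global_search.erl global_group.erl global.erl auth.erl
    OS
        os.erl

stdlib
Compared with kernel, stdlib lives up to its name and holds the vast majority of the functional modules, such as lists, ets and the various data structure implementations. Most importantly, it contains OTP's gen_server, gen_fsm, gen_event and supervisor, along with the unsung heroes proc_lib and sys. If you do not mind, here is a slightly outdated document, the notes I made on the documentation when I was first learning Erlang: [Erlang STDLIB 中文註釋版]
Special mention goes to shell and shell_default. Anyone curious about the Erlang shell will find answers here, and the so-called "spooky problems in EShell" get a reasonable explanation.
The other modules have such clear-cut purposes that they are easy to locate, for example xmerl for XML and the mnesia database. With a little help from Google there is hardly any obstacle.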
Dive into ERTS
Atom and bifs
atom.names enumerates the atoms used by ERTS; it is well worth studying for the idioms.
bif.tab is the list of BIFs. Note: Use "ubif" for guard BIFs and operators; use "bif" for ordinary BIFs.

Basic Type

    /*
    ** Data types:
    **
    ** Eterm: A tagged erlang term (possibly 64 bits)
    ** BeamInstr: A beam code instruction unit, possibly larger than Eterm, not smaller.
    ** UInt:  An unsigned integer exactly as large as an Eterm.
    ** SInt:  A signed integer exactly as large as an eterm and therefor large
    **        enough to hold the return value of the signed_val() macro.
    ** UWord: An unsigned integer at least as large as a void * and also as large
    **          or larger than an Eterm
    ** SWord: A signed integer at least as large as a void * and also as large
    **          or larger than an Eterm
    ** Uint32: An unsigned integer of 32 bits exactly
    ** Sint32: A signed integer of 32 bits exactly
    ** Uint16: An unsigned integer of 16 bits exactly
    ** Sint16: A signed integer of 16 bits exactly.
    */

Here we can also see the internal representation of some of the more complex data structures.

Two examples
Let us look at two examples. The first: how is append in lists implemented? It is easy to find in lists.erl:

    append(L1, L2) -> L1 ++ L2.

So append simply uses ++. Where, then, is ++ implemented? One of the more interesting spots in its C implementation is these two lines:

    copy = last = CONS(hp, CAR(list_val(list)), make_list(hp + 2));
    list = CDR(list_val(list));

Someone asked: don't CAR, CDR and CONS look familiar? Exactly right. They are the three basic primitives of Lisp list manipulation: take the head of a list, take the rest of the list after the head, and construct a list cell. Jump to their implementation, in erl_term.h:

    #define CONS(hp, car, cdr) \
            (CAR(hp)=(car), CDR(hp)=(cdr), make_list(hp))

    #define CAR(x)  ((x)[0])
    #define CDR(x)  ((x)[1])

The second example: what does the definition of a process look like? First find the definition of Process in erl_process.h:

    typedef struct process Process;

Then go to the definition of struct process:

    struct process {
        ErtsPTabElementCommon common; /* *Need* to be first in struct */

        /* All fields in the PCB that differs between different heap
         * architectures, have been moved to the end of this struct to
         * make sure that as few offsets as possible differ. Different
         * offsets between memory architectures in this struct, means that
         * native code have to use functions instead of constants.
         */

        Eterm* htop;                /* Heap top */
        Eterm* stop;                /* Stack top */
        Eterm* heap;                /* Heap start */
        Eterm* hend;                /* Heap end */
        Uint heap_sz;               /* Size of heap in words */
        Uint min_heap_size;         /* Minimum size of heap (in words). */
        Uint min_vheap_size;        /* Minimum size of virtual heap (in words). */

    #if !defined(NO_FPE_SIGNALS) || defined(HIPE)
        volatile unsigned long fp_exception;
    #endif

    #ifdef HIPE
        /* HiPE-specific process fields. Put it early in struct process,
           to enable smaller & faster addressing modes on the x86. */
        struct hipe_process_state hipe;
    #endif

        /*
         * Saved x registers.
         */
        Uint arity;                 /* Number of live argument registers (only valid
                                     * when process is *not* running).
                                     */
        Eterm* arg_reg;             /* Pointer to argument registers. */
        unsigned max_arg_reg;       /* Maximum number of argument registers available. */
        Eterm def_arg_reg[6];       /* Default array for argument registers. */

        BeamInstr* cp;              /* (untagged) Continuation pointer (for threaded code). */
        BeamInstr* i;               /* Program counter for threaded code. */
        Sint catches;               /* Number of catches on stack */
        Sint fcalls;                /*
                                     * Number of reductions left to execute.
                                     * Only valid for the current process.
                                     */
        Uint32 rcount;              /* suspend count */
        int schedule_count;         /* Times left to reschedule a low prio process */
        Uint reds;                  /* No of reductions for this process */
        Eterm group_leader;         /* Pid in charge (can be boxed) */
        Uint flags;                 /* Trap exit, etc (no trace flags anymore) */
        Eterm fvalue;               /* Exit & Throw value (failure reason) */
        Uint freason;               /* Reason for detected failure */
        Eterm ftrace;               /* Latest exception stack trace dump */

        Process *next;              /* Pointer to next process in run queue */

        struct ErtsNodesMonitor_ *nodes_monitors;

        ErtsSuspendMonitor *suspend_monitors; /* Processes suspended by
                                                 this process via
                                                 erlang:suspend_process/1 */

        ErlMessageQueue msg;        /* Message queue */

        union {
            ErtsBifTimer *bif_timers;   /* Bif timers aiming at this process */
            void *terminate;
        } u;

        ProcDict *dictionary;       /* Process dictionary, may be NULL */

        Uint seq_trace_clock;
        Uint seq_trace_lastcnt;
        Eterm seq_trace_token;      /* Sequential trace token (tuple size 5 see below) */

    #ifdef USE_VM_PROBES
        Eterm dt_utag;              /* Place to store the dynamc trace user tag */
        Uint dt_utag_flags;         /* flag field for the dt_utag */
    #endif
        BeamInstr initial[3];       /* Initial module(0), function(1), arity(2), often used instead
                                       of pointer to funcinfo instruction, hence the BeamInstr datatype */
        BeamInstr* current;         /* Current Erlang function, part of the funcinfo:
                                     * module(0), function(1), arity(2)
                                     * (module and functions are tagged atoms;
                                     * arity an untagged integer). BeamInstr * because it references code
                                     */

        /*
         * Information mainly for post-mortem use (erl crash dump).
         */
        Eterm parent;               /* Pid of process that created this process. */
        erts_approx_time_t approx_started; /* Time when started. */

        /* This is the place, where all fields that differs between memory
         * architectures, have gone to.
         */

        Eterm *high_water;
        Eterm *old_hend;            /* Heap pointers for generational GC. */
        Eterm *old_htop;
        Eterm *old_heap;
        Uint16 gen_gcs;             /* Number of (minor) generational GCs. */
        Uint16 max_gen_gcs;         /* Max minor gen GCs before fullsweep. */
        ErlOffHeap off_heap;        /* Off-heap data updated by copy_struct(). */
        ErlHeapFragment* mbuf;      /* Pointer to message buffer list */
        Uint mbuf_sz;               /* Size of all message buffers */
        ErtsPSD *psd;               /* Rarely used process specific data */

        Uint64 bin_vheap_sz;        /* Virtual heap block size for binaries */
        Uint64 bin_vheap_mature;    /* Virtual heap block size for binaries */
        Uint64 bin_old_vheap_sz;    /* Virtual old heap block size for binaries */
        Uint64 bin_old_vheap;       /* Virtual old heap size for binaries */

        ErtsProcSysTaskQs *sys_task_qs;

        erts_smp_atomic32_t state;  /* Process state flags (see ERTS_PSFLG_*) */

    #ifdef ERTS_SMP
        ErlMessageInQueue msg_inq;
        ErtsPendExit pending_exit;
        erts_proc_lock_t lock;
        ErtsSchedulerData *scheduler_data;
        Eterm suspendee;
        ErtsPendingSuspend *pending_suspenders;
        erts_smp_atomic_t run_queue;
    #ifdef HIPE
        struct hipe_process_state_smp hipe_smp;
    #endif
    #endif

    #ifdef CHECK_FOR_HOLES
        Eterm* last_htop;           /* No need to scan the heap below this point. */
        ErlHeapFragment* last_mbuf; /* No need to scan beyond this mbuf. */
    #endif

    #ifdef DEBUG
        Eterm* last_old_htop;       /*
                                     * No need to scan the old heap below this point
                                     * when looking for invalid pointers into the new heap or
                                     * heap fragments.
                                     */
    #endif

    #ifdef FORCE_HEAP_FRAGS
        Uint space_verified;        /* Avoid HAlloc forcing heap fragments when */
        Eterm* space_verified_from; /* we rely on available heap space (TestHeap) */
    #endif
    };

Zhuangzi said: "My life has a limit, but knowledge has none. To pursue the limitless with the limited is perilous!" So take what you need. That is all for today; cherish the journey as you go.
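
A footnote to the first example, for anyone who wants to play with the cons-cell idea outside the VM. The sketch below is a toy model, not the ERTS code: it reuses the shape of the CONS, CAR and CDR macros quoted above, but `Term`, `NIL`, `list_length` and `append` are names made up for illustration, and `make_list` / `list_val` are reduced to plain casts where the real macros presumably add and strip tag bits. It assumes the caller has already reserved two words per cell of the left-hand list.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uintptr_t Term;              /* stand-in for Eterm; no real tagging here */
#define NIL ((Term)0)                /* hypothetical empty-list marker */

/* Simplified stand-ins: plain casts instead of tag handling. */
#define make_list(p) ((Term)(p))
#define list_val(t)  ((Term *)(t))

/* Same shape as the macros quoted from erl_term.h. */
#define CAR(x) ((x)[0])
#define CDR(x) ((x)[1])
#define CONS(hp, car, cdr) (CAR(hp) = (car), CDR(hp) = (cdr), make_list(hp))

/* Count the cells of a proper list; 2 * length words are needed to copy it. */
size_t list_length(Term list) {
    size_t n = 0;
    while (list != NIL) {
        n++;
        list = CDR(list_val(list));
    }
    return n;
}

/* Copy every cell of lhs into hp, then point the last copy at rhs.
 * rhs itself is shared with the result, not copied. */
Term append(Term *hp, Term lhs, Term rhs) {
    assert(hp != NULL);
    if (lhs == NIL) {
        return rhs;
    }
    Term result = make_list(hp);
    Term *last = hp;
    while (lhs != NIL) {
        last = hp;
        /* Link each new cell to the slot the next copy will occupy. */
        (void)CONS(hp, CAR(list_val(lhs)), make_list(hp + 2));
        hp += 2;
        lhs = CDR(list_val(lhs));
    }
    CDR(last) = rhs;                 /* patch the final cell to continue into rhs */
    return result;
}
```

To try it, build [1,2] and [3,4] out of small `Term` arrays with CONS, then call `append` with a four-word buffer: walking the result visits 1, 2, 3, 4, and the last two cells are the very same cells as the right-hand list. In this model the work grows with the length of the left operand only, which is one way to read the two lines quoted in the first example.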