LLVM Essentials (Packt, 2016) reading notes: the TableGen coverage is not thorough, and I still want to know how the back-end optimization passes are actually written
阿新 • Published: 2019-02-20
Playing with LLVM
- Register variables (%var), stack variables (alloca, %1 ...)
- .c --> .bc: $ clang -emit-llvm -c main.c
- .bc --> .s: $ llc output.bc -o output.s
- .ll --> .bc: $ llvm-as add.ll -o add.bc
- opt
- -analyze option: basicaa, da, instcount, loops, scalar-evolution
Building LLVM IR
static LLVMContext &Context = getGlobalContext(); // getGlobalContext() was removed in LLVM 3.9; newer code creates its own LLVMContext
static Module *ModuleOb = new Module("my compiler", Context);
FunctionType *funcType = llvm::FunctionType::get(Builder.getInt32Ty(), false); // note that "type" is abbreviated to Ty here
Function *fooFunc = llvm::Function::Create(funcType, llvm::Function::ExternalLinkage, Name, ModuleOb);
"External linkage" here effectively means the symbol is exported, i.e. visible outside the module.
BasicBlock *bb = BasicBlock::Create(Context, Name, fooFunc);
Global variables:
ModuleOb->getOrInsertGlobal(Name, Builder.getInt32Ty());
GlobalVariable *gVar = ModuleOb->getNamedGlobal(Name);
... // yields: @x = common global i32, align 4
Inserting a return statement:
Builder.SetInsertPoint(entry); // note that the SetInsertPoint API is clearly stateful
Builder.CreateRet(Builder.getInt32(0));
Setting function arguments: omitted
Branches: a phi node is needed at the merge point
PHINode *Phi = Builder.CreatePHI(Type::getInt32Ty(getGlobalContext()), PhiBBSize, "iftmp");
Phi->addIncoming(ThenVal, ThenBB);
Phi->addIncoming(ElseVal, ElseBB); // note that the basic block is passed like any other value (BasicBlock derives from Value)
Loops: omitted
... Builder.CreateCondBr(EndCond, LoopBB, AfterBB); ...
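Advanced IR
- getelementptr: does the offset support negative values? (Yes: the indices are treated as signed integers.)
- load
- store
- insertelement (isn't this really just assigning to an element? Note that it operates on vectors and yields a new vector value rather than writing in place)
- extractelement
- %0 = extractelement <4 x i32> %a, i32 0 // note the vector type syntax: the type is written before the operand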
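On the getelementptr offset question in the list above: the address arithmetic can be modelled as a sum of index times stride, and since the indices are signed, a negative index simply steps backwards. Below is a minimal sketch in plain C++, not the LLVM API. `gepByteOffset` and the stride vector are hypothetical names made up for illustration, and struct indexing is ignored.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not an LLVM API): models the byte offset that a
// getelementptr over nested array types would add to the base pointer.
// strides[i] is the size in bytes of the type stepped over by indices[i].
std::int64_t gepByteOffset(const std::vector<std::int64_t> &strides,
                           const std::vector<std::int64_t> &indices) {
  assert(strides.size() == indices.size());
  std::int64_t offset = 0;
  for (std::size_t i = 0; i < indices.size(); ++i)
    offset += indices[i] * strides[i]; // indices are signed, so this may go negative
  return offset;
}
```

For a `[5 x i32]* %a` base, the first index steps over whole 20-byte arrays and the second over 4-byte elements, so the strides would be `{20, 4}`; indices `{0, 3}` then select the fourth element of the array.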
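Basic IR Transformations
- runOn{Passtype}: Module, Function, BasicBlock, Loop
- getAnalysisUsage: declares the dependencies between passes
- AU.addRequired<AliasAnalysis>(); // note the use of a member function template here
- addRequiredTransitive
- addPreserved
- Instruction simplification
- if (match(Op0, m_Not(m_Specific(Op1))) || match(Op1, m_Not(m_Specific(Op0)))) // note the pattern-matching template style
- instcombine: simplifies code into an equivalent form with fewer instructions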
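To make the shape of the `m_Not(m_Specific(...))` check above more concrete, here is a toy version over a hand-rolled expression node. It is only a sketch: `Expr`, `isNotOf` and `andFoldsToZero` are invented for illustration and are not LLVM classes or functions. The fold shown (`A & ~A` is always 0) is one example of what such a check can enable; the notes do not say which fold the quoted line belongs to, so that part is an assumption.

```cpp
#include <cassert>

// Toy expression node; a stand-in for llvm::Value, not the real class.
struct Expr {
  enum Kind { Var, Not, And };
  Kind kind;
  const Expr *lhs; // operand of Not, or left operand of And
  const Expr *rhs; // right operand of And
  explicit Expr(Kind k, const Expr *l = nullptr, const Expr *r = nullptr)
      : kind(k), lhs(l), rhs(r) {}
};

// True if `e` is `~x` for exactly the node `x` (pointer identity),
// mirroring the shape of match(e, m_Not(m_Specific(x))).
bool isNotOf(const Expr *e, const Expr *x) {
  return e->kind == Expr::Not && e->lhs == x;
}

// True when `e` is `A & ~A` or `~A & A`; both always evaluate to 0.
bool andFoldsToZero(const Expr *e) {
  if (e->kind != Expr::And)
    return false;
  return isNotOf(e->lhs, e->rhs) || isNotOf(e->rhs, e->lhs);
}
```

Checking both operand orders, as the quoted line does, is what makes the match work regardless of which side the negation sits on.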
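Advanced IR Block Transformations
- Loop processing
- CFG: the dominance relation
- Loop canonicalization: add a preheader and exit blocks, allow only a single backedge, and so on
- LoopPass base class, LPPassManager (LLVM's class names love to throw in a sudden abbreviation, damn it)
- LICM (loop-invariant code motion)
- More loop optimizations: lib/Transforms/Scalar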
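The dominance relation mentioned under loop processing can be computed with a simple iterative data-flow pass: a block's dominators are itself plus whatever dominates all of its predecessors. The sketch below is the textbook iterative algorithm, not LLVM's DominatorTree implementation. `computeDominators` and the bitmask representation are made up for this sketch, which also assumes at most 64 blocks, all reachable from block 0.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// preds[b] lists the predecessors of block b; block 0 is the entry.
// Returns dom, where bit d of dom[b] is set iff block d dominates block b.
// Assumption: 1..64 blocks, every block reachable from the entry.
std::vector<std::uint64_t> computeDominators(
    const std::vector<std::vector<int>> &preds) {
  const std::size_t n = preds.size();
  assert(n >= 1 && n <= 64);
  const std::uint64_t all = (n == 64) ? ~0ULL : ((1ULL << n) - 1);
  std::vector<std::uint64_t> dom(n, all); // start from "everything dominates"
  dom[0] = 1ULL;                          // the entry is dominated only by itself
  bool changed = true;
  while (changed) {
    changed = false;
    for (std::size_t b = 1; b < n; ++b) {
      std::uint64_t meet = all;
      for (int p : preds[b])
        meet &= dom[p]; // intersect over all predecessors
      const std::uint64_t next = meet | (1ULL << b); // a block dominates itself
      if (next != dom[b]) {
        dom[b] = next;
        changed = true;
      }
    }
  }
  return dom;
}
```

With blocks numbered 0 = preheader, 1 = loop header, 2 = loop body (with a backedge to 1) and 3 = exit, the header ends up dominating both the body and the exit, while the body does not dominate the exit. That is the property loop canonicalization and LICM rely on when deciding where hoisted code may go.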
- Scalar evolution (a more advanced form of "abstract interpretation"?)
- $ opt -analyze -scalar-evolution scalevl.ll
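Scalar evolution describes how a value changes across loop iterations using add recurrences, printed as {start,+,step}. A recurrence {c0,+,c1,+,...} can be evaluated at iteration i in closed form as the sum of c_k * C(i, k), without running the loop. The sketch below evaluates such a chain in plain C++. `evaluateAddRec` is a hypothetical name, and the code assumes i >= 0 and no overflow; it is not the LLVM ScalarEvolution API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Evaluates the chain of recurrences {c[0],+,c[1],+,...} at iteration i
// using value(i) = sum over k of c[k] * C(i, k).
// Assumptions: i >= 0 and all intermediate values fit in 64 bits.
std::int64_t evaluateAddRec(const std::vector<std::int64_t> &c,
                            std::int64_t i) {
  std::int64_t result = 0;
  std::int64_t binom = 1; // C(i, 0)
  for (std::size_t k = 0; k < c.size(); ++k) {
    result += c[k] * binom;
    // C(i, k+1) = C(i, k) * (i - k) / (k + 1); the division is exact.
    const std::int64_t kk = static_cast<std::int64_t>(k);
    binom = binom * (i - kk) / (kk + 1);
  }
  return result;
}
```

An ordinary induction variable `for (i = 3; ...; i += 2)` corresponds to {3,+,2}, so its value at iteration 4 is 3 + 2 * 4. A variable whose step itself grows by one every iteration corresponds to a three-term chain such as {0,+,1,+,1}, which takes the values 0, 1, 3, 6 over the first four iterations.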
- LLVM intrinsics (compiler built-in functions)
- call void @llvm.memset.p0i8.i64(i8* %a2, i8 0, i64 20, i32 16, i1 false) // this makes it feel as if the so-called LLVM compiler is really just an interpreter? (runtime functions)
- %1 = getelementptr inbounds [5 x i32], [5 x i32]* %a, i64 0, i64 0
- Vectorization (not especially clear to me; see "Loop-Aware SLP in GCC" by Ira Rosen et al.?)
- Two kinds: SLP and loop vectorization
- SIMD
- $ opt -S -basicaa -slp-vectorizer -mtriple=aarch64-unknown-linux-gnu -mcpu=cortex-a57 addsub.ll -debug
From IR to the Selection DAG Phase
- SelectionDAGBuilder: using %add = add nsw i32 %a, %b as the example
- SelectionDAGBuilder::visit
- visitAdd
- visitBinary SDValue?
- Legalizing SelectionDAG (legalization, i.e. adapting to the target platform)
- Example: on X86, sdiv is expanded to sdivrem
- Optimizing SelectionDAG
- DAGCombiner
- AArch64DAGToDAGISel::Select
- Instruction Selection (note: at this point the instructions are ones the target supports, but registers have not been allocated yet)
- X86DAGToDAGISel::SelectCode() is generated automatically by TableGen (the hard part of understanding LLVM is the TableGen syntax)
- Scheduling and emitting machine instructions
- InstrEmitter::EmitMachineNode: SDNode ==> MachineInstr (MachineBasicBlock)
- MachineInstrBuilder
- CreateVirtualRegisters (still "virtual registers" at this point?)
- virtual AdjustInstrPostInstrSelection
- Register allocation
- spilling
- SSA form deconstruction (phi to register copies)
- Mapping virtual registers to physical registers: two approaches
- Direct mapping: TargetRegisterInfo/MachineOperand (implemented by the programmer?)
- Indirect mapping: VirtRegMap::assignVirt2Phys (built into LLVM?)
- LLVM's four allocation techniques:
- Basic
- Fast
- PBQP
- Greedy
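The "SSA form deconstruction" step listed under register allocation turns each phi into plain copies placed at the end of the predecessor blocks. The sketch below shows only that basic idea on a toy data model. `Phi` and `lowerPhis` are invented names, and the well-known complications (critical edges, the lost-copy and swap problems) are deliberately ignored, so this is not how LLVM's PHI elimination is actually implemented.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Toy model of `dst = phi [value, predBlock], ...`.
struct Phi {
  std::string dst;
  std::vector<std::pair<std::string, std::string>> incoming; // (value, pred)
};

// For every phi, emit "dst = copy value" into the predecessor block the
// value arrives from. Returns the copies keyed by predecessor block name.
// Simplification: critical edges and copy ordering are not handled.
std::map<std::string, std::vector<std::string>> lowerPhis(
    const std::vector<Phi> &phis) {
  std::map<std::string, std::vector<std::string>> copies;
  for (const Phi &phi : phis)
    for (const auto &in : phi.incoming)
      copies[in.second].push_back(phi.dst + " = copy " + in.first);
  return copies;
}
```

Applied to the `iftmp` phi from the branch example earlier in these notes, the `then` block would receive a copy of the then-value into `iftmp` and the `else` block a copy of the else-value, after which the phi itself can be dropped.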
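- Code Emission: LLVM JIT and MC (which generates object files)
- AsmPrinter: uses target-specific MCInstLowering interfaces such as X86MCInstLower
- MCInst instructions are passed to the MCStreamer object
- Note: the MC layer is one of the big differences between LLVM and GCC. (GCC emits assembly text and relies on an external assembler for the platform?)
- $ llc test.ll -show-mc-encoding -o -
Damn, I still don't understand what the SDAG is for (doesn't LLVM IR contain loops? Why does the SDAG end up being a DAG?). The likely answer: a SelectionDAG is built for one basic block at a time, and a single basic block has no internal control flow, so the graph is acyclic.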
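Generating Code for the Target Architecture
- Without TableGen, LLVM would be of academic interest only; with TableGen, LLVM becomes an awesome library fit for industrial use
- pipeline: SelectionDAG --> MachineDAG --> MachineInstr --> MCInst
- Defining a toy backend: r0-3, sp, pc, cpsr (pc?)
- Defining registers and register sets
- Each register has a unique number, which requires the bit encoding of registers to be consistent across the target's instructions (of course, some registers are implicit, e.g. in push/pop)
- Defining the calling convention (ABI)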
def CC_TOY : CallingConv<[
  CCIfType<[i8, i16], CCPromoteToType<i32>>, // promote 8-bit and 16-bit values to 32 bits
  CCIfType<[i32], CCAssignToReg<[R0, R1]>>,
  CCIfType<[i32], CCAssignToStack<4, 4>> // the first two arguments are passed in R0 and R1, the rest on the stack
]>;
def CC_Save : CalleeSavedRegs<(add R2, R3)>;
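Read top to bottom, the CC_TOY rules say: widen small integers to i32, hand out R0 and R1 while they last, then fall back to 4-byte stack slots with 4-byte alignment. The sketch below replays that logic in plain C++ to show what the generated code has to decide for each argument. `ArgLoc` and `assignArgs` are hypothetical names, only i8/i16/i32 arguments are handled, and the code TableGen really generates looks quite different.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Where one argument ends up: in a register or at a stack offset.
struct ArgLoc {
  bool inReg;
  std::string reg; // register name when inReg is true
  int stackOffset; // byte offset when inReg is false, otherwise -1
};

// Replays the CC_TOY rules for a list of argument widths in bits.
// Assumption: every argument is i8, i16 or i32.
std::vector<ArgLoc> assignArgs(const std::vector<int> &argBits) {
  static const char *const kRegs[] = {"R0", "R1"};
  std::vector<ArgLoc> locs;
  std::size_t nextReg = 0;
  int nextOffset = 0;
  for (int bits : argBits) {
    // CCPromoteToType<i32>: after promotion every argument is an i32.
    assert(bits == 8 || bits == 16 || bits == 32);
    if (nextReg < 2) {
      // CCAssignToReg<[R0, R1]>: take the next free register.
      locs.push_back(ArgLoc{true, kRegs[nextReg], -1});
      ++nextReg;
    } else {
      // CCAssignToStack<4, 4>: 4-byte slot, 4-byte aligned.
      locs.push_back(ArgLoc{false, "", nextOffset});
      nextOffset += 4;
    }
  }
  return locs;
}
```

For a call with four arguments, the first two land in R0 and R1 and the remaining two at stack offsets 0 and 4, regardless of whether they started out as i8, i16 or i32.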
- Defining the instruction set
- def ADDrr : InstTOY<(outs GRRegs:$dst), (ins GRRegs:$src1, GRRegs:$src2), "add $dst, $src1, $src2", [(set i32:$dst, (add i32:$src1, i32:$src2))]>;
- Implementing frame lowering
- Frame lowering involves emitting the function prologue and epilogue. (LLVM IR defines functions directly, including the ret instruction)
void TOYFrameLowering::emitPrologue(MachineFunction &MF) const {
  const TargetInstrInfo &TII = *MF.getSubtarget().getInstrInfo();
  MachineBasicBlock &MBB = MF.front();
  MachineBasicBlock::iterator MBBI = MBB.begin();
  uint64_t StackSize = computeStackSize(MF);
  unsigned StackReg = TOY::SP;
  unsigned OffsetReg = materializeOffset(MF, MBB, MBBI, (unsigned)StackSize);
  ... // omitted
}
- Lowering instructions
- Code omitted
- Printing an instruction
- Registering a target (omitted)