Writing a Compiler from Scratch (12): Code Generation Logic
The complete code for this project is in C2j-Compiler
Preface
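The previous article explained the basic Java bytecode instructions, so we can now move on to the actual code generation. This part starts with the few classes that code generation relies on, that is, the operations used to emit instructions.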
The files used in this article are all under codegen:
- Directive.java
- Instruction.java
- CodeGenerator.java
- ProgramGenerator.java
Directive.java
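This is an enum used to generate some special directives.
All of them produce directives that declare things such as a class or a field, or that open and close the scope of a method. It is fairly simple.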
public enum Directive {
    CLASS_PUBLIC(".class public"),
    END_CLASS(".end class"),
    SUPER(".super"),
    FIELD_PRIVATE_STATIC(".field private static"),
    METHOD_STATIC(".method static"),
    METHOD_PUBLIC(".method public"),
    FIELD_PUBLIC(".field public"),
    METHOD_PUBBLIC_STATIC(".method public static"),
    END_METHOD(".end method"),
    LIMIT_LOCALS(".limit locals"),
    LIMIT_STACK(".limit stack"),
    VAR(".var"),
    LINE(".line");

    // the directive text written to the .j file
    private final String text;

    Directive(String text) {
        this.text = text;
    }

    @Override
    public String toString() {
        return text;
    }
}
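The following is a minimal sketch of how these constants become lines of the output file. It assumes that a directive and its operands are joined with single spaces, as the four-argument emitDirective does. The trimmed enum copy, the class DirectiveDemo and the helper directiveLine are hypothetical names used only for illustration.

```java
// Trimmed copy of the article's Directive enum: only the constants this demo uses.
enum Directive {
    CLASS_PUBLIC(".class public"),
    SUPER(".super"),
    END_CLASS(".end class");

    private final String text;

    Directive(String text) {
        this.text = text;
    }

    @Override
    public String toString() {
        return text;
    }
}

public class DirectiveDemo {
    // Hypothetical helper: directive text followed by its operands, separated by
    // single spaces, similar to how emitDirective joins its operands.
    static String directiveLine(Directive d, String... operands) {
        StringBuilder sb = new StringBuilder(d.toString());
        for (String op : operands) {
            sb.append(' ').append(op);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Header and footer of a class named Main that extends Object.
        System.out.println(directiveLine(Directive.CLASS_PUBLIC, "Main"));
        System.out.println(directiveLine(Directive.SUPER, "java/lang/Object"));
        System.out.println(directiveLine(Directive.END_CLASS));
    }
}
```

Running main prints `.class public Main`, `.super java/lang/Object` and `.end class` on three lines.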
Instruction.java
This is also an enum, used to generate the basic instructions.
public enum Instruction {
    // constants and static fields
    LDC("ldc"), GETSTATIC("getstatic"), SIPUSH("sipush"),
    // integer arithmetic
    IADD("iadd"), IMUL("imul"), ISUB("isub"), IDIV("idiv"),
    // method calls and returns
    INVOKEVIRTUAL("invokevirtual"), INVOKESTATIC("invokestatic"), INVOKESPECIAL("invokespecial"),
    RETURN("return"), IRETURN("ireturn"),
    // locals, objects, fields and arrays
    ILOAD("iload"), ISTORE("istore"), NEWARRAY("newarray"), NEW("new"), DUP("dup"),
    ASTORE("astore"), IASTORE("iastore"), ALOAD("aload"), PUTFIELD("putfield"), GETFIELD("getfield"),
    ANEWARRAY("anewarray"), AASTORE("aastore"), AALOAD("aaload"),
    // conditional and unconditional jumps (IF_ICMPEG emits "if_icmpeq")
    IF_ICMPEG("if_icmpeq"), IF_ICMPNE("if_icmpne"), IF_ICMPLT("if_icmplt"),
    IF_ICMPGE("if_icmpge"), IF_ICMPGT("if_icmpgt"), IF_ICMPLE("if_icmple"), GOTO("goto"),
    IALOAD("iaload");

    // the mnemonic written to the .j file
    private final String text;

    Instruction(String s) {
        this.text = s;
    }

    @Override
    public String toString() {
        return text;
    }
}
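To show how these mnemonics drive a stack machine, here is a small sketch that emits the instructions for evaluating `2 * 3 + 4` and returning the result. Its assumptions: an instruction with an operand is written as a tab, the mnemonic, one space and the operand. The article does not show that overload of emit. The class InstructionDemo and its methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Trimmed copy of the article's Instruction enum.
enum Instruction {
    SIPUSH("sipush"), IADD("iadd"), IMUL("imul"), IRETURN("ireturn");

    private final String text;

    Instruction(String s) {
        this.text = s;
    }

    @Override
    public String toString() {
        return text;
    }
}

public class InstructionDemo {
    private final List<String> lines = new ArrayList<>();

    // No operand: tab-indented mnemonic, like CodeGenerator.emit(Instruction).
    void emit(Instruction op) {
        lines.add("\t" + op);
    }

    // One operand, assumed to be separated from the mnemonic by a single space.
    void emit(Instruction op, String operand) {
        lines.add("\t" + op + " " + operand);
    }

    // 2 * 3 + 4: operands are pushed first, then the operator pops them.
    static List<String> compileExample() {
        InstructionDemo g = new InstructionDemo();
        g.emit(Instruction.SIPUSH, "2");
        g.emit(Instruction.SIPUSH, "3");
        g.emit(Instruction.IMUL);         // stack now holds 6
        g.emit(Instruction.SIPUSH, "4");
        g.emit(Instruction.IADD);         // stack now holds 10
        g.emit(Instruction.IRETURN);      // return the int on top of the stack
        return g.lines;
    }

    public static void main(String[] args) {
        compileExample().forEach(System.out::println);
    }
}
```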
CodeGenerator.java
Now for the important part. The generation logic lives mainly in CodeGenerator and ProgramGenerator, and CodeGenerator is the parent class of ProgramGenerator.
The CodeGenerator constructor creates an output stream that writes the generated bytecode to a file named programName + ".j".
public CodeGenerator() {
    // the generated assembly goes to <programName>.j
    String assemblyFileName = programName + ".j";
    try {
        bytecodeFile = new PrintWriter(new PrintStream(new File(assemblyFileName)));
    } catch (FileNotFoundException e) {
        e.printStackTrace();
    }
}
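emit, emitString, emitDirective and emitBlankLine are the methods that output basic instructions. Each has several overloads for different operations and operands. Note that some instructions may need to be held back and committed together at the end. The boolean flags buffered and classDefine decide whether output goes to a buffer instead of straight to the file.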
public void emitString(String s) {
    if (buffered) {
        bufferedContent += s + "\n";
        return;
    }
    if (classDefine) {
        classDefinition += s + "\n";
        return;
    }
    bytecodeFile.print(s);
    bytecodeFile.flush();
}
public void emit(Instruction opcode) {
    if (buffered) {
        bufferedContent += "\t" + opcode.toString() + "\n";
        return;
    }
    if (classDefine) {
        classDefinition += "\t" + opcode.toString() + "\n";
        return;
    }
    bytecodeFile.println("\t" + opcode.toString());
    bytecodeFile.flush();
    ++instructionCount;
}
public void emitDirective(Directive directive, String operand1, String operand2, String operand3) {
    if (buffered) {
        bufferedContent += directive.toString() + " " + operand1 + " " + operand2 + " " + operand3 + "\n";
        return;
    }
    if (classDefine) {
        classDefinition += directive.toString() + " " + operand1 + " " + operand2 + " " + operand3 + "\n";
        return;
    }
    bytecodeFile.println(directive.toString() + " " + operand1 + " " + operand2 + " " + operand3);
    // flush like the other emit methods so the .j file is always up to date
    bytecodeFile.flush();
    ++instructionCount;
}
public void emitBlankLine() {
    if (buffered) {
        bufferedContent += "\n";
        return;
    }
    if (classDefine) {
        classDefinition += "\n";
        return;
    }
    bytecodeFile.println();
    bytecodeFile.flush();
}
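The buffering rule that all four methods share can be isolated in a small sketch. A StringBuilder stands in for the .j file, and only one emit method is modeled. The class BufferedEmitDemo and its methods emitLine and commitBuffered are hypothetical. In particular, commitBuffered only illustrates "commit everything at the end"; the article does not show how the project actually writes out the buffer.

```java
public class BufferedEmitDemo {
    // A StringBuilder stands in for the .j file that CodeGenerator writes to.
    private final StringBuilder file = new StringBuilder();
    private final StringBuilder bufferedContent = new StringBuilder();
    private final StringBuilder classDefinition = new StringBuilder();
    private boolean buffered = false;
    private boolean classDefine = false;

    void setBuffered(boolean value) { buffered = value; }
    void setClassDefinition(boolean value) { classDefine = value; }

    // Same routing order as the article's emit methods:
    // buffer first, then the class definition, otherwise straight to the file.
    void emitLine(String s) {
        if (buffered) {
            bufferedContent.append(s).append('\n');
            return;
        }
        if (classDefine) {
            classDefinition.append(s).append('\n');
            return;
        }
        file.append(s).append('\n');
    }

    // Hypothetical commit step: append the buffered text to the file, then clear the buffer.
    void commitBuffered() {
        file.append(bufferedContent);
        bufferedContent.setLength(0);
    }

    String fileText() { return file.toString(); }
    String classText() { return classDefinition.toString(); }

    static BufferedEmitDemo run() {
        BufferedEmitDemo g = new BufferedEmitDemo();
        g.emitLine("; immediate 1");
        g.setClassDefinition(true);
        g.emitLine(".class public S");   // captured in classDefinition, not the file
        g.setClassDefinition(false);
        g.setBuffered(true);
        g.emitLine("; deferred");        // held back until commitBuffered()
        g.setBuffered(false);
        g.emitLine("; immediate 2");
        g.commitBuffered();
        return g;
    }

    public static void main(String[] args) {
        BufferedEmitDemo g = run();
        System.out.print(g.fileText());
        System.out.print(g.classText());
    }
}
```

After run(), the file holds the two immediate lines followed by the deferred one, and the class header sits separately in classDefinition.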
ProgramGenerator.java
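ProgramGenerator extends CodeGenerator and therefore inherits its basic operations. The instruction output for structs, arrays and the other constructs from the previous article all lives in this class.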
Handling nesting
First, look at four fields. They are mainly used to handle nested branches and loops.
private int branch_count = 0;
private int branch_out = 0;
private String embedded = "";
private int loopCount = 0;
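Each time an if-else statement is nested, one 'i' is appended to embedded, and when a branch is exited, that 'i' is cut off again.
branch_count and branch_out label the branch jumps within the same scope.
In other words, nesting is handled through embedded, while branches in the same scope are distinguished by branch_count and branch_out.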
public void incraseIfElseEmbed() {
    embedded += "i";
}
public void decraseIfElseEmbed() {
    embedded = embedded.substring(1);
}
public void emitBranchOut() {
    String s = "\n" + embedded + "branch_out" + branch_out + ":\n";
    this.emitString(s);
    branch_out++;
}
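To see what labels these methods produce, here is a sketch that reuses the same label format but collects the labels into a list. The order of calls in run() is an assumption for an if-else nested inside another if-else; the article does not show the calling code. branch_count is not modeled, and the class BranchLabelDemo is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class BranchLabelDemo {
    private int branch_out = 0;
    private String embedded = "";
    private final List<String> labels = new ArrayList<>();

    // Method names kept exactly as in the article.
    public void incraseIfElseEmbed() {
        embedded += "i";
    }

    public void decraseIfElseEmbed() {
        embedded = embedded.substring(1);
    }

    // Same label format as emitBranchOut: nesting prefix + "branch_out" + counter.
    public void emitBranchOut() {
        labels.add(embedded + "branch_out" + branch_out + ":");
        branch_out++;
    }

    // Hypothetical call order for an if-else nested inside another if-else.
    static List<String> run() {
        BranchLabelDemo g = new BranchLabelDemo();
        g.incraseIfElseEmbed();   // enter the outer if-else: embedded = "i"
        g.incraseIfElseEmbed();   // enter the inner if-else: embedded = "ii"
        g.emitBranchOut();        // exit label of the inner if-else
        g.decraseIfElseEmbed();   // back to the outer level: embedded = "i"
        g.emitBranchOut();        // exit label of the outer if-else
        g.decraseIfElseEmbed();   // back to the top level: embedded = ""
        return g.labels;
    }

    public static void main(String[] args) {
        run().forEach(System.out::println);
    }
}
```

This prints `iibranch_out0:` and then `ibranch_out1:`. The prefix shows the nesting depth, and the counter keeps each label unique.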
loopCount handles nested loops.
public void emitLoopBranch() {
    String s = "\n" + "loop" + loopCount + ":" + "\n";
    emitString(s);
}
public String getLoopBranch() {
    return "loop" + loopCount;
}
public void increaseLoopCount() {
    loopCount++;
}
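Here is a minimal sketch of how these three helpers could lower a loop such as `while (i < n) { ... }`. It rests on a few assumptions, since the article does not show where the loop helpers are called:

- i is in slot 1 and n is in slot 2.
- The exit label is named loopN_exit.
- increaseLoopCount runs on loop entry, after the label is remembered, so a nested loop gets the next number.

The class LoopLabelDemo and the method emitWhile are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class LoopLabelDemo {
    private int loopCount = 0;
    private final List<String> out = new ArrayList<>();

    // These three helpers mirror the article's loop methods.
    void emitLoopBranch() {
        out.add("loop" + loopCount + ":");
    }

    String getLoopBranch() {
        return "loop" + loopCount;
    }

    void increaseLoopCount() {
        loopCount++;
    }

    // Hypothetical lowering of: while (i < n) { body }, with i in slot 1 and n in slot 2.
    void emitWhile(Runnable body) {
        String head = getLoopBranch();   // remember this loop's label before a nested loop changes loopCount
        String exit = head + "_exit";    // hypothetical exit-label naming
        emitLoopBranch();                // loopN:
        increaseLoopCount();             // a loop nested in the body gets the next number
        out.add("\tiload 1");
        out.add("\tiload 2");
        out.add("\tif_icmpge " + exit);  // leave the loop once i >= n
        body.run();
        out.add("\tgoto " + head);       // jump back and re-test the condition
        out.add(exit + ":");
    }

    static List<String> run() {
        LoopLabelDemo g = new LoopLabelDemo();
        // A loop whose body contains another loop.
        g.emitWhile(() -> g.emitWhile(() -> g.out.add("\t; inner body")));
        return g.out;
    }

    public static void main(String[] args) {
        run().forEach(System.out::println);
    }
}
```

The outer loop uses loop0 and the inner loop uses loop1. Each `goto` jumps back to its own loop head because the label is captured before the body is generated.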
Handling structs
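putStructToClassDeclaration defines a struct, which amounts to creating (new) an instance of a class. declareStructAsClass handles the variables inside the struct, which correspond to the fields of a class.
- Once a class has been defined for a struct, the struct's tag is added to structNameList, so the class is not defined a second time
- If symbol.getValueSetter() is not null, the symbol is an element of a struct array, so the instance is loaded directly from the array instead of being created here
- declareStructAsClass builds the class with the class-related Java bytecode instructions described in the previous article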
public void putStructToClassDeclaration(Symbol symbol) {
    Specifier sp = symbol.getSpecifierByType(Specifier.STRUCTURE);
    if (sp == null) {
        return;
    }
    StructDefine struct = sp.getStruct();
    // each struct tag is turned into a class only once
    if (structNameList.contains(struct.getTag())) {
        return;
    } else {
        structNameList.add(struct.getTag());
    }
    // a plain struct variable (not an element of a struct array) gets a fresh instance:
    // new + dup + call the constructor, then store the reference in the variable's slot
    if (symbol.getValueSetter() == null) {
        this.emit(Instruction.NEW, struct.getTag());
        this.emit(Instruction.DUP);
        this.emit(Instruction.INVOKESPECIAL, struct.getTag() + "/" + "<init>()V");
        int idx = this.getLocalVariableIndex(symbol);
        this.emit(Instruction.ASTORE, "" + idx);
    }
    declareStructAsClass(struct);
}
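The four instructions that create a struct instance follow the standard JVM pattern for constructing an object. `new` pushes an uninitialized reference. `dup` copies it. `invokespecial <init>` consumes one copy. `astore` keeps the other. The sketch below produces that sequence for a given tag and slot. The class StructNewDemo, the tag Point and slot 1 are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class StructNewDemo {
    // Allocate an instance of the struct's class and keep it in a local slot.
    static List<String> newStructInstance(String tag, int slot) {
        List<String> code = new ArrayList<>();
        code.add("\tnew " + tag);                          // push an uninitialized reference
        code.add("\tdup");                                 // second copy for astore
        code.add("\tinvokespecial " + tag + "/<init>()V"); // constructor consumes one copy
        code.add("\tastore " + slot);                      // store the initialized reference
        return code;
    }

    public static void main(String[] args) {
        newStructInstance("Point", 1).forEach(System.out::println);
    }
}
```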
private void declareStructAsClass(StructDefine struct) {
    // everything emitted until setClassDefinition(false) is collected in classDefinition
    this.setClassDefinition(true);
    this.emitDirective(Directive.CLASS_PUBLIC, struct.getTag());
    this.emitDirective(Directive.SUPER, "java/lang/Object");
    // one public field per struct member
    Symbol fields = struct.getFields();
    do {
        String fieldName = fields.getName() + " ";
        if (fields.getDeclarator(Declarator.ARRAY) != null) {
            fieldName += "[";
        }
        if (fields.hasType(Specifier.INT)) {
            fieldName += "I";
        } else if (fields.hasType(Specifier.CHAR) && fields.getDeclarator(Declarator.POINTER) != null) {
            // char* must be tested before plain char, otherwise this branch is unreachable
            fieldName += "Ljava/lang/String;";
        } else if (fields.hasType(Specifier.CHAR)) {
            fieldName += "C";
        }
        this.emitDirective(Directive.FIELD_PUBLIC, fieldName);
        fields = fields.getNextSymbol();
    } while (fields != null);
    // constructor: call Object.<init>, then give every field an initial value
    this.emitDirective(Directive.METHOD_PUBLIC, "<init>()V");
    this.emit(Instruction.ALOAD, "0");
    String superInit = "java/lang/Object/<init>()V";
    this.emit(Instruction.INVOKESPECIAL, superInit);
    fields = struct.getFields();
    do {
        this.emit(Instruction.ALOAD, "0");
        String fieldName = struct.getTag() + "/" + fields.getName();
        String fieldType = "";
        if (fields.hasType(Specifier.INT)) {
            fieldType = "I";
            this.emit(Instruction.SIPUSH, "0");
        } else if (fields.hasType(Specifier.CHAR) && fields.getDeclarator(Declarator.POINTER) != null) {
            fieldType = "Ljava/lang/String;";
            this.emit(Instruction.LDC, " ");
        } else if (fields.hasType(Specifier.CHAR)) {
            fieldType = "C";
            this.emit(Instruction.SIPUSH, "0");
        }
        String classField = fieldName + " " + fieldType;
        this.emit(Instruction.PUTFIELD, classField);
        fields = fields.getNextSymbol();
    } while (fields != null);
    this.emit(Instruction.RETURN);
    this.emitDirective(Directive.END_METHOD);
    this.emitDirective(Directive.END_CLASS);
    this.setClassDefinition(false);
}
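The heart of declareStructAsClass is mapping each member's C type to a JVM field descriptor. The rules are: `I` for int, `C` for char, `Ljava/lang/String;` for a char pointer, and a `[` prefix for arrays. The order of the checks matters, because a char pointer also has type char. It must therefore be tested before plain char. The sketch below isolates that mapping. The enum CType and the class FieldDescriptorDemo are hypothetical simplifications of the project's Symbol/Specifier/Declarator types.

```java
public class FieldDescriptorDemo {
    // Hypothetical stand-in for the Specifier types the article checks.
    enum CType { INT, CHAR }

    // Map a simplified C member type to a JVM field descriptor.
    static String descriptor(CType type, boolean isPointer, boolean isArray) {
        String base;
        if (type == CType.CHAR && isPointer) {
            base = "Ljava/lang/String;";   // char* is modeled as a Java String
        } else if (type == CType.CHAR) {
            base = "C";
        } else {
            base = "I";
        }
        return (isArray ? "[" : "") + base;
    }

    // One ".field public" line in the shape declareStructAsClass emits.
    static String fieldLine(String name, CType type, boolean isPointer, boolean isArray) {
        return ".field public " + name + " " + descriptor(type, isPointer, isArray);
    }

    public static void main(String[] args) {
        System.out.println(fieldLine("x", CType.INT, false, false));
        System.out.println(fieldLine("tag", CType.CHAR, false, false));
        System.out.println(fieldLine("name", CType.CHAR, true, false));
        System.out.println(fieldLine("data", CType.INT, false, true));
    }
}
```

For example, a member `char *name` becomes `.field public name Ljava/lang/String;`, and `int data[]` becomes `.field public data [I`.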
Getting stack information
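Everything else related to Java bytecode is done according to the previous article, and the logic is not complex. Now let's look at one method, getLocalVariableIndex, which returns a variable's current position (slot index) in the function's local variables.
- First get the function currently being generated, then its parameters, and reverse them (this has to do with the order in which arguments are pushed)
- Then add every symbol in the current symbol's scope to the list
- Finally, walk the list to find the symbol's position in it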
public int getLocalVariableIndex(Symbol symbol) {
    TypeSystem typeSys = TypeSystem.getInstance();
    // the function currently being generated is on top of nameStack
    String funcName = nameStack.peek();
    Symbol funcSym = typeSys.getSymbolByText(funcName, 0, "main");
    ArrayList<Symbol> localVariables = new ArrayList<>();
    // parameters take the first slots; the linked argument list is reversed into slot order
    Symbol s = funcSym.getArgList();
    while (s != null) {
        localVariables.add(s);
        s = s.getNextSymbol();
    }
    Collections.reverse(localVariables);
    // then every other symbol declared in the same scope
    ArrayList<Symbol> list = typeSys.getSymbolsByScope(symbol.getScope());
    for (int i = 0; i < list.size(); i++) {
        if (!localVariables.contains(list.get(i))) {
            localVariables.add(list.get(i));
        }
    }
    // the position in this list is used as the slot index (assumes one slot per variable)
    for (int i = 0; i < localVariables.size(); i++) {
        if (localVariables.get(i) == symbol) {
            return i;
        }
    }
    return -1;
}
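The slot computation can be tried out on plain names. In this sketch, variables are Strings compared with equals, whereas the original compares Symbol objects by identity. The sketch also assumes the symbol table links parameters last-declared first, which is why they are reversed. The class LocalSlotDemo and the method slotOf are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class LocalSlotDemo {
    // argsAsStored: parameters in the order the symbol table links them
    // (assumed last-declared first). scopeSymbols: every symbol in the scope.
    static int slotOf(String name, List<String> argsAsStored, List<String> scopeSymbols) {
        List<String> locals = new ArrayList<>(argsAsStored);
        Collections.reverse(locals);          // parameters first, in declaration order
        for (String s : scopeSymbols) {
            if (!locals.contains(s)) {
                locals.add(s);                // then the remaining locals, in scope order
            }
        }
        return locals.indexOf(name);          // -1 when not found, like the original
    }

    public static void main(String[] args) {
        // int f(int a, int b) { int x; int y; ... }
        List<String> argsAsStored = List.of("b", "a");
        List<String> scope = List.of("a", "b", "x", "y");
        System.out.println(slotOf("x", argsAsStored, scope));
    }
}
```

For `int f(int a, int b)` with locals x and y, the slots come out as a=0, b=1, x=2, y=3. That works for a static method, which has no `this` in slot 0, as long as every variable is an int, a char or a reference, each taking one slot.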
Summary
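Building on the JVM bytecode covered in the previous article, this article provides a method for each kind of operation that emits the corresponding instructions.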