How to Write a Simple Interpreter (Part 1)

By 阿新 • Published: 2018-11-02

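Lan source code is made up of basic elements that we call tokens. In the lexical analysis phase we need to convert the input character stream into a token stream (put simply, a list of tokens).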

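Below is the definition of the token types. To save resources, they are represented as plain integer constants rather than as an enum type.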

public class TokenType {
    public static final int PLUS        = 0;//("+")
    public static final int PLUSPLUS    = 1;//("++")
    public static final int MINUS       = 2;//("-")
    public static final int MINUSMINUS  = 3;//("--")
    public static final int ASTERISK    = 4;//("*")
    public static final int SLASH       = 5;//("/")
    public static final int PERCENT     = 6;//("%")
    public static final int EQUAL       = 7;//("==")
    public static final int NOT_EQUAL   = 8;//("!=")
    public static final int GT          = 9;//(">")
    public static final int GE          = 10;//(">=")
    public static final int LT          = 11;//("<")
    public static final int LE          = 12;//("<=")
    public static final int AND         = 13;//("&&")
    public static final int OR          = 14;//("||")
    public static final int BANG        = 15;//("!")
    public static final int LEFT_PAREN  = 16;//("(")
    public static final int RIGHT_PAREN = 17;//(")")
    public static final int LEFT_BRACE  = 18;//("{")
    public static final int RIGHT_BRACE = 19;//("}")
    public static final int COMMA       = 20;//(",")
    public static final int QUESTION    = 21;//("?")
    public static final int COLON       = 22;//(":")
    public static final int NUMBER      = 23;//("number")
    public static final int STRING      = 24;//("string")
    public static final int ASSIGN      = 25;//("=")
    public static final int TRUE        = 26;//("true")
    public static final int FALSE       = 27;//("false")
    public static final int NULL        = 28;//("null")
    public static final int IDENTIFIER  = 29;//("variable name")
    public static final int IF          = 30;//("if")
    public static final int ELSE        = 31;//("else")
    public static final int WHILE       = 32;//("while")
    public static final int BREAK       = 33;//("break")
    public static final int CONTINUE    = 34;//("continue")
    public static final int PRINT       = 35;//("print")
    public static final int FUNC        = 36;//("func")
    public static final int RETURN      = 37;//("return")
    public static final int EOF         = 38;//("end of input")
}
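One practical consequence of using bare integers is that a token type prints as a number such as 23, which is hard to read while debugging. The following is a minimal sketch of a lookup helper, assuming the constants keep the exact order and values listed above. The class name `TokenTypeNames` and the method `nameOf` are hypothetical and not part of the original code.

```java
// Hypothetical debugging helper (an assumption, not from the article):
// maps a TokenType integer back to a readable name.
final class TokenTypeNames {
    // Names in the same order as the TokenType constants (PLUS = 0 ... EOF = 38).
    private static final String[] NAMES = {
        "PLUS", "PLUSPLUS", "MINUS", "MINUSMINUS", "ASTERISK", "SLASH", "PERCENT",
        "EQUAL", "NOT_EQUAL", "GT", "GE", "LT", "LE", "AND", "OR", "BANG",
        "LEFT_PAREN", "RIGHT_PAREN", "LEFT_BRACE", "RIGHT_BRACE", "COMMA",
        "QUESTION", "COLON", "NUMBER", "STRING", "ASSIGN", "TRUE", "FALSE", "NULL",
        "IDENTIFIER", "IF", "ELSE", "WHILE", "BREAK", "CONTINUE", "PRINT", "FUNC",
        "RETURN", "EOF"
    };

    // Returns the constant's name, or a placeholder for values outside the table.
    static String nameOf(int type) {
        if (type < 0 || type >= NAMES.length) {
            return "UNKNOWN(" + type + ")";
        }
        return NAMES[type];
    }

    public static void main(String[] args) {
        System.out.println(nameOf(23));
        System.out.println(nameOf(38));
    }
}
```

The table has to be kept in sync with `TokenType` by hand, which is the usual price of choosing integers over an enum.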

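The comment after each constant shows what the type stands for, so nothing here needs further explanation. Next we define the structure of a Token.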

public class Token {
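    public int type; // token type (one of the TokenType constants)
    public String symbol; // token text: the lexeme, as shown in the TokenType comments
    public int line; // line number of the token in the source code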
    public Token(int type, String symbol, int line) {
        this.type = type;
        this.symbol = symbol;
        this.line = line;
    }
}
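To make the target of lexing concrete, here is a small sketch of the token list we would expect for the one-line source `n = 1`, built by hand. `SimpleToken` is a stand-in that mirrors the Token class above so the example compiles on its own. It, `TokenDump`, and `describe` are illustrative names and not part of the original code. The type codes are copied from TokenType.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: shows the data a lexer hands to the parser.
final class TokenDump {
    // Stand-in for the article's Token class (same three fields).
    static final class SimpleToken {
        final int type;
        final String symbol;
        final int line;

        SimpleToken(int type, String symbol, int line) {
            this.type = type;
            this.symbol = symbol;
            this.line = line;
        }
    }

    // Type codes copied from TokenType.
    static final int NUMBER = 23;
    static final int ASSIGN = 25;
    static final int IDENTIFIER = 29;
    static final int EOF = 38;

    // The token list expected for the source text "n = 1" on line 1.
    static List<SimpleToken> tokensForExample() {
        List<SimpleToken> tokens = new ArrayList<>();
        tokens.add(new SimpleToken(IDENTIFIER, "n", 1));
        tokens.add(new SimpleToken(ASSIGN, "=", 1));
        tokens.add(new SimpleToken(NUMBER, "1", 1));
        tokens.add(new SimpleToken(EOF, "", 1)); // end marker with empty text
        return tokens;
    }

    // Formats one token as "line: type=<code> symbol="<text>"".
    static String describe(SimpleToken t) {
        return t.line + ": type=" + t.type + " symbol=\"" + t.symbol + "\"";
    }

    public static void main(String[] args) {
        for (SimpleToken t : tokensForExample()) {
            System.out.println(describe(t));
        }
    }
}
```

Whitespace never appears in the list, and the final EOF token gives the parser an explicit end marker.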

Finally comes the lexical analyzer, which we call Lexer. The comments explain each step, so it should not be hard to follow.

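import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class Lexer {
    // Keyword table: every word read from the source is looked up here to decide whether it is a keyword
    private Map<String, Integer> keywordsFilter;
    public Lexer() {
        // Initialize the keyword table
        keywordsFilter = new HashMap<>();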
        keywordsFilter.put("true", TokenType.TRUE);
        keywordsFilter.put("false", TokenType.FALSE);
        keywordsFilter.put("null", TokenType.NULL);
        keywordsFilter.put("if", TokenType.IF);
        keywordsFilter.put("else", TokenType.ELSE);
        keywordsFilter.put("while", TokenType.WHILE);
        keywordsFilter.put("break", TokenType.BREAK);
        keywordsFilter.put("continue", TokenType.CONTINUE);
        keywordsFilter.put("print", TokenType.PRINT);
        keywordsFilter.put("func", TokenType.FUNC);
        keywordsFilter.put("return", TokenType.RETURN);
    }
    public List<Token> lex(String code) {
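        // Holds all tokens to be returned
        List<Token> tokens = new ArrayList<>();
        // Index of the next character to read from the source
        int index = 0;
        // Line number of the current token in the source
        int currentLine = 1;
        // Total length of the source in characters
        int codeLength = code.length();
        while (index < codeLength) {
            // Take the next character and advance the index by 1
            char c = code.charAt(index++);
            // Skip spaces, carriage returns, and tabs, then continue with the next iteration
            if (c == ' ' || c == '\r' || c == '\t') continue;
            // On a newline, increment the current line number and continue with the next iteration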
            if (c == '\n') {
                currentLine++;
                continue;
            }
            if (c == '+') {
                if (index < codeLength && code.charAt(index) == '+') {
                    index++;
                    tokens.add(new Token(TokenType.PLUSPLUS, "++", currentLine));
                } else {
                    tokens.add(new Token(TokenType.PLUS, "+", currentLine));
                }
            } else if (c == '-') {
                if (index < codeLength && code.charAt(index) == '-') {
                    index++;
                    tokens.add(new Token(TokenType.MINUSMINUS, "--", currentLine));
                } else {
                    tokens.add(new Token(TokenType.MINUS, "-", currentLine));
                }
            } else if (c == '*') {
                tokens.add(new Token(TokenType.ASTERISK, "*", currentLine));
            } else if (c == '/') {
                if (index < codeLength && code.charAt(index) == '/') {// skip a line comment up to the end of the line
                    do {
                        index++;
                    } while (index < codeLength && code.charAt(index) != '\n');
                } else {
                    tokens.add(new Token(TokenType.SLASH, "/", currentLine));
                }
            } else if (c == '%') {
                tokens.add(new Token(TokenType.PERCENT, "%", currentLine));
            } else if (c == '(') {
                tokens.add(new Token(TokenType.LEFT_PAREN, "(", currentLine));
            } else if (c == ')') {
                tokens.add(new Token(TokenType.RIGHT_PAREN, ")", currentLine));
            } else if (c == '{') {
                tokens.add(new Token(TokenType.LEFT_BRACE, "{", currentLine));
            } else if (c == '}') {
                tokens.add(new Token(TokenType.RIGHT_BRACE, "}", currentLine));
            } else if (c == ',') {
                tokens.add(new Token(TokenType.COMMA, ",", currentLine));
            } else if (c == '?') {
                tokens.add(new Token(TokenType.QUESTION, "?", currentLine));
            } else if (c == ':') {
                tokens.add(new Token(TokenType.COLON, ":", currentLine));
            } else if (c == '>') {
                if (index < codeLength && code.charAt(index) == '=') {
                    index++;
                    tokens.add(new Token(TokenType.GE, ">=", currentLine));
                } else {
                    tokens.add(new Token(TokenType.GT, ">", currentLine));
                }
            } else if (c == '<') {
                if (index < codeLength && code.charAt(index) == '=') {
                    index++;
                    tokens.add(new Token(TokenType.LE, "<=", currentLine));
                } else {
                    tokens.add(new Token(TokenType.LT, "<", currentLine));
                }
            } else if (c == '!') {
                if (index < codeLength && code.charAt(index) == '=') {
                    index++;
                    tokens.add(new Token(TokenType.NOT_EQUAL, "!=", currentLine));
                } else {
                    tokens.add(new Token(TokenType.BANG, "!", currentLine));
                }
            } else if (c == '|') {
                if (index < codeLength && code.charAt(index) == '|') {
                    index++;
                    tokens.add(new Token(TokenType.OR, "||", currentLine));
                } else {
                    throw new RuntimeException("Lexer Error: expect '|'");
                }
            } else if (c == '&') {
                if (index < codeLength && code.charAt(index) == '&') {
                    index++;
                    tokens.add(new Token(TokenType.AND, "&&", currentLine));
                } else {
                    throw new RuntimeException("Lexer Error: expect '&'");
                }
            } else if (c == '=') {
                if (index < codeLength && code.charAt(index) == '=') {
                    index++;
                    tokens.add(new Token(TokenType.EQUAL, "==", currentLine));
                } else {
                    tokens.add(new Token(TokenType.ASSIGN, "=", currentLine));
                }
            } else if (Character.isDigit(c)) {// number literal (digits only)
                int start = --index;
                do {
                    if (++index >= code.length()) break;
                    c = code.charAt(index);
                }
                while (Character.isDigit(c));
                tokens.add(new Token(TokenType.NUMBER, code.substring(start, index), currentLine));
            } else if (Character.isAlphabetic(c)) {// identifier or keyword (letters only)
                int start = --index;
                do {
                    if (++index >= code.length()) break;
                    c = code.charAt(index);
                }
                while (Character.isAlphabetic(c));
                String word = code.substring(start, index);
                Integer type = keywordsFilter.get(word);
                Token token = new Token(type == null ? TokenType.IDENTIFIER : type, word, currentLine);
                tokens.add(token);
            } else if (c == '"') {// string literal
                int start = index;
                // Set to true only when a closing quote is found before a newline or the end of input
                boolean closed = false;
                while (index < code.length()) {
                    c = code.charAt(index++);
                    if (c == '"') {
                        closed = true;
                        break;
                    }
                    if (c == '\n') break;
                }
                if (!closed) {
                    throw new RuntimeException("Lexer Error: expect \"");
                }
                String strLiteral = code.substring(start, index-1);
                tokens.add(new Token(TokenType.STRING, strLiteral, currentLine));
            }
            else {
                throw new RuntimeException(String.format("Lexer Error: unknown character \"%c\"", c));
            }
        }
        tokens.add(new Token(TokenType.EOF, "", currentLine));
        return tokens;
    }
}
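The two-character operators in the lexer above all follow one pattern: read a character, peek at the next one, and consume it only if it completes a longer operator. Here is a minimal sketch of that pattern factored into a helper. The class `LookaheadSketch` and its methods `match` and `split` are hypothetical names, and the sketch only recognizes a handful of operators while ignoring every other character.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of one-character lookahead; not the article's Lexer.
final class LookaheadSketch {
    private final String code;
    private int index = 0;

    LookaheadSketch(String code) {
        this.code = code;
    }

    // Consumes the next character only if it equals `expected`.
    private boolean match(char expected) {
        if (index < code.length() && code.charAt(index) == expected) {
            index++;
            return true;
        }
        return false;
    }

    // Collects operator symbols, always preferring the longer operator.
    List<String> operators() {
        List<String> out = new ArrayList<>();
        while (index < code.length()) {
            char c = code.charAt(index++);
            switch (c) {
                case '+': out.add(match('+') ? "++" : "+"); break;
                case '-': out.add(match('-') ? "--" : "-"); break;
                case '=': out.add(match('=') ? "==" : "="); break;
                case '<': out.add(match('=') ? "<=" : "<"); break;
                case '>': out.add(match('=') ? ">=" : ">"); break;
                case '!': out.add(match('=') ? "!=" : "!"); break;
                default: break; // every other character is ignored in this sketch
            }
        }
        return out;
    }

    static List<String> split(String code) {
        return new LookaheadSketch(code).operators();
    }

    public static void main(String[] args) {
        System.out.println(split("a <= b == c"));
    }
}
```

With such a helper each operator branch shrinks to one line, at the cost of turning the local `index` variable into a field.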

Finally, let's test it by hand. Type `.q` to quit.

import java.util.List;
import java.util.Scanner;

public class Main {
    public static void main(String[] args) {
        Scanner scanner = new Scanner(System.in);
        Lexer lexer = new Lexer();
        while (true) {
            System.out.print(">>> ");
            String code = scanner.nextLine();
            if (code.equals(".q")) break;
            List<Token> tokens = lexer.lex(code);
            for (Token token : tokens) {
                System.out.println(token.symbol);
            }
        }
    }
}
