To be a Tough Man——liushuaikobe

By 阿新 • Published: 2019-01-07

   <while, _>
   <(, _>
   <id, pointer to the symbol-table entry for i>
   <>=, _>
   <id, pointer to the symbol-table entry for j>
   <), _>
   <id, pointer to the symbol-table entry for i>
   <--, _>
   <;, _>
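
As a rough illustration, the token sequence above corresponds to the statement while(i>=j) i--; and could be produced by something like the sketch below. The regular expression, the tokenize name, and the use of a list index in place of a symbol-table pointer are assumptions made for this example; they are not taken from Scanner.py.

```python
import re

# Hypothetical token pattern: identifiers/keywords first, then the
# two-character operators (so ">=" and "--" win over ">" and "-"),
# then single-character symbols.
TOKEN_RE = re.compile(r"\s*([A-Za-z_]\w*|>=|--|[()<>;=+\-])")
KEYWORDS = {"while"}


def tokenize(source):
    """Return (tokens, symbols); identifiers carry a symbol-table index."""
    symbols = []  # stand-in symbol table; the list index plays the pointer's role
    tokens = []
    pos = 0
    source = source.rstrip()
    while pos < len(source):
        match = TOKEN_RE.match(source, pos)
        if match is None:
            raise ValueError("unexpected character at index %d" % pos)
        text = match.group(1)
        pos = match.end()
        is_word = text[0].isalpha() or text[0] == "_"
        if text in KEYWORDS or not is_word:
            tokens.append((text, None))  # keyword or operator: no attribute
        else:
            if text not in symbols:
                symbols.append(text)
            tokens.append(("id", symbols.index(text)))
    return tokens, symbols
```

Both occurrences of i map to the same symbol-table index, which mirrors the two identical pointers in the listing above.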

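The lexical analyzer as an independent subroutine:
Lexical analysis is a phase of compilation that runs before syntax analysis. Running it as a separate pass simplifies the design, improves compilation efficiency, and makes the compiler more portable. It can also be combined with syntax analysis into a single pass, in which the parser calls the lexer to obtain the current token whenever it needs one.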
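
The single-pass arrangement, where the parser pulls tokens from the lexer on demand, can be sketched with a Python generator. This is only a toy under stated assumptions: token_stream splits on whitespace instead of doing real lexical analysis, and count_statements stands in for a parser; both names are hypothetical.

```python
def token_stream(source):
    """Stand-in lexer: yield whitespace-separated words one at a time."""
    for word in source.split():
        yield word


def count_statements(source):
    """Toy 'parser': requests tokens one by one and counts ';' terminators."""
    lexer = token_stream(source)
    count = 0
    for token in lexer:  # each iteration asks the lexer for the next token
        if token == ";":
            count += 1
    return count
```

Because the generator produces a token only when asked, no intermediate token file is needed, which is the main difference from running the lexer as a separate pass.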

-----------------------------------------------------------------------------------

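The lexer I wrote is not very robust, especially its error handling. For example, in string recognition, 'ab' is not a valid char literal in C, but my lexer cannot detect the error and goes into an infinite loop. It also recognizes only a limited set of keywords and limited forms of strings (readers who follow my state machines will see where the limits are). I ran out of time and do not want to revise it further, so the code is posted below for reference.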

Before posting the code, a few words about the design of the lexer's state machines.

I use one state machine to recognize numbers, covering both floating-point numbers and integers. The state machine is as follows:
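
A minimal sketch of such a number automaton is given below. The state numbers are borrowed from the _stateNumber values in Scanner.py (0 start, 4 after a leading 0, 6 inside an integer, 5 just after the decimal point, 7 inside the fraction, 8 error); the function names and the table layout are assumptions made for this example.

```python
import string

ERROR = 8
# Accepting states and the token kind each one yields.
ACCEPTING = {4: "INT", 6: "INT", 7: "FLOAT"}


def next_state(state, ch):
    """Return the state reached from `state` on input character `ch`."""
    if state == 0:
        if ch == "0":
            return 4
        return 6 if ch in "123456789" else ERROR
    if state == 4:  # a lone leading zero may only be followed by '.'
        return 5 if ch == "." else ERROR
    if state == 6:
        if ch in string.digits:
            return 6
        return 5 if ch == "." else ERROR
    if state in (5, 7):  # at least one digit is required after the point
        return 7 if ch in string.digits else ERROR
    return ERROR


def classify_number(text):
    """Run the automaton over `text` and name the resulting token kind."""
    state = 0
    for ch in text:
        state = next_state(state, ch)
    return ACCEPTING.get(state, "ERROR")
```

With these transitions, 42 and 0 classify as INT, 3.14 and 0.5 as FLOAT, while inputs such as 007 or 1. end in a non-accepting state.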

A second state machine recognizes characters and strings, covering keywords, char, and char *, as follows:
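
The identifier and keyword part of that machine can be sketched as follows. The keyword set here is hypothetical (the real one lives in Category.KeyWordsTable, which is not shown), and read_word is a name invented for this example. As in Scanner.py, a word may start with a letter, underscore, or dollar sign, and a keyword's kind is its upper-cased spelling.

```python
import string

# Hypothetical keyword set, for illustration only.
KEYWORDS = {"int", "char", "float", "if", "else", "while", "return"}
FIRST = string.ascii_letters + "_$"  # characters allowed at the start
REST = FIRST + string.digits         # characters allowed afterwards


def read_word(source, start):
    """Scan an identifier or keyword beginning at source[start].

    Returns (lexeme, kind, next_index).
    """
    if start >= len(source) or source[start] not in FIRST:
        raise ValueError("no identifier at index %d" % start)
    end = start + 1
    while end < len(source) and source[end] in REST:
        end += 1
    lexeme = source[start:end]
    kind = lexeme.upper() if lexeme in KEYWORDS else "IDN"
    return lexeme, kind, end
```

Deciding keyword versus identifier by table lookup after the whole word is read keeps the automaton small: it needs no separate states per keyword.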

Comments in C are also recognized by a state machine. Comments must be cut out of the source code before analysis begins, as follows:
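
For comparison with the index-recording approach used in Scanner.py, here is a minimal single-pass sketch that drops /* ... */ comments while copying the rest of the text. The function name is an assumption, and the sketch deliberately ignores string literals, so a /* inside a string would also be treated as a comment start.

```python
def strip_block_comments(source):
    """Remove /* ... */ comments from `source` in a single pass."""
    out = []
    i = 0
    in_comment = False
    while i < len(source):
        pair = source[i:i + 2]  # two-character lookahead
        if not in_comment and pair == "/*":
            in_comment = True
            i += 2
        elif in_comment and pair == "*/":
            in_comment = False
            i += 2
        else:
            if not in_comment:
                out.append(source[i])  # keep ordinary program text
            i += 1
    return "".join(out)
```

Skipping two characters on /* means the slash in /*/ cannot double as the start of a closing */, and an unterminated comment simply swallows the rest of the input.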

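I did not use an explicit state machine to recognize operators (both binary and unary); they are analyzed and decided directly. In a sense this still follows the state-machine principle, but the machines are simple enough that I did not represent them with an explicit state variable. In effect their state machines are as follows: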
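
The direct approach amounts to one character of lookahead: try the two-character operator first, then fall back to the single character. A minimal sketch, assuming the operator set handled in Scanner.py and a hypothetical read_operator helper:

```python
# Two-character operators that must win over their one-character prefixes.
TWO_CHAR = {">=", "<=", "++", "--", "==", "!=", "&&", "||"}
ONE_CHAR = set("><+-=!&|*/;,{}[]()")


def read_operator(source, i):
    """Return (operator, next_index) for the operator starting at source[i]."""
    pair = source[i:i + 2]  # slicing is safe even at the end of the input
    if pair in TWO_CHAR:
        return pair, i + 2
    if source[i] in ONE_CHAR:
        return source[i], i + 1
    raise ValueError("not an operator: %r" % source[i])
```

Using a slice for the lookahead avoids an index error when an operator is the last character of the program, and always advancing past everything that was consumed keeps the second character of -- from being scanned twice.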

Here is the code:

Scanner.py, which runs as the main module:

'''
Created on 2012-10-18

@author: liushuai
'''
import string
import Category
import FileAccess

_currentIndex = 0
_Tokens = []
_prog = ""
_categoryNo = -1

_stateNumber = 0
_stateString = 0
_potentialNumber = ""
_potentialString = ""

def readComments(prog):
    '''Read the comments of a program'''
    state = 0
    currentIndex, beginIndex, endIndex = (0, 0, 0)
    commentsIndexs = []
    for c in prog:
        if state == 0:
            if c == '/':
                beginIndex = currentIndex
                state = 1
            else:
                pass
        elif state == 1:
            if c == '*':
                state = 2
            else :
                state = 0
        elif state == 2:
            if c == '*':
                state = 3
            else:
                pass
        elif state == 3:
            if c == '*':
                pass
            elif c == '/':
                endIndex = currentIndex
                commentsIndexs.append([beginIndex, endIndex])
                state = 0 #set 0 state
            else:
                state = 2
        currentIndex += 1
    return commentsIndexs
        
def cutComments(prog, commentsIndexs):
    '''cut the comments of the program prog'''
    num = len(commentsIndexs)
    if num == 0:
        return prog
    else :
        comments = []
        for i in range(num):
            comments.append(prog[commentsIndexs[i][0]:commentsIndexs[i][1] + 1])
        for item in comments:
            prog = prog.replace(item, "")
        return prog
    
def scan(helper):
    '''Scan the program and analyze it one character at a time.'''
    global _stateNumber, _stateString, _currentIndex, _Tokens, _prog, _categoryNo, _potentialNumber, _potentialString
    currentChar = _prog[_currentIndex]
    ######################################CHAR STRING######################################
    if currentChar == '\'' or currentChar == '\"' or currentChar in string.ascii_letters + "_$\\%@"  or (currentChar in string.digits and _stateString != 0):
        if _stateString == 0:
            if currentChar == '\'':
                _potentialString = "%s%s" % (_potentialString, currentChar)
                _stateString = 1
                _currentIndex += 1
            elif currentChar == "\"":
                _potentialString = "%s%s" % (_potentialString, currentChar)
                _stateString = 2
                _currentIndex += 1
            elif currentChar in string.ascii_letters + "$_":
                _potentialString = "%s%s" % (_potentialString, currentChar)
                _stateString = 7
                _currentIndex += 1
            else:
                _currentIndex += 1
                _stateNumber = 10
        elif _stateString == 1:
            if currentChar in string.ascii_letters + "# [email protected]%":
                _potentialString = "%s%s" % (_potentialString, currentChar)
                _stateString = 3
                _currentIndex += 1
            elif currentChar == '\\':
                _potentialString = "%s%s" % (_potentialString, currentChar)
                _stateString = 9
                _currentIndex += 1
            else:
                _currentIndex += 1
                _stateNumber = 10
        elif _stateString == 2:
            if currentChar in string.ascii_letters + "\\% ":
                _potentialString = "%s%s" % (_potentialString, currentChar)
                _stateString = 4
                _currentIndex += 1
            else:
                _currentIndex += 1
                _stateNumber = 10
        elif _stateString == 3:
            if currentChar == '\'':
                _potentialString = "%s%s" % (_potentialString, currentChar)
                _stateString = 5
                _currentIndex += 1
            else:
                _currentIndex += 1
                _stateNumber = 10
        elif _stateString == 4:
            if currentChar == '\"':
                _potentialString = "%s%s" % (_potentialString, currentChar)
                _stateString = 6
                _currentIndex += 1
            elif currentChar in string.ascii_letters + "\\% ":
                _potentialString = "%s%s" % (_potentialString, currentChar)
                _stateString = 4
                _currentIndex += 1
            else:
                _currentIndex += 1
                _stateNumber = 10
        elif _stateString == 7:
            if currentChar in string.digits + string.ascii_letters + "$_":
                _potentialString = "%s%s" % (_potentialString, currentChar)
                _stateString = 8
                _currentIndex += 1
            else:
                _currentIndex += 1
                _stateNumber = 10
        elif _stateString == 8:
            if currentChar in string.digits + string.ascii_letters + "$_":
                _potentialString = "%s%s" % (_potentialString, currentChar)
                _stateString = 8
                _currentIndex += 1
            else:
                _currentIndex += 1
                _stateNumber = 10
        elif _stateString == 9:
            if currentChar in ['b', 'n', 't', '\\', '\'', '\"']:
                _potentialString = "%s%s" % (_potentialString, currentChar)
                _stateString = 3
                _currentIndex += 1
            else:
                _currentIndex += 1
                _stateNumber = 10
        else:
            _currentIndex += 1
    ######################################  NUMBERS  ######################################
    elif currentChar in string.digits + ".":
        if _stateNumber == 0:
            if currentChar in "123456789":
                _potentialNumber = "%s%s" % (_potentialNumber, currentChar)
                _stateNumber = 6
                _currentIndex += 1
            elif currentChar == '0':
                _potentialNumber = "%s%s" % (_potentialNumber, currentChar)
                _stateNumber = 4
                _currentIndex += 1
            else:
                _stateNumber = 8
                _currentIndex += 1
        elif _stateNumber == 4:
            if currentChar == '.':
                _potentialNumber = "%s%s" % (_potentialNumber, currentChar)
                _stateNumber = 5
                _currentIndex += 1
            else:
                _stateNumber = 8
                _currentIndex += 1
        elif _stateNumber == 5:
            if currentChar in string.digits:
                _potentialNumber = "%s%s" % (_potentialNumber, currentChar)
                _stateNumber = 7
                _currentIndex += 1
            else:
                _stateNumber = 8
                _currentIndex += 1
        elif _stateNumber == 6: 
            if currentChar in string.digits:
                _potentialNumber = "%s%s" % (_potentialNumber, currentChar)
                _stateNumber = 6
                _currentIndex += 1
            elif currentChar == '.':
                _potentialNumber = "%s%s" % (_potentialNumber, currentChar)
                _stateNumber = 5
                _currentIndex += 1
            else:
                _stateNumber = 8
                _currentIndex += 1
        elif _stateNumber == 7:
            if currentChar in string.digits:
                _potentialNumber = "%s%s" % (_potentialNumber, currentChar)
                _stateNumber = 7
                _currentIndex += 1
            else:
                _stateNumber = 8
                _currentIndex += 1
        else:
            _currentIndex += 1
    ######################################OTHER OPERATORS######################################
    else:
        if _stateNumber == 6 or _stateNumber == 4:
            helper.outPutToken(_potentialNumber, "INT", Category.IdentifierTable["INT"])
        elif _stateNumber == 7:
            helper.outPutToken(_potentialNumber, "FLOAT", Category.IdentifierTable["FLOAT"])
        elif _stateNumber != 0:
            helper.outPutToken("ERROR NUMBER", "None", "None")
        _stateNumber = 0
        _potentialNumber = ""
        
        if _stateString == 7 or _stateString == 8:
            if _potentialString in Category.KeyWordsTable:
                helper.outPutToken(_potentialString, _potentialString.upper(), Category.IdentifierTable[_potentialString.upper()])
            else:
                helper.outPutToken(_potentialString, "IDN" , Category.IdentifierTable["IDN"])
                helper.setSymbolTable(_potentialString, "IDN" , Category.IdentifierTable["IDN"])
        elif _stateString == 5:
            helper.outPutToken(_potentialString, "CHAR", Category.IdentifierTable["CHAR"])
        elif _stateString == 6:
            helper.outPutToken(_potentialString, "CHAR *", Category.IdentifierTable["CHAR *"])
        elif _stateString != 0:
            helper.outPutToken("ERROR STRING", "None", "None")
        _stateString = 0
        _potentialString = ""
        
        if currentChar == " ":
            _currentIndex += 1
        elif currentChar == '>':
            _currentIndex += 1
            currentChar = _prog[_currentIndex]
            if currentChar == "=":
                helper.outPutToken(">=", ">=", Category.IdentifierTable[">="])
                _currentIndex += 1
            else :
                helper.outPutToken(">", ">", Category.IdentifierTable[">"])
        elif currentChar == '<':
            _currentIndex += 1
            currentChar = _prog[_currentIndex]
            if currentChar == "=":
                helper.outPutToken("<=", "<=", Category.IdentifierTable["<="])
                _currentIndex += 1
            else :
                helper.outPutToken("<", "<", Category.IdentifierTable["<"])
        elif currentChar == '+':
            _currentIndex += 1
            currentChar = _prog[_currentIndex]
            if currentChar == '+':
                helper.outPutToken("++", "++", Category.IdentifierTable["++"])
                _currentIndex += 1
            else :
                helper.outPutToken("+", "+", Category.IdentifierTable["+"])
        elif currentChar == '-':
            _currentIndex += 1
            currentChar = _prog[_currentIndex]
            if currentChar == '-':
                helper.outPutToken("--", "--", Category.IdentifierTable["--"])
                _currentIndex += 1  # consume the second '-' so it is not scanned again
            else:
                helper.outPutToken("-", "-", Category.IdentifierTable["-"])
        elif currentChar == '=':
            _currentIndex += 1
            currentChar = _prog[_currentIndex]
            if currentChar == '=':
                helper.outPutToken("==", "==", Category.IdentifierTable["=="])
                _currentIndex += 1
            else :
                helper.outPutToken("=", "=", Category.IdentifierTable["="])
        elif currentChar == '!':
            _currentIndex += 1
            currentChar = _prog[_currentIndex]
            if currentChar == '=':
                helper.outPutToken("!=", "!=", Category.IdentifierTable["!="])
                _currentIndex += 1
            else :
                helper.outPutToken("!", "!", Category.IdentifierTable["!"])  
        elif currentChar == '&':
            _currentIndex += 1
            currentChar = _prog[_currentIndex]
            if currentChar == '&':
                helper.outPutToken("&&", "&&", Category.IdentifierTable["&&"])
                _currentIndex += 1
            else :
                helper.outPutToken("&", "&", Category.IdentifierTable["&"])
        elif currentChar == '|':
            _currentIndex += 1
            currentChar = _prog[_currentIndex]
            if currentChar == '|':
                helper.outPutToken("||", "||", Category.IdentifierTable["||"])
                _currentIndex += 1
            else :
                helper.outPutToken("|", "|", Category.IdentifierTable["|"])
        elif currentChar == '*':
            helper.outPutToken("*", "*", Category.IdentifierTable["*"])
            _currentIndex += 1
        elif currentChar == '/':
            helper.outPutToken("/", "/", Category.IdentifierTable["/"])
            _currentIndex += 1
        elif currentChar == ';':
            helper.outPutToken(";", ";", Category.IdentifierTable[";"])
            _currentIndex += 1
        elif currentChar == ",":
            helper.outPutToken(",", ",", Category.IdentifierTable[","])
            _currentIndex += 1
        elif currentChar == '{':
            helper.outPutToken("{", "{", Category.IdentifierTable["{"])
            _currentIndex += 1
        elif currentChar == '}':
            helper.outPutToken("}", "}", Category.IdentifierTable["}"])
            _currentIndex += 1
        elif currentChar == '[':
            helper.outPutToken("[", "[", Category.IdentifierTable["["])
            _currentIndex += 1
        elif currentChar == ']':
            helper.outPutToken("]", "]", Category.IdentifierTable["]"])
            _currentIndex += 1
        elif currentChar == '(':
            helper.outPutToken("(", "(", Category.IdentifierTable["("])
            _currentIndex += 1
        elif currentChar == ')':
            helper.outPutToken(")", ")", Category.IdentifierTable[")"])
            _currentIndex += 1
            

if __name__ == '__main__':
    helper = FileAccess.FileHelper("H://test.c", "H://token.txt", "H://symbol_table.txt")
    prog = helper.readProg()
    print(prog)
    comments = readComments(prog)
    _prog = cutComments(prog, comments)
    print(_prog)
    while _currentIndex < len(_prog):
        scan(helper)
    helper.closeFiles()

Category.py defines C keywords, operators, and so on; it is the token category code table:
