To be a Tough Man——liushuaikobe
阿新 • Published: 2019-01-07
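For example, the C statement "while (i >= j) i--;" is broken into these tokens:
<while, _>
<(, _>
<id, pointer to the symbol-table entry for i>
<>=, _>
<id, pointer to the symbol-table entry for j>
<), _>
<id, pointer to the symbol-table entry for i>
<--, _>
<;, _>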
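A minimal sketch of a lexer that produces this kind of stream follows. It assumes a small hand-picked keyword and operator set (numbers and literals are left out), and it uses an index into a Python list in place of the symbol-table pointer; tokenize, KEYWORDS and OPERATORS are illustrative names, not the author's.

```python
import re

# Illustrative subset only; a real C lexer needs the full keyword/operator tables.
KEYWORDS = {"while", "if", "else", "for", "int", "char", "float", "return"}
# Two-character operators are listed first so that ">=" wins over ">" (longest match).
OPERATORS = [">=", "<=", "==", "!=", "++", "--", "&&", "||",
             ">", "<", "=", "+", "-", "*", "/", "!", "&", "|",
             "(", ")", "{", "}", "[", "]", ";", ","]
TOKEN_RE = re.compile(
    r"(?P<id>[A-Za-z_$][A-Za-z0-9_$]*)"
    r"|(?P<op>" + "|".join(re.escape(op) for op in OPERATORS) + r")"
)


def tokenize(src):
    """Split src into (kind, attribute) pairs plus a symbol table.

    Identifiers become ("id", n), where n indexes the symbol table and stands in
    for the "pointer to the symbol-table entry"; every other token gets "_".
    """
    tokens, symbols, index_of = [], [], {}
    pos = 0
    while pos < len(src):
        if src[pos].isspace():
            pos += 1
            continue
        m = TOKEN_RE.match(src, pos)
        if m is None:
            raise SyntaxError("unexpected character %r at %d" % (src[pos], pos))
        word = m.group("id")
        if word is not None and word not in KEYWORDS:
            if word not in index_of:          # first sighting: new symbol-table entry
                index_of[word] = len(symbols)
                symbols.append(word)
            tokens.append(("id", index_of[word]))
        else:                                 # keyword or operator: no attribute
            tokens.append((word or m.group("op"), "_"))
        pos = m.end()
    return tokens, symbols


if __name__ == "__main__":
    tokens, symbols = tokenize("while (i >= j) i--;")
    for kind, attr in tokens:
        print("<%s, %s>" % (kind, attr))
    print(symbols)
```

Run on "while (i >= j) i--;", it yields the same nine tokens, with both occurrences of i sharing symbol-table entry 0 and j stored as entry 1.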
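The lexical analyzer as an independent subroutine: lexical analysis is one phase of compilation and runs before syntax analysis. Making lexical analysis a separate pass simplifies the design, improves compilation efficiency, and makes the compiler more portable. Lexical analysis can also be merged with syntax analysis into a single pass, in which the parser calls the lexer whenever it needs the current token.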
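In the single-pass arrangement the parser pulls tokens one at a time instead of receiving a finished list. A minimal sketch with a generator as the lexer and a toy parser for sums such as 1 + 2 + 3; the grammar and the names lex and Parser are hypothetical, chosen only to show the calling pattern.

```python
def lex(src):
    """Generator lexer: scans only as far as needed to produce the next token."""
    pos = 0
    while pos < len(src):
        ch = src[pos]
        if ch.isspace():
            pos += 1
        elif ch in "0123456789":
            start = pos
            while pos < len(src) and src[pos] in "0123456789":
                pos += 1
            yield ("INT", int(src[start:pos]))
        else:
            pos += 1
            yield (ch, None)                 # single-character operator such as '+'
    yield ("EOF", None)


class Parser:
    """Toy parser for  sum := INT ('+' INT)*  that calls the lexer on demand."""

    def __init__(self, src):
        self.tokens = lex(src)               # creating the generator scans nothing yet
        self.current = next(self.tokens)     # ask the lexer for the first token

    def advance(self):
        self.current = next(self.tokens)

    def expect(self, kind):
        token = self.current
        if token[0] != kind:
            raise SyntaxError("expected %s, got %r" % (kind, token))
        self.advance()
        return token

    def parse(self):
        total = self.expect("INT")[1]
        while self.current[0] == "+":
            self.advance()
            total += self.expect("INT")[1]
        if self.current[0] != "EOF":
            raise SyntaxError("unexpected %r" % (self.current,))
        return total
```

Because lex is a generator, it has scanned no further than the token the parser is currently looking at. The separate-pass design would instead build list(lex(src)) first and hand the whole list to the parser.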
-----------------------------------------------------------------------------------
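The lexer I wrote is not very robust, especially its error handling. In string recognition, for example, 'ab' is not a valid char literal in C, yet my lexer cannot flag the error and gets stuck in an infinite loop. It also recognizes only a limited set of keywords and a limited form of strings (readers who follow my state machines will see where those limits are). I ran out of time and don't plan to revise it further, so I am posting the code below for reference.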
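A lexer loop terminates as long as every call consumes at least one character. The sketch below applies that rule to char literals: it rejects 'ab' and still moves past it. The name match_char_literal and its (lexeme, next_index, ok) return value are illustrative choices; the escape set is copied from state 9 of scan().

```python
ESCAPES = set("bnt\\'\"")   # the escapes accepted by state 9 of scan()


def match_char_literal(src, i):
    """Recognize a C char literal that starts at src[i] == "'".

    Returns (lexeme, next_index, ok). Even on error, next_index lies past the
    bad literal, so the caller's scanning loop always makes progress.
    """
    if src[i:i + 1] != "'":
        raise ValueError("no char literal at index %d" % i)
    j = i + 1
    # Find the closing quote on this line, stepping over backslash escapes.
    while j < len(src) and src[j] not in "'\n":
        j += 2 if src[j] == "\\" else 1
    j = min(j, len(src))
    closed = j < len(src) and src[j] == "'"
    end = j + 1 if closed else j
    body = src[i + 1:j]
    # Valid bodies: exactly one ordinary character, or one known escape.
    ok = closed and (
        (len(body) == 1 and body != "\\")
        or (len(body) == 2 and body[0] == "\\" and body[1] in ESCAPES)
    )
    return src[i:end], end, ok
```

For "x = 'ab';" starting at index 4, the function reports the lexeme 'ab' as invalid and returns index 8, so scanning resumes at the semicolon.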
Before the code, a word about the design of my lexer's state machines.
For "numbers" I used one state machine that covers both floating-point and integer literals. The state machine is as follows:
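Read off the number branch of scan() in Scanner.py, the same machine can be written as a transition function. The sketch keeps the scanner's state numbers (0 start, 4 a lone 0, 5 after the point, 6 integer digits, 7 fraction digits, 8 error); step, ACCEPTING and classify_number are illustrative names, and the sketch classifies a complete lexeme rather than running character by character inside the scanner.

```python
DIGITS = "0123456789"
ERROR = 8                                  # sticky error state
ACCEPTING = {4: "INT", 6: "INT", 7: "FLOAT"}


def step(state, ch):
    """One transition of the number machine, using the state numbers of scan()."""
    if state == 0:                         # start
        if ch in "123456789":
            return 6
        if ch == "0":
            return 4
    elif state == 4:                       # a lone "0": only "." may follow
        if ch == ".":
            return 5
    elif state == 5:                       # after ".": a digit is required
        if ch in DIGITS:
            return 7
    elif state == 6:                       # integer digits
        if ch in DIGITS:
            return 6
        if ch == ".":
            return 5
    elif state == 7:                       # fraction digits
        if ch in DIGITS:
            return 7
    return ERROR                           # every other transition goes to 8


def classify_number(text):
    """Return "INT", "FLOAT" or "ERROR NUMBER" for a complete lexeme."""
    state = 0
    for ch in text:
        state = step(state, ch)
    return ACCEPTING.get(state, "ERROR NUMBER")
```

Leading-zero literals such as 007 and a trailing point as in 12. both end outside the accepting states, which matches how the scanner reports them.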
Characters and strings, including keywords, char and char *, are recognized by another state machine:
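The transitions of this machine can likewise be recovered from the string branch of scan(). The sketch below restates them with the scanner's state numbers (5 accepts CHAR, 6 accepts CHAR *, 7 and 8 accept an identifier or keyword, 10 is the error state). KEYWORDS is an illustrative subset standing in for Category.KeyWordsTable, and CHAR_BODY is an assumption because the state-1 character set is garbled in the published listing.

```python
import string

LETTERS = string.ascii_letters
WORD = string.ascii_letters + string.digits + "$_"
# Assumption: the listing's state-1 set is garbled; only letters and "# @%" are certain.
CHAR_BODY = LETTERS + "# @%"
STRING_BODY = LETTERS + "\\% "              # letters, backslash, percent, space
ESCAPES = "bnt\\'\""                         # escapes accepted by state 9
KEYWORDS = {"while", "if", "else", "for", "int", "char", "float", "return"}  # illustrative subset
ERROR = 10


def step(state, ch):
    """One transition of the char/string/identifier machine (state numbers as in scan())."""
    if state == 0:
        if ch == "'":
            return 1
        if ch == '"':
            return 2
        if ch in LETTERS + "$_":
            return 7
    elif state == 1:                         # after an opening '
        if ch == "\\":
            return 9
        if ch in CHAR_BODY:
            return 3
    elif state == 9:                         # after '\ : one escape letter
        if ch in ESCAPES:
            return 3
    elif state == 3:                         # one character read: need the closing '
        if ch == "'":
            return 5
    elif state in (2, 4):                    # inside "...": 2 = nothing read yet
        if state == 4 and ch == '"':
            return 6
        if ch in STRING_BODY:
            return 4
    elif state in (7, 8):                    # identifier or keyword
        if ch in WORD:
            return 8
    return ERROR


def classify_word(text):
    """Classify a complete lexeme with the machine above."""
    state = 0
    for ch in text:
        state = step(state, ch)
    if state == 5:
        return "CHAR"
    if state == 6:
        return "CHAR *"
    if state in (7, 8):
        return text.upper() if text in KEYWORDS else "IDN"
    return "ERROR STRING"
```

Two simplifications: a character after a closing quote leads to state 10 here, whereas scan() silently skips it, and the space in STRING_BODY is copied from the listing even though scan() never routes a space into this branch, so string literals containing spaces still fail in the real scanner.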
C comments are recognized by a state machine as well; the comments must be cut out of the source code before the analysis can begin:
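A single-pass version of the same four-state machine (0 code, 1 after '/', 2 inside a comment, 3 after '*' inside a comment) is sketched below. Unlike the two-pass readComments/cutComments pair, strip_comments (an illustrative name) writes each comment out as one space, which is what C requires, and raises an error for an unterminated comment. Like the original, it ignores // comments and knows nothing about string literals, so "/*" inside a string would be misread.

```python
def strip_comments(src):
    """Replace each /* ... */ comment in src with a single space.

    States: 0 = code, 1 = saw '/', 2 = inside a comment, 3 = saw '*' inside a comment.
    """
    out = []
    state = 0
    for ch in src:
        if state == 0:
            if ch == "/":
                state = 1                    # maybe the start of a comment
            else:
                out.append(ch)
        elif state == 1:
            if ch == "*":
                state = 2
            else:
                out.append("/" + ch)         # it was an ordinary '/', e.g. division
                state = 0
        elif state == 2:
            if ch == "*":
                state = 3
        else:                                # state 3
            if ch == "/":
                out.append(" ")              # the whole comment becomes one space
                state = 0
            elif ch != "*":                  # anything but another '*' returns to the body
                state = 2
    if state == 1:
        out.append("/")                      # input ended right after a '/'
    elif state in (2, 3):
        raise ValueError("unterminated comment")
    return "".join(out)
```

Keeping the space matters: with plain deletion, a/*x*/b would collapse into the single identifier ab.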
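For operators (both two-character and single-character ones) I did not use an explicit state machine; they are recognized by direct checks. In a sense this still follows the state-machine principle, but the machines are so simple that I did not track them with an explicit state variable. Their actual state machines are as follows: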
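The if/elif chain for operators amounts to a longest-match ("maximal munch") rule with one character of lookahead. A table-driven sketch of the same idea follows; the operator sets are copied from scan(), while match_operator and operators are illustrative names. Slicing instead of indexing also means an operator at the very end of the input cannot raise IndexError.

```python
# Operators recognized by scan(); each two-character form is tried before its prefix.
TWO_CHAR = {">=", "<=", "++", "--", "==", "!=", "&&", "||"}
ONE_CHAR = set("><+-=!&|*/;,{}[]()")


def match_operator(src, i):
    """Return (operator, next_index) for the operator at src[i], or None.

    Read one character, then peek at the next one to choose between e.g. '+' and '++'.
    """
    pair = src[i:i + 2]
    if pair in TWO_CHAR:
        return pair, i + 2
    if src[i:i + 1] in ONE_CHAR:
        return src[i], i + 1
    return None


def operators(src):
    """Collect the operators in src, skipping everything else."""
    ops, i = [], 0
    while i < len(src):
        m = match_operator(src, i)
        if m is None:
            i += 1                           # not an operator; other branches handle it
        else:
            op, i = m
            ops.append(op)
    return ops
```

For example, operators("x+++y") gives ["++", "+"], because the longer form is always preferred.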
Here is the code.
Scanner.py, which runs as the main module:
'''
Created on 2012-10-18
@author: liushuai
'''
import string
import Category
import FileAccess
_currentIndex = 0
_Tokens = []
_prog = ""
_categoryNo = -1
_stateNumber = 0
_stateString = 0
_potentialNumber = ""
_potentialString = ""
def readComments(prog):
    '''Return the [begin, end] index pairs of the /* ... */ comments in prog'''
    # states: 0 = code, 1 = saw '/', 2 = inside a comment, 3 = saw '*' inside a comment
    state = 0
    currentIndex, beginIndex, endIndex = (0, 0, 0)
    commentsIndexs = []
    for c in prog:
        if state == 0:
            if c == '/':
                beginIndex = currentIndex
                state = 1
            else:
                pass
        elif state == 1:
            if c == '*':
                state = 2
            else:
                state = 0
        elif state == 2:
            if c == '*':
                state = 3
            else:
                pass
        elif state == 3:
            if c == '*':
                pass
            elif c == '/':
                endIndex = currentIndex
                commentsIndexs.append([beginIndex, endIndex])
                state = 0  # back to the code state
            else:
                state = 2
        currentIndex += 1
    return commentsIndexs
def cutComments(prog, commentsIndexs):
    '''cut the comments out of the program prog'''
    num = len(commentsIndexs)
    if num == 0:
        return prog
    else:
        comments = []
        for i in range(num):
            comments.append(prog[commentsIndexs[i][0]:commentsIndexs[i][1] + 1])
        for item in comments:
            # C treats a comment as one space, so "a/*x*/b" stays two tokens
            prog = prog.replace(item, " ")
        return prog
def scan(helper):
    '''scan the program from _currentIndex and analyze it'''
    global _stateNumber, _stateString, _currentIndex, _Tokens, _prog, _categoryNo, _potentialNumber, _potentialString
    currentChar = _prog[_currentIndex]
    ###################################### CHAR STRING ######################################
    if currentChar == '\'' or currentChar == '\"' or currentChar in string.ascii_letters + "_$\\%@" or (currentChar in string.digits and _stateString != 0):
        if _stateString == 0:
            if currentChar == '\'':
                _potentialString = "%s%s" % (_potentialString, currentChar)
                _stateString = 1
                _currentIndex += 1
            elif currentChar == "\"":
                _potentialString = "%s%s" % (_potentialString, currentChar)
                _stateString = 2
                _currentIndex += 1
            elif currentChar in string.ascii_letters + "$_":
                _potentialString = "%s%s" % (_potentialString, currentChar)
                _stateString = 7
                _currentIndex += 1
            else:
                _currentIndex += 1
                _stateString = 10  # error state
        elif _stateString == 1:
            # assumption: this character set is partly garbled in the published listing;
            # only the letters and "# @%" are known to belong to it
            if currentChar in string.ascii_letters + "# @%":
                _potentialString = "%s%s" % (_potentialString, currentChar)
                _stateString = 3
                _currentIndex += 1
            elif currentChar == '\\':
                _potentialString = "%s%s" % (_potentialString, currentChar)
                _stateString = 9
                _currentIndex += 1
            else:
                _currentIndex += 1
                _stateString = 10
        elif _stateString == 2:
            if currentChar in string.ascii_letters + "\\% ":
                _potentialString = "%s%s" % (_potentialString, currentChar)
                _stateString = 4
                _currentIndex += 1
            else:
                _currentIndex += 1
                _stateString = 10
        elif _stateString == 3:
            if currentChar == '\'':
                _potentialString = "%s%s" % (_potentialString, currentChar)
                _stateString = 5
                _currentIndex += 1
            else:
                _currentIndex += 1
                _stateString = 10
        elif _stateString == 4:
            if currentChar == '\"':
                _potentialString = "%s%s" % (_potentialString, currentChar)
                _stateString = 6
                _currentIndex += 1
            elif currentChar in string.ascii_letters + "\\% ":
                _potentialString = "%s%s" % (_potentialString, currentChar)
                _stateString = 4
                _currentIndex += 1
            else:
                _currentIndex += 1
                _stateString = 10
        elif _stateString == 7:
            if currentChar in string.digits + string.ascii_letters + "$_":
                _potentialString = "%s%s" % (_potentialString, currentChar)
                _stateString = 8
                _currentIndex += 1
            else:
                _currentIndex += 1
                _stateString = 10
        elif _stateString == 8:
            if currentChar in string.digits + string.ascii_letters + "$_":
                _potentialString = "%s%s" % (_potentialString, currentChar)
                _stateString = 8
                _currentIndex += 1
            else:
                _currentIndex += 1
                _stateString = 10
        elif _stateString == 9:
            if currentChar in ['b', 'n', 't', '\\', '\'', '\"']:
                _potentialString = "%s%s" % (_potentialString, currentChar)
                _stateString = 3
                _currentIndex += 1
            else:
                _currentIndex += 1
                _stateString = 10
        else:
            # states 5, 6 (complete literal) and 10 (error): skip the character
            _currentIndex += 1
    ###################################### NUMBERS ######################################
    elif currentChar in string.digits + ".":
        if _stateNumber == 0:
            # start state
            if currentChar in "123456789":
                _potentialNumber = "%s%s" % (_potentialNumber, currentChar)
                _stateNumber = 6
                _currentIndex += 1
            elif currentChar == '0':
                _potentialNumber = "%s%s" % (_potentialNumber, currentChar)
                _stateNumber = 4
                _currentIndex += 1
            else:
                _stateNumber = 8
                _currentIndex += 1
        elif _stateNumber == 4:
            # a lone 0: only a decimal point may follow
            if currentChar == '.':
                _potentialNumber = "%s%s" % (_potentialNumber, currentChar)
                _stateNumber = 5
                _currentIndex += 1
            else:
                _stateNumber = 8
                _currentIndex += 1
        elif _stateNumber == 5:
            # after the decimal point: at least one digit is required
            if currentChar in string.digits:
                _potentialNumber = "%s%s" % (_potentialNumber, currentChar)
                _stateNumber = 7
                _currentIndex += 1
            else:
                _stateNumber = 8
                _currentIndex += 1
        elif _stateNumber == 6:
            # integer part starting with 1-9
            if currentChar in string.digits:
                _potentialNumber = "%s%s" % (_potentialNumber, currentChar)
                _stateNumber = 6
                _currentIndex += 1
            elif currentChar == '.':
                _potentialNumber = "%s%s" % (_potentialNumber, currentChar)
                _stateNumber = 5
                _currentIndex += 1
            else:
                _stateNumber = 8
                _currentIndex += 1
        elif _stateNumber == 7:
            # fraction digits
            if currentChar in string.digits:
                _potentialNumber = "%s%s" % (_potentialNumber, currentChar)
                _stateNumber = 7
                _currentIndex += 1
            else:
                _stateNumber = 8
                _currentIndex += 1
        else:
            # state 8 (error): skip the character
            _currentIndex += 1
    ###################################### OTHER OPERATORS ######################################
    else:
        # any other character ends a pending number or word, so emit that token first
        if _stateNumber == 6 or _stateNumber == 4:
            helper.outPutToken(_potentialNumber, "INT", Category.IdentifierTable["INT"])
        elif _stateNumber == 7:
            helper.outPutToken(_potentialNumber, "FLOAT", Category.IdentifierTable["FLOAT"])
        elif _stateNumber != 0:
            helper.outPutToken("ERROR NUMBER", "None", "None")
        _stateNumber = 0
        _potentialNumber = ""
        if _stateString == 7 or _stateString == 8:
            if _potentialString in Category.KeyWordsTable:
                helper.outPutToken(_potentialString, _potentialString.upper(), Category.IdentifierTable[_potentialString.upper()])
            else:
                helper.outPutToken(_potentialString, "IDN", Category.IdentifierTable["IDN"])
                helper.setSymbolTable(_potentialString, "IDN", Category.IdentifierTable["IDN"])
        elif _stateString == 5:
            helper.outPutToken(_potentialString, "CHAR", Category.IdentifierTable["CHAR"])
        elif _stateString == 6:
            helper.outPutToken(_potentialString, "CHAR *", Category.IdentifierTable["CHAR *"])
        elif _stateString != 0:
            helper.outPutToken("ERROR STRING", "None", "None")
        _stateString = 0
        _potentialString = ""
        if currentChar.isspace():
            # spaces, tabs and newlines only separate tokens
            _currentIndex += 1
        elif currentChar == '>':
            _currentIndex += 1
            currentChar = _prog[_currentIndex]
            if currentChar == "=":
                helper.outPutToken(">=", ">=", Category.IdentifierTable[">="])
                _currentIndex += 1
            else:
                helper.outPutToken(">", ">", Category.IdentifierTable[">"])
        elif currentChar == '<':
            _currentIndex += 1
            currentChar = _prog[_currentIndex]
            if currentChar == "=":
                helper.outPutToken("<=", "<=", Category.IdentifierTable["<="])
                _currentIndex += 1
            else:
                helper.outPutToken("<", "<", Category.IdentifierTable["<"])
        elif currentChar == '+':
            _currentIndex += 1
            currentChar = _prog[_currentIndex]
            if currentChar == '+':
                helper.outPutToken("++", "++", Category.IdentifierTable["++"])
                _currentIndex += 1
            else:
                helper.outPutToken("+", "+", Category.IdentifierTable["+"])
        elif currentChar == '-':
            _currentIndex += 1
            currentChar = _prog[_currentIndex]
            if currentChar == '-':
                helper.outPutToken("--", "--", Category.IdentifierTable["--"])
                _currentIndex += 1  # consume the second '-' as well
            else:
                helper.outPutToken("-", "-", Category.IdentifierTable["-"])
        elif currentChar == '=':
            _currentIndex += 1
            currentChar = _prog[_currentIndex]
            if currentChar == '=':
                helper.outPutToken("==", "==", Category.IdentifierTable["=="])
                _currentIndex += 1
            else:
                helper.outPutToken("=", "=", Category.IdentifierTable["="])
        elif currentChar == '!':
            _currentIndex += 1
            currentChar = _prog[_currentIndex]
            if currentChar == '=':
                helper.outPutToken("!=", "!=", Category.IdentifierTable["!="])
                _currentIndex += 1
            else:
                helper.outPutToken("!", "!", Category.IdentifierTable["!"])
        elif currentChar == '&':
            _currentIndex += 1
            currentChar = _prog[_currentIndex]
            if currentChar == '&':
                helper.outPutToken("&&", "&&", Category.IdentifierTable["&&"])
                _currentIndex += 1
            else:
                helper.outPutToken("&", "&", Category.IdentifierTable["&"])
        elif currentChar == '|':
            _currentIndex += 1
            currentChar = _prog[_currentIndex]
            if currentChar == '|':
                helper.outPutToken("||", "||", Category.IdentifierTable["||"])
                _currentIndex += 1
            else:
                helper.outPutToken("|", "|", Category.IdentifierTable["|"])
        elif currentChar == '*':
            helper.outPutToken("*", "*", Category.IdentifierTable["*"])
            _currentIndex += 1
        elif currentChar == '/':
            helper.outPutToken("/", "/", Category.IdentifierTable["/"])
            _currentIndex += 1
        elif currentChar == ';':
            helper.outPutToken(";", ";", Category.IdentifierTable[";"])
            _currentIndex += 1
        elif currentChar == ",":
            helper.outPutToken(",", ",", Category.IdentifierTable[","])
            _currentIndex += 1
        elif currentChar == '{':
            helper.outPutToken("{", "{", Category.IdentifierTable["{"])
            _currentIndex += 1
        elif currentChar == '}':
            helper.outPutToken("}", "}", Category.IdentifierTable["}"])
            _currentIndex += 1
        elif currentChar == '[':
            helper.outPutToken("[", "[", Category.IdentifierTable["["])
            _currentIndex += 1
        elif currentChar == ']':
            helper.outPutToken("]", "]", Category.IdentifierTable["]"])
            _currentIndex += 1
        elif currentChar == '(':
            helper.outPutToken("(", "(", Category.IdentifierTable["("])
            _currentIndex += 1
        elif currentChar == ')':
            helper.outPutToken(")", ")", Category.IdentifierTable[")"])
            _currentIndex += 1
        else:
            # unrecognized character (e.g. '#', ':', '?'): report it and move on instead of looping forever
            helper.outPutToken("ERROR CHAR", "None", "None")
            _currentIndex += 1
if __name__ == '__main__':
    helper = FileAccess.FileHelper("H://test.c", "H://token.txt", "H://symbol_table.txt")
    prog = helper.readProg()
    print(prog)
    comments = readComments(prog)
    # the trailing space flushes the last pending token and keeps the
    # one-character lookahead of the operator checks inside the string
    _prog = cutComments(prog, comments) + " "
    print(_prog)
    while _currentIndex < len(_prog):
        scan(helper)
    helper.closeFiles()
Category.py: this module defines the C keywords, operators and so on; it is the table of token category codes: